Friday, December 9, 2011

Unit tests - a useful thing causing occasional pain

For anyone who has never heard the expression: unit tests are a way to test the functionality of a program or library automatically, for example by checking the result of evaluating known test data. This is a great thing and can help massively in identifying coding mistakes. CMake provides a mechanism for this (CTest), and many KDE packages include test routines.
So far, so good. That is the theory; now let's look at reality. In Gentoo, the main KDE distribution is split into 295 (!) packages. Many don't have any tests (oxygen-icons comes to mind as a nice example). Of those that do, 37 now have all their test routines hard-disabled, and I guess a few more are missing that I still have to add to the list. Why? Well, first of all, you have to know that in Gentoo compilation usually runs as an unprivileged user detached from any graphical desktop session, and this user also runs the test routines. In addition, tests should run fine "offline", i.e. without an internet connection. One of our developers is running a great thing called a "tinderbox", where builds and unit tests in various configurations are automated, in an isolated environment and possibly even without network access. So, what can go wrong?
  • The tests need X11. Well, actually we can get around this with a nice hack called virtualx. A background framebuffer X server is started for the test phase only, and the tests are redirected to that display. It even works sometimes. :o)
  • The tests need a dbus session bus. I think we could get around this too, but nobody bothered to do it so far and I'm not a dbus expert. 
  • The tests need an entire KDE desktop to interact with. Yes, that happens, and unfortunately, we have to disable them in this case, see above. 
  • The tests try to download a data set from, say, IMDB, CDDB, or similar. Bad, bad, bad, see above.
  • Next, the tests pop up dialog boxes asking for user interaction. A window asking for a GnuPG passphrase is my personal favourite... 
  • And last, what I particularly like is a unit test that fails... and after digging into the test log, I see a fat comment "This does not work yet and needs to be fixed!"
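
The environmental dependencies in the list above can at least be made explicit in the tests themselves, so that a headless, offline build skips them instead of failing. The following is only a minimal sketch in Python's standard `unittest` module, not how the KDE/CMake test suites are written. `DISPLAY` and `DBUS_SESSION_BUS_ADDRESS` are the usual X11 and D-Bus session variables, while `ALLOW_NETWORK_TESTS` and all function and test names are hypothetical and chosen purely for illustration.

```python
import os
import unittest


def has_display(env=None):
    """Return True if an X11 display is advertised via DISPLAY."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY"))


def has_session_bus(env=None):
    """Return True if a D-Bus session bus address is set."""
    env = os.environ if env is None else env
    return bool(env.get("DBUS_SESSION_BUS_ADDRESS"))


def network_tests_allowed(env=None):
    """Return True if the builder explicitly opted in to network tests.

    ALLOW_NETWORK_TESTS is a made-up variable for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("ALLOW_NETWORK_TESTS") == "1"


class EnvironmentAwareTests(unittest.TestCase):
    def test_pure_logic(self):
        # No external resources needed: this always runs.
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    @unittest.skipUnless(has_display(), "needs an X11 display")
    def test_needs_x11(self):
        pass  # placeholder for a GUI test

    @unittest.skipUnless(has_session_bus(), "needs a D-Bus session bus")
    def test_needs_dbus(self):
        pass  # placeholder for a test that talks to the session bus

    @unittest.skipUnless(network_tests_allowed(), "needs network access")
    def test_needs_network(self):
        pass  # placeholder for a test that downloads data
```

Saved as a test module and run with `python -m unittest`, the suite reports the guarded tests as skipped (with the stated reason) on a machine without a display, session bus, or network opt-in, rather than as failures.
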
I guess at least some of the above points are a matter of policy. Test routines requiring user interaction can provide valid information, but of course they are not particularly useful for automated testing... Anyway...

PS. This blog post took slightly longer to write since a libreoffice-3.5 build in the background went berserk... load 65, >180 concurrent compilers... :D

8 comments:

  1. Very good blog post to make the KDE test situation visible. My opinion about unit tests: better to have no tests than badly written or unmaintained ones.

  2. personally I don't see how packagers should be concerned here? unit tests are aimed at developers, why do you run them at all? here are a few comments though:

    regarding X11: run xvfb. if you test user interaction (raw mouse stuff, squish stuff), you'll need x11.

    regarding dbus: if you test something that requires dbus, then there is nothing you can do about that - better have a test that requires dbus than no test at all.

    regarding failing unit tests: developers, use QEXPECT_FAIL in code or set_tests_properties(mytest PROPERTIES WILL_FAIL TRUE) in cmake

    regarding downloading data: if you test kio stuff e.g. then what else should you do? otherwise of course shipping test data with the unit test should be preferred.

    but again, I don't really see the point in the blog post. unit tests are *always* good - but only aimed at developers. "make install" doesn't run them on purpose.

    bye

  3. @milianw: it is quite standard practice for packagers to run automated tests at build time. That can help detect bugs before actually shipping packages.

    Regarding dbus, I can suggest two options:

    - Start your own dbus session. It all boils down to running "export $(dbus-launch)" before running your tests.

    - Use https://launchpad.net/dbus-test-runner/
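
For a test harness that is not a shell script, the first option can be sketched roughly as follows. This is a hedged illustration in Python rather than a tested recipe: it assumes `dbus-launch` is installed and prints plain `KEY=VALUE` lines (such as `DBUS_SESSION_BUS_ADDRESS` and `DBUS_SESSION_BUS_PID`), and the function names are made up for this example.

```python
import os
import subprocess


def parse_dbus_launch_output(text):
    """Turn the KEY=VALUE lines printed by dbus-launch into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only: the bus address itself contains "=".
        key, value = line.split("=", 1)
        env[key] = value
    return env


def start_session_bus():
    """Start a private session bus and return the variables it reports.

    Requires the dbus-launch binary. The caller should terminate the
    process named by DBUS_SESSION_BUS_PID once the tests are done.
    """
    completed = subprocess.run(
        ["dbus-launch"], check=True, capture_output=True, text=True
    )
    return parse_dbus_launch_output(completed.stdout)


def env_with_session_bus(bus_vars, base_env=None):
    """Return a copy of the environment extended with the bus variables."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(bus_vars)
    return env
```

The resulting dictionary can be passed as `env=` to whatever `subprocess` call runs the tests, which has the same effect as `export $(dbus-launch)` but keeps the variables out of the parent process.
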

  4. By definition, any test that depends on any of: X11, dbus, a network connection, or user input is NOT a unit test. Unit tests that don't depend on any of those are easy to write, easy to maintain, and catch the most bugs with the least false positives.

    Now I have no problem with integration tests that rely on any of the above. However they are not unit tests, and are subject to a lot of false positives causing someone to turn them off.

  5. I guess the writer confuses unit tests with the more general term "test suite". If you apply the V-model, unit tests should never reach the end user (here, the packagers), but packagers might want to do their own acceptance tests, most likely by running a provided test suite to ensure that the software works as expected (integration included). Hank definitely has a point, and here is a second one: unit tests are mostly used internally against components/modules as part of development (and later regression testing), and _can_ later be used as part of a test suite for acceptance testing. But most of the time you use integration or system tests for that. A third point is that "unit test" has become a buzzword, so there is a lot of incorrect information about it, I guess.

    And we can't all be educated testers, can we?

  6. I don't think Gentoo users should run tests at all. The packagers may want to run the supplied tests to verify the package before bumping an ebuild or something, but end users shouldn't have to handle that.

  7. In an ideal world, with hardware that couldn't cause problems and no undocumented incompatibilities between packages, Gentoo users wouldn't need to run tests. But we don't live in that ideal world: sometimes subtle hardware failures cause a broken program or library to be installed, and some incompatibilities become obvious only at runtime but go unnoticed in the preparation and compilation phases. Tests (even unit tests) are ideal for diagnosing both kinds of situation. If its test suite ever passed, glibc would be a good example to point out: the package manager won't let you downgrade it if a broken version gets installed, and a sufficiently broken version can affect every other package on the system. So if every single version in the tree didn't fail its tests whether it worked or not, I would say that every user should run tests for at least glibc.

    The adage "users shouldn't have to run tests" is true enough for binary distributions, where installing a package simply involves downloading the package, perhaps cryptographically verifying it, unpacking it, and copying the files into place. But every user of a source-based distribution like Gentoo is in effect the packager of his or her own system; isn't one of Gentoo's taglines "the meta-distribution"?

    It's simple enough for packagers on Debian, Ubuntu, or Red Hat to verify a package and identify any collisions or breakages by simply installing various versions of various packages together---or even by static analysis of the packages in the archive---though this is complicated somewhat by the existence of many independent repositories (e.g. PPAs). The equivalent on Gentoo---the tinderbox---is a Sisyphean task, especially since many packages have several stable versions and even more ~arch versions---to say nothing of the immense complexity introduced by USE flags, profiles, CFLAGS, and the like. Even if all the Gentoo developers checked their version bumps and stabilization candidates as diligently as would be ideal, I still wouldn't want to trust their choices for my systems without verifying the packages that I built for myself.

    However, I agree with the post: tests that require user interaction, network access, the kernel module being tested to be in the kernel (dev-util/exmap), or something else that can't easily be emulated by the test framework should be disabled as a matter of course (preferably granularly). On the other hand, perhaps a FEATURES setting or a (restricted-by-default) USE flag "interactive-test" could let users who want to be sure run the interactive or network-accessing tests?
