Colin Watson
   


About
Colin Watson's blog
cjwatson@debian.org

Tue, 29 Jul 2008

Why annoyances exist

(This post was inspired by a post on ArsGeek, hence the title. Thanks to Scott James Remnant for some of the ideas here.)

I'll get to the details of ArsGeek's post in a moment, but really the more interesting part of it is the discussion of how we sometimes end up releasing software with (from various people's perspectives) serious bugs, a question of release management. As a member of the Ubuntu release team, I thought I'd respond to this and try to clarify what goes on.

Release management strategies

Firstly, I know of no way to ensure a bug-free release of a system the size of a modern GNU/Linux distribution, with software coming from a huge variety of sources with all kinds of different QA standards, not to mention plenty of sources within the distribution itself where new bugs can be introduced. Indeed, ensuring releases free of serious bugs is rather more of an art than a science. What may be surprising is that, for a composite distribution of software from lots of sources, it doesn't actually seem to make a whole lot of difference to bug density whether you choose to whack all the bugs out before picking a release date, or just pick a release date up-front and tell everyone to aim for that (as long as people believe your release dates; more on this later). I can't offer hard evidence for this, only the gut feel of somebody with experience of both approaches, but it does seem to me to be the case.

The idea of releasing "when it's ready" is a very appealing one. I have some experience here: I used to be one of Debian's release managers too, which is just about as far to that end of the scale as you get among GNU/Linux distributions. In theory, you put all the software into a big pot, stir it up, collect all the bugs, and at some point when it smells reasonably good you get everyone to whack away at the bugs in turn until they're all gone. Then you release.

In theory, theory and practice are the same thing ...

In practice, what happens is that there are a number of counterbalancing pressures. Developers don't want to be tied down waiting for all those other lazy developers who haven't fixed their bugs yet; users want the latest version of their favourite application which has some new and shiny feature; security support for the last release is getting harder as it gets older and the backport distance increases; people with new hardware want a new release that will support it; and so on. The longer you spend fixing bugs, the more the internal pressures show, and the harder it gets. Thus, even Debian compromises on this, and many of the bugs that are "fixed" shortly before release are in fact dealt with by removing leaf packages from the distribution, documenting workarounds, or simply deciding that they aren't release-critical. And this is only for the highest-severity bugs! Many other bugs, by straightforward necessity, are simply not important enough to hold up the release of the whole distribution.

Now, Ubuntu opted for the time-based approach up front. You pick a date and you stick to it as if your life depended on it. Within that constraint, you figure out (read: guess) when new upstream software and major new features have to land in order to be able to fix the expected new bugs they introduce, and make case-by-case decisions on anything that slips past the intermediate deadlines you impose. You have to say "no" a lot, and you spend a lot of time in planning. Nowadays, this is familiar to most people: you don't delay a time-based release for new features.

The hard bit comes when things go wrong, and then you find that you can't delay a time-based release to fix bugs either! If you slip your deadlines as a matter of course, then nobody will believe your release dates next time, and you'll start finding that people ignore the internal deadlines because "hey, they're not going to release on time anyway"; it all goes to pot very quickly. (We saw this in Debian: every time a published release date slipped, interest in putting in time to meet the next one diminished.) Thus you typically have a difficult choice: you can roll back to the last-known-good version (if this is even possible; often enough, interdependencies mean you'd end up rolling back a whole subsystem in order to get back to something that worked, and figuring out whether that's worth it can be harder than just fixing the bug to start with), or you can document the new bugs that have been introduced. Each choice will make somebody unhappy, as one group of people expects the latest-and-greatest software, and another expects stability. Neither group is intrinsically wrong, but you can't satisfy both of them in the presence of buggy software.

Of course, there is a third choice: fix the bug already! In rather a lot of cases, we do, and we also spend a great deal of time triaging and organising bugs so that developers can focus on the most important ones first. In a time-based world, though, the clock is ticking and you have finite resources. If you have any, you can get your paid staff to focus on the really critical issues, and of course we do that; many of the Ubuntu team who are employed by Canonical pull some pretty long hours around release time. (As a side note, time-based releases seem to work a bit better when paid developers are involved because you can get away with doing this.) But all those paid staff tend to have extensive demands on their time, and so some things usually slip through the net.

Ubuntu 8.04 decisions

So, we have a bunch of tough decisions to make. I don't especially expect this to attract sympathy: after all, if it were easy, everyone would be doing it. Nevertheless, we have to make tough decisions all the time where there may not necessarily be a right answer, and they are an inevitable consequence of a time-based release process.

When this kind of thing goes wrong, people rightly criticise us; as ArsGeek says, the buck stops with Ubuntu and Canonical as far as Ubuntu users are concerned. It often isn't as straightforward as it looks, though, since you don't get a practical demonstration of what would have gone wrong the other way round. If you look at it purely on the basis of bugs introduced, it seems clear that we should have held back on switching to GVFS (and GIO) and instead stuck with GNOME-VFS. The VFS layer is at the core of GNOME, though. Sticking with GNOME-VFS would have meant either that we'd have been using something that the upstream developers were no longer putting much effort into supporting, very probably resulting in other regressions, or that we'd have had to ship GNOME 2.20 in Ubuntu 8.04 as we did in 7.10. This would have been a big deal: the update in 8.10 would have been fiendishly difficult because we'd have been behind, security support would have been harder to achieve, software developers would have found that software developed on other systems of around the same vintage might not even build on Ubuntu, and many users would have been unhappy because they'd have been missing out on new features introduced in GNOME 2.22.

Instead, what we decided to do was to take the pain and do what we could to mitigate it, and, partly as a result of this, we handled the 8.04.1 point release somewhat differently from how we've handled similar point releases in the past. We set a date for 8.04.1 in advance, and a number of people worked solely towards that. Canonical sponsored work in GNOME to help iron out certain GVFS bugs that had already been raised as major omissions in the new framework. We agreed that 8.04.1 would be the first in a series of six-monthly point releases of 8.04 (out of phase with the main Ubuntu release cycle) so that users would be able to plan around it.

There were various other major decisions in Ubuntu 8.04, some of which made headlines and some of which didn't. Here are a couple of examples with contrasting results:

Firefox 3.0
Even though our Firefox maintainer is involved with upstream security work, we weren't comfortable with declaring that we were going to support Firefox 2 for three years when the likelihood was that the rug was going to be pulled out from under us on such a large, complex, and security-critical package. Being forced to upgrade to Firefox 3 for security support midway through an LTS cycle would be much, much worse than some teething troubles caused by starting out with it. We knew that we were going to end up shipping with a beta and decided that it was the lesser of two evils.
GDM
GDM was rewritten upstream, and the rewrite formed part of the GNOME 2.22 release. Nevertheless, we opted not to ship with it. We knew of a number of missing features and regressions that were going to take a long time to clear up in the rewrite. While GDM itself is important, its precise version is not very tightly interconnected with anything else, so it was quite feasible to hold it back at its previous upstream version.

Specific problems

ArsGeek raised several specific annoyances, so I'll respond to those while I'm here.

  • Drag-and-drop of themes into the Appearance dialog doesn't work.

    Amazingly, it seems that nobody had filed a bug about this yet, so no wonder it didn't get fixed. I've filed bug 252885 and we'll see what we can do.

  • Configuration of shared folders doesn't work out of the box.

    There seems to be a good deal of confusion here. For instance, it was claimed that nautilus-share should be installed out of the box, but in fact it is in Ubuntu 8.04 (you can tell by running apt-cache show nautilus-share and observing Task: ubuntu-desktop). Thomas installed using the 8.04 beta, when it wasn't installed by default.

    The core problem is that the user isn't in the sambashare group by default (this is most of the reason you have to log out and log back in, in order to acquire the new group membership). For Intrepid we've fixed this in the installer. However, we haven't yet fixed the other part of this: installing file sharing software also requires installing libpam-smbpass, and before you can do any non-anonymous sharing it needs to see your password so that it can sync up the smb password database. I'd like to get this sorted out really cleanly in Intrepid before backporting the essential pieces to the 8.04 series.

  • nautilus-share doesn't permit setting the workgroup name.

    This is bug 214720. It's perhaps worth noting that Steve's comment on that bug was simply categorisation, not rejecting it out of hand; and indeed he acknowledges that more should be done here. The aggrieved commenter took it as a rejection when I don't think that was intended.

    As Steve points out, a fix that actually covers all the reasonable possibilities is rather more complicated than just adding a Workgroup entry box. Having the wrong workgroup simply means a couple of extra levels of indirection when browsing shares; having the wrong authentication settings will completely break sharing, so in fact workgroup is probably the least urgent smb.conf configuration setting to offer. Rather than adding complexity to nautilus-share, we probably ought to have a proper configuration interface in System -> Administration.

  • Can't browse network when using "Connect to server".

    This seems to be something on which users disagree. For example, GNOME bug 171218 includes a comment saying that the feature is available from many other places and is rather inconsistent here. As GNOME bug 486101 notes, the button in "Connect to server" was not specialised enough to be much more useful than Places -> Network anyway.

    Again, nobody seemed to have filed a bug about the removal yet. I don't feel strongly about the removal itself, but have filed bug 252904 on the grounds of the inconsistency with printing. (I also filed bug 252907 since the documentation is out of date.)

  • Desktop SSH connections are horribly broken.

    There were a number of problems with the GVFS FUSE backend (a system that allows attaching anything that GVFS can understand directly to the Unix file system, and so extends the reach of the desktop virtual file system even to non-GNOME applications - this should be a big win for consistency). Bugs 211205 and 235326 seem to have been the worst ones here; those were fixed in Ubuntu 8.04.1, and I believe that gvfs-fuse-daemon is now pretty stable. Certainly it's working well for me. If there are still problems, we'd appreciate bug reports about them.

  • Default location of mounted volumes changed from $HOME to /.

    I tend to agree that / is awkward; ArsGeek already linked to an upstream bug for this. That said, this doesn't seem like a regression in Ubuntu 8.04; bug 30039 has been open since early 2006.

  • Update Manager sometimes finds more updates after updating and pressing Check.

    I'm not sure exactly what ArsGeek means here; did he press Check before installing updates, then press Check afterwards and find out that there were more updates available? (If so, that's probably just because we're such industrious folk that updates sometimes arrive on the mirrors in the meantime!) Or did he not press Check beforehand? (If so, then Update Manager doesn't force an update so that it can be responsive immediately - doing otherwise would certainly attract complaints from others - but you can always press Check first, or else wait until tomorrow and get the updates then, after the daily cron job has automatically done the equivalent of Check.) Or was it something weirder? We'd need to see a detailed description of an example session where this goes wrong, I think.

    There are some special cases like bug 249220 (updates not automatically checked after suspend/resume) that ArsGeek could potentially be running into here.

  • Desktop background is fuzzy at 1920x1200.

    As far as I can tell, nobody filed a bug about this. I've filed bug 252925.
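
For the shared-folders item above, the two checks I described (looking for Task: ubuntu-desktop in the package record, and seeing whether a user is in the sambashare group) can be sketched in a few lines of Python. This is only a rough illustration under some assumptions: the function names and the sample stanza are made up for the example, the stanza is not real apt-cache output, and the group lookup reads the system's group database rather than anything specific to Ubuntu.

```python
import grp
import pwd


def parse_tasks(stanza: str) -> list[str]:
    """Return the tasks named in the Task: field of one package stanza."""
    for line in stanza.splitlines():
        if line[:1].isspace():
            continue  # continuation of the previous field, not a new field
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "task":
            # A package may belong to several tasks, separated by commas.
            return [t.strip() for t in value.split(",") if t.strip()]
    return []


def user_in_group(user: str, group: str) -> bool:
    """True if the group database lists `user` as a member of `group`."""
    try:
        entry = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist on this system
    if user in entry.gr_mem:
        return True
    # gr_mem omits users for whom this is the primary group, so check that too.
    try:
        return pwd.getpwnam(user).pw_gid == entry.gr_gid
    except KeyError:
        return False  # the user does not exist on this system


# Hypothetical stanza for illustration; real records carry many more fields.
SAMPLE = """\
Package: nautilus-share
Priority: optional
Task: ubuntu-desktop
"""

if __name__ == "__main__":
    print(parse_tasks(SAMPLE))
    print(user_in_group("root", "sambashare"))
```

Note that user_in_group consults the group database, while a session that is already running keeps the groups it started with (which is what os.getgroups() reports). That difference is why a freshly added group membership only takes effect after logging out and back in.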

In many of these cases, I noticed that nobody had actually filed a bug to tell us about the problem. While of course we do as much testing as we can internally, and you can always say "oh, that's obvious, they should have tested that one", we still don't have anything like the number of people you'd need to do complete testing on a full operating system, and we do have to rely on our users to report problems in the things they use. So, the last in my list of reasons why annoyances exist is that, every so often, developers don't notice them, and the users who do notice them don't report them!
