Colin Watson
Tue, 29 Jul 2008

(This post is inspired by this post on ArsGeek, hence the title. Thanks to Scott James Remnant for some of the ideas in this post.)

I'll get to the details of ArsGeek's post in a moment, but the more interesting part of it is really the discussion of how we sometimes end up releasing software with (from various people's perspectives) serious bugs: a question of release management. As a member of the Ubuntu release team, I thought I'd respond to this and try to clarify what goes on.

Release management strategies

Firstly, I know of no way to ensure a bug-free release of a system the size of a modern GNU/Linux distribution, with software coming from a huge variety of sources with all kinds of different QA standards, not to mention plenty of places within the distribution itself where new bugs can be introduced. Indeed, ensuring releases free of serious bugs is more of an art than a science. What may be surprising is that, for a composite distribution of software from lots of sources, it doesn't seem to make much difference to bug density whether you choose to whack all the bugs out before picking a release date, or just pick a release date up front and tell everyone to aim for that (as long as people believe your release dates; more on this later). I can't offer hard evidence for this, only the gut feel of somebody with experience of both approaches, but it does seem to me to be the case.

The idea of releasing "when it's ready" is a very appealing one. I have some experience here: I used to be one of Debian's release managers too, which is just about as far to that end of the scale as you get among GNU/Linux distributions. In theory, you put all the software into a big pot, stir it up, collect all the bugs, and at some point when it smells reasonably good you get everyone to whack away at the bugs in turn until they're all gone. Then you release. In theory, theory and practice are the same thing...
In practice, there are a number of counterbalancing pressures. Developers don't want to be tied down waiting for all those other lazy developers who haven't fixed their bugs yet; users want the latest version of their favourite application, which has some new and shiny feature; security support for the last release gets harder as it ages and the backport distance increases; people with new hardware want a new release that will support it; and so on. The longer you spend fixing bugs, the more the internal pressures show, and the harder it gets. Thus, even Debian compromises on this, and many of the bugs that are "fixed" shortly before release are in fact dealt with by removing leaf packages from the distribution, documenting workarounds, or simply deciding that they aren't release-critical. And this is only for the highest-severity bugs! Many other bugs are, by straightforward necessity, simply not important enough to hold up the release of the whole distribution.

Now, Ubuntu opted for the time-based approach up front. You pick a date and you stick to it as if your life depended on it. Within that constraint, you figure out (read: guess) when new upstream software and major new features have to land so that there is time to fix the new bugs they are expected to introduce, and you make case-by-case decisions on anything that slips past the intermediate deadlines you impose. You have to say "no" a lot, and you spend a lot of time planning. Nowadays, this is familiar to most people: you don't delay a time-based release for new features. The hard bit comes when things go wrong, and then you find that you can't delay a time-based release to fix bugs either! If you slip your deadlines as a matter of course, then nobody will believe your release dates next time, and you'll start finding that people ignore the internal deadlines because "hey, they're not going to release on time anyway"; it all goes to pot very quickly.
(We saw this in Debian: every time a published release date slipped, interest in putting in time to meet the next one diminished.)

Thus you typically have a difficult choice. You can roll back to the last-known-good version (if this is even possible; often enough, interdependencies mean you'd end up rolling back a whole subsystem in order to get back to something that worked, and figuring out whether that's worth it can be harder than just fixing the bug in the first place), or you can document the new bugs that have been introduced. Each choice will make somebody unhappy, as one group of people expects the latest-and-greatest software and another expects stability. Neither group is intrinsically wrong, but you can't satisfy both of them in the presence of buggy software.

Of course, there is a third choice: fix the bug already! In rather a lot of cases, we do, and we also spend a great deal of time triaging and organising bugs so that developers can focus on the most important ones first. In a time-based world, though, the clock is ticking and you have finite resources. If you have paid staff, you can get them to focus on the really critical issues, and of course we do that; many of the Ubuntu team who are employed by Canonical pull some pretty long hours around release time. (As a side note, time-based releases seem to work a bit better when paid developers are involved, because you can get away with doing this.) But all those paid staff tend to have extensive demands on their time, and so some things usually slip through the net.

Ubuntu 8.04 decisions

So, we have a bunch of tough decisions to make. I don't especially expect this to attract sympathy: after all, if it were easy, everyone would be doing it. Nevertheless, we have to make tough decisions all the time where there may not be a right answer, and they are an inevitable consequence of a time-based release process.
When this kind of thing goes wrong, people rightly criticise us; as ArsGeek says, the buck stops with Ubuntu and Canonical as far as Ubuntu users are concerned. It often isn't as straightforward as it looks, though, since you never get a practical demonstration of what would have gone wrong had you decided the other way round.

If you look at it purely on the basis of bugs introduced, it seems clear that we should have held back on switching to GVFS (and GIO) and stuck with GNOME-VFS instead. The VFS layer is at the core of GNOME, though. Sticking with GNOME-VFS would have meant either using something that the upstream developers were no longer putting much effort into supporting, very probably resulting in other regressions, or shipping GNOME 2.20 in Ubuntu 8.04 as we did in 7.10. This would have been a big deal: the update in 8.10 would have been fiendishly difficult because we'd have fallen behind, security support would have been harder to achieve, software developers would have found that software developed on other systems of around the same vintage might not even build on Ubuntu, and many users would have been unhappy because they'd be missing out on new features introduced in GNOME 2.22.

Instead, what we decided to do was to take the pain and do what we could to mitigate it, and, partly as a result of this, we handled the 8.04.1 point release somewhat differently from similar point releases in the past. We set a date for 8.04.1 in advance, and a number of people worked solely towards that. Canonical sponsored work in GNOME to help iron out certain GVFS bugs that had already been raised as major omissions in the new framework. We agreed that 8.04.1 would be the first in a series of six-monthly point releases of 8.04 (out of phase with the main Ubuntu release cycle) so that users would be able to plan around them.

There were various other major decisions in Ubuntu 8.04, some of which made headlines and some of which didn't.
Here are a couple of examples with contrasting results:
Specific problems

ArsGeek raised several specific annoyances, so I'll respond to those while I'm here.
In many of these cases, I noticed that nobody had actually filed a bug to tell us about the problem. While of course we do as much testing as we can internally, and you can always say "oh, that's obvious, they should have tested that one", we still don't have anything like the number of people you'd need to do complete testing on a full operating system, and we do have to rely on our users to report problems in the things they use. So, the last in my list of reasons why annoyances exist is that, every so often, developers don't notice them, and the users who do notice them don't report them!