Tuesday, August 28, 2007

Those Dang DPCs Clogging the MMCSS

Vista's funky networking performance amid multimedia playback elicited a reply from Microsoft's own Mark Russinovich:

Besides activity by other threads, media playback can also be affected by network activity. When a network packet arrives at [the] system, it triggers a CPU interrupt, which causes the device driver for the device at which the packet arrived to execute an Interrupt Service Routine (ISR). Other device interrupts are blocked while ISRs run, so ISRs typically do some device book-keeping and then perform the more lengthy transfer of data to or from their device in a Deferred Procedure Call (DPC) that runs with device interrupts enabled. While DPCs execute with interrupts enabled, they take precedence over all thread execution, regardless of priority, on the processor on which they run, and can therefore impede media playback threads.

Network DPC receive processing is among the most expensive, because it includes handing packets to the TCP/IP driver, which can result in lengthy computation. The TCP/IP driver verifies each packet, determines the packet’s protocol, updates the connection state, finds the receiving application, and copies the received data into the application’s buffers.

Mark goes on to show that copying a file from one machine to another consumes a staggering 41% of the available processor. In Joey's words, that is horrid and just an awful situation.

Like Vista, Linux separates interrupt handling into two distinct components, a top half (the ISR) and a bottom half. The bottom half is a mechanism for deferring work away from the interrupt handler (see Chapter 7 in Linux Kernel Development). Vista's DPC mechanism is a bottom half implementation that sounds similar to Linux's workqueues, which allow the deferment of work from one context (typically interrupt) to another. As with the DPC mechanism, workqueues run in process context, with interrupts enabled, generally (although not necessarily) with priority over other tasks on the system. Workqueues, as with DPCs, are well-suited for deferring the processing of networking work from the ISR to a later point, when interrupts are enabled.
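The split is easier to see in miniature. What follows is a rough user-space analogy in Python, not kernel code: the Device class, its interrupt method, and the upper() stand-in for protocol processing are all hypothetical names invented for illustration. The "top half" only queues the packet and returns; a worker thread plays the part of the bottom half and does the lengthy work later.

```python
import queue
import threading


class Device:
    """Hypothetical device model: the top half queues, the bottom half works."""

    def __init__(self):
        self.pending = queue.Queue()  # work deferred by the top half
        self.delivered = []           # packets handed on after processing
        self.worker = threading.Thread(target=self._bottom_half, daemon=True)
        self.worker.start()

    def interrupt(self, packet):
        # Top half: minimal bookkeeping, then return as quickly as possible.
        self.pending.put(packet)

    def _bottom_half(self):
        # Bottom half: runs later in an ordinary schedulable thread.
        while True:
            packet = self.pending.get()
            if packet is None:  # sentinel queued by shutdown()
                break
            # Stand-in for the expensive part (verify, route, copy).
            self.delivered.append(packet.upper())

    def shutdown(self):
        # Queue the sentinel behind any pending work and wait for the worker.
        self.pending.put(None)
        self.worker.join()


dev = Device()
for pkt in ["syn", "ack", "data"]:
    dev.interrupt(pkt)
dev.shutdown()
print(dev.delivered)
```

Because the worker is an ordinary thread, the scheduler decides when it runs relative to everything else, which is the property that matters in the comparison above.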

Unlike DPCs, however, the Linux parallel does not consume nearly half of your CPU. In fact, in repeated tests involving both "copying a large file from another system" and a simple unabated ping flood, I was unable to consume any tangible amount of processor. That is, Linux can achieve high utilization of a GigE network interface with only minimal CPU usage.

Critical optimizations such as zero-copy aside, there is no excusable reason why processing IP packets should so damagingly affect the system. Thus, this absolutely abysmal networking performance should be an issue in and of itself. Unfortunately, however, the Windows developers decided to focus on a secondary effect:

Tests of [Multimedia Class Scheduler Service (MMCSS), a mechanism for the automatic priority-enhancement of multimedia playback,] during Vista development showed that, even with thread-priority boosting, heavy network traffic can cause enough long-running DPCs to prevent playback threads from keeping up with their media streaming requirements, resulting in glitching.

In other words, consuming half of your processor is (surprise!) detrimental to multimedia playback performance. At this point, it becomes clear that the process scheduler folks and the networking folks are bitter enemies and do not converse. Consequently, the obvious solution of fixing the abhorrent networking performance was bypassed for a quick bandaid:

MMCSS’ glitch-resistant mechanisms were therefore extended to include throttling of network activity. It does so by issuing a command to the NDIS device driver, which is the driver that gives packets received by network adapter drivers to the TCP/IP driver, that causes NDIS to “indicate”, or pass along, at most 10 packets per millisecond (10,000 packets per second).

Putting aside the larger problem for the moment, there are several issues with this solution. It prioritizes multimedia playback over networking performance, which, as the resulting clamor has shown, is not everyone's personal policy preference. It is almost assuredly a layering violation. It picks a fixed and hard-coded packet limit (ten per millisecond), which won't scale across different hardware: think significantly faster processors or substantially slower networking drivers. It ignores the prevalence of GigE. And, finally, the solution is complicated, as the convoluted description and resulting bugs in the implementation demonstrate.
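To put a number on the GigE point, here is a back-of-the-envelope calculation in Python. The ten-packets-per-millisecond figure is from Mark's description; the 1,500-byte packet size is my assumption (a full standard Ethernet payload, the best case for the throttle), and the function name is made up for this sketch.

```python
def throttled_throughput_mbps(packets_per_ms=10, payload_bytes=1500):
    """Upper bound on receive throughput under a fixed packet-rate cap.

    payload_bytes=1500 is an assumption: a full standard Ethernet payload.
    """
    packets_per_second = packets_per_ms * 1000
    bits_per_second = packets_per_second * payload_bytes * 8
    return bits_per_second / 1_000_000


cap = throttled_throughput_mbps()
# Compare against a nominal 1000 Mbit/s Gigabit Ethernet link.
print(f"{cap:.0f} Mbit/s, {cap / 1000:.0%} of GigE")  # prints "120 Mbit/s, 12% of GigE"
```

Under those assumptions the cap works out to roughly an eighth of a gigabit link, and smaller packets only make it worse.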

Moreover, I can only imagine how this solution performs while streaming video over the network.

Mr Russinovich concludes:

The hard-coded limit was short-sighted with respect to today’s systems that have faster CPUs, multiple cores and Gigabit networks, and in addition to fixing the bug that affects throttling on multi-adapter systems, the networking team is actively working with the MMCSS team on a fix that allows for not so dramatically penalizing network traffic, while still delivering a glitch-resistant experience.

We shall see. Vista is nowhere near ready for deployment, and adopters should, as always, wait until several service packs and the server variant of the OS have been released before upgrading. In the meantime, let me recommend an alternative or two.

Curious about more of Vista's internals? Read Mark's three-part exposé, Inside the Windows Vista Kernel: Part 1, 2, and 3. Mark is also the co-author of Microsoft Windows Internals, an excellent tome on the design of Windows XP and Windows Server 2003.

Friday, August 24, 2007

London Photos

Bus in London

I have received a number of emails about my photographs of London. My favorite is this desaturated shot of Westminster Abbey with the over-saturated Routemaster-esque double-decker bus racing by; the photo links to the oft-requested larger image.

In the New Yorker, historian Niall Ferguson asks, How much did the Marshall Plan matter?

At some point, all programmers are concerned with the system level. Whether your day-to-day hacking lives much higher—in Python or Java, say—or if your raison d'être is indeed just above the kernel, a strong understanding of the system on which you build is crucial. There are plenty of books on Unix system programming—the late Stevens' sterling efforts, for example—but no book is both an excellent Unix system programming reference and a smart guide to what makes Linux unique, a reference to the Linux-specific interfaces that showcase our innovation.

Such a book would not waste time covering the differences between eight different Unix systems and the behavior dictated by three different Unix standards, but would instead concentrate on what matters: Linux, and you writing better code, faster. Linux System Programming, out next month and published by the good folks at O'Reilly, is such a tome. I touch on everything from file I/O and memory management to process scheduling and time management, from the kernel to glibc to gcc. I cover the basics and the advanced interfaces, the standard bearers and the Linux-specific. The book explains how system calls are actually implemented in the kernel and how best to utilize this knowledge.

On this day, the 12th anniversary of Windows 95, which did not come with a web browser or install TCP/IP by default, you must preorder your seven copies of Linux System Programming!

Wednesday, August 22, 2007

Google Sky

Google today announced the release of Sky for Google Earth, a tool to marvel at the heavens and, with outstretched arm, reach for the hand of god.

The Moon
Full Moon over Gainesville

It is the most romantic software I have ever used. And I say this having played Leisure Suit Larry.

Sunday, August 19, 2007

Open Spectrum

Amusingly, my post on carbon tax is the eighth result for that query.

Man on Chair
Carrying a chair through London, this man was sporting a Linux in a Nutshell t-shirt

AT&T crippling BlackBerry GPS so that they don't outshine the iPhone? The FCC intends to adopt rules that allow open devices and open applications, a step toward preventing such chicanery, but I wait to see the auction's actual wording.

Also, a Washington Post editorial on white space devices.

Preorder my next book, Linux System Programming, today. Easily the greatest Linux system programming text I have ever read.

Friday, August 17, 2007

Robert's Blog Moved

If you are reading this, you likely know that my blog moved to blog.rlove.org.

The feed is available in both Atom and RSS.

London Bridge
London Bridge, River Thames, London

If you use a feed aggregator or run a planet, please update.

Wednesday, August 15, 2007

From the Worst Named to the Best

For a while now, I have intended to move away from PyBlosxom, mainly due to its name. Blogger does not have import functionality, but it does provide access via the incredibly powerful GData API, which is an Atom-based protocol for writing and reading data to and from Google services. Our mission is to organize the world's information and make it universally accessible and useful. In pursuit of that goal, GData lets you access data from outside of Google proper, or even from contexts other than a web browser.

Robert in England drinking tea
Afternoon Tea, The Capital Hotel, London

Anyhow, I wrote py2blogger over the weekend, a tool for importing your pyblosxom files into Blogger via GData. It likely works for blosxom, too.

Two caveats. First, later versions of pyblosxom support storing entries in various formats. This tool works only with the classic model: the publish date is the file's mtime, the blog title is the first line in the file, and the blog entry is all subsequent lines.
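For the curious, the classic model is simple enough to sketch in a few lines of Python. This is an illustration of the format just described, not the actual py2blogger source; the function name, the returned dictionary, the UTF-8 encoding, and the choice of UTC for the timestamp are all assumptions of mine.

```python
import os
from datetime import datetime, timezone


def parse_classic_entry(path):
    """Read one classic-format entry (hypothetical helper, not py2blogger's code).

    Title: first line. Body: all subsequent lines. Publish date: file mtime.
    """
    with open(path, encoding="utf-8") as f:  # encoding is an assumption
        lines = f.read().splitlines()
    title = lines[0] if lines else ""
    body = "\n".join(lines[1:])
    # The publish date is the file's modification time (mtime).
    published = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return {"title": title, "body": body, "published": published}
```

One consequence of keying the date off the mtime: touching or copying the files without preserving timestamps changes every publish date, so copy with care before importing.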

Second, Blogger implements rate limiting to hinder the efforts of spammers. Once over a per-day post limit, you will have to fill out a CAPTCHA to submit each post; py2blogger obviously does not support this.

Oh, this thing could end up blogging your tax information and divorcing your wife. Use at your own risk.

Thursday, August 2, 2007

Carbon Tax

John Dingell (D-MI), the Dean of the House of Representatives, in today's Washington Post: The Power in the Carbon Tax.

The Congressman argues for a carbon tax but concludes that a cap-and-trade system is the more likely congressional outcome. The Democratic presidential candidates apparently favor cap-and-trade over a carbon tax, too. Both observations are unfortunate.

The London Eye

Several criticisms of cap-and-trade vis-à-vis the more efficient carbon tax:

  • Implementing a cap-and-trade system is complex and requires the creation of government bureaucracy; a carbon tax is transparent and simple—less likely to be gamed, less likely to invite special interest
  • Cap-and-trade, with a fixed number of permits, makes no provisions for business cycle adjustments
  • Most models of cap-and-trade (those without an auction, for example) do not raise revenue for the government—generating revenue from a Pigovian tax is great, because it can allow the government to lower taxes on productivity
  • Permit prices can be volatile; a carbon tax is stable—price stability will help encourage further investment and innovation in energy-saving endeavors while price volatility in permits will cause price volatility in consumer goods
  • We don't know where the marginal benefit of cutting emissions equals the marginal cost—thus, policymakers are flying blind; finding that equilibrium is easier and safer via a tax that can be gradually raised
  • Cap-and-trade's complexity does not extend well to individuals, households, or even small businesses—most cap-and-trade proposals apply only to certain industries; a carbon tax is universal
  • We can implement a carbon tax today; a cap-and-trade system will take extensive time and effort
  • At its core, a cap-and-trade system is simply a carbon tax with a rebate given to existing polluters—we probably do not want that level of corporate welfare, but if we do, we can incorporate it much more easily into the tax code than into a cap-and-trade system

The largest argument against a carbon tax is that it is regressive. For two reasons, this is nonsense. First, cap-and-trade will just as readily increase consumer prices; the causal mechanism is just less direct. Second, a carbon tax can be rebated to households (or whomever) in the form of lower income taxes, an increase in the EITC, and so on.

Cap-and-trade proponents, affectionately called CATs, argue that these and other issues are rectifiable with additional rules, such as price floors and ceilings, permit auctions, or a Fed-like dynamic adjustment of the permit supply. These changes simply make a cap-and-trade system more like a carbon tax, albeit with layers of legislation and bureaucracy.

Yes, it is a tax. But it is an efficient one.