Linux as a Hypervisor
Many Different Types of Virtualization
It's quite fascinating how many virtualization designs there are out there. Even within Linux itself there are many different approaches. To reduce the scope a bit for this post, we'll ignore all the designs that can't virtualize the CPU to allow other operating systems to run. If we look back, before virtualization existed, there were lots and lots of discussions on OS designs too. For example, Linux uses the monolithic design (modulo drivers in userland). A few other OSes use the microkernel design. Each solution has pros and cons. In the end, most production OSes tend to use the monolithic design for all performance-critical tasks.
As far as Linux is concerned, the major con of having a driver in userland is that it's much slower. Most isolation layering slows the system down. So, for example, it's perfectly OK to have a fingerprint reader driver in userland (like the one I use on my laptop). Nobody cares if it takes a bit more CPU to read a fingerprint when logging in. On the other hand, it would be very bad to pay for a kernel entry and exit every time a network packet is sent or received on gigabit ethernet, which is also why paravirtualization pays off big for high-bandwidth devices.
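To put a rough number on the per-packet cost, here is a back-of-the-envelope sketch. The frame size, the cost of one entry and exit pair, and the batch size of 32 are illustrative assumptions rather than measurements, and the function names are made up for this example.

```python
# Back-of-the-envelope sketch: CPU time spent on privilege transitions
# for a saturated gigabit link. All constants are illustrative assumptions.

LINK_BITS_PER_SEC = 1_000_000_000   # gigabit ethernet
FRAME_BYTES = 1500                  # assumed full-size frames
TRANSITION_COST_SEC = 1e-6          # assumed cost of one entry + exit pair


def packets_per_second(link_bits_per_sec, frame_bytes):
    """Frames per second needed to saturate the link."""
    return link_bits_per_sec / (frame_bytes * 8)


def cpu_fraction(pps, transition_cost_sec, packets_per_transition=1):
    """Fraction of one CPU spent on transitions when packets are batched."""
    transitions_per_sec = pps / packets_per_transition
    return transitions_per_sec * transition_cost_sec


pps = packets_per_second(LINK_BITS_PER_SEC, FRAME_BYTES)
unbatched = cpu_fraction(pps, TRANSITION_COST_SEC)
batched = cpu_fraction(pps, TRANSITION_COST_SEC, packets_per_transition=32)

print(f"{pps:,.0f} packets/s")
print(f"one transition per packet: {unbatched:.1%} of a CPU")
print(f"one transition per 32 packets: {batched:.2%} of a CPU")
```

With these assumed numbers the link carries roughly 83,000 packets per second, so one transition per packet costs about 8% of a CPU, while handing over 32 packets per transition brings that down to about 0.26%. Smaller frames or a more expensive transition make the unbatched case proportionally worse.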
Parallels With Kernel Designs
The very basic idea of the microkernel is that certain services and drivers are isolated in their own address spaces, with a safe API to interface with the microkernel, so that if a driver crashes it won't bring down and destabilize the entire system. You can imagine how much it helps not to bring down the whole system if the SATA driver crashes, or if the ext3 filesystem crashes. Such failures don't always result in loss of functionality. For example, a pure firewall may be OK with such a failure, because the network packets will pass through regardless of disk I/O not being functional. But you can equally imagine how much it helps to keep doing I/O just fine if the TCP stack or the ethernet driver crashes and no network packet can reach the server anymore, or if the data received from the network is corrupted. So, for the majority of services, a failure is fatal to the operation of the whole system. In a fault-tolerant setup, the secondary should take over when a failure happens, regardless of whether the microkernel of the primary is still alive.
These design issues have been common to all OSes for decades, and they have parallels in the virtualization world too. For example, suppose a hypervisor is stripped of all its drivers in order to do only memory management and scheduling of the virtual machines, much like a microkernel, and relies on a monolithic kernel living in its own separate and securely isolated address space for all drivers and I/O. That implies a slowdown in the API, because any privileged operation that involves the hardware requires a privilege level change. It also doesn't bring any real security benefit if there's a little bug in the SATA driver of the "trusted" monolithic kernel that has to take care of all the I/O. And that's without mentioning that, without VT-d or proper IOMMU support, a buggy driver can overwrite the hypervisor with DMA.
Furthermore, as the requirements for advanced virtualization features increase, this stripped-down hypervisor has two options. It can grow in size by including a lot more code and algorithms, which makes it look more and more like a real OS and in turn diminishes the benefit of keeping most drivers in a separate address space. Alternatively, it can grow its API with the OS doing the I/O, which makes it slower and more complex because many critical algorithms are kept at a different privilege level.
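The isolation trade-off discussed in this section can be sketched in a few lines, using an ordinary process as a stand-in for a driver in its own address space. This is only an analogy under stated assumptions: `BUGGY_DRIVER` and `run_isolated_driver` are hypothetical names, and the "crash" is simulated with an uncaught exception.

```python
import subprocess
import sys

# Hypothetical "driver" that hits a fatal bug as soon as it starts serving.
BUGGY_DRIVER = "raise RuntimeError('simulated driver bug')"


def run_isolated_driver(source):
    """Run driver code in a separate process, i.e. its own address space."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stderr=subprocess.DEVNULL,  # hide the child's traceback
    )
    return result.returncode


returncode = run_isolated_driver(BUGGY_DRIVER)
driver_crashed = returncode != 0
supervisor_alive = True  # reaching this line means the crash stayed contained

print("driver crashed:", driver_crashed)
print("supervisor still running:", supervisor_alive)
print("I/O service available:", not driver_crashed)
```

The supervising process survives, which is the isolation benefit, but the service the driver provided is gone all the same. Anything that depended on it is no better off than if the whole system had stopped.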
Linux Advantages as a Hypervisor
For example, once the hypervisor needs to become multicore aware, NUMA aware, able to swap with its own aging algorithms that detect the working set of each guest OS, able to keep the CPU in the C4 state and at the lowest frequency for the whole idle time, able to support CPU hotplug, memory hotplug, suspend to RAM, and all the other sorts of features that a real OS has to support, Linux becomes a perfect fit to be the hypervisor itself, as it solves all those problems already.
To give a few examples of the practical advantages of using Linux as a hypervisor: I was amazed at how clean it was to allow KVM to reliably swap the entire guest memory, in only a few weeks of work, by taking advantage of the core Linux virtual memory management to do all the aging and working set calculations. I'm also pleased with how the 2.6.24 kernel on my Penryn laptop suspends automatically with s2ram, via acpid, when I close the lid, and consumes only 0.5 watts until I open the lid again. It continues playing YouTube video and audio inside KVM whenever I open the lid, with only a few lines of KVM code being aware of suspend and resume in order to disable vmx/svm mode while the CPU is suspended.
I have noticed that the design that requires the lowest effort to quickly reach equal or superior features usually wins the marketplace, as it tends to be the most efficient and stable over time. I guess this is why we're not using IA64 laptops just yet.
Resources
Andrea Arcangeli is a kernel hacker and coder extraordinaire, and he works for Qumranet.