X-Git-Url: http://vcs.maemo.org/git/?a=blobdiff_plain;f=qemu-tech.texi;h=ed2d35bf5e4382e515a0f00f5b720b408011edbe;hb=HEAD;hp=c86094b7c65d50f40c6477b2df474b55bd567476;hpb=b671f9ed2de4c45995843a85a7f3adc90071a47e;p=qemu diff --git a/qemu-tech.texi b/qemu-tech.texi index c86094b..ed2d35b 100644 --- a/qemu-tech.texi +++ b/qemu-tech.texi @@ -1,7 +1,12 @@ \input texinfo @c -*- texinfo -*- +@c %**start of header +@setfilename qemu-tech.info +@settitle QEMU Internals +@exampleindent 0 +@paragraphindent 0 +@c %**end of header @iftex -@settitle QEMU Internals @titlepage @sp 7 @center @titlefont{QEMU Internals} @@ -9,8 +14,34 @@ @end titlepage @end iftex +@ifnottex +@node Top +@top + +@menu +* Introduction:: +* QEMU Internals:: +* Regression Tests:: +* Index:: +@end menu +@end ifnottex + +@contents + +@node Introduction @chapter Introduction +@menu +* intro_features:: Features +* intro_x86_emulation:: x86 and x86-64 emulation +* intro_arm_emulation:: ARM emulation +* intro_mips_emulation:: MIPS emulation +* intro_ppc_emulation:: PowerPC emulation +* intro_sparc_emulation:: Sparc32 and Sparc64 emulation +* intro_other_emulation:: Other CPU emulation +@end menu + +@node intro_features @section Features QEMU is a FAST! processor emulator using a portable dynamic @@ -20,18 +51,18 @@ QEMU has two operating modes: @itemize @minus -@item -Full system emulation. In this mode, QEMU emulates a full system -(usually a PC), including a processor and various peripherals. It can -be used to launch an different Operating System without rebooting the -PC or to debug system code. - -@item -User mode emulation (Linux host only). In this mode, QEMU can launch -Linux processes compiled for one CPU on another CPU. It can be used to -launch the Wine Windows API emulator (@url{http://www.winehq.org}) or -to ease cross-compilation and cross-debugging. - +@item +Full system emulation. 
In this mode (full platform virtualization), +QEMU emulates a full system (usually a PC), including a processor and +various peripherals. It can be used to launch several different +Operating Systems at once without rebooting the host machine or to +debug system code. + +@item +User mode emulation. In this mode (application level virtualization), +QEMU can launch processes compiled for one CPU on another CPU, however +the Operating Systems must match. This can be used for example to ease +cross-compilation and cross-debugging. @end itemize As QEMU requires no host kernel driver to run, it is very safe and @@ -39,77 +70,106 @@ easy to use. QEMU generic features: -@itemize +@itemize @item User space only or full system emulation. -@item Using dynamic translation to native code for reasonnable speed. +@item Using dynamic translation to native code for reasonable speed. -@item Working on x86 and PowerPC hosts. Being tested on ARM, Sparc32, Alpha and S390. +@item +Working on x86, x86_64 and PowerPC32/64 hosts. Being tested on ARM, +HPPA, Sparc32 and Sparc64. Previous versions had some support for +Alpha and S390 hosts, but TCG (see below) doesn't support those yet. @item Self-modifying code support. @item Precise exceptions support. -@item The virtual CPU is a library (@code{libqemu}) which can be used +@item The virtual CPU is a library (@code{libqemu}) which can be used in other projects (look at @file{qemu/tests/qruncom.c} to have an example of user mode @code{libqemu} usage). +@item +Floating point library supporting both full software emulation and +native host FPU instructions. + @end itemize QEMU user mode emulation features: -@itemize +@itemize @item Generic Linux system call converter, including most ioctls. @item clone() emulation using native CPU clone() to use Linux scheduler for threads. -@item Accurate signal handling by remapping host signals to target signals. -@end itemize +@item Accurate signal handling by remapping host signals to target signals. 
@end itemize +Linux user emulator (Linux host only) can be used to launch the Wine +Windows API emulator (@url{http://www.winehq.org}). A Darwin user +emulator (Darwin hosts only) exists and a BSD user emulator for BSD +hosts is under development. It would also be possible to develop a +similar user emulator for Solaris. + QEMU full system emulation features: -@itemize -@item QEMU can either use a full software MMU for maximum portability or use the host system call mmap() to simulate the target MMU. +@itemize +@item +QEMU uses a full software MMU for maximum portability. + +@item +QEMU can optionally use an in-kernel accelerator, like kqemu and +kvm. The accelerators execute some of the guest code natively, while +continuing to emulate the rest of the machine. + +@item +Various hardware devices can be emulated and in some cases, host +devices (e.g. serial and parallel ports, USB, drives) can be used +transparently by the guest Operating System. Host device passthrough +can be used for talking to external physical peripherals (e.g. a +webcam, modem or tape drive). + +@item +Symmetric multiprocessing (SMP) even on a host with a single CPU. On a +SMP host system, QEMU can use only one CPU fully due to difficulty in +implementing atomic memory accesses efficiently. + @end itemize -@section x86 emulation +@node intro_x86_emulation +@section x86 and x86-64 emulation QEMU x86 target features: -@itemize +@itemize -@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. -LDT/GDT and IDT are emulated. VM86 mode is also supported to run DOSEMU. +@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. +LDT/GDT and IDT are emulated. VM86 mode is also supported to run +DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3, +and SSE4 as well as x86-64 SVM. @item Support of host page sizes bigger than 4KB in user mode emulation. @item QEMU can emulate itself on x86. 
-@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}. +@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}. It can be used to test other x86 virtual CPUs. @end itemize Current QEMU limitations: -@itemize - -@item No SSE/MMX support (yet). +@itemize -@item No x86-64 support. +@item Limited x86-64 support. @item IPC syscalls are missing. -@item The x86 segment limits and access rights are not tested at every +@item The x86 segment limits and access rights are not tested at every memory access (yet). Hopefully, very few OSes seem to rely on that for normal use. -@item On non x86 host CPUs, @code{double}s are used instead of the non standard -10 byte @code{long double}s of x86 for floating point emulation to get -maximum performances. - @end itemize +@node intro_arm_emulation @section ARM emulation @itemize @@ -122,30 +182,111 @@ maximum performances. @end itemize +@node intro_mips_emulation +@section MIPS emulation + +@itemize + +@item The system emulation allows full MIPS32/MIPS64 Release 2 emulation, +including privileged instructions, FPU and MMU, in both little and big +endian modes. + +@item The Linux userland emulation can run many 32 bit MIPS Linux binaries. + +@end itemize + +Current QEMU limitations: + +@itemize + +@item Self-modifying code is not always handled correctly. + +@item 64 bit userland emulation is not implemented. + +@item The system emulation is not complete enough to run real firmware. + +@item The watchpoint debug facility is not implemented. + +@end itemize + +@node intro_ppc_emulation @section PowerPC emulation @itemize -@item Full PowerPC 32 bit emulation, including privileged instructions, +@item Full PowerPC 32 bit emulation, including privileged instructions, FPU and MMU. @item Can run most PowerPC Linux binaries. 
@end itemize -@section SPARC emulation +@node intro_sparc_emulation +@section Sparc32 and Sparc64 emulation @itemize -@item Somewhat complete SPARC V8 emulation, including privileged -instructions, FPU and MMU. +@item Full SPARC V8 emulation, including privileged +instructions, FPU and MMU. SPARC V9 emulation includes most privileged +and VIS instructions, FPU and I/D MMU. Alignment is fully enforced. -@item Can run some SPARC Linux binaries. +@item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and +some 64-bit SPARC Linux binaries. @end itemize +Current QEMU limitations: + +@itemize + +@item IPC syscalls are missing. + +@item Floating point exception support is buggy. + +@item Atomic instructions are not correctly implemented. + +@item There are still some problems with Sparc64 emulators. + +@end itemize + +@node intro_other_emulation +@section Other CPU emulation + +In addition to the above, QEMU supports emulation of other CPUs with +varying levels of success. These are: + +@itemize + +@item +Alpha +@item +CRIS +@item +M68k +@item +SH4 +@end itemize + +@node QEMU Internals @chapter QEMU Internals +@menu +* QEMU compared to other emulators:: +* Portable dynamic translation:: +* Condition code optimisations:: +* CPU state optimisations:: +* Translation cache:: +* Direct block chaining:: +* Self-modifying code and translated code invalidation:: +* Exception support:: +* MMU emulation:: +* Device emulation:: +* Hardware interrupts:: +* User emulation specific details:: +* Bibliography:: +@end menu + +@node QEMU compared to other emulators @section QEMU compared to other emulators Like bochs [3], QEMU emulates an x86 CPU. But QEMU is much faster than @@ -179,19 +320,24 @@ patches. However, user mode Linux requires heavy kernel patches while QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is slower. -The new Plex86 [8] PC virtualizer is done in the same spirit as the -qemu-fast system emulator. 
It requires a patched Linux kernel to work -(you cannot launch the same kernel on your PC), but the patches are -really small. As it is a PC virtualizer (no emulation is done except -for some priveledged instructions), it has the potential of being -faster than QEMU. The downside is that a complicated (and potentially -unsafe) host kernel patch is needed. +The Plex86 [8] PC virtualizer is done in the same spirit as the now +obsolete qemu-fast system emulator. It requires a patched Linux kernel +to work (you cannot launch the same kernel on your PC), but the +patches are really small. As it is a PC virtualizer (no emulation is +done except for some privileged instructions), it has the potential of +being faster than QEMU. The downside is that a complicated (and +potentially unsafe) host kernel patch is needed. The commercial PC Virtualizers (VMWare [9], VirtualPC [10], TwoOStwo [11]) are faster than QEMU, but they all need specific, proprietary and potentially unsafe host drivers. Moreover, they are unable to provide cycle exact simulation as an emulator can. +VirtualBox [12], Xen [13] and KVM [14] are based on QEMU. QEMU-SystemC +[15] uses QEMU to simulate a system where some hardware devices are +developed in SystemC. + +@node Portable dynamic translation @section Portable dynamic translation QEMU is a dynamic translator. When it first encounters a piece of code, @@ -200,63 +346,57 @@ are very complicated and highly CPU dependent. QEMU uses some tricks which make it relatively easily portable and simple while achieving good performances. -The basic idea is to split every x86 instruction into fewer simpler -instructions. Each simple instruction is implemented by a piece of C -code (see @file{target-i386/op.c}). Then a compile time tool -(@file{dyngen}) takes the corresponding object file (@file{op.o}) -to generate a dynamic code generator which concatenates the simple -instructions to build a function (see @file{op.h:dyngen_code()}). 
- -In essence, the process is similar to [1], but more work is done at -compile time. - -A key idea to get optimal performances is that constant parameters can -be passed to the simple operations. For that purpose, dummy ELF -relocations are generated with gcc for each constant parameter. Then, -the tool (@file{dyngen}) can locate the relocations and generate the -appriopriate C code to resolve them when building the dynamic code. - -That way, QEMU is no more difficult to port than a dynamic linker. - -To go even faster, GCC static register variables are used to keep the -state of the virtual CPU. - -@section Register allocation - -Since QEMU uses fixed simple instructions, no efficient register -allocation can be done. However, because RISC CPUs have a lot of -register, most of the virtual CPU state can be put in registers without -doing complicated register allocation. - +After the release of version 0.9.1, QEMU switched to a new method of +generating code, Tiny Code Generator or TCG. TCG relaxes the +dependency on the exact version of the compiler used. The basic idea +is to split every target instruction into a couple of RISC-like TCG +ops (see @code{target-i386/translate.c}). Some optimizations can be +performed at this stage, including liveness analysis and trivial +constant expression evaluation. TCG ops are then implemented in the +host CPU back end, also known as TCG target (see +@code{tcg/i386/tcg-target.c}). For more information, please take a +look at @code{tcg/README}. + +@node Condition code optimisations @section Condition code optimisations -Good CPU condition codes emulation (@code{EFLAGS} register on x86) is a -critical point to get good performances. QEMU uses lazy condition code -evaluation: instead of computing the condition codes after each x86 -instruction, it just stores one operand (called @code{CC_SRC}), the -result (called @code{CC_DST}) and the type of operation (called -@code{CC_OP}). 
- -@code{CC_OP} is almost never explicitely set in the generated code +Lazy evaluation of CPU condition codes (@code{EFLAGS} register on x86) +is important for CPUs where every instruction sets the condition +codes. It tends to be less important on conventional RISC systems +where condition codes are only updated when explicitly requested. On +Sparc64, costly update of both 32 and 64 bit condition codes can be +avoided with lazy evaluation. + +Instead of computing the condition codes after each x86 instruction, +QEMU just stores one operand (called @code{CC_SRC}), the result +(called @code{CC_DST}) and the type of operation (called +@code{CC_OP}). When the condition codes are needed, the condition +codes can be calculated using this information. In addition, an +optimized calculation can be performed for some instruction types like +conditional branches. + +@code{CC_OP} is almost never explicitly set in the generated code because it is known at translation time. -In order to increase performances, a backward pass is performed on the -generated simple instructions (see -@code{target-i386/translate.c:optimize_flags()}). When it can be proved that -the condition codes are not needed by the next instructions, no -condition codes are computed at all. +The lazy condition code evaluation is used on x86, m68k, cris and +Sparc. ARM uses a simplified variant for the N and Z flags. +@node CPU state optimisations @section CPU state optimisations -The x86 CPU has many internal states which change the way it evaluates -instructions. In order to achieve a good speed, the translation phase -considers that some state information of the virtual x86 CPU cannot -change in it. For example, if the SS, DS and ES segments have a zero -base, then the translator does not even generate an addition for the -segment base. +The target CPUs have many internal states which change the way it +evaluates instructions. 
In order to achieve a good speed, the +translation phase considers that some state information of the virtual +CPU cannot change in it. The state is recorded in the Translation +Block (TB). If the state changes (e.g. privilege level), a new TB will +be generated and the previous TB won't be used anymore until the state +matches the state recorded in the previous TB. For example, if the SS, +DS and ES segments have a zero base, then the translator does not even +generate an addition for the segment base. [The FPU stack pointer register is not handled that way yet]. +@node Translation cache @section Translation cache A 16 MByte cache holds the most recently used translations. For @@ -265,6 +405,7 @@ contains just a single basic block (a block of x86 instructions terminated by a jump or by a virtual CPU state change which the translator cannot deduce statically). +@node Direct block chaining @section Direct block chaining After each translated basic block is executed, QEMU uses the simulated @@ -280,6 +421,7 @@ it easier to make the jump target modification atomic. On some host architectures (such as x86 or PowerPC), the @code{JUMP} opcode is directly patched so that the block chaining has no overhead. +@node Self-modifying code and translated code invalidation @section Self-modifying code and translated code invalidation Self-modifying code is a special challenge in x86 emulation because no @@ -287,64 +429,80 @@ instruction cache invalidation is signaled by the application when code is modified. When translated code is generated for a basic block, the corresponding -host page is write protected if it is not already read-only (with the -system call @code{mprotect()}). Then, if a write access is done to the -page, Linux raises a SEGV signal. QEMU then invalidates all the -translated code in the page and enables write accesses to the page. +host page is write protected if it is not already read-only. 
Then, if +a write access is done to the page, Linux raises a SEGV signal. QEMU +then invalidates all the translated code in the page and enables write +accesses to the page. Correct translated code invalidation is done efficiently by maintaining a linked list of every translated block contained in a given page. Other -linked lists are also maintained to undo direct block chaining. - -Although the overhead of doing @code{mprotect()} calls is important, -most MSDOS programs can be emulated at reasonnable speed with QEMU and -DOSEMU. - -Note that QEMU also invalidates pages of translated code when it detects -that memory mappings are modified with @code{mmap()} or @code{munmap()}. +linked lists are also maintained to undo direct block chaining. -When using a software MMU, the code invalidation is more efficient: if -a given code page is invalidated too often because of write accesses, -then a bitmap representing all the code inside the page is -built. Every store into that page checks the bitmap to see if the code -really needs to be invalidated. It avoids invalidating the code when -only data is modified in the page. +On RISC targets, correctly written software uses memory barriers and +cache flushes, so some of the protection above would not be +necessary. However, QEMU still requires that the generated code always +matches the target instructions in memory in order to handle +exceptions correctly. +@node Exception support @section Exception support longjmp() is used when an exception such as division by zero is -encountered. +encountered. The host SIGSEGV and SIGBUS signal handlers are used to get invalid -memory accesses. The exact CPU state can be retrieved because all the -x86 registers are stored in fixed host registers. The simulated program -counter is found by retranslating the corresponding basic block and by -looking where the host program counter was at the exception point. +memory accesses. 
The simulated program counter is found by +retranslating the corresponding basic block and by looking where the +host program counter was at the exception point. The virtual CPU cannot retrieve the exact @code{EFLAGS} register because in some cases it is not computed because of condition code optimisations. It is not a big concern because the emulated code can still be restarted in any cases. +@node MMU emulation @section MMU emulation -For system emulation, QEMU uses the mmap() system call to emulate the -target CPU MMU. It works as long the emulated OS does not use an area -reserved by the host OS (such as the area above 0xc0000000 on x86 -Linux). - -In order to be able to launch any OS, QEMU also supports a soft -MMU. In that mode, the MMU virtual to physical address translation is -done at every memory access. QEMU uses an address translation cache to -speed up the translation. +For system emulation QEMU supports a soft MMU. In that mode, the MMU +virtual to physical address translation is done at every memory +access. QEMU uses an address translation cache to speed up the +translation. In order to avoid flushing the translated code each time the MMU mappings change, QEMU uses a physically indexed translation cache. It -means that each basic block is indexed with its physical address. +means that each basic block is indexed with its physical address. When MMU mappings change, only the chaining of the basic blocks is reset (i.e. a basic block can no longer jump directly to another one). +@node Device emulation +@section Device emulation + +Systems emulated by QEMU are organized by boards. At initialization +phase, each board instantiates a number of CPUs, devices, RAM and +ROM. Each device in turn can assign I/O ports or memory areas (for +MMIO) to its handlers. When the emulation starts, an access to the +ports or MMIO memory areas assigned to the device causes the +corresponding handler to be called. 
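The handler dispatch described in the Device emulation text above (a device assigns I/O ports to its handlers, and a later guest access calls the matching handler) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not QEMU's actual interface: every name here (`demo_register_ioport_write`, `demo_port_write`, `DemoDevice`, `demo_device_write`) is hypothetical, only port writes are modelled, and MMIO would need a similar table keyed by address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_IOPORTS 65536u /* 16-bit port space, as on a PC */

/* Hypothetical handler type: called when the guest writes to a port. */
typedef void (*demo_ioport_write_fn)(void *opaque, uint32_t port,
                                     uint32_t value);

/* One slot per port: the handler and the device state passed back to it. */
static demo_ioport_write_fn demo_write_table[DEMO_MAX_IOPORTS];
static void *demo_opaque_table[DEMO_MAX_IOPORTS];

/* Device side: claim `length` consecutive ports starting at `start`.
 * Returns 0 on success, -1 if the range leaves the port space. */
int demo_register_ioport_write(uint32_t start, uint32_t length,
                               demo_ioport_write_fn fn, void *opaque)
{
    assert(fn != NULL);
    if (start >= DEMO_MAX_IOPORTS || length > DEMO_MAX_IOPORTS - start)
        return -1;
    for (uint32_t p = start; p < start + length; p++) {
        demo_write_table[p] = fn;
        demo_opaque_table[p] = opaque;
    }
    return 0;
}

/* CPU side: dispatch an emulated port write.
 * Returns 1 if a handler was called, 0 if the port is unassigned. */
int demo_port_write(uint32_t port, uint32_t value)
{
    if (port >= DEMO_MAX_IOPORTS || demo_write_table[port] == NULL)
        return 0;
    demo_write_table[port](demo_opaque_table[port], port, value);
    return 1;
}

/* A toy device that remembers the last value written and counts writes. */
typedef struct {
    uint32_t last_value;
    unsigned writes;
} DemoDevice;

void demo_device_write(void *opaque, uint32_t port, uint32_t value)
{
    DemoDevice *dev = opaque;
    (void)port; /* this toy device treats all of its ports alike */
    dev->last_value = value;
    dev->writes++;
}
```

A board's initialization code would call `demo_register_ioport_write(0x3f8, 8, demo_device_write, &dev)` once, after which the CPU emulation only needs `demo_port_write()`. The flat table trades memory for a constant-time lookup on every access.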
+ +RAM and ROM are handled more optimally, only the offset to the host +memory needs to be added to the guest address. + +The video RAM of VGA and other display cards is special: it can be +read or written directly like RAM, but write accesses cause the memory +to be marked with VGA_DIRTY flag as well. + +QEMU supports some device classes like serial and parallel ports, USB, +drives and network devices, by providing APIs for easier connection to +the generic, higher level implementations. The API hides the +implementation details from the devices, like native device use or +advanced block device formats like QCOW. + +Usually the devices implement a reset method and register support for +saving and loading of the device state. The devices can also use +timers, especially together with the use of bottom halves (BHs). + +@node Hardware interrupts @section Hardware interrupts In order to be faster, QEMU does not check at every basic block if an @@ -355,6 +513,7 @@ block. It ensures that the execution will return soon in the main loop of the CPU emulator. Then the main loop can test if the interrupt is pending and handle it. +@node User emulation specific details @section User emulation specific details @subsection Linux system call translation @@ -408,15 +567,16 @@ it is not very useful, it is an important test to show the power of the emulator. Achieving self-virtualization is not easy because there may be address -space conflicts. QEMU solves this problem by being an executable ELF -shared object as the ld-linux.so ELF interpreter. That way, it can be -relocated at load time. +space conflicts. QEMU user emulators solve this problem by being an +executable ELF shared object as the ld-linux.so ELF interpreter. That +way, it can be relocated at load time. 
+@node Bibliography @section Bibliography @table @asis -@item [1] +@item [1] @url{http://citeseer.nj.nec.com/piumarta98optimizing.html}, Optimizing direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio Riccardi. @@ -434,7 +594,7 @@ by Kevin Lawton et al. x86 emulator on Alpha-Linux. @item [5] -@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf}, +@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf}, DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton Chernoff and Ray Hookway. @@ -443,32 +603,56 @@ Chernoff and Ray Hookway. Willows Software. @item [7] -@url{http://user-mode-linux.sourceforge.net/}, +@url{http://user-mode-linux.sourceforge.net/}, The User-mode Linux Kernel. @item [8] -@url{http://www.plex86.org/}, +@url{http://www.plex86.org/}, The new Plex86 project. @item [9] -@url{http://www.vmware.com/}, +@url{http://www.vmware.com/}, The VMWare PC virtualizer. @item [10] -@url{http://www.microsoft.com/windowsxp/virtualpc/}, +@url{http://www.microsoft.com/windowsxp/virtualpc/}, The VirtualPC PC virtualizer. @item [11] -@url{http://www.twoostwo.org/}, +@url{http://www.twoostwo.org/}, The TwoOStwo PC virtualizer. +@item [12] +@url{http://virtualbox.org/}, +The VirtualBox PC virtualizer. + +@item [13] +@url{http://www.xen.org/}, +The Xen hypervisor. + +@item [14] +@url{http://kvm.qumranet.com/kvmwiki/Front_Page}, +Kernel Based Virtual Machine (KVM). + +@item [15] +@url{http://www.greensocs.com/projects/QEMUSystemC}, +QEMU-SystemC, a hardware co-simulator. + @end table +@node Regression Tests @chapter Regression Tests In the directory @file{tests/}, various interesting testing programs -are available. There are used for regression testing. +are available. They are used for regression testing. 
+ +@menu +* test-i386:: +* linux-test:: +* qruncom.c:: +@end menu +@node test-i386 @section @file{test-i386} This program executes most of the 16 bit and 32 bit x86 instructions and @@ -484,12 +668,20 @@ The Linux system call @code{vm86()} is used to test vm86 emulation. Various exceptions are raised to test most of the x86 user space exception reporting. +@node linux-test @section @file{linux-test} This program tests various Linux system calls. It is used to verify that the system call parameters are correctly converted between target and host CPUs. +@node qruncom.c @section @file{qruncom.c} Example of usage of @code{libqemu} to emulate a user mode i386 CPU. + +@node Index +@chapter Index +@printindex cp + +@bye
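The lazy condition code scheme described under "Condition code optimisations" in the diff above stores one operand, the result and the operation type, and derives flags only when they are needed. The C sketch below shows that idea for 32-bit ADD and SUB. It rests on assumptions: the names (`demo_cc_state`, `demo_add`, `demo_sub`, `demo_carry_flag`, `demo_zero_flag`) are hypothetical, only two operations and two flags are modelled, and QEMU's real translator is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation tags, standing in for the CC_OP values. */
enum demo_cc_op { DEMO_CC_OP_ADD, DEMO_CC_OP_SUB };

/* Lazy flag state: one operand, the result, and the operation type. */
struct demo_cc_state {
    uint32_t cc_src;       /* second operand of the last operation */
    uint32_t cc_dst;       /* result of the last operation */
    enum demo_cc_op cc_op; /* which operation produced cc_dst */
};

/* Emulated ADD: records the inputs to flag computation, computes no flags. */
uint32_t demo_add(struct demo_cc_state *s, uint32_t a, uint32_t b)
{
    assert(s != NULL);
    uint32_t r = a + b;
    s->cc_src = b;
    s->cc_dst = r;
    s->cc_op = DEMO_CC_OP_ADD;
    return r;
}

/* Emulated SUB: same bookkeeping, different operation tag. */
uint32_t demo_sub(struct demo_cc_state *s, uint32_t a, uint32_t b)
{
    assert(s != NULL);
    uint32_t r = a - b;
    s->cc_src = b;
    s->cc_dst = r;
    s->cc_op = DEMO_CC_OP_SUB;
    return r;
}

/* Zero flag: depends on the result only. */
int demo_zero_flag(const struct demo_cc_state *s)
{
    return s->cc_dst == 0;
}

/* Carry flag, recomputed on demand from the saved state. */
int demo_carry_flag(const struct demo_cc_state *s)
{
    switch (s->cc_op) {
    case DEMO_CC_OP_ADD:
        /* a + b wrapped around exactly when the result is below b. */
        return s->cc_dst < s->cc_src;
    case DEMO_CC_OP_SUB:
        /* a - b borrows exactly when a < b; a is rebuilt as r + b. */
        return (uint32_t)(s->cc_dst + s->cc_src) < s->cc_src;
    }
    return 0;
}
```

A run of arithmetic instructions then costs three stores each, and the flag helpers are called only by an instruction that actually reads a flag, such as a conditional branch.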