An In-Depth Look into the Win32 Portable Executable File Format


kemin: After studying PE for a few days and loosely translating this article, I discovered today that someone has already produced a fairly detailed translation, compiled into a PDF (download link below), so rather than reinvent the wheel I'm posting my version simply as notes. The original author says his description is more detailed than the specification, but more detailed doesn't mean perfect. Comparing it with other material I found, this article is not the most comprehensive treatment of PE (I give related reference links as well); at the very least it won't suit every reader's taste. No wonder: executable files are such hybrids that describing them takes a lot of ink and demands a lot of technical background from the reader.

An In-Depth Look into the Win32 Portable Executable File Format
Matt Pietrek
This article assumes you're familiar with C++ and Win32

SUMMARY

A good understanding of the Portable Executable (PE) file format leads to a good understanding of the operating system. If you know what's in your DLLs and EXEs, you'll be a more knowledgeable programmer. This article, the first of a two-part series, looks at the changes to the PE format that have occurred over the last few years, along with an overview of the format itself.

After this update, the author discusses how the PE format fits into applications written for .NET, PE file sections, RVAs, the DataDirectory, and the importing of functions. An appendix includes lists of the relevant image header structures and their descriptions.

A long time ago, in a galaxy far away, I wrote one of my first articles for Microsoft Systems Journal (now MSDN Magazine). The article, "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format," turned out to be more popular than I had expected. To this day, I still hear from people (even within Microsoft) who use that article, which is still available from the MSDN Library. Unfortunately, the problem with articles is that they're static. The world of Win32 has changed quite a bit in the intervening years, and the article is severely dated. I'll remedy that situation in a two-part article starting this month.

You might be wondering why you should care about the executable file format. The answer is the same now as it was then: an operating system's executable format and data structures reveal quite a bit about the underlying operating system. By understanding what's in your EXEs and DLLs, you'll find that you've become a better programmer all around.

Sure, you could learn a lot of what I'll tell you by reading the Microsoft specification. However, like most specs, it sacrifices readability for completeness. My focus in this article will be to explain the most relevant parts of the story, while filling in the hows and whys that don't fit neatly into a formal specification. In addition, I have some goodies in this article that don't seem to appear in any official Microsoft documentation.

Bridging the Gap

Let me give you just a few examples of what has changed since I wrote the article in 1994. Since 16-bit Windows is history, there's no need to compare and contrast the format to the Win16 New Executable format. Another welcome departure from the scene is Win32s. This was the abomination that ran Win32 binaries very shakily atop Windows 3.1.

Back then, Windows 95 (codenamed "Chicago" at the time) wasn't even released. Windows NT was still at version 3.5, and the linker gurus at Microsoft hadn't yet started getting aggressive with their optimizations. However, there were MIPS and DEC Alpha implementations of Windows NT that added to the story.

And what about all the new things that have come along since that article? 64-bit Windows introduces its own variation of the Portable Executable (PE) format. Windows CE adds all sorts of new processor types. Optimizations such as delay loading of DLLs, section merging, and binding were still over the horizon. There are many new things to shoehorn into the story.

And let's not forget about Microsoft .NET. Where does it fit in? To the operating system, .NET executables are just plain old Win32 executable files. However, the .NET runtime recognizes data within these executable files as the metadata and intermediate language that are so central to .NET. In this article, I'll knock on the door of the .NET metadata format, but save a thorough survey of its full splendor for a subsequent article.

And if all these additions and subtractions to the world of Win32 weren't enough justification to remake the article with modern day special effects, there are also errors in the original piece that make me cringe. For example, my description of Thread Local Storage (TLS) support was way out in left field. Likewise, my description of the date/time stamp DWORD used throughout the file format is accurate only if you live in the Pacific time zone!

In addition, many things that were true then are incorrect now. I had stated that the .rdata section wasn't really used for anything important. Today, it certainly is. I also said that the .idata section is a read/write section, which has been found to be most untrue by people trying to do API interception today.
Along with a complete update of the PE format story in this article, I've also overhauled the PEDUMP program, which displays the contents of PE files. PEDUMP can be compiled and run on both the x86 and IA-64 platforms, and can dump both 32 and 64-bit PE files. Most importantly, full source code for PEDUMP is available for download from the link at the top of this article, so you have a working example of the concepts and data structures described here.

Overview of the PE File Format

Microsoft introduced the PE File format, more commonly known as the PE format, as part of the original Win32 specifications. However, PE files are derived from the earlier Common Object File Format (COFF) found on VAX/VMS. This makes sense since much of the original Windows NT team came from Digital Equipment Corporation. It was natural for these developers to use existing code to quickly bootstrap the new Windows NT platform.

The term "Portable Executable" was chosen because the intent was to have a common file format for all flavors of Windows, on all supported CPUs. To a large extent, this goal has been achieved with the same format used on Windows NT and descendants, Windows 95 and descendants, and Windows CE.
OBJ files emitted by Microsoft compilers use the COFF format. You can get an idea of how old the COFF format is by looking at some of its fields, which use octal encoding! COFF OBJ files have many data structures and enumerations in common with PE files, and I'll mention some of them as I go along.
The addition of 64-bit Windows required just a few modifications to the PE format. This new format is called PE32+. No new fields were added, and only one field in the PE format was deleted. The remaining changes are simply the widening of certain fields from 32 bits to 64 bits. In most of these cases, you can write code that simply works with both 32 and 64-bit PE files. The Windows header files have the magic pixie dust to make the differences invisible to most C++-based code.

The distinction between EXE and DLL files is entirely one of semantics. They both use the exact same PE format. The only difference is a single bit that indicates if the file should be treated as an EXE or as a DLL. Even the DLL file extension is artificial. You can have DLLs with entirely different extensions; for instance, .OCX controls and Control Panel applets (.CPL files) are DLLs.
A very handy aspect of PE files is that the data structures on disk are the same data structures used in memory. Loading an executable into memory (for example, by calling LoadLibrary) is primarily a matter of mapping certain ranges of a PE file into the address space. Thus, a data structure like the IMAGE_NT_HEADERS (which I'll examine later) is identical on disk and in memory. The key point is that if you know how to find something in a PE file, you can almost certainly find the same information when the file is loaded in memory.
It's important to note that PE files are not just mapped into memory as a single memory-mapped file. Instead, the Windows loader looks at the PE file and decides what portions of the file to map in. This mapping is consistent in that higher offsets in the file correspond to higher memory addresses when mapped into memory. The offset of an item in the disk file may differ from its offset once loaded into memory. However, all the information is present to allow you to make the translation from disk offset to memory offset (see Figure 1).

When PE files are loaded into memory via the Windows loader, the in-memory version is known as a module. The starting address where the file mapping begins is called an HMODULE. This is a point worth remembering: given an HMODULE, you know what data structure to expect at that address, and you can use that knowledge to find all the other data structures in memory. This powerful capability can be exploited for other purposes such as API interception. (To be completely accurate, an HMODULE isn't the same as the load address under Windows CE, but that's a story for yet another day.)
A module in memory represents all the code, data, and resources from an executable file that is needed by a process. Other parts of a PE file may be read, but not mapped in (for instance, relocations). Some parts may not be mapped in at all, for example, when debug information is placed at the end of the file. A field in the PE header tells the system how much memory needs to be set aside for mapping the executable into memory. Data that won't be mapped in is placed at the end of the file, past any parts that will be mapped in.

The central location where the PE format (as well as COFF files) is described is WINNT.H. Within this header file, you'll find nearly every structure definition, enumeration, and #define needed to work with PE files or the equivalent structures in memory. Sure, there is documentation elsewhere. MSDN has the "Microsoft Portable Executable and Common Object File Format Specification," for instance (see the October 2001 MSDN CD under Specifications). But WINNT.H is the final word on what PE files look like.
There are many tools for examining PE files. Among them are Dumpbin from Visual Studio, and Depends from the Platform SDK. I particularly like Depends because it has a very succinct way of examining a file's imports and exports. A great free PE viewer is PEBrowse Professional, from Smidgeonsoft (http://www.smidgeonsoft.com). The PEDUMP program included with this article is also very comprehensive, and does almost everything Dumpbin does.
From an API standpoint, the primary mechanism provided by Microsoft for reading and modifying PE files is IMAGEHLP.DLL.
Before I start looking at the specifics of PE files, it's worthwhile to first review a few basic concepts that thread their way through the entire subject of PE files. In the following sections, I will discuss PE file sections, relative virtual addresses (RVAs), the data directory, and how functions are imported.

PE File Sections

A PE file section represents code or data of some sort. While code is just code, there are multiple types of data. Besides read/write program data (such as global variables), other types of data in sections include API import and export tables, resources, and relocations. Each section has its own set of in-memory attributes, including whether the section contains code, whether it's read-only or read/write, and whether the data in the section is shared between all processes using the executable.
Generally speaking, all the code or data in a section is logically related in some way. At a minimum, there are usually at least two sections in a PE file: one for code, the other for data. Commonly, there's at least one other type of data section in a PE file. I'll look at the various kinds of sections in Part 2 of this article next month.
Each section has a distinct name. This name is intended to convey the purpose of the section. For example, a section called .rdata indicates a read-only data section. Section names are used solely for the benefit of humans, and are insignificant to the operating system. A section named FOOBAR is just as valid as a section called .text. Microsoft typically prefixes their section names with a period, but it's not a requirement. For years, the Borland linker used section names like CODE and DATA.
While compilers have a standard set of sections that they generate, there's nothing magical about them. You can create and name your own sections, and the linker happily includes them in the executable. In Visual C++, you can tell the compiler to insert code or data into a section that you name with #pragma statements. For instance, the statement
#pragma data_seg( "MY_DATA" )
causes all data emitted by Visual C++ to go into a section called MY_DATA, rather than the default .data section. Most programs are fine using the default sections emitted by the compiler, but occasionally you may have funky requirements which necessitate putting code or data into a separate section.

Sections don't spring fully formed from the linker; rather, they start out in OBJ files, usually placed there by the compiler. The linker's job is to combine all the required sections from OBJ files and libraries into the appropriate final section in the PE file. For example, each OBJ file in your project probably has at least a .text section, which contains code. The linker takes all the sections named .text from the various OBJ files and combines them into a single .text section in the PE file. Likewise, all the sections named .data from the various OBJs are combined into a single .data section in the PE file. Code and data from .LIB files are also typically included in an executable, but that subject is outside the scope of this article.
There is a rather complete set of rules that linkers follow to decide which sections to combine and how. I gave an introduction to the linker algorithms in the July 1997 Under The Hood (http://www.microsoft.com/msj/0797/hood0797.aspx) column in MSJ. A section in an OBJ file may be intended for the linker's use, and not make it into the final executable. A section like this would be intended for the compiler to pass information to the linker.

Sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values, which can differ. Each section starts at an offset that's some multiple of the alignment value. For instance, in the PE file, a typical alignment would be 0x200. Thus, every section begins at a file offset that's a multiple of 0x200.
Once mapped into memory, sections always start on at least a page boundary. That is, when a PE section is mapped into memory, the first byte of each section corresponds to a memory page. On x86 CPUs, pages are 4KB aligned, while on the IA-64, they're 8KB aligned. The following code shows a snippet of PEDUMP output for the .text and .data section of the Windows XP KERNEL32.DLL.
Section Table
01 .text VirtSize: 00074658 VirtAddr: 00001000
   raw data offs: 00000400 raw data size: 00074800
...
02 .data VirtSize: 000028CA VirtAddr: 00076000
   raw data offs: 00074C00 raw data size: 00002400
The .text section is at offset 0x400 in the PE file and will be 0x1000 bytes above the load address of KERNEL32 in memory. Likewise, the .data section is at file offset 0x74C00 and will be 0x76000 bytes above KERNEL32's load address in memory.

It's possible to create PE files in which the sections start at the same offset in the file as they start from the load address in memory. This makes for larger executables, but can speed loading under Windows 9x or Windows Me. The default /OPT:WIN98 linker option (introduced in Visual Studio 6.0) causes PE files to be created this way. In Visual Studio .NET, the linker may or may not use /OPT:NOWIN98, depending on whether the file is small enough.
An interesting linker feature is the ability to merge sections. If two sections have similar, compatible attributes, they can usually be combined into a single section at link time. This is done via the linker /merge switch. For instance, the following linker option combines the .rdata and .text sections into a single section called .text:
/MERGE:.rdata=.text
The advantage to merging sections is that it saves space, both on disk and in memory. At a minimum, each section occupies one page in memory. If you can reduce the number of sections in an executable from four to three, there's a decent chance you'll use one less page of memory. Of course, this depends on whether the unused space at the end of the two merged sections adds up to a page.

Things can get interesting when you're merging sections, as there are no hard and fast rules as to what's allowed. For example, it's OK to merge .rdata into .text, but you shouldn't merge .rsrc, .reloc, or .pdata into other sections. Prior to Visual Studio .NET, you could merge .idata into other sections. In Visual Studio .NET, this is not allowed, but the linker often merges parts of the .idata into other sections, such as .rdata, when doing a release build.
Since portions of the imports data are written to by the Windows loader when they are loaded into memory, you might wonder how they can be put in a read-only section. This situation works because at load time the system can temporarily set the attributes of the pages containing the imports data to read/write. Once the imports table is initialized, the pages are then set back to their original protection attributes.

Relative Virtual Addresses

In an executable file, there are many places where an in-memory address needs to be specified. For instance, the address of a global variable is needed when referencing it. PE files can load just about anywhere in the process address space. While they do have a preferred load address, you can't rely on the executable file actually loading there. For this reason, it's important to have some way of specifying addresses that are independent of where the executable file loads.
To avoid having hardcoded memory addresses in PE files, RVAs are used. An RVA is simply an offset in memory, relative to where the PE file was loaded. For instance, consider an EXE file loaded at address 0x400000, with its code section at address 0x401000. The RVA of the code section would be:
(target address) 0x401000 - (load address)0x400000 = (RVA)0x1000.

To convert an RVA to an actual address, simply reverse the process: add the RVA to the actual load address to find the actual memory address. Incidentally, the actual memory address is called a Virtual Address (VA) in PE parlance. Another way to think of a VA is that it's an RVA with the preferred load address added in. Don't forget the earlier point I made that a load address is the same as the HMODULE.
Want to go spelunking through some arbitrary DLL's data structures in memory? Here's how. Call GetModuleHandle with the name of the DLL. The HMODULE that's returned is just a load address; you can apply your knowledge of the PE file structures to find anything you want within the module.

Relative Virtual Address. In an image file, an RVA is always the address of an item once loaded into memory, with the base address
of the image file subtracted from it. The RVA of an item will almost always differ from its position within the file on disk (File Pointer).
In an object file, an RVA is less meaningful because memory locations are not assigned. In this case, an RVA would be an address within a section (see below), to which a relocation is later applied during linking. For simplicity, compilers should just set the first RVA in each section to zero.

The Data Directory

There are many data structures within executable files that need to be quickly located. Some obvious examples are the imports, exports, resources, and base relocations. All of these well-known data structures are found in a consistent manner, and the location is known as the DataDirectory.
The DataDirectory is an array of 16 structures. Each array entry has a predefined meaning for what it refers to. The IMAGE_DIRECTORY_ENTRY_xxx #defines are array indexes into the DataDirectory (from 0 to 15). Figure 2 describes what each of the IMAGE_DIRECTORY_ENTRY_xxx values refers to. A more detailed description of many of the pointed-to data structures will be included in Part 2 of this article.

Importing Functions

When you use code or data from another DLL, you're importing it. When any PE file loads, one of the jobs of the Windows loader is to locate all the imported functions and data and make those addresses available to the file being loaded. I'll save the detailed discussion of data structures used to accomplish this for Part 2 of this article, but it's worth going over the concepts here at a high level.
When you link directly against the code and data of another DLL, you're implicitly linking against the DLL. You don't have to do anything to make the addresses of the imported APIs available to your code. The loader takes care of it all. The alternative is explicit linking. This means explicitly making sure that the target DLL is loaded and then looking up the address of the APIs. This is almost always done via the LoadLibrary and GetProcAddress APIs.
When you implicitly link against an API, LoadLibrary and GetProcAddress-like code still executes, but the loader does it for you automatically. The loader also ensures that any additional DLLs needed by the PE file being loaded are also loaded. For instance, every normal program created with Visual C++ links against KERNEL32.DLL. KERNEL32.DLL in turn imports functions from NTDLL.DLL. Likewise, if you import from GDI32.DLL, it will have dependencies on the USER32, ADVAPI32, NTDLL, and KERNEL32 DLLs, which the loader makes sure are loaded and all imports resolved. (Visual Basic 6.0 and the Microsoft .NET executables directly link against a different DLL than KERNEL32, but the same principles apply.)
When implicitly linking, the resolution process for the main EXE file and all its dependent DLLs occurs when the program first starts. If there are any problems (for example, a referenced DLL that can't be found), the process is aborted.

Visual C++ 6.0 added the delayload feature, which is a hybrid between implicit linking and explicit linking. When you delayload against a DLL, the linker emits something that looks very similar to the data for a regular imported DLL. However, the operating system ignores this data. Instead, the first time a call to one of the delayloaded APIs occurs, special stubs added by the linker cause the DLL to be loaded (if it's not already in memory), followed by a call to GetProcAddress to locate the called API. Additional magic makes it so that subsequent calls to the API are just as efficient as if the API had been imported normally.
Within a PE file, there's an array of data structures, one per imported DLL. Each of these structures gives the name of the imported DLL and points to an array of function pointers. The array of function pointers is known as the import address table (IAT). Each imported API has its own reserved spot in the IAT where the address of the imported function is written by the Windows loader. This last point is particularly important: once a module is loaded, the IAT contains the address that is invoked when calling imported APIs.

The beauty of the IAT is that there's just one place in a PE file where an imported API's address is stored. No matter how many source files you scatter calls to a given API through, all the calls go through the same function pointer in the IAT.
Let's examine what the call to an imported API looks like. There are two cases to consider: the efficient way and inefficient way. In the best case, a call to an imported API looks like this:
CALL DWORD PTR [0x00405030]

If you're not familiar with x86 assembly language, this is a call through a function pointer. Whatever DWORD-sized value is at 0x405030 is where the CALL instruction will send control. In the previous example, address 0x405030 lies within the IAT.
The less efficient call to an imported API looks like this:
CALL 0x0040100C
...
0x0040100C:
   JMP DWORD PTR [0x00405030]

In this situation, the CALL transfers control to a small stub. The stub is a JMP to the address whose value is at 0x405030. Again, remember that 0x405030 is an entry within the IAT. In a nutshell, the less efficient imported API call uses five bytes of additional code, and takes longer to execute because of the extra JMP.
You're probably wondering why the less efficient method would ever be used. There's a good explanation. Left to its own devices, the compiler can't distinguish between imported API calls and ordinary functions within the same module. As such, the compiler emits a CALL instruction of the form
CALL XXXXXXXX

where XXXXXXXX is an actual code address that will be filled in by the linker later. Note that this last CALL instruction isn't through a function pointer. Rather, it's an actual code address. To keep the cosmic karma in balance, the linker needs to have a chunk of code to substitute for XXXXXXXX. The simplest way to do this is to make the call point to a JMP stub, like you just saw.
Where does the JMP stub come from? Surprisingly, it comes from the import library for the imported function. If you were to examine an import library, and examine the code associated with the imported API name, you'd see that it's a JMP stub like the one just shown. What this means is that by default, in the absence of any intervention, imported API calls will use the less efficient form.
Logically, the next question to ask is how to get the optimized form. The answer comes in the form of a hint you give to the compiler. The __declspec(dllimport) function modifier tells the compiler that the function resides in another DLL and that the compiler should generate this instruction
CALL DWORD PTR [XXXXXXXX]

rather than this one:
CALL XXXXXXXX

In addition, the compiler emits information telling the linker to resolve the function pointer portion of the instruction to a symbol named __imp_functionname. For instance, if you were calling MyFunction, the symbol name would be __imp_MyFunction. Looking in an import library, you'll see that in addition to the regular symbol name, there's also a symbol with the __imp__ prefix on it. This __imp__ symbol resolves directly to the IAT entry, rather than to the JMP stub.
So what does this mean in your everyday life? If you're writing exported functions and providing a .H file for them, remember to use the __declspec(dllimport) modifier with the function:
__declspec(dllimport) void Foo(void);

If you look at the Windows system header files, you'll find that they use __declspec(dllimport) for the Windows APIs. It's not easy to see, but if you search for the DECLSPEC_IMPORT macro defined in WINNT.H and used in files such as WinBase.H, you'll see how __declspec(dllimport) is prepended to the system API declarations.

Appendix:
PE File Format Overview

The entire format consists of an MS-DOS MZ header, followed by a real-mode stub program, the PE file signature, the PE file header, the PE optional header, all of the section headers, and finally, all of the section bodies.

The optional header ends with an array of data directory entries that are relative virtual addresses to data directories contained within section bodies. Each data directory indicates how a specific section body's data is structured.

The PE file format has eleven predefined sections, as is common to applications for Windows NT, but each application can define its own unique sections for code and data.

The .debug predefined section also has the capability of being stripped from the file into a separate debug file. If so, a special debug header is used to parse the debug file, and a flag is specified in the PE file header to indicate that the debug data has been stripped.

Summary

A good understanding of the PE file format leads to a better understanding of the operating system (and how it works). If you know what's in your EXEs and DLLs, you'll be a deeper, more professional programmer. As the first of a two-part series, this article looks at the changes to the PE format over recent years, along with a discussion of the format itself. In this update, the author covers how the PE format fits in with .NET, PE file sections (elsewhere translated as "blocks" or "regions"), RVAs, the DataDirectory, and the importing of functions.

A long time ago, I wrote an article for Microsoft Systems Journal (now MSDN) called "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format". It turned out to be more popular than I expected; to this day I still hear of people using it (it's still in MSDN). Unfortunately, that article has gone stale: the Win32 world has quietly changed a great deal. Starting this month, I'll remedy that with these two articles.

You may ask why we should understand the executable file format. The answer is the same as ever: an operating system's executable format and data structures expose a great deal of the system's low-level detail, and by understanding those details you'll find yourself becoming more professional.

Of course, you could learn what I'm about to tell you by reading Microsoft's specification (the PE format specification). But specifications sacrifice readability for completeness. This article covers the material most closely tied to the PE file format that, for various reasons, never made it into the specification.

Some Background

Let me give a few examples of how the world has changed since I wrote that article in 1994. With the arrival of Win32, Win16 became history, so there's no longer any need to compare and contrast the format with its NE format...

Back then, Windows 95 (also called Chicago) hadn't shipped, NT was still at version 3.5, and Microsoft's compiler and linker didn't yet have many code optimizations. However, Windows NT had already been implemented on architectures such as MIPS and DEC Alpha.

So what exactly has appeared in the years since? 64-bit Windows has its own PE variant, Windows CE supports all kinds of CPUs, and various compiler, linker, and loader optimizations have surfaced, such as delay loading of DLLs, section merging, and binding...

And of course, don't forget Microsoft .NET. To the operating system, a .NET executable is still in the PE format, but to the .NET runtime... FIXME

If all these changes to Win32 weren't enough to make me rewrite the article, the errors in the original alone would make me blush. For example, my description of TLS was only a quick pass, my description of the date/time stamp is valid only if you live in the western United States, and so on. Also, many things that were true then are wrong now. I once said the .rdata section wasn't used for much, but today that's no longer so; I also said the .idata section is readable and writable, but people doing API interception have proven that wrong.

While updating this article, I also updated PEDUMP, the program that lists the contents of PE files. It can be compiled and run on both the x86 and IA-64 platforms.

Overview of the PE Format

Microsoft introduced the PE file format as part of the original official Win32 specifications. However, PE is derived from COFF on VAX/VMS. Most of the original Windows NT team came over from DEC, so this lineage is understandable; it was also natural for them to reuse their existing code in building NT.

The term "Portable Executable" was chosen because Microsoft wanted a common executable file format that would let every Windows system run on every supported CPU. On the whole, that goal has been achieved: PE is used on NT and its descendants, 95 and its descendants, and CE.

The object files (OBJ) emitted by Microsoft's compilers use the COFF format (kemin: an object file is not an executable; it is the linker's input and only needs to be linkable, not executable; FIXME: add a COFF file format diagram). When you see that many of its fields are encoded in octal, you'll realize how old it is. COFF OBJ files share many data structures and enumerations with PE files, and I'll mention some of them shortly.

64-bit Windows made only tiny changes to the PE format. The new format is called PE32+. Far from adding any fields, it actually deletes just one. The remaining changes widen former 32-bit fields to 64 bits. For C++ code, the Windows header files use macro definitions to hide these differences.

The difference between an EXE and a DLL is entirely a small matter of semantics. Both use exactly the same file format: PE. The only difference is a single flag bit that marks the file as an EXE or a DLL. Likewise, the DLL extension is artificial; many dynamic-link libraries use other extensions, for example .OCX controls and .CPL files are DLLs.

A very convenient property of PE files is that the data-structure layout on disk is the same as the layout in memory. The main task of loading an executable into memory (say, by calling LoadLibrary) is mapping the PE file into the address space (FIXME: describe the loading process in more detail). So a data structure such as IMAGE_NT_HEADERS is identical on disk and in memory. The key point is that if you know how to get a piece of information from a PE file on disk, you know equally well how to get it once the file is loaded into memory.

One important point about PE: a PE file is not simply mapped into memory as a single block. The Windows loader examines the file's internal information and decides which parts map where. The rule of the mapping is that higher file offsets map to higher memory addresses, in order. However, the offset of an item in the on-disk PE file may differ from its offset once mapped into memory (KEMIN: the reason isn't explained). Even so, you have enough information to convert a file offset into a memory offset. See Figure 1:

When the Windows loader loads a PE file into memory, the in-memory version is called a module, and the mapping begins at the address called the HMODULE. Remember this: given an HMODULE, you know (from the PE structures) what data structure sits there, and from it you can work out all the other data structures. This powerful capability is especially significant for API interception. (Strictly speaking, this doesn't hold on Windows CE, but that's a story for later.)

A module in memory holds all the code, data, and resources the process needs. Other parts of the PE may be read by the loader but not mapped into memory (for example, relocation information), and some parts aren't mapped at all, such as debug information. A field in the PE header (FIXME: which one?) specifies how much memory the file mapping requires. Data that won't be mapped is placed at the tail of the file, whereas in the past everything was mapped.

The PE format is described in detail in WINNT.H. In that file you can find all the PE-related structure definitions, enumerations, and #define macros. Of course, there is documentation elsewhere too; MSDN has the specification, "Microsoft Portable Executable and Common Object File Format Specification".

There are many tools for inspecting PE files: Dumpbin from Visual Studio, and Depends from the SDK. I rather like Depends, because it shows a file's imports and exports in a concise way. A free PE viewer, PEBrowse, comes from Smidgeonsoft. My PEDUMP is also quite useful, and does what Dumpbin does.

From an API standpoint, IMAGEHLP.DLL provides the mechanism for reading and writing PE files.

Before starting to discuss concrete PE files, it's worth reviewing a few basic PE concepts. In the sections below, I'll discuss: PE file sections, relative virtual addresses (RVAs), the data directory, and the importing of functions.

PE File Sections

A section in a PE file holds either code or some kind of data. Code is just code, but there are many types of data. Besides readable/writable program data (such as global variables), a PE file's sections can hold API import and export tables, resources, and relocation data. Every section has its own in-memory attributes, including whether it is a code section, whether it is read-only or read/write, and whether its data is shared globally. (FIXME: more about in-memory attributes)

Usually, some logical relationship exists among the code and data in a section. A PE file generally has at least two sections, one for code and the other for data, and usually one more section of some other type of data. Later I'll describe the various kinds of sections.

Every section has a distinct name, meant to convey the section's purpose. For example, .rdata denotes a read-only data section. Section names mean nothing at all to the operating system; they exist only for human comprehension. Naming a section FOOBAR is just as legal as naming it .text. Microsoft habitually prefixes section names with a period, but that's a preference, not a hard rule. Borland's linker used CODE and DATA.

Besides the standard sections the compiler generates (FIXME: link to COFF), you can also create and name your own sections. In Visual C++ you can use a #pragma directive to tell the compiler to put data or code into a named section, like this:

  #pragma data_seg("MY_DATA")

This way, the data will go into the MY_DATA section rather than the usual default .data section. Most programs work fine with only the default sections the compiler produces, but occasionally you'll have a need for a special section like this.

A PE file's sections aren't produced wholly by the linker; sections begin life in the output of the compiler (more precisely, the assembler): the object files (OBJ). The linker's job is to merge the corresponding sections of all the OBJs and libraries (a .LIB is really compiled object code too) into single final sections of the PE file. For example, each OBJ in your project has at least a .text section; the linker merges the .text sections of the various OBJ files into the PE file's single .text section, and likewise for .data. The code and data of static libraries (.lib) are also packed into the PE file, but those topics are beyond the scope of this article. (See here)

The linker follows many rules in deciding which sections to merge and how; I introduced the linker's algorithms in the July 1997 Under The Hood column. Some sections in an object file are for the linker's own use and aren't merged into the PE file; such sections generally pass information from the compiler to the linker (FIXME: what information, which sections?).

Sections have two alignment values (the FileAlignment and SectionAlignment fields of the optional header, IMAGE_OPTIONAL_HEADER), one for disk and one for memory. The PE file header specifies both values, and they may differ. Each section starts at an offset that is a multiple of the alignment value. For example, with a typical alignment of 0x200, every section's file offset must be a multiple of 0x200. (kemin: for why two alignment values, see here; for why align at all, see here.)

When a PE is mapped into memory, sections always start on page boundaries, page-aligned. The page size is 4K on x86-family CPUs and 8K on the IA-64. Below is the information PEDUMP prints for the .text and .data sections of the Windows XP KERNEL32.DLL:

  Section Table
  01 .text VirtSize: 00074658 VirtAddr: 00001000
  raw data offs: 00000400 raw data size: 00074800
  ...
  02 .data VirtSize: 000028CA VirtAddr: 00076000
  raw data offs: 00074C00 raw data size: 00002400

The .text section is at offset 0x400 in the PE file and sits 0x1000 above KERNEL32's load address once mapped into memory; likewise, the .data section is at file offset 0x74C00 and sits 0x76000 above the load address in memory.

It's possible to create sections whose offset within the file equals their offset relative to the load address in memory. Under 98/Me this speeds up the loading of large files. Visual Studio 6.0's default option /OPT:WIN98 produces files this way. Whether Visual Studio .NET uses /OPT:NOWIN98 depends on whether the file is small enough.

An interesting linker feature is the ability to merge sections. If two sections have similar, compatible attributes, they can be merged into one section at link time, using the /MERGE switch. The following merges .rdata and .text into the single section .text:

/MERGE:.rdata=.text

The advantage of merging sections is that it saves space, both on disk and in memory. Each section occupies at least one page of memory; if you can cut the number of sections in an executable from 4 to 3, you can quite possibly use one page less of memory. Of course, that depends on whether the free space of the two merged sections adds up to a page.

Merging arbitrary sections can get you into trouble, though the specification doesn't flatly forbid any combination. For instance, you may merge .rdata into .text, but you'd best not merge the .rsrc, .reloc, or .pdata sections into others. Before Visual Studio .NET you could merge .idata into other sections; Visual Studio .NET no longer allows it, yet when building a release version the linker itself often merges .idata into other sections, such as the read-only data section .rdata.

When the Windows loader loads a PE, part of the import information gets written into the PE image; you may well wonder how the loader can write this import information into a read-only section (kemin: what import information, and why write it into a read-only section? what enforces the read-only attribute?). The write is possible because at load time the system temporarily sets the pages holding the import data to read/write; once the import table has been initialized, the pages are set back to their original attributes.

Relative Virtual Addresses

In an executable file, many objects' in-memory addresses must be specified; for example, referencing a global variable requires its address. A PE file can be loaded at any position in a process's address space. Although a PE file has a preferred load address (KEMIN: preferred in what way?), you cannot rely on the PE actually loading there. Therefore, there must be a way to specify the addresses (of objects inside the PE) without depending on the PE's load address. To avoid hardcoding memory addresses into the PE file, the RVA was introduced. An RVA is simply a memory offset relative to the PE's load point. For example, if the PE loads at 0x400000 and an address in the code section is 0x401000, its RVA is

(target address) 0x401000 - (load address)0x400000 = (RVA)0x1000.

To turn an RVA into a real address, simply reverse the processing: add the RVA to the PE's load point to get the real address. Incidentally, viewed at the PE level, the real address here is also called a virtual address (because the operating system itself uses the concepts of virtual memory and virtual addresses); the virtual address is obtained by adding the RVA to the preferred load address. And don't forget the point I made earlier: the PE's load point is the HMODULE.
Want to explore an arbitrary DLL in memory? Get its load point with GetModuleHandle(LPCTSTR), then put your PE knowledge to work.

In an image file, a relative virtual address is always the address of an object after it is loaded into memory, minus the image file's base address.

The Data Directory

Many data structures within a PE file need to be found quickly. The obvious ones include the import table, export table, resources, and relocation data. All of these are located in a uniform way: through the DataDirectory.

The data directory is an array of 16 structures. Each element has a predefined meaning, and 16 macros (IMAGE_DIRECTORY_ENTRY_xxx) define the indexes into the data directory. The table below describes what each index entry points to: (FIXME: insert table

Value

IMAGE_DIRECTORY_ENTRY_EXPORT
IMAGE_DIRECTORY_ENTRY_IMPORT
IMAGE_DIRECTORY_ENTRY_RESOURCE
IMAGE_DIRECTORY_ENTRY_EXCEPTION
IMAGE_DIRECTORY_ENTRY_SECURITY
IMAGE_DIRECTORY_ENTRY_BASERELOC
IMAGE_DIRECTORY_ENTRY_DEBUG
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE
IMAGE_DIRECTORY_ENTRY_GLOBALPTR
IMAGE_DIRECTORY_ENTRY_TLS
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
IMAGE_DIRECTORY_ENTRY_IAT
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT
IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR

Detailed information about these pointed-to data structures will be included in the next part of this article.

Importing Functions

When you use code or data from another DLL, you are importing it. Whenever a PE file is loaded, one of the loader's jobs is to locate all the imported functions and data and make those addresses available to the loaded PE. The concrete details are discussed in the second part of this article; here I'll just give an outline.

When you use code or data from a DLL, you are implicitly linking against that DLL. You don't have to worry about whether the imported APIs' addresses are available; the loader does that work for you. The other way to import is explicit linking... FIXME

When you implicitly link against an API, code like LoadLibrary and GetProcAddress still runs; the loader simply does it for you. The loader also makes sure any additional DLLs the PE file needs are loaded. For example, when you link against KERNEL32.DLL, it in turn imports functions from NTDLL.DLL; or when you link against GDI32.DLL, which depends on functions from the USER32, ADVAPI32, NTDLL, and KERNEL32 DLLs, the loader guarantees those DLLs are loaded and all the imports correctly resolved.

With implicit linking, if anything goes wrong during resolution (such as a DLL file not being found), the process is terminated.

Visual C++ 6.0 added the DLL delay-load feature, a hybrid of implicit and explicit linking. When you delay-load a DLL, the linker emits something similar to the data for a DLL imported by the usual rules, but the operating system ignores that data; instead, the DLL is loaded at the first call to one of its functions (if it isn't already loaded), and GetProcAddress is then called to get the function's address... FIXME

Inside the PE file there is an array of data structures, one per imported DLL. Each structure gives the name of the imported DLL and a pointer to an array of function pointers; that array of function pointers is the import address table (IAT) (FIXME: give the structure in detail). Each imported API function has a reserved slot in the IAT, where the loader writes the real function address. One final, particularly important point: once the module is loaded, the IAT holds the addresses of the imported functions to be called.

Keeping all imported functions in one place, the IAT, makes a lot of sense: no matter how many times your code calls an imported function, every call goes through the one function pointer in the IAT.

Let's look at how an imported function is called. There are two cases to consider: the efficient one and the inefficient one. The best case looks like this:
CALL DWORD PTR [0x00405030]
