22C3 - 2.2

22nd Chaos Communication Congress
Private Investigations

Richard Johnson
Day 3
Room Saal 2
Start time 18:00
Duration 01:00
ID 926
Event type Lecture
Track Hacking
Language English

Disassembler Internals II: Automated Data Structure Recognition

Disassembler Internals II is an advanced look at the power of programmatic disassembly analysis. The talk will focus on data structure recognition for the purposes of reducing time spent reverse engineering protocols and proprietary file formats.

Disassembler Internals II is an advanced look at programmatic disassembly analysis with a focus on data structure recognition. The original Disassembler Internals presentation given at Toorcon 7 discussed the basic concepts required to build a high-level disassembler. These topics included binary format parsing, opcode disassemblers, and elementary disassembly analysis algorithms for identifying relationships within the code. These topics will be reintroduced to bring attendees up to speed, and Disassembler Internals II will take the audience to the next level with a discussion of techniques for programmatically recognizing data structures.

The ability to properly identify high-level data structures is crucial in the process of reverse engineering. General structure recognition is accomplished by tracking references to offsets within a known set of data. Depending on the complexity of the assembly code, a large percentage of fields can be identified immediately, reducing the amount of tedious manual labor required when reversing a protocol or file format. Given advanced disassembler tools with cross-referencing abilities, tracking variables and examining the transfer of pointers from one location to another to identify high-level objects is fully attainable through static binary analysis. Analyzing how the program interacts with supplied data makes it possible to determine memory allocation for structures, structure member data types, and potential flaws in structure-parsing code. This sort of analysis can be rapidly prototyped with IDA Pro and developed further as desired in custom reverse-engineering tools.
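To make the offset-tracking idea concrete, here is a minimal Python sketch. It is not the speaker's tool or algorithm. It assumes a disassembler has already decoded each memory operand into a hypothetical `Access(base, disp, size)` record, and that one register (here `esi`) is known to hold a pointer to the structure. The names `Access`, `recover_fields`, and `format_struct` are made up for illustration.

```python
from collections import namedtuple

# Hypothetical, pre-decoded memory operand of the form [base + disp],
# read or written with `size` bytes. A real tool would obtain these
# from its disassembler; here they are supplied by hand.
Access = namedtuple("Access", "base disp size")

def recover_fields(accesses, base):
    """Collect distinct (offset, size) pairs referenced through `base`."""
    fields = {}
    for a in accesses:
        if a.base != base:
            continue  # access goes through some other register
        # Keep the widest access seen at each offset.
        fields[a.disp] = max(fields.get(a.disp, 0), a.size)
    return sorted(fields.items())

def format_struct(fields, name="recovered"):
    """Render the recovered layout as a C-like struct (sizes 1, 2, 4, 8 only)."""
    lines = [f"struct {name} {{"]
    for off, size in fields:
        lines.append(f"    uint{size * 8}_t field_{off:x};  /* offset 0x{off:x} */")
    lines.append("};")
    return "\n".join(lines)

# Assumed sample input: esi points at the structure, ebp is a stack frame.
accesses = [
    Access("esi", 0x0, 4),
    Access("esi", 0x4, 2),
    Access("ebp", -8, 4),
    Access("esi", 0x8, 4),
    Access("esi", 0x4, 2),
]
fields = recover_fields(accesses, "esi")
print(format_struct(fields))
```

For the sample input this yields three fields at offsets 0x0, 0x4, and 0x8. A real analysis would also have to follow the pointer as it is copied between registers and stack slots, which this sketch deliberately leaves out.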

Finally, the presentation will discuss the usefulness of the concepts when applied to automated vulnerability discovery. The category of vulnerability discovery tools known as "fuzzers" can benefit greatly from the ability to automatically determine the structure of the data being manipulated. Fuzzers can be used to rapidly uncover parsing errors in protocols and file formats. There are generally two approaches to software fuzzing: random manipulation of a valid dataset, or generation of test cases from pre-defined protocol templates. The latter approach is typically more effective, but requires substantial effort to construct a protocol template that is useful for the fuzzer. The combination of fuzzing technology and algorithms for automatic protocol template generation will lead to intelligent fuzzers that are more effective at finding vulnerabilities. The presentation will conclude with a demonstration and release of a standalone console disassembler/analyzer for PE and ELF binaries and an IDA plugin capable of identifying structures in code.
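The two fuzzing approaches described above can be contrasted in a few lines of Python. This is an illustrative sketch only. The message layout (a 4-byte magic, a 2-byte length, a 1-byte flags field), the `TEMPLATE` table, and the functions `mutate` and `generate` are all assumptions made for the example, not part of the released tools.

```python
import random
import struct

def mutate(sample: bytes, rng: random.Random, flips: int = 3) -> bytes:
    """Random approach: overwrite a few bytes of a valid, non-empty sample."""
    data = bytearray(sample)
    for _ in range(flips):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)

# Hypothetical protocol template: (field name, struct format, values to try).
# In the scenario the talk describes, such a table would be derived from
# automatically recovered structure layouts rather than written by hand.
TEMPLATE = [
    ("magic", "<I", [0x46554C4C]),             # keep valid so parsing proceeds
    ("length", "<H", [0, 1, 0x7FFF, 0xFFFF]),  # boundary values
    ("flags", "<B", [0, 0xFF]),
]

def generate(template, rng: random.Random) -> bytes:
    """Template approach: build a message field by field from chosen values."""
    return b"".join(
        struct.pack(fmt, rng.choice(values)) for _, fmt, values in template
    )

rng = random.Random(0)  # fixed seed so runs are repeatable
valid = struct.pack("<IHB", 0x46554C4C, 4, 0)
mutated = mutate(valid, rng)
templated = generate(TEMPLATE, rng)
print(mutated.hex(), templated.hex())
```

The random mutator may corrupt the magic value and be rejected early by the parser, while the template-driven generator keeps the message well-formed and concentrates on boundary values in specific fields. That is one reason the template approach tends to reach deeper parsing code.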