PRACTICAL BINARY ANALYSIS
Build Your Own Linux Tools for Binary Instrumentation, Analysis, and
Disassembly
by Dennis Andriesse
San Francisco
PRACTICAL BINARY ANALYSIS. Copyright © 2019 by Dennis Andriesse.
All rights reserved. No part of this work may be reproduced or transmitted in any form
or by any means, electronic or mechanical, including photocopying, recording, or by
any information storage or retrieval system, without the prior written permission of the
copyright owner and the publisher.
ISBN-10: 1-59327-912-4
ISBN-13: 978-1-59327-912-7
Publisher: William Pollock
Production Editor: Riley Hoffman
Cover Illustration: Rick Reese
Interior Design: Octopod Studios
Developmental Editor: Annie Choi
Technical Reviewers: Thorsten Holz and Tim Vidas
Copyeditor: Kim Wimpsett
Compositor: Riley Hoffman
Proofreader: Paula L. Fleming
For information on distribution, translations, or bulk sales, please contact No Starch
Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900; info@nostarch.com
www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Names: Andriesse, Dennis, author.
Title: Practical binary analysis : build your own Linux tools for binary
instrumentation, analysis, and disassembly / Dennis Andriesse.
Description: San Francisco : No Starch Press, Inc., [2019] | Includes index.
Identifiers: LCCN 2018040696 (print) | LCCN 2018041700 (ebook) | ISBN
9781593279134 (epub) | ISBN 1593279132 (epub) | ISBN 9781593279127 (print) |
ISBN 1593279124 (print)
Subjects: LCSH: Disassemblers (Computer programs) | Binary system
(Mathematics) | Assembly languages (Electronic computers) | Linux.
Classification: LCC QA76.76.D57 (ebook) | LCC QA76.76.D57 A53 2019 (print) |
DDC 005.4/5--dc23
LC record available at https://lccn.loc.gov/2018040696
No Starch Press and the No Starch Press logo are registered trademarks of No Starch
Press, Inc. Other product and company names mentioned herein may be the
trademarks of their respective owners. Rather than use a trademark symbol with every
occurrence of a trademarked name, we are using the names only in an editorial fashion
and to the benefit of the trademark owner, with no intention of infringement of the
trademark.
The information in this book is distributed on an “As Is” basis, without warranty. While
every precaution has been taken in the preparation of this work, neither the author nor
No Starch Press, Inc. shall have any liability to any person or entity with respect to any
loss or damage caused or alleged to be caused directly or indirectly by the information
contained in it.
INTRODUCTION
The vast majority of computer programs are written in high-level languages like C or
C++, which computers can’t run directly. Before you can use these programs, you must
first compile them into binary executables containing machine code that the computer
can run. But how do you know that the compiled program has the same semantics as
the high-level source? The unnerving answer is that you don’t!
There’s a big semantic gap between high-level languages and binary machine code that
not many people know how to bridge. Even most programmers have limited knowledge
of how their programs really work at the lowest level, and they simply trust that the
compiled program is true to their intentions. As a result, many compiler bugs, subtle
implementation errors, binary-level backdoors, and malicious parasites can go
unnoticed.
To make matters worse, there are countless binary programs and libraries—in industry,
at banks, in embedded systems—for which the source code is long lost or proprietary.
That means it’s impossible to patch those programs and libraries or assess their
security at the source level using conventional methods. This is a real problem even for
major software companies, as evidenced by Microsoft’s recent release of a painstakingly
handcrafted binary patch for a buffer overflow in its Equation Editor program, which is
part of the Microsoft Office suite. [1]
In this book, you’ll learn how to analyze and even modify programs at the binary level.
Whether you’re a hacker, a security researcher, a malware analyst, a programmer, or
simply interested, these techniques will give you more control over and insight into the
binary programs you create and use every day.
WHAT IS BINARY ANALYSIS, AND WHY DO YOU NEED IT?
Binary analysis is the science and art of analyzing the properties of binary computer
programs, called binaries, and the machine code and data they contain. Briefly put, the
goal of all binary analysis is to figure out (and possibly modify) the true properties of
binary programs—in other words, what they really do as opposed to what we think they
should do.
Many people associate binary analysis with reverse engineering and disassembly, and
they’re at least partially correct. Disassembly is an important first step in many forms of
binary analysis, and reverse engineering is a common application of binary analysis and
is often the only way to document the behavior of proprietary software or malware.
However, the field of binary analysis encompasses much more than this.
Broadly speaking, you can divide binary analysis techniques into two classes, or a
combination of these:
Static analysis. Static analysis techniques reason about a binary without running it.
This approach has several advantages: you can potentially analyze the whole binary in
one go, and you don’t need a CPU that can run the binary. For instance, you can
statically analyze an ARM binary on an x86 machine. The downside is that static
analysis has no knowledge of the binary’s runtime state, which can make the analysis
very challenging.
Dynamic analysis. In contrast, dynamic analysis runs the binary and analyzes it as it
executes. This approach is often simpler than static analysis because you have full
knowledge of the entire runtime state, including the values of variables and the
outcomes of conditional branches. However, you see only the executed code, so the
analysis may miss interesting parts of the program.
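To make the coverage trade-off concrete, here is a minimal sketch using a toy "program" with an invented instruction set (the opcode names `cmp_input`, `jump_if_set`, `work`, and `halt` are assumptions made up for this illustration, not real machine code). The static view lists every instruction without running anything, while the dynamic view records only what a single run executes.

```python
# Toy "program": each entry is (address, mnemonic, operand).
# The instruction set is hypothetical and exists only for this illustration.
PROGRAM = [
    (0, "cmp_input", 0),     # set a flag if the input equals 0
    (1, "jump_if_set", 4),   # branch to address 4 when the flag is set
    (2, "work", "nonzero path"),
    (3, "halt", None),
    (4, "work", "zero path"),
    (5, "halt", None),
]


def static_addresses(program):
    """Static view: enumerate every instruction without executing any."""
    return {addr for addr, _, _ in program}


def dynamic_trace(program, value):
    """Dynamic view: run the program on one input and record what executes."""
    code = {addr: (op, arg) for addr, op, arg in program}
    executed, pc, flag = [], 0, False
    while True:
        op, arg = code[pc]
        executed.append(pc)
        if op == "cmp_input":
            flag = (value == arg)
        elif op == "jump_if_set" and flag:
            pc = arg          # branch taken: continue at the target address
            continue
        elif op == "halt":
            return executed
        pc += 1               # fall through to the next instruction


all_code = static_addresses(PROGRAM)
one_run = dynamic_trace(PROGRAM, 7)
missed = sorted(all_code - set(one_run))  # code this particular run never reached
```

With the input 7, the run never reaches addresses 4 and 5, so a purely dynamic analysis of that run knows nothing about the "zero path." The static view sees all six instructions but cannot tell which branch a given input would take.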
Both static and dynamic analyses have their advantages and disadvantages, and you’ll
learn techniques from both schools of thought in this book. In addition to passive
binary analysis, you’ll also learn binary instrumentation techniques that you can use to
modify binary programs without needing source. Binary instrumentation relies on
analysis techniques like disassembly, and at the same time it can be used to aid binary
analysis. Because of this symbiotic relationship between binary analysis and
instrumentation techniques, this book covers both.
I already mentioned that you can use binary analysis to document or pentest programs
for which you don’t have source. But even if source is available, binary analysis can be
useful to find subtle bugs that manifest themselves more clearly at the binary level than
at the source level. Many binary analysis techniques are also useful for advanced
debugging. This book covers binary analysis techniques that you can use in all these
scenarios and more.
WHAT MAKES BINARY ANALYSIS CHALLENGING?
Binary analysis is challenging and much more difficult than equivalent analysis at the
source code level. In fact, many binary analysis tasks are fundamentally undecidable,
meaning that it’s impossible to build an analysis engine for these problems that always
returns a correct result! To give you an idea of the challenges to expect, here is a list of
some of the things that make binary analysis difficult. Unfortunately, the list is far from
exhaustive.
No symbolic information. When we write source code in a high-level language like C
or C++, we give meaningful names to constructs such as variables, functions, and
classes. We call these names symbolic information, or symbols for short. Good naming
conventions make the source code much easier to understand, but they have no real
relevance at the binary level. As a result, binaries are often stripped of symbols, making
it much harder to understand the code.
No type information. Another feature of high-level programs is that they revolve
around variables with well-defined types, such as int, float, or string, as well as more
complex data structures like struct types. In contrast, at the binary level, types are never
explicitly stated, making the purpose and structure of data hard to infer.
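As a small illustration of why missing type information hurts, the sketch below reads the same four bytes under several different type assumptions using Python's standard `struct` module. The byte values are chosen arbitrarily for the example; nothing in the bytes themselves says which reading is the intended one.

```python
import struct

# Four raw bytes as they might appear in a binary's data section.
raw = bytes([0x00, 0x00, 0x80, 0x3F])

# The same bytes under different (equally plausible) type assumptions,
# all read as little-endian.
as_uint32 = struct.unpack("<I", raw)[0]   # one 32-bit unsigned integer
as_float = struct.unpack("<f", raw)[0]    # one 32-bit IEEE 754 float
as_uint16s = struct.unpack("<HH", raw)    # two 16-bit unsigned integers
as_bytes = list(raw)                      # four separate 8-bit values

print(as_uint32, as_float, as_uint16s, as_bytes)
```

Read as an integer the value is 1065353216, and read as a float it is exactly 1.0. An analysis has to infer which interpretation is right from how the surrounding code uses the data.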
No high-level abstractions. Modern programs are compartmentalized into classes
and functions, but compilers throw away these high-level constructs. That means
binaries appear as huge blobs of code and data, rather than well-structured programs,
and restoring the high-level structure is complex and error-prone.
Mixed code and data. Binaries can (and do) contain data fragments mixed in with the
executable code. This makes it easy to accidentally interpret data as code, or vice
versa, leading to incorrect results. [2]
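The following sketch shows how inline data can throw off a disassembler. It uses a tiny instruction set invented for this example (the opcodes and the names `HALT`, `PUSH`, `ADD`, and `JMP` are assumptions, not x86). One decoder walks the bytes front to back, and the other follows the jump instead.

```python
# Hypothetical toy instruction set: opcode -> (name, size in bytes).
ISA = {0x00: ("HALT", 1), 0x01: ("PUSH", 2), 0x02: ("ADD", 1), 0x03: ("JMP", 2)}


def decode(blob, off):
    """Decode one instruction at off; unknown bytes become a 1-byte '(bad)'."""
    op = blob[off]
    if op not in ISA or off + ISA[op][1] > len(blob):
        return "(bad)", 1, None
    name, size = ISA[op]
    operand = blob[off + 1] if size == 2 else None
    return name, size, operand


def linear_sweep(blob):
    """Decode every byte in order, assuming everything is code."""
    out, off = [], 0
    while off < len(blob):
        name, size, operand = decode(blob, off)
        out.append((off, name, operand))
        off += size
    return out


def follow_control_flow(blob):
    """Decode only what execution would reach, following JMP targets."""
    out, off, seen = [], 0, set()
    while off < len(blob) and off not in seen:
        seen.add(off)
        name, size, operand = decode(blob, off)
        out.append((off, name, operand))
        if name in ("HALT", "(bad)"):
            break
        off = operand if name == "JMP" else off + size
    return out


# JMP 3 skips over one inline data byte (0x01) stored at offset 2.
BLOB = bytes([0x03, 0x03, 0x01, 0x01, 0x05, 0x02, 0x00])

swept = linear_sweep(BLOB)
followed = follow_control_flow(BLOB)
```

The linear pass treats the data byte at offset 2 as a `PUSH` opcode, which swallows the real `PUSH` at offset 3 and leaves a bogus instruction at offset 4. Following the jump recovers the intended `PUSH 5`, `ADD`, `HALT` sequence, though in real binaries jump targets are not always this easy to resolve.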
Location-dependent code and data. Because binaries are not designed to be
modified, even adding a single machine instruction can cause problems as it shifts
other code around, invalidating memory addresses and references from elsewhere in
the code. As a result, any kind of code or data modification is extremely challenging and
prone to breaking the binary.
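To see why shifting code breaks references, consider this minimal sketch. It again uses invented opcodes (the `JMP`, `NOP`, `PUSH`, and `HALT` values are hypothetical), and the helpers `insert_bytes`, `jump_lands_on`, and `relocate_jump` are illustrative names, not part of any real tool. The blob holds one jump with an absolute target, and a single byte is inserted in front of that target.

```python
# Hypothetical opcodes for a toy instruction set (not x86).
JMP, NOP, PUSH, HALT = 0x03, 0x04, 0x01, 0x00

# Offset 0: JMP 3   (absolute target)
# Offset 2: 0xAA    (inline data byte, skipped by the jump)
# Offset 3: PUSH 5
# Offset 5: HALT
original = bytes([JMP, 0x03, 0xAA, PUSH, 0x05, HALT])


def insert_bytes(blob, offset, new):
    """Naively insert bytes, shifting everything after offset."""
    return blob[:offset] + new + blob[offset:]


def jump_lands_on(blob):
    """Return the byte found at the target of the JMP at offset 0."""
    return blob[blob[1]]


def relocate_jump(blob, insert_offset, delta):
    """Fix the one known JMP operand if its target was shifted."""
    target = blob[1]
    if target >= insert_offset:
        target += delta
    return bytes([blob[0], target]) + blob[2:]


patched = insert_bytes(original, 2, bytes([NOP]))   # add a single instruction
repaired = relocate_jump(patched, 2, 1)             # update the stale target

before = jump_lands_on(original)   # PUSH opcode, as intended
broken = jump_lands_on(patched)    # the data byte 0xAA
after = jump_lands_on(repaired)    # PUSH opcode again
```

Here the repair is trivial because the only reference is known in advance. In a real binary, the hard part is finding every address and reference that the shift invalidated, which is exactly what the lack of symbols and the mixing of code and data make unreliable.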
As a result of these challenges, we often have to live with imprecise analysis results in
practice. An important part of binary analysis is coming up with creative ways to build
usable tools despite analysis errors!
WHO SHOULD READ THIS BOOK?
This book’s target audience includes security engineers, academic security researchers,
hackers and pentesters, reverse engineers, malware analysts, and computer science
students interested in binary analysis. But really, I’ve tried to make this book accessible
for anyone interested in binary analysis.
That said, because this book covers advanced topics, some prior knowledge of
programming and computer systems is required. To get the most out of this book, you
should have the following:
• A reasonable level of comfort programming in C and C++.
• A basic working knowledge of operating system internals (what a process is, what
virtual memory is, and so on).
• Knowledge of how to use a Linux shell (preferably bash).
• A working knowledge of x86/x86-64 assembly. If you don’t know any assembly yet,
make sure to read Appendix A first!
If you’ve never programmed before or you don’t like delving into the low-level details of
computer systems, this book is probably not for you.
WHAT’S IN THIS BOOK?
The primary goal of this book is to make you a well-rounded binary analyst who’s
familiar with all the major topics in the field, including both basic topics and advanced
topics like binary instrumentation, taint analysis, and symbolic execution. This book
does not presume to be a comprehensive resource, as the binary analysis field and tools
change so quickly that a comprehensive book would likely be outdated within a year.
Instead, the goal is to make you knowledgeable enough on all important topics so that
you’re well prepared to learn more independently.
Similarly, this book doesn’t dive into all the intricacies of reverse engineering x86 and
x86-64 code (though Appendix A covers the basics) or analyzing malware on those
platforms. There are many dedicated books on those subjects already, and it makes no
sense to duplicate their contents here. For a list of books dedicated to manual reverse
engineering and malware analysis, refer to Appendix D.
This book is divided into four parts.
Part I: Binary Formats introduces you to binary formats, which are crucial to
understanding the rest of this book. If you’re already familiar with the ELF and PE