Thursday, September 7, 2017

How to Get Into Reverse Engineering: Where to Start?

One of the biggest hurdles I experienced when getting into reverse engineering was finding an entry point into this seemingly arcane realm of the computer world. It is not an easy subject by any means, nor do I claim to be an expert in any way, but hopefully by writing this blog post I can ease the process of learning about RE.

This is meant to be a high-level guide on how to build a solid foundation for getting into RE. My goal with this post is to provide direction rather than technique, so I will not go into great detail on each subject. I am going to be focusing on x86 and x86_64 on Windows and Linux (mostly Linux), as this is where my experience has come from. I will post a "Further Reading" section at the bottom of this post as well. I want to pack a ton of resources into this, and I'm sure I will continually update it as I find new stuff to add.

First off, you need a strong fundamental knowledge of how computers work, on both the hardware and software level. Make sure you understand how hardware and software work together and how an operating system works. Here are some books/resources that are helpful:

Code: The Hidden Language of Computer Hardware and Software by Charles Petzold

Windows Internals Pt. 1 7th ed.

How Linux Works by Brian Ward

http://www.tldp.org/LDP/tlk/tlk-toc.html

Google

That last one is the most important in the list. Do research! If you don't know something, Google it! Or use Bing if you are a serial killer. But seriously, the best tool in RE is the ability to read. If you ever wonder how something works, look it up. Chances are someone else has had the same question you do, and they may have answered it. If no one else has answered it, maybe you can be the one to provide the answer to everyone else! That's the beauty of the internet.

Learn to program! This is VERY important. I'm not saying "Become a Level 20 C++ wizard!" or anything to that extent, but it is absolutely necessary to understand how programs function. There are more than enough resources out there to learn whatever programming languages you want, but there are a few languages that I highly recommend becoming very familiar with:

  • x86 and x86_64 Assembly (110% necessary)
  • C++
  • Python 

These are by no means the only languages you should become familiar with, but in my experience they are prevalent enough to be considered necessary. Also, I am personally a terrible programmer. I mostly just write specialized programs for myself when I need to. The idea of developing a large-scale program actually sounds pretty terrible to me. But what I do have is an understanding of how a program is written, compiled and eventually run (which is what counts, right? ;) ). Here are some programming resources:

https://www.nostarch.com/greatcode.htm <== 2 part series

http://www.learn-c.org/

http://opensecuritytraining.info/IntroX86.html

https://www.learnpython.org/

http://www.learncpp.com/


Learn how executables and binaries work! Learn about ELF binaries, PEs and DLLs! Learn about what the OS does when they run, and what happens in memory at runtime. How can you reverse engineer something if you don't know how it actually works?

Fantastic free course on how binaries work:
http://opensecuritytraining.info/LifeOfBinaries.html

Linux:
http://www.skyfree.org/linux/references/ELF_Format.pdf

https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/

https://lwn.net/Articles/631631/
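
To make the ELF side a bit more concrete, here is a minimal Python sketch that decodes the first few fields of an ELF header by hand. The function name parse_elf_header and the dictionary keys are my own invention for illustration, not part of any library, and the lookup tables only cover the handful of values relevant to x86 and x86_64. Treat it as a learning aid under those assumptions, not a complete parser.

```python
import struct

# Only the values relevant to this post; the real tables are much longer.
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 62: "x86_64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the leading fields of an ELF header (hypothetical helper)."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")

    ei_class = data[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = data[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or data encoding")

    endian = "<" if ei_data == 1 else ">"
    entry_fmt = "I" if ei_class == 1 else "Q"  # e_entry is 4 or 8 bytes wide

    # e_type, e_machine and e_version sit right after the 16-byte e_ident,
    # followed immediately by e_entry.
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(
        endian + "HHI" + entry_fmt, data, 16
    )

    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }
```

To try it, read the first 64 bytes of any binary on your system in "rb" mode, pass them to parse_elf_header, and compare what you get with the output of readelf -h on the same file.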

Windows:
https://msdn.microsoft.com/en-us/library/ms809762.aspx

https://support.microsoft.com/en-us/help/815065/what-is-a-dll

https://en.wikibooks.org/wiki/X86_Disassembly/Windows_Executable_Files
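
The same kind of exercise works for PE files. This is a rough Python sketch that follows the DOS header to the PE signature and reads the COFF file header that comes after it. The name parse_pe_header and the returned keys are hypothetical, and I only map the two machine values this post cares about. It also assumes the file is a well-formed PE and does not touch the optional header or the section table.

```python
import struct

# Only the values relevant to this post.
PE_MACHINES = {0x014C: "x86", 0x8664: "x86_64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag set on DLLs


def parse_pe_header(data: bytes) -> dict:
    """Decode the COFF file header of a PE image (hypothetical helper)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")

    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, num_sections, _timestamp, _sym_ptr, _num_syms,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4
    )

    return {
        "machine": PE_MACHINES.get(machine, hex(machine)),
        "sections": num_sections,
        "optional_header_size": opt_header_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

Run it against the raw bytes of an EXE and a DLL and see which fields differ. Then open the same files in a PE viewer and check that the numbers line up.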

These next resources are geared towards the memory management and processing side of things:

What Makes it Page? by Enrico Martignetti

Windows Internals Pt. 1 7th ed. <== Again!

http://www.tldp.org/LDP/tlk/mm/memory.html
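
If you want to see what a process actually looks like in memory at runtime, Linux exposes the layout as text in /proc/<pid>/maps. Below is a small Python sketch that splits one line of that file into its parts. The function name parse_maps_line and the dictionary keys are made up for this example, and it assumes the usual "address perms offset dev inode pathname" layout of each line.

```python
def parse_maps_line(line: str) -> dict:
    """Split one line of /proc/<pid>/maps into fields (hypothetical helper)."""
    # The pathname is optional and may contain spaces, so split at most 5 times.
    fields = line.split(maxsplit=5)
    if len(fields) < 5:
        raise ValueError("unexpected maps line: %r" % line)

    start_text, end_text = fields[0].split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    perms = fields[1]  # e.g. "r-xp"

    return {
        "start": start,
        "end": end,
        "size": end - start,
        "readable": perms[0] == "r",
        "writable": perms[1] == "w",
        "executable": perms[2] == "x",
        "private": perms[3] == "p",
        "offset": int(fields[2], 16),
        "path": fields[5].strip() if len(fields) > 5 else "",
    }
```

On a Linux box, open /proc/self/maps, feed each line through parse_maps_line, and look for the executable mappings. Those regions are where the code from the binary and its shared libraries ended up.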


So far I have only covered the informational aspect of things and not practical application. This is the area that most people get stuck at when trying to get into RE. I got stuck here big time. Not only is this the hardest part to learn, it is also the hardest part to teach. There is no single right way to reverse engineer. It is a topic that is too vast and there are far too many variables involved to create an effective "all encompassing" book or course. There are tips and tricks people can teach you along the way, but the majority of the heavy lifting will have to be done by you. This is where critical thinking, logical analytic skills and abstract thought come in. The ability to really visualize what is going on and being able to think in an abstract manner is invaluable. Even though the world of computers is very logical, it seems incredibly abstract in contrast to the way our brains are wired. This book will be a huge help and it will teach you C++ as you go:

https://www.nostarch.com/thinklikeaprogrammer

Now that we have the knowledge and hopefully some magic new brain skills, it brings us to our next topic: Tools! There are a plethora of RE tools out there, almost an overwhelming amount to be honest. Which ones should you use? Well, it all depends on what you're doing and what you prefer.

My preferred disassembler and debugger when I am working in Windows are IDA Pro:

https://www.hex-rays.com/products/ida/

and WinDBG:

https://developer.microsoft.com/en-us/windows/hardware/download-windbg

IDA Pro is an incredibly advanced disassembler. I highly recommend becoming familiar with it. There is an amazing book you can read to do so:

https://www.nostarch.com/idapro2.htm

Not only is it a great book on IDA, it is a great RE book in general. Here is a pretty good intro using the demo version:

http://resources.infosecinstitute.com/basics-of-ida-pro-2/#gref

My preferred disassembler and debugger while working in Linux are radare2:

http://rada.re/r/

and GDB:

https://www.gnu.org/software/gdb/

Here is a quick intro to GDB:

https://www.cs.cmu.edu/~gilpin/tutorial/

And a full book on radare2:

https://www.gitbook.com/book/radare/radare2book/details

To Be Continued...

Well, I don't know what else to say, so this will be it for now. As I stated earlier, I will continue to add material as I think of it.

Further reading etc:

Practical Malware Analysis

Practical Reverse Engineering

https://github.com/michalmalik/linux-re-101

https://exploit-exercises.com/

The Art of Memory Forensics

https://www.alchemistowl.org/pocorgtfo/

https://github.com/rshipp/awesome-malware-analysis
