The Infomaniac's Classroom

Welcome! C'mon in!
 

 

 TUTORIAL: Using a Hex Editor to Analyze Binary Files

legomoe
Roamer

Bricks Placed : 6
Join date : 2011-09-28
Location : Space

PostSubject: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Thu Jul 19, 2012 2:37 am

-----Segendony wanted me to post a topic on this subject, so I've taken the liberty of writing a small tutorial on how to analyze binary files using a program called a 'hex editor'. Analyzing the binary files used by a video game (like LEGO Island 2) can help when writing modding tools: once you understand how the game's binary file formats are structured, you (or someone you share the information with) can write a program to convert that type of file to a more usable format (for example, converting LEGO Island 2's binary '.msh' format into the widely used, text-based '.obj' format).

-----To start, you're going to need a hex editor. I personally use Hexplorer for Windows, but you can use any hex editor you want. (Note: if you use Hexplorer, on some computers the default font is too small to read. To fix this, launch Hexplorer, then in the menu bar, under "View", select "Options...". This opens a window that lets you pick the font used by the editor and change its text color scheme.)

-----You can analyze any binary file you want, but for this tutorial I'll be using a file from LEGO Island 2: 'Pizza.col', from the 'col' directory inside the z02LGI.bod/z02LGI.bob archive. If you want to get this file for yourself, you'll need to download my LI2Explorer tool (http://www.mediafire...34oy5quwy5qtsc4) and use it to open the z02LGI.bod/z02LGI.bob archive located in the '_data' directory of the LEGO Island 2 folder (the path should be "C:\Program Files\LEGO Media\LEGO Island 2\_data").
-----Once you open the z02LGI.bod/z02LGI.bob archive with LI2Explorer, select 'Export' under 'Edit' in LI2Explorer's menu bar and choose a location on your hard drive (I like to use my Desktop for things like this). After a short pause, this should create a folder named 'z02LGI' at the location you specified. Inside the new 'z02LGI' folder should be another folder called 'col', and the 'Pizza.col' file should be inside it.

-----Open the 'Pizza.col' file with Hexplorer and you should see something like this:

[Image: 'Pizza.col' opened in Hexplorer]

On the left-hand side is the hexadecimal representation of the file's contents, and on the right is exactly the same data represented as ASCII characters. You're going to need both of these representations to analyze the file.
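If you'd like to see that two-column layout outside of a hex editor, here is a minimal Python sketch that prints bytes as an offset column, a hex column, and an ASCII column. It's only an illustration of the idea, not how Hexplorer works internally; the name `hex_dump` is made up, and the sample bytes are the first eight bytes of 'Pizza.col' (43 42 42 58 00 7E 42 BB).

```python
def hex_dump(data: bytes, width: int = 16) -> list:
    """Format bytes like a hex editor: offset, hex bytes, then ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Bytes outside the printable ASCII range are shown as '.', as most hex editors do.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return lines


if __name__ == "__main__":
    sample = bytes([0x43, 0x42, 0x42, 0x58, 0x00, 0x7E, 0x42, 0xBB])
    for line in hex_dump(sample):
        print(line)
```

To dump a real file, pass it the file's contents, e.g. `hex_dump(open("Pizza.col", "rb").read())`.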
-----Now, the first thing to keep in mind when trying to analyze a binary file is what kind of data it is likely to contain. Is it an image file of some sort? Is it an audio file? Does it hold data for a 3D model? Or maybe it's an archive used to hold other files and folders? Once you know the type of data the file holds, you can start to figure out exactly how it's organized, and how to write a program to extract the data from it.
-----For our example, we can guess that the '.col' files in the 'col' directory contain collision data for the game's 3D models, because 'col' is probably short for 'collision', and because almost all of the '.msh' files in the meshes directory that would need collision data have a corresponding '.col' file in the 'col' directory.
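That filename reasoning can also be checked with a short script. Below is a minimal Python sketch, assuming the '.msh' and '.col' files have been exported into two separate folders; the function name `match_by_stem` and the folder paths in the usage note are hypothetical, since the meshes may come from a different archive than 'z02LGI'.

```python
from pathlib import Path


def match_by_stem(mesh_dir: Path, col_dir: Path) -> tuple:
    """Compare file names (without extensions) in two folders.

    Returns (names with both a .msh and a .col file, .msh names with no .col file).
    Names are lower-cased so that 'Pizza.msh' still matches 'pizza.col'.
    """
    mesh_names = {p.stem.lower() for p in mesh_dir.glob("*.msh")}
    col_names = {p.stem.lower() for p in col_dir.glob("*.col")}
    return mesh_names & col_names, mesh_names - col_names
```

For example, `paired, unpaired = match_by_stem(Path("z02LGI/meshes"), Path("z02LGI/col"))` (adjust the paths to wherever your files were exported). A large `paired` set supports the collision guess; it doesn't prove it.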

-----Now, before you go any further, it's important to understand basic data types and how they are represented in a binary file. While a binary file is really just a very long string of 1s and 0s (bits), for practical purposes it is treated as a string of 8-bit bytes. Again, notice that on the left side of the Hexplorer window you can see the individual bytes that make up the file (43 42 42 58 00 7E 42 BB ... etc.), represented in hexadecimal (base 16) notation, and on the right, the exact same bytes are represented as ASCII characters. These bytes, alone or in groups, can be used to represent all sorts of data, from characters to numbers to strings of text. Here's a basic overview of the common data types stored in binary files:

Code:
Name:            Bytes:  Unsigned (0 and +):   Signed (- and +):
Character        1       0 to 255              -128 to 127
Short or Word    2       0 to 65535            -32768 to 32767
Integer          4       0 to 4294967295       -2147483648 to 2147483647
Floating Point   4       n/a                   about ±3.4 x 10^38, ~7 significant digits (IEEE 754 single precision, the usual format)
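If you want to experiment with these types, Python's standard `struct` module has a format code for each of them. This small sketch (the `TYPES` table and the two range helpers are just illustrations) checks the sizes and computes the two's-complement ranges listed above.

```python
import struct

# struct format codes: '<' means little-endian with standard sizes;
# lowercase letters are signed types, uppercase letters are unsigned.
TYPES = {
    "Character": ("<b", "<B"),
    "Short or Word": ("<h", "<H"),
    "Integer": ("<i", "<I"),
}


def signed_range(size: int) -> tuple:
    """Range of a two's-complement signed integer that is `size` bytes long."""
    bits = 8 * size
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(size: int) -> tuple:
    """Range of an unsigned integer that is `size` bytes long."""
    return 0, 2 ** (8 * size) - 1


if __name__ == "__main__":
    for name, (signed_fmt, unsigned_fmt) in TYPES.items():
        size = struct.calcsize(signed_fmt)
        print(f"{name:<14} {size} byte(s)  unsigned {unsigned_range(size)}  signed {signed_range(size)}")
    print(f"Floating Point {struct.calcsize('<f')} byte(s)")
```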

-----Note: When you put bytes together to form larger data types, you need to think about what order the bytes are stored in (endianness). There are two ways to order them: little-endian, where the least significant byte is stored first, and big-endian, where the most significant byte is stored first. For example, take the bytes 00 and 10. Read as a little-endian short, they equal hexadecimal 1000 (decimal 4096): the 00 byte is stored first, so it is the least significant byte and goes after the 10 byte when you write the value out. Read as a big-endian short, the same two bytes equal hexadecimal 0010 (decimal 16): the 00 byte, being stored first, is the most significant byte, so it goes before the 10 byte.
-----Just so you know, Windows PCs (which run on x86 processors) are little-endian, and so are Intel-based Macs; older PowerPC-based Macs were big-endian. I also know that all the binary files used by LEGO Island 2 are meant to be read as little-endian.
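Here is the 00 10 example worked out in a few lines of Python. `read_u16` is a made-up helper that spells out the byte shuffling; Python's built-in `int.from_bytes(data, "little")` or `int.from_bytes(data, "big")` gives the same results.

```python
def read_u16(data: bytes, little_endian: bool) -> int:
    """Combine the first two bytes of `data` into an unsigned 16-bit value."""
    first, second = data[0], data[1]
    if little_endian:
        # The first byte is the least significant, so it goes on the right.
        return first | (second << 8)
    # Big-endian: the first byte is the most significant, so it goes on the left.
    return (first << 8) | second


pair = bytes([0x00, 0x10])
print(hex(read_u16(pair, True)), read_u16(pair, True))    # 0x1000 4096
print(hex(read_u16(pair, False)), read_u16(pair, False))  # 0x10 16
```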

-----Part of analyzing a binary file is figuring out what data types the bytes represent. For our example, let's take a look at the first four bytes: 43 42 42 58. These bytes could represent the single-byte decimal values 67, 66, 66, and 88, but, read as little-endian, they could also represent the two short values 16963 and 22594, or the single 4-byte integer value 1480737347.
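You can reproduce all three readings with Python's `struct` module. This is just a quick check of the numbers above, using little-endian since that's how LEGO Island 2 files are read.

```python
import struct

first_four = bytes.fromhex("43 42 42 58")  # the first four bytes of Pizza.col

print(list(first_four))                    # [67, 66, 66, 88]  (four 1-byte values)
print(struct.unpack("<2H", first_four))    # (16963, 22594)    (two little-endian shorts)
print(struct.unpack("<I", first_four)[0])  # 1480737347        (one little-endian integer)
```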
-----Now, if you look at the right-hand side of the Hexplorer window, you'll see that these four bytes form the ASCII characters C, B, B, and X. All of these are printable ASCII characters, which means that these first four bytes may represent a text string rather than a number. I know from experience that binary files often begin with a 4-character string that identifies what kind of file it is, so it's a pretty safe bet that the first four bytes are meant to be read as "CBBX".
-----On a side note, not all binary files begin with a 4-character file identifier, and many binary files are divided into sections that each begin with a 4-character identification string. So the fact that 'Pizza.col' begins with "CBBX" does not necessarily mean that the entire file is a "CBBX" file; it may simply mean that the file has no file-level identifier, and that "CBBX" just marks the first section in the file.
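The "are all four bytes printable?" test is easy to automate. A minimal sketch follows; `read_tag` is a hypothetical helper, and "printable" here means the ASCII range 0x20 to 0x7E.

```python
from typing import Optional


def read_tag(data: bytes, offset: int = 0) -> Optional[str]:
    """Return the 4 bytes at `offset` as text if all are printable ASCII, else None."""
    chunk = data[offset:offset + 4]
    if len(chunk) == 4 and all(0x20 <= b <= 0x7E for b in chunk):
        return chunk.decode("ascii")
    return None


start = bytes.fromhex("43 42 42 58 00 7E 42 BB")  # first eight bytes of Pizza.col
print(read_tag(start))     # CBBX
print(read_tag(start, 4))  # None (00 and BB aren't printable)
```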

-----The reason binary files, and sections within them, have these 4-byte ASCII strings is that a program reading a binary file needs some way to know for sure what kind of file it's reading, and where sections appear within the file. Remember, the 4-byte ASCII string at the beginning of a file or section is often followed by information that can help you analyze that file or section, such as the number of sections in the file or how large the section is.
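Since section markers look like text, you can scan a whole file for them. This is a rough sketch under two assumptions that happen to fit the tags in this tutorial: tags are 4 uppercase letters or digits, and they start at offsets that are multiples of 4. Every hit is only a candidate, because ordinary data can look like text by accident. `find_tags` is a made-up name.

```python
def find_tags(data: bytes, step: int = 4) -> list:
    """List (offset, text) for each 4-byte run of uppercase letters/digits
    found at offsets that are multiples of `step`."""
    hits = []
    for off in range(0, len(data) - 3, step):
        chunk = data[off:off + 4]
        if all(0x41 <= b <= 0x5A or 0x30 <= b <= 0x39 for b in chunk):
            hits.append((off, chunk.decode("ascii")))
    return hits
```

For example, `find_tags(open("Pizza.col", "rb").read())` would list every candidate tag with its offset, which you can then jump to in your hex editor.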

-----Now, if you examine the next 28 bytes of the file, you'll see that if you treat them as 4-byte integers, they all represent arbitrary, excessively large numbers, and if you look at the right side of the Hexplorer window, these 28 bytes don't form a printable ASCII string. So what do these bytes represent? I personally don't know, but I do know that if you just can't figure out what certain bytes represent, it may be better to move on to a different part of the file until you have some insight into them.
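One idea worth testing (a guess, not something established about this file): 4-byte values that look like huge, arbitrary integers are sometimes 32-bit floats, because a float's sign and exponent sit in its high byte, so even small float values read as large integers. This sketch shows each 4-byte word three ways so you can compare; `words` is a hypothetical helper.

```python
import struct


def words(data: bytes, offset: int, count: int) -> list:
    """Show `count` little-endian 4-byte words as (hex, unsigned int, float)."""
    out = []
    for i in range(count):
        raw = data[offset + 4 * i: offset + 4 * i + 4]
        (as_int,) = struct.unpack("<I", raw)
        (as_float,) = struct.unpack("<f", raw)
        out.append((raw.hex(" ").upper(), as_int, as_float))
    return out
```

For the 28 unknown bytes, that would be `words(data, 4, 7)`, where `data` holds the contents of 'Pizza.col'. If the float column shows sensible-looking numbers while the integer column doesn't, floats are a good lead.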

-----Take a look at the bytes just past the first 32 bytes (the 4-byte "CBBX" string plus those 28 unknown bytes). The first four are 43 4F 4C 50, and the next four are 40 00 00 00. Since the first four form the ASCII string "COLP", we can assume that this is the start of a "COLP" section. And since this is the start of a section, the next few bytes will often give us a clue about how big the section is (don't forget that a computer program needs to read this file, and it can't magically know how big a section is).
-----The four bytes after 43 4F 4C 50 may hold the key to the length of the "COLP" section. If you read 40 00 00 00 as a little-endian 4-byte integer, you get decimal 64. This probably doesn't mean that the "COLP" section is 64 bytes long, because if you look 64 bytes past 40 00 00 00, you won't find a 4-byte ASCII string marking the beginning of a new section. However, the 64 may mean that the "COLP" section is made up of 64 sub-sections.
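Reading a tag followed by a count is the kind of thing a modding tool does over and over. A minimal sketch, assuming the "4-byte tag, then little-endian 4-byte count" layout guessed above; `read_section_header` is a made-up name.

```python
import struct


def read_section_header(data: bytes, offset: int) -> tuple:
    """Read a 4-byte ASCII tag followed by a little-endian 32-bit count."""
    tag = data[offset:offset + 4].decode("ascii")
    (count,) = struct.unpack_from("<I", data, offset + 4)
    return tag, count
```

With the contents of 'Pizza.col' in `data`, `read_section_header(data, 32)` should give `("COLP", 64)` if the file matches the description above.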
-----This is where it's important to look for patterns in binary files: if you look at the next 88 bytes, you'll see that many elements repeat. These 88 bytes begin with 23 00 00 00 00 00 00 00 and end with four sets of CC CC CC CC. If we look at the 88 bytes after that, we see the same pattern, and if we look at the 88 bytes after those, it repeats again.
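Spotting a repeat by eye works, but you can also let the computer suggest a record size. This is a rough heuristic of my own, not a standard algorithm: for each candidate stride, measure how often a byte equals the byte `stride` positions later. Fixed-size records tend to score high at their true size, but multiples of that size score just as high, and long runs of 00 bytes score high at any stride, so treat the answer as a hint. `stride_score` is a made-up name.

```python
def stride_score(data: bytes, start: int, stride: int, span: int) -> float:
    """Fraction of bytes in data[start:start+span] that equal the byte
    `stride` positions later (1.0 means a perfect repeat)."""
    end = min(start + span, len(data) - stride)
    if end <= start:
        return 0.0
    same = sum(data[i] == data[i + stride] for i in range(start, end))
    return same / (end - start)
```

For example, `max(range(4, 257, 4), key=lambda s: stride_score(data, 40, s, 1024))` tries every multiple of 4 up to 256, starting right after the "COLP" tag and count. `max` keeps the first of several equal scores, so the smallest repeating size wins over its multiples.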
-----Now we need to do a little math. If the "COLP" section is supposedly made up of 64 sub-sections, and each sub-section is 88 bytes long, what's 64 times 88? The answer is 5632. In Hexplorer, click and hold at the beginning of the first sub-section (the bytes 23 00 00 00 right after the "COLP" string and the count bytes 40 00 00 00), then scroll down with the mouse wheel until you have exactly 5632 bytes selected (the number of selected bytes should be shown at the bottom of the window). The four bytes just past the end of the selection should be 47 52 44 50, which form the ASCII string "GRDP" and signify the start of a new section.
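The same arithmetic can be done in code instead of by dragging a selection. A minimal sketch, assuming the layout inferred in this tutorial (the "COLP" tag at offset 32, a 4-byte count, then 88-byte sub-sections); `next_section_offset` is a hypothetical helper. If the inference is right, "GRDP" should start at offset 5672 (hex 1628), which you can jump to in a hex editor to check.

```python
def next_section_offset(section_offset: int, count: int,
                        record_size: int = 88, header_size: int = 8) -> int:
    """Offset where the next section should start, assuming a section is a
    4-byte tag plus a 4-byte count (header_size = 8) followed by `count`
    fixed-size records. This layout is inferred, not documented."""
    return section_offset + header_size + count * record_size


colp_offset = 32  # "COLP" follows the 32 bytes that start with "CBBX"
end = next_section_offset(colp_offset, 64)
print(64 * 88, end, hex(end))  # 5632 5672 0x1628
```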

-----Congratulations! You now know that the second section of the 'Pizza.col' file begins with the 4-byte ASCII string "COLP", and that the next four bytes are a 4-byte integer giving the number of 88-byte sub-sections in the section. This is as far as I'm going to go in this tutorial, but if you feel like continuing on your own, you could try to figure out what sort of data is stored in each of the 88-byte sub-sections. Remember that 3D collision data is commonly stored as a simplified 3D mesh, so you might try looking for vertex positions in each sub-section, but I can't guarantee that that's what you'll find.
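If you want to pick up where the tutorial leaves off, a sketch like this one slices the "COLP" section into its 88-byte sub-sections and views each one as 22 little-endian 32-bit floats, which is one quick way to spot vertex-like coordinates. Both the layout and the float interpretation are assumptions to test, not known facts about the format, and `colp_records` and `as_floats` are made-up names.

```python
import struct


def colp_records(data: bytes, offset: int, record_size: int = 88) -> list:
    """Slice the COLP section at `offset` into fixed-size records (assumed layout)."""
    if data[offset:offset + 4] != b"COLP":
        raise ValueError(f"no COLP tag at offset {offset}")
    (count,) = struct.unpack_from("<I", data, offset + 4)
    start = offset + 8  # skip the 4-byte tag and the 4-byte count
    return [data[start + i * record_size: start + (i + 1) * record_size]
            for i in range(count)]


def as_floats(record: bytes) -> tuple:
    """View a record as little-endian 32-bit floats (88 bytes -> 22 values)."""
    return struct.unpack(f"<{len(record) // 4}f", record)
```

For example, `colp_records(open("Pizza.col", "rb").read(), 32)` should return 64 records if the file matches the description above. Groups of three floats with similar magnitudes across records would be a hint that you've found positions.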

-----These are the basics of how you analyze a binary file with a hex editor. It takes a lot of time, and you might not always be able to figure out every part of a file, but you can still learn plenty of useful information about how the file, and other files like it, may be structured.
-----If you know Assembly language, an easier way to analyze a binary file is to disassemble the program that loads said binary file and examine how the program goes about loading the file.

-----I hope you found this tutorial helpful. If you have any questions or comments, please feel free to post them in the topic here.

Legomoe
Xiron
Infomaniac

Bricks Placed : 129
Join date : 2011-04-10
Location : At a computer in the Infomaniac's Classroom.

PostSubject: Re: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Thu Jul 19, 2012 3:09 pm

Yaaay, you rewrote what you told me in a pm into something everyone can read. :D
Teawater
Roamer

Bricks Placed : 31
Join date : 2012-01-18

PostSubject: Re: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Fri Jul 27, 2012 2:10 pm

Quote :
-----Just so you know, most windows computers use Little-endian, while Macs use Big-endian. I also know that all the binary files used by LEGO Island 2 are meant to be read at Little-endian.
Interesting... GBA and DS are also Little-endian, it seems.


Quote :
-----If you know Assembly language, an easier way to analyze a binary file is to disassemble the program that loads said binary file and examine how the program goes about loading the file.
Even better, run it through a debugger so you can keep up with the values of the registers. Breakpoint Debugging is awesome, by the way.

I'm not as familiar with Windows Assembly as I am with GBA/DS assembly, so Debugging executable programs is something I'll have to do some testing with in the future. I'm hoping it'll help me with improving an editor that I'm making for the M&L series.
legomoe
Roamer

Bricks Placed : 6
Join date : 2011-09-28
Location : Space

PostSubject: Re: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Fri Jul 27, 2012 9:36 pm

Teawater wrote:
Even better, run it through a debugger so you can keep up with the values of the registers. Breakpoint Debugging is awesome, by the way.

I'm not as familiar with Windows Assembly as I am with GBA/DS assembly, so Debugging executable programs is something I'll have to do some testing with in the future. I'm hoping it'll help me with improving an editor that I'm making for the M&L series.

Actually, a program I found recently that could be useful for this purpose is Cheat Engine. Especially if you already have some experience with assembly language and binary hacking, Cheat Engine looks like it could be a very powerful tool.
Xiron
Infomaniac

Bricks Placed : 129
Join date : 2011-04-10
Location : At a computer in the Infomaniac's Classroom.

PostSubject: Re: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Sat Jul 28, 2012 1:04 am

legomoe wrote:
Actually, a program I found recently that could be useful for this purpose is Cheat Engine. Especially if you already have some experience with assembly language and binary hacking, Cheat Engine looks like it could be a very powerful tool.
I know how to use Cheat Engine for game hacking (as you can see from my LEGO Racers 2 videos on YouTube and my cheat/code file posted on RRU), though I don't know how to use it to study ASM in games.
Teawater
Roamer

Bricks Placed : 31
Join date : 2012-01-18

PostSubject: Re: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Sat Jul 28, 2012 11:40 pm

Ah, Cheat Engine. I remember looking at that program a couple of years ago, I think. (I have version 5.6, and I bet it's outdated by now.)

At that time, I don't think I understood their assembly code, but thanks to you, I might take another look soon enough.

Other than that, I've seen other interesting programs as well, but I doubt they compare to Cheat Engine.
Xiron
Infomaniac

Bricks Placed : 129
Join date : 2011-04-10
Location : At a computer in the Infomaniac's Classroom.

PostSubject: Re: TUTORIAL: Using a Hex Editor to Analyze Binary Files   Mon Jun 02, 2014 8:32 pm

Looking back at this, I honestly understood very little of what you were saying at the time. After getting some hex editing experience of my own several months back (mainly with LEGO Racers 2 WRL files), I now clearly understand everything you say! :D