Squishing Circles
I was admiring my Procedural Circles Generator the other day, and lamenting on the fact that I never ended up doing anything with it. It really is a clever trick for generating some vaguely circle-like shapes, and could still be useful for future genetic algorithm type projects. But this day, I wanted to do something with it. That’s when I thought to myself, “How would this circle look in binary”.
Expanded Circles
As it stands, the circles are already fairly minimal when it comes to the amount of data they use. Seeing as their original purpose was to be easily modified by a not-quite-genetic genetic algorithm, the circle really just consists of an origin point, and then a series of floats that signify one point’s distance from the origin. Everything else that makes the circle a circle is all in the draw step. All of this is kept in a JSON object with just a handful of properties. Even the maximum & minimum distances, and the maximum delta difference aren’t stored.
{
"origin": {
"x": 116,
"y": 143
},
"distances": [
38.26689843458476,
38.25581926094269,
41.03205144648891,
41.95031210993302,
// etc...
]
}
Squishing This Down
Looking at this from a file size perspective, though, there’s a lot of superfluous bytes being used. The computer doesn’t need to know that one integer is the origin.x
and the other is origin.y
. So, there was my first step in compressing this down. Given that the original project only uses an allowed range of 100 - 200 for the both of the origin points, we can store this in a single byte, and disregard the potential future limitations of this for now. Truth be told, the coordinates of the origin point don’t really need to be stored in object anyways, so this oversight really is a nonissue.
With those silly labels out of the way, we’re already down from an initial size of 26 bytes (whitespace removed) to a meager 2 bytes.
0x74 0x8f
Floating Down The Byte-Stream
The real challenge from this task came from the only other part of the circle, the array of floats. To recap, each float in this array represents the distance of a point in the circle from the center. This is then used along with some fancy math that I don’t fully understand to draw a point. That point is then connected by a straight line to the previous point. During the creation of the circle, each point has a minimum and maximum distance from the origin point, and it also has a maximum delta difference from the previous point.
Now, I’ve worked with raw binary in an assortment of data types and structures before, but floats in binary have always escaped me. And today, I’m proud to say I still don’t fully understand them! What I do understand, though, is that floats can either take up 4 bytes (float32) or 8 bytes (float64) depending on how precise you want them to be. Seeing as I originally wrote this script in JavaScript, it uses float64. I really don’t think it needs that much precision, though. Float32’s allow for 7 points of precision, and seeing as I’m trying to compress this data, that should be plenty! By choosing to store them as float32’s, we’ve essentially already compressed the structure down by 50%! Woohoo!
Building The Bin
Now that we know how to store the array of floats, the only thing left to do is put them all together. To accomplish this, I wrote a simple python script, circles.py, that performs the steps outlined above, as well as a couple others. I would’ve liked to have done this in JavaScript, but I don’t find it the most agreeable language for working with binary, and Perl can be a bit cryptic at times, so Python it is!
The ‘Ol John Hancock
Every good file needs a signature, right? For my file, I decided on 0x42
. I originally did this to be silly. 42 funny number, ha ha. After looking at it in a hex editor, though, I realized that’s ASCII B
, so I accidentally made my file signature my name (kinda)!
Circular Origins
As mention above, the next set of bytes in the file are the circle’s origins. These are stored as two 1-byte numbers.
How Many Points?
While not wholly necessary, the next byte stores the amount of points that make up the circle. If I wanted to, I could use this value for a checksum at the end of the file.
Finishing The File
Last but certainly not least, the file ends with the list of 4-byte floats.
The Final Product
With all that, we now have a compressed circle! Pictured below is a highlighted look at the finished binary. The image doesn’t really scale well, so you can click through to open the image in its full size.
But How’s It Perform?
Very well, thanks for asking! The original JSON file that represents the circle above comes in at 2.34 kilobytes, but the compressed file is an impressive 508 bytes. That’s almost an 80% compression rate! Of course, the compression rate will depend on the number of points in the circle, but even for the smallest “circle” with only one point we go from 60 bytes of JSON to 8 bytes when compressed. That’s nearing 90% compression!
What I Learned
Having had some interest for a while now in compression, and learning about it some with my PentaBit Cipher, I did still learn a fair bit doing this exercise. Most importantly, floats are, computationally speaking, weird as shit. It was also a fun exercise to have think about data structure in the smallest terms possible. I’m used to having to do with with SQL (do you really need that field to be an INT
?), but I’ve never really had to do it outside of that.
I hope to update the Procedural Circles Generator in the near future to be able to export as these compressed files, as well as to be able to import from them as well.