Twitter user "Logic_Beach" tweeted out this puzzle, with a prize of ~$200 USD worth of UNI token for whoever could solve it first:

The puzzle was solved in less than a day, and while I wasn't the fastest to solve it, the process of puzzling it out was a fun journey, so I wanted to document it for those who want to get involved in this sort of puzzle but don't know where to start. The winner of the prize posted a video showing how they cracked it, which includes using the zsteg and stegolsb wavsteg command line tools. Those tools automate several steps, so it might not be evident exactly what they're doing. So, let's step through the puzzle in a more manual way to get a better understanding of how this works.

The Puzzle itself

The puzzle itself is posted as a PNG file, which looks like a grayscale image of some white lines that get more and more blurry, going left-to-right across the image.

As I assessed the image, the first thing I noticed was that it was not directly embedded into the Tweet. Tweets can have images attached directly, which Twitter hosts, but this one was actually a link to Imgur, to an image named "test" by user "hermanfelker". That "hermanfelker" account is seven years old, but isn't the host for any of the other puzzles LogicBeach has posted previously. So, I filed away the idea that part of the key might be other images posted by that user.

The image itself reminded me of graphs of sine waves, drawn with a varying period (some of the sine waves are squished horizontally). That reminded me of radio waves, which can carry data by frequency modulation (FM radio), so I filed away the idea that the solution might have to do with frequency modulation.

The PNG

I then moved on to investigating the pixel data of the PNG itself. One trick that is sometimes used in puzzles like this is writing messages in areas of the image that look like a single solid color, but in reality are something like an absolute black background with very, very dark gray text written on it (for example, this Tesla ad). So, I dropped the PNG into Photoshop and played around with the brightness/contrast curves, and found this:

Level-adjusted

Just by adjusting the contrast, I could see that although the image appeared to be completely grayscale, many of the pixels actually have color to them (the red, green, and blue color channels aren't all the same, which they would be in a true grayscale image).
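
If you don't have an image editor handy, the same kind of level adjustment can be approximated in code. This is only a rough sketch of a linear "levels" stretch, not what Photoshop's curves tool does internally; the function name `stretch_levels`, the 0-16 window, and the sample pixel values are all made up for illustration.

```python
def stretch_levels(pixels, black=0, white=16):
    """Linearly remap each channel so `black` maps to 0 and `white` to 255."""
    scale = 255 / (white - black)

    def remap(value):
        # Clamp to the chosen window, then expand it to the full 0-255 range.
        clamped = min(max(value, black), white)
        return round((clamped - black) * scale)

    return [tuple(remap(channel) for channel in px) for px in pixels]


# Two made-up near-black pixels that look identical on screen...
dark = [(4, 4, 4), (4, 5, 4)]
# ...become clearly different once the 0-16 range is stretched to 0-255.
boosted = stretch_levels(dark)
print(boosted)  # [(64, 64, 64), (64, 80, 64)]
```

A one-step difference in a single channel turns into a visible green tint, which is the effect the contrast adjustment exposes.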

The colors didn't form legible text, but visually it was clearly not random noise. The colors form horizontal bands, which indicates there's some stream of data encoded there, written left-to-right. I guessed the message was also written top-to-bottom, as the bottom section of the image seemingly had none of these modifications.

The left-most section of the image is a "pure black" background, but as the image progresses to the right, there's a noise/cloud effect added, which makes the background no longer pure black, and so the pattern gets muddled. Using the eyedropper tool on the supposedly "pure black" background at the left edge of the image, I found the pixels there had 8 different color values, with each of their red, green, and blue color channels having either a 4 or a 5 as a value (i.e. (4, 4, 4), (4, 4, 5), (4, 5, 4) ... (5, 5, 4), (5, 5, 5)). Most modern image formats use the RGB color model, where each pixel is encoded by how much red, green, and blue it has, each from 0 to 255, which allows for about 16.7 million (256 × 256 × 256) different color combinations. That many colors gives lots of control to get a fine-tuned color, but in practice, most monitors and human eyes can't tell the difference between (4, 4, 4) and (5, 5, 5). Here's an example:

Black colors test

The left half of that image is (4, 4, 4), and the right half is (5, 5, 5); can you see the difference?

Steganography

I recognized this sort of data hiding as a common form of steganography in digital media: using very small changes in the digital values to encode a hidden message.

I wrote a script to parse through the whole image's pixel values, and found that no pixel's red, green, and blue channels differed from each other by more than 1 (e.g. (4, 4, 5) could exist, as could (200, 200, 199), but (100, 100, 105) would not, since that's a jump of 5, not 1). With data like that, the only thing distinguishing one channel from its neighbors is whether its value is even or odd. If we treat even as "zero" and odd as "one", we can therefore convert the picture into binary data. Since every pixel has three bits of data (red, green, and blue), and there are eight possible combinations of zero and one in a set of three, we can recolor the whole image by picking 8 colors to represent each of those combinations of three binary digits. Doing so, we get this:

Binary false-color
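
A minimal sketch of that kind of script (not the exact one used for the image above) might look like the following. It works on a plain list of (R, G, B) tuples, which you could load with an imaging library of your choice. The palette is an assumption on my part: each parity bit simply switches one display channel fully on, so all-even pixels come out black and all-odd pixels come out white.

```python
def channel_spread(pixel):
    """Difference between the largest and smallest channel of one pixel."""
    return max(pixel) - min(pixel)


def parity_bits(pixel):
    """Map each channel to 0 (even) or 1 (odd)."""
    return tuple(channel & 1 for channel in pixel)


def false_color(pixel):
    """Hypothetical palette: each parity bit turns one display channel fully on."""
    return tuple(255 * bit for bit in parity_bits(pixel))


# Made-up sample pixels standing in for the real image data.
pixels = [(4, 4, 4), (4, 4, 5), (200, 200, 199), (5, 5, 5)]

widest = max(channel_spread(px) for px in pixels)
recolored = [false_color(px) for px in pixels]
print(widest)     # 1
print(recolored)  # [(0, 0, 0), (0, 0, 255), (0, 0, 255), (255, 255, 255)]
```

Note that (4, 4, 5) and (200, 200, 199) land on the same false color: once only parity matters, the underlying brightness of the cover image drops out.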

Working visually like this, we can use human perception to pick out a few things that would be harder to spot quickly with just code analysis. Looking at that image, it's clear the data encoded into it stops partway through the image, and the rest of the image is left un-encoded (you can see the bottom edges of some of the "sine waves", and the effect of the noise pattern of the main image, that weren't "zeroed out" when the hidden message was applied). We can also see that the encoded data appears to have some repeating segments; it looks like stripes/bands running diagonally across the image that have the same texture/pattern to them.

I zeroed in on that repeating pattern, and extracted one of the repetitions of it, and tried changing how wide a canvas it was drawn on to see if there was other detail there I was missing:

Repeating chunk

This is one repetition of the repeating section of data, using the same colors as the previous image. There's a little bit of overlap to show the repeat: the colored string of data followed by pure white in the top-left corner of this image can also be seen in the bottom-left corner, as the pattern starts over. Looking at this, I noticed there were several sections that were similar, but not identical, that repeated throughout. There were several areas where there was a lot of black, then a short section of colored bits, then a long section of white. And there were two different sorts of these black-colored-white repeats: some of the colored chunks had longer spans of black/white before and after (orange circles), and some had slightly shorter ones (green circles).

Repeating subdivisions
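
Resizing the canvas by hand is really a search for the width at which the rows line up, and that search can be handed to code as well. The sketch below is a simple self-similarity check, assuming the data is available as a flat list of values; the function name `best_period` and the sample stream are hypothetical.

```python
def best_period(stream, min_period, max_period):
    """Return the shift at which the stream best matches a shifted copy of itself."""
    def match_ratio(shift):
        pairs = len(stream) - shift
        same = sum(stream[i] == stream[i + shift] for i in range(pairs))
        return same / pairs

    # max() keeps the first (smallest) shift when several tie for best.
    return max(range(min_period, max_period + 1), key=match_ratio)


# Made-up stream: an 11-value chunk repeated six times.
chunk = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0]
stream = chunk * 6
print(best_period(stream, 2, 20))  # 11
```

Drawing the data on a canvas whose width equals that period (or a multiple of it) makes each repetition stack directly under the previous one.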

Now, since I knew this was a cryptocurrency-based puzzle, I guessed that the solution I was looking for was a mnemonic phrase, which would be the private key to the address with the prize. Mnemonic phrases come as a string of words that is some multiple of three, but in each of these repeating sections, there were five of these repeating patterns, so it didn't make sense for those to be the start of "words" in the mnemonic phrase.
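
The "multiple of three" rule falls out of the arithmetic in the BIP39 standard: the entropy is 128 to 256 bits in steps of 32, a checksum of one bit per 32 bits of entropy is appended, and every word encodes 11 bits. A quick way to check the allowed lengths:

```python
def bip39_word_counts():
    """List the mnemonic lengths allowed by BIP39's entropy sizes."""
    counts = []
    for entropy_bits in range(128, 257, 32):
        checksum_bits = entropy_bits // 32
        # Each word in the mnemonic carries 11 bits of the combined data.
        counts.append((entropy_bits + checksum_bits) // 11)
    return counts


print(bip39_word_counts())  # [12, 15, 18, 21, 24]
```

Five repeating patterns per section doesn't match any of those lengths, which is why that theory didn't hold up.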

So, I backed up and looked for a different path:

RGB to Binary

Being able to encode data into the red, green, and blue color channels in an image means each pixel has three bits of data in it, and if you keep it in that format, you're dealing with a numbering system with 8 options (0 through 7). That would be the "Octal" numbering system, which isn't the most common way to store data in modern computing. Most computer data storage works off of "bytes" of data, which is eight bits of data (leading to 256 values, from 0 through 255). Going from 8 bits of data in a byte, to trying to store it in a pixel that can only hold 3 bits means each byte of data would take up two-and-two-thirds of a pixel to store. So, possibly I'd need to unwind the data that was packed in bundles of 3 and re-pack it into bundles of 8? To give that a test, I went through the whole image again, and converted the pixels like: (4, 4, 5) to (even, even, odd) to (0, 0, 1), and concatenated all the values together. That leads to a string of zeroes and ones three times longer than the number of pixels in the image.
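
Here's a minimal sketch of that unpack-and-repack step. The order of the bits is an assumption: red, then green, then blue within each pixel, and most significant bit first within each byte. The sample pixels are made up so that they spell out two recognizable characters.

```python
def pixels_to_bits(pixels):
    """Flatten pixels into a bit stream: even channel -> 0, odd channel -> 1."""
    return [channel & 1 for px in pixels for channel in px]


def bits_to_bytes(bits):
    """Re-pack a bit stream into bytes, most significant bit first."""
    out = bytearray()
    usable = len(bits) - len(bits) % 8  # drop any incomplete trailing byte
    for start in range(0, usable, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Six made-up pixels carry 18 bits: two full bytes plus two leftover bits.
pixels = [(4, 5, 4), (5, 4, 4), (5, 4, 4), (5, 4, 4), (5, 4, 4), (5, 4, 4)]
bits = pixels_to_bits(pixels)
print(len(bits))            # 18
print(bits_to_bytes(bits))  # b'RI'
```

If the output looks like garbage, trying a different channel order or bit order is a cheap next experiment.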

I did a visualization of that string of data (using black for zero and white for one), to check and see if it looked like it was making logical sense. Here's the first part of the message, in binary:

Binary data

The width of this canvas is 512, which is a multiple of 8, to see if things line up that way. And indeed there seem to be several sections of data where there's a bit set every 8 bits or so, which is a good indication that the data is organized into 8-bit chunks, the standard "byte" of data.
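
The same alignment check works in a terminal, without drawing an image. This sketch wraps a bit stream into text rows of a chosen width, with `#` for one and `.` for zero; the function name `bit_rows` and the sample data are hypothetical.

```python
def bit_rows(bits, width):
    """Split a bit stream into rows of `width` bits, drawn as text."""
    rows = []
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        rows.append("".join("#" if bit else "." for bit in row))
    return rows


# Made-up data: four bytes of value 0x01, so one bit is set every 8 bits.
bits = [0, 0, 0, 0, 0, 0, 0, 1] * 4

for line in bit_rows(bits, 16):  # a multiple of 8: the set bits form columns
    print(line)
for line in bit_rows(bits, 12):  # not a multiple of 8: the columns drift
    print(line)
```

When the width is a multiple of 8, byte-aligned structure shows up as vertical stripes; at other widths it smears into diagonals.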

Taking chunks of 8-bits of data from that stream, and treating it as if it were text yields something like:

éÑ.RIFFáÑ.WAVEfmt...D¬

That's clearly not fully English, but there are a few bits in there that might be. The strings "RIFF" and "WAVE" are visible in the first few characters of the translated data. Most file formats, even if they use compression or binary encoding for the rest of the file, start with a few bytes that convert to ASCII characters and form some sort of recognizable phrase. These are known as "magic numbers" or "magic bytes", and can be used to identify a file's format without relying on a file extension. "RIFF" and "WAVE" are the signature strings for the WAV audio format. So, this picture has an audio file embedded in it!
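
Spotting magic bytes by eye works, but it's also easy to automate. The sketch below checks the start of a blob against a small, hand-picked table of well-known signatures; the table is nowhere near complete, and the sample blob is made up (a few junk bytes, then a RIFF header with a placeholder size field).

```python
# A few well-known file signatures; real tools carry far longer tables.
SIGNATURES = {
    b"RIFF": "RIFF container (WAV, AVI, ...)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def find_signatures(data, window=64):
    """Return (offset, description) for each signature found near the start."""
    hits = []
    for magic, description in SIGNATURES.items():
        offset = data.find(magic, 0, window)
        if offset != -1:
            hits.append((offset, description))
    return sorted(hits)


# Made-up blob: junk prefix, "RIFF", a placeholder 4-byte size, then "WAVE".
blob = b"\xe9\xd1\x06" + b"RIFF" + b"\x24\x00\x00\x00" + b"WAVEfmt "
print(find_signatures(blob))  # [(3, 'RIFF container (WAV, AVI, ...)')]
```

Searching a small window instead of only offset zero matters here, since the signature doesn't sit at the very first byte of the extracted data.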

`zsteg` Tool

Coming back to the video that the winner, Lia, posted on how they cracked the puzzle, you can hopefully now see more clearly what the zsteg tool did. zsteg takes several well-known means of hiding data in an image and tries them all by brute force, then shows you a fragment of each result so you can see if it makes sense. In the video at 0:30, Lia notices one of the steganography methods results in something that has "WAVE" in the output, and correctly identifies that as the parsing method needed. zsteg then helps automate the process of getting the embedded WAV file out.
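
To make the brute-force idea concrete, here's a toy version. This is not zsteg's implementation, and its real search space is much larger; the sketch only tries the six channel orders and two bit orders for the lowest bit of each channel, and reports which combinations produce a known signature. The `hide` helper exists only to build test pixels.

```python
from itertools import permutations


def hide(payload, base=4):
    """Build demo pixels whose channel parities spell out `payload`."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    bits += [0] * (-len(bits) % 3)  # pad to a whole number of pixels
    return [tuple(base + bit for bit in bits[i:i + 3]) for i in range(0, len(bits), 3)]


def extract(pixels, channels, msb_first):
    """Read one parity bit per channel, in the given channel and bit order."""
    bits = [px[c] & 1 for px in pixels for c in channels]
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        group = bits[start:start + 8]
        if not msb_first:
            group = group[::-1]
        out.append(int("".join(map(str, group)), 2))
    return bytes(out)


def brute_force(pixels, magic=b"RIFF"):
    """Try every channel order and bit order; keep those that reveal `magic`."""
    hits = []
    for channels in permutations(range(3)):
        for msb_first in (True, False):
            if magic in extract(pixels, channels, msb_first)[:16]:
                hits.append((channels, msb_first))
    return hits


print(brute_force(hide(b"RIFF")))  # [((0, 1, 2), True)]
```

Only one of the twelve combinations recovers the signature, which is the same "does this fragment look sensible?" judgment the tool leaves to the person reading its output.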

WAV file

If we take a look at the raw output of the "WAV file" embedded in the image, we find that the "RIFF" magic word isn't right at the beginning of the file; there are three bytes of data (0xe9d106) in front of it, and most audio programs won't open the file if those three bytes are there. Is this a mistake made by the puzzle creator when encoding? Or are those bytes going to be a key for something later? Deleting those three bytes off the front of the file (Lia does this in a hex editor at timestamp 0:58 in the video) makes the file able to be opened with an audio program.
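
If you'd rather not reach for a hex editor, the same trim takes a few lines of code. This sketch splits the data at the first "RIFF" signature and reads the 4-byte little-endian chunk size that follows it. The sample bytes are a stand-in: the three-byte prefix matches the one described above, but the size value of 36 is made up.

```python
import struct


def trim_to_riff(data):
    """Split raw bytes into (junk before 'RIFF', data starting at 'RIFF')."""
    start = data.find(b"RIFF")
    if start == -1:
        raise ValueError("no RIFF signature found")
    return data[:start], data[start:]


# Stand-in for the extracted data: 3-byte prefix, then a minimal RIFF header.
raw = b"\xe9\xd1\x06" + b"RIFF" + struct.pack("<I", 36) + b"WAVE"

prefix, wav = trim_to_riff(raw)
chunk_size = struct.unpack_from("<I", wav, 4)[0]
print(prefix.hex())  # e9d106
print(wav[8:12])     # b'WAVE'
print(chunk_size)    # 36
```

Keeping the trimmed prefix around, rather than throwing it away, costs nothing and leaves the door open in case those bytes do turn out to matter.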

Here's a link to grab the WAV file itself. It's a 10-second audio file, which sounds like a pulsing tone, with no other human-identifiable sounds in it.

This is as far as I got on my own before Lia beat me to the punch and solved the puzzle. So, I have no further notes on other avenues I explored, but from their solving video, we can see what the next steps are to finish the puzzle:

Steganography again!

Just as the WAV file itself was hidden in minute color changes in the PNG image, there's something hidden in the WAV file through minute changes in the sound volume.

The stegolsb wavsteg command line tool converts the very small volume changes to zeroes and ones, and parses that as a binary file, and this embedded file turns out to be text.
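
Here's a rough sketch of the idea using Python's built-in `wave` module. It is not the tool's actual implementation: I'm assuming 16-bit samples, one hidden bit per sample, and most-significant-bit-first packing, and the real layout may differ. The `demo_wav` helper only exists to build a tiny in-memory file to test against.

```python
import io
import struct
import wave


def demo_wav(message):
    """Build an in-memory 16-bit mono WAV whose sample LSBs spell `message`."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    samples = [1000 | bit for bit in bits]  # 1000 is even, so the LSB is the bit
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buffer.seek(0)
    return buffer


def lsb_bytes_from_wav(wav_file, num_bytes):
    """Collect the lowest bit of each sample and pack the bits into bytes."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    out = bytearray()
    for start in range(0, num_bytes * 8, 8):
        value = 0
        for sample in samples[start:start + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


print(lsb_bytes_from_wav(demo_wav(b"hi"), 2))  # b'hi'
```

A sample of 1000 versus 1001 is inaudible, in the same way that (4, 4, 4) versus (5, 5, 5) is invisible.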

Python script

At 1:38 in the video Lia opens the text file that was hidden in the WAV file. The text that was hidden in the WAV file has a message from LogicBeach, and a chunk of Python code. At the bottom of the screen you can see there's additional garbage at the end of the file, which is the additional values from the WAV file that didn't actually have encoded data in it (similar to the section of the original image at the bottom that didn't need to have data encoded into it).

The Python script shows what you need to do to transform this audio file into the final secret. It parses the WAV file and measures not the volume (how tall the sound waves are) but the frequency (how squished/scaled horizontally they are, which is what the original image alluded to). The output of the Python script is a chart showing what blend of frequencies has been combined to make that audio tone.
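
The hidden script itself isn't reproduced here, but the underlying technique is a Fourier transform: it breaks a signal down into the individual frequencies that were mixed together. The sketch below uses a naive, slow transform written in plain Python so it has no dependencies (a real script would more likely use an FFT library); the two-tone test signal and the threshold are made up.

```python
import cmath
import math


def spectrum(samples, sample_rate):
    """Return {frequency_hz: magnitude} using a naive O(n^2) Fourier transform."""
    n = len(samples)
    result = {}
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        result[k * sample_rate / n] = abs(total) / n
    return result


def peak_frequencies(samples, sample_rate, threshold=0.1):
    """Frequencies whose magnitude stands out above `threshold`."""
    found = spectrum(samples, sample_rate)
    return sorted(freq for freq, magnitude in found.items() if magnitude > threshold)


# Made-up signal: one second of a 10 Hz tone mixed with a 35 Hz tone.
rate = 200
tone = [
    math.sin(2 * math.pi * 10 * t / rate) + math.sin(2 * math.pi * 35 * t / rate)
    for t in range(rate)
]
print(peak_frequencies(tone, rate))  # [10.0, 35.0]
```

Plotting the magnitudes from `spectrum` against frequency gives the kind of chart with distinct peaks that shows up in the video.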

As Lia explains in the video, there are twelve "peaks" in the resulting graph. These become the twelve words in the seed phrase by taking the numeric frequency of each peak and looking up the word at that position in the BIP39 word list (which is the standard way seed phrases are generated).
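
The lookup step might be sketched like this. Several things here are assumptions rather than details confirmed by the puzzle: that the rounded frequency is used directly as a zero-based index, and that the peaks are read in the order given. The word list is only the first few entries of the 2048-word English BIP39 list, and the peak values are made up.

```python
# First few entries of the English BIP39 list, truncated for illustration.
WORDS = ["abandon", "ability", "able", "about", "above", "absent"]


def peaks_to_words(peak_frequencies, wordlist, offset=0):
    """Round each peak to a whole number and use it as an index into `wordlist`."""
    words = []
    for frequency in peak_frequencies:
        index = round(frequency) - offset  # set offset=1 if the list counts from one
        if not 0 <= index < len(wordlist):
            raise IndexError("peak %r falls outside the word list" % frequency)
        words.append(wordlist[index])
    return words


# Made-up peaks, slightly off whole numbers the way measured values tend to be.
print(peaks_to_words([3.02, 0.0, 4.97], WORDS))  # ['about', 'abandon', 'absent']
```

If the resulting phrase fails the wallet's checksum, the index offset and the word order are the first two things worth varying.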

Conclusion

So, that's the solution for this one: the PNG file has a WAV file in it, and the WAV file has a Python script in it, and the Python script gives an output that points to the 12 words needed to claim the reward! A fun puzzle to work through; thanks LogicBeach for creating, and good job Lia for solving it so quickly!

As for the rabbit trail I went down looking at the repeating cycles in the binary data: we now know those were part of the WAV data itself. They repeated many times because the ten-second audio clip contains the same wave pattern over and over again.

Hopefully this has been helpful to see how I approach puzzles like this. For most of these there are many ways and tools to arrive at the correct answer, so feel free to try several to find one that suits you and the way you approach problems. Happy puzzle-solving!

Logic Beach UNI puzzle