Parsing A WebAssembly Binary With Kaitai Struct


Traditionally, Java libraries come with everything but the kitchen sink. People have given the JavaScript ecosystem a lot of crap over the years, but if there is one thing I envy, it is how tiny some of its libraries are. People still joke about left-pad to this day; in reality, the JavaScript ecosystem has come a long way since then, and I believe we all have something to learn from it.

Long story short, I was looking for a Java library to parse WebAssembly binaries, and all I found was either a full-blown interpreter or something outdated. Then I remembered Kaitai Struct, a parser generator for binary structures that supports a number of languages. I had read great things about it, but I never had a reason to try it myself, until now.


Kaitai Struct comes with a rich library of supported binary formats. Because it is a parser generator with multiple language backends, writing one good grammar gets you a parser in every one of those languages for “free”. If a Wasm grammar had been there, my search would have been over. Unfortunately, WebAssembly is not officially included.

But all was not lost: there is a Sophos Labs repository with a fairly complete Kaitai grammar for Wasm binaries. The grammar works in most cases, except that it does not support Data Count sections and does not parse Custom sections correctly.
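For context on what such a grammar has to describe: after an 8-byte preamble (the magic bytes \0asm followed by a 4-byte little-endian version), a Wasm module is a sequence of sections, each made of a one-byte id, a LEB128-encoded payload size, and that many payload bytes. Custom sections have id 0 and the Data Count section has id 12. The following is a minimal hand-written Java sketch of that layout, not the Kaitai-generated parser; the class and method names (WasmSections, listSections, sectionName) are hypothetical and only illustrate the structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Kaitai-generated: walks the top-level sections of a Wasm module.
public class WasmSections {

    // Section ids 0..12 as defined by the WebAssembly core spec (0 = custom, 12 = data count).
    private static final String[] NAMES = {
        "custom", "type", "import", "function", "table", "memory", "global",
        "export", "start", "element", "code", "data", "datacount"
    };

    public static String sectionName(int id) {
        return (id >= 0 && id < NAMES.length) ? NAMES[id] : "unknown(" + id + ")";
    }

    // Decodes an unsigned LEB128 u32 at pos[0] and advances pos[0] past it.
    static long readU32(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("LEB128 value too long for u32");
            }
        }
    }

    // Returns the section names in file order, skipping each payload without interpreting it.
    public static List<String> listSections(byte[] wasm) {
        byte[] magic = {0x00, 0x61, 0x73, 0x6D}; // "\0asm"
        if (wasm.length < 8) {
            throw new IllegalArgumentException("too short for a Wasm preamble");
        }
        for (int i = 0; i < magic.length; i++) {
            if (wasm[i] != magic[i]) {
                throw new IllegalArgumentException("bad magic");
            }
        }
        // Bytes 4..7 hold the little-endian version (1 for current modules); not checked here.
        int[] pos = {8};
        List<String> names = new ArrayList<>();
        while (pos[0] < wasm.length) {
            int id = wasm[pos[0]++] & 0xFF;
            long size = readU32(wasm, pos);
            if (pos[0] + size > wasm.length) {
                throw new IllegalArgumentException("section overruns the buffer");
            }
            names.add(sectionName(id));
            pos[0] += (int) size;
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] wasm = {
            0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00, // preamble, version 1
            0x01, 0x01, 0x00,                               // type section: 1 byte, empty vector
            0x0C, 0x01, 0x00                                // data count section: 1 byte, count 0
        };
        System.out.println(listSections(wasm)); // [type, datacount]
    }
}
```

A parser that does not know id 12 can still skip the section thanks to the size prefix, but it cannot expose its contents; that is roughly the gap a grammar without Data Count support leaves.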

Custom sections are, in fact, quite important for compilers. For instance, LLVM’s WebAssembly backend stores extra metadata in a custom section named "linking". When such a section is present, the wasm file is an object file and is not meant to be executed directly. So I decided to fix the grammar myself, and that is when I learned that Kaitai comes with a fantastic web-based IDE, complete with a live preview of the parse tree and a hex editor that stays in sync with it!
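To make the object-file check concrete, here is a small hand-written Java sketch, again not Kaitai-generated code, that collects the names of Custom sections (id 0), whose payload begins with a LEB128 length followed by a UTF-8 name, and treats a section named "linking" as the object-file marker. The class and method names (WasmCustomSections, customSectionNames, isObjectFile) are hypothetical, and the sample module carries a single placeholder payload byte rather than real LLVM linking metadata.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Kaitai-generated: lists custom section names in a Wasm module.
public class WasmCustomSections {

    // Decodes an unsigned LEB128 u32 at pos[0] and advances pos[0] past it.
    static long readU32(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("LEB128 value too long for u32");
            }
        }
    }

    public static List<String> customSectionNames(byte[] wasm) {
        int[] pos = {8}; // skip the 8-byte preamble (magic + version); assumed valid here
        List<String> names = new ArrayList<>();
        while (pos[0] < wasm.length) {
            int id = wasm[pos[0]++] & 0xFF;
            long size = readU32(wasm, pos);
            long end = pos[0] + size;
            if (end > wasm.length) {
                throw new IllegalArgumentException("section overruns the buffer");
            }
            if (id == 0) {
                // A custom section payload starts with its name: LEB128 byte length, then UTF-8 bytes.
                int nameLen = (int) readU32(wasm, pos);
                names.add(new String(wasm, pos[0], nameLen, StandardCharsets.UTF_8));
            }
            pos[0] = (int) end; // jump to the next section; the rest of the payload is ignored
        }
        return names;
    }

    // A "linking" custom section marks an LLVM object file rather than an executable module.
    public static boolean isObjectFile(byte[] wasm) {
        return customSectionNames(wasm).contains("linking");
    }

    public static void main(String[] args) {
        byte[] wasm = {
            0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00, // preamble, version 1
            0x00, 0x09,                                     // custom section, payload size 9
            0x07, 'l', 'i', 'n', 'k', 'i', 'n', 'g',        // name: length 7, "linking"
            0x02                                            // placeholder payload byte
        };
        System.out.println(customSectionNames(wasm)); // [linking]
        System.out.println(isObjectFile(wasm));       // true
    }
}
```

Because every section carries its own size, the name can be read and the rest of the payload skipped; decoding the actual linking metadata is the part that needs a dedicated grammar.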

That is pretty cool! I have opened a PR against the original repository to fix the issues I initially found, and I am now working on extending the grammar to support the "linking" custom section. Hopefully, this will make it easier to read Wasm files from multiple languages without having to go through the entire spec every time.