Simple JSON parser in c++, rust, ocaml, standard ml

in #cplusplus2 years ago

I've written a simple ASCII-only JSON parser in C++, Rust, OCaml and Standard ML. This was done for educational purposes mainly. The focus was to learn some more about writing parsers by hand. Another point was to compare implementations in these languages. I was mostly interested in CG v non-GC.

The parser was first implemented in OCaml. All other implementations was based on that and therefore are somewhat similar (at least as much as possible).
It was then ported to Standard ML, next to Rust and finally to C++.
All the implementations are external-dependencies free. The only exception to that is catch2 library used to unit test C++ solution.

Here's the link to repo with all implementations https://github.com/serpent7776/json-parse

Compilers versions that I have used:

compilerversion
g++12.2.0
rustc1.65.0 (Arch Linux rust 1:1.65.0-1)
ocaml4.14.0
mlton20210117-4

OCaml

The parser was first written in OCaml.
After some thinking on how to implement parsing, I came up with a solution that is loosely based on parser combinators. I previously used Angstrom, which is a parser combinator library for OCaml and it felt quite easy to use. Implementing the parser in this style resulted in code that's easy to use and is immutable as much as possible.
To read characters from a string to be parsed I just used indexing. Strings are basically arrays of bytes in OCaml and since I was writing ASCII-only parser, that was all I needed.
I very much liked OCaml's custom let bindings and used them for error reporting.
OCaml's compile times are super fast. Thanks to that the development cycle feels very rapid and helps with focusing on implementation. Another advantage of OCaml is that it can be both compiled to native executable and also used interactively with its REPL. The one that comes with OCaml installation is quite limited, but there's a better one provided as a package: utop. I think that every modern language should come with interactive REPL. This really helps with both learning the language and just trying things out.
One thing to be careful about are type annotations. They can be omitted in most cases, as compiler will infer them anyway. Without them though compile time error messages will tend to be less helpful and sometimes will be marked at wrong places. So even though not needed, it's better to provide them at least for functions and their parameters.

Standard ML

Once I got the parser ready in OCaml, I thought I port it to Standard ML, since it belong to the same ML language family. I was also curious on how well mlton could optimise it.
The language lacks custom let bindings, so I resorted to use Result.bind manually. This makes code much less readable and more verbose. The standard library also lacks result type, so I had to come up with my own simple implementation. There's also a lack of any hash map in the standard library, so I just used a list of key-value pairs. This isn't correct, but it's the closest I could get without inventing my own hash map.
MLton's compile times are slow. It also lacks interactive REPL. Because of that I used alternative Standard ML implementation for interactive usage: PolyML. Debugging MLton binaries is also pretty hard. gdb doesn't work and there's no bundled debugger. I had to resort to debugging facilities built into PolyML.
Valgrind doesn't work with mlton binaries, as it doesn't report any memory allocations. Looks like mlton uses mmap for allocation memory.
Surprisingly, performance is not the best. This might be due to heavy usage of my custom Result type and bind calls. Exceptions seem to be a more natural choice for error reporting in Standard ML. I tried to make such a change, but this didn't improve the performance much.

Rust

While not in ML family, Rust has somewhat similar language features, so I decided to port the parser to that language in the next step.
The porting process was not very complicated, but one issue was that OCaml's way of indexing strings didn't work. That's because strings in rust are always UTF-8 and don't support indexing, so I had to go some other way. Initially I tried to go with iterators, but that didn't compose well enough for me. So I looked up how nom does it. As it turned out, it uses byte slices. So I used that too and it worked out well.
The language relies heavily on the traits and documentation is quite hard to follow at first because of that. That's not a big issue though and it just takes a bit of reading to take used to it.
The standard library is geared towards forcing the programmer to handle errors, which is a good thing. The compilation times are not the best, but if there's an error, it's reported fast.
One big advantage of Rust is that all copies are explicit via the clone function of Clone trait. It is because values are moved by default. It is therefore easy to make sure no extra copies are made.
Rust is a step towards linear/affine type system and I really believe it's the direction where languages should head. It helps with achieving better performance, eases lifetime analysis and gives a way to handle resource management. I don't think it's more difficult to understand than value semantics and it's way saner than reference semantics (unless it's fully immutable).

C++

C++ is a very different beast from the previous languages, but it's in a similar category as Rust - it's used where performance matters.
Contrary to the previous languages, C++ lacks pattern matching and algebraic data structures. To simulate code structure from earlier implementations, I came up with a match function to simulate the former and used std::variant for the latter. Unfortunately, this was somewhat awkward to use and resulting code is quite verbose and doesn't look very idiomatic.
The compilation times are pretty large and error reporting was pretty bad. This partially might be due to the point above.
While developing the code I came across a huge, weird warning from g++: https://cpp.godbolt.org/z/9Yv9MPPnn
clang doesn't generate such a warning, so g++'s message might be incorrect here. valgrind doesn't find any issues.
In more complex code it's hard to guarantee that no extra copies are created, because language defaults to copying things. This can be changed by making type move-only, but usage of such types is somewhat harder. I cannot create a vector of such types, and initialise it with a list of objects, because that tries to make copies. I had to create a helper functions to do this.
Performance is not the best. It is at least partially because the compiler couldn't optimise visit calls on std::variant which I use a lot.
In C++ it would be more natural to write such code the mutable, imperative way.

Here's some quick basic stats:

c++rustocamlsml
lines of code939604556529
binary size [B]78K9.3M1.2M279K
stripped binary size [B]63K363K962K180K
time parsing 1 json [s]1.0100.6831.6372.300
time parsing 13 jsons [s]3.9072.7356.3267.716
peak heap [MB]376.7388.2317.6N/A
total allocated [B]804,664,036598,650,503335,248,296N/A

Parsing single 90MB JSON

c++rustocamlsml
real [s]1.0100.6831.6372.300
user [s]0.8750.5661.5191.066
sys [s]0.1330.1160.1161.232

Parser written in Rust is the fastest one. It beats C++ noticeably. Parser written in Standard ML is the slowest one, but if we look at the user time then it's faster than the parser written in OCaml.
This is because of the system time that mlton generated binary needs to parse a JSON file. It's an order of magnitude larger than other binaries (even PolyMl which does not aim at generating high performance binaries like mlton does).

Parsing 13 JSONs of varying sizes

c++rustocamlsml
real [s]3.9072.7356.3267.716
user [s]3.4062.2055.8573.584
sys [s]0.4960.5230.4624.124

These results look very similar to the previous ones. And again parser generated by mlton needs an order of magnitude more system time to parse files.

Diving in

Let's see what's going on. First, what syscalls are performed during parsing of the single 90 MB file.
On Linux we can use strace, on FreeBSD truss can be used to get similar data.

strace ./parser -n ../json/_all.json

c++rustocamlsml
bufsize [B]819197359720655364096
reads118882148623771
total syscalls139382444157023863

We see that mlton generated parser uses the smallest buffer size for reading and therefore performs the biggest number of read syscall.
I tried to increase buffer size (via -const 'TextIO.bufSize 65536' commandline option) to 64k and that reduced the number of syscalls to 1576. That improved run time only by about 0.1s though.
When I tried to increase the buffer size to the size of the file, my entire system was OOM killed.
An interesting case here is Rust, which reads whole file at once. The second read doesn't return any data. It's probably a just-in-case read. This is possible, because Rust has a dedicated library call to read entire file contents into a string. C++ doesn't have such a function, so it has to do a chunked read in a loop. Both OCaml and Standard ML have such functions too, so it should be possible for them to also provide similar functionality.

There must be something more than just buffer size. Let's see at the GC stats with the gc-summary runtime option:
./parser @MLton gc-summary -- -n ../json/_all.json:

GC type         time ms  number           bytes       bytes/sec
-------------   ------- ------- --------------- ---------------
copying             358       7     589,620,880   1,646,985,653
mark-compact          0       0               0               -
minor                 0       0               0               -
total time: 2,056 ms
total GC time: 510 ms (24.8%)
max pause time: 209 ms
total bytes allocated: 4,124,334,760 bytes
max bytes live: 220,716,792 bytes
max heap size: 1,765,801,984 bytes
max stack size: 1,728 bytes
num cards marked: 0
bytes scanned: 0 bytes
bytes hash consed: 0 bytes

Almost 25% of the total time is spent in CG. Let's increase the heap to 0.5GB via fixed-heap runtime option:
/parser @MLton gc-summary fixed-heap 0.5G -- -n ../json/_all.json:

GC type         time ms  number           bytes       bytes/sec
-------------   ------- ------- --------------- ---------------
copying              37       1     101,169,096   2,734,299,891
mark-compact        165       1     123,660,360     749,456,727
minor               256      22     171,290,376     669,103,000
total time: 1,281 ms
total GC time: 472 ms (36.8%)
max pause time: 165 ms
total bytes allocated: 4,124,331,144 bytes
max bytes live: 123,660,360 bytes
max heap size: 532,701,184 bytes
max stack size: 1,728 bytes
num cards marked: 22
bytes scanned: 39,600 bytes
bytes hash consed: 0 bytes

This reduced runtime to about 1.3s and system time to about 0.25s. The total GC time was lowered, but the percentage of time needed for GC increased to 36%. Further increase of the heap size does not help. In fact, it makes things worse. When set to 2GB, the run time goes back to initial value and both total time and percentage of total time increase even further:

GC type         time ms  number           bytes       bytes/sec
-------------   ------- ------- --------------- ---------------
copying             182       2     333,032,048   1,829,846,505
mark-compact        781       1     288,717,832     369,677,111
minor                40       1               0               0
total time: 2,441 ms
total GC time: 1,033 ms (42.3%)
max pause time: 782 ms
total bytes allocated: 4,124,331,144 bytes
max bytes live: 288,717,832 bytes
max heap size: 2,130,829,312 bytes
max stack size: 1,728 bytes
num cards marked: 1
bytes scanned: 1,800 bytes
bytes hash consed: 0 bytes

With these optimisations mlton generated binary is able to beat parser generated by OCaml, but only by about 0.3s. Manually adjusting heap size is also very unstable, so it's hard to rely on.

valgrind's memcheck output on parsing 90MB JSON file

c++rustocamlocaml (OCAMLRUNPARAM=c)
allocs5,745,7616,861,7118282
frees5,745,7616,861,7111778
bytes allocated804,664,036598,650,503335,248,296335,249,515
still reachable [B]005,650,8811,096

Valgrind doesn't work with binaries generated by mlton. It doesn't report any memory allocation.
OCaml is doing the least number of allocations, only 82 are reported. This is probably because it preallocates memory in larger chunks. Surprisingly, it looks like OCaml is allocating the least number of bytes. Not sure how to interpret this, but maybe it's reusing already allocated memory more heavily.
It also looks like OCaml is not cleaning all allocated memory when exiting. There's even a run time parameter to force doing this (OCAMLRUNPARAM=c), but even then number of allocations doesn't exactly match number of frees.
C++ and Rust are pretty similar here. Rust is doing a few more allocations, but C++ allocates more bytes.

valgrind's massif results

Similarly to memcheck, massif also reports that OCaml uses the least memory at peak usage. The difference isn't big, but it's there.
On the other hand, it requires astonishing 14.5G instructions to execute. That's pretty huge.
C++ and Rust are again similar, but Rust requires the least number of instructions to execute (3.4G). This is about half what C++ needs. This is probably because of poorly optimised visit calls on variants in C++ code. When analysing output of massif with ms_print for C++, it generated a lot of messages about deep recursion:

Deep recursion on subroutine "main::read_heap_tree" at /usr/bin/ms_print line 224, <INPUTFILE> line 41626.

C++

    MB
376.7^                                                                      : 
     |                                                                   @@@#:
     |                                                              @@@::@ @#@
     |                                                           @@:@ @: @ @#@
     |                                                      ::@@@@@:@ @: @ @#@
     |                                                   @@:: @@ @@:@ @: @ @#@
     |                                              @@@::@ :: @@ @@:@ @: @ @#@
     |                                         @@@@@@@ : @ :: @@ @@:@ @: @ @#@
     |                                     @@@@@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |                                 @@:@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |                             @@@@@ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |         :               ::@@@ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |         :           ::::::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |         :       ::::::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |         :::::::::: :::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |         :    : ::: :::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |    :    :    : ::: :::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |    ::::::    : ::: :::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |    :    :    : ::: :::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
     |  :::    :    : ::: :::: ::@ @ @ @ :@@@ @@@ @@@@ : @ :: @@ @@:@ @: @ @#@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   7.037

OCaml

    MB
317.6^                                                                   ::::#
     |                                                                   :   #
     |                                                                   :   #
     |                                                      @:::::::::::::   #
     |                                                      @            :   #
     |                                          ::::::::::::@            :   #
     |                                          :           @            :   #
     |@:::::::::::::::::::::::::@@@@@@@@@@@@@@@@:           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
     |@                         @               :           @            :   #
   0 +----------------------------------------------------------------------->Gi
     0                                                                   14.56

Rust

    MB
388.2^                                                                     #  
     |                                                                 @@@@#: 
     |                                                            @@@@@@@@@#: 
     |                                                         :::@@@@@@@@@#: 
     |                                                    ::::::::@@@@@@@@@#: 
     |                                               :::::: :: :::@@@@@@@@@#: 
     |                                          @@@::::: :: :: :::@@@@@@@@@#::
     |                                      ::::@ @: ::: :: :: :::@@@@@@@@@#::
     |                                  ::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |                             :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |                        @@@@@:::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |                     @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |                @@@@@@@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |           @@@@@@@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |      @:::@@@@@ @@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     |  ::::@:: @@@@@ @@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     | :: ::@:: @@@@@ @@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     | :: ::@:: @@@@@ @@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     | :: ::@:: @@@@@ @@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
     | :: ::@:: @@@@@ @@ @ @@@@@@@ :::::::::: ::@ @: ::: :: :: :::@@@@@@@@@#::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   3.423

Summary

Overall the best results for this particular use case is probably Rust. Move by default together with pattern matching and algebraic data structures makes it easy to use and yields very good performance. OCaml is very good too, but its standard library is a bit dated and it yields worse performance. On the other hand, I find the language more readable than Rust.
Porting OCaml code to C++ is not very straightforward. There's no algebraic data structures and no pattern matching and so it would be way more natural to write a parser in a mutable, imperative style in this language.
I was a bit disappointed that mlton generated binary was less performant than OCaml version, but it may be just a matter of my code.