Hardwood: A New Parser for Apache Parquet
https://www.morling.dev/blog/hardwood-new-parser-for-apache-parquet/
This sounds great. parquet-java is extremely unpleasant to use: it has a massive fan-out of dependencies and an awkward API that exposes them, so they bleed into a user's code base. The Hadoop stuff is particularly annoying, given the relatively poor quality (IMO) of the Hadoop code base and the number of class names it shares with built-in Java types (like File, FileSystem, etc.). And the performance of parquet-java is very poor compared to the libraries available for other languages.
Respect for doing this. I recently implemented a Parquet reader in Swift using parquet-java as a reference, and it was by a long way the hardest bit of coding I’ve done. Your bit unpacking is interesting; is it faster than the 74 KLOC parquet-java bit unpacker?
Thanks! See https://news.ycombinator.com/item?id=47206861 for some general comments on performance. I haven't measured bit unpacking specifically yet.
{"deleted":true,"id":47206289,"parent":47167432,"time":1772369804,"type":"comment"}