Google Open-Sources Protocol Buffers: High-Performance Data Serialization
Google announced yesterday that it is open-sourcing Protocol Buffers, a technology it uses internally. The idea is this: parsing and dumping structured, hierarchical data is slow. XML might be nice to read, but it is verbose and slow to parse. One solution is to adopt a language-specific approach to serialization, such as Python's pickle module. The problem with that is you end up stuck with a particular programming language, and passing data between programs written in different languages becomes an issue.
Protocol Buffers introduces a simple Interface Definition Language (IDL) for structured data. You use the language to define the data's structure, and then you compile it into the programming language of your choice. The end result is blazingly fast parsing and dumping of data. Google describes it thus:
Protocol Buffers allow you to define simple data structures in a special definition language, then compile them to produce classes to represent those structures in the language of your choice. These classes come complete with heavily-optimized code to parse and serialize your message in an extremely compact format. Best of all, the classes are easy to use: each field has simple "get" and "set" methods, and once you're ready, serializing the whole thing to – or parsing it from – a byte array or an I/O stream just takes a single method call.
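To get a feel for why a compact binary format beats XML on size, here is a minimal hand-rolled sketch in Python. It is not the Protocol Buffers library and not generated code. It only imitates the tag-plus-varint layout that, as I understand it, the Protocol Buffers encoding is built on. The "measurement" record, its field numbers (1 for a station name, 2 for a sample count), and all function names are made up for illustration. In real use the field numbers would come from your definition file and the encoding would be handled by the generated classes.

```python
import xml.etree.ElementTree as ET

# Wire-type codes used by this sketch (assumed to mirror the protobuf layout).
VARINT = 0
LENGTH_DELIMITED = 2


def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, lowest group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_int_field(field_number, value):
    """Key (field number + wire type) followed by the value as a varint."""
    return encode_varint((field_number << 3) | VARINT) + encode_varint(value)


def encode_string_field(field_number, text):
    """Key, then byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    key = encode_varint((field_number << 3) | LENGTH_DELIMITED)
    return key + encode_varint(len(data)) + data


def decode_fields(buf):
    """Parse a buffer back into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LENGTH_DELIMITED:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("wire type not handled by this sketch")
        fields[field_number] = value
    return fields


# Hypothetical record: field 1 = station name, field 2 = sample count.
binary = encode_string_field(1, "K2") + encode_int_field(2, 300)
decoded = decode_fields(binary)

# The same record as XML, for a size comparison.
root = ET.Element("measurement")
ET.SubElement(root, "station").text = "K2"
ET.SubElement(root, "samples").text = "300"
xml_bytes = ET.tostring(root)

print(len(binary), "bytes as binary;", len(xml_bytes), "bytes as XML")
```

The binary form carries no field names at all, only small numeric tags, which is where most of the savings come from. The trade-off is that both sides need the same definition to make sense of the bytes.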
Obviously, Google didn't have science in mind when it created Protocol Buffers, but it sounds like it could be an interesting option for working with scientific data. It would be nice to eventually see support for Fortran and other scientific languages, but for the time being Protocol Buffers can generate code for C++, Python, and Java.


