RFC: Ideal CouchDB DB Dump Format
There’s a discussion going on in the CouchDB community about what an ideal dump format for a CouchDB database would look like. A CouchDB database is a collection of URIs. While the content associated with any given URI is often JSON, CouchDB also supports the notion of an attachment, which could be pretty much anything.
So... how do you dump a database? Leading options are:
- Build a custom JSON wrapper around everything, containing names, content types, and the data. This certainly requires no invention, and it doesn’t require any additional libraries to support it (beyond base64 and/or hex). The format is a bit specific to CouchDB, but that’s not an issue.
- Choose a wrapper format like multipart MIME. Libraries for this are plentiful, and the format is easy enough to parse or produce anyway. The format is theoretically less specific to CouchDB, but it isn’t clear that there is any existing tool or library that would benefit from this, so that is less likely to be an issue.
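To make the first option concrete, here is a minimal sketch of a JSON wrapper in Python. The field names (`entries`, `name`, `content_type`, `data`) and the function names are made up for illustration and are not an actual or proposed CouchDB format; the only assumption is that each entry is a name, a content type, and some bytes, with the bytes base64-encoded so they survive inside JSON.

```python
import base64
import json


def dump_json(docs):
    """Wrap (name, content_type, data) entries in one JSON document.

    The layout here is hypothetical, chosen only to illustrate the idea.
    """
    entries = [
        {
            "name": name,
            "content_type": content_type,
            # Arbitrary bytes are not valid JSON text, so base64-encode them.
            "data": base64.b64encode(data).decode("ascii"),
        }
        for name, content_type, data in docs
    ]
    return json.dumps({"entries": entries})


def load_json(text):
    """Reverse dump_json; note that the whole document is parsed at once."""
    return [
        (e["name"], e["content_type"], base64.b64decode(e["data"]))
        for e in json.loads(text)["entries"]
    ]
```

Note that `load_json` has to read the entire dump before it can return anything, which is where the question of a streaming parser comes in for large databases.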
Before I solicit input, I’ll share my leaning, which is towards the latter. My reasons are twofold. First, in order to do JSON right, you would need a streaming JSON parser; using MIME to segment a stream seems easier to me. Second, we are talking about backups here. A single bit error can be difficult to recover from in a JSON stream, but the effects in a multipart MIME segmented stream would be a lot more localized.
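For comparison, here is a rough sketch of the multipart MIME option using Python's standard `email` package. Using a `Content-Location` header to carry each entry's name is my assumption for the sake of the example, as are the function names and the choice to base64-encode every part; nothing here is an agreed-upon format.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def dump_mime(docs):
    """Write (name, content_type, data) entries as one multipart/mixed message."""
    outer = MIMEMultipart("mixed")
    for name, content_type, data in docs:
        maintype, subtype = content_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        # Base64 keeps arbitrary bytes safe inside the text-oriented wrapper.
        encoders.encode_base64(part)
        # Hypothetical choice: record the entry's name in Content-Location.
        part["Content-Location"] = name
        outer.attach(part)
    return outer.as_bytes()


def load_mime(raw):
    """Read the parts back as (name, content_type, data) tuples."""
    message = message_from_bytes(raw)
    return [
        (
            part["Content-Location"],
            part.get_content_type(),
            part.get_payload(decode=True),
        )
        for part in message.get_payload()
    ]
```

Each entry sits between boundary lines with its own headers, so a reader that hits damage in one part can, in principle, skip ahead to the next boundary and carry on with the remaining parts.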
But I’m far from an expert on multipart MIME, so I would welcome any input. To get the ball rolling, try playing with this (source).