Efficient compression of molecular dynamics trajectory files

We investigate whether specific properties of molecular dynamics trajectory files can be exploited to achieve effective file compression. We explore two classes of lossy, quantized compression schemes: “interframe” predictors, which exploit temporal coherence between successive frames in a simulation, and more complex “intraframe” schemes, which compress each frame independently. Our interframe predictors are fast, memory-efficient and well suited to on-the-fly compression of massive simulation data sets, and significantly outperform the benchmark BZip2 application. Our schemes are configurable: atomic positional accuracy can be sacrificed to achieve greater compression. For high-fidelity compression, our linear interframe predictor gives the best results at very little computational cost: at moderate levels of approximation (12-bit quantization, maximum error ≈ 10⁻² Å), we can compress a trajectory file with 1–2 fs time steps to 5–8% of its original size. For 200 fs time steps, typically used in fine-grained water diffusion experiments, we can compress files to 25% of their input size, still substantially better than BZip2. While compression performance degrades with high levels of quantization, the simulation error is typically much greater than the associated approximation error in such cases.
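To make the idea of a quantized linear interframe predictor concrete, the following is a minimal sketch, not the implementation evaluated here. It assumes that "linear" means extrapolating each coordinate from the two preceding quantized frames, and that coordinates lie in a known range [lo, hi]. The function names (quantize, dequantize, encode, decode) and the bounding-range parameters are hypothetical. The integer residuals it produces would then be passed to an entropy coder, which is not shown.

```python
def quantize(frame, lo, hi, bits):
    """Map each coordinate in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    return [round((x - lo) * scale) for x in frame]


def dequantize(q, lo, hi, bits):
    """Map quantized integers back to approximate coordinates."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + v * step for v in q]


def _predict(prev, prev2, n):
    """Predict the next quantized frame from up to two earlier frames."""
    if prev is None:
        return [0] * n                      # first frame: no history
    if prev2 is None:
        return list(prev)                   # second frame: repeat the last one
    # Assumed linear predictor: extrapolate along the last displacement.
    return [2 * a - b for a, b in zip(prev, prev2)]


def encode(frames, lo, hi, bits):
    """Return integer residuals between each quantized frame and its prediction."""
    residuals, prev, prev2 = [], None, None
    for frame in frames:
        q = quantize(frame, lo, hi, bits)
        pred = _predict(prev, prev2, len(q))
        residuals.append([v - p for v, p in zip(q, pred)])
        prev2, prev = prev, q
    return residuals


def decode(residuals):
    """Rebuild the quantized frames by re-running the same predictor."""
    frames_q, prev, prev2 = [], None, None
    for r in residuals:
        pred = _predict(prev, prev2, len(r))
        q = [p + d for p, d in zip(pred, r)]
        frames_q.append(q)
        prev2, prev = prev, q
    return frames_q
```

Because the predictor runs on quantized integers, the decoder reproduces the quantized frames exactly; the only loss is the quantization step itself, which is bounded by half the quantization interval per coordinate. When atoms move smoothly between frames, the residuals cluster near zero, which is what makes them cheap to encode.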