Publication Date
9-1-2022
Journal
Journal of Structural Biology
DOI
10.1016/j.jsb.2022.107875
PMID
35724904
PMCID
PMC9645247
PubMedCentral® Posted Date
9-1-2023
PubMedCentral® Full Text Version
Author MSS
Published Open-Access
yes
Keywords
Automation, Cryoelectron Microscopy, Data Collection, Data Compression
Abstract
With larger, higher speed detectors and improved automation, individual CryoEM instruments are capable of producing a prodigious amount of data each day, which must then be stored, processed and archived. While it has become routine to use lossless compression on raw counting-mode movies, the averages which result after correcting these movies no longer compress well. These averages could be considered sufficient for long term archival, yet they are conventionally stored with 32 bits of precision, despite high noise levels. Derived images are similarly stored with excess precision, providing an opportunity to decrease project sizes and improve processing speed. We present a simple argument based on propagation of uncertainty for safe bit truncation of flat-fielded images combined with lossless compression. The same method can be used for most derived images throughout the processing pipeline. We test the proposed strategy on two standard, data-limited CryoEM data sets, demonstrating that these limits are safe for real-world use. We find that 5 bits of precision is sufficient for virtually any raw CryoEM data and that 8-12 bits is sufficient for intermediate averages or final 3-D structures. Additionally, we detail and recommend specific rules for discretization of data as well as a practical compressed data representation that is tuned to the specific needs of CryoEM.