Cryo-EM is a method for 3D imaging of biological macromolecules using transmission electron microscopes. The reconstruction problem has a natural geometric structure but involves a high degree of noise, requiring sophisticated mathematical methods to obtain high accuracy. Standard methods for 3D reconstruction typically make only minimal assumptions on the nature of the images, instead leveraging the geometry of the problem to reconstruct the molecular structure. As a result, these methods break down at very high noise levels, where the regularity induced by the imaging geometry is lost. It has therefore become necessary to develop algorithms that leverage additional information about the problem, such as prior information about the particular objects being imaged, in this case biomolecules.
An important step in the cryo-EM processing pipeline is denoising. While conventional methods have enjoyed significant success for this task, they all break down at high noise levels. We propose an approach based on neural networks, where prior information on the structure of the projection images is used to denoise the images. In an earlier project (NAISS 2025/5-397), we investigated various neural network architectures for multiple-image denoising, building on a conventional class-averaging method. A transformer architecture was found to perform well for this task, with the self-attention block serving to share information between the images. This resulted in improved performance when aligning and averaging small sets of cryo-EM images that had already been classified. The architecture was also able to classify images, automatically clustering them and aligning the images within each cluster. Given a large enough set of images of the same molecule, each with its own unknown viewing direction, the model was able to reduce the mean squared error by 40% compared to single-image methods. Additionally, the denoised images were used as input to an ab initio reconstruction algorithm (based on common lines), where they were found to yield higher-resolution reconstructions than the alternative denoising methods (DnCNN, U-Net).
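As a rough illustration of how a self-attention step can share information across a set of images, the sketch below lets each flattened image attend to every image in the set and replaces it with a similarity-weighted average. This is a toy under stated assumptions, not the architecture from the earlier project: the function name `set_attention` is hypothetical, the query, key, and value projections are taken to be the identity rather than learned weights, and the images are assumed to be already aligned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def set_attention(images):
    """One self-attention step over a set of flattened images.

    images: list of n images, each a list of d floats.
    Assumption: identity query/key/value projections (no learned weights).
    Returns n images, each a convex combination of the inputs.
    """
    d = len(images[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product factor
    outputs = []
    for query in images:
        # Similarity of this image to every image in the set.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in images]
        weights = softmax(scores)
        # Weighted average: similar images contribute more.
        mixed = [
            sum(w * img[p] for w, img in zip(weights, images))
            for p in range(d)
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical usage: three noisy copies of the same two-pixel "image".
noisy = [[1.1, -0.1], [0.9, 0.2], [1.0, -0.1]]
denoised = set_attention(noisy)
```

Because the attention weights are non-negative and sum to one, every output pixel lies between the smallest and largest input value at that pixel, which is the averaging behavior that makes multiple-image denoising possible.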
A further goal is to extend these results to experimental data. This will require fine-tuning the networks on experimental datasets and possibly leveraging more sophisticated (deeper) architectures. The next step is to use this denoiser as the first block in a pipeline for full 3D reconstruction based on method-of-moments estimators. Here, the moments of the denoised images would be fed into an equivariant reconstruction network, separately pretrained on synthetic projection images. By jointly training the whole pipeline of denoising and reconstruction, we will obtain a learned model for 3D ab initio modeling. Given its data-driven nature, such a model is expected to be more accurate than existing methods, which rely on standard frequentist estimation or incorporate only weak Gaussian priors on the reconstruction. The creation of such an end-to-end method would therefore have a significant impact on the field of 3D reconstruction in cryo-EM.
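To make the notion of image moments concrete, the following sketch computes the empirical first moment (the mean image) and second moment (the pixel-by-pixel correlation matrix) of a set of flattened images. It is only a minimal illustration: the function name `empirical_moments` is hypothetical, and the sketch omits steps that a practical method-of-moments pipeline would likely need, such as correcting the second moment for noise bias or expanding the images in a rotation-adapted basis.

```python
def empirical_moments(images):
    """Empirical first and second moments of a set of flattened images.

    images: list of n images, each a list of d floats.
    Returns (m1, m2), where m1[p] is the average of pixel p and
    m2[p][q] is the average of the product of pixels p and q.
    """
    n = len(images)
    d = len(images[0])
    m1 = [sum(img[p] for img in images) / n for p in range(d)]
    m2 = [
        [sum(img[p] * img[q] for img in images) / n for q in range(d)]
        for p in range(d)
    ]
    return m1, m2


# Hypothetical usage with two two-pixel images.
m1, m2 = empirical_moments([[1.0, 2.0], [3.0, 4.0]])
```

In the pipeline described above, quantities of this kind, computed from the denoised images, would be the input to the reconstruction network.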