Structural bioinformatics.
The 3D structures of biological molecules such as proteins and RNA are key to understanding their functions. This information, in turn, can lead to an understanding of disease mechanisms, viral activity, and many other important biological processes. Traditionally, X-ray crystallography has been used to study protein structures, which is like trying to guess an entire day's stock market movements from a single snapshot taken at 10:10 AM! Additionally, forming a useful protein crystal is a time-consuming and uncertain process. As a result, the 3D structures of only a small fraction of known proteins have been resolved over the years.
Pulse ESR.
Most common proteins have 1,000 or more atoms in their structures. Most of these atoms are not ESR-active, so they produce no ESR signal. That inspired the spin-labeling technique: the introduction of an ESR-active probe at a point of interest in a protein. In pulse ESR experiments on frozen samples, two such probes are introduced, and analysis of the signal yields the distribution of distances between the protein regions where the probes are located. The width of the distribution indicates the flexibility of these regions, while the peak positions show the average distances between them in the stable protein conformers. The cartoon shows how this technique can identify different 3D structures resulting from the sequence of molecular units. When both conformations are present, the distance distribution derived from pulse ESR reveals their relative proportions.
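To make the reading of a distance distribution concrete, here is a minimal sketch that models the distribution as a sum of two Gaussian peaks, one per conformer. The Gaussian shape, the numbers, and the function names are all illustrative assumptions, not measured data or the analysis method used in this work.

```python
import math

def gaussian(r, mean, width):
    """Normalized Gaussian peak centered at `mean` with standard deviation `width`."""
    return math.exp(-0.5 * ((r - mean) / width) ** 2) / (width * math.sqrt(2 * math.pi))

def distance_distribution(r, conformers):
    """Weighted sum of peaks; conformers is a list of (fraction, mean_nm, width_nm)."""
    return sum(f * gaussian(r, m, w) for f, m, w in conformers)

# Hypothetical sample: 70% in a rigid, compact conformer (narrow peak at 2.5 nm)
# and 30% in a flexible, extended conformer (broad peak at 4.5 nm).
conformers = [(0.7, 2.5, 0.15), (0.3, 4.5, 0.40)]

step = 0.01
grid = [1.0 + step * i for i in range(701)]          # 1 to 8 nm
p = [distance_distribution(r, conformers) for r in grid]

# Integrate the distribution, and the part below 3.5 nm, with a simple Riemann sum.
area = sum(p) * step
fraction_compact = sum(pi for r, pi in zip(grid, p) if r < 3.5) * step / area

print(round(area, 3), round(fraction_compact, 3))
```

The peak positions (2.5 and 4.5 nm) play the role of the average distances, the widths stand in for flexibility, and the area under each peak recovers the relative proportions of the two conformers.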
Relevance of pulse ESR in structural bioinformatics.
The distances (or distance distributions) measured by pulse ESR experiments are highly informative. For example, pulse ESR can show whether only one of the conformers in the above image is present, or whether the structure switches from one conformation to the other after substrate binding. Tracking such partial structural features or changes is vital to understanding a protein's role in a biological process. With the recent advent of machine learning and AI-driven protein structure prediction (e.g., AlphaFold), these distances can be used as structural constraints or to select the most probable 3D structure. This can significantly improve the accuracy of 3D structure predictions.
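As a rough illustration of using a measured distance to select among predicted structures, the sketch below ranks candidate models by how closely the distance between their two label sites matches the pulse ESR peak. The model names, coordinates, and the helper `rank_models` are hypothetical; this is not part of AlphaFold or any other prediction tool, and a real workflow would also have to account for the size and flexibility of the spin label itself.

```python
import math

def label_distance_nm(site_a, site_b):
    """Euclidean distance between two label-site coordinates given in nm."""
    return math.dist(site_a, site_b)

def rank_models(models, measured_nm):
    """Return model names sorted by |predicted distance - measured distance|."""
    return sorted(
        models,
        key=lambda name: abs(label_distance_nm(*models[name]) - measured_nm),
    )

# Hypothetical candidate structures: name -> (label site A, label site B), in nm.
models = {
    "open":   ((0.0, 0.0, 0.0), (4.5, 0.0, 0.0)),   # sites 4.5 nm apart
    "closed": ((0.0, 0.0, 0.0), (2.0, 1.5, 0.0)),   # sites 2.5 nm apart
}

ranking = rank_models(models, measured_nm=2.6)
print(ranking)
```

With a measured peak near 2.6 nm, the model whose label sites are 2.5 nm apart ranks first.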
Pulse ESR data analysis challenges.
Determining the distance distribution between a pair of spin probes in a protein from its pulse ESR signal can be tricky. It requires an accurate (yet rapid) simulation of the experimental signal and a robust matrix inversion algorithm. An accurate inversion algorithm is available, but the accuracy of the simulations is often compromised to reduce the overall data analysis time. A typical pulse ESR experiment involves multiple pulses and more than 100 time-domain data points. On top of that, the distance vector between the spin probes has a wide range of orientations relative to the laboratory frame of reference in a frozen sample. Hence, for a 4-pulse experiment with a time-domain size of 128 and 100 orientations of the distance vector, the numerical calculation must be repeated 51,200 (4 x 128 x 100) times to simulate the signal. That means a simulation will take more than an hour even if each of the 51,200 calculations requires only 1/10th of a second. A highly accurate numerical simulation would require more parameters and would thus make the data analysis more time-intensive than running the actual experiments.
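The cost estimate above is simple arithmetic, and a short sketch makes it easy to try other settings. The function name and parameters are illustrative only; the inputs are the 4 x 128 x 100 case and the 1/10th of a second per calculation quoted in the text.

```python
def simulation_cost(n_pulses, n_time_points, n_orientations, seconds_per_calc):
    """Return (number of calculations, total seconds) for a brute-force simulation."""
    n_calcs = n_pulses * n_time_points * n_orientations
    return n_calcs, n_calcs * seconds_per_calc

n_calcs, total_seconds = simulation_cost(
    n_pulses=4, n_time_points=128, n_orientations=100, seconds_per_calc=0.1
)

print(n_calcs)                       # number of repeated calculations
print(round(total_seconds / 60))     # total time in minutes
```

This gives 51,200 calculations and about 85 minutes, in line with the "more than an hour" estimate. Because the count is a plain product, doubling the number of orientations or time points doubles the run time.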
Solved! An analytical expression that works.
We used UNIDYN, a Mathematica package, to obtain analytical expressions (kernels) for the two most important pulse ESR techniques. The expressions are very long, but the simulations are rapid, taking 2 to 6 minutes on a standard computer. The results are highly accurate, and as a result, these kernels produce robust distance distributions with low uncertainty. Previously, distances shorter than 2 nanometers were not reproduced accurately by the standard but simplified signal expression. With the improved kernels, we were able to measure distances in the range of 1 to 8 nm, achieving the theoretical limits of these techniques.
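For context, the standard but simplified signal expression is commonly written as an orientation average of cos[(1 - 3 cos^2(theta)) * w_dd * t], where the dipolar frequency w_dd scales as 1/r^3. The sketch below implements that textbook form only; it is not the UNIDYN-derived kernel described here. The dipolar constant of about 52.04 MHz nm^3 (two electron spins with g close to 2) and all function names are assumptions made for illustration.

```python
import math

# Assumed dipolar constant for two electron spins with g close to 2, in MHz * nm^3.
DIPOLAR_CONST_MHZ_NM3 = 52.04

def dipolar_frequency(r_nm):
    """Angular dipolar frequency in rad/us for an inter-spin distance r in nm."""
    return 2.0 * math.pi * DIPOLAR_CONST_MHZ_NM3 / r_nm ** 3

def simplified_kernel(t_us, r_nm, n_orient=200):
    """Orientation-averaged kernel K(t, r), midpoint rule over z = cos(theta) in [0, 1]."""
    w = dipolar_frequency(r_nm)
    total = 0.0
    for k in range(n_orient):
        z = (k + 0.5) / n_orient
        total += math.cos((1.0 - 3.0 * z * z) * w * t_us)
    return total / n_orient

# Dipolar frequency in MHz at the two ends of the 1 to 8 nm range.
f_short = dipolar_frequency(1.0) / (2.0 * math.pi)
f_long = dipolar_frequency(8.0) / (2.0 * math.pi)
print(f_short, f_long)   # roughly 52 MHz versus roughly 0.1 MHz

# Kernel trace for a 3 nm distance, sampled every 0.05 us.
trace = [simplified_kernel(0.05 * i, 3.0) for i in range(40)]
```

The 1/r^3 dependence means the modulation frequency changes by a factor of 512 between 1 and 8 nm, which illustrates why a single approximate expression is hard-pressed to cover both ends of the range.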
What next?
ALS, or Lou Gehrig's disease, is a neurodegenerative disease, which means that the disease progression is irreversible. The famous physicist and popular science writer Stephen Hawking had ALS, but he was one of the rare cases in which the patient lived a long life. For most patients, the life expectancy is about 5 years. Unfortunately, there are not many therapeutic options available to slow the disease progression significantly. The prevalence of ALS in the US is about 5 to 7 cases per 100,000 people, or 17,500 to 24,500 cases for a population of 350 million. However,