INTRODUCTION
On a global scale and from an electron perspective, all organisms are electronic half-cells, powered by circuits plugged into electron sources and sinks in the environment [1-3]. For example, in aerobic respiration, the pathway most familiar to us because it is our own source of energy, the oxidation of organic matter to CO2 drives a flux of electrons and protons through metabolic pathways to reduce oxygen to water. This, like all metabolic pathways, is a half-cell in terms of chemical oxidation-reduction reactions. In the case of aerobic respiration, the other half-cell is oxygenic photosynthesis, where sunlight is used to oxidize water and the resulting electrons and protons drive the reduction of CO2 to organic matter. The potential difference between the anode (e.g., organic matter; in its simplest form, sugars) and the cathode (e.g., oxygen) is over 1 volt. That is the most energy available for life on this planet, but life existed long before there was molecular oxygen.
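To make the link between a potential difference and the energy it makes available concrete, the standard relation ΔG = -nFΔE can be evaluated directly. The sketch below is illustrative only: the function name and the example inputs (1 V, one or four electrons) are assumptions chosen for the demonstration, not values taken from the studies cited above.

```python
# Faraday constant in coulombs per mole of electrons (CODATA value).
FARADAY_C_PER_MOL = 96485.33212


def free_energy_kj_per_mol(delta_e_volts: float, n_electrons: int = 1) -> float:
    """Return Delta G in kJ/mol from a potential difference, using Delta G = -n F Delta E."""
    return -n_electrons * FARADAY_C_PER_MOL * delta_e_volts / 1000.0


# Illustrative inputs (assumed for the example, not measured values):
# one electron moving across a 1 V potential difference.
print(round(free_energy_kj_per_mol(1.0), 1))     # -96.5
# four electrons across the same difference, as in reducing one O2 to two H2O.
print(round(free_energy_kj_per_mol(1.0, 4), 1))  # -385.9
```

Each electron transferred across 1 V therefore corresponds to roughly 96.5 kJ per mole of free energy released, which is why the size of the potential difference sets the energy budget of a metabolism.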
In deep time, a set of enzymes evolved to facilitate electron transport: the oxidoreductases, or EC 1 proteins. Biological electronic circuits require the movement of electrons over sub-nanometer distances through an electron transfer chain that powers life. The movement of electrons is governed by physical laws [4-6]. Oxidoreductases organize the positions and relative energetics of chains of redox-active cofactors, assuring the rapid, directional flow of electrons [7]. The energetic tendency of a redox-active group to gain or lose electrons can be experimentally measured as the redox potential, expressed in volts (V) relative to a reference such as the standard hydrogen electrode, at a standard pH. Redox-active groups that contribute to the redox potential can be cofactors such as iron-sulfur clusters, hemes, or flavins, or amino acid residues such as cysteine, methionine, or tryptophan. The relative stability of cofactor oxidation states is largely determined by the cofactor itself [8] but is further modulated by the protein matrix. Electrostatic interactions, such as the proximity of positively charged basic amino acids, can stabilize a redox cofactor in the reduced state [9, 10]. The protein can also modulate oxidation-reduction energetics through hydrogen bonding [11, 12], hydration [13], and dynamical features [14] of the protein-cofactor environment. Groups of oxidoreductases form metabolic pathways, powering cellular-scale circuits in which the current depends on the rate of catalysis and the diffusion of substrates [15]. Studying how the protein environment modulates the energetics of oxidation-reduction reactions is therefore critical to understanding how electron transfer is coupled to metabolism.
The connection between oxidoreductase structure and energetics is central to the deep-time evolution of metabolism. Oxidoreductases must have been among the first proteins at the origins of life over 3.5 billion years ago, providing the spark for metabolism [2, 16-20]. Because metabolism is fundamentally electrical in nature, its evolution, and that of the associated oxidoreductases, was strongly coupled with changes in the redox state of the planet, which has become increasingly oxidized over time due to both geochemical and biological processes [2, 21, 22].
Modern oxidoreductases are massive nanomachines, far too complex to have arisen early in the evolution of metabolism. Various structure-based bioinformatics approaches have been applied to identify universal sub-folds or domains within larger proteins that may have derived from early protein forms [16, 17, 23-32]. In previous work focused on the evolution of oxidoreductases, we found that modern, large enzymes were largely derived from just a few minimal protein-cofactor building blocks [16, 17, 33]. In addition to identifying core cofactor-binding folds, we used a structure-derived criterion for electron transfer based on cofactor-cofactor distances [7] to map a network of electron transfer pathways between the different folds, which we refer to as the Spatial Adjacency Network (SpAN). A notable feature of the SpAN was the abundance of more reducing cofactor-binding folds in the network center and more oxidizing cofactor-binding folds at the periphery [17]. This suggests a time axis in the SpAN running from the center to the periphery of the network, reflecting the adaptation of protein redox energetics to emerging electron sources and sinks made available by an oxidizing planetary environment over geologic time. Mapping quantitative estimates of protein redox energetics onto the SpAN could allow us to constrain the age of various protein folds based on redox information in the geologic record [2, 34, 35].
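The idea of linking folds by a cofactor-cofactor distance criterion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published SpAN procedure: the 14 Å cutoff, the use of a single representative coordinate per cofactor, the function name, and the example coordinates are all hypothetical choices made for the example.

```python
from itertools import combinations
from math import dist

# Hypothetical cutoff in angstroms; the text above does not state the value used.
CUTOFF_ANGSTROM = 14.0


def build_adjacency(cofactors, cutoff=CUTOFF_ANGSTROM):
    """Link folds whose bound cofactors lie within `cutoff` of each other.

    `cofactors` is a list of (fold_name, (x, y, z)) tuples for one structure,
    with one representative coordinate per cofactor (a simplification).
    Returns a set of frozensets, each an undirected edge between two folds.
    """
    edges = set()
    for (fold_a, xyz_a), (fold_b, xyz_b) in combinations(cofactors, 2):
        if fold_a != fold_b and dist(xyz_a, xyz_b) <= cutoff:
            edges.add(frozenset((fold_a, fold_b)))
    return edges


# Invented coordinates for illustration only (not from any real structure).
example = [
    ("fold_A", (0.0, 0.0, 0.0)),
    ("fold_B", (10.0, 0.0, 0.0)),
    ("fold_C", (30.0, 0.0, 0.0)),
]
print(build_adjacency(example))  # only fold_A and fold_B are within the cutoff
```

Pooling such edges over many structures would give a fold-level network of the kind described above; annotating each node with a redox potential is the step that a curated dataset of measured potentials makes possible.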
Computational prediction of redox energetics from protein structures is an ongoing challenge. Current methods span many levels of theory, from quantum-mechanical to empirical [36], and include recent advances using machine learning [37]. Site-directed mutagenesis studies on natural oxidoreductases [38-40] and protein engineering [41-44] have been used to test molecular hypotheses of how the protein environment tunes redox energetics. Large datasets of protein structures, including oxidoreductases, are on the horizon, driven by advances in functional annotation from genomic and metagenomic datasets [20, 45] combined with recent advances in structure prediction [46-48], including bound cofactors [49]. Effective models that can predict redox energetics from structural information will become increasingly valuable for understanding bioenergetics, the evolution of metabolism, and the engineering of bioelectronic pathways [42, 50].
Motivated by the need to design and train better models and by the goal of mapping redox energetics onto the SpAN to study oxidoreductase evolution, we developed ProtReDox, a manually curated database of protein redox potentials. We examined literature reports of oxidoreductase energetics and identified the cofactor type, redox potential, UniProt and PDB (if available) identifiers, and experimental metadata such as potentiometric measurement technique, pH, and buffer conditions. ProtReDox version one is available at https://protein-redox-potential.web.app. We apply this dataset to explore how redox energetics are modulated by cofactor type, protein environment, and experimental conditions, and finally how energetics mapped onto the SpAN inform geochemical constraints on deep-time oxidoreductase evolution.
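The fields listed above suggest a simple record structure. The sketch below shows one way such an entry might be represented and summarized by cofactor type. It is an assumption-laden illustration: the class name, field names, units (mV), and the placeholder records are hypothetical and do not reflect the actual ProtReDox schema or its contents.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class RedoxRecord:
    """One curated measurement (hypothetical field names and units)."""
    cofactor_type: str
    potential_mv: float          # assumed unit: mV versus the standard hydrogen electrode
    uniprot_id: str
    pdb_id: Optional[str]        # None when no structure is available
    technique: str
    ph: Optional[float] = None
    buffer: Optional[str] = None


def mean_potential_by_cofactor(records):
    """Group records by cofactor type and return the mean potential of each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.cofactor_type].append(record.potential_mv)
    return {cofactor: mean(values) for cofactor, values in grouped.items()}


# Placeholder entries with invented values and identifiers, for illustration only.
records = [
    RedoxRecord("heme", 100.0, "UNIPROT_PLACEHOLDER_1", None, "potentiometric titration", 7.0),
    RedoxRecord("heme", 200.0, "UNIPROT_PLACEHOLDER_2", "PDB_PLACEHOLDER", "voltammetry", 7.0),
    RedoxRecord("flavin", -200.0, "UNIPROT_PLACEHOLDER_3", None, "potentiometric titration", 7.5),
]
print(mean_potential_by_cofactor(records))  # {'heme': 150.0, 'flavin': -200.0}
```

Keeping pH, buffer, and technique alongside each potential matters because measurements of the same protein under different conditions are not directly comparable without that metadata.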