1. Introduction
Fermentation technology is enjoying a significant moment, owing to the potential of metabolic engineering, systems biology, and synthetic biology [1]. Various economically important compounds, including chemicals, fuels, and biopharmaceuticals, can be obtained through fermentation processes. For any fermentation-based product to be commercialized, the amount of product obtained must meet market demand [2]. Optimization of the fermentation parameters (e.g., temperature, pH, medium composition, and feeding strategy) is therefore critical to the overall yield and productivity of a bioprocess. Furthermore, fermentation optimization can reduce the overall cost of a bioprocess through its impact on downstream processing and purification [3].
Various strategies have been used so far to find the optimal values of fermentation parameters. Modeling has always been one of the most popular, because it can replace expensive laboratory experiments or at least reduce their number. In this approach, the output is calculated as a function of the given inputs according to a specified algorithm [4]. For instance, the inputs can be media composition, temperature, and pH, and the values of these parameters are then optimized to obtain the desired output. Generally, three types of models are applied to such problems: purely mechanistic (knowledge-driven), purely data-driven, or a combination of the two [5]. Each of these approaches has its own advantages and disadvantages. For example, data-driven models are black-box models, which do not provide adequate information on the underlying mechanism [6]. Nevertheless, they can incorporate large datasets into a model framework smoothly. Mechanistic (hypothesis-driven) approaches, on the other hand, use basic knowledge to extract deeper information from datasets and provide valuable information on the underlying mechanism. Nonetheless, constructing these models is challenging because of the rapid growth of data. To construct the most powerful model, it is therefore crucial for the researcher to understand the strengths and weaknesses of these approaches. Moreover, a hybrid of the two approaches may yield the most powerful model [7].
Fermentation parameters have a significant effect on cellular metabolism and thus on productivity. Mechanistic analysis of the interaction between environmental conditions and metabolic pathways therefore allows fermentation parameters to be fine-tuned in a comprehensive way [8]. There are several mechanistic models for simulating metabolism in the field of systems biology [9]. Among them, constraint-based modeling (CBM) of metabolism is one of the most common approaches [10]. These models are built from a genome-scale metabolic network reconstruction and predict metabolic flux values through optimization techniques such as flux balance analysis (FBA) [11, 12]. To date, genome-scale metabolic models (GEMs) have been reconstructed for diverse eukaryotic and prokaryotic organisms and cells [13] and applied in biotechnology and human health [14, 15]. Such models have been used extensively for qualitative mapping of cellular metabolism, predicting metabolic functions, and guiding metabolic engineering designs and bioprocess optimizations toward the desired phenotype [16].
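For orientation, FBA is commonly written as a linear program. The formulation below is a sketch in conventional notation that the text above does not introduce: S is the stoichiometric matrix of the reconstructed network, v the vector of reaction fluxes, c the weights of the objective (often a biomass or product-formation reaction), and v_min and v_max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

In this setting, fermentation conditions typically enter through the bounds, for example as limits on the exchange fluxes for substrate or oxygen uptake, while the steady-state constraint S v = 0 encodes the mass balance of intracellular metabolites.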
In parallel, machine learning (ML) is a purely data-driven approach concerned with creating and evolving algorithms that identify patterns and generate hypotheses or models by learning from existing data [17, 18]. Because of the rapid increase in omics datasets, many researchers prefer to use machine learning on its own to interpret systems biology and metabolic engineering datasets. For instance, genome annotation, host strain selection, pathway discovery, metabolic pathway reconstruction, metabolic flux optimization, multi-omic data integration, and protein modeling can all be addressed with machine learning methods [3, 19]. In addition, because large amounts of fermentation parameter values are available from empirical studies, machine learning algorithms can be applied directly to this multivariate system to fine-tune the fermentation conditions [20, 21].
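The idea of tuning a fermentation condition directly from empirical data can be illustrated with a deliberately small sketch. It assumes a single input (temperature) and a single output (product titer), fits a quadratic response surface by least squares, and returns the temperature that maximizes the fitted curve. The data are synthetic and hypothetical, and the function names are illustrative only; published studies generally use multivariate models and richer algorithms.

```python
# Minimal sketch: fit a quadratic response surface to (temperature, titer)
# data and locate the temperature that maximizes the fitted curve.
# All data below are synthetic and hypothetical, generated for illustration.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    powers = [sum(x ** p for x in xs) for p in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear_system(a, b)


def optimal_input(xs, ys):
    """Return the maximizer of the fitted quadratic, clipped to the data range."""
    mean_x = sum(xs) / len(xs)
    # Centering the input improves the conditioning of the normal equations
    centered = [x - mean_x for x in xs]
    _, b1, b2 = fit_quadratic(centered, ys)
    if b2 >= 0:
        # No interior maximum: fall back to the best observed point
        return xs[max(range(len(ys)), key=lambda i: ys[i])]
    best = mean_x - b1 / (2 * b2)
    return min(max(best, min(xs)), max(xs))


# Hypothetical measurements: titer peaks near 30 degrees Celsius by construction
temperatures = [25.0, 28.0, 31.0, 34.0, 37.0]
titers = [10.0 - 0.1 * (t - 30.0) ** 2 for t in temperatures]
best_temperature = optimal_input(temperatures, titers)
print(round(best_temperature, 3))
```

A quadratic surrogate is used here only because it has a closed-form optimum; with several interacting parameters, the same fit-then-optimize pattern would normally rely on a multivariate model and a numerical optimizer.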
Although each of the two methods is finding ever more applications on its own, their complementary capabilities have motivated their integration into models with greater predictive power and accuracy. Comprehensive reviews of the integration of machine learning algorithms and mechanistic models have recently been published, indicating a promising outlook for this field [22-26]. However, it is still worthwhile to review the capabilities of these two methods, individually and in combination, specifically for fermentation parameter optimization. The basic idea is that machine learning is a powerful computational tool for analyzing individual omics datasets or inferring multi-omic relationships. Moreover, CBM produces an additional layer of omics data, called fluxomics, which can be analyzed by machine learning methods either separately or together with other omics data [26].
In the present review, we first highlight the latest efforts in the literature that use CBM as a mechanistic approach for fermentation optimization. Next, we introduce ML as a data-driven method and highlight its recent applications in tuning fermentation parameters. Finally, we present studies in which CBM and ML are combined to improve model accuracy in analyzing fermentation conditions.