CN113380345A - Organic chemical coupling reaction yield prediction and analysis method based on deep forest

Info

Publication number: CN113380345A
Application number: CN202110761921.XA
Authority: CN
Inventors: 彭李超; 杨晓慧; 穆雪纯; 董晶; 邹雪艳; 孙磊
Original assignee: Henan University
Current assignee: Henan University
Priority date: 2021-06-08
Filing date: 2021-07-06
Publication date: 2021-09-10

Abstract

The invention discloses a deep-forest-based method for predicting and analyzing the yield of organic chemical coupling reactions. The method comprises calculating feature descriptors, building a model, and performing intelligent regression and classification prediction of the yield, specifically: 1) calculating the feature descriptors of each coupling reaction component with chemical software and converting them into one-dimensional data; 2) building a deep forest model and training it on the feature descriptors, with user-adjusted parameters to reach the best prediction performance; the method combines the feature-learning idea of deep learning with ensemble learning, so that chemical reactions are predicted efficiently; 3) performing regression and classification prediction of the yield with the trained model and analyzing the prediction results; the importance of the one-dimensional feature descriptors is calculated and their influence on the yield is analyzed, giving users more reliable decision information for production experiments. The method can help chemists predict yields quickly while saving cost.

Description

Organic chemical coupling reaction yield prediction and analysis method based on deep forest

Technical Field

The invention belongs to the field of organic synthesis based on pattern recognition and artificial intelligence, and particularly relates to a method for predicting and analyzing yield of organic chemical coupling reaction based on deep forest.

Background

A coupling reaction is a process in which two organic chemical units (molecules) undergo a chemical reaction to give one organic molecule; it includes cross-coupling and self-coupling reactions. Cross-coupling reactions offer high efficiency and mild reaction conditions and are often used in organic synthesis. As atoms and molecules can be manipulated ever more efficiently, materials that were difficult or even impossible to synthesize can now be made readily. Coupling reactions are used mainly in natural product synthesis, materials science, pesticide chemistry, ligand synthesis and related fields, so increasing their yield benefits both industrial production and everyday life. To prepare coupling reaction products effectively while reducing consumption, the yield of the coupling reaction needs to be predicted more accurately, the important factors influencing the yield need to be explored, and more reliable decision information needs to be provided for production experiments.

Coupling reactions have developed rapidly over the past century. In 1903, Ullmann's group constructed a C-N bond through the coupling of an aryl halide with an amine; in 1972, Richard F. Heck discovered that palladium catalysts can link carbon atoms under milder conditions; in 1983, Migita et al. reported the first palladium-catalyzed reaction forming a C(sp2)-N bond; in 1995, the research groups of Stephen L. Buchwald and John F. Hartwig almost simultaneously discovered a palladium-catalyzed coupling of aryl bromides with amines that does not require organotin compounds; in 2010, Richard F. Heck, Ei-ichi Negishi and Akira Suzuki were awarded the Nobel Prize in Chemistry for developing palladium-catalyzed cross-coupling in organic synthesis.

Palladium catalysts are now widely used and commercially available, but their preparation cost still needs to be reduced for large-scale production. Traditional experimental chemistry suffers from high reaction cost, long reaction cycles, complicated procedures and poor use of experimental data. When aryl amines are prepared by the Buchwald-Hartwig coupling reaction, palladium is expensive and toxic, and by-products such as aromatic compounds can form during the reaction, which lowers the yield. To solve these problems, chemists hope to find a scientific and intelligent way to predict organic chemical synthesis.

In recent years, with the rapid development of machine learning and given the multidimensional nature of chemical structure and reactivity, more and more researchers have applied machine learning algorithms to organic synthesis and chemical property prediction in order to improve coupling reaction yields while saving resources. In 2018, Doyle et al. achieved high-accuracy prediction of Buchwald-Hartwig coupling reaction yields with a random forest algorithm, demonstrating that machine learning can predict reaction outcomes in a multidimensional chemical space from data obtained by high-throughput experiments. There is therefore an urgent need for a method for predicting and analyzing the yield of organic chemical coupling reactions that extracts feature descriptors, converts the coupling reaction into one-dimensional data, and uses machine learning on a computer to rapidly mine the correlations among complex reaction conditions in chemical experiments, thereby reducing the consumption of human and chemical resources, helping chemists make sound analyses and predictions, and promoting research and development in organic chemical synthesis.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention aims to provide a deep-forest-based method for predicting and analyzing the yield of organic chemical coupling reactions. Through user-adjusted parameters, the method quickly reaches the best prediction performance for the coupling reaction yield with high accuracy, identifies the important features that influence the yield, and helps chemists predict the yield while saving cost.

In order to achieve the purpose, the invention adopts the technical scheme that:

A deep-forest-based method for predicting and analyzing the yield of organic chemical coupling reactions comprises calculation of feature descriptors, building of a model, and intelligent regression and classification prediction of the yield:

1) calculation of feature descriptors: the feature descriptors of each coupling reaction component are calculated with chemical software and converted into one-dimensional data for subsequent model training;

2) building of the model: a deep forest model is built and trained on the feature descriptors, and the best prediction performance is reached through user-adjusted parameters; the method combines the feature-learning idea of deep learning with ensemble learning to predict chemical reactions efficiently;

3) intelligent regression and classification prediction of the yield: the yield is predicted with the trained deep forest model and the results are analyzed, including yield prediction analysis and importance analysis of the feature descriptors.

The specific implementation steps of the calculation of the feature descriptors include:

(1.1) introducing chemical reactants and reagents into an interface of chemical software, wherein the software automatically calculates a characteristic descriptor of each coupling reaction component and converts the chemical reactants into one-dimensional data;

(1.2) dividing the one-dimensional feature descriptors into a training set and a testing set, and respectively matching the training set and the testing set with corresponding yield or a category to which the yield belongs;

The specific implementation steps of building the model include:

(2.1) reading and preprocessing the training set, and selecting the deep forest model for regression prediction or classification prediction as required;

(2.2) performing regression prediction of the coupling reaction yield with the deep forest algorithm. The training set is imported into the cascade layers for feature learning; the predictions of each random forest in a cascade layer are concatenated with the original features and used as input to the next cascade layer, and training continues in this way. Each time a layer is added, the mean square error of the whole cascade is estimated on a validation set; if there is no significant gain, or the maximum number of layers is reached, training stops, so the number of cascade layers is determined automatically. The predictions of all random forests in the last layer are averaged to give the final predicted value; the prediction result is then output, the best result is selected by adjusting parameters, and the model is saved;

(2.3) performing classification prediction of the category to which the coupling reaction yield belongs with the deep forest algorithm. The training set is imported into the cascade layers for feature learning; the class probability vectors of each random forest in a cascade layer are concatenated with the original features and used as input to the next cascade layer, and training continues in this way. Each time a layer is added, the prediction accuracy of the whole cascade is estimated on a validation set; if there is no significant gain, or the set maximum number of layers is reached, training stops, so the number of cascade layers is determined automatically. The class probability vectors output by all random forests in the last layer are averaged, and the category with the maximum class probability is the final predicted category; the prediction result is then output, the best result is selected by adjusting parameters, and the model is saved;

(2.4) performing out-of-sample prediction with the trained model; if the out-of-sample prediction is accurate, the validity of the model is verified, showing that the method can effectively predict the coupling reaction yield;

(2.5) the user can adjust the parameters according to the prediction performance and their own requirements; if not satisfied, the user can adjust the type and number of forests in the deep forest, the number of decision trees in each forest and the maximum depth of the deep forest, and return to step (2.3) until satisfied.

Wherein, the specific calculation process of the step (2.2) comprises the following steps:

The deep forest model has K cascade layers, each composed of L forests. The training samples input to the k-th cascade layer are $(x^{k}, y)$, $k = 0, 1, \ldots, K$, where $x^{k}$ is the feature vector of a training sample input to the k-th cascade layer and $y$ is the true yield corresponding to that feature vector. The input feature $x^{k}$ received by the k-th cascade layer is the concatenation of the original feature $x^{0}$ with the output of the (k-1)-th cascade layer, so the combined features are expressed as:

$$x^{k} = (f_k(x^{k-1}),\ x^{0}),$$

where $f_k(x)$ denotes the real-valued output obtained by training on the feature $x$ at the k-th cascade level.

The final predicted value is the average of the predicted values of all forests in the last cascade layer.
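The regression cascade described in step (2.2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes scikit-learn forests as stand-ins, and the forest types, tree counts, tolerance `tol`, synthetic data and the helper names `make_forests`, `fit_cascade` and `predict_cascade` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict, train_test_split


def make_forests(seed):
    # Two forests per layer; the types and sizes are illustrative assumptions.
    return [
        RandomForestRegressor(n_estimators=30, random_state=seed),
        ExtraTreesRegressor(n_estimators=30, random_state=seed + 1),
    ]


def fit_cascade(X_train, y_train, X_val, y_val, max_layers=5, tol=1e-4):
    """Add cascade layers until the validation MSE stops improving."""
    layers, best_mse = [], np.inf
    aug_train, aug_val = X_train, X_val
    for k in range(max_layers):
        forests = make_forests(seed=10 * k)
        # Out-of-fold predictions become the extra features of the next layer.
        train_out = np.column_stack(
            [cross_val_predict(f, aug_train, y_train, cv=3) for f in forests]
        )
        for f in forests:
            f.fit(aug_train, y_train)
        val_out = np.column_stack([f.predict(aug_val) for f in forests])
        mse = mean_squared_error(y_val, val_out.mean(axis=1))
        if best_mse - mse < tol:
            break  # no significant gain: discard this layer and stop
        best_mse = mse
        layers.append(forests)
        # x^k = (f_k(x^{k-1}), x^0): layer outputs joined with original features
        aug_train = np.hstack([train_out, X_train])
        aug_val = np.hstack([val_out, X_val])
    return layers, best_mse


def predict_cascade(layers, X):
    aug = X
    for forests in layers:
        out = np.column_stack([f.predict(aug) for f in forests])
        aug = np.hstack([out, X])
    return out.mean(axis=1)  # average over the forests of the last kept layer


# Synthetic stand-in for descriptor vectors and yields (not real reaction data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] * X[:, 2] + rng.normal(size=300)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
layers, val_mse = fit_cascade(X_tr, y_tr, X_val, y_val)
pred = predict_cascade(layers, X_val)
```

The sketch reuses the same validation set for stopping and for the reported error, which is enough to show the mechanism; a real study would keep a separate test set for the final evaluation.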

the specific calculation process of the step (2.3) comprises the following steps:

The deep forest model has M cascade layers, each composed of N forests. The training samples input to the m-th cascade layer are $(x^{m}, C)$, $m = 0, 1, \ldots, M$, where $x^{m}$ is the feature vector of a training sample input to the m-th cascade layer and $C$ is the category corresponding to that feature vector. The input feature $x^{m}$ received by the m-th cascade layer is the concatenation of the original feature $x^{0}$ with the class probability vector output by the (m-1)-th cascade layer, so the combined features are expressed as:

$$x^{m} = (p_m(x^{m-1}),\ x^{0}),$$

where $p_m(x)$ denotes the class probability vector obtained by training on the feature $x$ at the m-th cascade level.

The final class probability vector is the average of the class probability vectors predicted by all forests in the last cascade layer.

If the training samples have $c$ classes in total, then $p(x) = (p_1(x), p_2(x), \ldots, p_c(x))$, and the predicted category is the one with the maximum probability in the class probability vector.
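The averaging and maximum-probability rule described above can be written out as follows. This is a sketch rather than the patent's own formula: the superscript $(n)$ indexing the n-th forest of the last layer, the bar over the averaged vector and the symbol $\hat{C}$ for the predicted category are notation assumed here for illustration.

```latex
% Average the class probability vectors of the N forests in the last (M-th) layer
\bar{p}(x) = \frac{1}{N} \sum_{n=1}^{N} p_M^{(n)}\!\left(x^{M-1}\right),
\qquad
\bar{p}(x) = \bigl(\bar{p}_1(x), \bar{p}_2(x), \ldots, \bar{p}_c(x)\bigr)

% Predicted category: the index of the largest averaged probability
\hat{C} = \arg\max_{i \in \{1, \ldots, c\}} \bar{p}_i(x)
```

The regression case has the same form, with the real-valued forest outputs averaged in place of probability vectors and no maximum taken.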

3) the intelligent regression and classification prediction of the yield specifically comprises the following steps:

(3.1) intelligent regression prediction of yield, namely, introducing a training set and a corresponding yield into a cascade layer for feature learning, and averaging predicted values of each random forest in the last layer of cascade when a model stops training to obtain a final prediction result;

(3.2) intelligent classification prediction of yield, namely, introducing a training set and corresponding yield categories into a cascade layer for feature learning, and averaging class probability vectors output by all random forests in the last layer when a model stops training, wherein the category to which the maximum class probability belongs is the final prediction category;

(3.3) calculating the importance ranking of the one-dimensional feature descriptors with the deep forest algorithm, thereby finding the descriptors that have a significant influence on the reaction yield and providing reliable decision information for users carrying out organic chemical coupling reactions.

The invention has the following beneficial effects:

1. The invention provides a deep-forest-based intelligent prediction method that addresses the lack of feature learning in traditional machine learning when predicting the yield of the Buchwald-Hartwig coupling reaction. The model adopts the feature-learning idea of deep learning, so that the machine automatically learns useful data and features through the algorithm, and uses ensemble learning to increase model complexity and thus improve prediction accuracy; the user can adjust the parameters to reach the best prediction performance. The importance ranking of the feature descriptors is also calculated, providing reliable decision information for users carrying out organic chemical coupling reactions. The method can help chemists make sound analyses and predictions and achieve organic synthesis quickly while saving cost.

2. The deep forest algorithm self-adaptively adjusts the complexity of the model through training, the hyper-parameters have good robustness, and good results can be obtained even by using default parameters.

3. The method for predicting and analyzing the yield of the organic chemical coupling reaction based on the deep forest is simple to operate and easy to implement, and a user can quickly obtain a relatively accurate analysis result.

Drawings

FIG. 1 is a diagram of the reaction equations and reaction components of a chemical reaction in an example of the present invention;

FIG. 2 is a flow chart of an analysis method of the present invention.

Labels in FIG. 1: the Buchwald-Hartwig coupling reaction equation and its reaction components; Aryl halide: aryl halide; Base: base; Ligand: ligand; Additive: additive.

Detailed Description

As shown in FIG. 1, the invention provides a method for predicting and analyzing yield of organic chemical coupling reaction based on deep forest, which comprises 1) calculation of feature descriptors, 2) construction of a model, and 3) intelligent regression and classification prediction of yield.

Wherein, the step 1) of calculating the feature descriptor specifically comprises:

(1.1) importing all reaction components of the Buchwald-Hartwig coupling reaction shown in FIG. 1 (23 additives, 15 halides, 3 bases and 4 ligands) into chemical software, which automatically calculates and extracts the one-dimensional descriptors of each reaction component, finally giving 120 feature descriptors and converting the chemical reactants into one-dimensional data;

(1.2) after removing invalid reactions and the few reactions with missing values, using the remaining 3955 reactions as experimental data. The feature descriptors in the one-dimensional data corresponding to these reactions are divided into a training set (70%) and a test set (30%), and each is matched with the corresponding yield or the category to which the yield belongs;
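The 70%/30% division in step (1.2) can be sketched as below. This is only an illustration under stated assumptions: the records are placeholders rather than real descriptors, a simple random shuffle is assumed because the source does not say how the split was drawn, and `split_reactions` is a hypothetical helper name.

```python
import random


def split_reactions(records, train_fraction=0.7, seed=0):
    """Shuffle reaction records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Placeholder records of the form (descriptor vector, yield); real descriptors
# would come from the chemistry software, with 120 values per reaction.
records = [([float(i), float(i % 7)], float(i % 100)) for i in range(3955)]
train, test = split_reactions(records)
```

Keeping each descriptor vector and its yield in one record means the pairing survives the shuffle, which is what step (1.2) requires when it matches each set with its yields.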

step 2) model construction of the deep forest, as shown in fig. 2, comprising the steps of:

(2.1) reading the training set preprocessed in the step (1.2), respectively matching the training set with corresponding yield and the category to which the yield belongs, and selecting a deep forest model to perform regression prediction or classification prediction according to requirements;

(2.2) performing regression prediction of the coupling reaction yield with the deep forest algorithm. The 3955 reactions from step (1.2) are used as experimental data and imported into the deep forest model to train on the feature descriptors. Each time a cascade layer is added, the mean square error of the whole cascade is estimated on the validation set; if there is no significant gain, or the set maximum number of layers is reached, training stops. The best prediction performance is reached through user-adjusted parameters, and the yield is then predicted by regression. The specific calculation is:

$$x^{k} = (f_k(x^{k-1}),\ x^{0}),$$

(2.3) performing classification prediction of the category to which the coupling reaction yield belongs with the deep forest algorithm. The 3955 reactions from step (1.2) are used as experimental data, and the 1/4 quantile and 3/4 quantile of the yields are taken as thresholds: yields at or below the 1/4 quantile are low yields, yields between the 1/4 and 3/4 quantiles are medium yields, and yields at or above the 3/4 quantile are high yields. The reaction data are imported into the deep forest model to train on the feature descriptors. Each time a cascade layer is added, the prediction accuracy of the whole cascade is estimated on the validation set; if there is no significant gain, or the maximum number of layers is reached, training stops, so the number of cascade layers is determined automatically. The class probability vectors output by all random forests in the last layer are averaged, and the category with the maximum class probability is the final predicted category; the prediction result is then output, the best result is selected by adjusting parameters, and the model is saved. The specific calculation is:

$$x^{m} = (p_m(x^{m-1}),\ x^{0}),$$

(2.4) selecting some of the additives (the 18th, 19th, 21st and 22nd in FIG. 1) for out-of-sample prediction with the trained model; if the out-of-sample prediction is accurate, the validity of the model is verified, showing that the method can effectively predict the coupling reaction yield.

(2.5) the user can adjust the parameters according to the prediction performance and their own requirements and check the prediction results of steps 2) and 3); if the results are not satisfactory, the user can adjust the type and number of forests in the deep forest, the number of decision trees in each forest and the maximum depth of the deep forest, and return to step (2.3) until satisfied.
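The quartile-based labelling of yields used in step (2.3) can be sketched as below. This is a hedged illustration: the source leaves the handling of yields exactly equal to a threshold ambiguous, so the tie rule here (at or below the first quartile is low, at or above the third quartile is high) is an assumption, as are the helper name `label_yields` and the toy yield values.

```python
import statistics


def label_yields(yields):
    """Label each yield as low, medium or high using quartile thresholds."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(yields, n=4)
    labels = []
    for value in yields:
        if value <= q1:
            labels.append("low")      # at or below the 1/4 quantile
        elif value >= q3:
            labels.append("high")     # at or above the 3/4 quantile
        else:
            labels.append("medium")   # strictly between the two thresholds
    return labels


# Toy yields from 1 to 100 percent, used only to exercise the function.
toy_yields = [float(v) for v in range(1, 101)]
toy_labels = label_yields(toy_yields)
```

By construction roughly a quarter of the reactions fall in each of the low and high classes and about half in the medium class, so the three classes are not balanced.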

Step 3), the intelligent regression and classification prediction of the yield, trains on the feature descriptors with the deep forest algorithm, lets the machine automatically learn useful data and features through the algorithm, and reaches the best prediction performance through user-adjusted parameters, so as to predict the value or the category of the yield. The specific implementation steps include:

(3.1) intelligent regression prediction of the yield: the training set and the corresponding yields are imported into the cascade layers for feature learning; when the model stops training, the predicted values of the random forests in the last cascade layer are averaged to give the final prediction result, and the coefficient of determination (R-squared, R²) and the root mean square error (RMSE) of the regression prediction are output to evaluate the prediction performance of the model;

(3.2) intelligent classification prediction of the yield: the training set and the corresponding yield categories are imported into the cascade layers for feature learning; when the model stops training, the class probability vectors output by all random forests in the last layer are averaged, and the category with the maximum class probability is the final predicted category; the classification accuracy and the kappa statistic of the classification prediction are output to evaluate the prediction performance of the model;

(3.3) the importance of the one-dimensional feature descriptors is calculated with the deep forest algorithm and the descriptors are ranked accordingly; the ranking identifies the descriptors with a significant influence on the reaction yield, supports mining and analysis of the underlying patterns, and provides reliable decision information for users carrying out organic chemical coupling reactions.
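The four evaluation measures named in steps (3.1) and (3.2) can be computed as in the sketch below. It uses the standard textbook definitions of R², RMSE, accuracy and Cohen's kappa; the function names and the toy values are hypothetical and do not reproduce the patent's data or results.

```python
import math
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def accuracy(c_true, c_pred):
    """Fraction of samples whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(c_true, c_pred)) / len(c_true)


def cohen_kappa(c_true, c_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(c_true)
    p_o = accuracy(c_true, c_pred)
    true_counts, pred_counts = Counter(c_true), Counter(c_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1.0 - p_e)


# Toy values chosen only to exercise the functions.
toy_r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
toy_rmse = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
toy_kappa = cohen_kappa(
    ["low", "low", "medium", "medium", "high", "high"],
    ["low", "low", "medium", "high", "high", "high"],
)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside plain accuracy when the classes are of unequal size.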

The deep forest model adaptively adjusts its complexity through training, and its hyper-parameters are robust, so good results can be obtained even with default parameters. The user can adjust the parameters and check the results on the test set; once the results are satisfactory, parameter adjustment stops and the prediction results are output.

Compared with traditional machine learning algorithms, the deep forest algorithm gives more accurate predictions: it adopts the feature-learning idea of deep learning, so that the machine automatically learns useful data and features through the algorithm, and it uses ensemble learning to increase model complexity and thus improve prediction accuracy. Compared with general deep learning algorithms, the deep forest has fewer hyper-parameters and is robust to their values, so good predictions can be obtained even with default parameters. Unlike general deep learning algorithms, the deep forest also performs well on small samples. Cross-validation is used each time a cascade layer is generated, which effectively avoids overfitting. The deep forest can be computed in parallel, so the time needed to run it on a single machine is similar to the time needed to run a deep neural network with GPU acceleration, which improves the efficiency of the algorithm.

Simulation experiment:

The method of the invention is further illustrated by simulation experiments, taking the Buchwald-Hartwig coupling reaction as an example (the reaction equation is shown in FIG. 1). First, the descriptors of each reaction component are calculated and extracted with Spartan software (commercial software); each group of reaction components yields 120 feature descriptors, comprising 64 atomic descriptors, 28 molecular descriptors and 28 vibrational descriptors.

Introducing the feature descriptors and the corresponding yields into a model for regression prediction; the simulation results are shown in table 1.

TABLE 1 comparison of regression predictions

The evaluation compares the prediction accuracy of the deep forest with that of several machine learning algorithms: linear regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), neural network (NN) and random forest (RF). The experimental results show that the R² of the deep forest is greater than that of the other five algorithms, indicating that its goodness of fit is the best among the six algorithms, and that the RMSE of the deep forest is 6.8, lower than that of the other five algorithms, indicating that its root mean square error is smaller. In conclusion, the regression prediction of the deep forest is superior to that of general machine learning algorithms, and the reaction yield can be predicted with high accuracy.

Taking Buchwald-Hartwig coupling reaction as an example (chemical reaction formula is shown in figure 1), introducing the feature descriptors and the corresponding yield categories into a model for classification prediction; the simulation results are shown in table 2.

TABLE 2 comparison of classified predictions

As shown in Table 2, the evaluation compares the classification accuracy of the deep forest with that of several machine learning algorithms: logistic regression, k-nearest neighbors, support vector machine, neural network and random forest. The experimental results show that the classification accuracy of the deep forest is 88.37%, the highest of the six algorithms. The kappa statistic of the deep forest is 0.813, which is higher than that of the other five algorithms and greater than 0.8, so from a statistical point of view the categories predicted by the deep forest are in almost complete agreement with the true categories. In conclusion, the classification prediction of the deep forest is superior to that of general machine learning algorithms, and the category of the coupling reaction yield can be predicted with high accuracy.

Claims

1. A deep-forest-based method for predicting and analyzing the yield of organic chemical coupling reactions, characterized by comprising the following steps: 1) calculation of feature descriptors; 2) building of a model; 3) intelligent regression and classification prediction of the yield;

calculation of feature descriptors: the feature descriptors of each coupling reaction component are calculated with chemical software and converted into one-dimensional data for subsequent model training;

building of the model: a deep forest model is built and trained on the feature descriptors, and the best prediction performance is reached through user-adjusted parameters;

intelligent regression and classification prediction of the yield: the yield is predicted with the trained deep forest model and the results are analyzed, including yield prediction analysis and importance analysis of the feature descriptors.

2. The method for predicting and analyzing yield of organic chemical coupling reaction based on deep forest according to claim 1, wherein the method comprises the following steps: step 1) the calculation of the feature descriptors specifically comprises:

(1.1) introducing chemical reactants and reagents into chemical software, wherein the software automatically calculates a characteristic descriptor of each coupling reaction component and converts the chemical reactants into one-dimensional data;

(1.2) dividing the feature descriptors in the one-dimensional data into a training set and a testing set, and respectively matching the training set and the testing set with the corresponding yield or the category to which the yield belongs.

3. The method for predicting and analyzing yield of organic chemical coupling reaction based on deep forest according to claim 1, wherein the method comprises the following steps: step 2) the construction of the model comprises the following steps:

(2.2) carrying out regression prediction on the yield of the coupling reaction by adopting a deep forest algorithm; importing a training set into cascade layers to carry out feature learning, splicing a prediction result obtained by each random forest in each layer of cascade with original features to be used as input of the next layer of cascade, continuously training in the same way, estimating the mean square error of the whole cascade in a verification set when each layer is expanded, and stopping training of a model if no obvious gain exists or the maximum upper limit layer is reached, thereby automatically determining the number of cascade levels; averaging the predicted values obtained by all random forests in the last layer to obtain a final predicted value, outputting a predicted result at the moment, selecting the best predicted result through adjusting parameters, and storing the model;

(2.3) carrying out classification prediction on the category of the yield of the coupling reaction by adopting a deep forest algorithm; importing a training set into cascade layers to carry out feature learning, splicing class probability vectors and original features obtained by each random forest in each layer of cascade as input of the next layer of cascade, continuously training in the way, estimating the prediction accuracy of the whole cascade in a verification set when each layer is expanded, and stopping training of a model if no obvious gain exists or the set maximum upper limit layer is reached so as to automatically determine the number of cascade levels; averaging class probability vectors output by all random forests in the last layer, wherein the class to which the maximum class probability belongs is the final prediction class; at the moment, a prediction result is output, the best prediction result is selected through adjusting parameters, and the model is stored;

(2.4) performing out-of-sample prediction with the trained model; if the out-of-sample prediction is accurate, the validity of the model is verified.

4. The method for predicting and analyzing yield of organic chemical coupling reaction according to claim 1, wherein: step 3) intelligent regression and classification prediction of yield, which specifically comprises the following steps:

(3.1) intelligent regression prediction of yield, introducing the training set and the corresponding yield into a cascade layer for feature learning, and averaging predicted values of each random forest in the last layer of cascade when the model stops training to obtain a final prediction result;

(3.2) intelligent classification prediction of yield, namely, introducing the training set and the corresponding class to which the yield belongs into a cascade layer for feature learning, and averaging class probability vectors output by all random forests in the last layer when the model stops training, wherein the class to which the maximum class probability belongs is the final prediction class;

(3.3) calculating importance ranking of the feature descriptors by a deep forest algorithm; therefore, the descriptor which has a remarkable influence on the reaction yield is found, and reliable decision information is provided for the user to carry out the organic chemical coupling reaction.

5. The method for predicting and analyzing yield of organic chemical coupling reaction based on deep forest as claimed in claim 3, wherein: the specific calculation process of the step (2.2) comprises the following steps:

the deep forest model has K cascade layers, each composed of L forests, and the training samples input to the k-th cascade layer are $(x^{k}, y)$, $k = 0, 1, \ldots, K$; where $x^{k}$ is the feature vector of a training sample input to the k-th cascade layer and $y$ is the true yield corresponding to that feature vector; the input feature $x^{k}$ received by the k-th cascade layer is the concatenation of the original feature $x^{0}$ with the output of the (k-1)-th cascade layer, so the combined features are expressed as:

$$x^{k} = (f_k(x^{k-1}),\ x^{0}),$$

where $f_k(x)$ denotes the real-valued output obtained by training on the feature $x$ at the k-th cascade level.

6. The method for predicting and analyzing yield of organic chemical coupling reaction based on deep forest as claimed in claim 3, wherein: the specific calculation process of the step (2.3) comprises the following steps:

the deep forest model has M cascade layers, each composed of N forests, and the training samples input to the m-th cascade layer are $(x^{m}, C)$, $m = 0, 1, \ldots, M$; where $x^{m}$ is the feature vector of a training sample input to the m-th cascade layer and $C$ is the category corresponding to that feature vector; the input feature $x^{m}$ received by the m-th cascade layer is the concatenation of the original feature $x^{0}$ with the class probability vector output by the (m-1)-th cascade layer, so the combined features are expressed as:

$$x^{m} = (p_m(x^{m-1}),\ x^{0}),$$

if the training samples have $c$ classes in total, then $p(x) = (p_1(x), p_2(x), \ldots, p_c(x))$, and the predicted category is the one with the maximum probability in the class probability vector.