CN114004282A - Method for extracting deep reinforcement learning emergency control strategy of power system - Google Patents

Method for extracting deep reinforcement learning emergency control strategy of power system

Info

Publication number
CN114004282A
CN114004282A (application CN202111188349.9A)
Authority
CN
China
Prior art keywords
model
sample
node
power system
theta
Prior art date
Legal status
Pending
Application number
CN202111188349.9A
Other languages
Chinese (zh)
Inventor
张俊
高天露
戴宇欣
张科
许沛东
陈思远
Current Assignee
Wuhan University WHU
State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Wuhan University WHU
State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU, State Grid Zhejiang Electric Power Co Ltd filed Critical Wuhan University WHU
Priority to CN202111188349.9A priority Critical patent/CN114004282A/en
Publication of CN114004282A publication Critical patent/CN114004282A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention provides a method for extracting a deep reinforcement learning emergency control strategy of a power system. Observation data are constructed by introducing feature data of the power system node model at a plurality of historical moments; a deep Q-learning network model is then built and optimized with a stochastic gradient descent algorithm to obtain a deep reinforcement learning model for power system emergency control; a data set under a specific fault scenario is generated with the trained deep Q-learning network model; a weighted oblique decision tree model based on the information gain ratio is trained on this data set to complete the strategy extraction; and a strategy fidelity index, an actual control performance index of the strategy and a model complexity index are set to evaluate the model performance under different hyper-parameters, so that the optimal model can be selected according to actual requirements in the field of power system emergency control.

Description

Method for extracting deep reinforcement learning emergency control strategy of power system
Technical Field
The invention belongs to the interdisciplinary field of artificial intelligence and power systems, and particularly relates to a method for extracting a deep reinforcement learning emergency control strategy of a power system.
Background
Major blackout events around the world, such as the blackout in the United States in 2003, have caused great social and economic losses and are a warning of the urgent need to build a safer and more reliable power system. However, the current protection and control mechanisms of the power system are designed offline on the basis of a few typical scenarios and cannot adapt to unknown changes of the power system. Meanwhile, with the development of Artificial Intelligence (AI) technology in fields such as natural language processing, computer vision and automatic driving, these techniques have also been successfully applied in power systems, for example in load forecasting and renewable energy forecasting, transmission line icing thickness identification, and fast-charging guidance for electric vehicles. Artificial intelligence algorithms represented by Deep Learning (DL) can cope more easily with unknown changes in the power system because of their strong feature extraction and nonlinear mapping capabilities.
In recent years, the application of Deep Reinforcement Learning (DRL) to autonomous driving, games and other fields has verified its advantages in solving sequential decision-making problems, which naturally include power system control problems. Many scholars have tried to solve prevention control, emergency control and restoration control problems of the power system based on DRL and have obtained good results.
However, the black-box nature and poor interactivity of these artificial intelligence algorithms limit their application in practical scenarios, especially where critical decisions are involved. Therefore, scholars at home and abroad have tried to build lighter and interpretable decision models from existing AI models based on concepts such as imitation learning and knowledge distillation. In particular, some scholars have proposed reinforcement learning strategy extraction methods based on decision trees and their variants. However, these efforts have only verified their feasibility in some simple game scenarios, and no related work has yet been successfully carried out on power system control problems.
Disclosure of Invention
In order to overcome the defects of the background art, the invention provides an emergency control strategy extraction method for deep reinforcement learning of an electric power system.
The specific technical scheme of the invention is a method for extracting a deep reinforcement learning emergency control strategy of a power system, which specifically comprises the following steps.
Step 1: constructing observation data by introducing feature data of the power system node model at a plurality of historical moments;
Step 2: introducing a deep Q-learning network model, sequentially feeding several groups of observation data into the deep Q-learning network model to predict load-shedding actions, and then performing optimization training with a stochastic gradient descent algorithm to obtain a deep reinforcement learning model for power system emergency control;
Step 3: generating a data set under a specific fault scenario based on the trained deep Q-learning network model;
Step 4: for each non-leaf node of the weighted oblique decision tree model based on the information gain ratio, inputting the state-action pair data of the data set under that node into the model, solving for the minimum of the model objective function with a quasi-Newton algorithm to obtain the optimal parameters of the model at that node, dividing the data set under the node into a left subset and a right subset, constructing left and right child nodes, and repeating the above steps until the termination condition of the algorithm is met;
Step 5: setting a strategy fidelity index, an actual control performance index of the strategy and a model complexity index to evaluate the model performance under different hyper-parameters, so that the optimal model can be selected for power system emergency control according to task requirements;
preferably, the observation data in step 1 is specifically defined as:
X_t = [u_t, u_{t+1}, ..., u_{t+L-1}]^T
u_{t+l} = {data_{t+l,p,j} | 1 ≤ p ≤ P, 1 ≤ j ≤ J}, l ∈ [0, L-1]
wherein X_t denotes the t-th group of observation data, t denotes the starting moment of the t-th group of observation data, and L is a positive integer giving the length of the observation window; u_{t+l} denotes the (l+1)-th row of the t-th group of observation data, i.e. the observations of the power system multi-node model at moment t+l; data_{t+l,p,j} denotes the j-th type of feature data of the p-th bus node at moment t+l in the power system multi-node model, P is the number of power system nodes, and J is the number of node features.
Preferably, the load-shedding action predicted in step 2 is formed by combining percentage load shedding at the bus nodes of the power system; each bus node has two load-shedding options, namely no action or shedding 20% of the total load on that bus node;
the deep Q-learning network model can predict 2^H load-shedding actions in total, where H is the number of controllable nodes; the actions are further sorted and numbered, i.e. the action set is defined as:
Y = [0, 1, ..., y, ..., 2^H - 1], y ∈ ℕ
Preferably, the data set in step 3 is generated as follows: after training of the deep Q-learning network model is completed, for a set fault scenario, the power system feature quantities from moment t to moment t+L-1, x_t, are fed into the DQN decision model in a rolling manner; the decision model selects the optimal action y_t from the action set Y, and the model input and output data at each step are recorded to construct a state-action pair (x_t, y_t), thus completing the generation of the labelled data set. The state-action pair data set of step 3 can be expressed as:
S = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_N, y_N)}
wherein (x_i, y_i) denotes the i-th state-action pair in the data set, x_i is the power system state quantity of the i-th state-action pair, y_i is the control action of the i-th state-action pair, and N is the number of state-action pairs in the data set;
Preferably, step 4 is specifically as follows:
Step 4.1: the input of each non-leaf node of the weighted oblique decision tree model based on the information gain ratio is a training data set S, (x_i, y_i) ∈ S, i = 1, 2, ..., M, M ≤ N, where M is the number of samples of the data set under the current node and N is the total number of samples; the maximum model depth is set to D and the current node depth is d;
wherein the training data set under the root node is the data set S generated in step 3, and the training data set under any other non-leaf node is the left subset S'_L or right subset S'_R obtained by splitting the training set of its parent node;
Step 4.2: create the model root node G based on the data set S, and let the current node depth d = 0;
Step 4.3: if the current node depth d is larger than the maximum model depth D, set node G as a leaf node whose label is the class label k with the largest number of samples in the data set S; otherwise, go to step 4.4;
Step 4.4: if all samples in the data set S belong to the same class k, set node G as a leaf node with label k; otherwise, go to step 4.5;
Step 4.5: initialize the parameter θ at the current node of the model in the manner of a univariate decision tree to obtain an initial value θ_0;
Step 4.6: starting from the initial value θ_0, solve for the minimum of the model objective function with the quasi-Newton algorithm and obtain the optimal model parameter θ_best:
L(θ) = -(H(S) - H(S|θ))/H(S) + λ‖θ‖₂²
wherein L(θ) is the model objective function, λ is the L2 regularization coefficient, θ is the parameter to be trained at each node of the model, ‖θ‖₂ is the two-norm of θ, H(S) is the empirical entropy of the sample set S, and H(S|θ) is the conditional empirical entropy of the sample set S given θ;
H(S) = -Σ_{k=1}^{K} (|S_k|/|S|) log₂(|S_k|/|S|)
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; |S_k| is the number of class-k samples in the sample set S, and |S| is the total number of samples in the sample set S;
H(S|θ) = (W_L·H_L + W_R·H_R)/M
wherein W_L is the sum of the weights of all samples belonging to the left subset, W_R is the sum of the weights of all samples belonging to the right subset, H_L is the weighted information entropy of the left child node, H_R is the weighted information entropy of the right child node, M is the total number of samples of the sample set S under the node, and θ is the parameter to be trained at each node of the model;
W_L = Σ_{(x_i, y_i, w_i^L) ∈ S_L} w_i^L
wherein w_i^L is the weight with which the sample (x_i, y_i) belongs to the left child node, and S_L is the set associating each sample with its left-child weight information;
W_R = Σ_{(x_i, y_i, w_i^R) ∈ S_R} w_i^R
wherein w_i^R is the weight with which the sample (x_i, y_i) belongs to the right child node, and S_R is the set associating each sample with its right-child weight information;
H_L = -Σ_{k=1}^{K} (W_L^k/W_L) log₂(W_L^k/W_L)
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_L^k is the sum of the weights of the class-k samples belonging to the left subset, and W_L is the sum of the weights of all samples in the sample set S belonging to the left subset;
H_R = -Σ_{k=1}^{K} (W_R^k/W_R) log₂(W_R^k/W_R)
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_R^k is the sum of the weights of the class-k samples belonging to the right subset, and W_R is the sum of the weights of all samples in the sample set S belonging to the right subset;
S_L = {(x_i, y_i, w_i^L) | (x_i, y_i) ∈ S}
S_R = {(x_i, y_i, w_i^R) | (x_i, y_i) ∈ S}
wherein (x_i, y_i) denotes the i-th sample in the sample set S, S_L associates each sample with its left-subset weight information, and S_R associates each sample with its right-subset weight information;
w_i^L = σ(θ^T x_i)
w_i^R = 1 - σ(θ^T x_i)
wherein w_i^L is the weight with which the i-th sample belongs to the left child node, w_i^R is the weight with which the i-th sample belongs to the right child node, and σ(·) is the sigmoid function;
W_L^k = Σ_{(x_i, y_i, w_i^L) ∈ S_L, y_i = k} w_i^L
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_L^k is the sum of the left-subset weights of the class-k samples in the sample set;
W_R^k = Σ_{(x_i, y_i, w_i^R) ∈ S_R, y_i = k} w_i^R
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_R^k is the sum of the right-subset weights of the class-k samples in the sample set;
Step 4.7: based on the current model parameter θ_best, compute the corresponding objective function value L_0 from L(θ);
Step 4.8: randomly initialize the parameter θ, repeating the initialization C times, where C is a model hyper-parameter, to obtain an initial parameter value θ'_0;
Step 4.9: starting from the initial value θ'_0, solve L(θ) with the quasi-Newton algorithm to obtain the optimal model parameter θ'_best;
Step 4.10: based on the currently solved model parameter θ'_best, compute the corresponding objective function value L'_0;
Step 4.11: if the objective function value L'_0 < L_0, set the optimal model parameter θ_best = θ'_best; otherwise, go to step 4.12;
Step 4.12: based on the optimal parameter θ_best, obtain the left and right subsets S'_L and S'_R:
S'_L = {(x_i, y_i) ∈ S | w_i^L ≥ w_i^R}
S'_R = {(x_i, y_i) ∈ S | w_i^L < w_i^R}
wherein (x_i, y_i) denotes the i-th sample in the sample set S, w_i^L is the weight with which the i-th sample belongs to the left child node, S_L associates each sample with its left-subset weight information, w_i^R is the weight with which the i-th sample belongs to the right child node, and S_R associates each sample with its right-subset weight information;
Step 4.13: construct the left child node G_L, let its training set be S'_L, set d = d + 1, and go to step 4.3;
Step 4.14: construct the right child node G_R, let its training set be S'_R, set d = d + 1, and go to step 4.3.
Preferably, step 5 specifically comprises:
Step 5.1: the strategy fidelity index in step 5 measures the degree to which the decisions of the weighted oblique decision tree model based on the information gain ratio match those of the deep reinforcement learning strategy, and is calculated as:
fidelity = (1/N) Σ_{i=1}^{N} I(y_i = ŷ_i)
wherein y_i and ŷ_i are, respectively, the outputs of the deep reinforcement learning model and of the weighted oblique decision tree model based on the information gain ratio given the same input x_i, N is the total number of samples, and I(·) is the indicator function.
Step 5.2: the actual control performance index of the strategy in step 5 denotes the average return obtained per episode when the weighted oblique decision tree strategy based on the information gain ratio is applied to an actual control scenario. When the deep reinforcement learning model is applied to the actual control scenario, the average return it obtains per episode is denoted r_e; correspondingly, the average return of the weighted oblique decision tree based on the information gain ratio in the same scenario is denoted r'_e. Thus R_e = r'_e - r_e > 0 indicates that the weighted oblique decision tree model based on the information gain ratio has better control performance than the deep reinforcement learning model, and vice versa.
Step 5.3: the model complexity index in step 5 is the model complexity, measured by the number of model parameters or by the model depth. From the viewpoint of model interpretability and interactivity, a weighted oblique decision tree strategy based on the information gain ratio with as low a model complexity as possible is sought.
Step 5.4: after comprehensively considering the index results of steps 5.1 to 5.3, the optimal weighted oblique decision tree model based on the information gain ratio is selected according to actual requirements.
The advantage of the invention is that a complex deep reinforcement learning model can be extracted into a lightweight control strategy with a certain degree of interpretability while maintaining good control performance, which alleviates the difficulty of applying artificial intelligence techniques in practice caused by their black-box nature.
Drawings
FIG. 1: workflow of the method for extracting a deep reinforcement learning emergency control strategy of a power system;
FIG. 2: topology of the IEEE 39-bus system;
FIG. 3: pseudo code of the algorithm of the weighted oblique decision tree model based on the information gain ratio.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The method is introduced on an under-voltage load shedding problem on the IEEE 39-bus system: strategy extraction is performed on a deep reinforcement learning agent for under-voltage load shedding, the complex deep reinforcement learning strategy is converted into a lighter strategy in the form of a weighted oblique decision tree based on the information gain ratio with a certain degree of interpretability, and the effectiveness and advancement of the method are evaluated through three indices: strategy fidelity, actual control performance of the strategy, and model complexity.
Referring to FIGS. 1 to 3, an embodiment of the present invention is described below. As shown in FIG. 1, the embodiment provides a method for extracting a deep reinforcement learning emergency control strategy of a power system, which specifically includes the following steps:
step 1: as shown in step 1 of fig. 1, the observation data is constructed by introducing feature data of a plurality of historical moments of a multi-node model of the power system.
The observation data in step 1 are specifically defined as:
X_t = [u_t, u_{t+1}, ..., u_{t+L-1}]^T
u_{t+l} = {data_{t+l,p,j} | 1 ≤ p ≤ P, 1 ≤ j ≤ J}, l ∈ [0, L-1]
wherein X_t denotes the t-th group of observation data, t denotes the starting moment of the t-th group of observation data, and L is a positive integer giving the length of the observation window; u_{t+l} denotes the (l+1)-th row of the t-th group of observation data, i.e. the observations of the power system multi-node model at moment t+l; data_{t+l,p,j} denotes the j-th type of feature data of the p-th bus node at moment t+l in the power system multi-node model, P is the number of power system nodes, and J is the number of node features.
As shown in FIG. 2, the observed quantities of the agent are set as: the per-unit voltages on the high- and low-voltage sides of buses 4, 7, 8 and 18, and the load margins of buses 4 and 7, so that P = 4, J_bus4 = J_bus7 = 3, J_bus8 = J_bus18 = 2. To capture the trend of the features, the feature quantities of the latest 5 simulation steps are stacked as the final input of the agent, i.e. P = 4, J_bus4 = J_bus7 = 3, J_bus8 = J_bus18 = 2, L = 5, and x_t = [u_{t-4}, u_{t-3}, ..., u_t].
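The following Python sketch illustrates one possible way to assemble the observation window x_t of this embodiment; the flat per-step feature layout and all names are illustrative assumptions rather than part of the described method:

```python
import numpy as np

# Per this embodiment: 4 monitored buses (4, 7, 8, 18) with 3, 3, 2 and 2
# features respectively (high/low-side per-unit voltages, plus the load
# margin for buses 4 and 7), flattened to 10 features per simulation step,
# and an observation window of the latest L = 5 steps.
L_WINDOW = 5
FEATURES_PER_STEP = 10

def build_observation(history):
    """history: list of per-step feature vectors u_t (newest last).
    Returns x_t = [u_{t-4}, ..., u_t] as an (L, features) array."""
    if len(history) < L_WINDOW:
        raise ValueError("need at least L_WINDOW steps of history")
    return np.asarray(history[-L_WINDOW:], dtype=np.float32)

# Example: 20 recorded steps of 10 features each -> observation of shape (5, 10)
obs = build_observation([np.random.rand(FEATURES_PER_STEP) for _ in range(20)])
```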
Step 2: as shown in step 2 of FIG. 1, a deep Q-learning network model is introduced, several groups of observation data are sequentially fed into the deep Q-learning network model to predict load-shedding actions, and the model is then optimized with a stochastic gradient descent algorithm to obtain the deep reinforcement learning model for power system emergency control.
and 2, the predicted load reduction action is formed by a load shedding percentage combination mode of bus nodes of the power system, each bus node has two load reduction modes, and the modes of no action and load reduction are respectively defined as 20% of the total load on the bus nodes.
The number of load reduction actions predicted by the deep Q learning network model includes 2 in totalHH is the number of controllable nodes; further, the actions included in the actions are sorted and numbered, that is, the action set is defined as:
Y=[0,1,...,y,...,2H-1],y∈Ν
here, the operable bus is set to be bus 4 or 7, that is, H is 2, and therefore: y ═ 0,1,2, 3.
Step 3: as shown in step 3 of FIG. 1, a data set under a specific fault scenario is generated based on the trained deep Q-learning network model.
The data set in step 3 is generated as follows: after training of the deep Q-learning network model is completed, for the set fault scenario, the power system feature quantities from moment t to moment t+4, x_t, are fed into the DQN decision model in a rolling manner; the decision model selects the optimal action y_t from the action set [0, 1, 2, 3], and the model input and output data at each step are recorded to construct a state-action pair (x_t, y_t), thus completing the generation of the labelled data set. The state-action pair data set of step 3 can be expressed as:
S = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_N, y_N)}
wherein (x_i, y_i) denotes the i-th state-action pair in the data set, x_i is the power system state quantity of the i-th state-action pair, y_i is the control action of the i-th state-action pair, and N is the number of state-action pairs in the data set, here N = 4836.
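The rolling data collection of this step could look roughly like the following sketch, assuming a trained Q-network `q_net` and a simulation environment `env` with a Gym-style reset/step interface; both objects and their call signatures are illustrative assumptions:

```python
import numpy as np

def generate_dataset(q_net, env, n_episodes):
    """Roll the trained DQN greedily over the configured fault scenario and
    record every (state, action) pair as one labelled sample of the set S."""
    states, actions = [], []
    for _ in range(n_episodes):
        x = env.reset()                 # stacked observation window x_t
        done = False
        while not done:
            q_values = q_net(np.asarray(x)[None, ...])   # shape (1, 2**H)
            y = int(np.argmax(q_values))                 # greedy action y_t
            states.append(np.asarray(x).ravel())
            actions.append(y)
            x, _, done, _ = env.step(y)
    return np.stack(states), np.asarray(actions)

# states, actions = generate_dataset(q_net, env, n_episodes=200)
# -> in this embodiment the recorded pairs form a data set with N = 4836 samples.
```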
Step 4: as shown in step 4 of FIG. 1, at each non-leaf node of the weighted oblique decision tree model based on the information gain ratio, the state-action pair data of the data set under that node are input into the model, the minimum of the model objective function is solved with a quasi-Newton algorithm to obtain the optimal parameters of the model at that node, the data set under the node is divided into a left subset and a right subset, left and right child nodes are constructed, and the above steps are repeated until the termination condition of the algorithm is met.
as shown in the pseudo code of fig. 3, the step 4 is as follows:
step 4.1: the input condition of each non-leaf node in the weighted tilt decision tree model with the set information gain ratio is a training data set S, (x)i,yi) E, S, i is equal to 1,2,3, and M is equal to or less than N, where M is the number of data set samples under the current node, N is the total number of samples, and N is equal to 4836; setting the maximum depth of the model as D, wherein D belongs to {3,4,5,6,7,8}, and the depth of the current node is D;
wherein the training data set under the root node is the data set S generated in the step 3, and the training data sets under other non-leaf nodes are left subsets S 'obtained by dividing the training set of the parent node'LS 'of right subset'R
Step 4.2: creating a model root node G based on the data set S, and enabling the current node depth d to be 0;
step 4.3: if the current node depth D is larger than the maximum model depth D, the node G is set as a leaf node, and the label of the node G is the corresponding label k with the maximum number of samples in the data set S; otherwise, turning to step 4.4;
step 4.4: if all samples in dataset S belong to the same class k, node G is set as a leaf node, labeled k. Otherwise, turning to step 4.5;
step 4.5: initializing a parameter theta under a current node of the model in a univariate decision tree mode to obtain an initial value theta0
Step 4.6: based on quasi-Newton algorithm and initial value theta0Solving the minimum value of the model objective function and obtaining the optimal parameter theta of the modelbest
L(θ) = -(H(S) - H(S|θ))/H(S) + λ‖θ‖₂²
wherein L(θ) is the model objective function, λ is the L2 regularization coefficient, set here as λ = 0.0001; θ is the parameter to be trained at each node of the model, ‖θ‖₂ is the two-norm of θ, H(S) is the empirical entropy of the sample set S, and H(S|θ) is the conditional empirical entropy of the sample set S given θ;
H(S) = -Σ_{k=1}^{K} (|S_k|/|S|) log₂(|S_k|/|S|)
wherein K is the total number of sample classes, here K = 4; k denotes the k-th class label in the samples; |S_k| is the number of class-k samples in the sample set S, and |S| is the total number of samples in the sample set S;
H(S|θ) = (W_L·H_L + W_R·H_R)/M
wherein W_L is the sum of the weights of all samples belonging to the left subset, W_R is the sum of the weights of all samples belonging to the right subset, H_L is the weighted information entropy of the left child node, H_R is the weighted information entropy of the right child node, M is the total number of samples of the sample set S under the node, and θ is the parameter to be trained at each node of the model;
W_L = Σ_{(x_i, y_i, w_i^L) ∈ S_L} w_i^L
wherein w_i^L is the weight with which the sample (x_i, y_i) belongs to the left child node, and S_L is the set associating each sample with its left-child weight information;
W_R = Σ_{(x_i, y_i, w_i^R) ∈ S_R} w_i^R
wherein w_i^R is the weight with which the sample (x_i, y_i) belongs to the right child node, and S_R is the set associating each sample with its right-child weight information;
H_L = -Σ_{k=1}^{K} (W_L^k/W_L) log₂(W_L^k/W_L)
wherein K is the total number of sample classes, here K = 4; k denotes the k-th class label in the samples; W_L^k is the sum of the weights of the class-k samples belonging to the left subset, and W_L is the sum of the weights of all samples in the sample set S belonging to the left subset;
H_R = -Σ_{k=1}^{K} (W_R^k/W_R) log₂(W_R^k/W_R)
wherein K is the total number of sample classes, here K = 4; k denotes the k-th class label in the samples; W_R^k is the sum of the weights of the class-k samples belonging to the right subset, and W_R is the sum of the weights of all samples in the sample set S belonging to the right subset;
S_L = {(x_i, y_i, w_i^L) | (x_i, y_i) ∈ S}
S_R = {(x_i, y_i, w_i^R) | (x_i, y_i) ∈ S}
wherein (x_i, y_i) denotes the i-th sample in the sample set S, S_L associates each sample with its left-subset weight information, and S_R associates each sample with its right-subset weight information;
w_i^L = σ(θ^T x_i)
w_i^R = 1 - σ(θ^T x_i)
wherein w_i^L is the weight with which the i-th sample belongs to the left child node, w_i^R is the weight with which the i-th sample belongs to the right child node, and σ(·) is the sigmoid function;
W_L^k = Σ_{(x_i, y_i, w_i^L) ∈ S_L, y_i = k} w_i^L
wherein K is the total number of sample classes, here K = 4; k denotes the k-th class label in the samples; W_L^k is the sum of the left-subset weights of the class-k samples in the sample set;
W_R^k = Σ_{(x_i, y_i, w_i^R) ∈ S_R, y_i = k} w_i^R
wherein K is the total number of sample classes, here K = 4; k denotes the k-th class label in the samples; W_R^k is the sum of the right-subset weights of the class-k samples in the sample set;
Step 4.7: based on the current model parameter θ_best, compute the corresponding objective function value L_0 from L(θ);
Step 4.8: randomly initialize the parameter θ, repeating the initialization C times, where C is a model hyper-parameter, set here as C = 3, to obtain an initial parameter value θ'_0;
Step 4.9: starting from the initial value θ'_0, solve L(θ) with the quasi-Newton algorithm to obtain the optimal model parameter θ'_best;
Step 4.10: based on the currently solved model parameter θ'_best, compute the corresponding objective function value L'_0;
Step 4.11: if the objective function value L'_0 < L_0, set the optimal model parameter θ_best = θ'_best; otherwise, go to step 4.12;
Step 4.12: based on the optimal parameter θ_best, obtain the left and right subsets S'_L and S'_R:
S'_L = {(x_i, y_i) ∈ S | w_i^L ≥ w_i^R}
S'_R = {(x_i, y_i) ∈ S | w_i^L < w_i^R}
wherein (x_i, y_i) denotes the i-th sample in the sample set S, w_i^L is the weight with which the i-th sample belongs to the left child node, S_L associates each sample with its left-subset weight information, w_i^R is the weight with which the i-th sample belongs to the right child node, and S_R associates each sample with its right-subset weight information;
Step 4.13: construct the left child node G_L, let its training set be S'_L, set d = d + 1, and go to step 4.3;
Step 4.14: construct the right child node G_R, let its training set be S'_R, set d = d + 1, and go to step 4.3.
and 5: as shown in the step 5 of fig. 1, a strategy fidelity index, a strategy actual control performance index and a model complexity index are set to evaluate the model performance under different hyper-parameters, so that an optimal model is selected according to actual requirements;
the step 5 specifically comprises the following steps:
step 5.1, the strategy fidelity index in step 5 is strategy fidelity, the meaning is the decision matching degree of the weighted inclined decision tree model based on the information gain ratio and the depth reinforcement learning strategy, and the calculation formula is as follows:
Figure BDA0003300208670000121
wherein y is
Figure BDA0003300208670000122
The output of the deep reinforcement learning and the weighted tilt decision tree model based on the information gain ratio is given the same input x, N is the total amount of samples, and I (-) is an illustrative function.
Step 5.2: and 5, the strategy actual control performance index is the strategy actual control performance and represents the average return obtained in each round when the weighted inclined decision tree strategy based on the information gain ratio is applied to an actual control scene. When the deep reinforcement learning model is applied to the actual control scenario, the average reward it gets at each round is: r iseCorrespondingly, the average return of the weighted tilt decision tree based on the information gain ratio in the corresponding scene is recorded as: r'e. Thus, Re=r’e-reA value > 0 represents that the weighted inclined decision tree model based on the information gain ratio has better control performance than the deep reinforcement learning model and vice versa.
Step 5.3: and 5, the model complexity index is the model complexity, and is measured by the model parameter number or the model depth. From the viewpoint of model interpretability and interactivity, a weighted tilt decision tree strategy based on an information gain ratio is sought, wherein the complexity of the model is as small as possible.
Step 5.4: after comprehensively considering the index results of the step 5.1 to the step 5.3, selecting a weighted tilt decision tree model with the optimal information gain ratio according to actual requirements;
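A possible way to compute the three indices of steps 5.1 to 5.3 is sketched below; the tree representation follows the node dictionaries of the sketch after step 4, and the function names are illustrative:

```python
import numpy as np

def policy_fidelity(y_drl, y_tree):
    """Step 5.1: fraction of samples on which the extracted tree reproduces
    the decision of the deep reinforcement learning model."""
    return float(np.mean(np.asarray(y_drl) == np.asarray(y_tree)))

def control_performance_gap(returns_tree, returns_drl):
    """Step 5.2: R_e = r'_e - r_e; a positive value means the extracted tree
    policy obtains a higher average per-episode return than the DRL policy."""
    return float(np.mean(returns_tree) - np.mean(returns_drl))

def tree_complexity(node):
    """Step 5.3: one possible complexity measure, the number of non-leaf
    (parameterised) nodes of the extracted tree."""
    if node["leaf"]:
        return 0
    return 1 + tree_complexity(node["left"]) + tree_complexity(node["right"])
```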
The index results obtained in this embodiment show that the strategy fidelity and the actual control performance of the candidate models do not differ much; therefore, the model with the lowest model complexity is selected, namely the weighted oblique decision tree model based on the information gain ratio with model depth 3.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, it will be appreciated by those skilled in the art that these are merely illustrative and that various changes or modifications may be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is only limited by the appended claims.

Claims (6)

1. A method for extracting an emergency control strategy for deep reinforcement learning of a power system, characterized by comprising the following steps:
step 1: constructing observation data by introducing feature data of the power system node model at a plurality of historical moments;
step 2: introducing a deep Q-learning network model, sequentially feeding several groups of observation data into the deep Q-learning network model to predict load-shedding actions, and then performing optimization training with a stochastic gradient descent algorithm to obtain a deep reinforcement learning model for power system emergency control;
step 3: generating a data set under a specific fault scenario based on the trained deep Q-learning network model;
step 4: for each non-leaf node of the weighted oblique decision tree model based on the information gain ratio, inputting the state-action pair data of the data set under that node into the model, solving for the minimum of the model objective function with a quasi-Newton algorithm to obtain the optimal parameters of the model at that node, dividing the data set under the node into a left subset and a right subset, constructing left and right child nodes, and repeating the above steps until the termination condition of the algorithm is met;
step 5: setting a strategy fidelity index, an actual control performance index of the strategy and a model complexity index to evaluate the model performance under different hyper-parameters, so that the optimal model can be selected for power system emergency control according to task requirements.
2. The method for extracting the emergency control strategy for deep reinforcement learning of the power system according to claim 1, wherein the observation data in step 1 is specifically defined as:
X_t = [u_t, u_{t+1}, ..., u_{t+L-1}]^T
u_{t+l} = {data_{t+l,p,j} | 1 ≤ p ≤ P, 1 ≤ j ≤ J}, l ∈ [0, L-1]
wherein X_t denotes the t-th group of observation data, t denotes the starting moment of the t-th group of observation data, and L is a positive integer giving the length of the observation window; u_{t+l} denotes the (l+1)-th row of the t-th group of observation data, i.e. the observations of the power system multi-node model at moment t+l; data_{t+l,p,j} denotes the j-th type of feature data of the p-th bus node at moment t+l in the power system multi-node model, P is the number of power system nodes, and J is the number of node features.
3. The method for extracting the emergency control strategy for deep reinforcement learning of the power system according to claim 1, wherein the load-shedding action predicted in step 2 is formed by combining percentage load shedding at the bus nodes of the power system, each bus node having two load-shedding options, namely no action or shedding 20% of the total load on that bus node;
the deep Q-learning network model can predict 2^H load-shedding actions in total, where H is the number of controllable nodes; the actions are further sorted and numbered, i.e. the action set is defined as:
Y = [0, 1, ..., y, ..., 2^H - 1], y ∈ ℕ.
4. The method for extracting the emergency control strategy for deep reinforcement learning of the power system according to claim 1, wherein the data set in step 3 is generated as follows: after training of the deep Q-learning network model is completed, for a set fault scenario, the power system feature quantities from moment t to moment t+L-1, x_t, are fed into the DQN decision model in a rolling manner; the decision model selects the optimal action y_t from the action set Y, and the model input and output data at each step are recorded to construct a state-action pair (x_t, y_t), thus completing the generation of the labelled data set; the state-action pair data set of step 3 can be expressed as:
S = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_N, y_N)}
wherein (x_i, y_i) denotes the i-th state-action pair in the data set, x_i is the power system state quantity of the i-th state-action pair, y_i is the control action of the i-th state-action pair, and N is the number of state-action pairs in the data set.
5. The method for extracting the emergency control strategy for deep reinforcement learning of the power system according to claim 1, wherein step 4 is specifically as follows:
step 4.1: the input of each non-leaf node of the weighted oblique decision tree model based on the information gain ratio is a training data set S, (x_i, y_i) ∈ S, i = 1, 2, ..., M, M ≤ N, where M is the number of samples of the data set under the current node and N is the total number of samples; the maximum model depth is set to D and the current node depth is d;
wherein the training data set under the root node is the data set S generated in step 3, and the training data set under any other non-leaf node is the left subset S'_L or right subset S'_R obtained by splitting the training set of its parent node;
step 4.2: create the model root node G based on the data set S, and let the current node depth d = 0;
step 4.3: if the current node depth d is larger than the maximum model depth D, set node G as a leaf node whose label is the class label k with the largest number of samples in the data set S; otherwise, go to step 4.4;
step 4.4: if all samples in the data set S belong to the same class k, set node G as a leaf node with label k; otherwise, go to step 4.5;
step 4.5: initialize the parameter θ at the current node of the model in the manner of a univariate decision tree to obtain an initial value θ_0;
step 4.6: starting from the initial value θ_0, solve for the minimum of the model objective function with the quasi-Newton algorithm and obtain the optimal model parameter θ_best:
L(θ) = -(H(S) - H(S|θ))/H(S) + λ‖θ‖₂²
wherein L(θ) is the model objective function, λ is the L2 regularization coefficient, θ is the parameter to be trained at each node of the model, ‖θ‖₂ is the two-norm of θ, H(S) is the empirical entropy of the sample set S, and H(S|θ) is the conditional empirical entropy of the sample set S given θ;
H(S) = -Σ_{k=1}^{K} (|S_k|/|S|) log₂(|S_k|/|S|)
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; |S_k| is the number of class-k samples in the sample set S, and |S| is the total number of samples in the sample set S;
H(S|θ) = (W_L·H_L + W_R·H_R)/M
wherein W_L is the sum of the weights of all samples belonging to the left subset, W_R is the sum of the weights of all samples belonging to the right subset, H_L is the weighted information entropy of the left child node, H_R is the weighted information entropy of the right child node, M is the total number of samples of the sample set S under the node, and θ is the parameter to be trained at each node of the model;
W_L = Σ_{(x_i, y_i, w_i^L) ∈ S_L} w_i^L
wherein w_i^L is the weight with which the sample (x_i, y_i) belongs to the left child node, and S_L is the set associating each sample with its left-child weight information;
W_R = Σ_{(x_i, y_i, w_i^R) ∈ S_R} w_i^R
wherein w_i^R is the weight with which the sample (x_i, y_i) belongs to the right child node, and S_R is the set associating each sample with its right-child weight information;
H_L = -Σ_{k=1}^{K} (W_L^k/W_L) log₂(W_L^k/W_L)
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_L^k is the sum of the weights of the class-k samples belonging to the left subset, and W_L is the sum of the weights of all samples in the sample set S belonging to the left subset;
H_R = -Σ_{k=1}^{K} (W_R^k/W_R) log₂(W_R^k/W_R)
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_R^k is the sum of the weights of the class-k samples belonging to the right subset, and W_R is the sum of the weights of all samples in the sample set S belonging to the right subset;
S_L = {(x_i, y_i, w_i^L) | (x_i, y_i) ∈ S}
S_R = {(x_i, y_i, w_i^R) | (x_i, y_i) ∈ S}
wherein (x_i, y_i) denotes the i-th sample in the sample set S, S_L associates each sample with its left-subset weight information, and S_R associates each sample with its right-subset weight information;
w_i^L = σ(θ^T x_i)
w_i^R = 1 - σ(θ^T x_i)
wherein w_i^L is the weight with which the i-th sample belongs to the left child node, w_i^R is the weight with which the i-th sample belongs to the right child node, and σ(·) is the sigmoid function;
W_L^k = Σ_{(x_i, y_i, w_i^L) ∈ S_L, y_i = k} w_i^L
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_L^k is the sum of the left-subset weights of the class-k samples in the sample set;
W_R^k = Σ_{(x_i, y_i, w_i^R) ∈ S_R, y_i = k} w_i^R
wherein K is the total number of sample classes; k denotes the k-th class label in the samples; W_R^k is the sum of the right-subset weights of the class-k samples in the sample set;
step 4.7: based on the current model parameter θ_best, compute the corresponding objective function value L_0 from L(θ);
step 4.8: randomly initialize the parameter θ, repeating the initialization C times, where C is a model hyper-parameter, to obtain an initial parameter value θ'_0;
step 4.9: starting from the initial value θ'_0, solve L(θ) with the quasi-Newton algorithm to obtain the optimal model parameter θ'_best;
step 4.10: based on the currently solved model parameter θ'_best, compute the corresponding objective function value L'_0;
step 4.11: if the objective function value L'_0 < L_0, set the optimal model parameter θ_best = θ'_best; otherwise, go to step 4.12;
step 4.12: based on the optimal parameter θ_best, obtain the left and right subsets S'_L and S'_R:
S'_L = {(x_i, y_i) ∈ S | w_i^L ≥ w_i^R}
S'_R = {(x_i, y_i) ∈ S | w_i^L < w_i^R}
wherein (x_i, y_i) denotes the i-th sample in the sample set S, w_i^L is the weight with which the i-th sample belongs to the left child node, S_L associates each sample with its left-subset weight information, w_i^R is the weight with which the i-th sample belongs to the right child node, and S_R associates each sample with its right-subset weight information;
step 4.13: construct the left child node G_L, let its training set be S'_L, set d = d + 1, and go to step 4.3;
step 4.14: construct the right child node G_R, let its training set be S'_R, set d = d + 1, and go to step 4.3.
6. The method according to claim 1, characterized in that step 5 specifically comprises the following steps:
step 5.1: the strategy fidelity index in step 5 measures the degree to which the decisions of the weighted oblique decision tree model based on the information gain ratio match those of the deep reinforcement learning strategy, and is calculated as:
fidelity = (1/N) Σ_{i=1}^{N} I(y_i = ŷ_i)
wherein y_i and ŷ_i are, respectively, the outputs of the deep reinforcement learning model and of the weighted oblique decision tree model based on the information gain ratio given the same input x_i, N is the total number of samples, and I(·) is the indicator function;
step 5.2: the actual control performance index of the strategy in step 5 denotes the average return obtained per episode when the weighted oblique decision tree strategy based on the information gain ratio is applied to an actual control scenario; when the deep reinforcement learning model is applied to the actual control scenario, the average return it obtains per episode is denoted r_e; correspondingly, the average return of the weighted oblique decision tree based on the information gain ratio in the same scenario is denoted r'_e; thus, R_e = r'_e - r_e > 0 indicates that the control performance of the weighted oblique decision tree model based on the information gain ratio is superior to that of the deep reinforcement learning model, and vice versa;
step 5.3: the model complexity index in step 5 is the model complexity, measured by the number of model parameters or by the model depth; from the viewpoint of model interpretability and interactivity, a weighted oblique decision tree strategy based on the information gain ratio with as low a model complexity as possible is sought;
step 5.4: after comprehensively considering the index results of steps 5.1 to 5.3, the optimal weighted oblique decision tree model based on the information gain ratio is selected according to actual requirements.
CN202111188349.9A 2021-10-12 2021-10-12 Method for extracting deep reinforcement learning emergency control strategy of power system Pending CN114004282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111188349.9A CN114004282A (en) 2021-10-12 2021-10-12 Method for extracting deep reinforcement learning emergency control strategy of power system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111188349.9A CN114004282A (en) 2021-10-12 2021-10-12 Method for extracting deep reinforcement learning emergency control strategy of power system

Publications (1)

Publication Number Publication Date
CN114004282A true CN114004282A (en) 2022-02-01

Family

ID=79922683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111188349.9A Pending CN114004282A (en) 2021-10-12 2021-10-12 Method for extracting deep reinforcement learning emergency control strategy of power system

Country Status (1)

Country Link
CN (1) CN114004282A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151047A (en) * 2023-04-21 2023-05-23 嘉豪伟业科技有限公司 Power dispatching data network fault simulation method and system
CN116151047B (en) * 2023-04-21 2023-06-27 嘉豪伟业科技有限公司 Power dispatching data network fault simulation method and system

Similar Documents

Publication Publication Date Title
CN111310915B (en) Data anomaly detection defense method oriented to reinforcement learning
CN110968866B (en) Defense method for resisting attack for deep reinforcement learning model
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN113255936B (en) Deep reinforcement learning strategy protection defense method and device based on imitation learning and attention mechanism
CN110991027A (en) Robot simulation learning method based on virtual scene training
CN111104522A (en) Regional industry association effect trend prediction method based on knowledge graph
CN112884131A (en) Deep reinforcement learning strategy optimization defense method and device based on simulation learning
CN110109358B (en) Feedback-based hybrid multi-agent cooperative control method
CN112491818B (en) Power grid transmission line defense method based on multi-agent deep reinforcement learning
CN111461325B (en) Multi-target layered reinforcement learning algorithm for sparse rewarding environmental problem
CN113627596A (en) Multi-agent confrontation method and system based on dynamic graph neural network
CN114757351A (en) Defense method for resisting attack by deep reinforcement learning model
CN112069504A (en) Model enhanced defense method for resisting attack by deep reinforcement learning
CN114757362A (en) Multi-agent system communication method based on edge enhancement and related device
CN114004282A (en) Method for extracting deep reinforcement learning emergency control strategy of power system
CN115345222A (en) Fault classification method based on TimeGAN model
CN113276852B (en) Unmanned lane keeping method based on maximum entropy reinforcement learning framework
CN113936140A (en) Evaluation method of sample attack resisting model based on incremental learning
CN111348034B (en) Automatic parking method and system based on generation countermeasure simulation learning
CN117406100A (en) Lithium ion battery remaining life prediction method and system
CN115761654B (en) Vehicle re-identification method
Wu et al. Fault diagnosis of TE process based on incremental learning
Bar et al. Deep Reinforcement Learning Approach with adaptive reward system for robot navigation in Dynamic Environments
CN115562835A (en) Agile satellite imaging task scheduling method and system based on data driving
CN113298255B (en) Deep reinforcement learning robust training method and device based on neuron coverage rate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination