CN111951097A - Enterprise credit risk assessment method, device, equipment and storage medium - Google Patents
- Publication number
- CN111951097A (application CN202010805252.7A)
- Authority
- CN
- China
- Prior art keywords
- variable
- data
- sample
- model
- enterprise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/123—Tax preparation or submission
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- General Business, Economics & Management (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The application discloses an enterprise credit risk assessment method. By receiving enterprise tax data, the method quantifies the operating credit risk of an enterprise along the tax data dimension, laying a foundation for accurate assessment of enterprise credit risk. The small and micro enterprise credit risk model called by the method is built on the XGBoost algorithm, which preserves the model's ability to learn feature interactions among weak variables. During training, based on analysis of the tax sample data and after variable preprocessing, the variable stability and model stability of the sample variable data are used as evaluation indexes to screen the model-entry variables. This filters out the influence of abnormal sample variables on model training, mitigates the overfitting that arises when the XGBoost algorithm is used for a small and micro enterprise credit risk model, and improves the accuracy of enterprise credit risk assessment. The application also provides an enterprise credit risk assessment device, equipment and a readable storage medium, which have the same beneficial effects.
Description
Technical Field
The present application relates to the field of anterior segment inspection technologies, and in particular, to a method, an apparatus, and a device for evaluating an enterprise credit risk, and a readable storage medium.
Background
The wide application of big data and internet technology has had a profound influence on China's financial ecosystem and has provided a new platform and channel for financing small and micro enterprises; the innovative application of big data technology in internet finance creates more possibilities for the development of financial services for small and micro enterprises.
An enterprise credit reporting system can address information asymmetry, reduce information and transaction costs, and thereby mitigate adverse selection. Such a system can collect and process transaction information efficiently and at scale, reduce uncertainty in the transaction process as far as possible, lower banks' information costs and improve the quality of bank loans. It also makes the risks of small and medium-sized enterprises more transparent, which increases their financing opportunities. In addition, it can form an operating-risk constraint mechanism: because the system gives enterprises a platform on which to display their operating risk level and creditworthiness, enterprises spontaneously constrain themselves and tend to disclose true information, which ultimately forms a socially recognized credit transaction mechanism.
At present, credit risk models are generally traditional logistic regression models. Although logistic regression has good business interpretability, it cannot learn the feature interactions of some weak variables, so in the internet era more and more machine learning algorithms are being applied to small and micro enterprise credit risk models.
At present, small and micro enterprise credit risk models often use XGBoost (eXtreme Gradient Boosting), an ensemble learning method. Because small and micro enterprise risk control modeling has few samples and complex enterprise types, an XGBoost model is prone to overfitting; an overfitted model generalizes poorly, which degrades its identification accuracy.
Therefore, how to preserve the feature interaction capability for weak variables while avoiding the effect of model overfitting on identification accuracy is an urgent problem for those skilled in the art.
Disclosure of Invention
An object of the present application is to provide an enterprise credit risk assessment method that preserves the feature interaction capability for weak variables while avoiding the effect of model overfitting on identification accuracy; another object of the present application is to provide an enterprise credit risk assessment apparatus, a device and a readable storage medium.
In order to solve the above technical problem, the present application provides an enterprise credit risk assessment method, including:
receiving enterprise tax data of an enterprise to be evaluated;
calling a pre-trained small and micro enterprise credit risk model built based on the XGBoost algorithm to perform operating credit risk assessment on the enterprise tax data to obtain an assessment result;
wherein the training method of the small and micro enterprise credit risk model comprises:
acquiring tax sample data of enterprises;
performing variable preprocessing on the tax sample data to obtain sample variable data;
taking the variable stability and the model stability of the sample variable data as evaluation indexes, performing variable screening on the sample variable data, and determining the model-entry variables in the sample variable data;
determining model parameters of the small and micro enterprise credit risk model built based on the XGBoost algorithm;
and calling the sample variable data to train the small and micro enterprise credit risk model.
Optionally, performing variable preprocessing on the tax sample data to obtain sample variable data includes:
performing variable analysis on the tax sample data, and taking the data output by the variable analysis as preprocessed sample data;
and performing WOE (weight of evidence) binning on the preprocessed sample data to obtain binned variable data, and taking the binned variable data as the sample variable data.
Optionally, performing variable analysis on the tax sample data, and taking the data output by the variable analysis as preprocessed sample data, includes:
performing statistical analysis on the distribution of the tax sample data to obtain sample distribution statistical information;
and performing data filling on the missing values and abnormal values in the sample distribution statistical information, and taking the processed data as the preprocessed sample data.
Optionally, taking the variable stability and the model stability of the sample variable data as evaluation indexes, performing variable screening on the sample variable data, and determining the model-entry variables in the sample variable data, includes:
screening the sample variable data according to the correlation among the sample variable data and the variable importance to obtain first variables;
and calculating a model stability index of each first variable, and taking the first variables whose model stability index is lower than a threshold as the model-entry variables.
Optionally, determining the model parameters of the small and micro enterprise credit risk model built based on the XGBoost algorithm includes:
determining the type of the XGBoost base learner, wherein the base learner types comprise gbtree and gblinear;
determining the learning objective function and the model evaluation index of XGBoost, wherein the objective functions comprise logistic regression and linear regression, and the model evaluation indexes comprise auc, logloss, rmse, mae and error;
and tuning the XGBoost algorithm parameters, and taking the resulting optimal parameter combination as the XGBoost model parameters.
The application also provides an enterprise credit risk assessment device, including:
the data receiving unit is used for receiving enterprise tax data of an enterprise to be evaluated;
the model evaluation unit is used for calling a pre-trained small and micro enterprise credit risk model built based on the XGBoost algorithm to perform operating credit risk assessment on the enterprise tax data to obtain an assessment result;
wherein the model training unit for training the small and micro enterprise credit risk model called by the model evaluation unit comprises:
the data acquisition subunit is used for acquiring tax sample data of enterprises;
the variable preprocessing subunit is used for performing variable preprocessing on the tax sample data to obtain sample variable data;
the variable screening subunit is used for performing variable screening on the sample variable data, taking the variable stability and the model stability of the sample variable data as evaluation indexes, to determine the model-entry variables in the sample variable data;
the parameter determining subunit is used for determining model parameters of the small and micro enterprise credit risk model built based on the XGBoost algorithm;
and the training subunit is used for calling the sample variable data to train the small and micro enterprise credit risk model.
Optionally, the variable preprocessing subunit includes:
the variable analysis subunit is used for performing variable analysis on the tax sample data and taking the data output by the variable analysis as preprocessed sample data;
and the binning subunit is used for performing WOE binning on the preprocessed sample data to obtain binned variable data, and taking the binned variable data as the sample variable data.
Optionally, the variable analysis subunit includes:
the statistical analysis subunit is used for performing statistical analysis on the distribution of the tax sample data to obtain sample distribution statistical information;
and the exception processing subunit is used for performing data filling on the missing values and abnormal values in the sample distribution statistical information and taking the processed data as the preprocessed sample data.
The present application further provides an enterprise credit risk assessment device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the enterprise credit risk assessment method when executing the computer program.
The present application also provides a readable storage medium having a program stored thereon, which when executed by a processor, performs the steps of the enterprise credit risk assessment method.
In the enterprise credit risk assessment method provided by the present application, enterprise tax data is received and the operating credit risk of the enterprise is quantified from the tax data dimension, which, relative to other assessment dimensions, lays a foundation for accurate assessment of enterprise credit risk. The small and micro enterprise credit risk model called by the method is built based on the XGBoost algorithm, which preserves the model's feature interaction capability for weak variables. In the training process, feature engineering is built on the analysis of the tax sample data; after variable preprocessing, the variable stability and model stability of the sample variable data are used as evaluation indexes to screen the model-entry variables. This filters out the influence of abnormal sample variables on model training and mitigates the overfitting of the small and micro enterprise credit risk model when the XGBoost algorithm is adopted, thereby improving the recognition performance of the trained model and the accuracy of enterprise credit risk assessment.
The application also provides an enterprise credit risk assessment device, equipment and a readable storage medium, which have the above beneficial effects; these are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of an enterprise credit risk assessment method according to an embodiment of the present application;
Fig. 2 is a block diagram illustrating an architecture of an enterprise credit risk assessment apparatus according to an embodiment of the present application;
Fig. 3 is a block diagram illustrating an alternative embodiment of an enterprise credit risk assessment device;
Fig. 4 is a schematic structural diagram of an enterprise credit risk assessment apparatus according to an embodiment of the present application.
Detailed Description
The core of the application is to provide an enterprise credit risk assessment method that preserves the feature interaction capability for weak variables while avoiding the effect of model overfitting on identification accuracy; another core of the application is to provide an enterprise credit risk assessment apparatus, a device and a readable storage medium.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to Fig. 1, which is a flowchart of the enterprise credit risk assessment method provided in this embodiment, the method mainly includes:
Step S110, receiving enterprise tax data of an enterprise to be evaluated;
The enterprise tax data of the enterprise to be evaluated is received. The types of information included in the enterprise tax data are not limited and can be set according to the needs of actual enterprise operation management; for example, the data may include value-added tax, consumption tax, urban construction tax, real estate tax, land use tax, vehicle and vessel use tax, enterprise and personal income tax, stamp tax and the like, and may be obtained from the enterprise's balance sheet and income statement. In this embodiment, the operating risk of the enterprise is quantified from the tax data dimension; a machine learning rating method based on tax data can evaluate the operating risk of the enterprise more comprehensively and accurately, and can form an enterprise operating-risk constraint mechanism.
The enterprise tax data may be collected by the system itself, or previously collected enterprise tax data may be imported directly; the acquisition method is not limited in this embodiment and can be set according to actual data acquisition needs.
Step S120, calling a pre-trained small and micro enterprise credit risk model built based on the XGBoost algorithm to perform operating credit risk assessment on the enterprise tax data to obtain an assessment result;
Compared with a traditional logistic regression model, the pre-trained small and micro enterprise credit risk model built based on the XGBoost algorithm can learn the interactions of some weak variables and has better predictive capability.
The training method of the small and micro enterprise credit risk model called in this embodiment specifically comprises the following steps:
(1) Acquiring tax sample data of enterprises;
(2) Performing variable preprocessing on the tax sample data to obtain sample variable data;
Variable preprocessing mainly refers to performing variable analysis on the sample data and removing irrelevant variables, abnormal variables and the like, so that such data does not affect subsequent data analysis.
The specific variable preprocessing means is not limited in this embodiment and may be set according to the data items of the actual sample data and the requirements of data analysis.
(3) Taking the variable stability and the model stability of the sample variable data as evaluation indexes, performing variable screening on the sample variable data, and determining the model-entry variables in the sample variable data;
Variable stability refers to how stably a variable expresses its feature; specific measures may include the removal and reassignment of abnormal data, the assignment of missing data and the like, without limitation. Model stability refers to the stability of the model training process after the stable variables are applied to the model; the specific measure may be a stability index or the like, without limitation.
(4) Determining model parameters of the small and micro enterprise credit risk model built based on the XGBoost algorithm;
Model parameters are selected from the XGBoost parameters, which may specifically include: the base learner (booster), the objective function (objective), the model evaluation index (eval_metric), the number of iterations (n_estimators), the maximum tree depth (max_depth), the minimum loss reduction required to split a node (gamma), the minimum sum of sample weights in a leaf node (min_child_weight), the proportion of the full sample set subsampled for training (subsample), the proportion of features randomly sampled (colsample_bytree), the L1 regularization weight (alpha), the L2 regularization weight (lambda), the learning rate (learning_rate), and the like.
The specific model parameter determination strategy is not limited in this embodiment and may be set according to actual risk assessment requirements.
(5) Calling the sample variable data to train the small and micro enterprise credit risk model.
The specific implementation steps of model training may refer to implementations in the related art; they are not limited in this embodiment and are not described here.
Based on the above, in the enterprise credit risk assessment method provided by this embodiment, enterprise tax data is received and the operating credit risk of the enterprise is quantified from the tax data dimension, which, relative to other assessment dimensions, lays a foundation for accurate assessment of enterprise credit risk. The small and micro enterprise credit risk model called by the method is built based on the XGBoost algorithm, which preserves the model's feature interaction capability for weak variables. In the training process, feature engineering is built on the analysis of the tax sample data; after variable preprocessing, the variable stability and model stability of the sample variable data are used as evaluation indexes to screen the model-entry variables. This filters out the influence of abnormal sample variables on model training and mitigates the overfitting of the small and micro enterprise credit risk model when the XGBoost algorithm is adopted, thereby improving the recognition performance of the trained model and the accuracy of enterprise credit risk assessment.
The above embodiment does not limit how variable preprocessing is performed on the tax sample data when training the small and micro enterprise credit risk model. Optionally, the variable preprocessing may specifically include the following steps:
(1) Performing variable analysis on the tax sample data, and taking the data output by the variable analysis as preprocessed sample data;
The actual process of variable analysis on the samples is not limited here and can be set according to the requirements of actual data analysis.
Optionally, performing variable analysis on the tax sample data may specifically include the following steps:
(1.1) Performing statistical analysis on the distribution of the tax sample data to obtain sample distribution statistical information, for example by visualizing the distribution of the sample target variable and the distributions of the continuous and categorical variables;
(1.2) Performing data filling on the missing values and abnormal values in the sample distribution statistical information, and taking the processed data as the preprocessed sample data.
A data cleaning module program is started to clean the tax data and process its missing values and abnormal values; the processing specifically includes operations such as transposing and summing the data. This embodiment describes only the above preprocessing process as an example; other implementations can refer to this description and are not repeated here.
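The filling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the cleaning module described in the patent: the choice of median filling for missing values and percentile clipping for abnormal values, the function name `fill_missing_and_clip`, and the 1st/99th percentile bounds are all hypothetical.

```python
from statistics import median, quantiles


def fill_missing_and_clip(values, lower_pct=1, upper_pct=99):
    """Fill missing entries (None) with the median and clip outliers.

    Assumption: the median and percentile bounds are illustrative choices;
    the source text does not specify a filling rule.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    med = median(observed)
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = quantiles(observed, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    cleaned = []
    for v in values:
        if v is None:
            v = med  # missing value: fill with the median
        cleaned.append(min(max(v, low), high))  # abnormal value: clip to bounds
    return cleaned


# Hypothetical monthly value-added tax amounts with one gap and one outlier.
vat = [10, 12, None, 11, 13, 1000]
print(fill_missing_and_clip(vat))
```

On very small columns the percentile bounds sit close to the extremes, so the clipping is mild; a tighter rule would be a separate design decision.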
(2) Performing WOE binning on the preprocessed sample data to obtain binned variable data, and taking the binned variable data as the sample variable data.
The preprocessed sample data is binned to obtain a binned sample data set.
Because the sample size for a small and micro enterprise risk control model is small, when modeling with the XGBoost algorithm, training the model after WOE binning of the variables can help prevent overfitting. The binning may specifically use decision tree binning, chi-square binning, equal-frequency binning, equal-width binning and the like; the operation steps of the relevant binning techniques can be referred to, and the specific binning steps are not limited in this embodiment.
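As an illustration of one of the listed options, the sketch below performs equal-frequency binning followed by WOE encoding. It assumes the convention WOE = ln(share of good samples in the bin / share of bad samples in the bin), with label 1 meaning a bad (defaulting) enterprise; the opposite sign convention is also in common use. The function names and the additive smoothing constant are hypothetical.

```python
import math


def equal_frequency_edges(values, n_bins):
    """Return up to n_bins - 1 interior cut points giving roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, n_bins):
        edge = ordered[(k * n) // n_bins]
        if not edges or edge > edges[-1]:  # skip duplicate cut points
            edges.append(edge)
    return edges


def assign_bin(value, edges):
    """Bin index = number of cut points that are <= value."""
    return sum(1 for e in edges if value >= e)


def woe_table(values, labels, edges, smoothing=0.5):
    """Compute WOE per bin; labels use 1 = bad, 0 = good (assumed convention)."""
    n_bins = len(edges) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for v, y in zip(values, labels):
        b = assign_bin(v, edges)
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    # Additive smoothing keeps empty bins from producing log(0).
    total_good = sum(good) + smoothing * n_bins
    total_bad = sum(bad) + smoothing * n_bins
    return {
        b: math.log(((good[b] + smoothing) / total_good)
                    / ((bad[b] + smoothing) / total_bad))
        for b in range(n_bins)
    }


def woe_encode(values, edges, table):
    """Replace each raw value with the WOE of its bin."""
    return [table[assign_bin(v, edges)] for v in values]


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
edges = equal_frequency_edges(values, n_bins=2)
table = woe_table(values, labels, edges)
print(edges, table, woe_encode(values, edges, table))
```

Replacing raw values with a handful of per-bin WOE values reduces the number of distinct split points available to the trees, which is one plausible reading of why the text expects binning to limit overfitting on small samples.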
After the sample data is binned, the samples can be split into a training set and a test set to meet the sample data requirements of different model usage scenarios.
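A minimal sketch of such a split is shown below. Stratifying by label, the 70/30 ratio and the fixed seed are assumptions made for illustration; the text only states that the samples are divided into a training set and a test set.

```python
import random


def stratified_split(labels, test_ratio=0.3, seed=42):
    """Return (train_indices, test_indices), keeping the label mix similar in both."""
    rng = random.Random(seed)
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_ratio)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 10 + [1] * 10  # hypothetical good/bad labels
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Stratification matters more than usual here because small samples with few bad cases can otherwise leave the test set with almost no defaults.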
The variable screening method provided by this embodiment screens the model-entry variables using conditions such as the variable missing rate and feature importance; it is simple to implement, keeps the selected variables highly effective, and can effectively mitigate model overfitting.
The above embodiment does not limit the specific steps of performing variable screening on the sample variable data and determining the model-entry variables. This embodiment introduces one variable screening implementation, which mainly includes the following steps:
(1) Screening the sample variable data according to the correlation among the sample variable data and the variable importance to obtain first variables;
The specific means of evaluating correlation and variable importance are not limited in this embodiment. For example, when evaluating the correlation of the sample data, the WOE values may be used as the basis, or the relative distance between two variables may be calculated; when evaluating variable importance, a random forest or GBDT (Gradient Boosting Decision Tree) algorithm or the like may be used. For example, a screening rule may select the first variables according to the correlation of the variables' WOE values (less than 0.6) and the variable importance given by a random forest or GBDT algorithm.
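One way to combine the two criteria is a greedy filter: rank variables by importance, then keep a variable only if its correlation with every variable already kept is below 0.6. The sketch below assumes Pearson correlation on WOE-encoded columns and takes the importance scores as given (for example, from a previously fitted random forest or GBDT); the greedy strategy, the function names and the sample data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_by_correlation(columns, importance, max_corr=0.6):
    """Keep variables in descending importance, dropping any that correlate
    at or above max_corr (in absolute value) with a variable already kept."""
    ranked = sorted(columns, key=lambda name: importance[name], reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical WOE-encoded variables and importance scores.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # nearly a multiple of "a"
    "c": [3, 1, 4, 1, 5],
}
importance = {"a": 0.5, "b": 0.3, "c": 0.2}
print(screen_by_correlation(columns, importance))
```

Here `b` is dropped because it is almost perfectly correlated with the more important `a`, while `c` is retained.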
(2) Calculating a model stability index of each first variable, and taking the first variables whose model stability index is lower than a threshold as the model-entry variables.
The model stability index (PSI, commonly expanded as population stability index) measures the difference in score distribution between the test sample and the model development sample. A first variable whose PSI is not lower than the threshold shows a large distribution difference between the test sample and the model development sample, so using it as a model-entry variable may lower the accuracy of the model in actual evaluation; a first variable whose PSI is lower than the threshold shows a small distribution difference, so the accuracy in actual evaluation may be higher. For example, the PSI of each variable may be calculated, and the variables with PSI less than 0.1 may be selected as the final model-entry variables.
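The PSI calculation can be sketched with the usual formula, PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the shares of bin i in the development sample and the test sample. The sketch assumes the variables are already binned (so each value is a bin index or WOE value); the function names and the small floor used to avoid log(0) are hypothetical.

```python
import math
from collections import Counter


def psi(expected, actual, floor=1e-6):
    """PSI between two samples of an already-binned variable."""
    exp_counts = Counter(expected)
    act_counts = Counter(actual)
    total = 0.0
    for b in set(exp_counts) | set(act_counts):
        # Floor the shares so a bin missing from one sample does not give log(0).
        e = max(exp_counts[b] / len(expected), floor)
        a = max(act_counts[b] / len(actual), floor)
        total += (a - e) * math.log(a / e)
    return total


def select_stable(train_cols, test_cols, threshold=0.1):
    """Keep the variables whose PSI between train and test is below threshold."""
    return [
        name for name in train_cols
        if psi(train_cols[name], test_cols[name]) < threshold
    ]


# Hypothetical binned variables on the development (train) and test samples.
train_cols = {"x": [0, 0, 1, 1], "y": [0, 0, 0, 1]}
test_cols = {"x": [0, 1, 0, 1], "y": [1, 1, 1, 0]}
print(select_stable(train_cols, test_cols))
```

In this toy data `x` has identical bin shares in both samples (PSI of 0), while the shares of `y` are reversed, giving a PSI well above the 0.1 threshold.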
The variable screening approach provided in this embodiment screens the variables input to the model using conditions such as variable correlation and the PSI of each variable between the training and test samples. It is simple to implement, keeps the retained variables effective, and can effectively alleviate model overfitting.
In addition, the foregoing embodiments do not specifically limit the steps for selecting and determining the model parameters. One specific implementation is described in this embodiment to aid understanding of these steps.
It includes the following steps:
(1) determining the type of the XGBoost model base learner, wherein the base learner types include gbtree and gblinear;
The XGBoost model base learner is selected from two main types: gbtree (decision tree) and gblinear (linear classifier). Different types of base learner can be configured for different requirements in different application scenarios, which is not limited in this embodiment.
(2) determining the learning objective function and the model evaluation indexes of XGBoost, wherein the objective functions include logistic regression and linear regression, and the model evaluation indexes include auc (area under the curve), logloss, rmse (root mean squared error), mae (mean absolute error), and error (error rate);
The learning objective function and the model evaluation indexes of XGBoost are selected. The objective functions mainly include logistic regression and linear regression, and the model evaluation indexes mainly include auc, logloss, rmse, mae, error, and the like.
(3) tuning the XGBoost algorithm parameters, and taking the resulting optimal parameter combination as the XGBoost model parameters.
The commonly used parameters are tuned to obtain the optimal model parameter combination. Because the sample size for a small and micro enterprise risk control model is small, the maximum depth of the tree can generally be set to 5, and the L1 and L2 regularization parameters can also be set to larger values.
This way of determining the model parameters can be applied widely to risk assessment scenarios for different enterprises, maintains a good training effect even when the sample size is small, and improves the accuracy of model identification.
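The three steps above can be sketched as a parameter dictionary in the form used by the XGBoost Python package. This is an assumed illustration rather than the configuration used by the applicant: the helper `build_params` is hypothetical, `binary:logistic` and `reg:squarederror` are taken here as the XGBoost objectives corresponding to logistic and linear regression, and the regularization values are arbitrary examples of "larger" settings (XGBoost defaults are `alpha=0` and `lambda=1`). Only the maximum depth of 5 comes from the text.

```python
ALLOWED_BOOSTERS = {"gbtree", "gblinear"}
ALLOWED_OBJECTIVES = {"binary:logistic", "reg:squarederror"}
ALLOWED_METRICS = {"auc", "logloss", "rmse", "mae", "error"}


def build_params(booster="gbtree", objective="binary:logistic",
                 eval_metric=("auc", "logloss"), max_depth=5,
                 l1=1.0, l2=5.0):
    """Assemble an XGBoost-style parameter dictionary and reject values
    outside the options listed in steps (1) and (2)."""
    if booster not in ALLOWED_BOOSTERS:
        raise ValueError(f"unsupported base learner: {booster}")
    if objective not in ALLOWED_OBJECTIVES:
        raise ValueError(f"unsupported objective: {objective}")
    unknown = set(eval_metric) - ALLOWED_METRICS
    if unknown:
        raise ValueError(f"unsupported evaluation indexes: {sorted(unknown)}")
    params = {
        "booster": booster,
        "objective": objective,
        "eval_metric": list(eval_metric),
        "alpha": l1,   # L1 regularization weight (illustrative value)
        "lambda": l2,  # L2 regularization weight (illustrative value)
    }
    if booster == "gbtree":
        params["max_depth"] = max_depth  # tree depth only applies to tree learners
    return params


xgb_params = build_params()
```

With the xgboost package installed, a dictionary of this shape is what `xgboost.train(params, dtrain, num_boost_round=...)` takes as its first argument; the training data and the number of boosting rounds are outside this sketch.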
Referring to fig. 2, which is a structural block diagram of the enterprise credit risk assessment apparatus provided in this embodiment, the apparatus mainly includes: a data receiving unit 110, a model evaluation unit 120, and a model training unit 130. The enterprise credit risk assessment apparatus provided in this embodiment and the enterprise credit risk assessment method described above may be read with reference to each other.
The data receiving unit 110 is mainly used for receiving enterprise tax data of an enterprise to be evaluated;
the model evaluation unit 120 is mainly used for calling a pre-trained small and micro enterprise credit risk model built based on the XGBoost algorithm to perform operating credit risk evaluation on the enterprise tax data to obtain an evaluation result;
the model training unit 130, which is mainly used for training the small and micro enterprise credit risk model called by the model evaluation unit, includes:
the data acquiring subunit 131, mainly used for acquiring tax sample data of an enterprise;
the variable preprocessing subunit 132, mainly used for performing variable preprocessing on the tax sample data to obtain sample variable data;
the variable screening subunit 133, mainly used for performing variable screening on the sample variable data with the variable stability and the model stability of the sample variable data as evaluation indexes, and determining the model-entry variables in the sample variable data;
the parameter determining subunit 134, mainly used for determining the model parameters in the small and micro enterprise credit risk model built based on the XGBoost algorithm;
the training subunit 135, mainly used for calling the sample variable data to train the small and micro enterprise credit risk model.
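The division of the model training unit into five subunits can be pictured as a simple pipeline. The sketch below is only an assumed software arrangement: the class name, field names, and the stub callables are hypothetical, and the stub "model" returns a constant score instead of a trained XGBoost model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ModelTrainingUnit:
    """Chains five subunits in the order described for unit 130."""
    acquire: Callable[[], Any]               # data acquiring subunit
    preprocess: Callable[[Any], Any]         # variable preprocessing subunit
    screen: Callable[[Any], Any]             # variable screening subunit
    determine_params: Callable[[], dict]     # parameter determining subunit
    train: Callable[[Any, dict], Callable]   # training subunit

    def run(self):
        samples = self.acquire()
        variables = self.preprocess(samples)
        selected = self.screen(variables)
        params = self.determine_params()
        return self.train(selected, params)


# Placeholder subunits, used only to show the data flow.
unit = ModelTrainingUnit(
    acquire=lambda: [{"tax_paid": 3.0, "noise": 9.0}],
    preprocess=lambda rows: rows,
    screen=lambda rows: [{"tax_paid": r["tax_paid"]} for r in rows],
    determine_params=lambda: {"max_depth": 5},
    train=lambda rows, params: (lambda enterprise: 0.5),  # constant stub score
)

model = unit.run()                 # what the model training unit produces
result = model({"tax_paid": 2.0})  # what the model evaluation unit would call
```

Keeping each subunit as a separate callable mirrors the apparatus description, in which each subunit can be replaced independently.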
Optionally, the variable preprocessing subunit may specifically include:
the variable analysis subunit is used for carrying out variable analysis on the tax sample data and taking data output by the variable analysis as preprocessed sample data;
and the binning subunit is used for performing WOE binning on the preprocessed sample data to obtain binned variable data, and taking the binned variable data as the sample variable data.
Optionally, the variable analysis subunit may specifically include:
the statistical analysis subunit is used for performing statistical analysis on the distribution of the tax sample data to obtain sample distribution statistical information;
and the exception processing subunit is used for performing data filling processing on the missing values and the abnormal values in the sample distribution statistical information and taking the processed data as the preprocessed sample data.
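The filling of missing and abnormal values is not specified further in the text. One possible reading, given purely as an assumption, is to replace both with the median of the observed values and to treat values outside the usual 1.5 × IQR fences as abnormal. The function name and the sample figures are hypothetical.

```python
from statistics import median, quantiles


def fill_missing_and_outliers(values, k=1.5):
    """Replace None entries and values outside the IQR fences with the median.

    values: list of numbers, with None marking a missing value.
    k:      fence multiplier (1.5 is a conventional choice, assumed here).
    """
    present = [v for v in values if v is not None]
    med = median(present)
    q1, _, q3 = quantiles(present, n=4)  # quartiles of the observed values
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [med if v is None or v < low or v > high else v for v in values]


# Hypothetical monthly tax amounts with one gap and one extreme entry.
raw = [10, 12, None, 11, 13, 500, 12]
cleaned = fill_missing_and_outliers(raw)
```

Other fill strategies (mean, mode, a sentinel bin for missing values) would fit the same description; the median is used here only because it is not itself distorted by the outliers.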
The present embodiment provides another enterprise credit risk assessment apparatus, whose structural block diagram is shown in fig. 3. The apparatus mainly includes: a variable selection background and a model parameter console.
The variable selection background is responsible for processing the enterprise data and selecting the variables.
The enterprise data processing unit mainly cleans the data, performs descriptive analysis of it, and visually displays the distribution of the sample features to give a preliminary understanding of the data.
The variable selection unit screens variables according to criteria such as variable missing rate, feature importance, variable binning, variable correlation, and correlation after binning. Feature engineering is constructed with ensemble learning methods such as random forest. Finally, the PSI of each variable between the training set and the test set is calculated from the binning result, and the variables with PSI less than 0.1 are kept as the final model-entry variables.
Specifically, the variable selection background comprises the following units:
(1) Data acquisition unit: collects the original enterprise sample data.
(2) Variable distribution unit: performs statistical analysis on the distribution of the sample variables and visualizes the variable distribution charts.
(3) Data cleaning unit: cleans the sample data and handles and fills its missing values and abnormal values; it specifically supports operations such as transposing the data and mathematical operations.
(4) Variable binning unit: because the sample size for a small and micro enterprise risk control model is small, when the XGBoost algorithm is used for modeling, the variables are WOE-binned before model training to prevent the model from overfitting. This unit therefore bins the preprocessed sample data, specifically supporting decision tree binning, chi-square binning, equal-frequency binning, and equal-width binning, and supports graphical output of binning trend charts.
(5) Variable selection unit: supports splitting the data into training and test sets, screens variables according to their correlation after WOE binning, and provides several algorithms (random forest, GBDT, and others) for selecting variables by importance. Finally, variables are selected according to their PSI values across the split data sets.
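The WOE binning performed by the variable binning unit can be sketched with equal-frequency bins, one of the four binning options listed. The text does not state a WOE formula, so the sketch assumes the convention WOE_i = ln(share of good samples in bin i / share of bad samples in bin i); the opposite sign convention is also in common use. The smoothing constant, the function names, and the sample data are assumptions.

```python
from math import log


def equal_frequency_edges(values, n_bins):
    """Cut points that place roughly the same number of samples in each bin."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


def assign_bin(value, edges):
    """Index of the bin that value falls into (bins are left-closed)."""
    return sum(value >= edge for edge in edges)


def woe_table(values, labels, edges, smoothing=0.5):
    """WOE of each bin; labels use 1 for a bad (defaulting) sample, 0 for good.
    The smoothing count keeps empty bins from producing log(0)."""
    n_bins = len(edges) + 1
    good = [smoothing] * n_bins
    bad = [smoothing] * n_bins
    for value, label in zip(values, labels):
        index = assign_bin(value, edges)
        if label == 1:
            bad[index] += 1
        else:
            good[index] += 1
    good_total, bad_total = sum(good), sum(bad)
    return [log((good[i] / good_total) / (bad[i] / bad_total))
            for i in range(n_bins)]


# Hypothetical variable and default labels for eight enterprises.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

edges = equal_frequency_edges(values, n_bins=2)
woe = woe_table(values, labels, edges)
encoded = [woe[assign_bin(v, edges)] for v in values]  # WOE-encoded variable
```

The encoded column replaces the raw variable before correlation screening and training, which is why binning limits the number of distinct values the model can fit to in a small sample.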
The model parameter console is mainly responsible for tuning the XGBoost model parameters.
After the data is processed by the variable selection background, the sample data enters the model parameter console, where a suitable final parameter combination is determined by tuning the common XGBoost model parameters.
Specifically, the model parameter console comprises the following units:
(1) General parameter unit: controls the overall (macro-level) behavior of the XGBoost model; the main parameter is the base learner type.
(2) Learning target parameter unit: controls the model objective function and the model evaluation indexes.
(3) Booster parameter unit: controls the common booster parameters, specifically the number of iterations, the maximum depth of the tree, the minimum loss reduction required to split a node, the minimum sum of sample weights in a leaf node, the proportion of the whole sample set subsampled for training, the proportion of randomly sampled features, the L1 regularization term weight, the L2 regularization term weight, and the learning rate.
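For orientation, the booster parameters listed above correspond to well-known XGBoost parameter names, and tuning amounts to trying combinations of candidate values. The sketch below shows that correspondence and enumerates a small grid; the mapping to `colsample_bytree` for feature sampling is an assumption (XGBoost also offers per-level and per-node variants), and the candidate values and the helper name are invented for illustration. The search strategy itself is not specified in the text.

```python
from itertools import product

# Described booster parameter -> XGBoost name.
BOOSTER_PARAMETERS = {
    "number of iterations": "num_boost_round",  # argument of xgboost.train, not a params key
    "maximum depth of the tree": "max_depth",
    "minimum loss reduction to split a node": "gamma",
    "minimum sum of leaf sample weights": "min_child_weight",
    "training subsample proportion": "subsample",
    "feature sampling proportion": "colsample_bytree",  # assumed variant
    "L1 regularization weight": "alpha",
    "L2 regularization weight": "lambda",
    "learning rate": "eta",
}


def candidate_combinations(grid):
    """Every combination of the candidate values in grid, as parameter dicts."""
    names = list(grid)
    return [dict(zip(names, combo))
            for combo in product(*(grid[name] for name in names))]


# Hypothetical candidate values for three of the parameters.
grid = {"max_depth": [3, 5], "eta": [0.05, 0.1], "lambda": [1.0, 5.0]}
candidates = candidate_combinations(grid)
```

Each candidate would then be trained and scored with the chosen evaluation index on the test set, and the best-scoring combination kept as the final model parameters.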
The enterprise credit risk assessment apparatus provided in this embodiment strictly screens the model-entry variables: it screens for collinearity among the variables, variable importance, and variable stability to obtain effective and stable final model-entry variables, and then builds the XGBoost model, avoiding the model overfitting that a small modeling sample would otherwise cause in a small and micro enterprise risk model.
The embodiment provides an enterprise credit risk assessment device, mainly including: a memory and a processor.
Wherein, the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the enterprise credit risk assessment method described in the above embodiments, which may be referred to in the above description of the enterprise credit risk assessment method.
Referring to fig. 4, which is a schematic structural diagram of the enterprise credit risk assessment device provided in this embodiment, the device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 322, a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing applications 342 or data 344. The memory 332 and the storage media 330 may be transient storage or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the data processing device. Further, the central processor 322 may be configured to communicate with the storage medium 330 and to execute, on the enterprise credit risk assessment device 301, the series of instruction operations in the storage medium 330.
The enterprise credit risk assessment device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps in the enterprise credit risk assessment method described in fig. 1 above can be implemented by the structure of the enterprise credit risk assessment apparatus introduced in this embodiment.
The present embodiment discloses a readable storage medium, on which a program is stored, and the program, when executed by a processor, implements the steps of the enterprise credit risk assessment method described in the above embodiments, which may be referred to in the description of the enterprise credit risk assessment method in the above embodiments.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The enterprise credit risk assessment method, device, equipment and readable storage medium provided by the application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
Claims (10)
1. An enterprise credit risk assessment method, comprising:
receiving enterprise tax data of an enterprise to be evaluated;
calling a pre-trained small and micro enterprise credit risk model built based on an XGBoost algorithm to perform operating credit risk evaluation on the enterprise tax data to obtain an evaluation result;
wherein the training method of the small and micro enterprise credit risk model comprises:
acquiring tax sample data of an enterprise;
performing variable preprocessing on the tax sample data to obtain sample variable data;
taking the variable stability and the model stability of the sample variable data as evaluation indexes, performing variable screening on the sample variable data, and determining a model-entry variable in the sample variable data;
determining model parameters in a small and micro enterprise credit risk model built based on an XGBoost algorithm;
and calling the sample variable data to train the small and micro enterprise credit risk model.
2. The enterprise credit risk assessment method of claim 1, wherein performing variable preprocessing on the tax sample data to obtain sample variable data comprises:
performing variable analysis on the tax sample data, and taking data output by the variable analysis as preprocessed sample data;
and performing WOE binning on the preprocessed sample data to obtain binned variable data, and taking the binned variable data as the sample variable data.
3. The enterprise credit risk assessment method of claim 2, wherein performing variable analysis on the tax sample data and using data output by the variable analysis as pre-processed sample data comprises:
performing statistical analysis on the distribution of the tax sample data to obtain sample distribution statistical information;
and performing data filling processing on the missing values and the abnormal values in the sample distribution statistical information, and taking the processed data as the preprocessed sample data.
4. The enterprise credit risk assessment method of claim 1, wherein performing variable screening on the sample variable data with the variable stability and the model stability of the sample variable data as evaluation indexes, and determining the model-entry variable in the sample variable data, comprises:
screening the sample variable data according to the correlation and the variable importance among the sample variable data to obtain a first variable;
and calculating a model stability index of the first variable, and taking the first variable whose model stability index is lower than a threshold as a model-entry variable.
5. The enterprise credit risk assessment method of claim 1, wherein the determining model parameters in the small micro enterprise credit risk model built based on the XGBoost algorithm comprises:
determining the type of the XGBoost model base learner; wherein the XGBoost model base learner type comprises: gbtree and gblinear;
determining a learning objective function and a model evaluation index of the XGBoost; wherein the objective function comprises: logistic regression and linear regression, and the model evaluation indexes comprise: auc, logloss, rmse, mae, error;
and tuning the XGBoost algorithm parameters, and taking the resulting optimal model parameter combination as the XGBoost model parameters.
6. An enterprise credit risk assessment device, comprising:
the data receiving unit is used for receiving enterprise tax data of an enterprise to be evaluated;
the model evaluation unit is used for calling a pre-trained small and micro enterprise credit risk model built based on the XGBoost algorithm to perform operating credit risk evaluation on the enterprise tax data to obtain an evaluation result;
wherein the model training unit for training the small and micro enterprise credit risk model called by the model evaluation unit comprises:
the data acquisition subunit is used for acquiring tax sample data of an enterprise;
the variable preprocessing subunit is used for performing variable preprocessing on the tax sample data to obtain sample variable data;
the variable screening subunit is used for performing variable screening on the sample variable data by taking the variable stability and the model stability of the sample variable data as evaluation indexes to determine a model-entry variable in the sample variable data;
the parameter determining subunit is used for determining model parameters in a small and micro enterprise credit risk model built based on an XGBoost algorithm;
and the training subunit is used for calling the sample variable data to train the small and micro enterprise credit risk model.
7. The enterprise credit risk assessment device of claim 6, wherein the variable preprocessing subunit comprises:
the variable analysis subunit is used for carrying out variable analysis on the tax sample data and taking data output by the variable analysis as preprocessed sample data;
and the binning subunit is used for performing WOE binning on the preprocessed sample data to obtain binned variable data, and taking the binned variable data as the sample variable data.
8. The enterprise credit risk assessment device of claim 7, wherein the variable analysis subunit comprises:
the statistical analysis subunit is used for performing statistical analysis on the distribution of the tax sample data to obtain sample distribution statistical information;
and the exception processing subunit is used for performing data filling processing on the missing values and the abnormal values in the sample distribution statistical information and taking the processed data as the preprocessed sample data.
9. An enterprise credit risk assessment device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the enterprise credit risk assessment method according to any one of claims 1 to 5 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a program which, when being executed by a processor, realizes the steps of the enterprise credit risk assessment method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010805252.7A CN111951097A (en) | 2020-08-12 | 2020-08-12 | Enterprise credit risk assessment method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111951097A (en) | 2020-11-17 |
Family
ID=73332732
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010805252.7A Pending CN111951097A (en) | 2020-08-12 | 2020-08-12 | Enterprise credit risk assessment method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111951097A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112529477A (en) * | 2020-12-29 | 2021-03-19 | 平安普惠企业管理有限公司 | Credit evaluation variable screening method, device, computer equipment and storage medium |
CN112633635A (en) * | 2020-11-29 | 2021-04-09 | 龙马智芯(珠海横琴)科技有限公司 | Exhibitor risk assessment method, exhibitor risk assessment device, exhibitor risk assessment server and readable storage medium |
CN112749922A (en) * | 2021-02-01 | 2021-05-04 | 深圳无域科技技术有限公司 | Wind control model training method, system, equipment and computer readable medium |
CN113205403A (en) * | 2021-03-30 | 2021-08-03 | 北京中交兴路信息科技有限公司 | Method and device for calculating enterprise credit level, storage medium and terminal |
CN113222731A (en) * | 2021-04-25 | 2021-08-06 | 北京工业大学 | Small sample credit evaluation method, system and medium based on machine learning |
CN113393328A (en) * | 2021-06-21 | 2021-09-14 | 深圳微众信用科技股份有限公司 | Method and device for assessing pre-financing and pre-loan approval and computer storage medium |
CN113409150A (en) * | 2021-06-21 | 2021-09-17 | 深圳微众信用科技股份有限公司 | Operation risk and credit risk assessment method, device and computer storage medium |
CN113793212A (en) * | 2021-09-24 | 2021-12-14 | 重庆富民银行股份有限公司 | Credit assessment method |
CN114492929A (en) * | 2021-12-23 | 2022-05-13 | 江南大学 | XGboost-based financial credit enterprise credit prediction method |
CN115329207A (en) * | 2022-10-17 | 2022-11-11 | 启客(北京)科技有限公司 | Intelligent sales information recommendation method and system |
CN115860926A (en) * | 2023-02-20 | 2023-03-28 | 江西汉辰信息技术股份有限公司 | Wind control decision method and system based on decision tree |
CN116051296A (en) * | 2022-12-28 | 2023-05-02 | 中国银行保险信息技术管理有限公司 | Customer evaluation analysis method and system based on standardized insurance data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106779457A (en) * | 2016-12-29 | 2017-05-31 | 深圳微众税银信息服务有限公司 | A kind of rating business credit method and system |
CN110163743A (en) * | 2019-04-28 | 2019-08-23 | 钛镕智能科技(苏州)有限公司 | A kind of credit-graded approach based on hyperparameter optimization |
CN111507822A (en) * | 2020-04-13 | 2020-08-07 | 深圳微众信用科技股份有限公司 | Enterprise risk assessment method based on feature engineering |
-
2020
- 2020-08-12 CN CN202010805252.7A patent/CN111951097A/en active Pending
Non-Patent Citations (1)
Title |
---|
Wu Jinhua et al.: "Application of feature selection methods in credit scoring systems" (特征选择方法在信用评分系统中的应用), 信息与电脑(理论版), no. 08, 25 April 2019 (2019-04-25) * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633635A (en) * | 2020-11-29 | 2021-04-09 | 龙马智芯(珠海横琴)科技有限公司 | Exhibitor risk assessment method, exhibitor risk assessment device, exhibitor risk assessment server and readable storage medium |
CN112529477A (en) * | 2020-12-29 | 2021-03-19 | 平安普惠企业管理有限公司 | Credit evaluation variable screening method, device, computer equipment and storage medium |
CN112749922A (en) * | 2021-02-01 | 2021-05-04 | 深圳无域科技技术有限公司 | Wind control model training method, system, equipment and computer readable medium |
CN113205403A (en) * | 2021-03-30 | 2021-08-03 | 北京中交兴路信息科技有限公司 | Method and device for calculating enterprise credit level, storage medium and terminal |
CN113222731A (en) * | 2021-04-25 | 2021-08-06 | 北京工业大学 | Small sample credit evaluation method, system and medium based on machine learning |
CN113409150A (en) * | 2021-06-21 | 2021-09-17 | 深圳微众信用科技股份有限公司 | Operation risk and credit risk assessment method, device and computer storage medium |
CN113393328A (en) * | 2021-06-21 | 2021-09-14 | 深圳微众信用科技股份有限公司 | Method and device for assessing pre-financing and pre-loan approval and computer storage medium |
CN113793212A (en) * | 2021-09-24 | 2021-12-14 | 重庆富民银行股份有限公司 | Credit assessment method |
CN114492929A (en) * | 2021-12-23 | 2022-05-13 | 江南大学 | XGboost-based financial credit enterprise credit prediction method |
CN115329207A (en) * | 2022-10-17 | 2022-11-11 | 启客(北京)科技有限公司 | Intelligent sales information recommendation method and system |
CN116051296A (en) * | 2022-12-28 | 2023-05-02 | 中国银行保险信息技术管理有限公司 | Customer evaluation analysis method and system based on standardized insurance data |
CN116051296B (en) * | 2022-12-28 | 2023-09-29 | 中国银行保险信息技术管理有限公司 | Customer evaluation analysis method and system based on standardized insurance data |
CN115860926A (en) * | 2023-02-20 | 2023-03-28 | 江西汉辰信息技术股份有限公司 | Wind control decision method and system based on decision tree |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111951097A (en) | Enterprise credit risk assessment method, device, equipment and storage medium | |
CN108564286B (en) | Artificial intelligent financial wind-control credit assessment method and system based on big data credit investigation | |
CN113642849B (en) | Geological disaster risk comprehensive evaluation method and device considering spatial distribution characteristics | |
CN110738564A (en) | Post-loan risk assessment method and device and storage medium | |
CN108960269B (en) | Feature acquisition method and device for data set and computing equipment | |
CN112700324A (en) | User loan default prediction method based on combination of Catboost and restricted Boltzmann machine | |
CN110634060A (en) | User credit risk assessment method, system, device and storage medium | |
CN113361690A (en) | Water quality prediction model training method, water quality prediction device, water quality prediction equipment and medium | |
CN113344438A (en) | Loan system, loan monitoring method, loan monitoring apparatus, and loan medium for monitoring loan behavior | |
CN112488496A (en) | Financial index prediction method and device | |
CN113435713B (en) | Risk map compiling method and system based on GIS technology and two-model fusion | |
CN115203496A (en) | Project intelligent prediction and evaluation method and system based on big data and readable storage medium | |
CN114004691A (en) | Line scoring method, device, equipment and storage medium based on fusion algorithm | |
CN116129189A (en) | Plant disease identification method, plant disease identification equipment, storage medium and plant disease identification device | |
CN115906669A (en) | Dense residual error network landslide susceptibility evaluation method considering negative sample selection strategy | |
CN113673609B (en) | Questionnaire data analysis method based on linear hidden variables | |
CN113240513A (en) | Method for determining user credit line and related device | |
CN113553754A (en) | Memory, fire risk prediction model construction method, system and device | |
CN112862014A (en) | Client credit early warning method and device | |
CN111695989A (en) | Modeling method and platform of wind-control credit model | |
CN111612626A (en) | Method and device for preprocessing bond evaluation data | |
CN117493140B (en) | Evaluation system for deep learning model | |
Thilaka et al. | A Machine Learning Approach to GDP Prediction by Analyzing Economic Indicators | |
CN112465310A (en) | Computer-implemented data processing method, system, apparatus, and storage medium | |
Subagyo et al. | Study of Economic Inequality in The Agglomeration Region of Malang Raya |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||