CN114139931A

CN114139931A - Enterprise data evaluation method and device, computer equipment and storage medium

Info

Publication number: CN114139931A
Application number: CN202111432548.XA
Authority: CN
Inventors: 田鸥; 郭丹丹; 尹传金; 张一鹏; 朱婷
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-03-04

Abstract

The application relates to the field of artificial intelligence and discloses an enterprise data evaluation method, an enterprise data evaluation device, computer equipment and a storage medium. The method comprises: obtaining enterprise training data of multiple dimensions and multiple preset evaluation models; for the enterprise training data of each dimension, training with the plurality of preset evaluation models to obtain a plurality of first submodels; judging whether the indexes of each first submodel meet a preset index requirement, and extracting the first submodels that meet the preset index requirement under each dimension as second submodels; combining the second submodels under all dimensions to obtain a plurality of first main models; calculating the lifting degrees of the plurality of first main models according to their indexes, and extracting the first main model with the largest lifting degree as a second main model; and inputting the enterprise data into the second main model for evaluation to obtain an evaluation result. The application also relates to blockchain technology, and the evaluation result is stored in a blockchain. The method and the device improve the accuracy of the evaluation result.

Description

Enterprise data evaluation method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for enterprise data evaluation, a computer device, and a storage medium.

Background

In financial enterprises, the risk control models used to evaluate clients in different application scenarios tend to be single-purpose. They mainly serve financial services built around a client's core enterprise or core platform and rely on supply chain data, so the data does not describe the client entity comprehensively, and a client enterprise is difficult to evaluate when the data differ widely. The prior art mainly uses typical ensemble algorithms such as averaging, bagging and boosting. Enterprise data may contain sub-data in different dimensions, and because different submodels are used, their output values differ markedly, which lowers the accuracy of the overall result. The low accuracy of the output values obtained when evaluating enterprise data in the prior art is therefore a problem that urgently needs to be solved.

Disclosure of Invention

The application provides an enterprise data evaluation method and device, computer equipment and a storage medium, and aims to solve the problem that in the prior art, the accuracy rate is low when an integrated algorithm is used for processing client data.

In order to solve the above problem, the present application provides an enterprise data evaluation method, including:

acquiring enterprise training data of multiple dimensions and multiple preset evaluation models;

aiming at the enterprise training data of each dimension, training by utilizing a plurality of preset evaluation models to obtain a plurality of corresponding first sub-models;

judging whether the index of the first submodel meets the preset index requirement, and extracting the first submodel meeting the preset index requirement under each dimensionality to serve as a second submodel;

combining the second submodels under the dimensions to obtain a plurality of first main models, wherein the second submodels under the first main models respectively belong to different dimensions;

calculating indexes of the first main model, calculating the lifting degrees of the plurality of first main models according to the indexes of the plurality of first main models, and extracting the first main model with the largest lifting degree as a second main model;

and receiving enterprise data of an enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second main model for evaluation, so as to obtain an evaluation result corresponding to the enterprise to be evaluated.

Further, the determining whether the index of the first sub-model meets a preset index requirement includes:

calculating a performance index and a stability index of the first sub-model;

and respectively comparing and judging the performance index and the stability index of the first submodel with the corresponding elements in the preset index requirement.

Further, the calculating the index of the first main model includes:

and according to the weight occupied by the second submodel under the first main model, carrying out weighted summation on the performance index and the stability index in the second submodel to obtain the index of the first main model, wherein the weight is obtained according to the ratio of the AUC value in the index of the second submodel to the sum of the AUC values corresponding to all the second submodels under the first main model.

Further, the receiving enterprise data of the enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second master model for evaluation, and obtaining an evaluation result corresponding to the enterprise to be evaluated includes:

receiving the enterprise data of the enterprise to be evaluated, and classifying the content of the enterprise data of the enterprise to be evaluated to obtain data of multiple categories;

inputting the data of different types into a second sub-model corresponding to the second main model respectively to correspondingly obtain a plurality of sub-evaluation results;

and according to the weight occupied by the second sub-model under the second main model, carrying out weighted summation on the plurality of sub-evaluation results to obtain a corresponding first evaluation result.

Further, before the inputting the different categories of data into the corresponding second sub-models under the second main model, the method further includes:

counting the data quantity of the data of each category, and comparing the data quantity with a preset value to obtain the categories whose data quantity is greater than or equal to the preset value;

the respectively inputting the different types of data into the corresponding second sub-models under the second main model comprises:

and respectively inputting the data of the categories whose data quantity is greater than or equal to the preset value into the corresponding second sub-models under the second main model.

sorting the sub-evaluation results respectively obtained by the plurality of second sub-models under the second main model to obtain a sorting value;

normalizing the ranking values to obtain normalized values;

and according to the weight occupied by the second submodel under the second main model, carrying out weighted summation on the plurality of normalized numerical values to obtain a corresponding second evaluation result.

Further, after the data of different categories are respectively input into the corresponding second sub-models in the second main model and a plurality of sub-evaluation results are correspondingly obtained, the method further includes:

according to the weight occupied by the second sub-model under the second main model, carrying out weighted summation on the plurality of sub-evaluation results to obtain a corresponding first evaluation result;

after the weighted summation of the plurality of normalization values to obtain a corresponding second evaluation result, the method further includes:

according to the second evaluation result, the enterprises to be evaluated are sorted to obtain a sorted list;

and calibrating the sorted list by combining the sorted list with the first evaluation result.

In order to solve the above problem, the present application also provides an enterprise data evaluation device, including:

the acquisition module is used for acquiring enterprise training data of multiple dimensions and multiple preset evaluation models;

the training module is used for training enterprise training data of each dimension by utilizing a plurality of preset evaluation models to obtain a plurality of corresponding first sub-models;

the first extraction module is used for judging whether the index of the first submodel meets the preset index requirement, and extracting the first submodel meeting the preset index requirement under each dimension to serve as a second submodel;

the combination module is used for combining the second submodels under the dimensions to obtain a plurality of first main models, and the second submodels under the first main models belong to different dimensions respectively;

the second extraction module is used for calculating an index of the first main model, calculating the lifting degrees of the plurality of first main models according to the indexes of the plurality of first main models, and extracting the first main model with the largest lifting degree as a second main model;

and the processing module is used for receiving enterprise data of the enterprise to be evaluated, inputting the enterprise data of the enterprise to be evaluated into the second main model for evaluation, and obtaining an evaluation result corresponding to the enterprise to be evaluated.

In order to solve the above problem, the present application also provides a computer device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the enterprise data assessment method as described above.

To solve the above problem, the present application also provides a non-volatile computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a processor implement the enterprise data assessment method as described above.

Compared with the prior art, the enterprise data evaluation method, the enterprise data evaluation device, the computer equipment and the storage medium provided by the embodiment of the application have at least the following beneficial effects:

the enterprise training data of multiple dimensions and multiple preset evaluation models are obtained, aiming at the enterprise training data of each dimension, multiple preset models are used for training, so that multiple corresponding first submodels can be obtained under each dimension, the first submodels are preliminarily screened according to whether indexes of the first submodels meet preset index requirements, second submodels meeting requirements are obtained, one second submodel is extracted from each dimension and combined to obtain multiple first main models, each index of each first main model is the weighted sum of multiple second submodels under the first main model, the lifting degree of each first main model is calculated according to the indexes of the first main models, and the first main model with the largest lifting degree is extracted to serve as the second main model; therefore, the training and screening of the model are completed, and the stability and the efficiency are better when the second main model is subsequently utilized to process data; the enterprise data of the enterprise to be evaluated is received and input into the second main model for evaluation, so that the evaluation result corresponding to the enterprise to be evaluated is obtained, and the accuracy of the evaluation result of the enterprise to be evaluated according to the enterprise data of the enterprise to be evaluated is improved.

Drawings

In order to illustrate the solution of the present application more clearly, the drawings required for describing the embodiments of the present application are briefly described below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart illustrating an enterprise data evaluation method according to an embodiment of the present application;

FIG. 2 is a flowchart of one embodiment of step S6 of FIG. 1;

FIG. 3 is a flowchart of still another embodiment of step S6 in FIG. 1;

FIG. 4 is a block diagram of an enterprise data evaluation device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. One skilled in the art will explicitly or implicitly appreciate that the embodiments described herein can be combined with other embodiments.

The application provides an enterprise data evaluation method. Referring to FIG. 1, FIG. 1 is a schematic flowchart of an enterprise data evaluation method according to an embodiment of the present application.

In this embodiment, the enterprise data evaluation method includes:

S1, acquiring enterprise training data of multiple dimensions and multiple preset evaluation models;

In the present application, the enterprise training data of multiple dimensions are data of multiple different types. For example, the embodiment of the present application uses enterprise training data of 7 types, that is, of 7 dimensions, based on the enterprise's basic information, financial status, tax payment, out-of-bank settlement, out-of-bank credit, in-bank deposits, in-bank settlement, and in-bank credit. The enterprise training data of each dimension has a corresponding label, namely an overdue probability or a score.

The preset evaluation models are base models prepared in advance. In the embodiment of the present application, they comprise five preset algorithms: logistic regression, random forest, Bayes, support vector machine (SVM), and artificial neural network (ANN).

S2, aiming at the enterprise training data of each dimension, training by utilizing a plurality of preset evaluation models to obtain a plurality of corresponding first sub-models;

Specifically, the enterprise training data of each dimension is used to train the five preset evaluation models, namely logistic regression, random forest, Bayes, SVM and ANN, yielding 5 first submodels for each dimension, so that the enterprise training data of 7 dimensions yields 35 first submodels in total.

It can be understood that, before the enterprise training data of each dimension is used to train the plurality of preset evaluation models, the enterprise training data of each dimension is preprocessed, for example by missing value filling, abnormal value replacement, and feature extraction.

S3, judging whether the index of the first submodel meets the preset index requirement, and extracting the first submodel meeting the preset index requirement under each dimensionality to serve as a second submodel;

specifically, according to the judgment of whether each index of the first submodel, such as a performance index and a stability index, meets the requirement of a preset index, if the index meets the requirement, the first submodel meeting the requirement of the preset index under each dimensionality is extracted to serve as a second submodel, and the first submodel which does not meet the requirement is excluded.

calculating a performance index and a stability index of the first sub-model;

Specifically, the performance indexes include KS (the larger the KS value, the better the model distinguishes positive customers from negative customers), AUC (Area Under the Curve), and AR (Accuracy Ratio); the stability index is PSI (Population Stability Index);

1) The calculation formula of the KS value is:

KS = \max_x |F_g(x) - F_b(x)|

First, the whole sample in a certain period is scored with the model to be verified, and the whole sample set in the period is divided into two sample subsets, namely normal customers and default customers. Then the cumulative probability distributions of the two sample subsets, namely the cumulative probability distribution of normal customers F_g(x) and the cumulative probability distribution of default customers F_b(x), are calculated in order of risk level from low to high, and the maximum of the differences between the two distributions over the risk levels is taken as the KS value.

The larger the KS value is, the larger the degree that the model can distinguish the positive customer from the negative customer is, and the more obvious the difference between the cumulative probability distributions of the two sub-samples is, so that the normal customer and the default customer can be effectively distinguished through the model, namely the stronger the distinguishing capability of the model to be verified is.
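The KS calculation described above can be sketched as follows. This is a minimal illustration, assuming each customer has a single numeric risk score; the function name `ks_statistic` and the sample scores are hypothetical and are not taken from the application.

```python
from bisect import bisect_right


def ks_statistic(good_scores, bad_scores):
    """Return the maximum of |F_g(x) - F_b(x)| over all observed scores."""
    good = sorted(good_scores)
    bad = sorted(bad_scores)
    ks = 0.0
    # The empirical distributions only change at observed scores,
    # so it is enough to evaluate the difference at those points.
    for x in set(good) | set(bad):
        f_g = bisect_right(good, x) / len(good)  # share of normal customers with score <= x
        f_b = bisect_right(bad, x) / len(bad)    # share of default customers with score <= x
        ks = max(ks, abs(f_g - f_b))
    return ks


# Hypothetical risk scores (higher means riskier).
normal_customers = [0.05, 0.10, 0.20, 0.30, 0.60]
default_customers = [0.40, 0.55, 0.70, 0.80, 0.90]
print(ks_statistic(normal_customers, default_customers))
```

With these sample scores the largest gap occurs at a score of 0.30, where four of the five normal customers and none of the default customers have been accumulated.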

2) The following counts are obtained: the number of good clients correctly classified by the model (TP, True Positive), the number of bad clients incorrectly classified as good by the model (FP, False Positive), the number of bad clients correctly classified by the model (TN, True Negative), and the number of good clients incorrectly classified as bad by the model (FN, False Negative). From these, the following formulas are obtained:

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

Based on the above, an ROC curve is constructed with FPR as the x axis and TPR as the y axis, and the area under the ROC curve (the area enclosed by the curve and the x axis) is calculated to obtain the AUC value. AUC represents the prediction precision of the model; the larger the AUC value, the higher the classification accuracy of the model.
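A minimal sketch of the ROC and AUC computation follows, using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN) and trapezoidal integration. The function names, the label convention (1 marks the positive class, a higher score means more likely positive) and the sample data are assumptions made for illustration.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(labels, scores))
```

Both classes must be present in `labels`, otherwise one of the rates is undefined.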

3) The Accuracy Ratio (AR) measures the ability of the model to correctly rank the client risk levels, another indicator of the model's discriminative power verification. The calculation of the AR value depends on the cumulative accuracy curve (CAP curve); the abscissa and ordinate of the CAP curve respectively represent the cumulative customer percentage and the cumulative default customer percentage after the verification samples are sorted from large to small according to the prediction probability of the main model. The random model corresponds to a straight line OB (representing no distinguishing capability at all), the ideal rating model corresponds to a broken line OAB (representing complete distinguishing capability), the model to be verified corresponds to a curve OB, the curve OB is located in a triangle formed by the broken line OAB and the straight line OB, the closer the curve is to the OAB, the stronger the distinguishing capability of the model is, and the weaker the model is otherwise.

The AR value is a summary index of the CAP curve. It is defined as the ratio of the area a_R between the CAP curve of the model to be verified and the CAP curve of the random model to the area a_P between the CAP curve of the ideal rating model and the CAP curve of the random model (the area of triangle AOB), i.e.

AR = a_R / a_P.
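The ratio of the two areas can be sketched numerically as below. This is an illustrative sketch, assuming no tied predicted probabilities and at least one default and one non-default customer in the sample; `accuracy_ratio` and the sample data are hypothetical names and values.

```python
def accuracy_ratio(defaults, predicted_pd):
    """AR = a_R / a_P computed from the CAP curve.

    defaults: 1 for a default customer, 0 otherwise.
    predicted_pd: predicted default probability for each customer.
    """
    n = len(defaults)
    n_bad = sum(defaults)
    # Sort customers from the largest predicted probability to the smallest.
    order = sorted(range(n), key=lambda i: -predicted_pd[i])
    area_under_cap = 0.0
    previous_share = 0.0
    captured = 0
    for i in order:
        captured += defaults[i]
        share = captured / n_bad  # cumulative default customer percentage
        area_under_cap += (previous_share + share) / 2 / n  # trapezoid of width 1/n
        previous_share = share
    a_r = area_under_cap - 0.5        # area between the model CAP and the random line
    a_p = 0.5 * (1 - n_bad / n)       # area between the ideal CAP and the random line
    return a_r / a_p


# Hypothetical sample: defaults rank first and third by predicted probability.
print(accuracy_ratio([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))
```

A model that ranks every default customer ahead of every normal customer gives an AR of 1, and a model with no discriminating power gives an AR close to 0.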

4) The probability values of the training set and the test set are divided into N groups from low to high, and for each group i the training set sample ratio X_i (the proportion of all training set samples that fall in the group) and the test set sample ratio Y_i (the proportion of all test set samples that fall in the group) are calculated. The calculation formula of PSI is then as follows:

PSI = \sum_{i=1}^{N} (X_i - Y_i) \ln(X_i / Y_i)
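The sketch below uses the standard PSI form, the sum over groups of (X_i - Y_i) ln(X_i / Y_i), where X_i and Y_i are the shares of training set and test set samples in group i. Equal-width groups and the small floor `eps` that avoids taking the logarithm of zero are assumptions of this sketch, not details stated in the application.

```python
import math


def psi(train_probs, test_probs, n_groups=10, eps=1e-6):
    """Population Stability Index between two sets of predicted probabilities."""
    lo = min(min(train_probs), min(test_probs))
    hi = max(max(train_probs), max(test_probs))
    # ASSUMPTION: equal-width groups between the smallest and largest value.
    width = ((hi - lo) / n_groups) or 1.0

    def shares(values):
        counts = [0] * n_groups
        for v in values:
            idx = min(int((v - lo) / width), n_groups - 1)
            counts[idx] += 1
        # ASSUMPTION: floor empty groups at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    x = shares(train_probs)
    y = shares(test_probs)
    return sum((xi - yi) * math.log(xi / yi) for xi, yi in zip(x, y))


# Hypothetical probabilities: the test set has drifted toward low values.
train = [0.1] * 5 + [0.9] * 5
test = [0.1] * 8 + [0.9] * 2
print(psi(train, test, n_groups=2))
```

Identical distributions give a PSI of 0, and the value grows as the two distributions drift apart.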

after values of KS, AUC, AR and PSI are obtained through calculation, whether the values of KS, AUC, AR and PSI obtained through calculation according to the first submodel meet requirements or not is judged according to elements in preset index requirements, such as the KS value being more than or equal to 0.3, the AR value being more than or equal to 0.4, the AUC value being more than or equal to 0.7 and the PSI value being less than 0.25.

And the performance index and the stability index are calculated and are correspondingly compared and judged with each element in the preset index, so that the first submodel is screened, and the accuracy of the final output result is improved.
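The screening step can be sketched as a simple threshold check, using the example requirements given in this embodiment (KS at least 0.3, AR at least 0.4, AUC at least 0.7, PSI below 0.25). The data layout, the function names and the sample index values are hypothetical.

```python
# Example thresholds taken from the embodiment.
MIN_PERFORMANCE = {"KS": 0.3, "AR": 0.4, "AUC": 0.7}
MAX_PSI = 0.25


def meets_requirements(indexes):
    """True if every performance index and the stability index pass."""
    performance_ok = all(indexes[name] >= limit for name, limit in MIN_PERFORMANCE.items())
    return performance_ok and indexes["PSI"] < MAX_PSI


def select_second_submodels(first_submodels):
    """Keep, per dimension, only the first submodels that pass the check.

    first_submodels: {dimension: {algorithm: {"KS": .., "AR": .., "AUC": .., "PSI": ..}}}
    """
    return {
        dimension: {name: idx for name, idx in models.items() if meets_requirements(idx)}
        for dimension, models in first_submodels.items()
    }


# Hypothetical index values for one dimension.
first_submodels = {
    "tax": {
        "logistic_regression": {"KS": 0.35, "AR": 0.45, "AUC": 0.72, "PSI": 0.10},
        "svm": {"KS": 0.25, "AR": 0.45, "AUC": 0.72, "PSI": 0.10},
        "random_forest": {"KS": 0.40, "AR": 0.50, "AUC": 0.78, "PSI": 0.30},
    }
}
print(select_second_submodels(first_submodels))
```

In this sample only the logistic regression submodel survives: the SVM fails the KS requirement and the random forest fails the PSI requirement.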

S4, combining the second submodels under the dimensions to obtain a plurality of first main models, wherein the second submodels under the first main models belong to different dimensions respectively;

specifically, the second submodels in each dimension are combined to obtain a plurality of first main models, the second submodels in each first main model belong to different dimensions, and the indexes of the combined first main models inherit the indexes of the second submodels, that is, the indexes of the second submodels are weighted and summed to obtain the indexes of the first main models.

For example, under the existing dimensions 1, 2 and 3, there are two second submodels a and b under dimension 1, three second submodels c, d and e under dimension 2, and f one second submodel under dimension 3; now, a second submodel is arbitrarily extracted from each dimension to form a first main model, so that 2 × 3 × 1 combinations, that is, 6 first main models exist, and the second submodels forming the first main model belong to different dimensions respectively.
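The enumeration in this example is a Cartesian product over the dimensions, which can be sketched as follows. The function name `build_first_main_models` and the dictionary layout are hypothetical.

```python
from itertools import product


def build_first_main_models(second_submodels):
    """Pick one second submodel per dimension, in every possible way.

    second_submodels: {dimension: [submodel, ...]}
    Returns a list of {dimension: submodel} combinations.
    """
    dimensions = sorted(second_submodels)
    choices = (second_submodels[d] for d in dimensions)
    return [dict(zip(dimensions, combo)) for combo in product(*choices)]


# The example from the text: two, three and one second submodels.
second_submodels = {1: ["a", "b"], 2: ["c", "d", "e"], 3: ["f"]}
first_main_models = build_first_main_models(second_submodels)
print(len(first_main_models))
print(first_main_models[0])
```

Each resulting combination contains exactly one submodel from each dimension, so the submodels within a first main model always belong to different dimensions.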

S5, calculating indexes of the first main model, calculating the lifting degrees of the first main models according to the indexes of the first main models, and extracting the first main model with the largest lifting degree as a second main model;

Specifically, the lifting degrees of the plurality of first main models are calculated according to the performance indexes or stability indexes of the plurality of first main models, and the first main model with the highest lifting degree is extracted as the second main model, namely the final main model. This yields the optimal submodel combination and improves the accuracy of the subsequent evaluation of enterprises to be evaluated.

Further, calculating the indicator of the first main model comprises:

Specifically, in this embodiment of the present application, the weight of each second submodel is the ratio of its AUC value to the sum of the AUC values of all second submodels in the first main model, where w_k is the weight of the k-th second submodel and n is the number of second submodels in the first main model:

w_k = AUC_k / \sum_{j=1}^{n} AUC_j

and according to the weight occupied by the second submodel under the first main model, carrying out weighted summation on the performance index and the stability index in the second submodel so as to obtain the index of the first main model.

And calculating the output result of the first main model, and carrying out weighted summation on the output result of each second submodel according to the weight occupied by the second submodel under the first main model.

And taking the ratio of the AUC value of each second submodel to the sum of the AUC values of all the second submodels in the first main model as the weight of the second submodel, and calculating each index of the first main model by using the weight so as to select the first main model subsequently.
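A minimal sketch of the AUC-based weighting: each weight is a submodel's AUC divided by the sum of the AUCs in the same first main model, and an index of the first main model is the weighted sum of the corresponding submodel indexes. The function names and the sample values are hypothetical.

```python
def auc_weights(auc_values):
    """Weight of each second submodel: its AUC over the sum of all AUCs."""
    total = sum(auc_values)
    return [value / total for value in auc_values]


def main_model_index(submodel_indexes, weights):
    """Weighted sum of one index (for example KS) over the second submodels."""
    return sum(w * v for w, v in zip(weights, submodel_indexes))


# Hypothetical AUC and KS values of three second submodels.
aucs = [0.80, 0.70, 0.75]
ks_values = [0.40, 0.30, 0.35]

weights = auc_weights(aucs)
print(weights)
print(main_model_index(ks_values, weights))
```

The weights always sum to 1, so the index of the first main model lies between the smallest and largest submodel values of that index.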

Further, the calculating the lifting degrees of the plurality of first main models according to the indexes of the plurality of first main models comprises:

calculating a lift-off of the first master model according to the following formula:

Index refers to an index for measuring model performance, i.e., a performance index in the present application; a refers to a second submodel, and n refers to the number of second submodels; A represents the first main model integrated from the plurality of second submodels; Lift represents the lifting degree; k is an integer between 1 and n; extreme() takes an extreme value. The maximum is taken when a larger index represents better model performance, and the minimum is taken when a larger index represents worse model performance.

Specifically, the difference in effect between the first main model and the second submodels under it is compared directly, and the lifting degree (Lift) measures how much the main model outperforms its submodels in a specific aspect. Index is an index for measuring model performance, for example the accuracy index AUC or the discriminative power index KS.

In the present application, AUC is used as an index for measuring model performance. And the AUC for evaluating the promotion degree is an AUC index calculated by the trained second submodel under the full-scale sample. Similarly, the lifting degree of the full sample is calculated according to the training set and the test set respectively.

By calculating the lifting degree of each first main model and selecting the first main model with the highest lifting degree as the second main model, the results output by the second main model have the best accuracy.
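The selection step can be sketched as below. The exact lifting degree formula is not reproduced in this text, so the form used here, the relative difference between the index of the first main model and the extreme index among its second submodels, is an assumption made purely for illustration. The function names and sample values are likewise hypothetical.

```python
def lifting_degree(main_index, submodel_indexes, larger_is_better=True):
    """Illustrative lifting degree of a first main model over its submodels."""
    # extreme(): maximum when a larger index is better, minimum otherwise.
    extreme = max(submodel_indexes) if larger_is_better else min(submodel_indexes)
    # ASSUMPTION: relative difference to the extreme submodel index.
    return (main_index - extreme) / extreme


def select_second_main_model(candidates):
    """Return the name of the first main model with the largest lifting degree.

    candidates: {name: (main_index, [submodel_index, ...])}
    """
    return max(candidates, key=lambda name: lifting_degree(*candidates[name]))


# Hypothetical AUC values measured on the full sample.
candidates = {
    "a-c-f": (0.80, [0.74, 0.76, 0.72]),
    "b-d-f": (0.79, [0.78, 0.75, 0.72]),
}
print(select_second_main_model(candidates))
```

Whatever the exact formula, the selection itself follows the text: compute one lifting degree per first main model and keep the model with the largest value.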

And S6, receiving enterprise data of the enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second main model for evaluation to obtain an evaluation result corresponding to the enterprise to be evaluated.

Specifically, the enterprise data of the enterprise to be evaluated is evaluated with the second main model: each second submodel under the second main model produces a sub-evaluation result, and the sub-evaluation results are weighted and summed according to the weights of the second submodels to obtain the evaluation result corresponding to the enterprise to be evaluated.

Further, as shown in fig. 2, the receiving enterprise data of the enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second master model for evaluation, and obtaining an evaluation result corresponding to the enterprise to be evaluated includes:

S61, receiving the enterprise data of the enterprise to be evaluated, and classifying the content of the enterprise data of the enterprise to be evaluated to obtain data of multiple categories;

S62, inputting the data of different categories into corresponding second sub-models under the second main model respectively, and obtaining a plurality of sub-evaluation results correspondingly;

and S63, according to the weight occupied by the second sub-model under the second main model, carrying out weighted summation on the plurality of sub-evaluation results to obtain a corresponding first evaluation result.

Specifically, enterprise data of an enterprise to be evaluated is received; the data may be formatted or unformatted. The content of the enterprise data is classified into data of a plurality of categories, that is, into data corresponding to each dimension. The data of the different categories are input into the corresponding second submodels under the second main model, the categories corresponding one to one with the second submodels, so that a plurality of sub-evaluation results are obtained. The sub-evaluation results are then weighted and summed according to the weights of the second submodels under the second main model to obtain the corresponding first evaluation result. The calculation can be made according to the following formula:

P = \sum_{k=1}^{n} w_k P_k

where P is the first evaluation result of the second main model, n is the number of second submodels under the second main model, P_k is the sub-evaluation result of the k-th second submodel, and w_k is the weight of the k-th second submodel.

When category data corresponding to a part of dimensionality is missing in enterprise data of an enterprise to be evaluated, a second sub-model corresponding to the dimensionality is not calculated or used and does not participate in subsequent integration according to the weight;

and when the category data corresponding to a part of dimensions in the enterprise data of the enterprise to be evaluated is too little, the second sub-model corresponding to the dimension is not utilized.

And the first evaluation result is the overdue probability or score corresponding to the enterprise to be evaluated.

The data of each dimension is respectively processed by using each second sub-model under the second main model to obtain a corresponding sub-evaluation result, and the sub-evaluation results are subjected to weighted summation according to the weight occupied by each second sub-model to obtain a final evaluation result, so that the accuracy of the output result value is improved.

Still further, before the inputting the different categories of data into the corresponding second sub-models under the second main model, the method further includes:

Specifically, only the data of categories that meet the preset value requirement are input into the corresponding second submodels under the second main model; data of categories that do not meet the requirement are not input and do not take part in the subsequent weighting. If the data for a second submodel is too scarce or its category is missing, the resulting sub-evaluation result deviates considerably, so the second submodel corresponding to a category with too little or missing data is not used.

Screening out the second submodels whose categories have too little or missing data ensures that they are not used in the actual evaluation, which improves the accuracy of the second main model.
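The filtering and weighted summation can be sketched together as below. Submodels are represented as plain callables, and the weights of the remaining submodels are rescaled to sum to 1; the text only says that unused submodels do not take part in the weighting, so the rescaling, the function name and all sample values are assumptions of this sketch.

```python
def first_evaluation_result(category_data, submodels, weights, min_count):
    """Weighted sum of sub-evaluation results over categories with enough data.

    category_data: {category: [records]}
    submodels: {category: callable taking the records and returning a score}
    weights: {category: weight of the corresponding second submodel}
    min_count: preset value for the minimum data quantity of a category
    """
    usable = [
        category
        for category, records in category_data.items()
        if category in submodels and len(records) >= min_count
    ]
    if not usable:
        raise ValueError("no category has enough data to be evaluated")
    # ASSUMPTION: rescale the weights of the submodels that are actually used.
    total_weight = sum(weights[c] for c in usable)
    return sum(weights[c] / total_weight * submodels[c](category_data[c]) for c in usable)


# Hypothetical data: the "finance" category has too few records.
category_data = {"tax": [1, 2, 3], "finance": [1], "basic": [1, 2, 3, 4]}
submodels = {
    "tax": lambda records: 0.2,
    "finance": lambda records: 0.9,
    "basic": lambda records: 0.4,
}
weights = {"tax": 0.3, "finance": 0.3, "basic": 0.4}
print(first_evaluation_result(category_data, submodels, weights, min_count=2))
```

Here the "finance" submodel is skipped, so its high sub-evaluation result does not influence the final value.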

Further, as shown in fig. 3, the receiving enterprise data of the enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second master model for evaluation, and obtaining an evaluation result corresponding to the enterprise to be evaluated includes:

S61', receiving the enterprise data of the enterprise to be evaluated, and classifying the content of the enterprise data of the enterprise to be evaluated to obtain data of a plurality of categories;

S62', respectively inputting the data of different categories into the corresponding second sub-models under the second main model, and correspondingly obtaining a plurality of sub-evaluation results;

S63', sorting the sub-evaluation results respectively obtained by the plurality of second sub-models under the second main model to obtain ranking values;

S64', normalizing the ranking values to obtain normalized values;

S65', according to the weight occupied by each second sub-model under the second main model, carrying out weighted summation on the plurality of normalized values to obtain a corresponding second evaluation result.

Specifically, the second evaluation result corresponds to a ranking, so that the input enterprises to be evaluated can be ranked.

The enterprise data of the enterprise to be evaluated is received and its content is classified to obtain data of a plurality of categories. The data of the different categories is input into the corresponding second sub-models under the second main model, and a plurality of sub-evaluation results is correspondingly obtained. Because the overdue probabilities corresponding to group sorting are consistent under different algorithms, the sub-evaluation results are not weighted directly. Instead, the ranking values that each second sub-model assigns to the plurality of enterprises to be evaluated are normalized, and the normalized values are weighted and summed according to the weight occupied by each second sub-model under the second main model to obtain the corresponding second evaluation result. The second evaluation result is therefore not the overdue probability of the enterprise to be evaluated but a ranking value.

By obtaining the ranking of a batch of enterprises to be evaluated, targeted processing of the enterprises in the top or bottom preset proportion of the ranking achieves higher accuracy.

All sub-evaluation results are ranked, the ranking values are normalized, and the normalized values are weighted and summed to obtain the corresponding evaluation result. The evaluation result is not a probability value but a ranking value, so the plurality of input enterprises to be evaluated can be ranked and the result better fits actual needs, namely that only a preset portion of the enterprises to be evaluated is subjected to subsequent processing.
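Steps S63' to S65' can be sketched as follows for a batch of enterprises. The text does not specify the normalization, so the sketch assumes an ascending ordinal rank scaled to the interval [0, 1]; the dimension names, weights and sub-evaluation results are hypothetical.

```python
from typing import Dict, List


def normalized_ranks(scores: List[float]) -> List[float]:
    """Ascending ordinal rank of each score, scaled to [0, 1].

    Ties are broken by input order, because Python's sort is stable.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for position, index in enumerate(order):
        ranks[index] = position
    if n == 1:
        return [0.0]
    return [rank / (n - 1) for rank in ranks]


def second_evaluation_results(sub_results: Dict[str, List[float]],
                              weights: Dict[str, float]) -> List[float]:
    """Weighted sum, per enterprise, of the normalized ranks from each sub-model."""
    n = len(next(iter(sub_results.values())))
    combined = [0.0] * n
    for dim, scores in sub_results.items():
        for i, value in enumerate(normalized_ranks(scores)):
            combined[i] += weights[dim] * value
    return combined


# Hypothetical sub-evaluation results for three enterprises in two dimensions.
sub_results = {"financial": [0.10, 0.30, 0.20], "tax": [0.50, 0.40, 0.60]}
weights = {"financial": 0.6, "tax": 0.4}
second_results = second_evaluation_results(sub_results, weights)
print([round(value, 4) for value in second_results])
```

Because only ranks enter the weighted sum, sub-models whose raw outputs lie on different scales contribute comparably, which matches the stated intent of ranking the enterprises rather than reporting a probability.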

Still further, after the data of different categories are respectively input into the second sub-model corresponding to the second main model and a plurality of sub-evaluation results are correspondingly obtained, the method further includes:

Specifically, a first evaluation result of the enterprise to be evaluated is calculated, the enterprises to be evaluated are ranked according to the second evaluation result to obtain a ranked list, and the ranked list is combined with the first evaluation result to calibrate the ranked list. That is, the overdue probability or evaluation result of the top preset proportion is indicated, which helps explain the ranked list.

By combining the ranked list with the actual overdue probability or evaluation result, the overdue probability of the top preset proportion is shown more clearly, which makes the list easier for the relevant personnel to use later.
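One possible way to attach the first evaluation result to the ranked list is sketched below. The text does not say how the overdue probability of the top preset proportion is summarized, so taking its mean is an assumption, as are the enterprise names, the values and the helper `calibrated_ranking`.

```python
import math
from typing import List, Tuple


def calibrated_ranking(names: List[str],
                       first_results: List[float],
                       second_results: List[float],
                       top_fraction: float) -> Tuple[List[tuple], float]:
    """Rank by the second evaluation result and annotate with the first.

    Returns the ranked rows (name, second result, first result) and the mean
    first evaluation result of the top preset proportion.
    """
    ranked = sorted(zip(names, second_results, first_results),
                    key=lambda row: row[1], reverse=True)
    top_n = max(1, math.ceil(len(ranked) * top_fraction))
    top_mean = sum(row[2] for row in ranked[:top_n]) / top_n
    return ranked, top_mean


# Hypothetical batch of four enterprises.
names = ["A", "B", "C", "D"]
second = [0.2, 0.6, 0.7, 0.1]       # ranking values
first = [0.05, 0.30, 0.40, 0.02]    # overdue probabilities
ranked, top_mean = calibrated_ranking(names, first, second, top_fraction=0.5)
print([row[0] for row in ranked], round(top_mean, 4))
```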

It is emphasized that, in order to further ensure the privacy and security of the data, the enterprise training data and the evaluation result data corresponding to the enterprise to be evaluated may also be stored in the nodes of a block chain.

The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
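The idea of data blocks associated by a cryptographic method can be illustrated with a minimal hash-linked list. This sketch is not the storage scheme of the application: it omits the consensus mechanism, point-to-point transmission and platform layers, and the payload fields and helper names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def block_digest(payload: Dict, previous_hash: str) -> str:
    """SHA-256 digest over the block payload and the previous block's hash."""
    body = {"payload": payload, "previous_hash": previous_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(payload: Dict, previous_hash: str) -> Dict:
    return {"payload": payload,
            "previous_hash": previous_hash,
            "hash": block_digest(payload, previous_hash)}


def chain_is_valid(chain: List[Dict]) -> bool:
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_digest(block["payload"], block["previous_hash"]):
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical evaluation results stored as two linked blocks.
genesis = make_block({"enterprise": "E001", "result": 0.19}, "0" * 64)
second_block = make_block({"enterprise": "E002", "result": 0.35}, genesis["hash"])
chain = [genesis, second_block]
print(chain_is_valid(chain))
```

Changing the payload of any stored block changes its digest, so the stored hash and the link held by the next block no longer match, which is how tampering with an evaluation result would be detected.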

The embodiment also provides an enterprise data evaluation device, as shown in fig. 4, which is a functional block diagram of the enterprise data evaluation device according to the present application.

The enterprise data evaluation device 100 may be installed in an electronic device. Depending on the functionality implemented, the enterprise data evaluation device 100 can include an acquisition module 101, a training module 102, a first decimation module 103, a combination module 104, a second decimation module 105, and a processing module 106. A module, which may also be referred to as a unit in this application, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.

In the present embodiment, the functions regarding the respective modules/units are as follows:

the acquisition module 101 is used for acquiring enterprise training data of multiple dimensions and multiple preset evaluation models;

the training module 102 is configured to train, for the enterprise training data of each dimension, by using a plurality of preset evaluation models to obtain a plurality of corresponding first sub-models;

a first decimation module 103, configured to determine whether an index of the first submodel meets a preset index requirement, and extract the first submodel meeting the preset index requirement in each dimension as a second submodel;

further, the first decimation module 103 comprises a first calculation submodule and a corresponding judgment submodule;

the first calculation submodule is used for calculating the performance index and the stability index of the first submodel;

and the corresponding judgment submodule is used for comparing the performance index and the stability index of the first submodel with the corresponding elements of the preset index requirement and judging whether they are met.

Through the cooperation of the first calculation submodule and the corresponding judgment submodule, the performance index and the stability index are calculated and compared with the corresponding elements of the preset index requirement, so that the screening of the first submodels is completed and the accuracy of the final output result is improved.
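The screening performed by these two submodules might look like the following sketch. It assumes one performance index for which higher is better and one stability index for which lower is better; the direction of each comparison, the thresholds, the model names and the field names are all assumptions rather than details given in the text.

```python
from typing import Dict, List


def screen_first_sub_models(candidates: List[Dict],
                            min_performance: float,
                            max_instability: float) -> List[Dict]:
    """Keep first sub-models whose indexes meet both preset requirements."""
    return [model for model in candidates
            if model["performance"] >= min_performance
            and model["instability"] <= max_instability]


# Hypothetical first sub-models trained on one dimension.
first_sub_models = [
    {"name": "logistic_regression", "performance": 0.72, "instability": 0.05},
    {"name": "gradient_boosting", "performance": 0.78, "instability": 0.30},
    {"name": "random_forest", "performance": 0.61, "instability": 0.04},
]

second_sub_models = screen_first_sub_models(first_sub_models,
                                            min_performance=0.70,
                                            max_instability=0.10)
print([model["name"] for model in second_sub_models])
```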

A combination module 104, configured to combine the second submodels in the dimensions to obtain a plurality of first main models, where the second submodels in the first main models belong to different dimensions respectively;

a second decimation module 105 for calculating an index of the first main model, calculating a lifting degree of the plurality of first main models based on the indexes of the plurality of first main models, and extracting the first main model having the largest lifting degree as a second main model;

further, the second decimation module 105 includes a second calculation submodule;

the second calculation submodule is configured to perform weighted summation on the performance index and the stability index in the second submodel according to the weight occupied by the second submodel in the first main model to obtain an index of the first main model, and the weight is obtained according to a ratio of an AUC value in the index of the second submodel to a sum of AUC values corresponding to all second submodels in the first main model.

The second calculation submodule takes the ratio of the AUC value of each second submodel to the sum of the AUC values of all the second submodels in the first main model as the weight of that second submodel, and uses the weights to calculate each index of the first main model so that a first main model can subsequently be selected.
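The weight calculation and index aggregation can be sketched as follows. The AUC values and the per-submodel index values are hypothetical, and the function names are assumptions.

```python
from typing import List


def auc_weights(aucs: List[float]) -> List[float]:
    """Weight of each second submodel: its AUC divided by the sum of all AUCs."""
    total = sum(aucs)
    return [auc / total for auc in aucs]


def main_model_index(index_values: List[float], aucs: List[float]) -> float:
    """Index of a first main model: AUC-weighted sum of its submodels' index values."""
    return sum(weight * value
               for weight, value in zip(auc_weights(aucs), index_values))


# Hypothetical first main model built from three second submodels.
aucs = [0.80, 0.70, 0.50]
index_values = [0.50, 0.40, 0.20]   # one index (performance or stability) per submodel

weights = auc_weights(aucs)
combined_index = main_model_index(index_values, aucs)
print([round(w, 4) for w in weights], round(combined_index, 4))
```

The same aggregation would be repeated for every index, so each first main model ends up with one performance index and one stability index derived from its second submodels.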

Further, the second decimation module 105 includes a lifting degree calculation sub-module;

the lifting degree calculation sub-module is used for calculating the lifting degree of the first main model according to the following formula:

Index refers to an index for measuring model performance, i.e., a performance index in the present application; a refers to a second submodel, and n refers to the number of second submodels; A represents a first main model integrated from a plurality of second submodels; Lift represents the lifting degree; k is an integer between 1 and n; extreme() takes an extreme value.

The lifting degree of each first main model is calculated by the lifting degree calculation sub-module, the first main model with the highest lifting degree is selected as the second main model, and the second main model is used to output a result with the best accuracy.
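The selection step can be sketched independently of the exact formula, which is not reproduced in this text. In the sketch below the lifting degree is supplied as a function; `stand_in_lift` is a hypothetical placeholder (main-model index minus the largest submodel index) used only to make the example runnable, and it should not be read as the formula of the application.

```python
from typing import Callable, Dict, List


def select_second_main_model(main_models: List[Dict],
                             lift_fn: Callable[[Dict], float]) -> Dict:
    """Return the first main model whose lifting degree is the largest."""
    return max(main_models, key=lift_fn)


def stand_in_lift(model: Dict) -> float:
    """Hypothetical placeholder, NOT the application's formula."""
    return model["index"] - max(model["sub_indexes"])


# Hypothetical first main models with their own index and their submodels' indexes.
first_main_models = [
    {"name": "A1", "index": 0.80, "sub_indexes": [0.75, 0.70]},
    {"name": "A2", "index": 0.82, "sub_indexes": [0.80, 0.65]},
]

second_main_model = select_second_main_model(first_main_models, stand_in_lift)
print(second_main_model["name"])
```

Passing the lifting degree in as a function keeps the selection logic unchanged whichever definition of Lift is actually used.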

And the processing module 106 is configured to receive enterprise data of an enterprise to be evaluated, input the enterprise data of the enterprise to be evaluated into the second master model, and evaluate the enterprise data to be evaluated to obtain an evaluation result corresponding to the enterprise to be evaluated.

Further, the processing module 106 includes a first receiving sub-module, a first processing sub-module, and a first summing sub-module;

the first receiving submodule is used for receiving the enterprise data of the enterprise to be evaluated and classifying the content of the enterprise data of the enterprise to be evaluated to obtain data of a plurality of categories;

the first processing sub-module is used for respectively inputting the data of different types into the corresponding second sub-models under the second main model to correspondingly obtain a plurality of sub-evaluation results;

and the first summation submodule is used for carrying out weighted summation on the plurality of sub-evaluation results according to the weight occupied by the second sub-model under the second main model to obtain the corresponding first evaluation result.

Through the cooperation of the first receiving submodule, the first processing submodule and the first summing submodule, the data of each dimension are respectively processed by using each second submodel under the second main model to obtain corresponding sub-evaluation results, and the sub-evaluation results are subjected to weighted summation according to the weight occupied by each second submodel to obtain the final evaluation result, so that the accuracy of the output result value is improved.

Still further, the processing module 106 further includes a numerical judgment sub-module, and the first processing sub-module includes a corresponding input unit;

the numerical value judgment submodule is used for counting the data volume of the data of each category and comparing it with a preset numerical value to identify the categories whose data volume is greater than or equal to the preset numerical value;

and the corresponding input unit is used for respectively inputting the data of the categories whose data volume is greater than or equal to the preset numerical value into the corresponding second submodels under the second main model.

Through the cooperation of the numerical value judgment submodule and the corresponding input unit, the second submodels corresponding to categories with too little or missing data are screened out and are not used when the result is actually evaluated, which improves the accuracy of the second main model.

Further, the processing module 106 includes a second receiving sub-module, a second processing sub-module, a first sorting sub-module, a normalizing sub-module, and a second summing sub-module;

the second receiving submodule is used for receiving the enterprise data of the enterprise to be evaluated and classifying the content of the enterprise data of the enterprise to be evaluated to obtain data of a plurality of categories;

the second processing submodule is used for respectively inputting the data of different types into a corresponding second sub-model under a second main model to correspondingly obtain a plurality of sub-evaluation results;

the first sorting submodule is used for sorting the sub-evaluation results respectively obtained by the plurality of second sub-models under the second main model to obtain ranking values;

the normalization submodule is used for normalizing the ranking values to obtain normalized values;

and the second summation submodule is used for carrying out weighted summation on the plurality of normalization values according to the weight occupied by the second submodel under the second main model to obtain a corresponding second evaluation result.

Through the cooperation of the second receiving submodule, the second processing submodule, the first sorting submodule, the normalization submodule and the second summation submodule, all the sub-evaluation results are ranked, the ranking values are normalized, and the normalized values are weighted and summed to obtain the corresponding evaluation result. The evaluation result is not a probability value but a ranking value, so the plurality of input enterprises to be evaluated can be ranked and the result better fits actual needs, namely that only a preset portion of the enterprises to be evaluated is subjected to subsequent processing.

Still further, the processing module 106 further includes a third summing sub-module, a second sorting sub-module, and a combining sub-module;

the third summation submodule is used for carrying out weighted summation on the plurality of sub-evaluation results according to the weight occupied by the second sub-model under the second main model to obtain the corresponding first evaluation result;

the second sorting submodule is used for sorting the enterprises to be evaluated according to the second evaluation result to obtain a sorted list;

and the combining submodule is used for combining the sorted list with the first evaluation result to realize the calibration of the sorted list.

Through the cooperation of the third summation submodule, the second sorting submodule and the combining submodule, the sorted list is combined with the actual overdue probability or evaluation result, so that the overdue probability of the top preset proportion is shown more clearly, which makes the list easier for the relevant personnel to use later.

With the above device, the enterprise data evaluation device 100, through the cooperation of the acquisition module 101, the training module 102, the first decimation module 103, the combination module 104, the second decimation module 105 and the processing module 106, obtains enterprise training data of multiple dimensions and multiple preset evaluation models, and trains on the enterprise training data of each dimension using the multiple preset evaluation models, so that multiple corresponding first sub-models are obtained in each dimension. The first sub-models are preliminarily screened according to whether their indexes meet the preset index requirements, yielding second sub-models that meet the requirements. One second sub-model is extracted from each dimension and the extracted sub-models are combined to obtain multiple first main models, each index of a first main model being the weighted sum of the corresponding indexes of the second sub-models under it. The lifting degree of each first main model is calculated from its indexes, and the first main model with the largest lifting degree is extracted as the second main model. This completes the training and screening of the models and gives better stability and efficiency when the second main model is subsequently used to process data. Enterprise data of the enterprise to be evaluated is then received and input into the second main model for evaluation to obtain the corresponding evaluation result, which improves the accuracy of evaluating the enterprise from its enterprise data.
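The combination step, in which one second sub-model is taken from each dimension, amounts to enumerating a Cartesian product. A minimal sketch follows; the dimension names and model names are hypothetical, and the helper `combine_second_sub_models` is an assumption.

```python
from itertools import product
from typing import Dict, List


def combine_second_sub_models(second_by_dimension: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Build every first main model by picking one second sub-model per dimension."""
    dimensions = sorted(second_by_dimension)
    choices = (second_by_dimension[dim] for dim in dimensions)
    return [dict(zip(dimensions, combo)) for combo in product(*choices)]


# Hypothetical second sub-models that passed screening in each dimension.
second_by_dimension = {
    "financial": ["logistic_regression", "gradient_boosting"],
    "tax": ["random_forest"],
    "judicial": ["logistic_regression", "random_forest"],
}

first_main_models = combine_second_sub_models(second_by_dimension)
print(len(first_main_models))
```

The number of first main models is the product of the numbers of second sub-models per dimension, so stricter screening in the previous step directly reduces how many candidates have to be compared by lifting degree.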

The embodiment of the application also provides computer equipment. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only the computer device 4 having components 41-43 is shown, but it is understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system and various application software installed on the computer device 4, such as computer readable instructions of an enterprise data evaluation method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the enterprise data assessment method.

The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.

In this embodiment, when the processor executes the computer readable instructions stored in the memory, the steps of the enterprise data evaluation method according to the above embodiments are implemented. Enterprise training data of multiple dimensions and multiple preset evaluation models are obtained, and the enterprise training data of each dimension is used to train the multiple preset evaluation models, so that multiple corresponding first sub-models are obtained in each dimension. The first sub-models are preliminarily screened according to whether their indexes meet the preset index requirements, yielding second sub-models that meet the requirements. One second sub-model is extracted from each dimension and the extracted sub-models are combined to obtain multiple first main models, each index of a first main model being the weighted sum of the corresponding indexes of the second sub-models under it. The lifting degree of each first main model is calculated from its indexes, and the first main model with the largest lifting degree is extracted as the second main model. This completes the training and screening of the models and gives better stability and efficiency when the second main model is subsequently used to process data. Enterprise data of the enterprise to be evaluated is then received and input into the second main model for evaluation to obtain the corresponding evaluation result, which improves the accuracy of evaluating the enterprise from its enterprise data.

The embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions that are executable by at least one processor, so as to cause the at least one processor to perform the steps of the enterprise data evaluation method described above. Enterprise training data of multiple dimensions and multiple preset evaluation models are obtained, and the enterprise training data of each dimension is used to train the multiple preset evaluation models, so that multiple corresponding first sub-models are obtained in each dimension. The first sub-models are preliminarily screened according to whether their indexes meet the preset index requirements, yielding second sub-models that meet the requirements. One second sub-model is extracted from each dimension and the extracted sub-models are combined to obtain multiple first main models, each index of a first main model being the weighted sum of the corresponding indexes of the second sub-models under it. The lifting degree of each first main model is calculated from its indexes, and the first main model with the largest lifting degree is extracted as the second main model. This completes the training and screening of the models and gives better stability and efficiency when the second main model is subsequently used to process data. Enterprise data of the enterprise to be evaluated is then received and input into the second main model for evaluation to obtain the corresponding evaluation result, which improves the accuracy of evaluating the enterprise from its enterprise data.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

The enterprise data evaluation device, the computer device, and the computer-readable storage medium according to the embodiments of the present application have the same technical effects as the enterprise data evaluation method according to the embodiments, and these effects are not described again here.

It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims

1. A method for enterprise data evaluation, the method comprising:

2. The method of claim 1, wherein the determining whether the metrics of the first sub-model meet predetermined metric requirements comprises:

calculating a performance index and a stability index of the first sub-model;

3. The enterprise data evaluation method of claim 2, wherein said calculating an indicator of the first master model comprises:

4. The enterprise data evaluation method according to claim 1, wherein the receiving enterprise data of the enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second master model for evaluation to obtain an evaluation result corresponding to the enterprise to be evaluated comprises:

5. The method of claim 4, further comprising, prior to said entering different categories of data into respective second submodels under a second master model:

6. The enterprise data evaluation method according to claim 1, wherein the receiving enterprise data of the enterprise to be evaluated, and inputting the enterprise data of the enterprise to be evaluated into the second master model for evaluation to obtain an evaluation result corresponding to the enterprise to be evaluated comprises:

normalizing the ranking values to obtain normalized values;

7. The enterprise data evaluation method of claim 6, wherein after the different categories of data are respectively input into the second sub-models corresponding to the second main model and a plurality of sub-evaluation results are obtained, the method further comprises:

8. An enterprise data evaluation device, the device comprising:

9. A computer device, characterized in that the computer device comprises:

at least one processor; and

the memory stores computer readable instructions which, when executed by the processor, implement the enterprise data assessment method of any one of claims 1-7.

10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the enterprise data assessment method of any one of claims 1-7.