CN111401329A - Information flow direction identification method, device, equipment and storage medium - Google Patents


Publication number
CN111401329A
Authority
CN
China
Prior art keywords
preset
information
historical
variable
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010338853.1A
Other languages
Chinese (zh)
Other versions
CN111401329B (en)
Inventor
郭玮
高宇航
张丙松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Xinzhi Automotive Technology Co ltd
Original Assignee
Beijing Xinzhi Junyang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinzhi Junyang Information Technology Co ltd
Priority to CN202010338853.1A
Publication of CN111401329A
Application granted
Publication of CN111401329B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The application provides an information flow direction identification method, apparatus, device and storage medium. The method comprises the following steps: acquiring document information to be processed; parsing the document information according to preset variables to generate a flow feature set of the document information; and inputting the flow feature set into a target recognition model to identify the flow direction information of the document information. In this way, the flow direction information of the document information is identified automatically by applying a preset target recognition model to the flow feature set parsed from the document information.

Description

Information flow direction identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to an information flow direction identification method, apparatus, device, and storage medium.
Background
Motor vehicle insurance, i.e. automobile insurance (car insurance for short), is a commercial insurance that covers liability for personal casualties or property losses caused to motor vehicles by natural disasters or accidents.
Recent vehicle insurance market reports show that both the growth rate and the profit level of vehicle insurance have declined markedly, and that the cost of retaining an existing vehicle insurance customer is far lower than the cost of acquiring a new one. In addition, vehicle insurance is a short-term, one-year product and the industry has a huge customer base; meanwhile, the overall vehicle insurance renewal rate of insurance companies is between 50% and 60%, and more than 200 million policies fall due across the industry every year.
Therefore, identifying which expiring policies are about to be lost, so as to keep improving the renewal rate, has become an important problem in the development of the industry.
Disclosure of Invention
An object of the embodiments of the present application is to provide an information flow direction identification method, apparatus, device and storage medium that automatically identify the flow direction information of document information by applying a preset target recognition model to the flow feature set parsed from the document information.
A first aspect of the embodiments of the present application provides an information flow direction identification method, including: acquiring document information to be processed; parsing the document information according to preset variables to generate a flow feature set of the document information; and inputting the flow feature set into a target recognition model to identify the flow direction information of the document information.
In an embodiment, parsing the document information according to the preset variables to generate the flow feature set of the document information includes: identifying an initial variable set contained in the document information; cleaning invalid data from the initial variable set to generate a valid variable set; and extracting, from the valid variable set, the data set corresponding to the preset variables as the flow feature set of the document information.
In an embodiment, identifying the initial variable set contained in the document information includes: parsing the document information according to its corresponding data dimensions to generate a plurality of initial variables; analyzing the actual data of each initial variable in the document information, and classifying all the initial variables according to a preset classification rule; and generating the initial variable set from the actual data and the classification result.
In an embodiment, the preset variables are selected by: acquiring data of a plurality of historical documents; cleaning invalid data from the data of each historical document to generate a historical variable set; and removing from the historical variable set the historical variables whose contribution rate to the information flow direction of the historical documents is smaller than a preset contribution threshold, the remaining variables forming a plurality of preset variables.
In an embodiment, removing the historical variables whose contribution rate is smaller than the preset contribution threshold and generating the plurality of preset variables includes: acquiring the actual historical flow direction information of the historical documents; calculating the degree of correlation between each historical variable and the actual historical flow direction information; and removing from the historical variable set the historical variables whose degree of correlation is smaller than the preset contribution threshold, thereby generating the plurality of preset variables.
In an embodiment, the target recognition model is preset by: training a plurality of mathematical algorithm models on the plurality of preset variables to generate a plurality of preset recognition models; calculating the degree of truth of each preset recognition model based on the historical documents; determining whether the plurality of preset recognition models include several tied preset recognition models, that is, models that share the same degree of truth where that degree of truth is the maximum; and if no tied preset recognition models exist, selecting the preset recognition model with the maximum degree of truth as the target recognition model.
In an embodiment, the method further includes: if several tied preset recognition models exist among the plurality of preset recognition models, calculating the confusion-matrix accuracy of each tied preset recognition model; and selecting, from the tied preset recognition models, the one with the highest confusion-matrix accuracy as the target recognition model.
A second aspect of the embodiments of the present application provides an information flow direction identification apparatus, including: a first acquisition module, configured to acquire document information to be processed; a parsing module, configured to parse the document information according to preset variables to generate a flow feature set of the document information; and an identification module, configured to input the flow feature set into a target recognition model and identify the flow direction information of the document information.
In an embodiment, the parsing module is configured to: identify an initial variable set contained in the document information; clean invalid data from the initial variable set to generate a valid variable set; and extract, from the valid variable set, the data set corresponding to the preset variables as the flow feature set of the document information.
In an embodiment, identifying the initial variable set contained in the document information includes: parsing the document information according to its corresponding data dimensions to generate a plurality of initial variables; analyzing the actual data of each initial variable in the document information, and classifying all the initial variables according to a preset classification rule; and generating the initial variable set from the actual data and the classification result.
In an embodiment, the apparatus further includes: a second acquisition module, configured to acquire data of a plurality of historical documents; a cleaning module, configured to clean invalid data from the data of each historical document to generate a historical variable set; and a removing module, configured to remove from the historical variable set the historical variables whose contribution rate to the information flow direction of the historical documents is smaller than a preset contribution threshold, so as to generate a plurality of preset variables.
In an embodiment, the removing module is configured to: acquire the actual historical flow direction information of the historical documents; calculate the degree of correlation between each historical variable and the actual historical flow direction information; and remove from the historical variable set the historical variables whose degree of correlation is smaller than the preset contribution threshold, thereby generating the plurality of preset variables.
In an embodiment, the apparatus further includes: a training module, configured to train a plurality of mathematical algorithm models on the plurality of preset variables and generate a plurality of preset recognition models; a calculation module, configured to calculate the degree of truth of each preset recognition model based on the historical documents; a judging module, configured to determine whether the plurality of preset recognition models include several tied preset recognition models that share the same, maximum degree of truth; and a selecting module, configured to select the preset recognition model with the maximum degree of truth as the target recognition model if no tied preset recognition models exist.
In an embodiment, the calculation module is further configured to calculate the confusion-matrix accuracy of each tied preset recognition model if several tied preset recognition models exist among the plurality of preset recognition models; and the selecting module is further configured to select, from the tied preset recognition models, the one with the highest confusion-matrix accuracy as the target recognition model.
A third aspect of the embodiments of the present application provides an electronic device, including: a memory configured to store a computer program; and a processor configured to execute the method of the first aspect of the embodiments of the present application, or any embodiment thereof, so as to identify the flow direction information of document information.
A fourth aspect of the embodiments of the present application provides a non-transitory electronic-device-readable storage medium, including: a program which, when run by an electronic device, causes the electronic device to perform the method of the first aspect of the embodiments of the present application or any embodiment thereof.
According to the information flow direction identification method, the information flow direction identification device, the information flow direction identification equipment and the storage medium, the document information to be processed is analyzed based on the preset variable, the flow characteristic set of the document information is obtained, then the flow characteristic set is input into the target identification model, the flow direction information of the document information is output, and the future flow direction of the document information is automatically identified and predicted.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of an information flow direction identification method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of an information flow direction identification method according to an embodiment of the present application;
FIG. 4A is a schematic flow chart of an information flow direction identification method according to an embodiment of the present application;
FIG. 4B is a schematic diagram of ROC curves generated for a sample of a partial set of historical variables according to an embodiment of the present application;
FIG. 4C is a Lift-train diagram corresponding to a partial set of historical variables used as training samples according to an embodiment of the present application;
FIG. 4D is a schematic diagram of a ROC curve of a training set according to an embodiment of the present application;
FIG. 4E is a schematic diagram of a ROC curve for a validation set according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an information flow direction identification apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the present application, the terms "first," "second," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
As shown in fig. 1, the present embodiment provides an electronic apparatus 1 including: at least one processor 11 and a memory 12, one processor being exemplified in fig. 1. The processor 11 and the memory 12 are connected by the bus 10, and the memory 12 stores instructions executable by the processor 11, and the instructions are executed by the processor 11, so that the electronic device 1 can execute all or part of the flow of the method in the embodiments described below, so as to automatically identify the flow information of the document information.
In an embodiment, the electronic device 1 may be a mobile phone, a notebook computer, a desktop computer, or the like.
Please refer to FIG. 2, which shows an information flow direction identification method according to an embodiment of the present application. The method can be executed by the electronic device 1 shown in FIG. 1 and can be applied to policy churn prediction, in which the flow direction information of document information is identified by applying a target recognition model to the flow feature set of the document information. The method comprises the following steps:
Step 201: Acquire the document information to be processed.
In this step, the document information may be an insurance document or any of various order documents, such as a vehicle insurance policy, a personal health policy, or a purchase order. The document information to be processed may be a document that is about to expire, such as a vehicle insurance policy nearing expiry; for example, policies that will expire within a preset time (such as three months) may be selected, and the preset time may be determined from actual historical statistics. There may be multiple pieces of document information to be processed. An externally provided CSV (Comma-Separated Values) data file may be imported into the data storage structure through a data import function, or the soon-to-expire policy data to be predicted may be imported by ETL (Extract-Transform-Load), in which data from the business system is extracted, cleaned, and transformed before being loaded into a data warehouse.
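As a minimal sketch of the CSV import described in step 201, the following Python selects the policies that expire within a preset window. The column names `policy_id` and `expiry_date`, the ISO date format, and the function name are assumptions made for illustration; the application does not specify a file layout.

```python
import csv
import io
from datetime import date, timedelta


def load_expiring_policies(csv_file, today, window_days=90):
    """Return the rows whose expiry date falls within `window_days` of `today`.

    Assumes (hypothetically) a header row with an ISO-formatted
    `expiry_date` column; 90 days approximates the three-month example.
    """
    cutoff = today + timedelta(days=window_days)
    selected = []
    for row in csv.DictReader(csv_file):
        expiry = date.fromisoformat(row["expiry_date"])
        if today <= expiry <= cutoff:
            selected.append(row)
    return selected


# Illustrative in-memory file standing in for an externally provided CSV.
sample_csv = io.StringIO(
    "policy_id,expiry_date\n"
    "P001,2020-05-15\n"
    "P002,2020-12-31\n"
    "P003,2020-04-30\n"
)
expiring = load_expiring_policies(sample_csv, today=date(2020, 4, 26))
expiring_ids = [row["policy_id"] for row in expiring]
print(expiring_ids)
```

In practice the window length would be chosen from historical statistics, as the step notes.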
Step 202: Parse the document information according to the preset variables to generate a flow feature set of the document information.
In this step, a preset variable is a characteristic variable that represents the flow direction of the document information, and may be set based on historical statistics from the actual scenario. There may be multiple preset variables. The document information to be processed is parsed based on the preset variables: the document information contains data corresponding to each preset variable, and the set formed by the data corresponding to all the preset variables serves as the flow feature set of the document information.
Step 203: Input the flow feature set into a target recognition model to identify the flow direction information of the document information.
In this step, the target recognition model can automatically identify the flow direction information of the document information. Training samples can be collected from historical statistics of the actual scenario, and an algorithm model is then trained on them to obtain the target recognition model. The flow feature set obtained in step 202 is input into the target recognition model, which outputs the flow direction information of the document information. The flow direction information may indicate whether the document will be lost in the future, for example, whether the policyholder will renew the policy.
According to this information flow direction identification method, the document information to be processed is parsed based on the preset variables to obtain its flow feature set, the flow feature set is input into the target recognition model, and the flow direction information of the document information is output, so that the future flow direction of the document information is identified and predicted automatically.
Please refer to FIG. 3, which shows an information flow direction identification method according to an embodiment of the present application. The method can be executed by the electronic device 1 shown in FIG. 1 and can be applied to policy churn prediction, in which the flow direction information of document information is identified by applying a target recognition model to the flow feature set of the document information. The method comprises the following steps:
Step 301: Acquire data of a plurality of historical documents.
In this step, before the document information to be processed is identified, the preset variables need to be selected based on historical statistics. The historical documents and the document information to be processed are documents of the same type; for example, both are vehicle insurance policies. Taking churn prediction for vehicle insurance customers as an example, the historical data can be screened preliminarily to mine the key factors that influence whether a customer's vehicle insurance policy is lost.
In an embodiment, the sample size, the number of variables, and the extent of missing fields are determined first, together with the binary target variable y of the model and the meanings of its values 0 and 1. The data of the historical documents can be stored as a wide table, and the data can be divided by information dimension into: customer data, vehicle data, insurance data, life insurance customer data, salesperson data, and brand data. The specific wide-table data are shown in Table 1:
TABLE 1 Possible significant factors for policy customer churn
[Table 1 is provided as images in the original publication: Figure BDA0002467640540000071, Figure BDA0002467640540000081]
Step 302: Clean invalid data from the data of each historical document to generate a historical variable set.
In this step, taking vehicle insurance churn prediction as an example, the historical documents in the historical database contain a large amount of information. To obtain more accurately the variables that can represent the document flow direction, invalid data is first cleaned from the data of each historical document. A data checking function can examine the imported data for logical consistency, missing values, and abnormal values, so as to determine whether the data meets the conditions for churn prediction. The invalid data is then cleaned, and the historical variable set is finally generated.
In an embodiment, the actual meaning of each variable in all the historical document data is interpreted first, and the variables are divided into continuous variables, categorical variables, date variables, and non-function variables, as shown in Table 2:
TABLE 2 Practical significance and classification of variables
[Table 2 is provided as images in the original publication: Figure BDA0002467640540000091, Figure BDA0002467640540000101]
Then, to handle missing values, variables whose missing values account for more than 80% can be deleted. If a missing value of a variable actually means 0, it is filled with 0. For a variable whose values are evenly distributed, missing values can be filled with the mean; for a variable whose values are unevenly distributed, missing values can be filled with the median or with 0.
For example, for the number of consecutive renewals (RENEWNUM), a missing value indicates that the policy has never been renewed, so it can be filled with 0, which then means that the number of consecutive renewals is 0. As shown in Table 3, the statistical distribution shows 143792 missing values for the number of consecutive renewals. The data after the missing values are filled are shown in Table 4.
TABLE 3 Missing values of the number of consecutive renewals
var       mean              median  0%  1%  10%  25%  50%  75%  90%  99%  100%  nmiss
RENEWNUM  2.20620647218415  2       1   1   1    1    2    3    5    6    6     143792
TABLE 4 Number of consecutive renewals after missing-value filling
var       mean              median  0%  1%  10%  25%  50%  75%  90%  99%  100%  nmiss
RENEWNUM  1.1001032063709   0       0   0   0    0    0    2    3    6    6     0
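The missing-value rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (a dict of column lists with `None` for missing entries), the function name, the column names other than RENEWNUM, and the per-column `strategies` mapping are hypothetical, and the choice between mean and median is left to the caller because the text does not say how "evenly distributed" is tested.

```python
from statistics import mean, median


def clean_missing(columns, strategies, drop_ratio=0.8):
    """Drop columns that are mostly missing and fill the rest.

    columns:    {name: [value or None, ...]}
    strategies: {name: "zero" | "mean" | "median"} (defaults to "median")
    """
    cleaned = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values)
        if not values or missing / len(values) > drop_ratio:
            continue  # more than 80% missing: delete the variable
        present = [v for v in values if v is not None]
        strategy = strategies.get(name, "median")
        if strategy == "zero":
            fill = 0  # missing actually means 0, e.g. never renewed
        elif strategy == "mean":
            fill = mean(present)
        else:
            fill = median(present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Illustrative values only; not data from the application.
raw = {
    "RENEWNUM": [2, None, 1, None, 3],
    "AGE": [30, None, 40, 50, 60],
    "MOSTLY_EMPTY": [None, None, None, None, None],
}
cleaned = clean_missing(raw, {"RENEWNUM": "zero", "AGE": "median"})
print(cleaned)
```

Passing the strategy explicitly keeps the business meaning of a missing value (such as "never renewed") visible at the call site.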
In an embodiment, to handle outliers of a variable, values beyond the 1% and 99% quantiles may be replaced with those quantiles, or left untreated.
For example, as shown in Table 5, the applicant's age variable has the value 999 at the 100% quantile, which actually denotes a missing value.
TABLE 5 Missing values of applicant's age
var       mean              median  0%  1%  10%  25%  50%  75%  90%  99%  100%  nmiss
APPLIACE  41.4385538555948  39      0   23  28   33   39   48   55   67   999   0
In Table 5, 999 can be replaced with NA to mark it as a true missing value. The resulting statistical distribution is not uniform, so the missing values are filled with the median and the 1% and 99% quantiles are processed. The missing-value treatment is shown in Table 6:
TABLE 6 Missing-value treatment of applicant's age
var       mean              median  0%  1%  10%  25%  50%  75%  90%  99%  100%  nmiss
APPLIACE  40.623209798995   39      0   23  28   33   39   48   55   66   91    244
As shown in Table 7, the statistical distribution of the applicant's age variable after the missing values and outliers are treated is:
TABLE 7 Applicant's age after missing-value and outlier treatment
var       mean              median  0%  1%  10%  25%  50%  75%  90%  99%  100%  nmiss
APPLIACE  40.6127564469115  39      23  23  28   33   39   48   55   66   66    0
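The treatment shown in Tables 5 to 7 (mark 999 as missing, fill with the median, then cap values at the lower and upper quantiles) might be sketched as follows. The nearest-rank percentile definition and the function names are assumptions; the application does not state which quantile method it uses. The tiny sample uses 20% and 90% bounds only so that the capping is visible on ten values, whereas the text uses 1% and 99%.

```python
from statistics import median


def percentile(sorted_values, pct):
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    rank = -(-pct * len(sorted_values) // 100)  # ceil(pct * n / 100)
    return sorted_values[max(rank, 1) - 1]


def clean_outliers(values, sentinel=999, lo_pct=1, hi_pct=99):
    """Treat `sentinel` as missing, fill with the median, cap at quantiles."""
    present = [v for v in values if v != sentinel]
    fill = median(present)
    filled = [fill if v == sentinel else v for v in values]
    ordered = sorted(filled)
    lo = percentile(ordered, lo_pct)
    hi = percentile(ordered, hi_pct)
    return [min(max(v, lo), hi) for v in filled]


# Illustrative ages only; 999 plays the role of the coded missing value.
ages = [0, 23, 28, 33, 39, 48, 55, 66, 91, 999]
capped = clean_outliers(ages, lo_pct=20, hi_pct=90)
print(capped)
```

Capping rather than deleting keeps every sample in the set, which matches Table 7, where the minimum and maximum equal the 1% and 99% quantiles and no value is missing.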
In an embodiment, some original variables may be processed, according to their statistical distribution, into new derived variables that represent the information flow direction more significantly and are more useful to the model.
For example, consider the variable for the number of non-vehicle insurance policies held.
The distribution statistics show that the proportion of missing values for this variable is very large, and that a missing value actually means the customer holds no non-vehicle insurance. Samples with such a policy are therefore marked 1 and samples without one are marked 0, producing a binary derived variable that indicates whether non-vehicle insurance is held, as shown in Table 8:
TABLE 8 Distribution data for the number of non-vehicle insurance policies held
[Table 8 is provided as an image in the original publication: Figure BDA0002467640540000111]
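A derived binary variable of this kind can be produced with a one-line mapping. The sketch below assumes that missing counts are represented as `None` and that a count of 0 is treated the same as missing; both are illustrative assumptions, as is the function name.

```python
def has_non_vehicle_policy(policy_counts):
    """Map a policy count (or None when missing) to a 0/1 indicator."""
    return [0 if count is None or count == 0 else 1 for count in policy_counts]


# Illustrative counts only.
flags = has_non_vehicle_policy([None, 2, None, 1, 0])
print(flags)
```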
Step 303: Remove from the historical variable set the historical variables whose contribution rate to the information flow direction of the historical documents is smaller than a preset contribution threshold, and generate a plurality of preset variables.
In this step, the historical variables in the historical variable set are analyzed further. To improve how accurately the variables represent policy customer churn, the historical variables that contribute little to the information flow direction are removed; the remaining historical variables, which have a high contribution rate, can be used as the preset variables.
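One way to read this step, following the embodiment that measures contribution by correlation with the actual historical flow direction, is sketched below. The use of the absolute Pearson correlation, the threshold value 0.3, and all names and sample values are assumptions for illustration; the application does not name a specific correlation measure or threshold.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_preset_variables(history, churned, threshold=0.3):
    """Keep variables whose |correlation| with the churn label meets the threshold."""
    return [
        name
        for name, values in history.items()
        if abs(pearson(values, churned)) >= threshold
    ]


# Illustrative data: 1 = policy lost, 0 = policy renewed.
history = {
    "RENEWNUM": [0, 0, 1, 3, 4, 5],
    "NOISE": [1, 2, 3, 1, 2, 3],
}
churned = [1, 1, 1, 0, 0, 0]
preset = select_preset_variables(history, churned)
print(preset)
```

Here NOISE is uncorrelated with the label and is dropped, while RENEWNUM is strongly (negatively) correlated and is kept.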
Step 304: Train a plurality of mathematical algorithm models on the plurality of preset variables to generate a plurality of preset recognition models.
In this step, once the preset variables have been selected, a target recognition model needs to be set. The historical data for the preset variables is used as the training sample documents, and several mathematical algorithm models are trained on the same training sample documents, so that each mathematical algorithm model yields one preset recognition model. For example, the cleaned and screened historical data for the preset variables can be integrated into a new table and divided into a training set and a validation set, where the ratio of training set to validation set may be 7:3 or 8:2.
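The 7:3 division into training and validation sets can be sketched as a shuffled split. The function name, the fixed seed, and the use of random shuffling are assumptions; the application only gives the ratio.

```python
import random


def split_train_validation(rows, train_ratio=0.7, seed=0):
    """Shuffle a copy of `rows` and split it by `train_ratio`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder sample indices split 7:3.
train_rows, validation_rows = split_train_validation(range(10))
print(len(train_rows), len(validation_rows))
```

Passing `train_ratio=0.8` gives the 8:2 alternative mentioned in the step.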
In an embodiment, once the preset variables that finally enter the model are determined, mathematical algorithm models such as a CART classification tree, naive Bayes, KNN, Gradient Boosting, and XGBoost can be constructed. When the precision of the models is compared, the parameters of each mathematical algorithm model (for example, of the XGBoost model under construction) can be tuned automatically so that they are optimal within their value ranges.
Step 305: Calculate the degree of truth of each preset recognition model based on the historical documents.
In this step, the historical documents may include the flow direction information of each document, for example, actual historical data showing that the customer of sample document A did not renew and was lost, or that the customer of sample document B renewed. The identification result that each preset recognition model from step 304 produces for sample document A is compared with the actual historical data of sample document A; if the two are the same, the identification result of that preset recognition model is true. By analogy, the degree of truth with which each preset recognition model identifies the training sample set is counted.
In an embodiment, the degree of truth of each preset recognition model may be represented by its AUC value on the training set, where AUC (Area Under the Curve) is defined as the area enclosed by the ROC (receiver operating characteristic) curve and the coordinate axes.
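For reference, the AUC can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counting as one half). The following sketch uses that pairwise definition; the function name and the sample labels and scores are illustrative assumptions, not values from the application.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. policy lost), 0 otherwise.
    scores: model scores, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(example_auc)
```

The pairwise form is quadratic in the sample count, so for large policy sets a rank-based or library implementation would be preferable.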
Step 306: Determine whether the plurality of preset recognition models include several tied preset recognition models that share the same, maximum degree of truth. If yes, go to step 308; otherwise, go to step 307.
In this step, the degrees of truth can be compared through the AUC values of the preset recognition models. The maximum AUC value is found first, and it is then determined whether more than one preset recognition model has that maximum AUC value. If so, go to step 308; otherwise, go to step 307.
Step 307: Select the preset recognition model with the maximum degree of truth as the target recognition model, and proceed to step 310.
In this step, if there are no tied preset recognition models among the plurality of preset recognition models, that is, only one preset recognition model has the maximum AUC value, that model is taken as the target recognition model.
Step 308: and respectively calculating the accuracy of the confusion matrix of each equal preset identification model.
In this step, if there are a plurality of identical preset recognition models among the plurality of preset recognition models, that is, if there are a plurality of preset recognition models with the largest AUC values, it is necessary to further calculate the confusion matrix of the identical preset recognition models with the same AUC values, and the accuracy, sensitivity, hit rate, specificity, etc. of the confusion matrix can be calculated respectively, and the above information can be stored as one dataframe (data frame).
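The confusion matrix indexes named in step 308 can be sketched as follows. This is an illustration only: the function name `confusion_metrics` and the sample data are hypothetical, and reading the "hit rate" as precision, TP/(TP + FP), is an assumption, since the text does not define it.

```python
def confusion_metrics(actual, predicted):
    """Compute confusion matrix counts and derived indexes for binary labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "hit_rate": ratio(tp, tp + fp),      # assumed to mean precision
    }


# Hypothetical data: 1 = customer lost, 0 = customer renewed.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
metrics = confusion_metrics(actual, predicted)
print(metrics)
```

One dictionary per model maps naturally onto one row of the data frame mentioned in this step.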
Step 309: Select, from the plurality of equivalent preset recognition models, the one whose confusion matrix has the highest accuracy as the target recognition model.
In this step, the equivalent preset recognition model whose confusion matrix has the highest accuracy may be selected as the target recognition model and stored in the form of a model.
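The selection logic of steps 306 to 309 can be summarized in a short sketch: take the largest AUC value, and only when several models share it, fall back to the accuracy of the confusion matrix. The function name `select_target_model`, the model names, the scores, and the floating-point `tolerance` used to decide that two AUC values are equal are all assumptions made for illustration.

```python
def select_target_model(candidates, tolerance=1e-9):
    """Pick the model with the largest AUC; break ties by confusion matrix accuracy.

    candidates maps a model name to {"auc": float, "accuracy": float}.
    """
    best_auc = max(m["auc"] for m in candidates.values())
    # Step 306: collect every model that attains the largest AUC value.
    tied = [name for name, m in candidates.items()
            if abs(m["auc"] - best_auc) <= tolerance]
    if len(tied) == 1:
        return tied[0]                      # step 307: a single best model
    # Steps 308 and 309: among equivalent models, prefer the highest accuracy.
    return max(tied, key=lambda name: candidates[name]["accuracy"])


# Hypothetical evaluation results for three preset recognition models.
candidates = {
    "logistic_regression": {"auc": 0.73, "accuracy": 0.81},
    "decision_tree": {"auc": 0.73, "accuracy": 0.84},
    "random_forest": {"auc": 0.70, "accuracy": 0.90},
}
print(select_target_model(candidates))
```

Note that the model with the highest accuracy overall is not selected here, because accuracy is consulted only among the models tied on AUC.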
Step 310: Acquire the document information to be processed. See the description of step 201 in the above embodiments for details.
Step 311: Identify the initial variable set contained in the document information.
In this step, the document information to be processed includes all information related to the document. This raw data is relatively complicated, so the initial variable set that contributes to information flow direction identification needs to be identified.
Step 312: Perform invalid data cleaning on the initial variable set to generate an effective variable set.
In this step, similarly, the initial variable set may contain missing or abnormal invalid data. Cleaning this invalid data allows the effective variable set to represent the information flow direction characteristics more accurately and improves calculation efficiency.
Step 313: Extract the data set corresponding to the preset variables from the effective variable set to serve as the flow feature set of the document information.
In this step, based on the plurality of preset variables set in step 303, data is read from the effective variable set and the valid data is assigned to each preset variable, thereby generating the flow feature set of the document information.
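Steps 312 and 313 can be sketched as two small functions: one drops missing or out-of-range values, the other keeps only the preset variables. The variable names are taken from Table 15, but the valid ranges, the `REMARK` field, the function names, and the decision to simply omit a preset variable whose value was cleaned away are assumptions for illustration.

```python
# Subset of the preset variables listed in Table 15.
PRESET_VARIABLES = ["DISCOUNT", "RENEWNUM", "APPLIAGE"]

# Hypothetical plausibility ranges used to detect abnormal values.
VALID_RANGES = {"DISCOUNT": (0.0, 1.0), "RENEWNUM": (0, 50), "APPLIAGE": (18, 100)}


def clean_invalid_data(initial_variables):
    """Step 312: drop missing values and numeric values outside their valid range."""
    effective = {}
    for name, value in initial_variables.items():
        if value is None:
            continue  # missing value
        low, high = VALID_RANGES.get(name, (float("-inf"), float("inf")))
        if isinstance(value, (int, float)) and not (low <= value <= high):
            continue  # abnormal value
        effective[name] = value
    return effective


def extract_flow_features(effective_variables, preset_variables):
    """Step 313: keep only the preset variables that survived cleaning."""
    return {name: effective_variables[name]
            for name in preset_variables if name in effective_variables}


# Hypothetical document: APPLIAGE is abnormal, PURCHASEPRICE is missing.
document = {"DISCOUNT": 0.7, "RENEWNUM": 3, "APPLIAGE": 430,
            "REMARK": "n/a", "PURCHASEPRICE": None}
effective = clean_invalid_data(document)
features = extract_flow_features(effective, PRESET_VARIABLES)
print(features)
```

Whether a cleaned-away value should be omitted, as here, or filled with a default is a design choice that the embodiment leaves open.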
Step 314: Input the flow feature set into the target recognition model to identify the flow direction information of the document information. See the description of step 203 in the above embodiments for details.
Please refer to fig. 4A, which shows an information flow direction identification method according to an embodiment of the present application. The method can be executed by the electronic device 1 shown in fig. 1 and can be applied to a policy loss prediction scenario, so as to identify the flow direction information of document information by using a target recognition model according to the flow feature set of the document information. The method comprises the following steps:
Step 401: Acquire data of a plurality of historical documents. See the description of step 301 in the above embodiments for details.
Step 402: Perform invalid data cleaning on the data of each historical document to generate a historical variable set. See the description of step 302 in the above embodiments for details.
Step 403: Acquire the actual historical flow direction information of the historical documents.
In this step, the actual historical flow direction information may be whether the customer of the historical document ultimately renewed or was lost. For example, if the historical document is a vehicle insurance policy A, then after policy A expires, whether the customer ultimately renewed, switched to another type of insurance, or did not renew can be used as the actual historical flow direction information. This historical data is recorded in the history of each document, and the corresponding actual historical flow direction information can be obtained from the historical database through statistical analysis.
Step 404: Calculate the degree of correlation between each historical variable and the actual historical flow direction information.
In this step, correlation analysis may be employed to calculate the degree of correlation. Correlation analysis is a common method for screening variables in data analysis. It covers both the correlation between an explanatory variable x and the response variable y and the correlation between two explanatory variables x1 and x2, and is used here to obtain the degree of correlation between each historical variable and the actual historical flow direction information.
Step 405: Remove from the historical variable set the historical variables whose degree of correlation is smaller than the preset contribution threshold, and generate a plurality of preset variables.
In this step, some variables can be effectively eliminated based on the degree of correlation. For example, the historical variables are treated as explanatory variables x and the actual historical flow direction information as the response variable y. By calculating index values for each explanatory variable x, variables that make no contribution to the response variable y, or whose contribution rate is smaller than the preset contribution threshold, can be eliminated. For example, as shown in Table 9, the indexes of both class 0 and class 1 are about 100, so it is determined that this variable has no significant influence on y and it can be eliminated.
TABLE 9 variable distribution data
Figure BDA0002467640540000151
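The correlation screening of steps 404 and 405 can be sketched as follows. The example uses the Pearson correlation coefficient and compares its absolute value with the threshold; both choices are assumptions, since the embodiment does not name a specific correlation measure. The function names, the threshold of 0.5, the variable `SOMEFLAG`, and the sample data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def screen_variables(history, response, threshold):
    """Keep the historical variables whose |correlation| with y reaches the threshold."""
    return [name for name, values in history.items()
            if abs(pearson(values, response)) >= threshold]


# Hypothetical data: response y, 1 = customer lost, 0 = customer renewed.
churned = [1, 1, 0, 0, 1, 0]
history = {
    "DISCOUNT": [0.4, 0.5, 0.8, 0.9, 0.45, 0.85],  # strongly (negatively) related to y
    "SOMEFLAG": [0, 1, 0, 1, 1, 0],                # weakly related to y
    "CONSTANT": [1, 1, 1, 1, 1, 1],                # carries no information
}
preset = screen_variables(history, churned, threshold=0.5)
print(preset)
```

The absolute value is used so that a strongly negative relationship, such as a larger discount going with fewer lost customers, is retained rather than discarded.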
In one embodiment, a stepwise regression method may be used to perform variable screening on the historical variable set. The cleaned and screened data are reintegrated into a new table, and a training set and a validation set are divided at a ratio of 7:3 or 8:2. A logistic regression model is then established, with the following criteria:
1. The number of variables that finally enter the model is between 8 and 15. In this embodiment, 11 variables finally enter the model.
2. There is no high correlation between any two explanatory variables x entering the model (the absolute value of the positive or negative correlation is less than 0.6).
3. As shown in Table 10, the correlation between each explanatory variable x entering the model and the response variable y cannot be too low, that is, it cannot be lower than a preset value.
Table 10 Correlation between some explanatory variables x and the response variable y
Figure BDA0002467640540000152
4. After the logistic regression model is established, it needs to be evaluated against the following important indexes:
I. The Pr value (p-value) of each regression coefficient is less than 0.05, as shown in Table 11.
Table 11 Regression coefficients of some variables
Estimate Std.Error z value Pr(>|z|)
(Intercept) 1.49E+00 1.32E-01 11.299 <2e-16 ***
DISCOUNT -5.77E-01 8.86E-02 -6.514 7.30E-11 ***
RENEWNUM -2.61E-01 1.58E-02 -16.511 <2e-16 ***
UNDERWRITESTART 1.83E-03 7.51E-04 2.437 0.0148
PRDGROUP11 -5.04E-01 3.74E-02 -13.486 <2e-16 ***
APPLIAGE -1.05E-02 1.27E-03 -8.284 <2e-16 ***
II. The preset contribution threshold range of the explanatory variables is [5%, 40%], as shown in Table 12. After the logistic regression model is established based on the historical variable set, the explanatory variables are standardized and a new logistic regression model is established; the share of each explanatory variable x's coefficient in the total explanation of the response variable y is the contribution rate of that explanatory variable x.
Table 12 Contribution rates of some variables within the preset contribution threshold
Figure BDA0002467640540000161
IV. AUC values and ROC curves:
As shown in fig. 4B, an ROC curve is generated for a sample formed from part of the historical variable set, where:
The abscissa of the ROC curve is the FPR (false positive rate), FPR = FP/(FP + TN), that is, the number of negative samples predicted as positive divided by the actual number of negative samples; the FPR takes values in [0, 1].
The ordinate of the ROC curve is the TPR (true positive rate), TPR = TP/(TP + FN), that is, the number of positive samples predicted as positive divided by the actual number of positive samples; the TPR takes values in [0, 1].
The ROC curve is actually formed by connecting a plurality of points. Each threshold (values above the threshold are classified as 1, otherwise as 0) corresponds to one set of classification results, that is, one pair of FPR and TPR; a plurality of thresholds yield a plurality of points, which form the ROC curve.
The area under the ROC curve is the AUC value. The meaning of the AUC value is as follows: one sample is randomly selected from each of the two classes 0 and 1, and the classifier predicts both; let the probability that the class-1 sample is classified as class 1 be p_1 and the probability that the class-0 sample is classified as class 1 be p_0. The probability that p_1 > p_0 is the AUC value. That is, the AUC value reflects the classifier's ability to rank samples.
In this embodiment, the AUC value of the logistic regression model built on the historical variables lies in [0.5, 1].
V. The Lift chart of the regression model generated with the historical variable set as the training set should show a stepwise descending trend. The Lift chart essentially measures the discriminating power of the model. In general, the Resp_index in the Lift chart, that is, the ratio of the actual response rate of each group to the overall average level, is required to show a descending trend. For example, with the partial historical variable set shown in Table 13 as the training sample, the corresponding Lift-train chart is shown in fig. 4C, where the solid line represents the Resp_index, that is, the ratio of the actual response rate of each group to the overall average level, and the dotted line represents the overall average level. The Lift chart shows a stepwise descending trend, so the historical variable set in Table 13 can be retained as preset variables.
Table 13 Partial historical variable set
Figure BDA0002467640540000171
In Table 13, escape cnt: the training data set is divided equally into 10 bins by sample size. Total: the predicted response rates are ranked and placed into the 10 bins in order. Resp: the actual number of responses in each bin. Rate: the proportion of actual responses in each bin. Resp_index: the ratio of the actual response rate in each bin to the overall average.
VI. There is no significant difference in performance between the training set and the validation set, which avoids overfitting. That is, the AUC values of the corresponding ROC curves show no significant difference between the training set (train) and the validation set (test). Fig. 4D shows the ROC curve of the training set, where AUC = 0.7284. Fig. 4E shows the ROC curve of the validation set, where AUC = 0.7249. It can be seen that the difference between the two sets is small, so the corresponding training set and validation set meet the requirements of prediction. If the difference is large, the model has over-learned the features of the training set and overfitting has occurred; the model then needs to be adjusted. Whether the difference is significant can be determined from historical statistical data and an analysis of the actual application scenario.
The historical variables that satisfy the above conditions may be used as preset variables.
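The Lift check described in item V and in the column definitions of Table 13 can be sketched as follows: samples are ranked by predicted response rate, split into equal-sized bins, and the actual response rate of each bin is divided by the overall average to give the Resp_index. The function names, the use of 4 bins instead of 10 for brevity, the sample data, and the decision to accept equal neighbouring values as "descending" are assumptions.

```python
def lift_table(actual, predicted, bins=10):
    """Build one row per bin with total, resp, rate and resp_index."""
    # Rank samples from highest to lowest predicted response rate.
    ranked = sorted(zip(predicted, actual), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(actual) / n
    rows = []
    for b in range(bins):
        group = ranked[b * n // bins:(b + 1) * n // bins]
        if not group:
            continue
        resp = sum(a for _, a in group)          # actual responses in this bin
        rate = resp / len(group)                 # actual response rate in this bin
        index = rate / overall_rate if overall_rate else 0.0
        rows.append({"bin": b + 1, "total": len(group), "resp": resp,
                     "rate": rate, "resp_index": index})
    return rows


def is_descending(rows):
    """True when resp_index never increases from one bin to the next."""
    indexes = [row["resp_index"] for row in rows]
    return all(a >= b for a, b in zip(indexes, indexes[1:]))


# Hypothetical scores and outcomes for eight samples.
predicted = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
actual = [1, 1, 1, 0, 1, 0, 0, 0]
rows = lift_table(actual, predicted, bins=4)
for row in rows:
    print(row)
print(is_descending(rows))
```

A Resp_index above 1 in the first bins and below 1 in the last bins indicates that the model concentrates the actual responses among its highest-scored samples.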
In an embodiment, the VIF (variance inflation factor) value may be calculated to screen variables in the historical variable set; for example, the historical variables whose VIF value is less than 2 may be retained, which effectively reduces the collinearity of the regression model, as shown in Table 14.
Table 14 Some variables with VIF values less than 2
var vif.fit.
AGENTID1 1.122457
DISCOUNT 1.403647
RENEWNUM 1.472236
PRDGROUP1 1.328946
APPLIAGE 1.058096
APPLINOCARNUMS1 1.021844
ISJTCUST 1.018463
PURCHASEPRICE 1.041563
APPLICARMINYEAR1 1.476288
RATE 1.051276
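As a minimal sketch of the VIF screening, the following example relies on the fact that the VIF of each variable equals the corresponding diagonal element of the inverse of the correlation matrix of the explanatory variables. The function names, the variable names `X1` to `X3`, and the sample data are hypothetical; the threshold of 2 is the one given in this embodiment.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_values(columns):
    """Return {variable name: VIF} from the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical explanatory variables: X1 and X2 are correlated, X3 is not.
columns = {"X1": [1, 2, 3, 4], "X2": [1, 3, 2, 4], "X3": [1, -1, -1, 1]}
values = vif_values(columns)
retained = [name for name, v in values.items() if v < 2]
print(values)
print(retained)
```

In this toy data both correlated variables exceed the threshold; in practice it is common to remove one variable at a time and recompute, which the embodiment neither requires nor excludes.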
In one embodiment, statistical distribution plots of the historical variable set can be drawn in RStudio (a development environment for R, a language and environment for statistical analysis and graphics) to analyze whether an explanatory variable x has an influence on the response variable y, thereby assisting in screening preset variables with a suitable contribution rate from the historical variable set.
For example, take the explanatory variable x, premium discount: a statistical distribution plot of the premium discount x against the response variable y, policy customer churn, is drawn through the rattle interface in R. If the plot shows that the larger the discount, the lower the churn proportion, then the explanatory variable x has an influence on the response variable y, and the premium discount can be provisionally retained in the preliminary screening.
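The same question can also be examined numerically. The following sketch is a tabular stand-in for the distribution plot, not the rattle plot itself: it groups policies into discount bands and reports the churn proportion of each band. The band edges, labels, function name, and sample data are hypothetical, and the discount is assumed to be expressed as the fraction taken off the premium.

```python
from bisect import bisect_right

# Hypothetical band edges for the discount strength (fraction off the premium).
EDGES = [0.5, 0.8]
LABELS = ["<0.5", "0.5-0.8", ">=0.8"]


def churn_rate_by_band(discounts, churned):
    """Return the share of lost customers within each discount band."""
    totals = [0] * len(LABELS)
    lost = [0] * len(LABELS)
    for discount, flag in zip(discounts, churned):
        band = bisect_right(EDGES, discount)  # index of the band the discount falls in
        totals[band] += 1
        lost[band] += flag
    return {LABELS[i]: lost[i] / totals[i]
            for i in range(len(LABELS)) if totals[i]}


# Hypothetical policies: 1 = customer lost, 0 = customer renewed.
discounts = [0.1, 0.2, 0.6, 0.7, 0.9, 0.95]
churned = [1, 1, 1, 0, 0, 0]
rates = churn_rate_by_band(discounts, churned)
print(rates)
```

A churn proportion that falls steadily from the lowest band to the highest supports retaining the premium discount in the preliminary screening.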
In an embodiment, the above variable screening methods are combined and a final data analysis is performed, and the significant factors (preset variables) that affect customer churn are screened from the historical variable set, as shown in Table 15:
Table 15 Preset variables
Meaning    Variable name
Whether the customer is lost    REFLAG
Agent    AGENTID1
Discount    DISCOUNT
Number of renewals    RENEWNUM
Identity of same insurance    PRDGROUP1
Age of the applicant    APPLIAGE
Whether the customer holds insurance other than vehicle insurance    APPLINOCARNUMS1
Whether the customer is a group customer    ISJTCUST
Purchase price of the new vehicle    PURCHASEPRICE
Minimum vehicle insurance year of the customer    APPLICARMINYEAR1
Churn rate of the brand    RATE
Step 406: Train a plurality of mathematical algorithm models according to the plurality of preset variables to generate a plurality of preset recognition models. See the description of step 304 in the above embodiments for details.
Step 407: Calculate the degree of truth of each preset recognition model based on the historical documents. See the description of step 305 in the above embodiments for details.
Step 408: Determine whether, among the plurality of preset recognition models, there are a plurality of equivalent preset recognition models that share the same, maximum degree of truth. If yes, go to step 410; otherwise, go to step 409. See the description of step 306 in the above embodiments for details.
Step 409: If there are no equivalent preset recognition models among the plurality of preset recognition models, select the preset recognition model with the maximum degree of truth as the target recognition model, and proceed to step 412. See the description of step 307 in the above embodiments for details.
Step 410: If there are a plurality of equivalent preset recognition models among the plurality of preset recognition models, calculate the accuracy of the confusion matrix of each equivalent preset recognition model. See the description of step 308 in the above embodiments for details.
Step 411: Select, from the plurality of equivalent preset recognition models, the one whose confusion matrix has the highest accuracy as the target recognition model. See the description of step 309 in the above embodiments for details.
Step 412: Acquire the document information to be processed. See the description of step 201 in the above embodiments for details.
Step 413: Parse the document information according to the data dimensions corresponding to the document information to generate a plurality of initial variables.
In this step, the data dimensions may correspond to the type of document information. For example, the data of a vehicle insurance policy can be divided into the following dimensions: customer data, vehicle data, insurance data, life insurance customer data, business member data, and brand data. The document information is parsed based on the corresponding data dimensions to obtain a plurality of initial variables.
Step 414: Analyze the actual data of each initial variable in the document information, and classify all the initial variables according to a preset classification rule.
In this step, the actual meaning of each initial variable in all the document information to be processed is read. The preset classification rule may be similar to the classification scheme shown in Table 2, under which the initial variables are classified into continuous variables, categorical variables, date variables, and non-function variables.
Step 415: Generate the initial variable set according to the actual data and the classification result.
In this step, the actual data is the meaning of each initial variable in the actual scenario, and likewise the initial variable set may be stored in a manner similar to that shown in Table 2.
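A possible form of the preset classification rule of step 414 is sketched below. Table 2 is not reproduced in this section, so the heuristics used here (date strings in `YYYY-MM-DD` form, binary numbers as categorical, text with many distinct values as non-function) are assumptions, as are the function names, the `max_categories` parameter, the `POLICYNO` variable, and treating `UNDERWRITESTART` as a date.

```python
from datetime import datetime


def is_date(value):
    """True for strings in the assumed YYYY-MM-DD format."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def classify_variable(values, max_categories=3):
    """Assign one of: continuous, categorical, date, non-function."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "non-function"                 # nothing usable was recorded
    if all(is_date(v) for v in present):
        return "date"
    if all(is_number(v) for v in present):
        # Numbers restricted to 0 and 1 are treated as a flag, not a measurement.
        return "categorical" if set(present) <= {0, 1} else "continuous"
    if len(set(present)) <= max_categories:
        return "categorical"
    return "non-function"                     # e.g. identifiers or free text


# Hypothetical initial variables parsed from four documents.
initial_variables = {
    "APPLIAGE": [34, 51, 28, 45],
    "ISJTCUST": [0, 1, 0, 0],
    "UNDERWRITESTART": ["2019-01-05", "2019-03-17", "2018-11-30", "2019-06-01"],
    "PRDGROUP1": ["A", "B", "A", "A"],
    "POLICYNO": ["P001", "P002", "P003", "P004"],
}
classified = {name: classify_variable(values)
              for name, values in initial_variables.items()}
print(classified)
```

Storing the class next to each variable name, as in the `classified` dictionary, mirrors the idea in step 415 of generating the initial variable set from the actual data together with the classification result.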
Step 416: Perform invalid data cleaning on the initial variable set to generate an effective variable set. See the description of step 312 in the above embodiments for details.
Step 417: Extract the data set corresponding to the preset variables from the effective variable set to serve as the flow feature set of the document information. See the description of step 313 in the above embodiments for details.
Step 418: Input the flow feature set into the target recognition model to identify the flow direction information of the document information. See the description of step 203 in the above embodiments for details.
In the above information flow direction identification method, because the application scenario is vehicle insurance policy data and the data platforms differ, a recognition model can be constructed separately for each. Different document information contains different data, so the variables that finally enter the recognition model differ, and the resulting recognition models differ as well. When the customer churn probability needs to be predicted for a certain type of document data, the original data of the branch to be predicted is first processed in the database, where missing values and abnormal values are filled. The database is then connected from the development environment through ODBC (Open Database Connectivity), the processed data to be predicted is read, and the previously stored target recognition model is loaded. Finally, the customer churn probability is predicted for the data, and the prediction results are stored in a data frame and written back to the data storage structure. The prediction results include but are not limited to: a churn/renewal flag and the corresponding probability. The prediction results can be sent to the foreground terminal for page display; for example, the prediction result of each policy can be displayed on the foreground page. Export of bulk files may also be provided.
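The scoring part of this workflow can be sketched as follows, leaving out the database connection. The example assumes that the stored target recognition model is a logistic regression whose response is 1 for a lost customer; the intercept, coefficients, decision threshold of 0.5, policy identifiers, and function names are hypothetical and are not the fitted values of the embodiment.

```python
from math import exp

# Hypothetical stored model: intercept and coefficients of a logistic regression.
MODEL = {"intercept": 1.5, "coefficients": {"DISCOUNT": -0.6, "RENEWNUM": -0.25}}


def churn_probability(features, model=MODEL):
    """Logistic function applied to the linear combination of the features."""
    z = model["intercept"] + sum(coef * features[name]
                                 for name, coef in model["coefficients"].items())
    return 1.0 / (1.0 + exp(-z))


def predict(policies, threshold=0.5):
    """Return one result row per policy: churn flag and corresponding probability."""
    results = []
    for policy_id, features in policies.items():
        probability = churn_probability(features)
        results.append({"policy": policy_id,
                        "churn": probability >= threshold,
                        "probability": round(probability, 4)})
    return results


# Hypothetical flow feature sets of two policies to be predicted.
policies = {
    "A": {"DISCOUNT": 0.9, "RENEWNUM": 6},
    "B": {"DISCOUNT": 0.1, "RENEWNUM": 0},
}
results = predict(policies)
for row in results:
    print(row)
```

Each result row carries the two items named in the text, the flag and its probability, so the list can be written back to storage or sent to the foreground page as one table.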
Please refer to fig. 5, which shows an information flow direction identification apparatus 500 according to an embodiment of the present application. The apparatus can be applied to the electronic device 1 shown in fig. 1 and to a policy loss prediction scenario, so as to identify the flow direction information of document information by using a target recognition model according to the flow feature set of the document information. The apparatus includes a first obtaining module 501, a parsing module 502, and an identifying module 503, whose principles are as follows:
The first obtaining module 501 is configured to obtain the document information to be processed. See the description of step 201 in the above embodiments for details.
The parsing module 502 is configured to parse the document information according to preset variables and generate a flow feature set of the document information. See the description of step 202 in the above embodiments for details.
The identifying module 503 is configured to input the flow feature set into the target recognition model and identify the flow direction information of the document information. See the description of step 203 in the above embodiments for details.
In one embodiment, the parsing module 502 is configured to: identify the initial variable set contained in the document information; perform invalid data cleaning on the initial variable set to generate an effective variable set; and extract the data set corresponding to the preset variables from the effective variable set to serve as the flow feature set of the document information. See the description of steps 311 to 313 in the above embodiments for details.
In one embodiment, identifying the initial variable set contained in the document information includes: parsing the document information according to the data dimensions corresponding to the document information to generate a plurality of initial variables; analyzing the actual data of each initial variable in the document information, and classifying all the initial variables according to a preset classification rule; and generating the initial variable set according to the actual data and the classification result. See the description of steps 413 to 415 in the above embodiments for details.
In one embodiment, the apparatus further comprises: a second obtaining module 504, configured to obtain data of a plurality of historical documents; a cleaning module 505, configured to perform invalid data cleaning on the data of each historical document and generate a historical variable set; and a removing module 506, configured to remove from the historical variable set the historical variables whose information flow direction contribution rate to the historical documents is smaller than a preset contribution threshold, and generate a plurality of preset variables. See the description of steps 301 to 303 in the above embodiments for details.
In one embodiment, the removing module 506 is configured to: acquire the actual historical flow direction information of the historical documents; calculate the degree of correlation between each historical variable and the actual historical flow direction information; and remove from the historical variable set the historical variables whose degree of correlation is smaller than the preset contribution threshold, and generate a plurality of preset variables. See the description of steps 404 to 405 in the above embodiments for details.
In one embodiment, the apparatus further comprises: a training module 507, configured to train a plurality of mathematical algorithm models according to the plurality of preset variables and generate a plurality of preset recognition models; a calculation module, configured to calculate the degree of truth of each preset recognition model based on the historical documents; a determining module 508, configured to determine whether, among the plurality of preset recognition models, there are a plurality of equivalent preset recognition models that share the same, maximum degree of truth; and a selecting module 509, configured to select the preset recognition model with the maximum degree of truth as the target recognition model if there are no equivalent preset recognition models among the plurality of preset recognition models. See the description of steps 304 to 307 in the above embodiments for details.
In an embodiment, the calculation module is further configured to calculate the accuracy of the confusion matrix of each equivalent preset recognition model if there are a plurality of equivalent preset recognition models among the plurality of preset recognition models. The selecting module 509 is further configured to select, from the plurality of equivalent preset recognition models, the one whose confusion matrix has the highest accuracy as the target recognition model. See the description of steps 308 to 309 in the above embodiments for details.
For a detailed description of the information flow identification device 500, please refer to the description of the related method steps in the above embodiments.
An embodiment of the present invention further provides a non-transitory electronic device readable storage medium, including: a program that, when run on an electronic device, causes the electronic device to perform all or part of the procedures of the methods in the above-described embodiments. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like. The storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (16)

1. An information flow direction identification method is characterized by comprising the following steps:
acquiring document information to be processed;
analyzing the bill information according to a preset variable to generate a flow characteristic set of the bill information;
and inputting the flow characteristic set into a target recognition model, and recognizing the flow direction information of the bill information.
2. The method according to claim 1, wherein the parsing the document information according to a preset variable to generate a flowing feature set of the document information further comprises:
identifying an initial variable set contained in the document information;
carrying out invalid data cleaning on the initial variable set to generate an effective variable set;
and extracting a data set corresponding to the preset variable from the effective variable set to serve as a flowing feature set of the bill information.
3. The method of claim 2, wherein the identifying the initial set of variables contained in the document information comprises:
analyzing the bill information according to the data dimension corresponding to the bill information to generate a plurality of initial variables;
analyzing actual data of each initial variable in the bill information, and classifying all the initial variables according to a preset classification rule;
and generating the initial variable set according to the actual data and the classification result.
4. The method of claim 1, wherein the step of selecting the predetermined variable comprises:
acquiring data of a plurality of historical documents;
carrying out invalid data cleaning on the data of each historical receipt to generate a historical variable set;
and after the historical variables with the information flow contribution rate to the historical documents smaller than a preset contribution threshold value are removed from the historical variable set, generating a plurality of preset variables.
5. The method according to claim 4, wherein the generating a plurality of preset variables after removing the history variables with the information flow contribution rate to the history document smaller than a preset contribution threshold from the history variable set comprises:
acquiring actual historical flow direction information of the historical document;
calculating the correlation degree of each historical variable and the actual historical flow direction information;
and after the historical variables with the correlation degrees smaller than the preset contribution threshold value are removed from the historical variable set, generating a plurality of preset variables.
6. The method of claim 4, wherein the step of pre-programming the target recognition model comprises:
respectively training multiple mathematical algorithm models according to the preset variables and generating multiple preset recognition models;
respectively calculating the truth of each preset identification model based on the historical documents;
judging whether a plurality of equal preset identification models with the same true degree and the maximum true degree exist in the plurality of preset identification models or not;
and if the same preset identification model does not exist in the plurality of preset identification models, selecting the preset identification model with the maximum true degree as the target identification model.
7. The method of claim 6, further comprising:
if a plurality of identical preset recognition models exist in the plurality of preset recognition models, respectively calculating the accuracy of a confusion matrix of each identical preset recognition model;
and selecting the equivalent preset recognition model with the maximum accuracy of the confusion matrix from the equivalent preset recognition models as the target recognition model.
8. An information flow direction identification device, comprising:
the first acquisition module is used for acquiring the information of the document to be processed;
the analysis module is used for analyzing the bill information according to a preset variable to generate a flow characteristic set of the bill information;
and the identification module is used for inputting the flow characteristic set into a target identification model and identifying the flow direction information of the bill information.
9. The apparatus of claim 8, wherein the parsing module is configured to:
identifying an initial variable set contained in the document information;
carrying out invalid data cleaning on the initial variable set to generate an effective variable set;
and extracting a data set corresponding to the preset variable from the effective variable set to serve as a flowing feature set of the bill information.
10. The apparatus of claim 9, wherein the identifying the initial set of variables contained in the document information comprises:
analyzing the bill information according to the data dimension corresponding to the bill information to generate a plurality of initial variables;
analyzing actual data of each initial variable in the bill information, and classifying all the initial variables according to a preset classification rule;
and generating the initial variable set according to the actual data and the classification result.
11. The apparatus of claim 8, further comprising:
the second acquisition module is used for acquiring data of a plurality of historical documents;
the cleaning module is used for cleaning invalid data of each historical document to generate a historical variable set;
and the removing module is used for removing the historical variables with the information flow contribution rate to the historical documents smaller than a preset contribution threshold value from the historical variable set to generate a plurality of preset variables.
12. The apparatus of claim 11, wherein the culling module is to:
acquiring actual historical flow direction information of the historical document;
calculating the correlation degree of each historical variable and the actual historical flow direction information;
and after the historical variables with the correlation degrees smaller than the preset contribution threshold value are removed from the historical variable set, generating a plurality of preset variables.
13. The apparatus of claim 11, further comprising:
a training module, configured to train a plurality of mathematical algorithm models respectively according to the plurality of preset variables, to generate a plurality of preset recognition models;
a calculation module, configured to calculate the truth degree of each preset recognition model based on the historical documents;
a judging module, configured to judge whether the plurality of preset recognition models include a plurality of equivalent preset recognition models that share the same, maximum truth degree;
and a selecting module, configured to select the preset recognition model with the maximum truth degree as the target recognition model if no equivalent preset recognition models exist among the plurality of preset recognition models.
14. The apparatus of claim 13, wherein:
the calculation module is further configured to calculate, if a plurality of equivalent preset recognition models exist among the plurality of preset recognition models, the accuracy of the confusion matrix of each equivalent preset recognition model;
the selecting module is further configured to select, from the plurality of equivalent preset recognition models, the equivalent preset recognition model with the highest confusion-matrix accuracy as the target recognition model.
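The selection logic of claims 13 and 14 can be sketched as follows. The claims do not say how the truth (true degree) score or the confusion-matrix accuracy is computed, so this sketch assumes the former is supplied as a number per model and the latter is the share of diagonal entries in the matrix. All names, labels and figures are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def matrix_accuracy(matrix: List[List[int]]) -> float:
    """Assumed definition: correctly classified samples over all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def select_target_model(truth: Dict[str, float],
                        matrices: Dict[str, List[List[int]]]) -> str:
    """Pick the model with the highest score; break ties by matrix accuracy."""
    best = max(truth.values())
    tied = [name for name, score in truth.items() if score == best]
    if len(tied) == 1:
        return tied[0]
    return max(tied, key=lambda name: matrix_accuracy(matrices[name]))


# Hypothetical scores: "tree" and "forest" tie on the primary score.
truth_scores = {"logistic": 0.82, "tree": 0.90, "forest": 0.90}
matrices = {
    "tree": [[40, 10], [5, 45]],
    "forest": [[45, 5], [3, 47]],
}
target = select_target_model(truth_scores, matrices)
```

The confusion matrices are only consulted when a tie exists, which mirrors the conditional wording of claim 14.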
15. An electronic device, comprising:
a memory configured to store a computer program;
and a processor configured to perform the method of any of claims 1 to 7 to identify flow direction information of document information.
16. A non-transitory storage medium readable by an electronic device, comprising a program which, when run by an electronic device, causes the electronic device to perform the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010338853.1A CN111401329B (en) 2020-04-26 2020-04-26 Information flow direction identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010338853.1A CN111401329B (en) 2020-04-26 2020-04-26 Information flow direction identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111401329A (en) 2020-07-10
CN111401329B (en) 2021-10-29

Family

ID=71414095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010338853.1A Active CN111401329B (en) 2020-04-26 2020-04-26 Information flow direction identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111401329B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699872A (en) * 2020-12-29 2021-04-23 天津幸福生命科技有限公司 Form auditing processing method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169124A1 (en) * 2004-03-31 2010-07-01 Aetna Inc. System and method for administering health care cost reduction
CN104834983A (en) * 2014-12-25 2015-08-12 平安科技(深圳)有限公司 Business data processing method and device
CN107798615A (en) * 2017-02-17 2018-03-13 平安科技(深圳)有限公司 Declaration form renews charge difficulty Forecasting Methodology and device
CN108182638A (en) * 2018-01-31 2018-06-19 泰康保险集团股份有限公司 The analysis method and device that declaration form is lost in
CN108648011A (en) * 2018-05-11 2018-10-12 上海赢科信息技术有限公司 Model generates, identification client buys the method and system of vehicle insurance intention
CN109242539A (en) * 2018-08-14 2019-01-18 中国平安人寿保险股份有限公司 Based on potential user's prediction technique, device and the computer equipment for being lost user

Also Published As

Publication number Publication date
CN111401329B (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN107025596B (en) Risk assessment method and system
CN111507831A (en) Credit risk automatic assessment method and device
CN110706039A (en) Electric vehicle residual value rate evaluation system, method, equipment and medium
US20150269669A1 (en) Loan risk assessment using cluster-based classification for diagnostics
US20130290167A1 (en) System and method for credit risk management for educational institutions
CN112017040B (en) Credit scoring model training method, scoring system, equipment and medium
CN113139687B (en) Method and device for predicting credit card user default
CN112102073A (en) Credit risk control method and system, electronic device and readable storage medium
CN112686749B (en) Credit risk assessment method and device based on logistic regression technology
CN111145006A (en) Automobile financial anti-fraud model training method and device based on user portrait
CN111192140A (en) Method and device for predicting customer default probability
CN110781380A (en) Information pushing method and device, computer equipment and storage medium
CN111275338A (en) Method, device, equipment and storage medium for judging enterprise fraud behaviors
CN114926299A (en) Prediction method for predicting vehicle accident risk based on big data analysis
CN111401329B (en) Information flow direction identification method, device, equipment and storage medium
US20060248096A1 (en) Early detection and warning systems and methods
CN117437019A (en) Credit card overdue risk prediction method, apparatus, device, medium and program product
CN115205026A (en) Credit evaluation method, device, equipment and computer storage medium
CN112348584A (en) Vehicle estimation method, device and equipment
CN114170000A (en) Credit card user risk category identification method, device, computer equipment and medium
CN114596152A (en) Method, device and storage medium for predicting debt subject default based on unsupervised model
CN115222119A (en) Bank account deposit prediction method
CN116883153A (en) Pedestrian credit investigation-based automobile finance pre-credit rating card development method and terminal
CN116308745A (en) Risk classification method and apparatus
CN116976187A (en) Modeling variable determining method, abnormal data prediction model construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220311

Address after: 401120 data of Xiantao street, Yubei District, Chongqing 19

Patentee after: Chongqing Xinzhi Jinfu Information Technology Co.,Ltd.

Address before: Room 720-2, floor 7, building 5, yard 1, Shangdi 10th Street, Haidian District, Beijing 100082

Patentee before: Beijing Xinzhi junyang Information Technology Co.,Ltd.
CP01 Change in the name or title of a patent holder

Address after: 401120 data of Xiantao street, Yubei District, Chongqing 19

Patentee after: Chongqing Xinzhi Automotive Technology Co.,Ltd.

Address before: 401120 data of Xiantao street, Yubei District, Chongqing 19

Patentee before: Chongqing Xinzhi Jinfu Information Technology Co.,Ltd.