CN110879821A - Method, device, equipment and storage medium for generating rating card model derivative label - Google Patents

Method, device, equipment and storage medium for generating rating card model derivative label

Info

Publication number
CN110879821A
Authority
CN
China
Prior art keywords
derivative
sample
label
card model
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911096223.1A
Other languages
Chinese (zh)
Inventor
杨良志
白琳
汪志新
周光辉
张奇
张卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Caixun Technology Co Ltd
Original Assignee
Caixun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Caixun Technology Co Ltd filed Critical Caixun Technology Co Ltd
Priority to CN201911096223.1A priority Critical patent/CN110879821A/en
Publication of CN110879821A publication Critical patent/CN110879821A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for generating a derivative label for a scorecard model. The method comprises the following steps: acquiring sample data; obtaining analysis parameters of the sample data through association analysis; obtaining a sorting parameter from the analysis parameters through a preset rule; performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label; and generating a derivative label of the scorecard model according to the sample derivative label. With this embodiment, feature engineering for the scorecard model is no longer limited to univariate analysis of labels: strongly directional derivative labels can be generated, providing a large number of usable label bins for the subsequent scorecard model and thereby improving its accuracy.

Description

Method, device, equipment and storage medium for generating rating card model derivative label
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a method, a device, equipment and a storage medium for generating a derivative label for a scorecard model.
Background
At present, the mainstream modeling method for customer acquisition and risk control on big data is the scorecard model. A scorecard model applies data mining techniques and statistical analysis to systematically analyze large amounts of user-feature data, mines the behavior patterns and characteristics contained in the data, captures the relationship between historical information and future behavior, and builds a predictive model that expresses an assessment of a user's future behavior as a score.
Before an algorithm can be applied, the data must be analyzed and converted into a form that matches the logic of the algorithm. This is feature engineering, a critical part of machine learning modeling. The quality of the derivative labels produced in feature engineering is of great importance to the final modeling result. In feature engineering, a traditional scorecard model performs only a rough univariate analysis and feeds ready-made labels into the model, so key information is often omitted and modeling accuracy is poor.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, a device, equipment and a storage medium for generating a derivative label for a scorecard model, so as to strengthen the directionality of labels in scorecard model feature engineering and improve the fit of the scorecard model.
In a first aspect, an embodiment of the present invention provides a method for generating a derivative label for a scorecard model, including:
acquiring sample data;
obtaining analysis parameters of the sample data through association analysis;
obtaining a sorting parameter from the analysis parameters through a preset rule;
performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label;
and generating a derivative label of the scorecard model according to the sample derivative label.
Further, obtaining the analysis parameters of the sample data through association analysis includes:
obtaining the analysis parameters of the sample data through the Apriori association algorithm.
Further, the analysis parameters include one or more of support, confidence, coverage, strength, lift, and utilization.
Further, obtaining the sorting parameter from the analysis parameters through a preset rule includes:
obtaining the sorting parameter from the analysis parameters by the entropy weight method.
Further, generating the derivative label of the scorecard model according to the sample derivative label includes:
judging whether preset scorecard model data falls within the range determined by the sample derivative label;
and if the preset scorecard model data falls within the range determined by the sample derivative label, taking the sample derivative label as the derivative label of the scorecard model.
Further, before obtaining the analysis parameters of the sample data through association analysis, the method further includes:
preprocessing the sample data.
Further, after generating the derivative label of the scorecard model according to the sample derivative label, the method further includes:
performing a logistic regression calculation on the derivative label to determine the performance of the scorecard model once the derivative label is included.
In a second aspect, an embodiment of the present invention provides a device for generating a derivative label for a scorecard model, including:
a sample data acquisition module, configured to acquire sample data;
an analysis parameter acquisition module, configured to obtain analysis parameters of the sample data through association analysis;
a sorting parameter acquisition module, configured to obtain a sorting parameter from the analysis parameters through a preset rule;
a sample derivative label generation module, configured to perform a binning operation on the sample data according to the sorting parameter to generate a sample derivative label;
and a scorecard model derivative label generation module, configured to generate a derivative label of the scorecard model according to the sample derivative label.
In a third aspect, an embodiment of the present invention provides a computer device, including:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for generating a derivative label for a scorecard model provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for generating a derivative label for a scorecard model according to any embodiment of the present invention.
The method for generating a derivative label for a scorecard model provided by the embodiment of the invention acquires sample data; obtains analysis parameters of the sample data through association analysis; obtains a sorting parameter from the analysis parameters through a preset rule; performs a binning operation on the sample data according to the sorting parameter to generate a sample derivative label; and generates a derivative label of the scorecard model according to the sample derivative label. With this embodiment, feature engineering for the scorecard model is no longer limited to univariate analysis of labels: strongly directional derivative labels can be generated, providing a large number of usable label bins for the subsequent scorecard model and thereby improving its accuracy.
Drawings
Fig. 1 is a schematic flow chart of a method for generating a derivative label for a scorecard model according to a first embodiment of the present invention;
Fig. 2 is a schematic flow chart of a method for generating a derivative label for a scorecard model according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for generating a derivative label for a scorecard model according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, association analysis may be a preset rule without departing from the scope of the present application, and similarly, a preset rule may be referred to as association analysis. Both the association analysis and the preset rule are preset rules, but they are not the same preset rule. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "plurality", "batch" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Embodiment one
Fig. 1 is a schematic flow chart of a method for generating a derivative label for a scorecard model according to a first embodiment of the present invention. This embodiment is applicable to generating derivative labels in the feature engineering of a scorecard model. As shown in Fig. 1, the method includes:
S110, acquiring sample data.
Specifically, the sample data includes positive sample data and negative sample data: sample data that meets a given condition is called positive sample data, and sample data that does not meet it is called negative sample data. For example, if the condition is an examination score of 60 points or more, sample data with a score of 60 or more is positive sample data, and sample data with a score below 60 is negative sample data.
A given scorecard model includes a large amount of positive and negative sample data. In this embodiment, however, the acquired sample data is positive sample data, and to ensure the accuracy of the generated derivative label, the amount of positive sample data acquired is usually much larger than the amount of sample data in the scorecard model.
S120, obtaining analysis parameters of the sample data through association analysis.
Specifically, association analysis, also known as association mining, looks for frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction data, relational data, or other information carriers. In other words, association analysis discovers associations between different items in a transaction database. The analysis parameters are the data parameters required in subsequent steps and can be calculated through association analysis. They include, but are not limited to: support, confidence, coverage, strength, lift, and utilization.
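As an illustration of the analysis parameters, the following minimal Python sketch computes three of them (support, confidence, and lift) using their standard association-rule definitions. The function names, the item names, and the toy transactions are assumptions made for this example; the source does not define coverage, strength, or utilization, so they are left out.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical sample bins encoded as items (names are illustrative only).
transactions = [
    frozenset({"age_20_30", "male", "fee_75_100"}),
    frozenset({"age_20_30", "male", "fee_0_25"}),
    frozenset({"age_30_40", "female", "fee_75_100"}),
    frozenset({"age_20_30", "male", "fee_75_100"}),
]

s = support(transactions, {"age_20_30", "male"})
c = confidence(transactions, {"age_20_30", "male"}, {"fee_75_100"})
l = lift(transactions, {"age_20_30", "male"}, {"fee_75_100"})
```

A lift below 1, as in this toy data, indicates that the antecedent makes the consequent less likely than its baseline frequency.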
Before association analysis is performed, the sample data needs to be binned so that it is divided into different item sets. When a classification model is built, continuous variables need to be discretized; after discretization the model is more stable and the risk of overfitting is reduced. Discretizing a continuous variable is a binning operation. Put another way, a binning operation groups sample data that meets a given condition into the same class, and the sample data grouped into one class is called a sample bin.
In general, there are many binning methods, such as equal-width binning and equal-frequency binning. The binning operation in this embodiment is not limited to one method, and a method can be chosen flexibly according to the situation. For example, the sample data is divided into 10 classes by age, 2 classes by gender (male and female), and 4 classes by monthly phone fee (0-100 yuan, in steps of 25 yuan); together these are referred to as 22 sample bins, where age, gender, and monthly phone fee are existing labels of the sample data. After the binning operation, the analysis parameters between different sample bins are calculated through association analysis.
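The two binning methods named above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names are hypothetical, and the age range of 0-100 years is an assumption, since the source only states the range for the monthly fee.

```python
def equal_width_bin(value, low, high, n_bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / n_bins
    # The upper bound itself is assigned to the last bin.
    return min(int((value - low) / width), n_bins - 1)


def equal_frequency_edges(values, n_bins):
    """Return n_bins - 1 cut points that split values into near-equal groups."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


# Monthly fee: 0-100 yuan in 4 bins of 25 yuan, as in the example above.
fee_bin = equal_width_bin(85, 0, 100, 4)
# Age: 10 bins; the 0-100 range is assumed for illustration.
age_bin = equal_width_bin(25, 0, 100, 10)
# Equal-frequency cut points for a small hypothetical fee sample.
edges = equal_frequency_edges([5, 10, 20, 40, 60, 80, 90, 95], 4)
```

Here a fee of 85 yuan falls in the fourth bin (index 3, covering 75-100 yuan), and an age of 25 falls in the third bin (index 2, covering 20-30 years).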
S130, obtaining a sorting parameter from the analysis parameters through a preset rule.
Specifically, the sorting parameter is a parameter by which the sample bins can be arranged in descending or ascending order; it is calculated from the analysis parameters according to the preset rule. For example, if the analysis parameters include one or more of the six parameters support, confidence, coverage, strength, lift, and utilization, a weight can be assigned to each parameter, and the resulting weighted value is the sorting parameter.
S140, performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label.
Specifically, the sample bins are sorted in descending order of their sorting parameter, and the top-ranked sample bins are combined by a further binning operation, which yields a new derivative label, called the sample derivative label. Lower-ranked sample bins have smaller sorting parameters, which implies smaller analysis parameters and therefore weaker association in the sample data, so no binning is needed for that part of the data. For example, suppose the ranking shows that the sample data most probably corresponds to house salespersons. The sample derivative label is then "house salesperson": sample data that meets the condition is given the value 1 under this label, and the remaining sample data that does not meet the condition has the value 0.
S150, generating a derivative label of the scorecard model according to the sample derivative label.
Specifically, following the way the sample derivative label was generated, the scorecard model data that meets the same conditions as the sample derivative label is combined in the same way, which generates a new derivative label for the scorecard model. For example, suppose the house salesperson label takes the value 1 under the following conditions: age 20-30, gender male, and monthly phone fee 75-100 yuan. Then every record in the scorecard model whose age label is 20-30, whose gender label is male, and whose monthly phone fee label is 75-100 yuan receives a new derivative label, house salesperson, with the value 1.
The method for generating a derivative label for a scorecard model provided by the first embodiment of the invention acquires sample data; obtains analysis parameters of the sample data through association analysis; obtains a sorting parameter from the analysis parameters through a preset rule; performs a binning operation on the sample data according to the sorting parameter to generate a sample derivative label; and generates a derivative label of the scorecard model according to the sample derivative label. With this embodiment, feature engineering for the scorecard model is no longer limited to univariate analysis of labels: strongly directional derivative labels can be generated, providing a large number of usable label bins for the subsequent scorecard model and thereby improving its accuracy.
Embodiment two
Fig. 2 is a schematic flow chart of a method for generating a derivative label for a scorecard model according to a second embodiment of the present invention, which further refines the first embodiment. As shown in Fig. 2, the method includes:
S210, acquiring sample data.
Specifically, the sample data includes positive sample data and negative sample data: sample data that meets a given condition is called positive sample data, and sample data that does not meet it is called negative sample data. For example, if the condition is an examination score of 60 points or more, sample data with a score of 60 or more is positive sample data, and sample data with a score below 60 is negative sample data.
A given scorecard model includes a large amount of positive and negative sample data. In this embodiment, however, the acquired sample data is positive sample data, and to ensure the accuracy of the generated derivative label, the amount of positive sample data acquired is usually much larger than the amount of sample data in the scorecard model.
S220, preprocessing the sample data.
Specifically, preprocessing handles missing values and duplicate values in the sample data and generally includes: removing unique attributes, handling missing values, encoding attributes, standardizing and regularizing the data, selecting features, principal component analysis, and the like. When a classification model is built, continuous variables need to be discretized; after discretization the model is more stable and the risk of overfitting is reduced. Discretizing a continuous variable is a binning operation. Put another way, a binning operation groups sample data that meets a given condition into the same class, and the sample data grouped into one class is called a sample bin.
In general, there are many binning methods, such as equal-width binning and equal-frequency binning. The binning operation in this embodiment is not limited to one method, and a method can be chosen flexibly according to the situation. For example, the sample data is divided into 10 classes by age, 2 classes by gender (male and female), and 4 classes by monthly phone fee (0-100 yuan, in steps of 25 yuan); together these are referred to as 22 sample bins, where age, gender, and monthly phone fee are existing labels of the sample data.
S230, obtaining analysis parameters of the sample data through the Apriori association algorithm.
Specifically, association analysis, also known as association mining, looks for frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction data, relational data, or other information carriers. In other words, association analysis discovers associations between different items in a transaction database. There are many ways to perform association analysis; preferably, this embodiment uses the Apriori association algorithm, which finds itemsets that occur frequently in the data. The analysis parameters are the data parameters required in subsequent steps and are calculated by the Apriori association algorithm. They include, but are not limited to: support, confidence, coverage, strength, lift, and utilization. In general, frequent itemsets are evaluated by three criteria: support, confidence, and lift.
The Apriori algorithm relies on the prior property of frequent itemsets, namely that every non-empty subset of a frequent itemset must also be frequent. Its goal is to find the largest frequent k-itemsets. It uses an iterative, level-wise search in which frequent k-itemsets are used to explore (k+1)-itemsets, where k is a positive integer. First, the database is scanned, the count of each item is accumulated, and the items that meet the minimum support are collected, giving the set of frequent 1-itemsets, denoted L1. L1 is then used to find the set of frequent 2-itemsets L2, L2 is used to find L3, and so on, until no further frequent k-itemsets can be found. Finding each Lk requires one complete scan of the database. The prior property is what allows the algorithm to compress the search space.
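The level-wise search described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration under stated assumptions: the function name `apriori`, the `min_support` parameter, and the toy transactions are hypothetical and not taken from the source.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets, level by level."""
    n = len(transactions)

    def frequent(candidates):
        # One full pass over the transactions per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    items = {item for t in transactions for item in t}
    level = frequent({frozenset([i]) for i in items})  # L1
    result = dict(level)
    k = 1
    while level:
        k += 1
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that contain k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)  # Lk
        result.update(level)
    return result


transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]
freq = apriori(transactions, min_support=0.6)
```

With a minimum support of 0.6, the three single items and the three pairs are frequent, while {a, b, c} (support 0.4) is rejected, so the search stops at k = 3.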
S240, obtaining the sorting parameter from the analysis parameters by the entropy weight method.
Specifically, the sorting parameter is a parameter by which the sample bins can be arranged in descending or ascending order. Preferably, this embodiment calculates the sorting parameter by the entropy weight method. The basic idea of the entropy weight method is to determine objective weights from the degree of variability of each index. In general, the smaller the information entropy of an index, the greater the variation of its values, the more information it provides, the greater the role it can play in a comprehensive evaluation, and the larger its weight. Conversely, the larger the information entropy of an index, the smaller the variation of its values, the less information it provides, the smaller its role in the comprehensive evaluation, and the smaller its weight.
For example, suppose the analysis parameters are the six parameters support, confidence, coverage, strength, lift, and utilization. The entropy weight calculation first standardizes the data of the six parameters, then computes the information entropy of each parameter, then determines the weight of each parameter from its information entropy, and finally calculates the sorting parameter from the weights. If the weights calculated by the entropy weight method for support, confidence, coverage, strength, lift, and utilization are 25%, 23%, 11%, 11%, 20%, and 10% respectively, then sorting parameter = support × 25% + confidence × 23% + coverage × 11% + strength × 11% + lift × 20% + utilization × 10%.
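The four steps of the entropy weight calculation can be sketched as below. This is one common formulation, offered under assumptions: min-max scaling is used for the standardization step, the weights are applied to the raw parameter values as in the formula above, and the function names and the three-column toy data are hypothetical. The sketch needs at least two rows.

```python
import math


def entropy_weights(rows):
    """rows: one list per sample bin, one column per analysis parameter."""
    n, m = len(rows), len(rows[0])
    divergence = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            divergence.append(0.0)  # a constant column carries no information
            continue
        norm = [(x - lo) / (hi - lo) for x in col]  # min-max standardization
        total = sum(norm)
        probs = [x / total for x in norm]
        # Information entropy, scaled to [0, 1]; 0 * log(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergence.append(1.0 - entropy)
    total_div = sum(divergence)
    if total_div == 0:
        raise ValueError("all columns are constant; weights are undefined")
    return [d / total_div for d in divergence]


def sorting_parameters(rows, weights):
    """Weighted sum of the analysis parameters for each sample bin."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


# Hypothetical values for three parameters over three sample bins.
rows = [
    [0.10, 0.50, 0.30],
    [0.20, 0.50, 0.30],
    [0.90, 0.60, 0.30],
]
weights = entropy_weights(rows)
scores = sorting_parameters(rows, weights)
ranking = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
```

The third column is constant, so it receives a weight of zero, and the third sample bin ranks first because it has the largest weighted value.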
S250, performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label.
Specifically, the sample bins are sorted in descending order of their sorting parameter, and the top-ranked sample bins are combined by a further binning operation, which yields a new derivative label, called the sample derivative label. Lower-ranked sample bins have smaller sorting parameters, which implies smaller analysis parameters and therefore weaker association in the sample data, so no binning is needed for that part of the data. For example, suppose the ranking shows that the sample data most probably corresponds to house salespersons. The sample derivative label is then "house salesperson": sample data that meets the condition is given the value 1 under this label, and the remaining sample data that does not meet the condition has the value 0.
S260, judging whether the preset scorecard model data falls within the range determined by the sample derivative label.
S270, if the preset scorecard model data falls within the range determined by the sample derivative label, taking the sample derivative label as the derivative label of the scorecard model.
Specifically, the preset scorecard model data refers to the positive and negative sample data included in the scorecard model. Following the way the sample derivative label was generated, the scorecard model data that meets the same conditions as the sample derivative label is combined in the same way, which generates the new derivative label of the scorecard model. For example, suppose the house salesperson label takes the value 1 under the following conditions: age 20-30, gender male, and monthly phone fee 75-100 yuan. Then every record in the scorecard model whose age label is 20-30, whose gender label is male, and whose monthly phone fee label is 75-100 yuan receives a new derivative label, house salesperson, with the value 1. For instance, a record in the scorecard model with an age label of 25, a gender label of male, and a monthly phone fee label of 85 yuan has a house salesperson label of 1.
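Steps S260 and S270 amount to a range check on each scorecard record. The sketch below illustrates this with the house salesperson example; the field names, the dictionary-based rule format, and the inclusive range bounds are assumptions made for the example.

```python
def matches(value, condition):
    """A (low, high) tuple is an inclusive range; anything else must be equal."""
    if isinstance(condition, tuple):
        low, high = condition
        return low <= value <= high
    return value == condition


def add_derivative_label(records, rule, label_name):
    """Set label_name to 1 for records inside every range of rule, else 0."""
    for record in records:
        hit = all(matches(record[field], cond) for field, cond in rule.items())
        record[label_name] = 1 if hit else 0
    return records


# Range determined by the sample derivative label (from the example above).
rule = {"age": (20, 30), "gender": "male", "monthly_fee": (75, 100)}

# Hypothetical scorecard model records.
records = [
    {"age": 25, "gender": "male", "monthly_fee": 85},
    {"age": 42, "gender": "male", "monthly_fee": 85},
    {"age": 25, "gender": "female", "monthly_fee": 90},
]
add_derivative_label(records, rule, "house_salesperson")
```

Only the first record satisfies all three conditions, so it alone receives the value 1 under the new label.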
S280, performing a logistic regression calculation on the derivative label to determine the performance of the scorecard model once the derivative label is included.
Specifically, a logistic regression calculation is performed on the scorecard model after the derivative label has been added, and the result shows whether the accuracy of the scorecard model has improved.
Further, the logistic regression calculation that determines the performance of the scorecard model with the derivative label included comprises steps S281 to S284 (not shown in the figure).
S281, performing a logistic regression calculation on the scorecard model to which the derivative label has been added.
Specifically, logistic regression is a generalized linear regression model. The calculation mainly includes: selecting a prediction function; constructing a loss function and the cost function J(θ), which represents the deviation between the predicted values and the actual classes over all training data; and minimizing J(θ) by gradient descent, with the descent process vectorized.
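A minimal sketch of these three parts follows, assuming the sigmoid as the prediction function and the mean cross-entropy as J(θ). The function names, learning rate, epoch count, and toy data are all hypothetical, and the loops are written out in plain Python rather than vectorized so that the example has no dependencies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """Prediction function: sigmoid of the linear score; theta[0] is the bias."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy cost J(theta)."""
    n, m = len(X), len(X[0])
    theta = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(theta, xi) - yi  # dJ/dz for one sample
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta


# One binary feature standing in for a derivative label (hypothetical data).
X = [[0], [0], [0], [1], [1], [1]]
y = [0, 0, 1, 1, 1, 0]
theta = train_logistic(X, y)
p0 = predict_proba(theta, [0])
p1 = predict_proba(theta, [1])
```

On this toy data the fitted probabilities approach the observed positive rates in each group, about 1/3 when the label is 0 and about 2/3 when it is 1.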
S282, generating a decile table according to the result of the logistic regression calculation.
Specifically, the decile table is a table used to verify the quality of the model. After training of the scorecard model with the derivative label added is finished, that is, after the logistic regression calculation is complete, each sample receives a feedback rate value. The samples are sorted by this value and divided into ten equal parts, which gives the decile table.
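The construction of a decile table can be sketched as follows. It is assumed here that the feedback rate value is the model's predicted score and that each row of the table reports the count and positive rate of its group; the function name and the toy data are hypothetical.

```python
def decile_table(scores, labels):
    """Sort by score (descending), cut into ten near-equal groups, summarize."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    table = []
    for d in range(10):
        group = ranked[n * d // 10: n * (d + 1) // 10]
        positives = sum(label for _, label in group)
        table.append({
            "decile": d + 1,
            "count": len(group),
            "positives": positives,
            "rate": positives / len(group) if group else 0.0,
        })
    return table


# Twenty hypothetical samples: higher scores go with the positive class.
scores = [i / 20 for i in range(20)]
labels = [1 if i >= 10 else 0 for i in range(20)]
table = decile_table(scores, labels)
```

For a model that ranks well, the positive rate is high in the first deciles and falls toward the last ones, as it does in this toy data.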
And S283, calculating preset evaluation parameters according to the ten-division table.
Specifically, an ROC curve (receiver operating characteristic curve) is generated from the ten-point table. The ROC curve is a graph formed by using a horizontal axis of False positive probability (FPR) and a vertical axis of True positive probability (TPR), and is drawn by using different results obtained by using different judgment standards under specific stimulation conditions. The preset evaluation parameters include AUC values (area of area under ROC Curve), and the AUC values are mainly used to test the capability of the scoring card model to correctly sort sample data. The stronger the resolution capability of the scoring card model is, the closer the ROC curve is to the upper left corner, the higher the AUC value is, and the stronger the risk distinguishing capability of the scoring card model is.
Further, a KS curve is generated from the decile table. The KS curve plots the cumulative TPR and FPR on the vertical axis against the ten parts of the decile table on the horizontal axis, and the KS value is the maximum gap between the two curves. The larger the KS value, the better the prediction accuracy of the scoring card model.
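The KS value can be read off a ten-part table by accumulating positives and negatives part by part and taking the largest gap between the cumulative TPR and FPR. The sketch below assumes this standard definition of the KS statistic; the per-part counts are made-up numbers.

```python
def ks_from_deciles(positives, negatives):
    total_pos = sum(positives)
    total_neg = sum(negatives)
    cum_pos = cum_neg = 0
    curve = []
    for p, n in zip(positives, negatives):
        cum_pos += p
        cum_neg += n
        tpr = cum_pos / total_pos  # cumulative true positive rate
        fpr = cum_neg / total_neg  # cumulative false positive rate
        curve.append((tpr, fpr, abs(tpr - fpr)))
    # KS value: the maximum vertical distance between the two cumulative curves
    ks = max(gap for _, _, gap in curve)
    return ks, curve

# Hypothetical counts per part, ordered from highest to lowest score
positives = [9, 8, 7, 5, 4, 3, 2, 1, 1, 0]
negatives = [1, 2, 3, 5, 6, 7, 8, 9, 9, 10]
ks, curve = ks_from_deciles(positives, negatives)
print(round(ks, 4))
```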
In the method for generating a derivative label of a scoring card model provided by this embodiment of the invention, the derivative label is generated through the Apriori association algorithm and the entropy weight method, which improves the accuracy of derivative label generation and strengthens the directivity of the derivative label. Verifying the performance of the scoring card model after the new derivative label is generated further improves the accuracy and stability of the model.
Example three
Fig. 3 is a schematic structural diagram of a scoring card model derivative tag generation apparatus according to a third embodiment of the present invention, which is applicable to generating derivative tags in the feature engineering of a scoring card model. The apparatus provided in this embodiment can implement the scoring card model derivative tag generation method provided in any embodiment of the present invention, and has the functions and structures corresponding to the executed method. As shown in fig. 3, the apparatus according to the third embodiment of the present invention includes: a sample data obtaining module 310, an analysis parameter obtaining module 320, a sorting parameter obtaining module 330, a sample derivative tag generation module 340, and a scoring card model derivative tag generating module 350.
The sample data obtaining module 310 is configured to obtain sample data;
the analysis parameter obtaining module 320 is configured to obtain analysis parameters of the sample data through correlation analysis;
the sorting parameter obtaining module 330 is configured to obtain a sorting parameter of the analysis parameter according to a preset rule;
the sample derivative tag generation module 340 is configured to perform binning operation on the sample data according to the sorting parameter, and generate a sample derivative tag;
the scoring card model derivative tag generating module 350 is configured to generate a derivative tag of the scoring card model according to the sample derivative tag.
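The way the five modules hand data to one another can be sketched as a small pipeline object. Everything here is illustrative: the class name `DerivativeLabelPipeline`, the field names and the stand-in lambdas are hypothetical, and only the order of the steps is taken from the description of modules 310 to 350.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DerivativeLabelPipeline:
    acquire_samples: Callable[[], Any]         # role of module 310
    analyze: Callable[[Any], Any]              # role of module 320
    rank: Callable[[Any], Any]                 # role of module 330
    bin_samples: Callable[[Any, Any], Any]     # role of module 340
    to_scorecard_labels: Callable[[Any], Any]  # role of module 350

    def run(self):
        samples = self.acquire_samples()
        analysis = self.analyze(samples)
        ranking = self.rank(analysis)
        sample_labels = self.bin_samples(samples, ranking)
        return self.to_scorecard_labels(sample_labels)

# Trivial stand-ins that only demonstrate the data flow between the steps
pipeline = DerivativeLabelPipeline(
    acquire_samples=lambda: [3, 1, 2],
    analyze=lambda samples: {"max": max(samples)},
    rank=lambda analysis: analysis["max"],
    bin_samples=lambda samples, ranking: ["high" if s == ranking else "low" for s in samples],
    to_scorecard_labels=lambda labels: sorted(set(labels)),
)
result = pipeline.run()
print(result)
```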
Further, the analysis parameter obtaining module 320 is specifically configured to:
obtain the analysis parameters of the sample data through an Apriori association algorithm.
Further, the analysis parameters include one or more of: support, confidence, coverage, strength, lift, and availability.
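Three of these parameters have widely used definitions for an association rule A → C, and the sketch below computes them by direct counting: support is the share of records containing both A and C, confidence is that count divided by the count of records containing A, and the third value is confidence divided by the share of records containing C (commonly called lift). The remaining parameters are left out because this passage does not define them. The function name, the item names and the records are hypothetical, and the full Apriori candidate search is not shown.

```python
def rule_metrics(transactions, antecedent, consequent):
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)          # records containing A
    count_c = sum(1 for t in transactions if c <= t)          # records containing C
    count_both = sum(1 for t in transactions if (a | c) <= t) # records containing A and C
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}

# Hypothetical binned sample records, one set of items per sample
transactions = [
    {"age<30", "income_low", "default"},
    {"age<30", "income_low", "default"},
    {"age<30", "income_high"},
    {"age>=30", "income_low"},
    {"age>=30", "income_high"},
]
metrics = rule_metrics(transactions, {"age<30", "income_low"}, {"default"})
print(metrics)
```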
Further, the sorting parameter obtaining module 330 is specifically configured to:
obtain the sorting parameter of the analysis parameters by an entropy weight method.
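A common textbook form of the entropy weight method is sketched below: each column of indicator values is turned into proportions, its normalized entropy is computed, and columns with lower entropy (more variation between rows) receive larger weights. Whether the embodiment uses exactly this normalization is not stated here, so the formulas, function names and numbers are assumptions.

```python
import math

def entropy_weights(matrix):
    # matrix: one row per candidate rule, one column per analysis parameter
    n, m = len(matrix), len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [v / total for v in column]
        # Normalized Shannon entropy of the column, in [0, 1]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]

def rank_rows(matrix):
    weights = entropy_weights(matrix)
    totals = [sum(row[j] for row in matrix) for j in range(len(weights))]
    # Composite score: weighted sum of each row's share of every column
    scores = [sum(w * row[j] / totals[j] for j, w in enumerate(weights)) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return weights, scores, order

# Hypothetical values: three rules, three parameters per rule
matrix = [
    [0.40, 0.90, 2.5],
    [0.30, 0.60, 1.2],
    [0.10, 0.80, 1.8],
]
weights, scores, order = rank_rows(matrix)
print(weights, order)
```

The resulting `order` lists the row indices from the highest to the lowest composite score, which is one way to obtain a sorting parameter for the subsequent binning step.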
Further, the scoring card model derivative tag generating module 350 is specifically configured to:
judge whether preset scoring card model data are in a range determined by the sample derivative label;
and, if the preset scoring card model data are in the range determined by the sample derivative label, take the sample derivative label as the derivative label of the scoring card model.
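One possible reading of this check is sketched below, assuming that a sample derivative label is a set of per-variable bins and that the preset scoring card model data are records of variable values. The data layout, the half-open bins and all names are hypothetical.

```python
def in_label_range(record, label_ranges):
    # True when every variable named by the label falls in its half-open bin [low, high)
    return all(
        name in record and low <= record[name] < high
        for name, (low, high) in label_ranges.items()
    )

def select_derivative_labels(records, sample_labels):
    # Keep a sample derivative label only if some preset record falls inside its range
    selected = []
    for label_name, ranges in sample_labels.items():
        if any(in_label_range(record, ranges) for record in records):
            selected.append(label_name)
    return selected

# Hypothetical preset scoring card model data and sample derivative labels
records = [{"age": 25, "income": 3000}, {"age": 41, "income": 9000}]
sample_labels = {
    "young_low_income": {"age": (18, 30), "income": (0, 5000)},
    "senior_high_income": {"age": (60, 120), "income": (20000, 10**9)},
}
selected = select_derivative_labels(records, sample_labels)
print(selected)
```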
Further, the apparatus also comprises:
a preprocessing module, configured to preprocess the sample data.
Further, the apparatus also comprises:
a performance detection module, configured to perform logistic regression calculation on the derived tags to determine the performance of the scoring card model after the derived tags are included.
The scoring card model derivative tag generation device provided by the third embodiment of the invention enables the scoring card model to be not limited to the univariate analysis of tags when performing characteristic engineering, can generate derivative tags with strong directivity, and provides a large number of available tag bins for subsequent scoring card models, thereby improving the accuracy of the scoring card model.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary computer device 412 suitable for use in implementing embodiments of the present invention. The computer device 412 shown in FIG. 4 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.
As shown in fig. 4, the computer device 412 is in the form of a general purpose computer device. Components of computer device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.
Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 412 and includes both volatile and nonvolatile media, removable and non-removable media.
The storage device 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache memory 432. The computer device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk such as a Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. The storage device 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 440 having a set (at least one) of program modules 442 may be stored, for instance, in the storage device 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may comprise an implementation of a network environment. The program modules 442 generally perform the functions and/or methodologies of the described embodiments of the invention.
The computer device 412 may also communicate with one or more external devices 414 (e.g., a keyboard, a pointing device, a display 424, etc.), with one or more devices that enable a user to interact with the computer device 412, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 412 to communicate with one or more other computer devices. Such communication may occur via input/output (I/O) interfaces 422. Also, the computer device 412 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 420. As shown in FIG. 4, the network adapter 420 communicates with the other modules of the computer device 412 via the bus 418. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 412, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Array of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processor 416 executes programs stored in the storage device 428 to perform various functional applications and data processing, for example, implement a scoring card model derivative tag generation method provided by any embodiment of the present invention, which may include:
acquiring sample data;
obtaining analysis parameters of the sample data through correlation analysis;
obtaining a sorting parameter of the analysis parameter through a preset rule;
performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label;
and generating a derivative label of the scoring card model according to the sample derivative label.
Example five
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for generating a derivative tag of a scorecard model according to any embodiment of the present invention, where the method may include:
acquiring sample data;
obtaining analysis parameters of the sample data through correlation analysis;
obtaining a sorting parameter of the analysis parameter through a preset rule;
performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label;
and generating a derivative label of the scoring card model according to the sample derivative label.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A scoring card model derivative label generation method is characterized by comprising the following steps:
acquiring sample data;
obtaining analysis parameters of the sample data through correlation analysis;
obtaining a sorting parameter of the analysis parameter through a preset rule;
performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative label;
and generating a derivative label of the scoring card model according to the sample derivative label.
2. The method of claim 1, wherein said obtaining analysis parameters for said sample data by correlation analysis comprises:
and obtaining the analysis parameters of the sample data through an Apriori association algorithm.
3. The method of claim 2, wherein the analysis parameters comprise: one or more of support, confidence, coverage, strength, lift, and availability.
4. The method of claim 1, wherein the obtaining the ranking parameter of the analysis parameter by a preset rule comprises:
and obtaining the sorting parameter of the analysis parameters by an entropy weight method.
5. The method of claim 1, wherein generating derivative tags for a scoring card model from the sample derivative tags comprises:
judging whether preset scoring card model data are in a range determined by the sample derivative label;
and if the preset scoring card model data is in the range determined by the sample derivative label, taking the sample derivative label as the derivative label of the scoring card model.
6. The method of claim 1, wherein prior to obtaining analysis parameters for the sample data by correlation analysis, further comprising:
and preprocessing the sample data.
7. The method of claim 1, wherein after generating derivative tags for a scoring card model from the sample derivative tags, further comprising:
performing logistic regression calculations on the derived tags to determine the performance of the scoring card model after including the derived tags.
8. A score card model derivative tag generation apparatus, comprising:
the sample data acquisition module is used for acquiring sample data;
the analysis parameter acquisition module is used for acquiring analysis parameters of the sample data through correlation analysis;
the sorting parameter acquisition module is used for acquiring a sorting parameter of the analysis parameters through a preset rule;
the sample derivative tag generation module is used for performing a binning operation on the sample data according to the sorting parameter to generate a sample derivative tag;
and the scoring card model derivative label generating module is used for generating a derivative label of the scoring card model according to the sample derivative label.
9. A computer device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the scorecard model derivative tag generation method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the scorecard model derivative tag generation method according to any one of claims 1 to 7.
CN201911096223.1A 2019-11-11 2019-11-11 Method, device, equipment and storage medium for generating rating card model derivative label Pending CN110879821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911096223.1A CN110879821A (en) 2019-11-11 2019-11-11 Method, device, equipment and storage medium for generating rating card model derivative label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911096223.1A CN110879821A (en) 2019-11-11 2019-11-11 Method, device, equipment and storage medium for generating rating card model derivative label

Publications (1)

Publication Number Publication Date
CN110879821A true CN110879821A (en) 2020-03-13

Family

ID=69729698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911096223.1A Pending CN110879821A (en) 2019-11-11 2019-11-11 Method, device, equipment and storage medium for generating rating card model derivative label

Country Status (1)

Country Link
CN (1) CN110879821A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232944A (en) * 2020-09-29 2021-01-15 中诚信征信有限公司 Scoring card creating method and device and electronic equipment
CN113923006A (en) * 2021-09-30 2022-01-11 北京淇瑀信息科技有限公司 Equipment data authentication method and device and electronic equipment
CN113923006B (en) * 2021-09-30 2024-02-02 北京淇瑀信息科技有限公司 Equipment data authentication method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination