CN109858633A - Characteristic information recognition method and system
- Publication number
- CN109858633A, CN201910132261.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The present invention provides a characteristic information recognition method and system. The method comprises: obtaining the discrete data unit and the continuous data unit corresponding to a first unique identifier of a to-be-predicted data group; inputting the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value of the preset discrete model, the first predicted value including the first unique identifier; merging the continuous data unit corresponding to the first unique identifier with the first predicted value and inputting the result into a preset continuous model to compute a second predicted value of the preset continuous model, the second predicted value including the first unique identifier; and generating the characteristic information of the to-be-predicted data group from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value. The application improves the efficiency with which a machine learning algorithm processes data that contains both discrete and continuous data, and thereby improves the efficiency of characteristic information recognition performed with that algorithm.
Description
Technical field
The present invention relates to the field of computer data processing technology, and in particular to a characteristic information recognition method and system.
Background art
At present, the machine learning field has two main classes of algorithms: algorithms suited to discrete data and algorithms suited to continuous data. Both classes have shortcomings:
1. Machine learning algorithms suited to discrete data (for example, logistic regression) sometimes require the continuous data in the sample data to be discretized in advance (sample data may contain both discrete and continuous data), and the choice of discretization algorithm (bucketing, segmentation, log transformation, and so on) affects the final evaluation result. Processing with these algorithms is therefore complex: choosing a discretization algorithm takes repeated experiments before a good one can be identified.
2. Machine learning algorithms suited to continuous data (for example, GBDT) must, during model training or prediction, handle discrete data with logical yes/no tests in the GBDT decision trees. When a discrete feature has many categories (for example, an occupation feature with values such as teacher, doctor, engineer, farmer, worker, director, and actor), the GBDT decision trees become very large, which greatly reduces the processing efficiency of these algorithms.
Therefore, for data that contains both discrete and continuous data, existing machine learning algorithms are complex and inefficient, which in turn makes characteristic information recognition based on them inefficient.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a characteristic information recognition method and system that effectively improve the efficiency of characteristic information recognition performed with machine learning algorithms.
To achieve the above object, the present invention provides a characteristic information recognition method, comprising:
Obtaining the discrete data unit and the continuous data unit corresponding to a first unique identifier of a to-be-predicted data group;
Inputting the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value of the preset discrete model, the first predicted value including the first unique identifier;
Merging the continuous data unit corresponding to the first unique identifier with the first predicted value and inputting the result into a preset continuous model to compute a second predicted value of the preset continuous model, the second predicted value including the first unique identifier;
Generating the characteristic information of the to-be-predicted data group from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value.
The present invention also provides a characteristic information recognition system, comprising:
An acquiring unit, configured to obtain the discrete data unit and the continuous data unit corresponding to a first unique identifier of a to-be-predicted data group;
A first generation unit, configured to input the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value of the preset discrete model, the first predicted value including the first unique identifier;
A second generation unit, configured to merge the continuous data unit corresponding to the first unique identifier with the first predicted value and input the result into a preset continuous model to compute a second predicted value of the preset continuous model, the second predicted value including the first unique identifier;
A third generation unit, configured to generate the characteristic information of the to-be-predicted data group from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value.
The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the characteristic information recognition method when executing the program.
The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the characteristic information recognition method.
In the characteristic information recognition method and system provided by the present invention, the discrete data unit and the continuous data unit corresponding to a first unique identifier of a to-be-predicted data group are obtained; the discrete data unit corresponding to the first unique identifier is input into a preset discrete model to compute a first predicted value of the preset discrete model, the first predicted value including the first unique identifier; the continuous data unit corresponding to the first unique identifier is merged with the first predicted value and the result is input into a preset continuous model to compute a second predicted value of the preset continuous model, the second predicted value including the first unique identifier; and the characteristic information of the to-be-predicted data group is generated from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value. The application improves the efficiency with which a machine learning algorithm processes data that contains both discrete and continuous data, and thereby improves the efficiency of characteristic information recognition performed with that algorithm.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a characteristic information recognition method of the application;
Fig. 2 is a flowchart of the characteristic information recognition method in one embodiment of the application;
Fig. 3 is a flowchart of step S201 in one embodiment of the application;
Fig. 4 is a flowchart of step S205 in one embodiment of the application;
Fig. 5 is a flowchart of step S207 in one embodiment of the application;
Fig. 6 is a flowchart of step S209 in one embodiment of the application;
Fig. 7 is a flowchart of step S211 in one embodiment of the application;
Fig. 8 is a flowchart of step S212 in one embodiment of the application;
Fig. 9 is a flowchart of the fraud characteristic information recognition method in another embodiment of the application;
Fig. 10 is a schematic diagram of the generation of the discrete training model M-S1 and each first training predicted value corresponding to logistic regression algorithm S1 in one embodiment of the application;
Fig. 11 is a schematic diagram of the generation of the merged training data unit T13-S1-i corresponding to each unique identifier K1-i for logistic regression algorithm S1 in one embodiment of the application;
Fig. 12 is a schematic diagram of the generation of the continuous training model M-L1 in one embodiment of the application;
Fig. 13 is a structural schematic diagram of the characteristic information recognition model Zj in one embodiment of the application;
Fig. 14 is a schematic diagram of the generation of each first verification predicted value Y1-M1-i corresponding to the discrete training model M-S1 in one embodiment of the application;
Fig. 15 is a schematic diagram of the generation of the merged verification data unit T23-S1-i for the discrete training model M-S1 in one embodiment of the application;
Fig. 16 is a schematic diagram of the generation of each second verification predicted value Y2-M1-i for the continuous training model M-L1 in one embodiment of the application;
Fig. 17 is a schematic diagram of the generation of each difference V1-i for the continuous training model M-L1 in one embodiment of the application;
Fig. 18 is a schematic diagram of the generation of the first predicted value C1-i corresponding to the discrete model M-S2 in one embodiment of the application;
Fig. 19 is a schematic diagram of the generation of the second predicted value C2-i corresponding to the continuous model M-L2 in one embodiment of the application;
Fig. 20 is a structural schematic diagram of a characteristic information recognition system of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second", and so on used herein do not denote any particular order or rank and do not limit the present invention; they only distinguish elements or operations described with the same technical term.
The terms "comprising", "including", "having", "containing", and the like used herein are open-ended, meaning "including but not limited to".
The term "and/or" used herein covers any one of the listed items and all combinations of them.
In view of the shortcomings of the prior art, the present invention provides a characteristic information recognition method whose flowchart is shown in Fig. 1. The method comprises:
S101: Obtain the discrete data unit and the continuous data unit corresponding to the first unique identifier of a to-be-predicted data group.
There may be multiple to-be-predicted data groups; the application is not limited in this respect.
In specific implementation, step S101 is executed as follows:
First, a first unique identifier is set for each obtained to-be-predicted data group. Each to-be-predicted data group includes several items of first feature data.
Second, each to-be-predicted data group is split, according to the data type of each item of its first feature data, into the discrete data unit and the continuous data unit corresponding to that data group. Each discrete data unit includes a first unique identifier and the discrete-type first feature data in the to-be-predicted data group corresponding to that identifier. The continuous data unit corresponding to each discrete data unit includes the same first unique identifier and the continuous-type first feature data in the to-be-predicted data group corresponding to that identifier.
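The split described in step S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_data_group`, the `uid` key, the sample feature names, and the rule that strings and booleans count as discrete while numbers count as continuous are all assumptions made for the example.

```python
from typing import Any, Dict, Tuple


def split_data_group(uid: str, features: Dict[str, Any]) -> Tuple[dict, dict]:
    """Split one data group into a discrete unit and a continuous unit.

    Assumption for this sketch: str/bool values are discrete features,
    int/float values are continuous features.
    """
    discrete = {"uid": uid}    # both units carry the same unique identifier
    continuous = {"uid": uid}
    for name, value in features.items():
        # bool is checked before int/float because bool is a subclass of int.
        if isinstance(value, (str, bool)):
            discrete[name] = value
        elif isinstance(value, (int, float)):
            continuous[name] = value
        else:
            raise TypeError(f"unsupported feature type for {name!r}")
    return discrete, continuous


# Hypothetical data group; the field names are illustrative only.
group = {"occupation": "teacher", "gender": "F", "age": 41, "income": 5200.5}
discrete_unit, continuous_unit = split_data_group("K-1", group)
```

Because both units keep the identifier, they can be matched again later when the first predicted value is merged into the continuous unit.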
S102: Input the discrete data unit corresponding to the first unique identifier into the preset discrete model to compute the first predicted value of the preset discrete model. The first predicted value includes the first unique identifier.
S103: Merge the continuous data unit corresponding to the first unique identifier with the first predicted value, and input the result into the preset continuous model to compute the second predicted value of the preset continuous model. The second predicted value includes the first unique identifier.
S104: Generate the characteristic information of the to-be-predicted data group from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value.
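Steps S102 to S104 can be sketched as a two-stage prediction. The code below is only an illustration under stated assumptions: the two toy models are hypothetical stand-ins for a trained discrete model and a trained continuous model, and the names `predict_two_stage`, `first_pred`, and `score` do not come from the patent. Attaching the second predicted value to the data group as a `score` field is one possible reading of S104.

```python
from typing import Callable, Dict, List

Unit = Dict[str, object]


def predict_two_stage(
    groups: Dict[str, Unit],
    discrete_units: List[Unit],
    continuous_units: List[Unit],
    discrete_model: Callable[[Unit], float],
    continuous_model: Callable[[Unit], float],
) -> Dict[str, Unit]:
    # S102: first predicted value, keyed by the unique identifier.
    first = {u["uid"]: discrete_model(u) for u in discrete_units}
    result = {}
    for unit in continuous_units:
        uid = unit["uid"]
        # S103: merge the continuous unit with the first predicted value.
        merged = dict(unit, first_pred=first[uid])
        second = continuous_model(merged)
        # S104: combine the original data group with the second predicted value.
        result[uid] = dict(groups[uid], score=second)
    return result


# Hypothetical stand-in models; real ones would be trained beforehand.
def toy_discrete(unit: Unit) -> float:
    return 0.9 if unit["occupation"] == "director" else 0.1


def toy_continuous(merged: Unit) -> float:
    bonus = 0.5 if merged["amount"] > 1000 else 0.0
    return 0.5 * merged["first_pred"] + bonus


groups = {
    "K-1": {"occupation": "director", "amount": 2000.0},
    "K-2": {"occupation": "teacher", "amount": 50.0},
}
discrete_units = [{"uid": k, "occupation": g["occupation"]} for k, g in groups.items()]
continuous_units = [{"uid": k, "amount": g["amount"]} for k, g in groups.items()]
info = predict_two_stage(groups, discrete_units, continuous_units,
                         toy_discrete, toy_continuous)
```

The continuous model only ever sees one extra numeric column in place of the raw categorical features, which is the point of feeding the first predicted value forward.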
As the flow in Fig. 1 shows, the application obtains the discrete data unit and the continuous data unit corresponding to the first unique identifier of a to-be-predicted data group; inputs the discrete data unit corresponding to the first unique identifier into the preset discrete model to compute the first predicted value of the preset discrete model; merges the continuous data unit corresponding to the first unique identifier with the first predicted value and inputs the result into the preset continuous model to compute the second predicted value of the preset continuous model; and generates the characteristic information of the to-be-predicted data group from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value. As a result, the application reduces the probability of overfitting caused by discrete data, keeps the machine learning algorithm simple, and makes it efficient, which improves the efficiency of characteristic information recognition performed with the machine learning algorithm.
To help those skilled in the art better understand the present invention, a more detailed embodiment is given below. As shown in Fig. 2, in the characteristic information recognition method provided by this embodiment, a training process and a verification process are executed before the prediction process, and the prediction process is then executed with the optimal characteristic information recognition model output by the verification process. The method comprises the following steps:
Step 1: prediction process
S201: Obtain the discrete data unit and the continuous data unit corresponding to the first unique identifier of a to-be-predicted data group.
There may be multiple to-be-predicted data groups; the application is not limited in this respect.
In specific implementation, as shown in Fig. 3, step S201 is executed as follows:
S301: Set a first unique identifier for each obtained to-be-predicted data group. Each to-be-predicted data group includes several items of first feature data.
S302: Split each to-be-predicted data group, according to the data type of each item of its first feature data, into the discrete data unit and the continuous data unit corresponding to that data group.
Each discrete data unit includes a first unique identifier and the discrete-type first feature data in the to-be-predicted data group corresponding to that identifier. The continuous data unit corresponding to each discrete data unit includes the same first unique identifier and the continuous-type first feature data in the to-be-predicted data group corresponding to that identifier.
S202: Input the discrete data unit corresponding to each first unique identifier into the preset discrete model to compute the first predicted value corresponding to each first unique identifier in the preset discrete model. The first predicted value includes the first unique identifier.
In specific implementation, the preset discrete model uses an existing discrete algorithm such as logistic regression; the application is not limited to this.
S203: Merge the continuous data unit corresponding to the first unique identifier with the first predicted value, and input the result into the preset continuous model to compute the second predicted value of the preset continuous model. The second predicted value includes the first unique identifier.
In specific implementation, the preset continuous model uses any existing continuous algorithm, such as GBDT; the application is not limited to this.
S204: Generate the characteristic information of the to-be-predicted data group from the to-be-predicted data group corresponding to the first unique identifier and the second predicted value.
Step 2: training process
S205: Obtain the discrete training data unit and the continuous training data unit corresponding to the second unique identifier of a training data group.
In specific implementation, there are multiple training data groups, and each training data group includes one discrete training data unit and one continuous training data unit. The discrete training data unit and the continuous training data unit are in one-to-one correspondence through the second unique identifier of their training data group.
As shown in Fig. 4, step S205 is executed as follows:
S401: Set a second unique identifier and a first feature annotation for each obtained training data group. Each training data group includes several items of second feature data. The second unique identifier and the first feature annotation are in one-to-one correspondence.
S402: Split each training data group, according to the data type of each item of its second feature data, into the discrete training data unit and the continuous training data unit corresponding to that training data group.
Each discrete training data unit includes a second unique identifier, the discrete-type second feature data in the training data group corresponding to that identifier, and the first feature annotation of that training data group. The continuous training data unit corresponding to each discrete training data unit includes the same second unique identifier, the continuous-type second feature data in the training data group corresponding to that identifier, and the first feature annotation of that training data group.
The number of discrete training data units equals the number of continuous training data units, and each discrete training data unit, each continuous training data unit, and each training data group are in one-to-one correspondence through the second unique identifier of the training data group.
S206: Input the discrete training data units corresponding to the second unique identifiers into each preset discrete algorithm to compute the discrete training model corresponding to each preset discrete algorithm and the first training predicted values corresponding to each preset discrete algorithm. Each first training predicted value includes the second unique identifier.
The preset discrete algorithms are multiple existing discrete algorithms, such as logistic regression, naive Bayes, and decision trees; the application is not limited to these.
In specific implementation, the discrete training data unit corresponding to each second unique identifier is input into the logistic regression algorithm to compute the discrete training model corresponding to logistic regression and the first training predicted values corresponding to logistic regression.
The discrete training data unit corresponding to each second unique identifier is input into the naive Bayes algorithm to compute the discrete training model corresponding to naive Bayes and the first training predicted values corresponding to naive Bayes. Likewise, the discrete training data unit corresponding to each second unique identifier is input into each of the other preset discrete algorithms to compute the discrete training model and the first training predicted values corresponding to that algorithm.
For each discrete algorithm, the first training predicted values correspond one-to-one with the second unique identifiers, and each first training predicted value belongs to exactly one discrete algorithm.
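The loop over several preset discrete algorithms in S206 can be sketched as below. The two "algorithms" here are deliberately simple stand-ins written for this example (a per-category label rate and a global label rate); they are not logistic regression or naive Bayes, and the names `fit_category_rate`, `fit_prior`, `S1`, `S2`, and the sample identifiers are assumptions, not part of the source.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Unit = Dict[str, object]
Model = Callable[[Unit], float]


def fit_category_rate(units: List[Unit], feature: str) -> Model:
    """Stand-in discrete algorithm: mean label per category value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u in units:
        sums[u[feature]] += u["label"]
        counts[u[feature]] += 1
    prior = sum(sums.values()) / len(units)

    def model(unit: Unit) -> float:
        key = unit[feature]
        # Fall back to the global rate for categories not seen in training.
        return sums[key] / counts[key] if key in counts else prior

    return model


def fit_prior(units: List[Unit]) -> Model:
    """Stand-in discrete algorithm: the global label rate for every input."""
    prior = sum(u["label"] for u in units) / len(units)
    return lambda unit: prior


# Each preset discrete algorithm maps training units to a trained model.
algorithms = {
    "S1": lambda units: fit_category_rate(units, "occupation"),
    "S2": fit_prior,
}

train = [
    {"uid": "K1-1", "occupation": "teacher", "label": 0},
    {"uid": "K1-2", "occupation": "director", "label": 1},
    {"uid": "K1-3", "occupation": "teacher", "label": 0},
    {"uid": "K1-4", "occupation": "director", "label": 0},
]

# One discrete training model per algorithm ...
models = {name: fit(train) for name, fit in algorithms.items()}
# ... and, per algorithm, one first training predicted value per identifier.
first_train_preds = {
    name: {u["uid"]: model(u) for u in train} for name, model in models.items()
}
```

Keying the predictions first by algorithm and then by identifier mirrors the two one-to-one relationships stated in the text.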
S207: For each preset discrete algorithm, merge the continuous training data unit corresponding to each second unique identifier with the first training predicted value that the algorithm produced for that identifier, and input the result into a preset continuous algorithm to compute the continuous training model corresponding to the discrete training model of that preset discrete algorithm.
The preset continuous algorithms are multiple known continuous algorithms, such as GBDT, linear regression, and K-means; the application is not limited to these.
As shown in Fig. 5, step S207 specifically comprises the following steps:
S501: For each preset discrete algorithm, merge the continuous training data unit corresponding to each second unique identifier with the first training predicted value corresponding to that identifier, generating the merged training data unit corresponding to each second unique identifier for that preset discrete algorithm.
The merged training data units correspond one-to-one with the preset discrete algorithms and, within each preset discrete algorithm, one-to-one with the second unique identifiers.
In specific implementation, the continuous training data unit corresponding to each second unique identifier is merged with the first training predicted value corresponding to that identifier for logistic regression, generating the merged training data unit corresponding to each second unique identifier for logistic regression.
The continuous training data unit corresponding to each second unique identifier is merged with the first training predicted value corresponding to that identifier for naive Bayes, generating the merged training data unit corresponding to each second unique identifier for naive Bayes. Likewise, the continuous training data unit corresponding to each second unique identifier is merged with the first training predicted value corresponding to that identifier for each of the other preset discrete algorithms, generating the merged training data unit corresponding to each second unique identifier for that algorithm.
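The merge in S501 is essentially a join on the second unique identifier, repeated once per discrete algorithm. A minimal sketch, assuming units are dictionaries with a `uid` key and the first training predicted values are stored per algorithm; the function name, the `first_pred` column, and the sample values are illustrative assumptions.

```python
from typing import Dict, List

Unit = Dict[str, object]


def merge_training_units(
    continuous_units: List[Unit],
    first_preds: Dict[str, Dict[str, float]],
) -> Dict[str, List[Unit]]:
    """Join continuous training units with first training predicted values."""
    by_uid = {u["uid"]: u for u in continuous_units}
    merged = {}
    for algo, preds in first_preds.items():
        # Every identifier must appear on both sides of the join.
        if set(preds) != set(by_uid):
            raise KeyError(f"identifier mismatch for algorithm {algo!r}")
        merged[algo] = [
            dict(by_uid[uid], first_pred=preds[uid]) for uid in by_uid
        ]
    return merged


# Hypothetical continuous training units and per-algorithm predictions.
continuous_units = [
    {"uid": "K1-1", "amount": 120.0, "label": 0},
    {"uid": "K1-2", "amount": 9800.0, "label": 1},
]
first_preds = {
    "S1": {"K1-1": 0.2, "K1-2": 0.7},
    "S2": {"K1-1": 0.4, "K1-2": 0.6},
}
merged = merge_training_units(continuous_units, first_preds)
```

Each algorithm gets its own list of merged units, so a separate continuous training model can be trained for each discrete training model.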
S502: Input each merged training data unit into a preset continuous algorithm to compute the continuous training model corresponding to the discrete training model of each preset discrete algorithm. There may be one preset continuous algorithm or several; the application is not limited in this respect.
The preset continuous algorithms include GBDT, linear regression, K-means, and so on; the application is not limited to these.
In specific implementation, the merged training data units corresponding to the second unique identifiers for logistic regression are input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of logistic regression.
The merged training data units of different preset discrete algorithms can be input into the same preset continuous algorithm (Example 1) or into different preset continuous algorithms (Example 2).
Example 1: The merged training data units corresponding to the second unique identifiers for logistic regression are input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of logistic regression. Likewise, the merged training data units corresponding to the second unique identifiers for each of the other preset discrete algorithms are input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of that algorithm.
Example 2: The merged training data units corresponding to the second unique identifiers for logistic regression are input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of logistic regression.
The merged training data units corresponding to the second unique identifiers for naive Bayes are input into the linear regression algorithm to compute the continuous training model corresponding to the discrete training model of naive Bayes.
The merged training data units corresponding to the second unique identifiers for each of the other preset discrete algorithms are input into any one of GBDT, linear regression, or K-means to compute the continuous training model corresponding to the discrete training model of that algorithm.
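The difference between Example 1 and Example 2 is only which continuous algorithm each discrete algorithm is paired with. One way to express that choice is a small pairing table, sketched below. The helper `resolve_pairing`, the lowercase algorithm labels, and the use of GBDT as the default are assumptions made for this illustration; the labels are plain strings and do not refer to any library API.

```python
from typing import Dict, Iterable, Optional


def resolve_pairing(
    discrete_algorithms: Iterable[str],
    overrides: Optional[Dict[str, str]] = None,
    default: str = "gbdt",
) -> Dict[str, str]:
    """Map each discrete algorithm to the continuous algorithm trained on its merged units."""
    overrides = overrides or {}
    return {name: overrides.get(name, default) for name in discrete_algorithms}


discrete_algorithms = ["logistic_regression", "naive_bayes", "decision_tree"]

# Example 1: every discrete algorithm feeds the same continuous algorithm.
example_1 = resolve_pairing(discrete_algorithms)

# Example 2: different discrete algorithms feed different continuous algorithms.
example_2 = resolve_pairing(
    discrete_algorithms,
    {"naive_bayes": "linear_regression", "decision_tree": "k_means"},
)
```

Keeping the pairing as data rather than code makes it easy to try several combinations during training and let the verification process pick among the resulting models.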
S208: Combine each discrete training model with its corresponding continuous training model to generate the characteristic information recognition models.
In specific implementation, there are multiple characteristic information recognition models, and each one includes a discrete training model and the continuous training model corresponding to that discrete training model.
The discrete training model corresponding to logistic regression is combined with the continuous training model corresponding to that discrete training model to generate one characteristic information recognition model.
The discrete training model corresponding to naive Bayes is combined with the continuous training model corresponding to that discrete training model to generate another characteristic information recognition model.
Likewise, the discrete training model corresponding to each of the other preset discrete algorithms is combined with the continuous training model corresponding to that discrete training model to generate a further characteristic information recognition model.
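A characteristic information recognition model, as described in S208, is a pair of models that are always used together. A minimal sketch of such a pairing follows, assuming both models are plain callables; the class name `RecognitionModel`, the `Z-` naming, the `first_pred` column, and the toy lambdas are hypothetical and stand in for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Unit = Dict[str, object]


@dataclass(frozen=True)
class RecognitionModel:
    """One discrete training model bundled with its continuous training model."""
    name: str
    discrete_model: Callable[[Unit], float]
    continuous_model: Callable[[Unit], float]

    def predict(self, discrete_unit: Unit, continuous_unit: Unit) -> float:
        if discrete_unit["uid"] != continuous_unit["uid"]:
            raise ValueError("units belong to different data groups")
        # Stage 1: discrete model; stage 2: continuous model on the merged unit.
        first = self.discrete_model(discrete_unit)
        return self.continuous_model(dict(continuous_unit, first_pred=first))


# Hypothetical trained models, keyed by the discrete algorithm they came from.
discrete_models = {"S1": lambda u: 0.5, "S2": lambda u: 0.25}
continuous_models = {
    "S1": lambda m: m["first_pred"] * 2,
    "S2": lambda m: m["first_pred"] + m["amount"],
}

models = [
    RecognitionModel(f"Z-{key}", discrete_models[key], continuous_models[key])
    for key in discrete_models
]
score = models[1].predict({"uid": "K-1"}, {"uid": "K-1", "amount": 1.0})
```

Bundling the pair in one object prevents a continuous model from being applied to the output of a discrete model it was not trained with.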
Step 3: verification process
S209: Obtain the third unique identifier and the second feature annotation of a verification data group, and the discrete verification data unit and the continuous verification data unit corresponding to the third unique identifier.
There are multiple verification data groups, and each verification data group includes one discrete verification data unit and one continuous verification data unit. The discrete verification data unit and the continuous verification data unit are in one-to-one correspondence through the third unique identifier of their verification data group; the second feature annotation and the third unique identifier are also in one-to-one correspondence.
In specific implementation, as shown in Fig. 6, step S209 is executed as follows:
S601: Set a third unique identifier and a second feature annotation for each obtained verification data group. Each verification data group includes several items of third feature data.
S602: Split each verification data group, according to the data type of each item of its third feature data, into the discrete verification data unit and the continuous verification data unit of that verification data group.
Each discrete verification data unit includes a third unique identifier, the discrete-type third feature data in the verification data group corresponding to that identifier, and the second feature annotation of that verification data group. The continuous verification data unit corresponding to each discrete verification data unit includes the same third unique identifier, the continuous-type third feature data in the verification data group corresponding to that identifier, and the second feature annotation of that verification data group.
S210: Input the discrete verification data unit corresponding to the third unique identifier into each discrete training model to compute the first verification predicted value corresponding to each discrete training model. The first verification predicted value includes the third unique identifier.
S211: Merge the continuous verification data unit corresponding to the third unique identifier with each first verification predicted value, and input the result into the continuous training model corresponding to the discrete training model that produced that first verification predicted value, to compute the second verification predicted value corresponding to each continuous training model. The second verification predicted value includes the third unique identifier.
In a specific implementation, as shown in Fig. 7, step S211 includes the following steps:
S701: for each discrete training model, merge the continuous verification data unit corresponding to each third unique identification with the first verification predicted value that the discrete training model produced for the same third unique identification, generating the merged verification data unit corresponding to each third unique identification of that discrete training model.
S702: input the merged verification data units of each discrete training model into the continuous training model corresponding to that discrete training model, and calculate the second verification predicted value corresponding to each third unique identification for each continuous training model.
S212: generate the optimal characteristic information identification model according to the second feature markup information corresponding to each third unique identification and the second verification predicted value corresponding to each third unique identification in each continuous training model.
The optimal characteristic information identification model includes the preset discrete model and the preset continuous model.
In a specific implementation, as shown in Fig. 8, step S212 is executed as follows; the application is not limited to this:
S801: for each continuous training model, take the difference between the second feature markup information corresponding to each third unique identification and the second verification predicted value corresponding to the same third unique identification, generating the difference corresponding to each third unique identification in that continuous training model.
S802: sum the differences corresponding to all third unique identifications in each continuous training model, generating the validation value of the characteristic information identification model corresponding to that continuous training model.
S803: sort the validation values of the characteristic information identification models, and take the characteristic information identification model with the smallest validation value as the optimal characteristic information identification model.
In one embodiment, the characteristic information includes fraud characteristic information, potential major-customer characteristic information and the like; the application is not limited to these.
To help those skilled in the art better understand the present invention, a more detailed scenario, Embodiment One, is set out below.
As shown in Fig. 9, a fraud characteristic information recognition method provided by an embodiment of the present invention includes the following steps:
Step 1: training process
The unique identification Ki is a positive integer, and the feature markup information Bi takes the value 1 or 0, where i is a positive integer greater than or equal to 1. A feature markup information Bi of 1 indicates that the training data group has the fraud feature; a value of 0 indicates that it does not. A training data group with the fraud feature is identified as a fraud client, and a training data group without the fraud feature is identified as a non-fraud client.
S901: set one unique identification K1 and one first feature markup information B1 for each acquired training data group H. The unique identifications K1 and the first feature markup information B1 are in one-to-one correspondence.
The unique identification K1 is a positive integer, and the first feature markup information B1 takes the value 1 or 0 (written K1-i and B1-i for the training data group Hi, where i is a positive integer greater than or equal to 1). A first feature markup information B1 of 1 indicates that the training data group H has the fraud feature; a value of 0 indicates that it does not. A training data group with the fraud feature is identified as a fraud client, and a training data group without the fraud feature is identified as a non-fraud client.
Each training data group H includes several characteristic data T1, as shown in Table 1.
Table 1
The training data group H includes characteristic data T1 such as age, city, occupation and income; the application is not limited to these. T1-city and T1-occupation are set as discrete-type characteristic data, and T1-age and T1-income as continuous-type characteristic data. There are multiple training data groups H, for example H1, H2, ..., H99999999; the application is not limited to this.
S902: split each training data group H according to the data type of each of its characteristic data T1, generating the discrete training data unit T11 and the continuous training data unit T12 corresponding to that training data group H.
Each discrete training data unit T11-i includes: the unique identification K1-i, the discrete-type characteristic data T1 in the training data group Hi corresponding to K1-i, and the first feature markup information B1-i of that training data group Hi, where i is a positive integer greater than or equal to 1.
Specifically, the discrete training data unit T11-i includes the characteristic data T1-city, the characteristic data T1-occupation, the unique identification K1-i and the first feature markup information B1-i. As shown in Table 2, the discrete training data units T11-i corresponding to the training data groups Hi are T11-1, T11-2, ..., T11-99999999, where i = 1, 2, ..., 99999999.
Table 2
Training data group H | Discrete training data unit T11 | Unique identification K1 | T1-city | T1-occupation | First feature markup information B1 |
H1 | T11-1 | 00000001 | 021 (Shanghai) | 0 (teacher) | 0 |
H2 | T11-2 | 00000002 | 010 (Beijing) | 1 (doctor) | 0 |
…… | …… | …… | …… | …… | …… |
H99999999 | T11-99999999 | 99999999 | 020 (Guangzhou) | 5 (unemployed) | 1 |
The continuous training data unit T12-i corresponding to each discrete training data unit T11-i includes: the unique identification K1-i, the continuous-type characteristic data T1 in the training data group Hi corresponding to K1-i, and the first feature markup information B1-i of that training data group Hi, where i is a positive integer greater than or equal to 1, as shown in Table 3.
Table 3
Training data group H | Continuous training data unit T12 | Unique identification K1 | T1-age | T1-income | First feature markup information B1 |
H1 | T12-1 | 00000001 | 20 | 100000 | 0 |
H2 | T12-2 | 00000002 | 60 | 150000 | 0 |
…… | …… | …… | …… | …… | …… |
H99999999 | T12-99999999 | 99999999 | 30 | 2000 | 1 |
Specifically, the continuous training data unit T12-i includes the characteristic data T1-age, the characteristic data T1-income, the unique identification K1-i and the first feature markup information B1-i, as shown in Table 3. The continuous training data units T12-i corresponding to the training data groups Hi are T12-1, T12-2, ..., T12-99999999, where i = 1, 2, ..., 99999999.
The number of discrete training data units T11-i equals the number of continuous training data units T12-i, and each discrete training data unit T11-i, each continuous training data unit T12-i and each training data group Hi are placed in one-to-one correspondence through the unique identification K1-i of the training data group Hi.
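The split in S902 can be illustrated with a minimal Python sketch. The dictionary keys (`id`, `label`, `city`, `occupation`, `age`, `income`) and the function name `split_group` are assumptions made for illustration; the text only specifies which features are discrete and which are continuous, and that both units carry K1-i and B1-i.

```python
# Hypothetical column typing: the text only states that city and occupation
# are discrete and that age and income are continuous.
DISCRETE_FEATURES = ("city", "occupation")
CONTINUOUS_FEATURES = ("age", "income")


def split_group(group):
    """Split one training data group Hi into the units T11-i and T12-i.

    Both units keep the unique identification K1-i ("id") and the first
    feature markup information B1-i ("label") so they can be matched later.
    """
    shared = {"id": group["id"], "label": group["label"]}
    discrete_unit = {**shared, **{f: group[f] for f in DISCRETE_FEATURES}}
    continuous_unit = {**shared, **{f: group[f] for f in CONTINUOUS_FEATURES}}
    return discrete_unit, continuous_unit


# Values taken from the H1 rows of Tables 2 and 3.
h1 = {"id": 1, "city": "021", "occupation": 0,
      "age": 20, "income": 100000, "label": 0}
t11_1, t12_1 = split_group(h1)
```

Because both halves carry the same identifier, they can be processed by separate algorithms and rejoined without relying on row order.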
S903: input the discrete training data units T11-i into each preset discrete algorithm Sj, and calculate the discrete training model M-Sj corresponding to each preset discrete algorithm Sj and the first training predicted values X-Sj-i corresponding to each preset discrete algorithm Sj, where i and j are positive integers greater than or equal to 1. The first training predicted value X-Sj-i includes the unique identification K1-i and the predicted value Xi, where i = 1, 2, 3, ..., 99999999.
In a specific implementation, the preset discrete algorithms Sj include the logistic regression algorithm S1, the naive Bayes algorithm S2, the decision tree algorithm S3 and the like, where j = 1, 2, 3, ...; the application is not limited to these.
As shown in Fig. 10, the T11-1 corresponding to the unique identification K1-1, the T11-2 corresponding to the unique identification K1-2, ..., and the T11-99999999 corresponding to the unique identification K1-99999999 are all input into the logistic regression algorithm S1, which calculates the discrete training model M-S1 corresponding to the logistic regression algorithm S1 and the first training predicted values X-S1-1, X-S1-2, ..., X-S1-99999999 corresponding to the unique identifications K1-i.
The first training predicted value X-S1-1 includes the unique identification K1-1 and the training result value X1; the first training predicted value X-S1-2 includes the unique identification K1-2 and the training result value X2; ...; the first training predicted value X-S1-99999999 includes the unique identification K1-99999999 and the training result value X99999999, as shown in Table 4.
Table 4
First training predicted value X-S1-i | Unique identification K1-i | Training result value Xi |
X-S1-1 | 00000001 | -2.45 |
X-S1-2 | 00000002 | -4.56 |
…… | …… | …… |
X-S1-99999999 | 99999999 | 10.23 |
Following the calculation described above for the discrete training model M-S1 and the first training predicted values X-S1-i of the logistic regression algorithm S1, the discrete training model M-Sj and the first training predicted values X-Sj-i corresponding to each preset discrete algorithm Sj are calculated in turn.
Each first training predicted value X-Sj-i corresponds one-to-one with a unique identification K1-i, and each first training predicted value X-Sj-i is also associated with its preset discrete algorithm Sj.
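The shape of S903 (fit a model on the discrete units, then emit one prediction record per unique identification) can be sketched as follows. This is not logistic regression, naive Bayes or a decision tree: the per-value label rate used here is a deliberately simple stand-in chosen so the example runs with the standard library only, and the function names and record keys are hypothetical.

```python
from collections import defaultdict


def train_discrete_model(units, features=("city", "occupation")):
    """Fit a toy stand-in for M-Sj: the label rate per (feature, value)."""
    counts = defaultdict(lambda: [0, 0])  # (feature, value) -> [label sum, rows]
    for unit in units:
        for f in features:
            entry = counts[(f, unit[f])]
            entry[0] += unit["label"]
            entry[1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


def first_predictions(model, units, features=("city", "occupation")):
    """Return X-Sj-i records: the unique identification plus a result value."""
    out = []
    for unit in units:
        rates = [model.get((f, unit[f]), 0.0) for f in features]
        out.append({"id": unit["id"], "x": sum(rates) / len(rates)})
    return out


# Illustrative rows; the identifiers are shortened for readability.
units = [
    {"id": 1, "city": "021", "occupation": 0, "label": 0},
    {"id": 2, "city": "010", "occupation": 1, "label": 0},
    {"id": 3, "city": "020", "occupation": 5, "label": 1},
]
model = train_discrete_model(units)
preds = first_predictions(model, units)
```

Keeping the identifier inside every prediction record is what allows the later merge step to pair each prediction with the matching continuous unit.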
S904: for each preset discrete algorithm Sj, merge the continuous training data unit T12-i corresponding to each unique identification K1-i with the first training predicted value X-Sj-i corresponding to the same unique identification K1-i, generating the merged training data unit T13-Sj-i corresponding to each unique identification K1-i of that preset discrete algorithm Sj, where i and j are positive integers greater than or equal to 1.
Each merged training data unit T13-Sj-i corresponds to one preset discrete algorithm Sj, and corresponds one-to-one with the unique identifications K1-i within that preset discrete algorithm Sj.
In a specific implementation, as shown in Fig. 11, the continuous training data unit T12-1 corresponding to the unique identification K1-1 is merged with the first training predicted value X-S1-1 corresponding to the unique identification K1-1 in the logistic regression algorithm S1, generating the merged training data unit T13-S1-1 corresponding to the unique identification K1-1 in the logistic regression algorithm S1; the continuous training data unit T12-2 corresponding to the unique identification K1-2 is merged with the first training predicted value X-S1-2 corresponding to the unique identification K1-2 in the logistic regression algorithm S1, generating the merged training data unit T13-S1-2; ...; and the continuous training data unit T12-99999999 corresponding to the unique identification K1-99999999 is merged with the first training predicted value X-S1-99999999 corresponding to the unique identification K1-99999999 in the logistic regression algorithm S1, generating the merged training data unit T13-S1-99999999.
By analogy, for every other preset discrete algorithm Sj, the continuous training data unit T12-i corresponding to each unique identification K1-i is merged with the first training predicted value X-Sj-i corresponding to the same unique identification K1-i in that algorithm, generating the merged training data unit T13-Sj-i corresponding to each unique identification K1-i in that preset discrete algorithm Sj.
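The merge in S904 is a join on the unique identification. The following Python sketch shows one way to do it; the function name `merge_units` and the record keys are assumptions, and the sample values come from Tables 3 and 4.

```python
def merge_units(continuous_units, first_predictions):
    """Join T12-i with X-Sj-i on the unique identification to form T13-Sj-i."""
    by_id = {p["id"]: p["x"] for p in first_predictions}
    merged = []
    for unit in continuous_units:
        if unit["id"] not in by_id:
            # A missing prediction would break the one-to-one correspondence.
            raise KeyError(f"no first prediction for identification {unit['id']}")
        merged.append({**unit, "x": by_id[unit["id"]]})
    return merged


t12 = [
    {"id": 1, "age": 20, "income": 100000, "label": 0},
    {"id": 2, "age": 60, "income": 150000, "label": 0},
]
# Deliberately out of order: the join relies on the identifier, not position.
x_s1 = [{"id": 2, "x": -4.56}, {"id": 1, "x": -2.45}]
t13_s1 = merge_units(t12, x_s1)
```

The merged unit treats the discrete model's output as one additional continuous feature, which is what lets a continuous algorithm consume information that originated in discrete data.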
S905: input the merged training data units T13-Sj-i into at least one preset continuous algorithm Lj, and calculate the continuous training model M-Lj corresponding to the discrete training model M-Sj of each preset discrete algorithm Sj.
There may be one or more preset continuous algorithms Lj; the application is not limited in this respect. The preset continuous algorithms Lj include the GBDT algorithm L1, the linear regression algorithm L2, the K-means algorithm L3 and the like, where j = 1, 2, 3, ...; the application is not limited to these.
In a specific implementation, as shown in Fig. 12, the merged training data unit T13-S1-1 corresponding to the unique identification K1-1 of the logistic regression algorithm S1, the merged training data unit T13-S1-2 corresponding to the unique identification K1-2, ..., and the merged training data unit T13-S1-99999999 corresponding to the unique identification K1-99999999 are all input into the GBDT algorithm L1, which calculates the continuous training model M-L1 corresponding to the discrete training model M-S1 of the logistic regression algorithm S1.
In this embodiment, the merged training data units T13-Sj-i of different preset discrete algorithms Sj are input into different preset continuous algorithms Lj. For example:
The merged training data unit T13-S2-1 corresponding to the unique identification K1-1 of the naive Bayes algorithm S2, the merged training data unit T13-S2-2 corresponding to the unique identification K1-2, ..., and the merged training data unit T13-S2-99999999 corresponding to the unique identification K1-99999999 are all input into the linear regression algorithm L2, which calculates the continuous training model M-L2 corresponding to the discrete training model M-S2 of the naive Bayes algorithm S2.
S906: combine each discrete training model M-Sj with the continuous training model M-Lj corresponding to that discrete training model M-Sj, generating each characteristic information identification model Zj.
As shown in Fig. 13, there are multiple characteristic information identification models Zj, and each characteristic information identification model Zj includes one discrete training model M-Sj and the continuous training model M-Lj corresponding to that discrete training model M-Sj.
In a specific implementation, the discrete training model M-S1 corresponding to the logistic regression algorithm S1 is combined with the continuous training model M-L1 corresponding to that discrete training model M-S1, generating the characteristic information identification model Z1. The discrete training model M-S2 corresponding to the naive Bayes algorithm S2 is combined with the continuous training model M-L2 corresponding to that discrete training model M-S2, generating the characteristic information identification model Z2. By analogy, the discrete training model M-Sj corresponding to each other preset discrete algorithm Sj is combined with its corresponding continuous training model M-Lj, generating the other characteristic information identification models Zj.
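A characteristic information identification model Zj can be pictured as a two-stage predictor that pairs one discrete model with its continuous model. The sketch below assumes each model can be represented as a plain callable; the class name `FeatureModel`, the record keys and the two toy lambdas are hypothetical stand-ins, not trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FeatureModel:
    """A characteristic information identification model Zj."""
    discrete: Callable[[Dict], float]    # stands in for M-Sj
    continuous: Callable[[Dict], float]  # stands in for M-Lj

    def predict(self, discrete_unit, continuous_unit):
        """Run M-Sj, merge its output into the continuous unit, then run M-Lj."""
        if discrete_unit["id"] != continuous_unit["id"]:
            raise ValueError("units belong to different data groups")
        first = self.discrete(discrete_unit)
        merged = {**continuous_unit, "x": first}
        return {"id": continuous_unit["id"], "y": self.continuous(merged)}


# Toy stand-ins for illustration only.
z1 = FeatureModel(
    discrete=lambda u: 1.0 if u["occupation"] == 5 else 0.0,
    continuous=lambda u: 0.5 * u["x"] + (0.5 if u["income"] < 10000 else 0.0),
)
result = z1.predict(
    {"id": 3, "city": "020", "occupation": 5},
    {"id": 3, "age": 30, "income": 2000},
)
```

Bundling the two stages in one object mirrors the pairing of M-Sj with M-Lj, so the verification and test processes can treat each Zj as a single unit.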
Step 2: verification process
S907: set one unique identification K2 and one second feature markup information B2 for each acquired verify data group G. The unique identifications K2 and the second feature markup information B2 are in one-to-one correspondence.
The unique identification K2 is a positive integer, and the second feature markup information B2 takes the value 1 or 0 (written K2-i and B2-i for the verify data group Gi, where i is a positive integer greater than or equal to 1). A second feature markup information B2 of 1 indicates that the verify data group G has the fraud feature; a value of 0 indicates that it does not. A verify data group G with the fraud feature is identified as a fraud client, and a verify data group G without the fraud feature is identified as a non-fraud client.
Each verify data group G includes several characteristic data T2, as shown in Table 5.
The verify data group G includes characteristic data T2 such as age, city, occupation and income; the application is not limited to these. T2-city and T2-occupation are set as discrete-type characteristic data, and T2-age and T2-income as continuous-type characteristic data. There are multiple verify data groups G, for example G1, G2, ..., G99999999; the application is not limited to this.
Table 5
S908: split each verify data group G according to the data type of each of its characteristic data T2, generating the discrete verification data unit T21 and the continuous verification data unit T22 of that verify data group G.
Each discrete verification data unit T21-i includes: the unique identification K2-i, the discrete-type characteristic data T2 in the verify data group Gi corresponding to K2-i, and the second feature markup information B2-i of that verify data group Gi, where i is a positive integer greater than or equal to 1.
Specifically, the discrete verification data unit T21-i includes the characteristic data T2-city, the characteristic data T2-occupation, the unique identification K2-i and the second feature markup information B2-i. As shown in Table 6, the discrete verification data units T21-i corresponding to the verify data groups Gi are T21-1, T21-2, ..., T21-99999999, where i = 1, 2, ..., 99999999.
Table 6
Verify data group G | Discrete verification data unit T21 | Unique identification K2 | T2-city | T2-occupation | Second feature markup information B2 |
G1 | T21-1 | 00000001 | 021 (Shanghai) | 0 (teacher) | 0 |
G2 | T21-2 | 00000002 | 010 (Beijing) | 1 (doctor) | 0 |
…… | …… | …… | …… | …… | …… |
G99999999 | T21-99999999 | 99999999 | 020 (Guangzhou) | 5 (unemployed) | 1 |
Table 7
Verify data group G | Continuous verification data unit T22 | Unique identification K2 | T2-age | T2-income | Second feature markup information B2 |
G1 | T22-1 | 00000001 | 20 | 100000 | 0 |
G2 | T22-2 | 00000002 | 60 | 150000 | 0 |
…… | …… | …… | …… | …… | …… |
G99999999 | T22-99999999 | 99999999 | 30 | 2000 | 1 |
The continuous verification data unit T22-i corresponding to each discrete verification data unit T21-i includes: the unique identification K2-i, the continuous-type characteristic data T2 in the verify data group Gi corresponding to K2-i, and the second feature markup information B2-i of that verify data group Gi, where i is a positive integer greater than or equal to 1, as shown in Table 7.
Specifically, the continuous verification data unit T22-i includes the characteristic data T2-age, the characteristic data T2-income, the unique identification K2-i and the second feature markup information B2-i, as shown in Table 7. The continuous verification data units T22-i corresponding to the verify data groups Gi are T22-1, T22-2, ..., T22-99999999, where i = 1, 2, ..., 99999999.
The number of discrete verification data units T21-i equals the number of continuous verification data units T22-i, and each discrete verification data unit T21-i, each continuous verification data unit T22-i and each verify data group Gi are placed in one-to-one correspondence through the unique identification K2-i of the verify data group Gi.
S909: input the discrete verification data units T21-i into each discrete training model M-Sj, and calculate the first verification predicted values Y1-Mj-i corresponding to each discrete training model M-Sj, where i and j are positive integers greater than or equal to 1.
The first verification predicted value Y1-Mj-i includes the unique identification K2-i and the first verification result value Y1-i, where i = 1, 2, 3, ..., 99999999.
In a specific implementation, as shown in Fig. 14, the discrete verification data unit T21-1, the discrete verification data unit T21-2, ..., and the discrete verification data unit T21-99999999 are input into the discrete training model M-S1, which calculates the first verification predicted values Y1-M1-1, Y1-M1-2, ..., Y1-M1-99999999 corresponding to the discrete training model M-S1.
The first verification predicted value Y1-M1-1 includes the unique identification K2-1 and the first verification result value Y1-1; the first verification predicted value Y1-M1-2 includes the unique identification K2-2 and the first verification result value Y1-2; ...; the first verification predicted value Y1-M1-99999999 includes the unique identification K2-99999999 and the first verification result value Y1-99999999, as shown in Table 8.
Table 8
First verifying predicted value Y1-M1-i | Unique identification K2-i | First verification result value Y1-i |
Y1-M1-1 | 00000001 | -2.45 |
Y1-M1-2 | 00000002 | -4.56 |
…… | …… | …… |
Y1-M1-99999999 | 99999999 | 10.23 |
Following the calculation described above for the first verification predicted values Y1-M1-i of the discrete training model M-S1, the first verification predicted values Y1-Mj-i corresponding to the other discrete training models M-Sj are calculated in turn. Each first verification predicted value Y1-Mj-i corresponds one-to-one with a unique identification K2-i.
S910: for each discrete training model M-Sj, merge the continuous verification data unit T22-i corresponding to each unique identification K2-i with the first verification predicted value Y1-Mj-i corresponding to the same unique identification K2-i, generating the merged verification data unit T23-Sj-i corresponding to each unique identification K2-i of that discrete training model M-Sj, where i and j are positive integers greater than or equal to 1.
Each merged verification data unit T23-Sj-i corresponds to one discrete training model M-Sj, and corresponds one-to-one with the unique identifications K2-i within that discrete training model M-Sj.
In a specific implementation, as shown in Fig. 15, the continuous verification data unit T22-1 corresponding to the unique identification K2-1 is merged with the first verification predicted value Y1-M1-1 corresponding to the unique identification K2-1 in the discrete training model M-S1, generating the merged verification data unit T23-S1-1 corresponding to the unique identification K2-1 of the discrete training model M-S1; the continuous verification data unit T22-2 corresponding to the unique identification K2-2 is merged with the first verification predicted value Y1-M1-2 corresponding to the unique identification K2-2 in the discrete training model M-S1, generating the merged verification data unit T23-S1-2; ...; and the continuous verification data unit T22-99999999 corresponding to the unique identification K2-99999999 is merged with the first verification predicted value Y1-M1-99999999 corresponding to the unique identification K2-99999999 in the discrete training model M-S1, generating the merged verification data unit T23-S1-99999999.
By analogy, for every other discrete training model M-Sj, the continuous verification data unit T22-i corresponding to each unique identification K2-i is merged with the first verification predicted value Y1-Mj-i corresponding to the same unique identification K2-i in that model, generating the merged verification data unit T23-Sj-i corresponding to each unique identification K2-i of that discrete training model M-Sj.
S911: input the merged verification data units T23-Sj-i corresponding to the unique identifications K2-i of each discrete training model M-Sj into the continuous training model M-Lj corresponding to that discrete training model M-Sj, and calculate the second verification predicted value Y2-Mj-i corresponding to each unique identification K2-i for each continuous training model M-Lj, where i and j are positive integers greater than or equal to 1.
The second verification predicted value Y2-Mj-i includes the unique identification K2-i and the second verification result value Y2-i, where i = 1, 2, 3, ..., 99999999.
In a specific implementation, as shown in Fig. 16, the merged verification data unit T23-S1-1 corresponding to the unique identification K2-1 in the discrete training model M-S1, the merged verification data unit T23-S1-2 corresponding to the unique identification K2-2, ..., and the merged verification data unit T23-S1-99999999 corresponding to the unique identification K2-99999999 are input into the continuous training model M-L1 corresponding to the discrete training model M-S1, which calculates the second verification predicted value Y2-M1-1 corresponding to the unique identification K2-1, the second verification predicted value Y2-M1-2 corresponding to the unique identification K2-2, ..., and the second verification predicted value Y2-M1-99999999 corresponding to the unique identification K2-99999999.
The second verification predicted value Y2-M1-1 includes the unique identification K2-1 and the second verification result value Y2-1; the second verification predicted value Y2-M1-2 includes the unique identification K2-2 and the second verification result value Y2-2; ...; the second verification predicted value Y2-M1-99999999 includes the unique identification K2-99999999 and the second verification result value Y2-99999999, as shown in Table 9.
Table 9
Second verifying predicted value Y2-M1-i | Unique identification K2-i | Second verification result value Y2-i |
Y2-M1-1 | 00000001 | -2.45 |
Y2-M1-2 | 00000002 | -4.56 |
…… | …… | …… |
Y2-M1-99999999 | 99999999 | 10.23 |
The merged verification data unit T23-S2-1 corresponding to the unique identification K2-1 in the discrete training model M-S2, the merged verification data unit T23-S2-2 corresponding to the unique identification K2-2, ..., and the merged verification data unit T23-S2-99999999 corresponding to the unique identification K2-99999999 are input into the continuous training model M-L2 corresponding to the discrete training model M-S2, which calculates the second verification predicted value Y2-M2-1 corresponding to the unique identification K2-1, the second verification predicted value Y2-M2-2 corresponding to the unique identification K2-2, ..., and the second verification predicted value Y2-M2-99999999 corresponding to the unique identification K2-99999999.
Following the calculations described above for the second verification predicted values Y2-M1-i of the discrete training model M-S1 and the second verification predicted values Y2-M2-i of the discrete training model M-S2, the second verification predicted values Y2-Mj-i corresponding to the other discrete training models M-Sj are calculated in turn. Each second verification predicted value Y2-Mj-i corresponds one-to-one with a unique identification K2-i.
S912: for each continuous training model M-Lj, take the difference between the second feature markup information B2-i corresponding to each unique identification K2-i and the second verification predicted value Y2-Mj-i corresponding to the same unique identification K2-i, generating the difference Vj-i corresponding to each unique identification K2-i in that continuous training model M-Lj.
In a specific implementation, as shown in Fig. 17, the difference between the second feature markup information B2-1 corresponding to the unique identification K2-1 and the second verification predicted value Y2-M1-1 corresponding to the unique identification K2-1 in the continuous training model M-L1 is taken, generating the difference V1-1; the difference between the second feature markup information B2-2 corresponding to the unique identification K2-2 and the second verification predicted value Y2-M1-2 corresponding to the unique identification K2-2 in the continuous training model M-L1 is taken, generating the difference V1-2; ...; and the difference between the second feature markup information B2-99999999 corresponding to the unique identification K2-99999999 and the second verification predicted value Y2-M1-99999999 corresponding to the unique identification K2-99999999 in the continuous training model M-L1 is taken, generating the difference V1-99999999.
Likewise, the difference between the second feature markup information B2-1 corresponding to the unique identification K2-1 and the second verification predicted value Y2-M2-1 corresponding to the unique identification K2-1 in the continuous training model M-L2 is taken, generating the difference V2-1; the difference between the second feature markup information B2-2 corresponding to the unique identification K2-2 and the second verification predicted value Y2-M2-2 corresponding to the unique identification K2-2 in the continuous training model M-L2 is taken, generating the difference V2-2; ...; and the difference between the second feature markup information B2-99999999 corresponding to the unique identification K2-99999999 and the second verification predicted value Y2-M2-99999999 corresponding to the unique identification K2-99999999 in the continuous training model M-L2 is taken, generating the difference V2-99999999.
By analogy, the difference Vj-i corresponding to each unique identification K2-i in each continuous training model M-Lj is generated.
S913: each difference Vj-i in each continuous training pattern M-Lj is done and is generated each continuous training pattern M-Lj
The validation value Qj of corresponding characteristic information identification model Zj.
When it is implemented, by continuous training pattern M-L1 difference V1-1, C1-2 ..., C1-99999999 does and gives birth to
At the validation value Q1 of the corresponding characteristic information identification model Z1 of continuous training pattern M-L1;By the difference in continuous training pattern M-L2
Value V2-1, C2-2 ..., C2-99999999 do and generate the corresponding characteristic information identification model Z2 of continuous training pattern M-L2
Validation value Q2;And so on, generate the validation value Qj of each characteristic information identification model Zj.
S914: The validation values Qj of the characteristic information identification models Zj are sorted, and the characteristic information identification model Zj corresponding to the smallest validation value Qj is taken as the optimal characteristics information identification model.
Wherein, the optimal characteristics information identification model includes: the preset discrete model and the preset continuous model.
In a specific implementation, the validation value Q1 of characteristic information identification model Z1, the validation value Q2 of characteristic information identification model Z2, ..., and the validation value Q99999999 of characteristic information identification model Z99999999 are sorted, and the characteristic information identification model corresponding to the smallest validation value is taken as the optimal characteristics information identification model.
In the present embodiment, Q2 is assumed to be the minimum value, so the optimal characteristics information identification model is Z2, which consists of preset discrete model M-S2 and preset continuous model M-L2.
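The selection in S914 amounts to sorting the validation values and keeping the model with the smallest one. A minimal sketch, assuming the validation values are already available in a dictionary; the names and numbers are hypothetical and chosen only so that Z2 wins, as in the embodiment.

```python
# Hypothetical validation values Qj keyed by model name Zj.
validation_values = {"Z1": 0.75, "Z2": 0.25, "Z3": 0.5}

# Sort ascending by validation value; the first entry is the optimal model.
ranked = sorted(validation_values.items(), key=lambda item: item[1])
best_model, best_value = ranked[0]
```

When only the best model is needed, `min(validation_values, key=validation_values.get)` gives the same model without a full sort.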
Step 3: test process
S915: A unique identification K3 is set for each acquired data group D to be predicted.
Wherein, unique identification K3 is a positive integer.
Each data group D to be predicted includes several characteristic data T3, as shown in Table 10.
Table 10
Wherein, data group D to be predicted includes characteristic data T3 such as age, city, occupation and income; the application is not limited thereto. T3-city and T3-occupation are discrete type characteristic data, and T3-age and T3-income are continuous type characteristic data. There are multiple data groups D to be predicted, including D1, D2, ..., D99999999; the application is not limited thereto.
S916: According to the data type of each characteristic data T3 of each data group D to be predicted, each data group D to be predicted is split to generate its corresponding discrete data unit T31 and continuous data unit T32.
Wherein, each discrete data unit T31-i includes: unique identification K3-i and the discrete type characteristic data T3 in the data group D to be predicted corresponding to unique identification K3-i.
Specifically, discrete data unit T31-i includes: characteristic data T3-city, characteristic data T3-occupation and unique identification K3-i. As shown in Table 11, the discrete data units T31-i corresponding to the data groups Di to be predicted are T31-1, T31-2, ..., T31-99999999, where i = 1, 2, ..., 99999999.
Table 11
Data group D to be predicted | Discrete data unit T31 | Unique identification K3 | T3-city | T3-occupation |
D1 | T31-1 | 00000001 | 021 (Shanghai) | 0 (teacher) |
D2 | T31-2 | 00000002 | 010 (Beijing) | 1 (doctor) |
…… | …… | …… | …… | …… |
D99999999 | T31-99999999 | 99999999 | 020 (Guangzhou) | 5 (unemployed) |
The continuous data unit T32-i corresponding to each discrete data unit T31-i includes: unique identification K3-i and the continuous type characteristic data T3 in the data group D to be predicted corresponding to unique identification K3-i, where i is a positive integer greater than or equal to 1.
Specifically, continuous data unit T32-i includes: characteristic data T3-age, characteristic data T3-income and unique identification K3-i, as shown in Table 12. The continuous data units T32-i corresponding to the data groups Di to be predicted are T32-1, T32-2, ..., T32-99999999, where i = 1, 2, ..., 99999999.
Table 12
Data group D to be predicted | Continuous data unit T32 | Unique identification K3 | T3-age | T3-income |
D1 | T32-1 | 00000001 | 20 | 100000 |
D2 | T32-2 | 00000002 | 60 | 150000 |
…… | …… | …… | …… | …… |
D99999999 | T32-99999999 | 99999999 | 30 | 2000 |
The number of discrete data units T31-i is equal to the number of continuous data units T32-i, and each discrete data unit T31-i, each continuous data unit T32-i and each data group Di to be predicted are in one-to-one correspondence through the unique identification K3-i of data group Di to be predicted.
S917: The discrete data unit T31-i corresponding to each unique identification K3-i is input into preset discrete model M-S2, which calculates and generates the first predicted value C1-i corresponding to each unique identification K3-i in preset discrete model M-S2.
Wherein, the first predicted value C1-i includes: unique identification K3-i and the first test result value C1_i.
In a specific implementation, preset discrete model M-S2 uses an existing discrete algorithm such as a logistic regression algorithm; the application is not limited thereto.
As shown in Figure 18, T31-1 corresponding to unique identification K3-1, T31-2 corresponding to unique identification K3-2, ..., and T31-99999999 corresponding to unique identification K3-99999999 are input into discrete model M-S2, which calculates and generates the first predicted values C1-i corresponding to discrete model M-S2.
Wherein, the first predicted value C1-1 includes: unique identification K3-1 and the first test result value C1_1; the first predicted value C1-2 includes: unique identification K3-2 and the first test result value C1_2; ...; the first predicted value C1-99999999 includes: unique identification K3-99999999 and the first test result value C1_99999999, as shown in Table 13.
Table 13
First predicted value C1-i | Unique identification K3-i | First test result value C1_i |
C1-1 | 00000001 | -2.45 |
C1-2 | 00000002 | -4.56 |
…… | …… | …… |
C1-99999999 | 99999999 | 10.23 |
Wherein, the first predicted values C1-i and the unique identifications K3-i are in one-to-one correspondence.
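A hedged sketch of S917 follows. The text names logistic regression only as one possible discrete algorithm and gives no trained parameters, so the weights, bias, field names and the function `first_prediction` below are all hypothetical. The sketch computes a raw linear score over one-hot encoded categories, which is the quantity a logistic regression model would pass through its sigmoid.

```python
from typing import Any, Dict, Tuple

# Hypothetical per-category weights standing in for a trained model M-S2.
WEIGHTS: Dict[Tuple[str, Any], float] = {
    ("city", "021"): -1.5,
    ("city", "010"): -2.0,
    ("occupation", 0): -1.0,
    ("occupation", 1): -2.5,
}
BIAS = 0.0


def first_prediction(unit: Dict[str, Any]) -> Dict[str, Any]:
    """Score one discrete data unit and keep its unique identification."""
    score = BIAS + sum(
        WEIGHTS.get((name, value), 0.0)
        for name, value in unit.items()
        if name != "uid"
    )
    return {"uid": unit["uid"], "c1": score}


c1_1 = first_prediction({"uid": "00000001", "city": "021", "occupation": 0})
```

Returning the unique identification together with the score mirrors the statement that each first predicted value includes the unique identification.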
S918: The continuous data unit T32-i corresponding to each unique identification K3-i is merged with the corresponding first predicted value C1-i and then input into preset continuous model M-L2, which calculates and generates the second predicted value C2-i corresponding to preset continuous model M-L2.
Wherein, the second predicted value C2-i includes: unique identification K3-i and the second test result value C2_i.
In a specific implementation, preset continuous model M-L2 uses the GBDT algorithm; the application is not limited thereto.
As shown in Figure 19, continuous data unit T32-1 corresponding to unique identification K3-1 is merged with the first predicted value C1-1 and then input into preset continuous model M-L2 to calculate and generate the second predicted value C2-1; continuous data unit T32-2 corresponding to unique identification K3-2 is merged with the first predicted value C1-2 and then input into preset continuous model M-L2 to calculate and generate the second predicted value C2-2; continuous data unit T32-3 corresponding to unique identification K3-3 is merged with the first predicted value C1-3 and then input into preset continuous model M-L2 to calculate and generate the second predicted value C2-3; ...; and continuous data unit T32-99999999 corresponding to unique identification K3-99999999 is merged with the first predicted value C1-99999999 and then input into preset continuous model M-L2 to calculate and generate the second predicted value C2-99999999.
Wherein, the second predicted value C2-1 includes: unique identification K3-1 and the second test result value C2_1; the second predicted value C2-2 includes: unique identification K3-2 and the second test result value C2_2; ...; the second predicted value C2-99999999 includes: unique identification K3-99999999 and the second test result value C2_99999999, as shown in Table 14.
Table 14
Second predicted value C2-i | Unique identification K3-i | Second test result value C2_i |
C2-1 | 00000001 | 0.1346 |
C2-2 | 00000002 | 0.0293 |
…… | …… | …… |
C2-99999999 | 99999999 | 0.9374 |
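The merge in S918 can be sketched as a join on the unique identification followed by a call to the continuous model. This is an illustration only: `merge_units`, `second_prediction` and the field names are hypothetical, and `stand_in_model` is a placeholder rule, not a trained GBDT model.

```python
from typing import Any, Callable, Dict, List


def merge_units(continuous_unit: Dict[str, Any], first_pred: Dict[str, Any]) -> Dict[str, Any]:
    """Append the first predicted value to the continuous unit with the same identification."""
    if continuous_unit["uid"] != first_pred["uid"]:
        raise ValueError("unique identifications do not match")
    return {**continuous_unit, "c1": first_pred["c1"]}


def second_prediction(merged: Dict[str, Any], model: Callable[[List[float]], float]) -> Dict[str, Any]:
    """Feed the merged features to the continuous model and keep the identification."""
    features = [merged["age"], merged["income"], merged["c1"]]
    return {"uid": merged["uid"], "c2": model(features)}


def stand_in_model(features: List[float]) -> float:
    # Hypothetical placeholder for a trained GBDT model M-L2.
    return 0.25 if features[2] < 0 else 0.75


t32_1 = {"uid": "00000001", "age": 20, "income": 100000}
c1_1 = {"uid": "00000001", "c1": -2.45}
merged_1 = merge_units(t32_1, c1_1)
c2_1 = second_prediction(merged_1, stand_in_model)
```

The design point is that the output of the discrete model enters the continuous model as one extra numeric feature, so the continuous model never has to handle the raw categorical fields.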
S919: The characteristic information B3_i corresponding to data group Di to be predicted is generated according to the data group Di to be predicted corresponding to unique identification K3-i and the second predicted value C2-i.
In a specific implementation, the characteristic information B3_1 corresponding to data group D1 to be predicted is generated according to the data group D1 to be predicted corresponding to unique identification K3-1 and the second predicted value C2-1; the characteristic information B3_2 corresponding to data group D2 to be predicted is generated according to the data group D2 to be predicted corresponding to unique identification K3-2 and the second predicted value C2-2; ...; and the characteristic information B3_99999999 corresponding to data group D99999999 to be predicted is generated according to the data group D99999999 to be predicted corresponding to unique identification K3-99999999 and the second predicted value C2-99999999, as shown in Table 15.
Table 15
Characteristic information B3 takes the value 1 or 0, and i is a positive integer greater than or equal to 1. When characteristic information B3 is 1, data group Di to be predicted has the fraud feature and is identified as a fraudulent client; when characteristic information B3 is 0, data group Di to be predicted does not have the fraud feature and is identified as a non-fraudulent client.
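The text does not state how the second predicted value is turned into the 1/0 characteristic information. One common way is a fixed cut-off, sketched below; the threshold of 0.5, the function `characteristic_info` and the field names are assumptions, and the inputs reuse the second test result values listed in Table 14.

```python
from typing import Any, Dict

FRAUD_THRESHOLD = 0.5  # assumed cut-off; not specified in the text


def characteristic_info(second_pred: Dict[str, Any], threshold: float = FRAUD_THRESHOLD) -> Dict[str, Any]:
    """Map a second predicted value to 1 (fraud feature) or 0 (no fraud feature)."""
    flag = 1 if second_pred["c2"] >= threshold else 0
    return {"uid": second_pred["uid"], "b3": flag}


b3_1 = characteristic_info({"uid": "00000001", "c2": 0.1346})
b3_2 = characteristic_info({"uid": "00000002", "c2": 0.0293})
b3_last = characteristic_info({"uid": "99999999", "c2": 0.9374})
```

Under this assumed threshold, only the last data group would be identified as a fraudulent client.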
Based on the same inventive concept as the characteristic information recognition method described above, the present invention also provides a characteristic information identifying system, as described in the following embodiment. Since the principle by which the characteristic information identifying system solves the problem is similar to that of the characteristic information recognition method, the implementation of the system may refer to the implementation of the method, and repeated details are not described again.
Figure 20 is a structural schematic diagram of the characteristic information identifying system of the embodiment of the present application. As shown in Figure 20, the characteristic information identifying system includes: acquiring unit 101, first generation unit 102, second generation unit 103 and third generation unit 104.
Acquiring unit 101, for obtaining the discrete data unit and the continuous data unit corresponding to the first unique identification of the data group to be predicted.
First generation unit 102, for inputting the discrete data unit corresponding to the first unique identification into the preset discrete model to calculate and generate the first predicted value corresponding to the preset discrete model. Wherein, the first predicted value includes: the first unique identification.
Second generation unit 103, for merging the continuous data unit corresponding to the first unique identification with the first predicted value and inputting the result into the preset continuous model to calculate and generate the second predicted value corresponding to the preset continuous model. Wherein, the second predicted value includes: the first unique identification.
Third generation unit 104, for generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.
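The four units above can be sketched as methods of one class. This is a minimal illustration, not the structure of Figure 20 itself: the class name, method names, field names, the lambda models and the 0.5 threshold are all hypothetical, and the sample record reuses the values shown for D99999999 in Tables 11 and 12.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Unit = Dict[str, Any]


@dataclass
class CharacteristicInfoSystem:
    discrete_model: Callable[[Unit], float]    # stand-in for the preset discrete model
    continuous_model: Callable[[Unit], float]  # stand-in for the preset continuous model
    discrete_fields: Tuple[str, ...]
    continuous_fields: Tuple[str, ...]
    threshold: float = 0.5                     # assumed cut-off, not given in the text

    def acquire(self, uid: str, record: Unit) -> Tuple[Unit, Unit]:
        """Acquiring unit 101: split a data group into its two data units."""
        discrete = {"uid": uid, **{f: record[f] for f in self.discrete_fields}}
        continuous = {"uid": uid, **{f: record[f] for f in self.continuous_fields}}
        return discrete, continuous

    def first_generate(self, discrete: Unit) -> Unit:
        """First generation unit 102: first predicted value from the discrete model."""
        return {"uid": discrete["uid"], "c1": self.discrete_model(discrete)}

    def second_generate(self, continuous: Unit, first: Unit) -> Unit:
        """Second generation unit 103: merge, then apply the continuous model."""
        merged = {**continuous, "c1": first["c1"]}
        return {"uid": merged["uid"], "c2": self.continuous_model(merged)}

    def third_generate(self, second: Unit) -> Unit:
        """Third generation unit 104: derive the 1/0 characteristic information."""
        return {"uid": second["uid"], "info": int(second["c2"] >= self.threshold)}

    def identify(self, uid: str, record: Unit) -> Unit:
        discrete, continuous = self.acquire(uid, record)
        first = self.first_generate(discrete)
        second = self.second_generate(continuous, first)
        return self.third_generate(second)


system = CharacteristicInfoSystem(
    discrete_model=lambda unit: -2.5,
    continuous_model=lambda unit: 0.75 if unit["income"] < 5000 else 0.25,
    discrete_fields=("city", "occupation"),
    continuous_fields=("age", "income"),
)
result = system.identify(
    "99999999", {"age": 30, "income": 2000, "city": "020", "occupation": 5}
)
```

Passing the two models in as callables keeps the sketch independent of any particular learning library.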
Based on the same inventive concept as the characteristic information recognition method described above, the application provides a computer equipment, as described in the following embodiment. Since the principle by which the computer equipment solves the problem is similar to that of the characteristic information recognition method, the implementation of the computer equipment may refer to the implementation of the method, and repeated details are not described again.
In one embodiment, the electronic equipment includes: a memory, a processor, and a computer program stored on the memory and runnable on the processor. The processor implements all steps of the method in the above embodiments when executing the computer program. For example, as shown in Figure 1, the processor implements the following steps when executing the computer program:
S101: Obtain the discrete data unit and the continuous data unit corresponding to the first unique identification of the data group to be predicted.
S102: Input the discrete data unit corresponding to the first unique identification into the preset discrete model to calculate and generate the first predicted value corresponding to the preset discrete model. Wherein, the first predicted value includes: the first unique identification.
S103: Merge the continuous data unit corresponding to the first unique identification with the first predicted value and input the result into the preset continuous model to calculate and generate the second predicted value corresponding to the preset continuous model. Wherein, the second predicted value includes: the first unique identification.
S104: Generate the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.
Based on the same inventive concept as the characteristic information recognition method described above, the application provides a computer readable storage medium, as described in the following embodiment. Since the principle by which the computer readable storage medium solves the problem is similar to that of the characteristic information recognition method, the implementation of the computer readable storage medium may refer to the implementation of the method, and repeated details are not described again.
In one embodiment, a computer program is stored on the computer readable storage medium, and the computer program implements all steps of the characteristic information recognition method in the above embodiments when executed by a processor. For example, as shown in Figure 1, the computer program implements the following steps when executed by a processor:
S101: Obtain the discrete data unit and the continuous data unit corresponding to the first unique identification of the data group to be predicted.
S102: Input the discrete data unit corresponding to the first unique identification into the preset discrete model to calculate and generate the first predicted value corresponding to the preset discrete model. Wherein, the first predicted value includes: the first unique identification.
S103: Merge the continuous data unit corresponding to the first unique identification with the first predicted value and input the result into the preset continuous model to calculate and generate the second predicted value corresponding to the preset continuous model. Wherein, the second predicted value includes: the first unique identification.
S104: Generate the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.
The characteristic information recognition method and system provided by the invention comprise: obtaining the discrete data unit and the continuous data unit corresponding to the first unique identification of the data group to be predicted; inputting the discrete data unit corresponding to the first unique identification into the preset discrete model to calculate and generate the first predicted value corresponding to the preset discrete model, the first predicted value including the first unique identification; merging the continuous data unit corresponding to the first unique identification with the first predicted value and inputting the result into the preset continuous model to calculate and generate the second predicted value corresponding to the preset continuous model, the second predicted value including the first unique identification; and generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value. The application can improve the efficiency with which a machine learning algorithm processes data that includes both discrete data and continuous data, and thereby improve the efficiency of characteristic information identification using the machine learning algorithm.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, etc.) that contain computer usable program code.
Specific embodiments are used herein to describe the principle and implementation of the present invention. The explanation of the above embodiments is merely intended to help understand the method of the invention and its core concept. At the same time, for those skilled in the art, there will be changes in the specific implementation and application range according to the idea of the present invention. In conclusion, the content of this specification should not be construed as limiting the invention.
Claims (21)
1. A characteristic information recognition method, characterized by comprising:
obtaining the discrete data unit and the continuous data unit corresponding to a first unique identification of a data group to be predicted;
inputting the discrete data unit corresponding to the first unique identification into a preset discrete model to calculate and generate a first predicted value corresponding to the preset discrete model; the first predicted value includes: the first unique identification;
merging the continuous data unit corresponding to the first unique identification with the first predicted value and inputting the result into a preset continuous model to calculate and generate a second predicted value corresponding to the preset continuous model; the second predicted value includes: the first unique identification;
generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.
2. The characteristic information recognition method according to claim 1, characterized in that there are multiple data groups to be predicted.
3. The characteristic information recognition method according to claim 2, characterized in that obtaining the discrete data unit and the continuous data unit corresponding to a first unique identification of a data group to be predicted comprises:
setting a first unique identification for each acquired data group to be predicted; each data group to be predicted includes: several first feature data;
splitting each data group to be predicted, according to the data type of each first feature data of that data group, to generate the discrete data unit and the continuous data unit corresponding to each data group to be predicted.
4. The characteristic information recognition method according to claim 3, characterized in that each discrete data unit includes: a first unique identification and the discrete type first feature data in the data group to be predicted corresponding to the first unique identification.
5. The characteristic information recognition method according to claim 4, characterized in that the continuous data unit corresponding to each discrete data unit includes: the first unique identification and the continuous type first feature data in the data group to be predicted corresponding to the first unique identification.
6. The characteristic information recognition method according to claim 1, characterized by further comprising:
obtaining the discrete training data unit and the continuous training data unit corresponding to a second unique identification of a training data group;
inputting the discrete training data unit corresponding to the second unique identification into each preset discrete algorithm to calculate and generate the discrete training pattern corresponding to each preset discrete algorithm and the first training predicted value corresponding to each preset discrete algorithm; the first training predicted value includes: the second unique identification;
merging the continuous training data unit corresponding to the second unique identification with the first training predicted value corresponding to the second unique identification of each preset discrete algorithm, and inputting the result into a preset continuous algorithm to calculate and generate the continuous training pattern corresponding to the discrete training pattern of each preset discrete algorithm;
merging each discrete training pattern with the continuous training pattern corresponding to that discrete training pattern to generate each characteristic information identification model.
7. The characteristic information recognition method according to claim 6, characterized in that there are multiple training data groups.
8. The characteristic information recognition method according to claim 7, characterized in that obtaining the discrete training data unit and the continuous training data unit corresponding to a second unique identification of a training data group comprises:
setting a second unique identification and a first feature markup information for each acquired training data group; each training data group includes: several second feature data;
splitting each training data group, according to the data type of each second feature data of that training data group, to generate the discrete training data unit and the continuous training data unit corresponding to each training data group.
9. The characteristic information recognition method according to claim 8, characterized in that each discrete training data unit includes: a second unique identification, the discrete type second feature data in the training data group corresponding to the second unique identification, and the first feature markup information of the training data group corresponding to the second unique identification;
the continuous training data unit corresponding to each discrete training data unit includes: the second unique identification, the continuous type second feature data in the training data group corresponding to the second unique identification, and the first feature markup information of the training data group corresponding to the second unique identification.
10. The characteristic information recognition method according to claim 9, characterized in that merging the continuous training data unit corresponding to the second unique identification with the first training predicted value corresponding to the second unique identification of each preset discrete algorithm, and inputting the result into a preset continuous algorithm to calculate and generate the continuous training pattern corresponding to the discrete training pattern of each preset discrete algorithm, comprises:
merging the continuous training data unit corresponding to each second unique identification with the first training predicted value corresponding to each second unique identification of each preset discrete algorithm to generate the merged training data unit corresponding to each second unique identification of each discrete algorithm;
inputting each merged training data unit into the preset continuous algorithm to calculate and generate the continuous training pattern corresponding to the discrete training pattern of each preset discrete algorithm.
11. The characteristic information recognition method according to claim 6, characterized by further comprising:
obtaining the third unique identification and the second feature markup information of a verify data group, and the discrete verification data unit and the continuous verification data unit corresponding to the third unique identification;
inputting the discrete verification data unit corresponding to the third unique identification into each discrete training pattern to calculate and generate the first verifying predicted value corresponding to each discrete training pattern; the first verifying predicted value includes: the third unique identification;
merging the continuous verification data unit corresponding to the third unique identification with the first verifying predicted value, and inputting the result into the continuous training pattern corresponding to the discrete training pattern of each first verifying predicted value to calculate and generate the second verifying predicted value corresponding to each continuous training pattern; the second verifying predicted value includes: the third unique identification;
generating an optimal characteristics information identification model according to a comparison between the second feature markup information corresponding to the third unique identification and the second verifying predicted value corresponding to the third unique identification in each continuous training pattern.
12. The characteristic information recognition method according to claim 11, characterized in that there are multiple verify data groups.
13. The characteristic information recognition method according to claim 12, characterized in that obtaining the third unique identification and the second feature markup information of a verify data group, and the discrete verification data unit and the continuous verification data unit corresponding to the third unique identification, comprises:
setting a third unique identification and a second feature markup information for each acquired verify data group; each verify data group includes: several third feature data;
splitting each verify data group, according to the data type of each third feature data of that verify data group, to generate the discrete verification data unit and the continuous verification data unit of each verify data group.
14. The characteristic information recognition method according to claim 13, characterized in that each discrete verification data unit includes: a third unique identification, the discrete type third feature data in the verify data group corresponding to the third unique identification, and the second feature markup information of the verify data group corresponding to the third unique identification;
the continuous verification data unit corresponding to each discrete verification data unit includes: the third unique identification, the continuous type third feature data in the verify data group corresponding to the third unique identification, and the second feature markup information of the verify data group corresponding to the third unique identification.
15. The characteristic information recognition method according to claim 14, characterized in that merging the continuous verification data unit corresponding to the third unique identification with the first verifying predicted value, and inputting the result into the continuous training pattern corresponding to the discrete training pattern of each first verifying predicted value to calculate and generate the second verifying predicted value corresponding to each continuous training pattern, comprises:
merging the continuous verification data unit corresponding to each third unique identification with the first verifying predicted value corresponding to each third unique identification of each discrete training pattern to generate the merged verification data unit corresponding to each third unique identification of each discrete training pattern;
inputting the merged verification data unit corresponding to each third unique identification of each discrete training pattern into the continuous training pattern corresponding to that discrete training pattern to calculate and generate the second verifying predicted value corresponding to each third unique identification of each continuous training pattern.
16. The characteristic information recognition method according to claim 11, characterized in that generating an optimal characteristics information identification model according to a comparison between the second feature markup information corresponding to the third unique identification and the second verifying predicted value corresponding to the third unique identification in each continuous training pattern comprises:
taking the difference between the second feature markup information corresponding to the third unique identification and the second verifying predicted value corresponding to the third unique identification in each continuous training pattern, to generate the difference corresponding to the third unique identification in each continuous training pattern;
summing the differences in each continuous training pattern to generate the validation value of the characteristic information identification model corresponding to each continuous training pattern;
sorting the validation values, and taking the characteristic information identification model corresponding to the smallest validation value as the optimal characteristics information identification model.
17. The characteristic information recognition method according to claim 11, characterized in that the optimal characteristics information identification model includes: the preset discrete model and the preset continuous model.
18. The characteristic information recognition method according to any one of claims 1 to 17, characterized in that the characteristic information includes: fraud characteristic information.
19. A characteristic information identifying system, characterized by comprising:
an acquiring unit, for obtaining the discrete data unit and the continuous data unit corresponding to a first unique identification of a data group to be predicted;
a first generation unit, for inputting the discrete data unit corresponding to the first unique identification into a preset discrete model to calculate and generate a first predicted value corresponding to the preset discrete model; the first predicted value includes: the first unique identification;
a second generation unit, for merging the continuous data unit corresponding to the first unique identification with the first predicted value and inputting the result into a preset continuous model to calculate and generate a second predicted value corresponding to the preset continuous model; the second predicted value includes: the first unique identification;
a third generation unit, for generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.
20. An electronic equipment, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor implements the steps of the characteristic information recognition method according to any one of claims 1 to 18 when executing the program.
21. A computer readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the characteristic information recognition method according to any one of claims 1 to 18 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910132261.1A CN109858633B (en) | 2019-02-22 | 2019-02-22 | Characteristic information identification method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910132261.1A CN109858633B (en) | 2019-02-22 | 2019-02-22 | Characteristic information identification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109858633A (en) | 2019-06-07 |
CN109858633B (en) | 2021-02-02 |
Family
ID=66898550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910132261.1A Active CN109858633B (en) | 2019-02-22 | 2019-02-22 | Characteristic information identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109858633B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110406530A (en) * | 2019-07-02 | 2019-11-05 | 宁波吉利汽车研究开发有限公司 | A kind of automatic Pilot method, apparatus, equipment and vehicle |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7246125B2 (en) * | 2001-06-21 | 2007-07-17 | Microsoft Corporation | Clustering of databases having mixed data attributes |
CN104778176A (en) * | 2014-01-13 | 2015-07-15 | 阿里巴巴集团控股有限公司 | Data search processing method and device |
CN105354198A (en) * | 2014-08-19 | 2016-02-24 | 中国移动通信集团湖北有限公司 | Data processing method and apparatus |
CN106548343A (en) * | 2016-10-21 | 2017-03-29 | 中国银联股份有限公司 | A kind of illegal transaction detection method and device |
CN107451266A (en) * | 2017-07-31 | 2017-12-08 | 北京京东尚科信息技术有限公司 | For processing data method and its equipment |
CN108154430A (en) * | 2017-12-28 | 2018-06-12 | 上海氪信信息技术有限公司 | A kind of credit scoring construction method based on machine learning and big data technology |
CN108491408A (en) * | 2018-01-24 | 2018-09-04 | 北京三快在线科技有限公司 | A kind of processing method of action message, device, electronic equipment and storage medium |
CN108509627A (en) * | 2018-04-08 | 2018-09-07 | 腾讯科技(深圳)有限公司 | data discretization model training method and device, data discrete method |
CN108805332A (en) * | 2018-05-07 | 2018-11-13 | 北京奇艺世纪科技有限公司 | A kind of feature evaluation method and apparatus |
Non-Patent Citations (1)
Title |
---|
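Kong Yuting (孔玉婷): "Research and Application of Classification Algorithms in Data Mining", China Master's Theses Full-text Database, Information Science and Technology (Monthly), Computer Software and Computer Applications * |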
Also Published As
Publication number | Publication date |
---|---|
CN109858633B (en) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yu et al. | Should college dropout prediction models include protected attributes? | |
Amini et al. | The role of top manager behaviours on adoption of cloud computing for small and medium enterprises | |
Söderström et al. | School choice and segregation: Evidence from an admission reform | |
Loveless et al. | Being unequal and seeing inequality: Explaining the political significance of social inequality in new market democracies | |
Arrieta et al. | Multi-objective black-box test case selection for cost-effectively testing simulation models | |
CN109800890A (en) | A kind of model prediction method and device | |
Yuan et al. | Epistatic genetic algorithm for test case prioritization | |
Sobaci | Regional development agencies in Turkey: are they examples of obligated policy transfer? | |
Karasev et al. | Hybrid logical and probabilistic models for management of socioeconomic safety | |
Jackson et al. | The Character of Top Leader on Adoption of Cloud Predictive Analysis for Urban Planning of Small and Medium Enterprises | |
CN109858633A (en) | A kind of characteristic information recognition methods and system | |
Hegele | Explaining bureaucratic power in intergovernmental relations: A network approach | |
Hasić | Post-conflict Cooperation in Multi-ethnic Local Communities of Bosnia and Herzegovina: A Qualitative Comparative Analysis of Diaspora's Role | |
Zhou et al. | Application research of grey fuzzy evaluation method in enterprise product reputation evaluation | |
Black et al. | An empirical study of a deliberation dialogue system | |
CN113127955A (en) | Building anti-seismic performance evaluation method, system, device and storage medium | |
Armstrong et al. | On the limited applicability of liquid democracy | |
Nebout et al. | When Allais meets Ulysses: Dynamic axioms and the common ratio effect | |
Sahasranamam et al. | Founding team entrepreneurial experience, external financing and social enterprise performance | |
Harten et al. | Talking about the likelihood of risks: an agent-based simulation of discussion processes in risk workshops | |
CN112560024A (en) | Block chain consensus method based on node trust evaluation | |
Gentry et al. | Can High School Counselors Help the Economics Pipeline? | |
Kunißen | The Independent Variable Problem: Welfare Stateness as an Explanatory Concept | |
Iakusheva et al. | Metamorphic testing for recommender systems | |
Abdullah | A Panel Data Approach of Determining Factors of Economic Growth for Different Income Groups of Countries. | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||