CN109858633A - Characteristic information recognition method and system

Info

Publication number: CN109858633A
Application number: CN201910132261.1A
Authority: CN
Inventors: 郭振宇; 黄炳; 刘华杰; 姜璐
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2019-06-07
Anticipated expiration: 2039-02-22
Also published as: CN109858633B

Abstract

The present invention provides a characteristic information recognition method and system. The method comprises: obtaining the discrete data unit and the continuous data unit corresponding to a first unique identifier of a data group to be predicted; inputting the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value of the preset discrete model, the first predicted value including the first unique identifier; merging the continuous data unit corresponding to the first unique identifier with the first predicted value and inputting the merged result into a preset continuous model to compute a second predicted value of the preset continuous model, the second predicted value including the first unique identifier; and generating the characteristic information of the data group to be predicted from the data group to be predicted corresponding to the first unique identifier and the second predicted value. The application can improve the efficiency with which a machine learning algorithm processes data containing both discrete and continuous data, and thereby the efficiency of characteristic information recognition performed with that algorithm.

Description

Characteristic information recognition method and system

Technical field

The present invention relates to the technical field of computer data processing, and in particular to a characteristic information recognition method and system.

Background art

At present, the field of machine learning has two main classes of algorithms: algorithms suited to discrete data and algorithms suited to continuous data. Both classes have defects, as follows:

1. Machine learning algorithms suited to discrete data (for example, logistic regression) have the following defect: when the sample data contain both discrete and continuous data, the continuous data must first be discretized, and the choice of discretization algorithm (bucketing, segmentation, LOG processing, and so on) affects the final evaluation result. The processing is therefore complex: choosing a discretization algorithm takes repeated trials before a suitable one can be identified.

2. Machine learning algorithms suited to continuous data (for example, the GBDT algorithm) have the following defect: during model training or prediction, the decision trees of GBDT must apply logical yes/no tests to discrete data. When a discrete feature has very many categories (for example, an occupation type of teacher, doctor, engineer, farmer, worker, director, actor, and so on), the GBDT decision trees become very large, which greatly reduces the processing efficiency of the algorithm.

Therefore, for data that contain both discrete and continuous data, existing machine learning algorithms involve complex processing and low efficiency, which makes characteristic information recognition performed with those algorithms inefficient.

Summary of the invention

To overcome the defects of the prior art, the present invention provides a characteristic information recognition method and system that can effectively improve the efficiency of characteristic information recognition performed with machine learning algorithms.

To achieve the above object, the present invention provides a characteristic information recognition method, comprising:

obtaining the discrete data unit and the continuous data unit corresponding to a first unique identifier of a data group to be predicted;

inputting the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value of the preset discrete model, wherein the first predicted value includes the first unique identifier;

merging the continuous data unit corresponding to the first unique identifier with the first predicted value and inputting the merged result into a preset continuous model to compute a second predicted value of the preset continuous model, wherein the second predicted value includes the first unique identifier; and

generating the characteristic information of the data group to be predicted from the data group to be predicted corresponding to the first unique identifier and the second predicted value.

The present invention also provides a characteristic information identification system, comprising:

an acquiring unit, configured to obtain the discrete data unit and the continuous data unit corresponding to a first unique identifier of a data group to be predicted;

a first generation unit, configured to input the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value of the preset discrete model, wherein the first predicted value includes the first unique identifier;

a second generation unit, configured to merge the continuous data unit corresponding to the first unique identifier with the first predicted value and input the merged result into a preset continuous model to compute a second predicted value of the preset continuous model, wherein the second predicted value includes the first unique identifier; and

a third generation unit, configured to generate the characteristic information of the data group to be predicted from the data group to be predicted corresponding to the first unique identifier and the second predicted value.

The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the characteristic information recognition method when executing the program.

The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the characteristic information recognition method.

The characteristic information recognition method and system provided by the present invention obtain the discrete data unit and the continuous data unit corresponding to the first unique identifier of a data group to be predicted; input the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute a first predicted value, which includes the first unique identifier; merge the continuous data unit corresponding to the first unique identifier with the first predicted value and input the merged result into a preset continuous model to compute a second predicted value, which includes the first unique identifier; and generate the characteristic information of the data group to be predicted from the data group to be predicted corresponding to the first unique identifier and the second predicted value. The application can improve the efficiency with which a machine learning algorithm processes data containing both discrete and continuous data, and thereby the efficiency of characteristic information recognition performed with that algorithm.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a characteristic information recognition method of the application;

Fig. 2 is a flow chart of the characteristic information recognition method in an embodiment of the application;

Fig. 3 is a flow chart of step S201 in an embodiment of the application;

Fig. 4 is a flow chart of step S205 in an embodiment of the application;

Fig. 5 is a flow chart of step S207 in an embodiment of the application;

Fig. 6 is a flow chart of step S209 in an embodiment of the application;

Fig. 7 is a flow chart of step S211 in an embodiment of the application;

Fig. 8 is a flow chart of step S212 in an embodiment of the application;

Fig. 9 is a flow chart of a fraud characteristic information recognition method in another embodiment of the application;

Fig. 10 is a schematic diagram of the generation of the discrete training model M-S1 corresponding to the logistic regression algorithm S1, and of each first training predicted value, in an embodiment of the application;

Fig. 11 is a schematic diagram of the generation of the merged training data unit T13-S1-i corresponding to each unique identifier K1-i under the logistic regression algorithm S1 in an embodiment of the application;

Fig. 12 is a schematic diagram of the generation of the continuous training model M-L1 in an embodiment of the application;

Fig. 13 is a structural schematic diagram of the characteristic information identification model Zj in an embodiment of the application;

Fig. 14 is a schematic diagram of the generation of each first verification predicted value Y1-M1-i corresponding to the discrete training model M-S1 in an embodiment of the application;

Fig. 15 is a schematic diagram of the generation of the merged verification data unit T23-S1-i for the discrete training model M-S1 in an embodiment of the application;

Fig. 16 is a schematic diagram of the generation of each second verification predicted value Y2-M1-i in the continuous training model M-L1 in an embodiment of the application;

Fig. 17 is a schematic diagram of the generation of each difference V1-i in the continuous training model M-L1 in an embodiment of the application;

Fig. 18 is a schematic diagram of the generation of the first predicted value C1-i corresponding to the discrete model M-S2 in an embodiment of the application;

Fig. 19 is a schematic diagram of the generation of the second predicted value C2-i corresponding to the continuous model M-L2 in an embodiment of the application;

Fig. 20 is a structural schematic diagram of a characteristic information identification system of the application.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

The terms "first", "second", and so on used herein do not denote any particular order or rank, nor do they limit the present invention; they only distinguish elements or operations described with the same technical term.

The terms "comprising", "including", "having", "containing", and the like used herein are open-ended terms, meaning including but not limited to.

The term "and/or" used herein covers any one of the listed items and all combinations of them.

In view of the deficiencies of the prior art, the present invention provides a characteristic information recognition method whose flow chart is shown in Fig. 1. The method comprises:

S101: Obtain the discrete data unit and the continuous data unit corresponding to the first unique identifier of a data group to be predicted.

There may be multiple data groups to be predicted; the application is not limited in this respect.

In a specific implementation, step S101 is executed as follows:

First, a first unique identifier is assigned to each obtained data group to be predicted. Each data group to be predicted includes several first feature data.

Second, each data group to be predicted is split, according to the data type of each of its first feature data, into a discrete data unit and a continuous data unit. Each discrete data unit includes a first unique identifier and the discrete-type first feature data of the data group to be predicted corresponding to that identifier. The continuous data unit corresponding to each discrete data unit includes the same first unique identifier and the continuous-type first feature data of that data group.
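The split described in step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_data_group`, the key `uid`, and the rule "numeric values are continuous, everything else is discrete" are all assumptions made for the example. In practice, integer-coded categories would need an explicit type declaration per feature.

```python
from numbers import Real
from typing import Any


def split_data_group(uid: str, features: dict[str, Any]) -> tuple[dict, dict]:
    """Split one data group into a discrete unit and a continuous unit.

    Both units carry the same unique identifier so they can be re-joined later.
    Assumption: non-boolean numbers are continuous; all other values are discrete.
    """
    discrete_unit: dict[str, Any] = {"uid": uid}
    continuous_unit: dict[str, Any] = {"uid": uid}
    for name, value in features.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, Real) and not isinstance(value, bool):
            continuous_unit[name] = value
        else:
            discrete_unit[name] = value
    return discrete_unit, continuous_unit


discrete_unit, continuous_unit = split_data_group(
    "K-1",
    {"occupation": "teacher", "age": 35, "income": 5200.5, "is_member": True},
)
```

Keeping the identifier in both units is what allows the later merge step to pair the continuous features with the discrete model's output for the same data group.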

S102: Input the discrete data unit corresponding to the first unique identifier into a preset discrete model to compute the first predicted value of the preset discrete model. The first predicted value includes the first unique identifier.

S103: Merge the continuous data unit corresponding to the first unique identifier with the first predicted value, and input the merged result into a preset continuous model to compute the second predicted value of the preset continuous model. The second predicted value includes the first unique identifier.

S104: Generate the characteristic information of the data group to be predicted from the data group to be predicted corresponding to the first unique identifier and the second predicted value.

As the flow in Fig. 1 shows, the application obtains the discrete data unit and the continuous data unit corresponding to the first unique identifier of the data group to be predicted; inputs the discrete data unit into the preset discrete model to compute the first predicted value; merges the continuous data unit with the first predicted value and inputs the merged result into the preset continuous model to compute the second predicted value; and generates the characteristic information of the data group to be predicted from the data group and the second predicted value. The application thereby reduces the probability of the fitting phenomenon caused by discrete data, keeps the machine learning algorithm simple, and makes it efficient, which improves the efficiency of characteristic information recognition performed with the machine learning algorithm.
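The chaining of steps S102 to S104 can be sketched as below. The two toy models are hypothetical stand-ins written only for this example (the text names logistic regression and GBDT as possible choices), and the shape of the returned record, including the `feature_info` key, is an assumption, since the text does not specify how the characteristic information is represented.

```python
from typing import Callable

Model = Callable[[dict], float]


def predict_feature_info(
    data_group: dict,
    discrete_unit: dict,
    continuous_unit: dict,
    discrete_model: Model,
    continuous_model: Model,
) -> dict:
    """Two-stage prediction: discrete model first, then continuous model."""
    uid = discrete_unit["uid"]
    if continuous_unit["uid"] != uid:
        raise ValueError("units belong to different data groups")
    # S102: the first predicted value carries the unique identifier.
    first_pred = {"uid": uid, "value": discrete_model(discrete_unit)}
    # S103: merge the continuous unit with the first predicted value.
    merged = {**continuous_unit, "first_pred": first_pred["value"]}
    second_pred = {"uid": uid, "value": continuous_model(merged)}
    # S104: attach the second predicted value to the original data group.
    return {"uid": uid, "data_group": data_group, "feature_info": second_pred["value"]}


def toy_discrete_model(unit: dict) -> float:
    # Hypothetical stand-in for a trained discrete model.
    return 0.75 if unit["occupation"] == "teacher" else 0.25


def toy_continuous_model(merged: dict) -> float:
    # Hypothetical stand-in for a trained continuous model.
    return 0.5 * merged["first_pred"] + (0.25 if merged["age"] > 30 else 0.0)


result = predict_feature_info(
    {"occupation": "teacher", "age": 40},
    {"uid": "K-1", "occupation": "teacher"},
    {"uid": "K-1", "age": 40},
    toy_discrete_model,
    toy_continuous_model,
)
```

The point of the design is that the continuous model never sees the raw discrete features; it sees only a single numeric summary of them, which is what keeps a tree-based continuous model from growing a branch per category.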

To help those skilled in the art better understand the present invention, a more detailed embodiment is set out below. As shown in Fig. 2, in the characteristic information recognition method provided by this embodiment, a training process and a verification process must be executed before the prediction process; the prediction process is then executed with the optimal characteristic information identification model output by the verification process. The method includes the following steps:

Step 1: prediction process

S201: Obtain the discrete data unit and the continuous data unit corresponding to the first unique identifier of a data group to be predicted.

In a specific implementation, as shown in Fig. 3, step S201 is executed as follows:

S301: Assign a first unique identifier to each obtained data group to be predicted. Each data group to be predicted includes several first feature data.

S302: Split each data group to be predicted, according to the data type of each of its first feature data, into a discrete data unit and a continuous data unit.

Each discrete data unit includes a first unique identifier and the discrete-type first feature data of the data group to be predicted corresponding to that identifier. The continuous data unit corresponding to each discrete data unit includes the same first unique identifier and the continuous-type first feature data of that data group.

S202: Input the discrete data unit corresponding to each first unique identifier into the preset discrete model to compute the first predicted value corresponding to each first unique identifier in the preset discrete model. The first predicted value includes the first unique identifier.

In a specific implementation, the preset discrete model uses an existing discrete algorithm such as the logistic regression algorithm; the application is not limited in this respect.

S203: Merge the continuous data unit corresponding to the first unique identifier with the first predicted value, and input the merged result into the preset continuous model to compute the second predicted value of the preset continuous model. The second predicted value includes the first unique identifier.

In a specific implementation, the preset continuous model uses any existing continuous algorithm, such as the GBDT algorithm; the application is not limited in this respect.

S204: Generate the characteristic information of the data group to be predicted from the data group to be predicted corresponding to the first unique identifier and the second predicted value.

Step 2: training process

S205: Obtain the discrete training data unit and the continuous training data unit corresponding to the second unique identifier of a training data group.

In a specific implementation, there are multiple training data groups, and each training data group includes one discrete training data unit and one continuous training data unit. The discrete training data unit and the continuous training data unit are placed in one-to-one correspondence through the second unique identifier of their training data group.

As shown in Fig. 4, step S205 is executed as follows:

S401: Assign a second unique identifier and a first feature markup information item to each obtained training data group. Each training data group includes several second feature data. The second unique identifier and the first feature markup information are in one-to-one correspondence.

S402: Split each training data group, according to the data type of each of its second feature data, into the discrete training data unit and the continuous training data unit of that training data group.

Each discrete training data unit includes a second unique identifier, the discrete-type second feature data of the training data group corresponding to that identifier, and the first feature markup information of that training data group. The continuous training data unit corresponding to each discrete training data unit includes the same second unique identifier, the continuous-type second feature data of that training data group, and the first feature markup information of that training data group.

The number of discrete training data units equals the number of continuous training data units, and each discrete training data unit, each continuous training data unit, and each training data group are placed in one-to-one correspondence through the second unique identifier of the training data group.

S206: Input the discrete training data unit corresponding to the second unique identifier into each preset discrete algorithm to compute the discrete training model corresponding to each preset discrete algorithm and the first training predicted value corresponding to each preset discrete algorithm. The first training predicted value includes the second unique identifier.

The preset discrete algorithms are multiple existing discrete algorithms, such as the logistic regression algorithm, the naive Bayes algorithm, and the decision tree algorithm; the application is not limited in this respect.

In a specific implementation, the discrete training data unit corresponding to each second unique identifier is input into the logistic regression algorithm to compute the discrete training model and the first training predicted values corresponding to the logistic regression algorithm.

The discrete training data unit corresponding to each second unique identifier is likewise input into the naive Bayes algorithm to compute the discrete training model and the first training predicted values corresponding to the naive Bayes algorithm. By analogy, the discrete training data unit corresponding to each second unique identifier is input into each of the other preset discrete algorithms to compute their respective discrete training models and first training predicted values.

Each first training predicted value corresponds one-to-one to a second unique identifier and is also associated one-to-one with a discrete algorithm.
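The bookkeeping in step S206 (one discrete training model per algorithm, and one first training predicted value per identifier per algorithm) can be sketched as follows. The two fitting functions are deliberately simple stand-ins invented for this example; they are not logistic regression or naive Bayes, and the keys `uid`, `occupation`, and `label` are assumed names.

```python
from collections import defaultdict
from typing import Callable

Model = Callable[[dict], float]
Algorithm = Callable[[list], Model]


def fit_category_rate(units: list) -> Model:
    """Stand-in discrete algorithm: mean label per occupation value."""
    totals = defaultdict(lambda: [0.0, 0])
    for unit in units:
        totals[unit["occupation"]][0] += unit["label"]
        totals[unit["occupation"]][1] += 1
    overall = sum(unit["label"] for unit in units) / len(units)
    rates = {key: total / count for key, (total, count) in totals.items()}
    return lambda unit: rates.get(unit["occupation"], overall)


def fit_global_rate(units: list) -> Model:
    """Stand-in discrete algorithm: overall mean label, ignoring features."""
    overall = sum(unit["label"] for unit in units) / len(units)
    return lambda unit: overall


def train_discrete_models(units: list, algorithms: dict[str, Algorithm]):
    """Fit every algorithm and record its prediction for every identifier."""
    models: dict[str, Model] = {}
    first_preds: dict[str, dict[str, float]] = {}
    for name, fit in algorithms.items():
        model = fit(units)
        models[name] = model
        # Keyed by algorithm name, then by unique identifier.
        first_preds[name] = {unit["uid"]: model(unit) for unit in units}
    return models, first_preds


training_units = [
    {"uid": "K1-1", "occupation": "teacher", "label": 1},
    {"uid": "K1-2", "occupation": "teacher", "label": 0},
    {"uid": "K1-3", "occupation": "doctor", "label": 1},
    {"uid": "K1-4", "occupation": "doctor", "label": 1},
]
models, first_preds = train_discrete_models(
    training_units,
    {"category_rate": fit_category_rate, "global_rate": fit_global_rate},
)
```

The nested dictionary mirrors the two correspondences stated in the text: each prediction is addressed by its algorithm and by its unique identifier.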

S207: Merge the continuous training data unit corresponding to the second unique identifier with the first training predicted value corresponding to each second unique identifier under each preset discrete algorithm, and input the merged result into a preset continuous algorithm to compute the continuous training model corresponding to the discrete training model of each preset discrete algorithm.

The preset continuous algorithms are multiple known continuous algorithms, such as the GBDT algorithm, the linear regression algorithm, and the K-means algorithm; the application is not limited in this respect.

As shown in Fig. 5, step S207 includes the following steps:

S501: For each preset discrete algorithm, merge the continuous training data unit corresponding to each second unique identifier with the first training predicted value corresponding to that second unique identifier, generating the merged training data unit corresponding to each second unique identifier of that preset discrete algorithm.

The merged training data units correspond one-to-one to the preset discrete algorithms and, within each preset discrete algorithm, one-to-one to the second unique identifiers.

In a specific implementation, the continuous training data unit corresponding to each second unique identifier is merged with the first training predicted value corresponding to that second unique identifier under the logistic regression algorithm, generating the merged training data unit corresponding to each second unique identifier of the logistic regression algorithm.

The continuous training data unit corresponding to each second unique identifier is likewise merged with the first training predicted value corresponding to that second unique identifier under the naive Bayes algorithm, generating the merged training data unit corresponding to each second unique identifier of the naive Bayes algorithm. By analogy, the continuous training data unit corresponding to each second unique identifier is merged with the first training predicted value corresponding to that second unique identifier under each of the other preset discrete algorithms, generating the merged training data units of those algorithms.
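The merge in step S501 amounts to a join on the second unique identifier, repeated once per discrete algorithm. A minimal sketch, assuming the continuous units are dictionaries with a `uid` key and the first training predicted values are stored per algorithm and per identifier (the labels `S1` and `S2`, the function name, and the `first_pred` column name are illustrative):

```python
def merge_training_units(
    continuous_units: list,
    first_preds_by_algorithm: dict[str, dict[str, float]],
) -> dict[str, list]:
    """Join continuous training units with each algorithm's first predictions."""
    by_uid = {unit["uid"]: unit for unit in continuous_units}
    merged: dict[str, list] = {}
    for algorithm, preds in first_preds_by_algorithm.items():
        # One merged unit per identifier; a missing identifier raises KeyError.
        merged[algorithm] = [
            {**by_uid[uid], "first_pred": pred} for uid, pred in preds.items()
        ]
    return merged


continuous_training_units = [
    {"uid": "K1-1", "age": 30, "label": 1},
    {"uid": "K1-2", "age": 50, "label": 0},
]
merged_units = merge_training_units(
    continuous_training_units,
    {
        "S1": {"K1-1": 0.75, "K1-2": 0.25},
        "S2": {"K1-1": 0.5, "K1-2": 0.5},
    },
)
```

Each algorithm gets its own list of merged units, so a separate continuous model can be trained for each discrete model.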

S502: Input each merged training data unit into a preset continuous algorithm to compute the continuous training model corresponding to the discrete training model of each preset discrete algorithm. There may be one preset continuous algorithm or several; the application is not limited in this respect.

The preset continuous algorithms include the GBDT algorithm, the linear regression algorithm, the K-means algorithm, and so on; the application is not limited in this respect.

In a specific implementation, the merged training data unit corresponding to each second unique identifier of the logistic regression algorithm is input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of the logistic regression algorithm.

The merged training data units of different preset discrete algorithms may be input into the same preset continuous algorithm (as in Example 1) or into different preset continuous algorithms (as in Example 2).

Example 1: The merged training data unit corresponding to each second unique identifier of the logistic regression algorithm is input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of the logistic regression algorithm. By analogy, the merged training data unit corresponding to each second unique identifier of each other preset discrete algorithm is input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of that algorithm.

Example 2: The merged training data unit corresponding to each second unique identifier of the logistic regression algorithm is input into the GBDT algorithm to compute the continuous training model corresponding to the discrete training model of the logistic regression algorithm.

The merged training data unit corresponding to each second unique identifier of the naive Bayes algorithm is input into the linear regression algorithm to compute the continuous training model corresponding to the discrete training model of the naive Bayes algorithm.

The merged training data unit corresponding to each second unique identifier of each other preset discrete algorithm is input into any one of the GBDT algorithm, the linear regression algorithm, or the K-means algorithm to compute the continuous training model corresponding to the discrete training model of that algorithm.

S208: Merge each discrete training model with its corresponding continuous training model to generate a characteristic information identification model.

In a specific implementation, there are multiple characteristic information identification models, each of which includes one discrete training model and the continuous training model corresponding to that discrete training model.

The discrete training model corresponding to the logistic regression algorithm is merged with the continuous training model corresponding to that discrete training model to generate one characteristic information identification model.

The discrete training model corresponding to the naive Bayes algorithm is merged with the continuous training model corresponding to that discrete training model to generate another characteristic information identification model.

By analogy, the discrete training model corresponding to each other preset discrete algorithm is merged with the continuous training model corresponding to that discrete training model to generate the other characteristic information identification models.
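One way to represent the pairing in step S208 is a small container that holds a discrete model and its matching continuous model and applies them in sequence. The class name `IdentificationModel`, the helper `build_identification_models`, and the toy lambdas are hypothetical; the sketch assumes both kinds of model are plain callables keyed by the name of the discrete algorithm.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[dict], float]


@dataclass(frozen=True)
class IdentificationModel:
    """A discrete training model paired with its continuous training model."""

    name: str
    discrete_model: Model
    continuous_model: Model

    def predict(self, discrete_unit: dict, continuous_unit: dict) -> float:
        first_pred = self.discrete_model(discrete_unit)
        merged = {**continuous_unit, "first_pred": first_pred}
        return self.continuous_model(merged)


def build_identification_models(
    discrete_models: dict[str, Model],
    continuous_models: dict[str, Model],
) -> list:
    """Pair models that were trained from the same discrete algorithm."""
    return [
        IdentificationModel(name, discrete_models[name], continuous_models[name])
        for name in discrete_models
    ]


candidates = build_identification_models(
    {"S1": lambda unit: 0.75, "S2": lambda unit: 0.25},
    {
        "S1": lambda merged: merged["first_pred"] * 2,
        "S2": lambda merged: merged["first_pred"] + merged["age"],
    },
)
```

Pairing by name guarantees that a continuous model is only ever fed predictions from the discrete model it was trained against.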

Step 3: verification process

S209: Obtain the third unique identifier and the second feature markup information of a verification data group, together with the discrete verification data unit and the continuous verification data unit corresponding to the third unique identifier.

There are multiple verification data groups, and each verification data group includes one discrete verification data unit and one continuous verification data unit. The discrete verification data unit and the continuous verification data unit are placed in one-to-one correspondence through the third unique identifier of their verification data group; the second feature markup information and the third unique identifier are in one-to-one correspondence.

In a specific implementation, as shown in Fig. 6, step S209 is executed as follows:

S601: Assign a third unique identifier and a second feature markup information item to each obtained verification data group. Each verification data group includes several third feature data.

S602: Split each verification data group, according to the data type of each of its third feature data, into the discrete verification data unit and the continuous verification data unit of that verification data group.

Each discrete verification data unit includes a third unique identifier, the discrete-type third feature data of the verification data group corresponding to that identifier, and the second feature markup information of that verification data group. The continuous verification data unit corresponding to each discrete verification data unit includes the same third unique identifier, the continuous-type third feature data of that verification data group, and the second feature markup information of that verification data group.

S210: the corresponding discrete verification data unit of third unique identification is inputted into each discrete training pattern respectively and calculates life Predicted value is verified at each discrete training pattern corresponding first.Wherein, the first verifying predicted value includes: third unique identification.

S211: it is inputted after the corresponding continuous verification data unit of third unique identification is merged with the first verifying predicted value every It is corresponding that the corresponding continuous training pattern of discrete training pattern of a first verifying predicted value calculates each continuous training pattern of generation Second verifying predicted value.Wherein, the second verifying predicted value includes: third unique identification.

In a specific implementation, as shown in Fig. 7, step S211 includes the following steps:

S701: The continuous verification data unit corresponding to each third unique identification is merged with the first verification predicted value corresponding to that third unique identification in each discrete training model, generating the merged verification data unit corresponding to each third unique identification of each discrete training model.

S702: The merged verification data unit corresponding to each third unique identification of each discrete training model is input into the continuous training model corresponding to that discrete training model, and the second verification predicted value corresponding to each third unique identification of each continuous training model is calculated.

S212: The optimal characteristic information identification model is generated according to the second feature markup information corresponding to each third unique identification and the second verification predicted value corresponding to each third unique identification in each continuous training model.

The optimal characteristic information identification model includes the preset discrete model and the preset continuous model.

In a specific implementation, as shown in Fig. 8, step S212 is executed as follows; the application is not limited thereto:

S801: For each continuous training model, the difference is taken between the second feature markup information corresponding to each third unique identification and the second verification predicted value corresponding to that third unique identification in the continuous training model, generating the difference corresponding to each third unique identification in each continuous training model.

S802: The differences corresponding to the third unique identifications in each continuous training model are summed to generate the validation value of the characteristic information identification model corresponding to that continuous training model.

S803: The validation values of the characteristic information identification models are sorted, and the characteristic information identification model with the smallest validation value is taken as the optimal characteristic information identification model.
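Steps S801 to S803 can be written compactly. The notation below is an assumption introduced only for illustration: $B_i$ is the second feature markup information for the $i$-th third unique identification, $Y_{j,i}$ is the second verification predicted value produced by the $j$-th continuous training model for that identification, and $N$ is the number of verification data groups.

```latex
V_{j,i} = B_i - Y_{j,i}, \qquad
Q_j = \sum_{i=1}^{N} V_{j,i}, \qquad
j^{*} = \arg\min_{j} Q_j
```

Here $V_{j,i}$ is the difference of S801, $Q_j$ is the validation value of S802, and $j^{*}$ indexes the model selected in S803.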

In one embodiment, the characteristic information includes fraud characteristic information, potential major-customer characteristic information, and the like; the application is not limited thereto.

To help those skilled in the art better understand the present invention, a more detailed scenario is set forth below as Embodiment One.

As shown in Fig. 9, an embodiment of the present invention provides a fraud characteristic information identification method, which includes the following steps:

Step 1: training process

The unique identification Ki is set as a positive integer, and the feature markup information Bi takes the value 1 or 0, where i is a positive integer greater than or equal to 1. When the feature markup information Bi is 1, the training data group has the fraud feature; when Bi is 0, the training data group does not have the fraud feature. A training data group with the fraud feature is identified as a fraudulent client, and a training data group without the fraud feature is identified as a non-fraudulent client.

S901: A unique identification K1 and a piece of first feature markup information B1 are set for each acquired training data group H. The unique identifications K1 and the pieces of first feature markup information B1 are in one-to-one correspondence.

The unique identification K1 is a positive integer, and the first feature markup information B1 takes the value 1 or 0. When the first feature markup information B1 is 1, the training data group H has the fraud feature; when B1 is 0, the training data group H does not have the fraud feature. A training data group with the fraud feature is identified as a fraudulent client, and a training data group without the fraud feature is identified as a non-fraudulent client.

Each training data group H includes several pieces of feature data T1, as shown in Table 1.

Table 1

The training data group H includes feature data T1 such as age, city, occupation, and income; the application is not limited thereto. T1-city and T1-occupation are set as discrete-type feature data, and T1-age and T1-income are continuous-type feature data. There are multiple training data groups H, including H1, H2, ..., H99999999; the application is not limited thereto.

S902: Each training data group H is split, according to the data type of each piece of its feature data T1, to generate the discrete training data unit T11 and the continuous training data unit T12 corresponding to that training data group H.

Each discrete training data unit T11-i includes: the unique identification K1-i, the discrete-type feature data T1 in the training data group Hi corresponding to the unique identification K1-i, and the first feature markup information B1-i of that training data group Hi, where i is a positive integer greater than or equal to 1.

Specifically, the discrete training data unit T11-i includes: feature data T1-city, feature data T1-occupation, the unique identification K1-i, and the first feature markup information B1-i. As shown in Table 2, the discrete training data units T11-i corresponding to the training data groups Hi are T11-1, T11-2, ..., T11-99999999, where i = 1, 2, ..., 99999999.

Table 2

Training data group H	Discrete training data unit T11	Unique identification K1	T1-city	T1-occupation	First feature markup information B1
H1	T11-1	00000001	021 (Shanghai)	0 (teacher)	0
H2	T11-2	00000002	010 (Beijing)	1 (doctor)	0
……	……	……	……	……	……
H99999999	T11-99999999	99999999	020 (Guangzhou)	5 (unemployed)	1

The continuous training data unit T12-i corresponding to each discrete training data unit T11-i includes: the unique identification K1-i, the continuous-type feature data T1 in the training data group Hi corresponding to the unique identification K1-i, and the first feature markup information B1-i of that training data group Hi, where i is a positive integer greater than or equal to 1, as shown in Table 3.

Table 3

Training data group H	Continuous training data unit T12	Unique identification K1	T1-age	T1-income	First feature markup information B1
H1	T12-1	00000001	20	100000	0
H2	T12-2	00000002	60	150000	0
……	……	……	……	……	……
H99999999	T12-99999999	99999999	30	2000	1

Specifically, the continuous training data unit T12-i includes: feature data T1-age, feature data T1-income, the unique identification K1-i, and the first feature markup information B1-i, as shown in Table 3. The continuous training data units T12-i corresponding to the training data groups Hi are T12-1, T12-2, ..., T12-99999999, where i = 1, 2, ..., 99999999.

The number of discrete training data units T11-i generated from the training data groups Hi equals the number of continuous training data units T12-i, and each discrete training data unit T11-i, each continuous training data unit T12-i, and each training data group Hi are placed in one-to-one correspondence through the unique identification K1-i of that training data group Hi.
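The split of step S902 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary layout, the field names `key` and `label`, and the function `split_group` are hypothetical. The assignment of city and occupation to the discrete unit and of age and income to the continuous unit follows the embodiment above.

```python
from typing import Any

# Assumed feature typing, following the embodiment: city and occupation are
# discrete-type feature data; age and income are continuous-type feature data.
DISCRETE_FEATURES = ("city", "occupation")
CONTINUOUS_FEATURES = ("age", "income")


def split_group(key: int, group: dict[str, Any], label: int) -> tuple[dict, dict]:
    """Split one training data group into a discrete and a continuous unit.

    Both units carry the same unique identification (key) and the same
    feature markup information (label), so they can be re-associated later.
    """
    discrete = {"key": key, "label": label}
    continuous = {"key": key, "label": label}
    for name in DISCRETE_FEATURES:
        discrete[name] = group[name]
    for name in CONTINUOUS_FEATURES:
        continuous[name] = group[name]
    return discrete, continuous


# One row modeled on the first rows of Tables 2 and 3.
h1 = {"city": "021", "occupation": 0, "age": 20, "income": 100000}
t11_1, t12_1 = split_group(1, h1, 0)
```

Carrying the key in both units is what later allows a prediction made on the discrete unit to be attached to the matching continuous unit.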

S903: Each discrete training data unit T11-i is input into each preset discrete algorithm Sj, and the discrete training model M-Sj corresponding to each preset discrete algorithm Sj and the first training predicted value X-Sj-i corresponding to each preset discrete algorithm Sj are calculated, where i and j are positive integers greater than or equal to 1. The first training predicted value X-Sj-i includes the unique identification K1-i and the predicted value Xi, where i = 1, 2, 3, ..., 99999999.

In a specific implementation, the preset discrete algorithms Sj include a logistic regression algorithm S1, a naive Bayes algorithm S2, a decision tree algorithm S3, and the like; the application is not limited thereto, where j = 1, 2, 3, ....

As shown in Fig. 10, T11-1 corresponding to the unique identification K1-1, T11-2 corresponding to the unique identification K1-2, ..., and T11-99999999 corresponding to the unique identification K1-99999999 are all input into the logistic regression algorithm S1, generating the discrete training model M-S1 corresponding to the logistic regression algorithm S1 and the first training predicted values X-S1-1, X-S1-2, ..., X-S1-99999999 corresponding to the unique identifications K1-i in the logistic regression algorithm S1.

The first training predicted value X-S1-1 includes the unique identification K1-1 and the training result value X1; the first training predicted value X-S1-2 includes the unique identification K1-2 and the training result value X2; ...; and the first training predicted value X-S1-99999999 includes the unique identification K1-99999999 and the training result value X99999999, as shown in Table 4.

Table 4

First training predicted value X-S1-i	Unique identification K1-i	Training result value Xi
X-S1-1	00000001	-2.45
X-S1-2	00000002	-4.56
……	……	……
X-S1-99999999	99999999	10.23

Following the calculation process described above for the discrete training model M-S1 and the first training predicted values X-S1-i of the logistic regression algorithm S1, the discrete training model M-Sj and the first training predicted values X-Sj-i corresponding to each preset discrete algorithm Sj are calculated in turn.

Each first training predicted value X-Sj-i corresponds one-to-one with a unique identification K1-i, and each first training predicted value X-Sj-i also corresponds to the preset discrete algorithm Sj that produced it.

S904: The continuous training data unit T12-i corresponding to each unique identification K1-i is merged with the first training predicted value X-Sj-i corresponding to that unique identification K1-i in each preset discrete algorithm Sj, generating the merged training data unit T13-Sj-i corresponding to each unique identification K1-i of each preset discrete algorithm Sj, where i and j are positive integers greater than or equal to 1.

Each merged training data unit T13-Sj-i corresponds to a preset discrete algorithm Sj and, within that preset discrete algorithm Sj, corresponds one-to-one with a unique identification K1-i.
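Step S903 can be illustrated with a small sketch in which each preset discrete algorithm is a function that takes the discrete training data units and returns a fitted model. Everything here is assumed for illustration: the names `fit_fraud_rate`, `algorithms`, `models`, and `first_predictions` are hypothetical, and the per-category fraud-rate estimator is only a toy stand-in for logistic regression, naive Bayes, or a decision tree, none of which is implemented here.

```python
from collections import defaultdict
from typing import Callable

Unit = dict
Model = Callable[[Unit], float]


def fit_fraud_rate(units: list[Unit], feature: str) -> Model:
    """Toy discrete algorithm: score a unit by the observed fraud rate
    of its value for one discrete feature."""
    totals: dict = defaultdict(int)
    frauds: dict = defaultdict(int)
    for u in units:
        totals[u[feature]] += 1
        frauds[u[feature]] += u["label"]
    overall = sum(frauds.values()) / len(units)

    def model(u: Unit) -> float:
        n = totals.get(u[feature], 0)
        # Fall back to the overall rate for a value never seen in training.
        return frauds[u[feature]] / n if n else overall

    return model


# Hypothetical stand-ins for the preset discrete algorithms S1 and S2.
algorithms = {
    "S1": lambda us: fit_fraud_rate(us, "city"),
    "S2": lambda us: fit_fraud_rate(us, "occupation"),
}

units = [
    {"key": 1, "city": "021", "occupation": 0, "label": 0},
    {"key": 2, "city": "010", "occupation": 1, "label": 0},
    {"key": 3, "city": "020", "occupation": 5, "label": 1},
    {"key": 4, "city": "020", "occupation": 0, "label": 0},
]

# One discrete training model per algorithm, trained on the same units.
models = {name: fit(units) for name, fit in algorithms.items()}
# First training predicted values, keyed by algorithm and then by unique identification.
first_predictions = {
    name: {u["key"]: model(u) for u in units} for name, model in models.items()
}
```

Keying the predictions by unique identification mirrors the requirement that each first training predicted value carries its K1-i.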

In a specific implementation, as shown in Fig. 11, the continuous training data unit T12-1 corresponding to the unique identification K1-1 is merged with the first training predicted value X-S1-1 corresponding to the unique identification K1-1 in the logistic regression algorithm S1, generating the merged training data unit T13-S1-1 corresponding to the unique identification K1-1 in the logistic regression algorithm S1; the continuous training data unit T12-2 corresponding to the unique identification K1-2 is merged with the first training predicted value X-S1-2 corresponding to the unique identification K1-2 in the logistic regression algorithm S1, generating the merged training data unit T13-S1-2 corresponding to the unique identification K1-2 in the logistic regression algorithm S1; ...; and the continuous training data unit T12-99999999 corresponding to the unique identification K1-99999999 is merged with the first training predicted value X-S1-99999999 corresponding to the unique identification K1-99999999 in the logistic regression algorithm S1, generating the merged training data unit T13-S1-99999999 corresponding to the unique identification K1-99999999 in the logistic regression algorithm S1.

By analogy, the continuous training data unit T12-i corresponding to each unique identification K1-i is merged with the first training predicted value X-Sj-i corresponding to that unique identification K1-i in each of the other preset discrete algorithms Sj, generating the merged training data unit T13-Sj-i corresponding to each unique identification K1-i in that preset discrete algorithm Sj.
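The merge of step S904 amounts to a join on the unique identification. The sketch below is illustrative only: the function `merge_units` and the field name `first_prediction` are hypothetical, and the two prediction values are taken from Table 4.

```python
def merge_units(continuous_units: list[dict], predictions: dict[int, float]) -> list[dict]:
    """Attach each first training predicted value to the continuous unit
    that has the same unique identification (key)."""
    merged = []
    for unit in continuous_units:
        key = unit["key"]
        if key not in predictions:
            raise KeyError(f"no first predicted value for unique identification {key}")
        # Build a new dict so the original continuous unit is left unchanged.
        merged.append({**unit, "first_prediction": predictions[key]})
    return merged


# Continuous units modeled on Table 3 and predictions modeled on Table 4.
t12 = [
    {"key": 1, "age": 20, "income": 100000, "label": 0},
    {"key": 2, "age": 60, "income": 150000, "label": 0},
]
x_s1 = {1: -2.45, 2: -4.56}
t13_s1 = merge_units(t12, x_s1)
```

Because one merged set is produced per discrete algorithm, calling `merge_units` once for each algorithm's predictions would yield the sets T13-S1, T13-S2, and so on.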

S905: Each merged training data unit T13-Sj-i is input into at least one preset continuous algorithm Lj, and the continuous training model M-Lj corresponding to the discrete training model M-Sj of each preset discrete algorithm Sj is calculated.

There may be one or multiple preset continuous algorithms Lj; the application is not limited thereto. The preset continuous algorithms Lj include a GBDT algorithm L1, a linear regression algorithm L2, a K-means algorithm L3, and the like; the application is not limited thereto, where j = 1, 2, 3, ....

In a specific implementation, as shown in Fig. 12, the merged training data unit T13-S1-1 corresponding to the unique identification K1-1 of the logistic regression algorithm S1, the merged training data unit T13-S1-2 corresponding to the unique identification K1-2, ..., and the merged training data unit T13-S1-99999999 corresponding to the unique identification K1-99999999 are all input into the GBDT algorithm L1, generating the continuous training model M-L1 corresponding to the discrete training model M-S1 of the logistic regression algorithm S1.

In this embodiment, the merged training data units T13-Sj-i corresponding to the unique identifications K1-i of different preset discrete algorithms Sj are set to be input into different preset continuous algorithms Lj, for example:

The merged training data unit T13-S2-1 corresponding to the unique identification K1-1 of the naive Bayes algorithm S2, the merged training data unit T13-S2-2 corresponding to the unique identification K1-2, ..., and the merged training data unit T13-S2-99999999 corresponding to the unique identification K1-99999999 are all input into the linear regression algorithm L2, generating the continuous training model M-L2 corresponding to the discrete training model M-S2 of the naive Bayes algorithm S2.

S906: Each discrete training model M-Sj is combined with the continuous training model M-Lj corresponding to that discrete training model M-Sj, generating each characteristic information identification model Zj.

As shown in Fig. 13, there are multiple characteristic information identification models Zj, and each characteristic information identification model Zj includes one discrete training model M-Sj and the continuous training model M-Lj corresponding to that discrete training model M-Sj.

In a specific implementation, the discrete training model M-S1 corresponding to the logistic regression algorithm S1 is combined with the continuous training model M-L1 corresponding to the discrete training model M-S1, generating the characteristic information identification model Z1. The discrete training model M-S2 corresponding to the naive Bayes algorithm S2 is combined with the continuous training model M-L2 corresponding to the discrete training model M-S2, generating the characteristic information identification model Z2. By analogy, the discrete training model M-Sj corresponding to each other preset discrete algorithm Sj is combined with the continuous training model M-Lj corresponding to that discrete training model M-Sj, generating the other characteristic information identification models Zj.
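One way to picture a characteristic information identification model Zj is as a pair of callables applied in sequence. The sketch below is an assumption-laden illustration: the class `IdentificationModel`, its field names, and the two toy scoring rules are hypothetical and are not learned from data; they stand in for a trained discrete training model M-Sj and its continuous training model M-Lj.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IdentificationModel:
    """One model Zj: a discrete model M-Sj plus its continuous model M-Lj."""
    discrete_model: Callable[[dict], float]
    continuous_model: Callable[[dict], float]

    def predict(self, discrete_unit: dict, continuous_unit: dict) -> float:
        # Stage 1: score the discrete features.
        first = self.discrete_model(discrete_unit)
        # Stage 2: merge that score into the continuous unit and score again.
        merged = {**continuous_unit, "first_prediction": first}
        return self.continuous_model(merged)


# Toy stand-ins for trained models (hypothetical rules, for illustration only).
def toy_discrete(unit: dict) -> float:
    return 1.0 if unit["city"] == "020" else 0.0


def toy_continuous(unit: dict) -> float:
    low_income = 0.5 if unit["income"] < 10000 else 0.0
    return 0.5 * unit["first_prediction"] + low_income


z1 = IdentificationModel(toy_discrete, toy_continuous)
score = z1.predict({"key": 3, "city": "020"}, {"key": 3, "income": 2000})
```

Keeping the two models together in one object makes it straightforward to validate and rank the candidates Z1, Z2, ... as units.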

Step 2: verification process

S907: A unique identification K2 and a piece of second feature markup information B2 are set for each acquired verification data group G. The unique identifications K2 and the pieces of second feature markup information B2 are in one-to-one correspondence.

The unique identification K2 is a positive integer, and the second feature markup information B2 takes the value 1 or 0. When the second feature markup information B2 is 1, the verification data group G has the fraud feature; when B2 is 0, the verification data group G does not have the fraud feature. A verification data group G with the fraud feature is identified as a fraudulent client, and a verification data group G without the fraud feature is identified as a non-fraudulent client.

Each verification data group G includes several pieces of feature data T2, as shown in Table 5.

The verification data group G includes feature data T2 such as age, city, occupation, and income; the application is not limited thereto. T2-city and T2-occupation are set as discrete-type feature data, and T2-age and T2-income are continuous-type feature data. There are multiple verification data groups G, including G1, G2, ..., G99999999; the application is not limited thereto.

Table 5

S908: Each verification data group G is split, according to the data type of each piece of its feature data T2, to generate the discrete verification data unit T21 and the continuous verification data unit T22 of that verification data group G.

Each discrete verification data unit T21-i includes: the unique identification K2-i, the discrete-type feature data T2 in the verification data group Gi corresponding to the unique identification K2-i, and the second feature markup information B2-i of that verification data group Gi, where i is a positive integer greater than or equal to 1.

Specifically, the discrete verification data unit T21-i includes: feature data T2-city, feature data T2-occupation, the unique identification K2-i, and the second feature markup information B2-i. As shown in Table 6, the discrete verification data units T21-i corresponding to the verification data groups Gi are T21-1, T21-2, ..., T21-99999999, where i = 1, 2, ..., 99999999.

Table 6

Verification data group G	Discrete verification data unit T21	Unique identification K2	T2-city	T2-occupation	Second feature markup information B2
G1	T21-1	00000001	021 (Shanghai)	0 (teacher)	0
G2	T21-2	00000002	010 (Beijing)	1 (doctor)	0
……	……	……	……	……	……
G99999999	T21-99999999	99999999	020 (Guangzhou)	5 (unemployed)	1

Table 7

Verification data group G	Continuous verification data unit T22	Unique identification K2	T2-age	T2-income	Second feature markup information B2
G1	T22-1	00000001	20	100000	0
G2	T22-2	00000002	60	150000	0
……	……	……	……	……	……
G99999999	T22-99999999	99999999	30	2000	1

The continuous verification data unit T22-i corresponding to each discrete verification data unit T21-i includes: the unique identification K2-i, the continuous-type feature data T2 in the verification data group Gi corresponding to the unique identification K2-i, and the second feature markup information B2-i of that verification data group Gi, where i is a positive integer greater than or equal to 1, as shown in Table 7.

Specifically, the continuous verification data unit T22-i includes: feature data T2-age, feature data T2-income, the unique identification K2-i, and the second feature markup information B2-i, as shown in Table 7. The continuous verification data units T22-i corresponding to the verification data groups Gi are T22-1, T22-2, ..., T22-99999999, where i = 1, 2, ..., 99999999.

The number of discrete verification data units T21-i generated from the verification data groups Gi equals the number of continuous verification data units T22-i, and each discrete verification data unit T21-i, each continuous verification data unit T22-i, and each verification data group Gi are placed in one-to-one correspondence through the unique identification K2-i of that verification data group Gi.
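The one-to-one correspondence stated above can be checked before any model is applied. The sketch below is a hypothetical helper, not part of the described method; the name `check_one_to_one` and the `key` field are assumptions, and the sample rows are modeled on Tables 6 and 7.

```python
def check_one_to_one(discrete_units: list[dict], continuous_units: list[dict]) -> bool:
    """Return True if both lists carry exactly the same unique
    identifications, each appearing once; raise ValueError otherwise."""
    d_keys = [u["key"] for u in discrete_units]
    c_keys = [u["key"] for u in continuous_units]
    if len(set(d_keys)) != len(d_keys) or len(set(c_keys)) != len(c_keys):
        raise ValueError("a unique identification appears more than once")
    if set(d_keys) != set(c_keys):
        raise ValueError("discrete and continuous units do not share the same identifications")
    return True


t21 = [
    {"key": 1, "city": "021", "occupation": 0, "label": 0},
    {"key": 2, "city": "010", "occupation": 1, "label": 0},
]
t22 = [
    {"key": 1, "age": 20, "income": 100000, "label": 0},
    {"key": 2, "age": 60, "income": 150000, "label": 0},
]
units_match = check_one_to_one(t21, t22)
```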

S909: Each discrete verification data unit T21-i is input into each discrete training model M-Sj, and the first verification predicted values Y1-Mj-i corresponding to each discrete training model M-Sj are calculated, where i and j are positive integers greater than or equal to 1.

The first verification predicted value Y1-Mj-i includes the unique identification K2-i and the first verification result value Y1-i, where i = 1, 2, 3, ..., 99999999.

In a specific implementation, as shown in Fig. 14, the discrete verification data unit T21-1, the discrete verification data unit T21-2, ..., and the discrete verification data unit T21-99999999 are input into the discrete training model M-S1, generating the first verification predicted values Y1-M1-1, Y1-M1-2, ..., Y1-M1-99999999 corresponding to the discrete training model M-S1.

The first verification predicted value Y1-M1-1 includes the unique identification K2-1 and the first verification result value Y1-1; the first verification predicted value Y1-M1-2 includes the unique identification K2-2 and the first verification result value Y1-2; ...; and the first verification predicted value Y1-M1-99999999 includes the unique identification K2-99999999 and the first verification result value Y1-99999999, as shown in Table 8.

Table 8

First verification predicted value Y1-M1-i	Unique identification K2-i	First verification result value Y1-i
Y1-M1-1	00000001	-2.45
Y1-M1-2	00000002	-4.56
……	……	……
Y1-M1-99999999	99999999	10.23

Following the calculation process described above for the first verification predicted values Y1-M1-i corresponding to the discrete training model M-S1, the first verification predicted values Y1-Mj-i corresponding to the other discrete training models M-Sj are calculated in turn. Each first verification predicted value Y1-Mj-i corresponds one-to-one with a unique identification K2-i.

S910: The continuous verification data unit T22-i corresponding to each unique identification K2-i is merged with the first verification predicted value Y1-Mj-i corresponding to that unique identification K2-i in each discrete training model M-Sj, generating the merged verification data unit T23-Sj-i corresponding to each unique identification K2-i of each discrete training model M-Sj, where i and j are positive integers greater than or equal to 1.

Each merged verification data unit T23-Sj-i corresponds to a discrete training model M-Sj and, within that discrete training model M-Sj, corresponds one-to-one with a unique identification K2-i.

In a specific implementation, as shown in Fig. 15, the continuous verification data unit T22-1 corresponding to the unique identification K2-1 is merged with the first verification predicted value Y1-M1-1 corresponding to the unique identification K2-1 in the discrete training model M-S1, generating the merged verification data unit T23-S1-1 corresponding to the unique identification K2-1 of the discrete training model M-S1; the continuous verification data unit T22-2 corresponding to the unique identification K2-2 is merged with the first verification predicted value Y1-M1-2 corresponding to the unique identification K2-2 in the discrete training model M-S1, generating the merged verification data unit T23-S1-2 corresponding to the unique identification K2-2 of the discrete training model M-S1; ...; and the continuous verification data unit T22-99999999 corresponding to the unique identification K2-99999999 is merged with the first verification predicted value Y1-M1-99999999 corresponding to the unique identification K2-99999999 in the discrete training model M-S1, generating the merged verification data unit T23-S1-99999999 corresponding to the unique identification K2-99999999 of the discrete training model M-S1.

By analogy, the continuous verification data unit T22-i corresponding to each unique identification K2-i is merged with the first verification predicted value Y1-Mj-i corresponding to that unique identification K2-i in each of the other discrete training models M-Sj, generating the merged verification data unit T23-Sj-i corresponding to each unique identification K2-i of that discrete training model M-Sj.

S911: The merged verification data unit T23-Sj-i corresponding to each unique identification K2-i of each discrete training model M-Sj is input into the continuous training model M-Lj corresponding to that discrete training model M-Sj, and the second verification predicted value Y2-Mj-i corresponding to each unique identification K2-i of each continuous training model M-Lj is calculated, where i and j are positive integers greater than or equal to 1.

The second verification predicted value Y2-Mj-i includes the unique identification K2-i and the second verification result value Y2-i, where i = 1, 2, 3, ..., 99999999.

In a specific implementation, as shown in Fig. 16, the merged verification data unit T23-S1-1 corresponding to the unique identification K2-1 in the discrete training model M-S1, the merged verification data unit T23-S1-2 corresponding to the unique identification K2-2, ..., and the merged verification data unit T23-S1-99999999 corresponding to the unique identification K2-99999999 are input into the continuous training model M-L1 corresponding to the discrete training model M-S1, generating the second verification predicted value Y2-M1-1 corresponding to the unique identification K2-1 of the continuous training model M-L1, the second verification predicted value Y2-M1-2 corresponding to the unique identification K2-2, ..., and the second verification predicted value Y2-M1-99999999 corresponding to the unique identification K2-99999999.

The second verification predicted value Y2-M1-1 includes the unique identification K2-1 and the second verification result value Y2-1; the second verification predicted value Y2-M1-2 includes the unique identification K2-2 and the second verification result value Y2-2; ...; and the second verification predicted value Y2-M1-99999999 includes the unique identification K2-99999999 and the second verification result value Y2-99999999, as shown in Table 9.

Table 9

Second verification predicted value Y2-M1-i	Unique identification K2-i	Second verification result value Y2-i
Y2-M1-1	00000001	-2.45
Y2-M1-2	00000002	-4.56
……	……	……
Y2-M1-99999999	99999999	10.23

The merged verification data unit T23-S2-1 corresponding to the unique identification K2-1 in the discrete training model M-S2, the merged verification data unit T23-S2-2 corresponding to the unique identification K2-2, ..., and the merged verification data unit T23-S2-99999999 corresponding to the unique identification K2-99999999 are input into the continuous training model M-L2 corresponding to the discrete training model M-S2, generating the second verification predicted value Y2-M2-1 corresponding to the unique identification K2-1 of the continuous training model M-L2, the second verification predicted value Y2-M2-2 corresponding to the unique identification K2-2, ..., and the second verification predicted value Y2-M2-99999999 corresponding to the unique identification K2-99999999.

Following the calculation process described above for the second verification predicted values Y2-M1-i corresponding to the discrete training model M-S1 and the second verification predicted values Y2-M2-i corresponding to the discrete training model M-S2, the second verification predicted values Y2-Mj-i corresponding to the other discrete training models M-Sj are calculated in turn. Each second verification predicted value Y2-Mj-i corresponds one-to-one with a unique identification K2-i.

S912: For each continuous training model M-Lj, the difference is taken between the second feature markup information B2-i corresponding to each unique identification K2-i and the second verification predicted value Y2-Mj-i corresponding to that unique identification K2-i in the continuous training model M-Lj, generating the difference Vj-i corresponding to each unique identification K2-i in each continuous training model M-Lj.

In a specific implementation, as shown in Fig. 17, the difference is taken between the second feature markup information B2-1 corresponding to the unique identification K2-1 and the second verification predicted value Y2-M1-1 corresponding to the unique identification K2-1 in the continuous training model M-L1, generating the difference V1-1 corresponding to the unique identification K2-1 in the continuous training model M-L1; the difference is taken between the second feature markup information B2-2 corresponding to the unique identification K2-2 and the second verification predicted value Y2-M1-2 corresponding to the unique identification K2-2 in the continuous training model M-L1, generating the difference V1-2 corresponding to the unique identification K2-2 in the continuous training model M-L1; ...; and the difference is taken between the second feature markup information B2-99999999 corresponding to the unique identification K2-99999999 and the second verification predicted value Y2-M1-99999999 corresponding to the unique identification K2-99999999 in the continuous training model M-L1, generating the difference V1-99999999 corresponding to the unique identification K2-99999999 in the continuous training model M-L1.

Likewise, the difference is taken between the second feature markup information B2-1 corresponding to the unique identification K2-1 and the second verification predicted value Y2-M2-1 corresponding to the unique identification K2-1 in the continuous training model M-L2, generating the difference V2-1 corresponding to the unique identification K2-1 in the continuous training model M-L2; the difference is taken between the second feature markup information B2-2 corresponding to the unique identification K2-2 and the second verification predicted value Y2-M2-2 corresponding to the unique identification K2-2 in the continuous training model M-L2, generating the difference V2-2 corresponding to the unique identification K2-2 in the continuous training model M-L2; ...; and the difference is taken between the second feature markup information B2-99999999 corresponding to the unique identification K2-99999999 and the second verification predicted value Y2-M2-99999999 corresponding to the unique identification K2-99999999 in the continuous training model M-L2, generating the difference V2-99999999 corresponding to the unique identification K2-99999999 in the continuous training model M-L2.

By analogy, the difference Vj-i corresponding to each unique identification K2-i in each continuous training model M-Lj is generated.

S913: each difference Vj-i in each continuous training pattern M-Lj is done and is generated each continuous training pattern M-Lj The validation value Qj of corresponding characteristic information identification model Zj.

When it is implemented, by continuous training pattern M-L1 difference V1-1, C1-2 ..., C1-99999999 does and gives birth to At the validation value Q1 of the corresponding characteristic information identification model Z1 of continuous training pattern M-L1；By the difference in continuous training pattern M-L2 Value V2-1, C2-2 ..., C2-99999999 do and generate the corresponding characteristic information identification model Z2 of continuous training pattern M-L2 Validation value Q2；And so on, generate the validation value Qj of each characteristic information identification model Zj.

S914: the validation value Qj of each characteristic information identification model Zj is ranked up, and the smallest validation value Qj is corresponding Characteristic information identification model Zj is as optimal characteristics information identification model.

When it is implemented, by the validation value of the validation value Q1 of characteristic information identification model Z1, characteristic information identification model Z2 Q2 ... and the validation value Q99999999 of characteristic information identification model Z99999999 is ranked up, by the smallest validation value pair The characteristic information identification model answered is as optimal characteristics information identification model.

In the present embodiment, Q2 is assumed to be the minimum value, so the optimal characteristic information identification model is Z2, which comprises the preset discrete model M-S2 and the preset continuous model M-L2.
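Steps S913 and S914 can be sketched as follows. This is a minimal illustration that follows the text literally: per-identification differences (markup information minus second verification predicted value) are summed into a validation value Qj, and the model with the smallest Qj is selected. The function names and the sample labels and predictions are hypothetical. Because the differences are signed, an implementation might prefer absolute differences; that variant is an assumption and is not shown.

```python
from typing import Dict, Tuple

def validation_value(labels: Dict[int, float], preds: Dict[int, float]) -> float:
    """Sum over unique identifications of (markup information - predicted value)."""
    return sum(labels[k] - preds[k] for k in labels)

def select_optimal(
    labels: Dict[int, float],
    preds_by_model: Dict[str, Dict[int, float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the smallest validation value, plus all values."""
    q = {name: validation_value(labels, preds) for name, preds in preds_by_model.items()}
    best = min(q, key=q.get)
    return best, q

# Hypothetical verification data: keys are unique identifications K2-i.
labels = {1: 1.0, 2: 0.0, 3: 1.0}
preds_by_model = {
    "Z1": {1: 0.4, 2: 0.1, 3: 0.5},
    "Z2": {1: 0.9, 2: 0.2, 3: 0.8},
}
best, q = select_optimal(labels, preds_by_model)
print(best, q)
```

With this sample data the second model has the smaller validation value, mirroring the embodiment in which Q2 is the minimum.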

Step 3: test process

S915: A unique identification K3 is set for each acquired data group D to be predicted.

Wherein, unique identification K3 is positive integer.

Each data group D to be predicted includes several items of characteristic data T3, as shown in Table 10.

Table 10

Wherein, each data group D to be predicted includes characteristic data T3 such as age, city, occupation and income; the application is not limited thereto. T3-city and T3-occupation are set as discrete characteristic data, and T3-age and T3-income as continuous characteristic data. There are multiple data groups D to be predicted, including D1, D2, ..., D99999999; the application is not limited thereto.

S916: Each data group D to be predicted is split, according to the data type of each item of characteristic data T3 in that data group, to generate the discrete data unit T31 and the continuous data unit T32 corresponding to each data group D to be predicted.

Wherein, each discrete data unit T31-i includes: the unique identification K3-i and the discrete characteristic data T3 in the data group D to be predicted corresponding to unique identification K3-i.

Specifically, discrete data unit T31-i includes: characteristic data T3-city, characteristic data T3-occupation and unique identification K3-i. As shown in Table 11, the discrete data units T31-i corresponding to the data groups Di to be predicted are T31-1, T31-2, ..., T31-99999999, where i = 1, 2, ..., 99999999.

Table 11

Data group D to be predicted	Discrete data unit T31	Unique identification K3	T3-city	T3-occupation
D1	T31-1	00000001	021 (Shanghai)	0 (teacher)
D2	T31-2	00000002	010 (Beijing)	1 (doctor)
……	……	……	……	……
D99999999	T31-99999999	99999999	020 (Guangzhou)	5 (unemployed)

The continuous data unit T32-i corresponding to each discrete data unit T31-i includes: the unique identification K3-i and the continuous characteristic data T3 in the data group D to be predicted corresponding to unique identification K3-i, where i is a positive integer greater than or equal to 1.

Specifically, continuous data unit T32-i includes: characteristic data T3-age, characteristic data T3-income and unique identification K3-i. As shown in Table 12, the continuous data units T32-i corresponding to the data groups Di to be predicted are T32-1, T32-2, ..., T32-99999999, where i = 1, 2, ..., 99999999.

Table 12

Data group D to be predicted	Continuous data unit T32	Unique identification K3	T3-age	T3-income
D1	T32-1	00000001	20	100000
D2	T32-2	00000002	60	150000
……	……	……	……	……
D99999999	T32-99999999	99999999	30	2000

The number of discrete data units T31-i equals the number of continuous data units T32-i, and each discrete data unit T31-i, each continuous data unit T32-i and each data group Di to be predicted are in one-to-one correspondence through the unique identification K3-i of the data group Di to be predicted.
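Steps S915 and S916 can be sketched as follows, assuming each data group to be predicted is a plain dictionary and the unique identification K3-i is a running positive integer used as the key of each unit. The field names `city`, `occupation`, `age` and `income` mirror the embodiment; the function name `split_groups` and the dictionary layout are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

# Per the embodiment: city and occupation are discrete, age and income are continuous.
DISCRETE_FIELDS = ("city", "occupation")
CONTINUOUS_FIELDS = ("age", "income")

Units = Dict[int, Dict[str, Any]]

def split_groups(groups: List[Dict[str, Any]]) -> Tuple[Units, Units]:
    """Assign a unique identification to each group and split it by data type."""
    discrete: Units = {}
    continuous: Units = {}
    for k3, group in enumerate(groups, start=1):  # K3 is a positive integer
        discrete[k3] = {f: group[f] for f in DISCRETE_FIELDS}
        continuous[k3] = {f: group[f] for f in CONTINUOUS_FIELDS}
    return discrete, continuous

# The first two rows of Tables 11 and 12.
groups = [
    {"age": 20, "city": "021", "occupation": 0, "income": 100000},
    {"age": 60, "city": "010", "occupation": 1, "income": 150000},
]
t31, t32 = split_groups(groups)
print(t31)
print(t32)
```

Keying both units by the same identification is what later allows the continuous unit to be merged with the first predicted value of the same data group.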

S917: The discrete data unit T31-i corresponding to each unique identification K3-i is input into the preset discrete model M-S2 to calculate and generate the first predicted value C1-i corresponding to each unique identification K3-i in the preset discrete model M-S2.

Wherein, the first predicted value C1-i includes: the unique identification K3-i and the first test result value C_1i.

In a specific implementation, the preset discrete model M-S2 uses an existing discrete algorithm such as the logistic regression algorithm; the application is not limited thereto.

As shown in Figure 18, T31-1 corresponding to unique identification K3-1, T31-2 corresponding to unique identification K3-2, ..., and T31-99999999 corresponding to unique identification K3-99999999 are input into the discrete model M-S2 to calculate and generate the first predicted values C1-i corresponding to the discrete model M-S2.

Wherein, the first predicted value C1-1 includes: unique identification K3-1 and the first test result value C_1,1; the first predicted value C1-2 includes: unique identification K3-2 and the first test result value C_1,2; ...; and the first predicted value C1-99999999 includes: unique identification K3-99999999 and the first test result value C_1,99999999, as shown in Table 13.

Table 13

First predicted value C1-i	Unique identification K3-i	First test result value C_1i
C1-1	00000001	-2.45
C1-2	00000002	-4.56
……	……	……
C1-99999999	99999999	10.23

Wherein, the first predicted values C1-i and the unique identifications K3-i are in one-to-one correspondence.
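Step S917 can be sketched as follows. The sketch assumes the preset discrete model is an already trained linear model of the logistic regression kind applied to one-hot encoded discrete features, and that the first test result value is its raw linear score (the values in Table 13 are not confined to the range 0 to 1, which suggests a score rather than a probability). The weights, the bias and the function name are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical weights of a trained model, one per (feature, value) pair.
WEIGHTS: Dict[Tuple[str, Any], float] = {
    ("city", "021"): 0.5,
    ("city", "010"): -1.0,
    ("occupation", 0): -0.25,
    ("occupation", 1): 0.75,
}
BIAS = 0.0  # hypothetical intercept

def first_predicted_values(units: Dict[int, Dict[str, Any]]) -> Dict[int, float]:
    """Score each discrete data unit; the result stays keyed by unique identification."""
    result = {}
    for k3, unit in units.items():
        # One-hot encoding: only the weight of the observed value contributes.
        score = BIAS + sum(WEIGHTS.get((f, v), 0.0) for f, v in unit.items())
        result[k3] = score
    return result

t31 = {
    1: {"city": "021", "occupation": 0},
    2: {"city": "010", "occupation": 1},
}
c1 = first_predicted_values(t31)
print(c1)
```

Each output entry carries its unique identification as the key, which corresponds to the first predicted value including the unique identification K3-i.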

S918: The continuous data unit T32-i corresponding to each unique identification K3-i is merged with the corresponding first predicted value C1-i and then input into the preset continuous model M-L2 to calculate and generate the second predicted value C2-i corresponding to the preset continuous model M-L2.

Wherein, the second predicted value C2-i includes: the unique identification K3-i and the second test result value C_2i.

In a specific implementation, the preset continuous model M-L2 uses the GBDT algorithm; the application is not limited thereto.

As shown in Figure 19, the continuous data unit T32-1 corresponding to unique identification K3-1 is merged with the first predicted value C1-1 and then input into the preset continuous model M-L2 to calculate and generate the second predicted value C2-1; the continuous data unit T32-2 corresponding to unique identification K3-2 is merged with the first predicted value C1-2 and then input into the preset continuous model M-L2 to calculate and generate the second predicted value C2-2; the continuous data unit T32-3 corresponding to unique identification K3-3 is merged with the first predicted value C1-3 and then input into the preset continuous model M-L2 to calculate and generate the second predicted value C2-3; ...; and the continuous data unit T32-99999999 corresponding to unique identification K3-99999999 is merged with the first predicted value C1-99999999 and then input into the preset continuous model M-L2 to calculate and generate the second predicted value C2-99999999.

Wherein, the second predicted value C2-1 includes: unique identification K3-1 and the second test result value C_2,1; the second predicted value C2-2 includes: unique identification K3-2 and the second test result value C_2,2; ...; and the second predicted value C2-99999999 includes: unique identification K3-99999999 and the second test result value C_2,99999999, as shown in Table 14.

Table 14

Second predicted value C2-i	Unique identification K3-i	Second test result value C_2i
C2-1	00000001	0.1346
C2-2	00000002	0.0293
……	……	……
C2-99999999	99999999	0.9374
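Step S918 can be sketched as follows. The merge joins each continuous data unit with the first predicted value that has the same unique identification, so the discrete model's output becomes one more continuous feature. The continuous model here is a hand-written ensemble of depth-1 regression trees whose summed output is passed through a sigmoid; it only stands in for a trained GBDT model, and the stump parameters, the feature name `c1` and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Row = Dict[str, float]

def merge(continuous_units: Dict[int, Row], first_pred: Dict[int, float]) -> Dict[int, Row]:
    """Join on the unique identification; the first predicted value becomes feature 'c1'."""
    return {k3: {**continuous_units[k3], "c1": first_pred[k3]} for k3 in continuous_units}

# Hypothetical stumps: (feature, threshold, value if <= threshold, value if > threshold).
STUMPS: List[Tuple[str, float, float, float]] = [
    ("c1", 0.0, -1.0, 1.0),
    ("income", 50000, 0.5, -0.5),
    ("age", 40, 0.25, -0.25),
]

def second_predicted_values(merged: Dict[int, Row]) -> Dict[int, float]:
    """Sum the stump outputs per row and map the raw score into (0, 1)."""
    result = {}
    for k3, row in merged.items():
        raw = sum(le if row[f] <= t else gt for f, t, le, gt in STUMPS)
        result[k3] = 1.0 / (1.0 + math.exp(-raw))
    return result

t32 = {1: {"age": 20, "income": 100000}, 2: {"age": 60, "income": 150000}}
c1 = {1: 0.25, 2: -0.25}  # hypothetical first predicted values
merged = merge(t32, c1)
c2 = second_predicted_values(merged)
print(merged)
print(c2)
```

Feeding the discrete model's score into the continuous model, rather than one-hot encoding every discrete feature for the tree ensemble, is the design point the method relies on for efficiency.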

S919: The characteristic information B3-i corresponding to each data group Di to be predicted is generated according to the data group Di to be predicted corresponding to unique identification K3-i and the second predicted value C2-i.

In a specific implementation, the characteristic information B3-1 corresponding to data group D1 to be predicted is generated according to the data group D1 to be predicted corresponding to unique identification K3-1 and the second predicted value C2-1; the characteristic information B3-2 corresponding to data group D2 to be predicted is generated according to the data group D2 to be predicted corresponding to unique identification K3-2 and the second predicted value C2-2; ...; and the characteristic information B3-99999999 corresponding to data group D99999999 to be predicted is generated according to the data group D99999999 to be predicted corresponding to unique identification K3-99999999 and the second predicted value C2-99999999, as shown in Table 15.

Table 15

The characteristic information B3-i takes the value 1 or 0, where i is a positive integer greater than or equal to 1. When the characteristic information B3-i is 1, the data group Di to be predicted has the fraud feature and is identified as a fraud client; when the characteristic information B3-i is 0, the data group Di to be predicted does not have the fraud feature and is identified as a non-fraud client.
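One way to derive the 1/0 characteristic information from the second predicted values is a fixed cut-off, sketched below. The text does not state how the second predicted value is mapped to 1 or 0, so the threshold of 0.5 and the function name are assumptions; the input scores are the second test result values listed in Table 14.

```python
from typing import Dict

THRESHOLD = 0.5  # assumed cut-off; the text does not specify one

def characteristic_info(second_pred: Dict[int, float], threshold: float = THRESHOLD) -> Dict[int, int]:
    """Map each second test result value to 1 (fraud feature) or 0 (no fraud feature)."""
    return {k3: 1 if score >= threshold else 0 for k3, score in second_pred.items()}

# Second test result values from Table 14, keyed by unique identification K3-i.
c2 = {1: 0.1346, 2: 0.0293, 99999999: 0.9374}
b3 = characteristic_info(c2)
print(b3)
```

Because the result is keyed by unique identification, each flag can be attached back to its data group Di.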

Based on the same inventive concept as the characteristic information recognition method described above, the present invention further provides a characteristic information identifying system, as described in the following embodiment. Since the principle by which the characteristic information identifying system solves the problem is similar to that of the characteristic information recognition method, the implementation of the system may refer to the implementation of the method, and repeated details are omitted.

Figure 20 is a structural schematic diagram of the characteristic information identifying system of an embodiment of the present application. As shown in Figure 20, the characteristic information identifying system includes: an acquiring unit 101, a first generation unit 102, a second generation unit 103 and a third generation unit 104.

The acquiring unit 101 is configured to obtain the discrete data unit and the continuous data unit corresponding to the first unique identification of the data group to be predicted.

The first generation unit 102 is configured to input the discrete data unit corresponding to the first unique identification into the preset discrete model to calculate and generate the first predicted value corresponding to the preset discrete model. Wherein, the first predicted value includes: the first unique identification.

The second generation unit 103 is configured to merge the continuous data unit corresponding to the first unique identification with the first predicted value and then input the result into the preset continuous model to calculate and generate the second predicted value corresponding to the preset continuous model. Wherein, the second predicted value includes: the first unique identification.

The third generation unit 104 is configured to generate the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.
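The four units can be sketched as one small composition in which the acquiring unit and the two models are injected as callables. The class name, the callable signatures, the 0.5 threshold and the toy rules in the usage example are all hypothetical; real units would wrap trained models rather than the hand-written rules shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Units = Dict[int, Dict[str, Any]]
Scores = Dict[int, float]

@dataclass
class CharacteristicInfoSystem:
    acquire: Callable[[List[Dict[str, Any]]], Tuple[Units, Units]]  # acquiring unit 101
    discrete_model: Callable[[Units], Scores]                        # used by first generation unit 102
    continuous_model: Callable[[Units], Scores]                      # used by second generation unit 103
    threshold: float = 0.5                                           # assumed cut-off

    def identify(self, groups: List[Dict[str, Any]]) -> Dict[int, int]:
        t31, t32 = self.acquire(groups)                              # discrete / continuous units
        c1 = self.discrete_model(t31)                                # first predicted values
        merged = {k: {**t32[k], "c1": c1[k]} for k in t32}           # merge on unique identification
        c2 = self.continuous_model(merged)                           # second predicted values
        return {k: int(c2[k] >= self.threshold) for k in c2}         # third generation unit 104

# Toy stand-ins, for illustration only.
def toy_acquire(groups):
    t31 = {i: {"city": g["city"]} for i, g in enumerate(groups, 1)}
    t32 = {i: {"income": g["income"]} for i, g in enumerate(groups, 1)}
    return t31, t32

def toy_discrete(t31):
    return {k: (1.0 if u["city"] == "020" else -1.0) for k, u in t31.items()}

def toy_continuous(merged):
    return {k: (0.9 if r["c1"] > 0 and r["income"] < 5000 else 0.1) for k, r in merged.items()}

system = CharacteristicInfoSystem(toy_acquire, toy_discrete, toy_continuous)
result = system.identify([
    {"city": "021", "income": 100000},
    {"city": "020", "income": 2000},
])
print(result)
```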

Based on the same inventive concept as the characteristic information recognition method described above, the application provides a computer device, as described in the following embodiment. Since the principle by which the computer device solves the problem is similar to that of the characteristic information recognition method, the implementation of the computer device may refer to the implementation of the method, and repeated details are omitted.

In one embodiment, an electronic device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements all the steps of the method in the above embodiments when executing the computer program; for example, as shown in Figure 1, the processor implements the following steps when executing the computer program:

Based on the same inventive concept as the characteristic information recognition method described above, the application provides a computer-readable storage medium, as described in the following embodiment. Since the principle by which the computer-readable storage medium solves the problem is similar to that of the characteristic information recognition method, the implementation of the computer-readable storage medium may refer to the implementation of the method, and repeated details are omitted.

In one embodiment, a computer program is stored on a computer-readable storage medium. When executed by a processor, the computer program implements all the steps of the characteristic information recognition method in the above embodiments; for example, as shown in Figure 1, the computer program implements the following steps when executed by the processor:

The characteristic information recognition method and system provided by the present invention comprise: obtaining the discrete data unit and the continuous data unit corresponding to the first unique identification of the data group to be predicted; inputting the discrete data unit corresponding to the first unique identification into the preset discrete model to calculate and generate the first predicted value corresponding to the preset discrete model, the first predicted value including the first unique identification; merging the continuous data unit corresponding to the first unique identification with the first predicted value and then inputting the result into the preset continuous model to calculate and generate the second predicted value corresponding to the preset continuous model, the second predicted value including the first unique identification; and generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value. The application can improve the efficiency with which a machine learning algorithm processes data that includes both discrete data and continuous data, thereby improving the efficiency of characteristic information identification using that machine learning algorithm.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.

Specific embodiments are used herein to explain the principle and implementation of the present invention; the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, for those of ordinary skill in the art, the specific implementation and the scope of application may change in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims

1. A characteristic information recognition method, characterized by comprising:

Obtaining the discrete data unit and the continuous data unit corresponding to a first unique identification of a data group to be predicted;

Inputting the discrete data unit corresponding to the first unique identification into a preset discrete model to calculate and generate a first predicted value corresponding to the preset discrete model; the first predicted value includes: the first unique identification;

Merging the continuous data unit corresponding to the first unique identification with the first predicted value and then inputting the result into a preset continuous model to calculate and generate a second predicted value corresponding to the preset continuous model; the second predicted value includes: the first unique identification;

Generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.

2. The characteristic information recognition method according to claim 1, characterized in that there are multiple data groups to be predicted.

3. The characteristic information recognition method according to claim 2, characterized in that said obtaining the discrete data unit and the continuous data unit corresponding to a first unique identification of a data group to be predicted comprises:

Setting a first unique identification for each acquired data group to be predicted; each data group to be predicted includes: several items of first feature data;

Splitting each data group to be predicted, according to the data type of each item of first feature data of that data group, to generate the discrete data unit and the continuous data unit corresponding to each data group to be predicted.

4. The characteristic information recognition method according to claim 3, characterized in that each discrete data unit includes: a first unique identification and the discrete first feature data in the data group to be predicted corresponding to the first unique identification.

5. The characteristic information recognition method according to claim 4, characterized in that the continuous data unit corresponding to each discrete data unit includes: the first unique identification and the continuous first feature data in the data group to be predicted corresponding to the first unique identification.

6. The characteristic information recognition method according to claim 1, characterized by further comprising:

Obtaining the discrete training data unit and the continuous training data unit corresponding to a second unique identification of a training data group;

Inputting the discrete training data unit corresponding to the second unique identification into each preset discrete algorithm to calculate and generate the discrete training model corresponding to each preset discrete algorithm and the first training predicted value corresponding to each preset discrete algorithm; the first training predicted value includes: the second unique identification;

Merging the continuous training data unit corresponding to the second unique identification with the first training predicted value corresponding to the second unique identification for each preset discrete algorithm respectively, and then inputting the result into a preset continuous algorithm to calculate and generate the continuous training model corresponding to the discrete training model of each preset discrete algorithm;

Merging each discrete training model with its corresponding continuous training model to generate each characteristic information identification model.

7. The characteristic information recognition method according to claim 6, characterized in that there are multiple training data groups.

8. The characteristic information recognition method according to claim 7, characterized in that said obtaining the discrete training data unit and the continuous training data unit corresponding to a second unique identification of a training data group comprises:

Setting a second unique identification and a first feature markup information for each acquired training data group; each training data group includes: several items of second feature data;

Splitting each training data group, according to the data type of each item of second feature data of that training data group, to generate the discrete training data unit and the continuous training data unit corresponding to each training data group.

9. The characteristic information recognition method according to claim 8, characterized in that each discrete training data unit includes: a second unique identification, the discrete second feature data in the training data group corresponding to the second unique identification, and the first feature markup information of the training data group corresponding to the second unique identification;

The continuous training data unit corresponding to each discrete training data unit includes: the second unique identification, the continuous second feature data in the training data group corresponding to the second unique identification, and the first feature markup information of the training data group corresponding to the second unique identification.

10. The characteristic information recognition method according to claim 9, characterized in that said merging the continuous training data unit corresponding to the second unique identification with the first training predicted value corresponding to the second unique identification for each preset discrete algorithm respectively, and then inputting the result into a preset continuous algorithm to calculate and generate the continuous training model corresponding to the discrete training model of each preset discrete algorithm, comprises:

Merging the continuous training data unit corresponding to each second unique identification with the first training predicted value corresponding to each second unique identification for each preset discrete algorithm respectively, to generate the merged training data unit corresponding to each second unique identification for each discrete algorithm;

Inputting each merged training data unit into the preset continuous algorithm to calculate and generate the continuous training model corresponding to the discrete training model of each preset discrete algorithm.

11. The characteristic information recognition method according to claim 6, characterized by further comprising:

Obtaining the third unique identification and the second feature markup information of a verification data group, and the discrete verification data unit and the continuous verification data unit corresponding to the third unique identification;

Inputting the discrete verification data unit corresponding to the third unique identification into each discrete training model to calculate and generate the first verification predicted value corresponding to each discrete training model; the first verification predicted value includes: the third unique identification;

Merging the continuous verification data unit corresponding to the third unique identification with each first verification predicted value and then inputting the result into the continuous training model corresponding to the discrete training model of that first verification predicted value, to calculate and generate the second verification predicted value corresponding to each continuous training model; the second verification predicted value includes: the third unique identification;

Generating an optimal characteristic information identification model according to a comparison of the second feature markup information corresponding to the third unique identification with the second verification predicted value corresponding to the third unique identification in each continuous training model.

12. The characteristic information recognition method according to claim 11, characterized in that there are multiple verification data groups.

13. The characteristic information recognition method according to claim 12, characterized in that said obtaining the third unique identification and the second feature markup information of a verification data group, and the discrete verification data unit and the continuous verification data unit corresponding to the third unique identification, comprises:

Setting a third unique identification and a second feature markup information for each acquired verification data group; each verification data group includes: several items of third feature data;

Splitting each verification data group, according to the data type of each item of third feature data of that verification data group, to generate the discrete verification data unit and the continuous verification data unit of each verification data group.

14. The characteristic information recognition method according to claim 13, characterized in that each discrete verification data unit includes: a third unique identification, the discrete third feature data in the verification data group corresponding to the third unique identification, and the second feature markup information of the verification data group corresponding to the third unique identification;

The continuous verification data unit corresponding to each discrete verification data unit includes: the third unique identification, the continuous third feature data in the verification data group corresponding to the third unique identification, and the second feature markup information of the verification data group corresponding to the third unique identification.

15. The characteristic information recognition method according to claim 14, characterized in that said merging the continuous verification data unit corresponding to the third unique identification with each first verification predicted value and then inputting the result into the continuous training model corresponding to the discrete training model of that first verification predicted value, to calculate and generate the second verification predicted value corresponding to each continuous training model, comprises:

Merging the continuous verification data unit corresponding to each third unique identification with the first verification predicted value corresponding to each third unique identification for each discrete training model respectively, to generate the merged verification data unit corresponding to each third unique identification for each discrete training model;

Inputting the merged verification data unit corresponding to each third unique identification for each discrete training model into the continuous training model corresponding to that discrete training model, to calculate and generate the second verification predicted value corresponding to each third unique identification in each continuous training model.

16. The characteristic information recognition method according to claim 11, characterized in that said generating an optimal characteristic information identification model according to a comparison of the second feature markup information corresponding to the third unique identification with the second verification predicted value corresponding to the third unique identification in each continuous training model comprises:

Taking the difference between the second feature markup information corresponding to the third unique identification and the second verification predicted value corresponding to the third unique identification in each continuous training model, to generate the difference corresponding to the third unique identification in each continuous training model;

Summing the differences in each continuous training model to generate the validation value of the characteristic information identification model corresponding to each continuous training model;

Sorting the validation values and taking the characteristic information identification model corresponding to the smallest validation value as the optimal characteristic information identification model.

17. The characteristic information recognition method according to claim 11, characterized in that the optimal characteristic information identification model includes: the preset discrete model and the preset continuous model.

18. The characteristic information recognition method according to any one of claims 1 to 17, characterized in that the characteristic information includes: fraud characteristic information.

19. A characteristic information identifying system, characterized by comprising:

An acquiring unit, for obtaining the discrete data unit and the continuous data unit corresponding to a first unique identification of a data group to be predicted;

A first generation unit, for inputting the discrete data unit corresponding to the first unique identification into a preset discrete model to calculate and generate a first predicted value corresponding to the preset discrete model; the first predicted value includes: the first unique identification;

A second generation unit, for merging the continuous data unit corresponding to the first unique identification with the first predicted value and then inputting the result into a preset continuous model to calculate and generate a second predicted value corresponding to the preset continuous model; the second predicted value includes: the first unique identification;

A third generation unit, for generating the characteristic information corresponding to the data group to be predicted according to the data group to be predicted corresponding to the first unique identification and the second predicted value.

20. An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the characteristic information recognition method according to any one of claims 1 to 18 when executing the program.

21. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the characteristic information recognition method according to any one of claims 1 to 18.