CN109784377A - Multiple recognition model building method, device, computer equipment and storage medium - Google Patents

Multiple recognition model building method, device, computer equipment and storage medium

Info

Publication number
CN109784377A
Authority
CN
China
Prior art keywords
data
vector
identification
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811601941.5A
Other languages
Chinese (zh)
Inventor
吴壮伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811601941.5A
Publication of CN109784377A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a multi-model recognition model construction method, apparatus, computer device and storage medium. The method obtains source data, then pre-processes it and combines it into vectors to obtain an original feature matrix; selected column vectors of the original feature matrix are combined by statistical operations to produce new column vectors, and new column vectors with low Pearson correlation to the columns of the original feature matrix are deleted, yielding a simplified feature matrix. A result-vector column is appended to the simplified feature matrix to obtain a raw data set; a preset number of rows is extracted from the raw data set as a training set, and the rows not extracted serve as a test set. The training set is then used to train a to-be-trained model comprising multiple submodels, producing a recognition model for anti-fraud identification. Because the method trains multiple submodels simultaneously to obtain a multi-model recognition model, it improves the recognition accuracy of the recognition model.

Description

Multiple recognition model building method, device, computer equipment and storage medium
Technical field
The present invention relates to the technical field of prediction models, and more particularly to a multi-model recognition model construction method, apparatus, computer device and storage medium.
Background technique
Current anti-fraud recognition systems are mostly based on a single model. A single model has an advantage in fitting and prediction within a specific field but no advantage in other fields or scenarios: a linear regression model, for example, performs well on linear problems, while a neural network model performs well on nonlinear ones. Because a single model is only advantageous in its specific field, it is difficult to improve its overall performance.
Moreover, when a single model is trained, the data used are usually raw data with little pre-processing, so the data dimensionality is low, which reduces the accuracy of the single model in specific recognition tasks.
Summary of the invention
Embodiments of the invention provide a multi-model recognition model construction method, apparatus, computer device and storage medium, aiming to solve two problems of the prior art: anti-fraud recognition systems are mostly based on a single model trained on largely unprocessed raw data, which reduces recognition accuracy; and a single model is restricted to a specific field or scenario.
In a first aspect, an embodiment of the invention provides a multi-model recognition model construction method, comprising:
obtaining source data and pre-processing it to obtain processed data;
taking the data at each time point of the processed data as one row vector, and combining the row vectors from top to bottom in chronological order to obtain an original feature matrix;
performing statistical operations on column vectors selected from the original feature matrix to obtain new column vectors, and deleting those new column vectors whose Pearson correlation coefficients with the column vectors of the original feature matrix rank below a preset rank threshold, to obtain a simplified feature matrix;
appending a result-vector column to the simplified feature matrix to obtain a raw data set;
extracting a preset percentage of rows from the raw data set as a training set, and taking the rows not extracted as a test set; and
training, with the training set, a to-be-trained model comprising multiple submodels, to obtain a recognition model for anti-fraud identification.
In a second aspect, an embodiment of the invention provides a multi-model recognition model construction apparatus, comprising:
a pre-processing unit for obtaining source data and pre-processing it to obtain processed data;
an original-matrix acquiring unit for taking the data at each time point of the processed data as one row vector and combining the row vectors from top to bottom in chronological order, to obtain an original feature matrix;
a simplified-matrix acquiring unit for performing statistical operations on column vectors selected from the original feature matrix to obtain new column vectors, and deleting those new column vectors whose Pearson correlation coefficients with the column vectors of the original feature matrix rank below a preset rank threshold, to obtain a simplified feature matrix;
a raw-data-set acquiring unit for appending a result-vector column to the simplified feature matrix to obtain a raw data set;
a raw-data-set splitting unit for extracting a preset percentage of rows from the raw data set as a training set and taking the rows not extracted as a test set; and
a model training unit for training, with the training set, a to-be-trained model comprising multiple submodels, to obtain a recognition model for anti-fraud identification.
In a third aspect, an embodiment of the invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the multi-model recognition model construction method of the first aspect.
In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the multi-model recognition model construction method of the first aspect.
Embodiments of the invention provide a multi-model recognition model construction method, apparatus, computer device and storage medium. The method obtains source data and pre-processes it to obtain processed data; takes the data at each time point as one row vector and combines the row vectors from top to bottom in chronological order to obtain an original feature matrix; performs statistical operations on selected column vectors to obtain new column vectors and deletes those new columns whose Pearson correlation coefficients with the columns of the original feature matrix rank below a preset rank threshold, obtaining a simplified feature matrix; appends a result-vector column to form a raw data set; extracts a preset percentage of rows from the raw data set as a training set and takes the rows not extracted as a test set; and trains, with the training set, a to-be-trained model comprising multiple submodels, to obtain a recognition model for anti-fraud identification. Because multiple submodels are trained simultaneously to form a multi-model recognition model, the recognition accuracy of the recognition model is improved.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is the flow diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 2 is the sub-process schematic diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 3 is another sub-process schematic diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 4 is another sub-process schematic diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 5 is another flow diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 6 is another flow diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 7 is another flow diagram of Multiple recognition model building method provided in an embodiment of the present invention;
Fig. 8 is the schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Fig. 9 is the subelement schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Figure 10 is another subelement schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Figure 11 is another subelement schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Figure 12 is another schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Figure 13 is another schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Figure 14 is another schematic block diagram of Multiple recognition model construction device provided in an embodiment of the present invention;
Figure 15 is the schematic block diagram of computer equipment provided in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the invention will now be described clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art without creative effort based on the embodiments of the invention shall fall within the protection scope of the invention.
It should be understood that when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.
Referring to Fig. 1, Fig. 1 is a flow diagram of the multi-model recognition model construction method provided by an embodiment of the invention. The method is applied in a management server and executed by application software installed on the management server; the management server is an enterprise terminal for constructing the multi-model recognition model.
As shown in Fig. 1, the method comprises steps S110 to S160.
S110: obtain source data and pre-process it to obtain processed data.
In this embodiment, the acquired source data may come from many application scenarios. For example, in a vehicle-insurance claim scenario, the source data may be the driving records of a dashboard camera, or the source data may come from a vehicle-insurance APP that automatically starts a GPS positioning module to collect GPS data while the user drives; such source data comprises time series of longitude, latitude, speed, altitude, heading and the like, sampled once per second.
Because the source data collected by the GPS positioning module depends on the satellite signal, longitude and latitude readings are easily inaccurate near tall buildings; and because the APP has to send out data packets, retransmissions needed to guarantee delivery inevitably cause duplicated data. To handle such abnormal data in the source data, the source data must be pre-processed.
In one embodiment, as shown in Fig. 2, step S110 includes:
S111: if the time series contained in the source data has a sample whose growth over the previous sample exceeds a preset growth threshold, either deleting the value of the current sample or setting it to the previous sample multiplied by the growth threshold, to obtain abnormality-processed data;
S112: extracting data from the abnormality-processed data at equal intervals according to a preset time period, to obtain indicator data;
S113: normalizing the indicator data to obtain the processed data.
In this embodiment, to eliminate abnormal data from the source data, the samples are examined in chronological order to determine whether any sample grows over the previous sample by more than a preset growth threshold (for example, 150%). If so, the sample is abnormal and can be handled in one of two ways: either the sample is deleted, or its value is set to the previous sample multiplied by the growth threshold. The result is the abnormality-processed data.
Then, data are extracted from the abnormality-processed data at equal time intervals to obtain the indicator data; sampling at equal intervals builds a relatively smooth time-series curve.
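The two operations above (clip or drop runaway samples, then sample at equal intervals) can be sketched as follows. The 150% growth threshold follows the example in the text; the function names and the plain-list representation of the per-second series are illustrative assumptions, not part of the patent.

```python
def clean_series(values, growth_threshold=1.5, delete=False):
    """Handle abnormal samples (step S111): a value that grows beyond
    growth_threshold times the previous value is dropped or capped."""
    cleaned = []
    for v in values:
        if cleaned and cleaned[-1] > 0 and v > cleaned[-1] * growth_threshold:
            if delete:
                continue                          # option 1: drop the outlier
            v = cleaned[-1] * growth_threshold    # option 2: cap it
        cleaned.append(v)
    return cleaned

def resample(values, period):
    """Equal-interval extraction (step S112): keep every period-th sample."""
    return values[::period]

speeds = [60, 61, 200, 62, 63, 64]       # 200 looks like a GPS glitch
print(clean_series(speeds))              # glitch capped at 61 * 1.5 = 91.5
print(resample(list(range(10)), 5))      # [0, 5]
```

Capping (rather than deleting) keeps the series length intact, which simplifies the later equal-interval extraction.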
Finally, the indicator data are normalized, which standardizes the data for the subsequent model training.
When numeric data are normalized, the min-max method is used: X_nom = (X - X_min) / (X_max - X_min), where X_nom is the normalized value, X is the value to be normalized, and X_min and X_max are the minimum and maximum of that parameter. Categorical data are one-hot encoded.
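A minimal sketch of step S113 under the min-max formula above; the one-hot scheme for categorical data is the standard one, and the function names are illustrative.

```python
def min_max_normalize(xs):
    """Min-max normalization: X_nom = (X - X_min) / (X_max - X_min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]   # degenerate column with no spread
    return [(x - lo) / (hi - lo) for x in xs]

def one_hot(categories):
    """Encode each category as a 0/1 indicator vector."""
    labels = sorted(set(categories))
    index = {c: i for i, c in enumerate(labels)}
    return [[1 if index[c] == i else 0 for i in range(len(labels))]
            for c in categories]

print(min_max_normalize([10, 20, 30]))   # [0.0, 0.5, 1.0]
print(one_hot(["N", "E", "N"]))          # labels sorted: E -> 0, N -> 1
```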
S120: take the data at each time point of the processed data as one row vector, and combine the row vectors from top to bottom in chronological order to obtain the original feature matrix.
In this embodiment, suppose that in the source data the reading at time point t is 100, 100, 60, 300, 230 (the first two values are longitude and latitude, the third is speed, the fourth is altitude, and the fifth is heading), with corresponding row vector [0.2 0.3 0.5 0.1 0.3]; the row vector at time point 2t is [0.2 0.1 0.1 0.3 0.2]; ...; and the row vector at time point Nt is [0.1 0.1 0.1 0.1 0.1]. Combining these N row vectors from top to bottom in chronological order yields an original feature matrix of N rows and 5 columns. Converting the source data into an original feature matrix according to its timing turns the raw data into standardized input data for the to-be-trained model.
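The stacking in step S120 can be sketched directly from the example row vectors above; the dictionary keyed by time point is an illustrative representation.

```python
# One row vector per time point, five columns:
# longitude, latitude, speed, altitude, heading (values from the text).
rows_by_time = {
    1: [0.2, 0.3, 0.5, 0.1, 0.3],   # time point t
    2: [0.2, 0.1, 0.1, 0.3, 0.2],   # time point 2t
    3: [0.1, 0.1, 0.1, 0.1, 0.1],   # time point 3t
}

# Stack the rows in ascending time order -> an N x 5 original feature matrix.
feature_matrix = [rows_by_time[t] for t in sorted(rows_by_time)]
print(len(feature_matrix), len(feature_matrix[0]))   # 3 5
```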
S130: perform statistical operations on column vectors selected from the original feature matrix to obtain new column vectors, and delete those new column vectors whose Pearson correlation coefficients with the column vectors of the original feature matrix rank below a preset rank threshold, to obtain the simplified feature matrix.
In this embodiment, the original feature matrix has few dimensions. To raise its dimensionality, statistical operations are performed on selected column vectors of the original feature matrix to obtain new column vectors, extending its dimensions. For example, after extension by statistical operations, the N-row, 5-column original feature matrix becomes a feature-augmented matrix of N rows and M columns. To keep only the new columns of highest data value in the feature-augmented matrix, the new column vectors whose Pearson correlation coefficients with the columns of the original feature matrix rank below a preset rank threshold are deleted, which yields the simplified feature matrix.
The Pearson correlation coefficient between any two column vectors X and Y can be calculated as:
ρ(X,Y) = cov(X, Y) / (σ_X σ_Y) = (E[XY] - E[X] E[Y]) / (σ_X σ_Y),
where E denotes the mathematical expectation and σ the standard deviation. ρ(X,Y) takes values in [-1, 1]: the closer |ρ(X,Y)| is to 1, the more similar the two column vectors; the closer it is to 0, the less similar they are.
In one embodiment, as shown in Fig. 3, step S130 includes:
S131: obtaining a preset number of statistical operation types;
S132: randomly selecting two column vectors from the original feature matrix, to obtain column-vector combinations equal in number to the number of statistical operation types;
S133: applying one type of statistical operation to each column-vector combination, to obtain a number of new column vectors greater than or equal to the number of statistical operation types.
The statistical operations include summation, subtraction, multiplication and division. For example, suppose the preset number of statistical operation types is 4: summation, subtraction, multiplication and division. Two column vectors, denoted a and b, are randomly selected from the original feature matrix, and their values are combined element-wise into the sum a+b, the difference a-b, the product a*b and the quotient a/b.
For example, an original feature matrix of 2 rows and 5 columns can be extended to 2 rows and 9 columns: the first column plus the second column forms the sixth column, the second column minus the third column forms the seventh column, the first column multiplied by the third column forms the eighth column, and the third column divided by the fifth column forms the ninth column, giving the feature-augmented matrix.
Each of the 4 newly added columns of the feature-augmented matrix is then correlated with each of the first 5 columns. For the sixth column, the Pearson correlation coefficients against columns one to five are computed, and the largest of them is taken as the correlation of the sixth column with the first five columns. The correlation of the seventh column with the first five columns is computed in the same way, and so on for each new column. Any new column whose correlation ranking with the columns of the original feature matrix falls below the preset rank threshold is deleted, yielding the simplified feature matrix. For example, if the rank threshold is 2 and the sixth and eighth columns are the columns not deleted, the simplified feature matrix consists of the original five columns together with the sixth and eighth columns.
In this way the dimensionality of the original feature matrix can be increased quickly, and the original feature matrix can be extended into a simplified feature matrix that reflects the features of the source data more comprehensively.
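Steps S131 to S133 together with the pruning can be sketched as follows. The four operations match the example above, but the explicit column-pair list, the keep-count parameter and the helper names are illustrative assumptions (the patent selects the pairs at random).

```python
import math

def pearson(xs, ys):
    """Pearson correlation with population standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy) if sx and sy else 0.0

def expand_and_prune(matrix, pairs, keep_top):
    """Derive sum/difference/product/quotient columns from each (i, j)
    column pair, score every new column by its largest |Pearson| against
    the original columns, and keep only the keep_top best new columns."""
    cols = [tuple(col) for col in zip(*matrix)]
    new_cols = []
    for i, j in pairs:
        a, b = cols[i], cols[j]
        new_cols += [
            tuple(x + y for x, y in zip(a, b)),
            tuple(x - y for x, y in zip(a, b)),
            tuple(x * y for x, y in zip(a, b)),
            tuple(x / y for x, y in zip(a, b)),   # assumes b has no zeros
        ]
    ranked = sorted(new_cols,
                    key=lambda c: max(abs(pearson(c, o)) for o in cols),
                    reverse=True)
    return [list(row) for row in zip(*(cols + ranked[:keep_top]))]

m = [[1.0, 2.0, 3.0],
     [2.0, 3.0, 5.0],
     [3.0, 5.0, 7.0],
     [4.0, 7.0, 11.0]]
out = expand_and_prune(m, [(0, 1)], keep_top=2)
print(len(out), len(out[0]))   # 4 rows, 3 original + 2 kept columns
```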
S140: append a result-vector column to the simplified feature matrix to obtain the raw data set.
In this embodiment, to train the to-be-trained model, each row vector of the simplified feature matrix must serve as an input to the model. A result-vector column is therefore appended to the simplified feature matrix so that each row vector has a corresponding result value serving as the model's expected output, which completes the input-output pairing needed for training.
For example, after a result-vector column is appended to a simplified feature matrix of 7 columns, each row of the resulting raw data set represents one training sample: the first 7 values of the row are the input to the to-be-trained model, and the last value of the row is its expected output.
S150: extract a preset percentage of rows from the raw data set as the training set, and take the rows not extracted as the test set.
In this embodiment, the preset percentage denotes the fraction of the total rows of the raw data set that are extracted. For example, a preset percentage of 70% means that 70% of the rows of the raw data set are chosen as the training set, and the remaining 30% of rows, not chosen for training, form the test set. Splitting the raw data set into a training set and a test set makes it possible both to train the to-be-trained model and to test the resulting recognition model.
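The 70/30 split in step S150 can be sketched as follows. The patent only specifies extracting a preset percentage of rows; shuffling before the split, the fixed seed and the function name are illustrative assumptions.

```python
import random

def split_rows(dataset, train_fraction=0.7, seed=0):
    """Shuffle row indices and take the first train_fraction of rows as
    the training set; the rows not extracted become the test set."""
    idx = list(range(len(dataset)))
    random.Random(seed).shuffle(idx)
    cut = int(len(dataset) * train_fraction)
    train = [dataset[i] for i in idx[:cut]]
    test = [dataset[i] for i in idx[cut:]]
    return train, test

rows = [[i, i + 1, i % 2] for i in range(10)]
train, test = split_rows(rows)
print(len(train), len(test))   # 7 3
```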
S160: train, with the training set, the to-be-trained model comprising multiple submodels, to obtain the recognition model for anti-fraud identification.
In this embodiment, the selected submodels include a support vector machine (SVM, a supervised learning model with associated learning algorithms that analyzes data and recognizes patterns, used for classification and regression analysis), an XGBoost model (eXtreme Gradient Boosting, a supervised model based on extreme gradient boosting), a random forest model, and so on.
Training the to-be-trained model comprising multiple submodels with the training set yields a multi-model recognition model containing multiple submodels that can judge the data to be identified accurately. And because the recognition model is a multi-model, the cooperation of the whole multi-model can be adjusted efficiently and dynamically, giving it strong adaptability to different data sets.
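The idea of training several heterogeneous submodels behind one interface can be sketched as follows. The patent names SVM, XGBoost and random forests; here two toy threshold classifiers stand in for them so the sketch stays self-contained, and all class names are illustrative.

```python
class MeanThresholdModel:
    """Predict 1 when the mean of the features exceeds a learned cutoff."""
    def fit(self, X, y):
        pos = [sum(x) / len(x) for x, label in zip(X, y) if label == 1]
        neg = [sum(x) / len(x) for x, label in zip(X, y) if label == 0]
        # Cutoff halfway between the class means of the feature averages.
        self.cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self
    def predict(self, x):
        return 1 if sum(x) / len(x) > self.cutoff else 0

class FirstFeatureModel:
    """Predict 1 from the first feature alone, same cutoff scheme."""
    def fit(self, X, y):
        pos = [x[0] for x, label in zip(X, y) if label == 1]
        neg = [x[0] for x, label in zip(X, y) if label == 0]
        self.cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self
    def predict(self, x):
        return 1 if x[0] > self.cutoff else 0

X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y = [0, 0, 1, 1]
ensemble = [m.fit(X, y) for m in (MeanThresholdModel(), FirstFeatureModel())]
print([m.predict([0.85, 0.85]) for m in ensemble])   # [1, 1]
```

Because every submodel exposes the same fit/predict interface, submodels can be added or removed without touching the rest of the pipeline, which is the scalability property the text claims.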
In one embodiment, as shown in Fig. 4, step S160 includes:
S161: obtaining the total number of groups of submodels in the to-be-trained model;
S162: dividing the training set into equal parts according to the total number of groups, to obtain as many sub-training sets as there are groups;
S163: training each grouped submodel on its corresponding sub-training set as training data, to obtain one recognition submodel per submodel, the recognition submodels together forming the multi-model recognition model.
In this embodiment, the total number of submodel groups in the to-be-trained model is obtained first so that the training set can be divided into that many equal parts. For example, if the to-be-trained model has 3 groups of submodels, the training set is divided into 3 equal parts, and each sub-training set serves as the training data of its corresponding grouped submodel, yielding a recognition submodel for each submodel; together they form the multi-model recognition model. Instead of dividing the training set by the number of submodel groups, each submodel may also be trained directly on the whole, undivided training set. Because the to-be-trained model comprises multiple submodels, submodels can be added or removed, giving the model strong scalability.
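The equal partition in steps S161 to S163 can be sketched as follows; the function name is illustrative, and any remainder rows beyond an exact division are simply dropped in this sketch.

```python
def partition(train_set, n_groups):
    """Split the training set into n_groups equal parts, one part per
    grouped submodel (steps S161-S163)."""
    size = len(train_set) // n_groups
    return [train_set[i * size:(i + 1) * size] for i in range(n_groups)]

parts = partition(list(range(9)), 3)
print(parts)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```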
In one embodiment, as shown in Fig. 5, after step S160 the method further includes:
S171: inputting the test set into the recognition model for model verification; if the differences between the results output by the recognition model for the input vectors contained in the test set and the result values in the test set are less than a preset error threshold, the recognition model passes verification and is saved.
In this embodiment, to ensure the recognition accuracy of the recognition model, after the recognition model is obtained by training on the training set, the test set is input into the recognition model for model verification. If, when the input vectors of the test set are fed into the recognition model, the differences between the outputs and the corresponding result values in the test set are within the allowed range (that is, each difference is less than the preset error threshold), the recognition model passes verification and can be put to use. If the recognition model fails verification, the selected submodels are unsuitable, and more training data must be input to retrain them. Adding this verification step effectively guarantees the availability and recognition accuracy of the recognition model.
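The verification in step S171 can be sketched as follows; the error threshold value, the stand-in predictor and the convention that the result value is the last column are illustrative assumptions.

```python
def verify(model_predict, test_set, error_threshold=0.5):
    """Step S171: the model passes when every test row's prediction is
    within error_threshold of the recorded result value (last column)."""
    for row in test_set:
        features, expected = row[:-1], row[-1]
        if abs(model_predict(features) - expected) >= error_threshold:
            return False
    return True

# A trivial stand-in model that predicts the mean of the features.
predict = lambda xs: sum(xs) / len(xs)
test_set = [[0.4, 0.6, 0.5], [0.9, 1.1, 1.0]]
print(verify(predict, test_set))   # True: both predictions hit exactly
```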
In one embodiment, as shown in Fig. 6, after step S160 the method further includes:
S1721: inputting the data to be identified into the recognition model, obtaining one sub-recognition result per submodel;
S1722: multiplying each sub-recognition result by the preset weight of the corresponding submodel and summing, to obtain the recognition result.
In this embodiment, as a first specific way of obtaining the recognition result, the submodels cooperate through weighting: each submodel has a preset weight, each sub-recognition result is multiplied by the weight of its submodel, and the weighted results are summed into the recognition result, so the cooperation of the whole multi-model can be adjusted efficiently and dynamically.
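The weighted fusion of step S1722 is a one-liner; the example scores and weights are illustrative.

```python
def weighted_fusion(sub_results, weights):
    """Multiply each submodel's sub-result by its preset weight and sum."""
    return sum(r * w for r, w in zip(sub_results, weights))

# Three submodels scoring a sample, with preset weights summing to 1.
print(weighted_fusion([0.9, 0.6, 0.8], [0.5, 0.3, 0.2]))   # ~ 0.79
```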
In one embodiment, as shown in Fig. 7, after step S160 the method further includes:
S1731: inputting the data to be identified into the recognition model, obtaining one sub-recognition result per submodel;
S1732: taking the sub-recognition result output with the highest frequency among the sub-recognition results as the recognition result of the recognition model.
In this embodiment, as a second specific way of obtaining the recognition result, the sub-recognition result that occurs most frequently among the submodel outputs is counted and taken as the final recognition result, which likewise fully accounts for the cooperative recognition of all the submodels.
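The majority vote of step S1732 can be sketched with a frequency count; tie-breaking order is an assumption not specified by the patent.

```python
from collections import Counter

def majority_vote(sub_results):
    """Return the most frequent submodel output (step S1732)."""
    return Counter(sub_results).most_common(1)[0][0]

print(majority_vote([1, 0, 1]))                      # 1
print(majority_vote(["fraud", "ok", "fraud"]))       # fraud
```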
The method thus trains multiple submodels simultaneously with the training set to obtain a multi-model recognition model, improving the recognition accuracy of the recognition model.
An embodiment of the invention also provides a multi-model recognition model construction apparatus for executing any embodiment of the foregoing multi-model recognition model construction method. Specifically, referring to Fig. 8, Fig. 8 is a schematic block diagram of the multi-model recognition model construction apparatus provided by an embodiment of the invention. The multi-model recognition model construction apparatus 100 may be configured in a management server.
As shown in Fig. 8, the multi-model recognition model construction apparatus 100 comprises a pre-processing unit 110, an original-matrix acquiring unit 120, a simplified-matrix acquiring unit 130, a raw-data-set acquiring unit 140, a raw-data-set splitting unit 150, and a model training unit 160.
The pre-processing unit 110 is for obtaining source data and pre-processing it to obtain processed data.
In this embodiment, the acquired source data may come from many application scenarios. For example, in a vehicle-insurance claim scenario, the source data may be the driving records of a dashboard camera, or the source data may come from a vehicle-insurance APP that automatically starts a GPS positioning module to collect GPS data while the user drives; such source data comprises time series of longitude, latitude, speed, altitude, heading and the like, sampled once per second.
Because the source data collected by the GPS positioning module depends on the satellite signal, longitude and latitude readings are easily inaccurate near tall buildings; and because the APP has to send out data packets, retransmissions needed to guarantee delivery inevitably cause duplicated data. To handle such abnormal data in the source data, the source data must be pre-processed.
In one embodiment, as shown in Fig. 9, the preprocessing unit 110 includes:
an abnormal-data processing unit 111, used, when the growth of a current value over the previous value in the time-series data of the source data exceeds a preset growth threshold, to either delete the current value or set it to the previous value multiplied by the growth threshold, obtaining abnormality-processed data;
an equal-interval extraction unit 112, used to extract data from the abnormality-processed data at equal intervals according to a preset time period, obtaining index data;
a normalization unit 113, used to normalize the index data, obtaining the processed data.
In this embodiment, to eliminate the abnormal data in the source data, the time-series data in the source data can be checked in chronological order for any current value whose growth over the previous value exceeds a preset growth threshold (for example, a threshold of 150%). If such a value exists, it indicates abnormal data, which can be handled in two ways: either the value is deleted, or it is set to the previous value multiplied by the growth threshold, yielding the abnormality-processed data.
Then, data are extracted from the abnormality-processed data at equal time intervals to obtain the index data; extracting at equal intervals yields a relatively smooth time-series curve.
Finally, the index data are normalized, which standardizes the data for subsequent model training.
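The two handling options for growth anomalies can be sketched as follows; the 150% threshold is taken from the example above, and the function name and sample speeds are illustrative only:

```python
def clean_growth_anomalies(series, growth_threshold=1.5, mode="clamp"):
    """Handle points whose growth over the previous point exceeds the
    preset threshold: either delete the point or clamp it to
    previous value * threshold (the two options described above)."""
    cleaned = []
    for value in series:
        if cleaned and cleaned[-1] != 0 and value / cleaned[-1] > growth_threshold:
            if mode == "delete":
                continue                                 # drop the anomalous point
            value = cleaned[-1] * growth_threshold       # clamp it instead
        cleaned.append(value)
    return cleaned

speeds = [60, 62, 200, 64, 66]                # 200 stands in for a GPS glitch
print(clean_growth_anomalies(speeds))                    # [60, 62, 93.0, 64, 66]
print(clean_growth_anomalies(speeds, mode="delete"))     # [60, 62, 64, 66]
```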
Numerical data are normalized using the min-max criterion, whose formula is:

X_nom = (X − X_min) / (X_max − X_min)

where X_nom is the normalized value, X is the value to be normalized, and X_min and X_max are the minimum and maximum of that parameter. Categorical data are one-hot encoded.
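A minimal sketch of both preprocessing steps, min-max normalization for numerical data and one-hot encoding for categorical data; the direction categories are an assumed example:

```python
def min_max_normalize(values):
    """X_nom = (X - X_min) / (X_max - X_min), per the formula above."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]       # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(category, categories):
    """One-hot encode a categorical value over a fixed category list."""
    return [1 if category == c else 0 for c in categories]

print(min_max_normalize([60, 80, 100]))                    # [0.0, 0.5, 1.0]
print(one_hot("north", ["north", "south", "east", "west"]))  # [1, 0, 0, 0]
```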
The original-matrix acquiring unit 120 is used to take the data of each time point in the processed data as one row vector and to combine the row vectors from top to bottom in chronological order, obtaining the original feature matrix.
In this embodiment, suppose the sequence for time point t in the source data is 100, 100, 60, 300, 230 (the first two values are longitude and latitude, the third is speed, the fourth altitude and the fifth direction); its corresponding row vector is [0.2 0.3 0.5 0.1 0.3]. The row vector for time point 2t is [0.2 0.1 0.1 0.3 0.2], ..., and the row vector for time point Nt is [0.1 0.1 0.1 0.1 0.1]. Combining these N row vectors from top to bottom in chronological order yields an original feature matrix of N rows and 5 columns. By converting the source data into the original feature matrix according to its timing, the raw data can be normalized into standard input for the model to be trained.
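Stacking the per-time-point row vectors into the original feature matrix can be sketched with NumPy, reusing the normalized example values above:

```python
import numpy as np

# One row per sampling instant: [longitude, latitude, speed, altitude, direction]
samples_by_time = [
    [0.2, 0.3, 0.5, 0.1, 0.3],   # time t
    [0.2, 0.1, 0.1, 0.3, 0.2],   # time 2t
    [0.1, 0.1, 0.1, 0.1, 0.1],   # time Nt (N = 3 here, for illustration)
]
feature_matrix = np.vstack(samples_by_time)  # rows stacked in chronological order
print(feature_matrix.shape)                  # (3, 5): N rows, 5 columns
```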
The simplified-matrix acquiring unit 130 is used to perform statistical operations on column vectors chosen from the original feature matrix to obtain newly added column vectors, and to delete those newly added column vectors whose Pearson correlation coefficients with the columns of the original feature matrix rank after a preset rank threshold, obtaining the simplified feature matrix.
In this embodiment, because the original feature matrix has few dimensions, its dimensionality can be extended by performing statistical operations on chosen column vectors to obtain newly added columns. For example, the N-row, 5-column original feature matrix can be extended by statistical operations into an augmented feature matrix of N rows and M columns. To keep only the more valuable newly added columns in the augmented feature matrix, those newly added columns whose Pearson correlation coefficients with the columns of the original feature matrix rank after the preset rank threshold are deleted, yielding the simplified feature matrix.
The Pearson correlation coefficient between any two column vectors can be calculated by the following formula:

ρ_X,Y = cov(X, Y) / (σ_X σ_Y) = (E(XY) − E(X)E(Y)) / (√(E(X²) − E²(X)) · √(E(Y²) − E²(Y)))

where E denotes the mathematical expectation. ρ_X,Y takes values in [−1, 1]: the closer |ρ_X,Y| is to 1, the more similar the two column vectors; the closer it is to 0, the less similar they are.
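The expectation form of the formula above translates directly into NumPy; the sample vectors are illustrative only:

```python
import numpy as np

def pearson(x, y):
    """rho = (E[XY] - E[X]E[Y]) / (std(X) * std(Y)),
    matching the expectation form of the formula above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = (x * y).mean() - x.mean() * y.mean()
    return cov / (x.std() * y.std())

a = [1.0, 2.0, 3.0, 4.0]
print(round(pearson(a, [2.0, 4.0, 6.0, 8.0]), 6))   # 1.0  (perfectly correlated)
print(round(pearson(a, [8.0, 6.0, 4.0, 2.0]), 6))   # -1.0 (perfectly anti-correlated)
```

In the simplification step, such coefficients would be computed between each newly added column and the original columns, and the lowest-ranked newly added columns dropped.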
In one embodiment, as shown in Fig. 10, the simplified-matrix acquiring unit 130 includes:
an operation-type-count acquiring unit 131, used to obtain a preset number of statistical operation types;
a column-vector selection unit 132, used to randomly select pairs of column vectors in the original feature matrix, obtaining as many column-vector combinations as there are statistical operation types;
a newly-added-column acquiring unit 133, used to apply one type of statistical operation to each column-vector combination, obtaining a number of newly added column vectors greater than or equal to the number of statistical operation types.
The statistical operations include addition, subtraction, multiplication and division. For example, when the preset number of statistical operation types is 4 (addition, subtraction, multiplication and division), two column vectors are randomly selected from the original feature matrix, denoted a and b, and the element-wise operations a+b, a−b, a*b and a/b are computed.
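Applying the four operations to a chosen column pair can be sketched as below; the column indices are fixed rather than randomly selected, purely for reproducibility of the example:

```python
import numpy as np

def augment_columns(matrix, i, j):
    """Given two chosen columns a and b, append a+b, a-b, a*b and a/b
    as newly added columns (the four statistical operations above)."""
    a, b = matrix[:, i], matrix[:, j]
    new_cols = np.column_stack([a + b, a - b, a * b, a / b])
    return np.hstack([matrix, new_cols])

m = np.array([[2.0, 1.0],
              [4.0, 2.0]])
augmented = augment_columns(m, 0, 1)
print(augmented.shape)   # (2, 6): 2 original columns plus 4 derived ones
```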
The raw-data-set acquiring unit 140 is used to append one result column vector to the simplified feature matrix to obtain the raw data set.
In this embodiment, to train the model to be trained, each row vector of the simplified feature matrix serves as an input of the model; a result column vector corresponding to the row vectors is therefore appended to the simplified feature matrix, and the result value corresponding to each row vector serves as the output of the model, whereupon training can be completed.
The raw-data-set splitting unit 150 is used to extract a preset percentage of row vectors from the raw data set as the training set, with the non-extracted row vectors as the test set.
In this embodiment, the preset percentage is the proportion of the extracted rows to the total rows of the raw data set; for example, a preset percentage of 70% means that 70% of the rows of the raw data set are chosen as the training set and the remaining 30% as the test set. Splitting the raw data set into a training set and a test set makes it possible both to train the model to be trained and to test the trained identification model.
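The 70/30 row-wise split can be sketched as follows; for simplicity the rows are taken in order rather than sampled, which is an assumption this example makes:

```python
def split_rows(rows, train_fraction=0.7):
    """Split a data set row-wise into a training set (the preset
    percentage, 70% here) and a test set (the remaining rows)."""
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]

data = list(range(10))           # stand-in for 10 row vectors
train, test = split_rows(data)
print(len(train), len(test))     # 7 3
```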
The model training unit 160 is used to train, on the training set, a model to be trained that contains multiple sub-models, obtaining an identification model for anti-fraud identification.
In this embodiment, the selected sub-models include a support vector machine (SVM, a supervised learning model with associated learning algorithms that analyzes data and recognizes patterns, used for classification and regression analysis), an XGBoost model (eXtreme Gradient Boosting, which can be understood as extreme gradient boosting; XGBoost is a supervised model), a random forest model, and so on.
Training the model containing multiple sub-models on the training set yields a multiple recognition model containing several sub-models that can judge the data to be identified accurately. And because the identification model consists of multiple models, the cooperation of the whole ensemble can be adjusted efficiently and dynamically, giving it strong adaptability to different data sets.
In one embodiment, as shown in Fig. 11, the model training unit 160 includes:
a sub-model group-count acquiring unit 161, used to obtain the total number of sub-model groups in the model to be trained;
a training-set dividing unit 162, used to divide the training set into as many equal parts as the total number of groups, obtaining the sub-training sets;
an identification-model acquiring unit 163, used to train each grouped sub-model on its corresponding sub-training set, obtaining an identification sub-model for each sub-model and thereby forming the multiple recognition model.
In this embodiment, the total number of sub-model groups in the model to be trained is obtained first so that the training set can be divided into that many equal parts. For example, if the total number of sub-model groups is 3, the training set is divided into 3 equal parts, and each sub-training set serves as the training data of its grouped sub-model, yielding an identification sub-model for each sub-model and forming the multiple recognition model. Besides dividing the training set equally by the number of sub-model groups, the training set can also be left undivided and used directly to train each sub-model. Because the model to be trained contains multiple sub-models, sub-models can be added or removed, giving strong scalability.
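The equal-part division of the training set by the number of sub-model groups can be sketched as follows; handling of rows left over when the sizes do not divide evenly is not specified in the text, so this sketch simply drops them:

```python
def split_into_subsets(training_set, num_groups):
    """Divide the training set into as many equal parts as there are
    sub-model groups, one sub-training set per grouped sub-model."""
    size = len(training_set) // num_groups
    return [training_set[k * size:(k + 1) * size] for k in range(num_groups)]

# 9 training rows divided among 3 grouped sub-models
subsets = split_into_subsets(list(range(9)), 3)
print(subsets)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Each element of `subsets` would then be passed to the corresponding sub-model's training routine.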
In one embodiment, as shown in Fig. 12, the multiple recognition model construction device 100 further includes:
a model testing and verification unit 171, used to input the test set into the identification model for model verification; if the difference between the result output by the identification model for each input vector in the test set and the corresponding result value in the test set is less than a preset error threshold, the identification model passes verification and is saved.
In this embodiment, to ensure the recognition accuracy of the identification model, after the identification model is obtained by training on the training set, the test set must be input into the identification model for verification. When the difference between the output obtained for each input vector in the test set and the corresponding result value in the test set lies within the allowable range (i.e., is less than the preset error threshold), the identification model passes verification and can be used. If the identification model fails verification, the selected sub-models are unreasonable and more training data must be input to retrain them. Adding this verification step effectively guarantees the availability and recognition accuracy of the identification model.
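The pass/fail verification criterion can be sketched as below; the stand-in model, test values and error threshold are all assumed for illustration:

```python
def verify_model(predict, test_inputs, test_targets, error_threshold=0.1):
    """The model passes verification only if every prediction differs
    from the test-set result value by less than the preset threshold."""
    return all(abs(predict(x) - target) < error_threshold
               for x, target in zip(test_inputs, test_targets))

model = lambda x: 2 * x          # stand-in for the trained identification model
print(verify_model(model, [1, 2, 3], [2.05, 3.98, 6.0]))   # True: all within 0.1
print(verify_model(model, [1, 2, 3], [2.5, 4.0, 6.0]))     # False: first error is 0.5
```

A `False` result would trigger the retraining path described above.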
In one embodiment, as shown in Fig. 13, the multiple recognition model construction device 100 further includes:
a first input unit 1721, used to input the data to be identified into the identification model, obtaining one sub-recognition result from each sub-model;
a weight calculation unit 1722, used to multiply each sub-recognition result by the preset weight value of the corresponding sub-model and sum the products, obtaining the recognition result.
In this embodiment, as a first specific way of obtaining the recognition result, the sub-models can be made to cooperate by assigning each sub-model a preset weight value: each sub-recognition result is multiplied by the preset weight value of the corresponding sub-model and the products are summed to obtain the recognition result, so the cooperation of the whole multi-model ensemble can be adjusted efficiently and dynamically.
In one embodiment, as shown in Fig. 14, the multiple recognition model construction device 100 further includes:
a second input unit 1731, used to input the data to be identified into the identification model, obtaining one sub-recognition result from each sub-model;
a frequency statistics unit 1732, used to take the sub-recognition result whose value occurs most frequently among the sub-recognition results as the recognition result of the identification model.
In this embodiment, as a second specific way of obtaining the recognition result, the sub-recognition result output most frequently by the sub-models can be counted and used as the final recognition result, realizing a recognition process that fully takes the cooperation of all the sub-models into account.
The device thus trains multiple sub-models simultaneously on the training set to obtain a multiple recognition model containing several sub-models, which improves the recognition accuracy of the identification model.
The above multiple recognition model construction device may be implemented in the form of a computer program, which can run on a computer device as shown in Fig. 15.
Referring to Fig. 15, Fig. 15 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Referring to Fig. 15, the computer device 500 includes a processor 502, a memory and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, the processor 502 can be made to execute the multiple recognition model building method.
The processor 502 provides computing and control capability and supports the operation of the whole computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can be made to execute the multiple recognition model building method.
The network interface 505 is used for network communication, such as transmitting data. Those skilled in the art will understand that the structure shown in Fig. 15 is only a block diagram of the part of the structure relevant to the solution of the present invention and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The processor 502 is used to run the computer program 5032 stored in the memory to realize the following functions: obtaining source data and preprocessing it to obtain processed data; taking the data of each time point in the processed data as one row vector and combining the row vectors from top to bottom in chronological order to obtain an original feature matrix; performing statistical operations on column vectors chosen from the original feature matrix to obtain newly added column vectors, and deleting those newly added column vectors whose Pearson correlation coefficients with the columns of the original feature matrix rank after a preset rank threshold, obtaining a simplified feature matrix; appending one result column vector to the simplified feature matrix to obtain a raw data set; extracting a preset percentage of row vectors from the raw data set as a training set, with the non-extracted row vectors as a test set; and training, on the training set, a model to be trained containing multiple sub-models, obtaining an identification model for anti-fraud identification.
In one embodiment, when executing the step of preprocessing the source data to obtain processed data, the processor 502 performs the following operations: if the growth of a current value over the previous value in the time-series data of the source data exceeds a preset growth threshold, deleting the current value or setting it to the previous value multiplied by the growth threshold, obtaining abnormality-processed data; extracting data from the abnormality-processed data at equal intervals according to a preset time period, obtaining index data; and normalizing the index data, obtaining the processed data.
In one embodiment, when executing the step of performing statistical operations on the chosen column vectors of the original feature matrix to obtain newly added column vectors, the processor 502 performs the following operations: obtaining a preset number of statistical operation types; randomly selecting pairs of column vectors in the original feature matrix, obtaining as many column-vector combinations as there are statistical operation types; and applying one type of statistical operation to each column-vector combination, obtaining a number of newly added column vectors greater than or equal to the number of statistical operation types, where the statistical operations include addition, subtraction, multiplication and division.
In one embodiment, when executing the step of training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the processor 502 performs the following operations: obtaining the total number of sub-model groups in the model to be trained; dividing the training set into as many equal parts as the total number of groups, obtaining the sub-training sets; and training each grouped sub-model on its corresponding sub-training set, obtaining an identification sub-model for each sub-model and thereby forming the multiple recognition model.
In one embodiment, after executing the step of training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the processor 502 also performs the following operations: inputting the test set into the identification model for model verification; if the difference between the result output by the identification model for each input vector in the test set and the corresponding result value in the test set is less than a preset error threshold, the identification model passes verification and is saved.
In one embodiment, after executing the step of training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the processor 502 also performs the following operations: inputting the data to be identified into the identification model, obtaining one sub-recognition result from each sub-model; and multiplying each sub-recognition result by the preset weight value of the corresponding sub-model and summing the products, obtaining the recognition result.
In one embodiment, after executing the step of training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the processor 502 also performs the following operations: inputting the data to be identified into the identification model, obtaining one sub-recognition result from each sub-model; and taking the sub-recognition result whose value occurs most frequently among the sub-recognition results as the recognition result of the identification model.
Those skilled in the art will understand that the embodiment of the computer device shown in Fig. 15 does not limit the specific composition of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or use a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor, whose structures and functions are consistent with the embodiment shown in Fig. 15 and are not repeated here.
It should be appreciated that in the embodiments of the present invention, the processor 502 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of the present invention provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor the following steps are realized: obtaining source data and preprocessing it to obtain processed data; taking the data of each time point in the processed data as one row vector and combining the row vectors from top to bottom in chronological order to obtain an original feature matrix; performing statistical operations on column vectors chosen from the original feature matrix to obtain newly added column vectors, and deleting those newly added column vectors whose Pearson correlation coefficients with the columns of the original feature matrix rank after a preset rank threshold, obtaining a simplified feature matrix; appending one result column vector to the simplified feature matrix to obtain a raw data set; extracting a preset percentage of row vectors from the raw data set as a training set, with the non-extracted row vectors as a test set; and training, on the training set, a model to be trained containing multiple sub-models, obtaining an identification model for anti-fraud identification.
In one embodiment, preprocessing the source data to obtain the processed data comprises: if the growth of a current value over the previous value in the time-series data included in the source data exceeds a preset growth threshold, deleting the current value or setting it to the previous value multiplied by the growth threshold, obtaining abnormality-processed data; extracting data from the abnormality-processed data at equal intervals according to a preset time period, obtaining index data; and normalizing the index data, obtaining the processed data.
In one embodiment, the statistical operations include addition, subtraction, multiplication and division, and performing statistical operations on the chosen column vectors of the original feature matrix to obtain newly added column vectors comprises: obtaining a preset number of statistical operation types; randomly selecting pairs of column vectors in the original feature matrix, obtaining as many column-vector combinations as there are statistical operation types; and applying one type of statistical operation to each column-vector combination, obtaining a number of newly added column vectors greater than or equal to the number of statistical operation types.
In one embodiment, training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification comprises: obtaining the total number of sub-model groups in the model to be trained; dividing the training set into as many equal parts as the total number of groups, obtaining the sub-training sets; and training each grouped sub-model on its corresponding sub-training set, obtaining an identification sub-model for each sub-model and thereby forming the multiple recognition model.
In one embodiment, after training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the steps further include: inputting the test set into the identification model for model verification; if the difference between the result output by the identification model for each input vector in the test set and the corresponding result value in the test set is less than a preset error threshold, the identification model passes verification and is saved.
In one embodiment, after training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the steps further include: inputting the data to be identified into the identification model, obtaining one sub-recognition result from each sub-model; and multiplying each sub-recognition result by the preset weight value of the corresponding sub-model and summing the products, obtaining the recognition result.
In one embodiment, after training, on the training set, the model to be trained containing multiple sub-models to obtain the identification model for anti-fraud identification, the steps further include: inputting the data to be identified into the identification model, obtaining one sub-recognition result from each sub-model; and taking the sub-recognition result whose value occurs most frequently among the sub-recognition results as the recognition result of the identification model.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a logical functional division, and there may be other division manners in actual implementation; units with the same function may be combined into one unit, multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to realize the purpose of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can readily conceive various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A multiple recognition model building method, comprising:
obtaining source data, and preprocessing the source data to obtain processed data;
taking the data at each time point in the processed data as a row vector, and combining the row vectors from top to bottom in the chronological order of the processed data, to obtain an original feature matrix;
performing statistical operations on column vectors selected from the original feature matrix to obtain new column vectors, and deleting those new column vectors whose Pearson-correlation-coefficient ranking against the column vectors of the original feature matrix falls after a preset rank threshold, to obtain a simplified feature matrix;
adding a column of result vectors to the simplified feature matrix to obtain an original data set;
extracting, by row, a preset percentage of the row vectors in the original data set as a training set, and taking the non-extracted row vectors in the original data set as a test set; and
training, with the training set, a to-be-trained model comprising multiple submodels, to obtain a recognition model for anti-fraud recognition.
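The new-column pruning step of claim 1 can be sketched in Python as follows. This is an illustrative reading only: the function name, the use of the maximum absolute Pearson correlation as the ranking score, and the top-k interpretation of the rank threshold are assumptions, not part of the claim.

```python
import numpy as np

def prune_new_columns(base, new_cols, rank_threshold):
    """Score each candidate column by its maximum absolute Pearson
    correlation with the original columns, rank candidates by that
    score, and keep only the top `rank_threshold` of them."""
    scores = []
    for j in range(new_cols.shape[1]):
        corrs = [abs(np.corrcoef(new_cols[:, j], base[:, k])[0, 1])
                 for k in range(base.shape[1])]
        scores.append(max(corrs))
    # Candidates ranked after the preset rank threshold are deleted.
    keep = sorted(np.argsort(scores)[::-1][:rank_threshold])
    return np.hstack([base, new_cols[:, keep]])
```

Under this reading, only the candidate columns most correlated with the original features survive into the simplified feature matrix.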
2. The multiple recognition model building method according to claim 1, wherein the preprocessing the source data to obtain processed data comprises:
if the time series data included in the source data contains a current data point whose increase relative to the previous data point exceeds a preset increase threshold, deleting the value of the current data point or setting it to the previous value multiplied by the increase threshold, to obtain abnormality-processed data;
sampling the abnormality-processed data at constant intervals according to a preset time period, to obtain index data;
normalizing the index data to obtain the processed data.
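The three preprocessing steps of claim 2 can be sketched as a minimal Python function. The names, the choice of capping (rather than deleting) abnormal points, the every-`step`-th-point stand-in for time-period sampling, and min-max normalization are all illustrative assumptions.

```python
def preprocess(values, rise_threshold=2.0, step=2):
    """Sketch of the claim-2 pipeline: 1) cap abnormal rises,
    2) sample at a constant interval, 3) min-max normalize."""
    cleaned = list(values)
    for i in range(1, len(cleaned)):
        prev = cleaned[i - 1]
        # Cap a point that rises by more than the preset threshold
        # relative to the previous point (the claim alternatively
        # allows deleting such a point).
        if prev > 0 and cleaned[i] / prev > rise_threshold:
            cleaned[i] = prev * rise_threshold
    # Constant-interval sampling: keep every `step`-th point as a
    # stand-in for sampling by a preset time period.
    sampled = cleaned[::step]
    lo, hi = min(sampled), max(sampled)
    # Min-max normalization to [0, 1].
    return [(v - lo) / (hi - lo) for v in sampled]
```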
3. The multiple recognition model building method according to claim 1, wherein the statistical operations comprise summation, subtraction, multiplication, and division;
the performing statistical operations on column vectors selected from the original feature matrix to obtain new column vectors comprises:
obtaining a preset number of statistical operation types;
randomly selecting two column vectors from the original feature matrix at a time, to obtain column vector combinations equal in number to the number of statistical operation types;
calculating each column vector combination with one type of the statistical operations, to obtain new column vectors whose number is greater than or equal to the number of statistical operation types.
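One possible Python sketch of claim 3's column generation follows. The `OPS` table maps the four operation types named in the claim; the one-pair-per-operation scheme and the zero-denominator guard on division are added assumptions.

```python
import random
import numpy as np

# One operation per type named in claim 3 (the division guard against
# zero denominators is an added assumption).
OPS = {
    "sum": np.add,
    "diff": np.subtract,
    "prod": np.multiply,
    "div": lambda a, b: a / np.where(b == 0, 1, b),
}

def make_new_columns(matrix, seed=0):
    """Pick one random column pair per operation type, apply that
    operation to the pair, and stack the results as new columns."""
    rng = random.Random(seed)
    pairs = [rng.sample(range(matrix.shape[1]), 2) for _ in OPS]
    cols = [op(matrix[:, i], matrix[:, j])
            for (i, j), op in zip(pairs, OPS.values())]
    return np.column_stack(cols)
```

With four operation types, this yields exactly four new columns; repeating the pairing would yield more, consistent with the claim's "greater than or equal to" wording.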
4. The multiple recognition model building method according to claim 1, wherein the training, with the training set, a to-be-trained model comprising multiple submodels to obtain a recognition model for anti-fraud recognition comprises:
obtaining the total number of groups of submodels in the to-be-trained model;
dividing the training set into equal parts according to the total number of groups, to obtain sub-training sets equal in number to the total number of groups;
performing model training with each sub-training set as the training data of the corresponding grouped submodel, to obtain recognition submodels in one-to-one correspondence with the submodels, thereby forming the multiple recognition model.
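The grouped training of claim 4 can be sketched as follows. The patent does not fix the submodel type, so a toy threshold classifier stands in for it here; `MeanThresholdModel` and `train_grouped` are illustrative names.

```python
import numpy as np

class MeanThresholdModel:
    """Toy stand-in for one submodel (the patent does not fix the
    submodel type): predicts 1 when the first feature exceeds the
    mean of that feature over the sub-training set."""
    def fit(self, X, y):
        self.threshold_ = X[:, 0].mean()
        return self

    def predict(self, X):
        return (X[:, 0] > self.threshold_).astype(int)

def train_grouped(X, y, n_groups):
    """Split the training set into n_groups equal parts and fit one
    submodel per part, yielding one recognition submodel per group."""
    X_parts = np.array_split(X, n_groups)
    y_parts = np.array_split(y, n_groups)
    return [MeanThresholdModel().fit(Xp, yp)
            for Xp, yp in zip(X_parts, y_parts)]
```

Each submodel sees a disjoint slice of the training set, which is the bagging-style arrangement the claim describes.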
5. The multiple recognition model building method according to claim 1, wherein after the training, with the training set, a to-be-trained model comprising multiple submodels to obtain a recognition model for anti-fraud recognition, the method further comprises:
inputting the test set into the recognition model for model verification; if the differences between the results output by the recognition model for the input vectors included in the test set and the corresponding result values in the test set are less than a preset error threshold, the recognition model passes the verification and is saved.
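The pass/fail check of claim 5 can be sketched in a few lines; the function name and the default threshold value are illustrative assumptions.

```python
def passes_verification(outputs, expected, err_threshold=0.5):
    """Every output of the recognition model on the test set must
    differ from the test-set result value by less than the preset
    error threshold; only then does the model pass and get saved."""
    return all(abs(o - e) < err_threshold
               for o, e in zip(outputs, expected))
```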
6. The multiple recognition model building method according to claim 1, wherein after the training, with the training set, a to-be-trained model comprising multiple submodels to obtain a recognition model for anti-fraud recognition, the method further comprises:
inputting data to be recognized into the recognition model, to obtain a sub-recognition result corresponding to each submodel;
multiplying each sub-recognition result by the preset weight of the corresponding submodel and summing the products, to obtain the recognition result.
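The weighted combination of claim 6 reduces to a weighted sum; this one-liner (with an illustrative name) captures it.

```python
def weighted_recognition(sub_results, weights):
    """Claim-6 combination: multiply each submodel's sub-recognition
    result by that submodel's preset weight and sum the products."""
    return sum(r * w for r, w in zip(sub_results, weights))
```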
7. The multiple recognition model building method according to claim 1, wherein after the training, with the training set, a to-be-trained model comprising multiple submodels to obtain a recognition model for anti-fraud recognition, the method further comprises:
inputting data to be recognized into the recognition model, to obtain a sub-recognition result corresponding to each submodel;
taking the sub-recognition result whose value is output most frequently among the sub-recognition results as the recognition result of the recognition model.
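Claim 7's alternative combination is a plain majority vote, which a standard-library `Counter` expresses directly (the function name is illustrative).

```python
from collections import Counter

def majority_recognition(sub_results):
    """Claim-7 combination: the value output most frequently among
    the sub-recognition results becomes the recognition result."""
    return Counter(sub_results).most_common(1)[0][0]
```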
8. A multiple recognition model building apparatus, comprising:
a preprocessing unit, configured to obtain source data and preprocess the source data to obtain processed data;
an original matrix obtaining unit, configured to take the data at each time point in the processed data as a row vector and combine the row vectors from top to bottom in the chronological order of the processed data, to obtain an original feature matrix;
a simplified matrix obtaining unit, configured to perform statistical operations on column vectors selected from the original feature matrix to obtain new column vectors, and delete those new column vectors whose Pearson-correlation-coefficient ranking against the column vectors of the original feature matrix falls after a preset rank threshold, to obtain a simplified feature matrix;
an original data set obtaining unit, configured to add a column of result vectors to the simplified feature matrix to obtain an original data set;
an original data set splitting unit, configured to extract, by row, a preset percentage of the row vectors in the original data set as a training set, and take the non-extracted row vectors in the original data set as a test set;
a model training unit, configured to train, with the training set, a to-be-trained model comprising multiple submodels, to obtain a recognition model for anti-fraud recognition.
9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the multiple recognition model building method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the multiple recognition model building method according to any one of claims 1 to 7.
CN201811601941.5A 2018-12-26 2018-12-26 Multiple recognition model building method, device, computer equipment and storage medium Pending CN109784377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811601941.5A CN109784377A (en) 2018-12-26 2018-12-26 Multiple recognition model building method, device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN109784377A true CN109784377A (en) 2019-05-21

Family

ID=66498521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811601941.5A Pending CN109784377A (en) 2018-12-26 2018-12-26 Multiple recognition model building method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109784377A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582305A (en) * 2020-03-26 2020-08-25 平安科技(深圳)有限公司 Biological feature recognition method and device, computer equipment and storage medium
CN111652279A (en) * 2020-04-30 2020-09-11 中国平安财产保险股份有限公司 Behavior evaluation method and device based on time sequence data and readable storage medium
CN112101236A (en) * 2020-09-17 2020-12-18 济南大学 Intelligent error correction method and system for elderly accompanying robot
CN112149702A (en) * 2019-06-28 2020-12-29 北京百度网讯科技有限公司 Feature processing method and device
CN112308459A (en) * 2020-11-23 2021-02-02 国网北京市电力公司 Power grid household transformation relation identification method and identification device, and electronic equipment
CN112732846A (en) * 2021-01-27 2021-04-30 深圳市科荣软件股份有限公司 Water affair operation analysis system, method, electronic equipment and storage medium
CN113886130A (en) * 2021-10-21 2022-01-04 深信服科技股份有限公司 Method, device and medium for processing database fault



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination