CN110276621A - Data card anti-fraud identification method, electronic device and readable storage medium - Google Patents
- Publication number
- CN110276621A (application number CN201910422069.6A)
- Authority
- CN
- China
- Prior art keywords
- fraud
- analysis
- preset
- data
- preset kind
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Artificial Intelligence (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Entrepreneurship & Innovation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Strategic Management (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to the technical field of security protection and discloses a data card anti-fraud identification method, device and storage medium. The method analyzes preset-type feature data in user data with a pre-trained supervised model of a preset type to obtain a first analysis result, analyzes the preset-type feature data in the user data with a pre-trained unsupervised model of a preset type to obtain a second analysis result, and then calculates a comprehensive analysis result from the first analysis result and the second analysis result. The disclosed technical solution achieves automatic data card anti-fraud identification with high precision, high stability and low system performance requirements.
Description
Technical field
The present invention relates to the technical field of security protection, and in particular to a data card anti-fraud identification method, an electronic device and a readable storage medium.
Background
At present, the business of many financial companies spans multiple financial fields such as insurance, banking and investment. To reduce financial risk effectively, such a company performs anti-fraud identification on a client after handling a financial service for the client (for example, a credit card application or a loan). However, these companies usually perform anti-fraud identification manually. Although some automatic anti-fraud identification schemes are now available on the market, they are generally based on a clustering algorithm or a classification algorithm, and suffer from insufficient precision, low stability and high system performance requirements.
Therefore, how to provide an automatic anti-fraud identification scheme with high precision, high stability and low system performance requirements has become a technical problem that urgently needs to be solved.
Summary of the invention
In view of the foregoing, the main purpose of the present invention is to provide a data card anti-fraud identification method, an electronic device and a readable storage medium, so as to perform anti-fraud identification on clients more stably and accurately.
To achieve the above object, the present invention provides a data card anti-fraud identification method, the method comprising:
Receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
Inputting the preset-type feature data in the user data into a pre-trained supervised model of a preset type for analysis to output a first analysis result; and inputting the preset-type feature data in the user data into a pre-trained unsupervised model of a preset type for analysis to output a second analysis result;
Substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
If the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
If the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
Preferably, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weight values.
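The weighted combination and the threshold decision described above can be sketched as follows. This is only a minimal illustration: the weight values, the threshold and the message strings are hypothetical and are not specified by the present disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    score: float   # comprehensive analysis result F(X)
    message: str   # hypothetical text of the information sent to the terminal


def combine_scores(x1: float, x2: float, c: float, b: float) -> float:
    """Comprehensive analysis result F(X) = c * X1 + b * X2."""
    return c * x1 + b * x2


def decide(x1: float, x2: float, c: float = 0.6, b: float = 0.4,
           threshold: float = 0.5) -> Decision:
    # c, b and threshold are illustrative assumptions, not values from the text.
    score = combine_scores(x1, x2, c, b)
    if score > threshold:
        return Decision(score, "fraud risk warning")
    # A score less than or equal to the threshold yields ordinary result feedback.
    return Decision(score, "result feedback")
```

Here x1 stands for the output of the supervised model and x2 for the output of the unsupervised model; both are assumed to be on a comparable scale (for example, probabilities between 0 and 1) so that a single threshold is meaningful.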
Preferably, the preset-type supervised model is obtained by training a gradient boosting decision tree model, and the preset-type unsupervised model is obtained by training a simulated annealing isolation forest model.
Preferably, the training process of the preset-type supervised model includes:
Obtaining data samples of a preset quantity of user data within a preset time, and labeling each data sample with a corresponding fraud identifier, the fraud identifier being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
Analyzing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
Training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
If the accuracy is greater than or equal to a preset threshold, ending the training; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
Preferably, the predetermined analysis rule is:
Calculating the support and/or confidence with which each feature in each feature set corresponds to the fraud identifier "fraud";
Selecting, as the key features, the features whose support is greater than a preset support or greater than the average support of all features, and/or the features whose confidence is greater than a preset confidence or greater than the average confidence of all features.
To achieve the above object, the present invention also provides an electronic device, the electronic device including a memory and a processor. The memory stores a data card anti-fraud identification program that can run on the processor, and the data card anti-fraud identification program implements the following steps when executed by the processor:
Receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
Inputting the preset-type feature data in the user data into a pre-trained supervised model of a preset type for analysis to output a first analysis result; and inputting the preset-type feature data in the user data into a pre-trained unsupervised model of a preset type for analysis to output a second analysis result;
Substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
If the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
If the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
Preferably, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weight values.
Preferably, the training process of the preset-type supervised model includes:
Obtaining data samples of a preset quantity of user data within a preset time, and labeling each data sample with a corresponding fraud identifier, the fraud identifier being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
Analyzing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
Training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
If the accuracy is greater than or equal to a preset threshold, ending the training; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
Preferably, the predetermined analysis rule is:
Calculating the support and/or confidence with which each feature in each feature set corresponds to the fraud identifier "fraud";
Selecting, as the key features, the features whose support is greater than a preset support or greater than the average support of all features, and/or the features whose confidence is greater than a preset confidence or greater than the average confidence of all features.
To achieve the above object, the present invention also provides a computer-readable storage medium. The computer-readable storage medium contains a data card anti-fraud identification program, and the steps of the above data card anti-fraud identification method are implemented when the data card anti-fraud identification program is executed by a processor.
Compared with the prior art, the present invention analyzes the preset-type feature data in the user data with a pre-trained supervised model of a preset type to obtain a first analysis result, analyzes the preset-type feature data in the user data with a pre-trained unsupervised model of a preset type to obtain a second analysis result, and then calculates a comprehensive analysis result from the first analysis result and the second analysis result. This breaks with the habitual thinking of those skilled in the art, who usually perform identification with supervised algorithm models, a habit that is especially deep-rooted in the financial business field. The respective strengths of the supervised model and the unsupervised model are brought into full play: the supervised algorithm model improves model accuracy, while the unsupervised algorithm captures novel forms of fraud, so that both accuracy and stability are ensured. In addition, the combination of the supervised model and the unsupervised model effectively reduces the influence of invalid features on the model.
Brief description of the drawings
Fig. 1 is a hardware structure diagram of an embodiment of an electronic device implementing data card anti-fraud identification according to the present invention;
Fig. 2 is a functional block diagram of an embodiment of the data card anti-fraud identification program 10 in Fig. 1;
Fig. 3 is an implementation flowchart of an embodiment of the data card anti-fraud identification method of the present invention;
Fig. 4 is a schematic diagram of the application environment of an embodiment of the data card anti-fraud identification method of the present invention.
The realization of the object, the functional characteristics and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second" and the like in the present invention are for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, provided that the combination can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that such a combination does not exist and does not fall within the protection scope claimed by the present invention.
Referring to Fig. 1, a hardware structure diagram of an embodiment of an electronic device implementing data card anti-fraud identification according to the present invention is shown. In this embodiment, the electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a data card anti-fraud identification program 10 that is stored in the memory 11 and can be executed by the processor 12.
The electronic device 1 may be a terminal device with storage and computing capabilities, such as a server, a smartphone, a tablet computer, a portable computer or a desktop computer. In one embodiment of the present invention, when the electronic device 1 is a server, the server may be one or more of a rack server, a blade server, a tower server, a cabinet server and the like.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory 11 of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the electronic device 1.
The processor 12 may be a central processing unit (CPU), a microprocessor or any other suitable data processing chip, and is used to run program code stored in the memory 11 or to process data, for example to execute the data card anti-fraud identification program 10.
Fig. 1 only shows the electronic device 1 with components 11-13 and the data card anti-fraud identification program 10, but it should be understood that the electronic device 1 may include more or fewer components.
In one embodiment of the present invention, the processor 12 implements the following steps when executing the data card anti-fraud identification program 10 stored in the memory 11:
Receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
Inputting the preset-type feature data in the user data into a pre-trained supervised model of a preset type for analysis to output a first analysis result; and inputting the preset-type feature data in the user data into a pre-trained unsupervised model of a preset type for analysis to output a second analysis result;
Substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
If the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
If the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
To better illustrate the functions of the components of the electronic device 1 and how they cooperate with one another, a detailed description is given below with reference to Figs. 2-3.
Referring to Fig. 2, a functional block diagram of an embodiment of the data card anti-fraud identification program 10 in Fig. 1 is shown. The data card anti-fraud identification program 10 is divided into multiple functional modules, which are stored in the memory 11 and executed by the processor 12 so as to perform anti-fraud identification on clients more stably and accurately. A "module" in the present invention refers to a series of computer program instruction segments capable of performing a specific function.
In this embodiment, the data card anti-fraud identification program 10 is divided into a data acquisition module 110, an intermediate result output module 120 and a comprehensive result calculation module 130. It should be understood that this division is only intended to express more clearly the functions that the data card anti-fraud identification program 10 can achieve, and is not intended to limit the program to being divided only, or necessarily, into the data acquisition module 110, the intermediate result output module 120 and the comprehensive result calculation module 130. Those skilled in the art will understand that, in other embodiments, the data card anti-fraud identification program 10 can easily be divided into functional modules different from those of this embodiment, which will not be described in detail here.
The data acquisition module 110 is used to receive user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier (for example, an ID card number or a passport number) sent by a predetermined terminal, to extract user data corresponding to the user identifier from a predetermined database (for example, an insurance business database, a banking business database or a credit reference database). The user data includes one or more items of preset-type feature data, for example: age, gender, occupation, annual income, native place, outstanding repayment information, data card (for example, credit card) debt information, values information (for example, the degree of pessimism or optimism about society; the values information may be obtained by having the client fill in an evaluation questionnaire when business is handled for the client and then analyzing the questionnaire), credit reference information (for example, whether there is any record of dishonesty; the credit reference information may be obtained from a predetermined credit reference database, for example the personal credit reference database of the People's Bank of China), and financial fraud information (for example, whether the client has ever committed financial fraud).
The intermediate result output module 120 is used to input the preset-type feature data in the user data into a pre-trained supervised model of a preset type (for example, an LR (Logistic Regression) model or a GBDT (Gradient Boosting Decision Tree) model) for analysis to output a first analysis result (for example, a probability value that fraud occurs, or a fraud risk score, where a higher score indicates a greater fraud risk), and to input the preset-type feature data in the user data into a pre-trained unsupervised model of a preset type (for example, an SA-iForest (Simulated Annealing Isolation Forest) model) for analysis to output a second analysis result (for example, a probability value that fraud occurs, or a fraud risk score, where a higher score indicates a greater fraud risk).
The comprehensive result calculation module 130 is used to substitute the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result.
Optionally, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weight values.
Optionally, if the calculated comprehensive analysis result is greater than a preset threshold, fraud risk warning information in a preset format (for example, "The result of this model analysis is 'high fraud risk'; please take note") is sent to the predetermined terminal; and/or, if the calculated comprehensive analysis result is less than or equal to the preset threshold, result feedback information in a preset format (for example, "The result of this model analysis is 'low fraud risk'; for your information") is sent to the predetermined terminal.
Optionally, the training process of the preset-type supervised model includes:
B1. Obtain data samples of a preset quantity (for example, 1,000,000) of user data within a preset time (for example, within the most recent year), and label each data sample with a corresponding fraud identifier, the fraud identifier being either fraud or non-fraud; for each data sample, extract one or more items of preset-type feature data and generate a corresponding feature set (for example, the feature set of the preset-type feature data of data sample i may be {Xi1, Xi2, Xi3, ..., Xim}, which includes m different types of features);
B2. Analyze each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and divide the key features corresponding to the data samples into a training set of a first ratio (for example, 40%) and a validation set of a second ratio (for example, 30%);
B3. Train the preset-type supervised model (for example, a logistic regression model) with the key feature information in the training set, and verify the accuracy of the preset-type supervised model with the key feature information in the validation set;
B4. If the accuracy is greater than or equal to a preset threshold, the training ends; or, if the accuracy is less than the preset threshold, increase the number of data samples and re-execute the above steps.
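Steps B2 to B4 can be sketched as the loop below. This is a minimal sketch under stated assumptions: `load_samples` and `fit` are hypothetical callables supplied by the caller (the actual model, such as a GBDT or logistic regression model, is not implemented here), and doubling the sample count, the starting size and the upper bound are illustrative choices only.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (key features, fraud identifier: 1 = fraud, 0 = non-fraud)
Sample = Tuple[Sequence[float], int]


def split_samples(samples, train_ratio: float, valid_ratio: float, seed: int = 0):
    """B2: shuffle the samples and cut them into a training set and a validation set."""
    if train_ratio + valid_ratio > 1.0:
        raise ValueError("the two ratios must not sum to more than 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_valid]


def train_until_accurate(load_samples: Callable[[int], List[Sample]],
                         fit: Callable[[List[Sample]], Callable],
                         min_accuracy: float,
                         start: int = 1000,
                         max_samples: int = 1_000_000):
    """B3/B4: train, validate, and enlarge the sample set until the accuracy threshold is met."""
    n = start
    while True:
        # 40% / 30% are the example ratios given in step B2.
        train, valid = split_samples(load_samples(n), 0.4, 0.3)
        predict = fit(train)  # hypothetical: returns a function features -> 0/1
        accuracy = sum(predict(x) == y for x, y in valid) / len(valid)
        # The max_samples guard is an added safety stop, not part of the described steps.
        if accuracy >= min_accuracy or n >= max_samples:
            return predict, accuracy, n
        n *= 2  # B4: increase the number of data samples and repeat
```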
Optionally, the predetermined analysis rule is:
C1. Calculate the support and/or confidence with which each feature in each feature set corresponds to the fraud identifier "fraud" (the support of a feature is the number of feature sets containing the feature as a percentage of the number of all feature sets; the confidence of a feature is the quotient of the support of the feature and a preset support; for example, if the support of feature Xim is ai1 and the preset support is a2, the confidence of feature Xim is ai1/a2);
C2. Select, as the key features, the features whose support is greater than a preset support or greater than the average support of all features, and/or the features whose confidence is greater than a preset confidence or greater than the average confidence of all features.
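The support and confidence definitions in C1 and the support-based filter in C2 can be illustrated with the short sketch below. It assumes that each feature set is represented as a set of feature names and that the caller passes in the feature sets of interest (for instance those of samples labeled as fraud); both the representation and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def feature_support(feature_sets: List[Set[str]]) -> Dict[str, float]:
    """Support of a feature = share of feature sets that contain it."""
    counts = Counter(f for fs in feature_sets for f in fs)
    total = len(feature_sets)
    return {f: n / total for f, n in counts.items()}


def select_key_features(feature_sets: List[Set[str]],
                        preset_support: float
                        ) -> Tuple[Set[str], Dict[str, float], Dict[str, float]]:
    support = feature_support(feature_sets)
    mean_support = sum(support.values()) / len(support)
    # Confidence = support of the feature divided by the preset support (C1).
    confidence = {f: s / preset_support for f, s in support.items()}
    # C2 (support branch): keep features above the preset or the average support.
    keep = {f for f, s in support.items()
            if s > preset_support or s > mean_support}
    return keep, support, confidence
```

The confidence-based branch of C2 would follow the same pattern, comparing `confidence` against a preset confidence or the average confidence.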
Optionally, the training process of the preset-type unsupervised model includes:
S1. Build the first iTree (Isolation Tree): generate a data sample training set DTrain = {d1, d2, ..., di, ..., dn}, where di denotes the i-th data sample, and randomly divide the data sample training set into two parts, DTrain1 and DTrain2; generate a feature set A = {A1, A2, ..., Ai, ..., Am} from DTrain1 and DTrain2, where Ai denotes the i-th feature in the feature set; select a random seed, randomly select M samples from DTrain1, and randomly select k features (k ≤ m) from the feature set A to form a sub-feature set a = {A1, A2, ..., Ak}; randomly select a feature Aj in a, and then divide the data samples according to a split value p of feature Aj. If the value di(Aj) of feature Aj of data sample di is less than p, the sample is placed in the left subtree (the subtree rooted at the left child of the current node); otherwise it is placed in the right subtree (the subtree rooted at the right child of the current node). The left and right subtrees are constructed iteratively in this way until one of the following conditions is met:
(1) Only one data sample, or multiple identical data samples, remain in DTrain;
(2) The tree reaches its maximum height;
The first iTree model is then built again on the DTrain2 samples with the same random seed and the same method.
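The construction in S1 can be sketched as follows. This is a minimal sketch, not the claimed implementation: it assumes that samples are tuples of numeric feature values, that the split value p is drawn uniformly between the minimum and maximum of the chosen feature (the text only says that a split value is used), and that the maximum height is a caller-supplied parameter. All names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    size: int                       # number of samples that reached this node
    feature: Optional[int] = None   # index of split feature Aj (None for a leaf)
    split: Optional[float] = None   # split value p
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_itree(data, features, rng, height=0, max_height=8):
    # Stop when at most one sample remains or the maximum height is reached.
    if len(data) <= 1 or height >= max_height:
        return Node(size=len(data))
    # Only features whose values differ can still separate the samples;
    # if none is left, the remaining samples are identical on the sub-feature set.
    candidates = [f for f in features
                  if min(r[f] for r in data) < max(r[f] for r in data)]
    if not candidates:
        return Node(size=len(data))
    j = rng.choice(candidates)
    lo, hi = min(r[j] for r in data), max(r[j] for r in data)
    p = rng.uniform(lo, hi)  # assumption: uniform split value
    left = [r for r in data if r[j] < p]      # di(Aj) < p goes left
    right = [r for r in data if r[j] >= p]    # otherwise goes right
    return Node(len(data), j, p,
                build_itree(left, features, rng, height + 1, max_height),
                build_itree(right, features, rng, height + 1, max_height))


def path_length(node, row, depth=0):
    """Number of splits a sample passes before reaching a leaf."""
    if node.feature is None:
        return depth
    child = node.left if row[node.feature] < node.split else node.right
    return path_length(child, row, depth + 1)


def build_pair(d_train1, d_train2, n_features, m, k, seed):
    """Build the same tree twice, once per half, with an identical random seed."""
    trees = []
    for part in (d_train1, d_train2):
        rng = random.Random(seed)
        sample = rng.sample(part, min(m, len(part)))   # M samples
        sub = rng.sample(range(n_features), k)         # k of the m features
        trees.append(build_itree(sample, sub, rng))
    return trees
```

Samples that are isolated after only a few splits (short path length) are the ones an isolation tree treats as anomalous.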
S2. Repeat S1 to build another 2 × (L − 1) trees and form the initial forest T = {T11, T12, T21, T22, ..., TL1, TL2}, where T denotes the set of 2L iTrees, Ti1 denotes the first training of the i-th iTree, and Ti2 denotes the second training of the i-th iTree;
S3. Train the initial forest T = {T11, T12, T21, T22, ..., TL1, TL2} with DTrain1 and DTrain2 respectively, calculate the difference values between different iTrees according to the Q statistic, and calculate the accuracy value ACC of each single iTree by cross-validation;
ACC = {X1, X2, ..., XL}, where Q denotes the difference matrix of the L iTrees; Qij denotes the difference value between isolation tree Ti and isolation tree Tj; Xj denotes the accuracy value of isolation tree Tj;
If any two iTrees are independent, the Q statistic of the two iTrees is 0. The Q statistic varies within [-1, 1]; the larger the Q statistic, the smaller the difference between the two iTrees. If the Q statistic of two iTrees is 1, the degree of difference between the two iTrees is 0.
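The text does not reproduce the formula of the Q statistic. The sketch below therefore assumes the form commonly used to measure diversity between two classifiers (Yule's Q), which has the properties stated above: it is 0 for independent trees, lies in [-1, 1], and equals 1 for trees that agree on every sample. The inputs are hypothetical per-sample flags indicating whether each tree classified the sample correctly.

```python
from typing import List, Sequence


def q_statistic(correct_i: Sequence[bool], correct_j: Sequence[bool]) -> float:
    """Assumed form: Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10)."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1          # both trees correct
        elif not a and not b:
            n00 += 1          # both trees wrong
        elif a:
            n10 += 1          # only tree i correct
        else:
            n01 += 1          # only tree j correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 0.0            # undefined case; treated as 0 in this sketch
    return (n11 * n00 - n01 * n10) / denom


def q_matrix(correct: List[Sequence[bool]]) -> List[List[float]]:
    """Pairwise difference matrix Q for L trees (Qij for trees Ti and Tj)."""
    return [[q_statistic(ci, cj) for cj in correct] for ci in correct]
```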
S4. According to the diversity and accuracy of the iTrees, use the simulated annealing isolation forest algorithm to select iTrees with better fitness values from the initial forest T and combine them into an iForest. The fitness function of a single iTree is given by the following formula:
where F(Tj) denotes the fitness value of isolation tree Tj; W1 denotes the weight of accuracy, and W2 denotes the weight corresponding to diversity.
S4 includes the following steps:
S41. Initialization: take a sufficiently large initial temperature t0, let the temperature t = t0, and take an arbitrary initial solution T1;
S42. For the current temperature t, repeat steps S43 to S46;
S43. Randomly perturb the current solution T1 to generate a new solution T2;
S44. Calculate the increment of T2, df = F(T2) − F(T1), where F(T1) is the fitness value of isolation tree T1;
S45. If df < 0, accept T2 as the new current solution, i.e. T1 = T2; otherwise, calculate the acceptance probability p of T2 according to the Metropolis rule (a Markov chain Monte Carlo sampling method), randomly generate a random number rand uniformly distributed on the interval (0, 1), and if p > rand, accept T2 as the new current solution, i.e. T1 = T2; otherwise keep the current solution T1;
where κ is the Boltzmann constant and exp denotes the natural exponential;
S46. If the set termination condition is met, output the current solution T1 as the optimal solution; the termination condition is usually that the new solution T2 has not been accepted in several consecutive Metropolis chains, or that a set termination temperature has been reached; otherwise, decay the temperature t with the decay function and return to step S42. The decay function is:
where ts is the temperature value at step θ and t0 is the initial temperature.
S47. Repeat steps S43-S46, and select from the initial forest T the r iTrees with better fitness values to combine into the isolation forest iForest, r ≤ L.
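The annealing loop of S41 to S46 can be sketched as below. Neither the acceptance probability formula nor the decay function is reproduced in the text, so this sketch assumes the standard Metropolis form p = exp(−df / (κ·t)) with κ set to 1, and a geometric cooling schedule t ← α·t; both are assumptions, as are all function and parameter names. The sketch follows the sign convention of S45, in which a candidate with df < 0 is accepted outright, and it additionally remembers the best solution seen, which is a convenience not described in the steps.

```python
import math
import random


def metropolis_accept(df: float, t: float, rng: random.Random,
                      kappa: float = 1.0) -> bool:
    """S45: accept if df < 0, otherwise with probability exp(-df / (kappa * t))."""
    if df < 0:
        return True
    return math.exp(-df / (kappa * t)) > rng.random()


def anneal(initial, cost, perturb, t0: float = 1.0, t_end: float = 1e-3,
           alpha: float = 0.9, steps_per_t: int = 20, seed: int = 0):
    rng = random.Random(seed)
    current = best = initial                       # S41: arbitrary initial solution
    t = t0
    while t > t_end:                               # S46: stop at the termination temperature
        for _ in range(steps_per_t):               # S42: repeat at the current temperature
            candidate = perturb(current, rng)      # S43: random perturbation
            df = cost(candidate) - cost(current)   # S44: increment
            if metropolis_accept(df, t, rng):
                current = candidate
                if cost(current) < cost(best):
                    best = current
        t *= alpha                                 # assumed geometric decay
    return best
```

For the selection in S47, `initial` could be a tuple of r tree indices, `perturb` could swap one selected tree for an unselected one, and `cost` could be derived from the fitness values of the selected trees; these mappings are illustrative and not taken from the text.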
Referring to Fig. 3, an implementation flowchart of an embodiment of the data card anti-fraud identification method of the present invention is shown. In this embodiment, when the processor 12 executes the computer program of the data card anti-fraud identification program 10 stored in the memory 11, the data card anti-fraud identification method implemented includes steps S300 to S320. The implementation of the present invention is described below with reference to Fig. 4, the schematic diagram of the application environment of the data card anti-fraud identification method.
S300: the data acquisition module 110 receives user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by the predetermined terminal, extracts the user data corresponding to the user identifier from a predetermined database; the user data includes one or more items of preset-type feature data;
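The two input paths of step S300 can be sketched as follows. This is only an illustration under stated assumptions: the request is modelled as a dictionary, the predetermined database as an in-memory mapping, and the key names `user_data` and `user_id` as well as the function name `acquire_user_data` are hypothetical and do not come from the patent.

```python
from typing import Dict, Mapping

FeatureData = Dict[str, float]


def acquire_user_data(request: Mapping[str, object],
                      database: Mapping[str, FeatureData]) -> FeatureData:
    """Return the feature data to analyse for one anti-fraud request.

    Path 1: the terminal sent the user data directly.
    Path 2: the terminal sent only a user identifier, so the data is
            looked up in the predetermined database.
    """
    user_data = request.get("user_data")
    if user_data is not None:
        return dict(user_data)  # copy so the caller's object is not shared

    user_id = request.get("user_id")
    if user_id is None:
        raise ValueError("request carries neither user data nor a user identifier")
    if user_id not in database:
        raise KeyError(f"no user data stored for identifier {user_id!r}")
    return dict(database[user_id])
```

For example, `acquire_user_data({"user_id": "u1"}, db)` returns the stored record for `u1`, while a request that already carries `user_data` bypasses the lookup.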
Optionally, the formula is F(X) = c × X1 + b × X2, where F(X) denotes the comprehensive analysis result, X1 denotes the first analysis result, X2 denotes the second analysis result, and c and b are predetermined weights.
Optionally, if the calculated comprehensive analysis result is greater than a preset threshold, fraud risk warning information in a preset format is sent to the predetermined terminal; and/or
if the calculated comprehensive analysis result is less than or equal to the preset threshold, result feedback information in a preset format is sent to the predetermined terminal.
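The weighted combination and the threshold decision can be illustrated with a short sketch. The weights, the threshold, and the message layout (`type`, `score`) are placeholder assumptions; the patent only states that the weights are predetermined and that the messages have a preset format.

```python
def comprehensive_result(x1: float, x2: float, c: float, b: float) -> float:
    """F(X) = c * X1 + b * X2, with X1 and X2 the two model outputs."""
    return c * x1 + b * x2


def build_response(score: float, threshold: float) -> dict:
    """Pick the message to send back to the terminal.

    A score strictly above the threshold yields a fraud risk warning;
    a score less than or equal to it yields ordinary result feedback.
    """
    if score > threshold:
        return {"type": "fraud_risk_warning", "score": score}
    return {"type": "result_feedback", "score": score}
```

With the illustrative values X1 = 0.8, X2 = 0.4 and c = b = 0.5, the combined score is about 0.6, so a threshold of 0.5 produces a warning.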
Optionally, the training process of the preset-type supervised model includes:
B1: obtain a preset number of user data samples within a preset time, and label each data sample with a corresponding fraud flag, the fraud flag being either fraud or non-fraud; for each data sample, extract one or more items of preset-type feature data and generate a corresponding feature set;
B2: analyse each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and divide the key features corresponding to the data samples into a training set of a first proportion and a validation set of a second proportion;
B3: train the preset-type supervised model with the key feature information in the training set, and verify the accuracy of the preset-type supervised model with the key feature information in the validation set;
B4: if the accuracy is greater than or equal to a preset threshold, the training ends; or, if the accuracy is less than the preset threshold, increase the number of data samples and re-execute the above steps.
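Steps B2 to B4 describe a loop that retrains on more data until the validation accuracy reaches the threshold. The sketch below shows that control flow only. The learner is passed in as a hypothetical `fit` callable, `load_samples` is a hypothetical data source, and the `max_size` cap is an added assumption so the loop always terminates; none of these names appear in the patent.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]      # (key features, fraud label 0 or 1)
Model = Callable[[Sequence[float]], int]  # maps key features to a predicted label


def split_samples(samples: List[Sample], train_ratio: float,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle and split into a training set and a validation set (B2)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(model: Model, validation: List[Sample]) -> float:
    """Fraction of validation samples the model labels correctly (B3)."""
    if not validation:
        return 0.0
    hits = sum(1 for features, label in validation if model(features) == label)
    return hits / len(validation)


def train_until_accurate(load_samples: Callable[[int], List[Sample]],
                         fit: Callable[[List[Sample]], Model],
                         threshold: float, start_size: int, step: int,
                         max_size: int,
                         train_ratio: float = 0.7) -> Tuple[Model, float, int]:
    """Retrain with more samples until accuracy >= threshold (B4)."""
    size = start_size
    while True:
        train, validation = split_samples(load_samples(size), train_ratio)
        model = fit(train)
        acc = accuracy(model, validation)
        if acc >= threshold or size >= max_size:
            return model, acc, size
        size += step  # B4: enlarge the data set and repeat
```

In practice `fit` would wrap whatever supervised learner is used; the loop itself does not depend on the model type.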
Optionally, the predetermined analysis rule is:
C1: calculate, for each feature in each feature set, the support and/or confidence with which the corresponding fraud flag is fraud;
C2: select as the key features those features whose support is greater than a preset support or greater than the average support of all features, and/or whose confidence is greater than a preset confidence or greater than the average confidence of all features.
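A minimal sketch of rules C1 and C2, assuming each feature is a binary flag that is either present in a sample or not, and using the usual association-rule definitions: support is the share of all samples in which the feature is present and the sample is fraud, and confidence is the share of samples with the feature that are fraud. The patent does not spell out these definitions, so they are assumptions, as are the function names.

```python
from typing import Dict, List, Optional, Set, Tuple

Stats = Dict[str, Tuple[float, float]]  # feature -> (support, confidence)


def support_and_confidence(samples: List[Tuple[Set[str], bool]]) -> Stats:
    """C1: support and confidence of 'feature present => fraud'."""
    total = len(samples)
    present: Dict[str, int] = {}
    fraud_and_present: Dict[str, int] = {}
    for features, is_fraud in samples:
        for name in features:
            present[name] = present.get(name, 0) + 1
            if is_fraud:
                fraud_and_present[name] = fraud_and_present.get(name, 0) + 1
    stats: Stats = {}
    for name, count in present.items():
        hits = fraud_and_present.get(name, 0)
        stats[name] = (hits / total, hits / count)
    return stats


def key_features(stats: Stats,
                 min_support: Optional[float] = None,
                 min_confidence: Optional[float] = None,
                 require_both: bool = True) -> List[str]:
    """C2: keep features above the preset (or average) thresholds."""
    if not stats:
        return []
    if min_support is None:      # fall back to the average support
        min_support = sum(s for s, _ in stats.values()) / len(stats)
    if min_confidence is None:   # fall back to the average confidence
        min_confidence = sum(c for _, c in stats.values()) / len(stats)
    selected = []
    for name, (support, confidence) in stats.items():
        by_support = support > min_support
        by_confidence = confidence > min_confidence
        keep = (by_support and by_confidence) if require_both \
            else (by_support or by_confidence)
        if keep:
            selected.append(name)
    return sorted(selected)
```

The `require_both` switch reflects the "and/or" wording of the rule: `True` applies both criteria, `False` accepts a feature that passes either one.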
Optionally, the training process of the preset-type unsupervised model includes:
S1: first, build an iTree (Isolation Tree). Generate a data sample training set DTrain = {d1, d2, …, di, …, dn}, where di denotes the i-th data sample, and randomly divide the training set into two parts, DTrain1 and DTrain2. Generate from DTrain1 and DTrain2 the feature set A = {A1, A2, …, Ai, …, Am}, where Ai denotes the i-th feature in the feature set. Choose a random seed, randomly select M samples from the DTrain1 samples, and randomly select k features from the feature set A to form the feature subset a = {A1, A2, …, Ak}, where k ≤ m. Randomly select a feature Aj in a, and then divide the data samples according to a split value p of feature Aj: if the value di(Aj) of feature Aj of data di is less than p, the data is placed in the left subtree (the subtree rooted at the left child of the current node); otherwise it is placed in the right subtree (the subtree rooted at the right child of the current node). The left and right subtrees are constructed iteratively in this way until one of the following conditions is met:
(1) only one data item, or several identical data items, remain in DTrain;
(2) the tree reaches the maximum height.
Then, with the same random seed and the same method, build the first iTree model again on the DTrain2 samples.
S2: repeat S1 to build a further 2*(L-1) trees and form the initial forest T = {T11, T12, T21, T22, …, TL1, TL2}, where T denotes the set of 2L iTrees, Ti1 denotes the first training of the i-th iTree, and Ti2 denotes the second training of the i-th iTree;
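The tree construction in S1 can be sketched as below. Several details are assumptions made for the sake of a runnable example: samples are tuples of numbers indexed by feature position, the split value p is drawn uniformly between the minimum and maximum of the chosen feature (as in the standard isolation forest), and the names `build_itree` and `build_pair` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Sequence[float]
Node = Dict[str, object]


def build_itree(samples: List[Sample], features: List[int],
                rng: random.Random, height: int = 0,
                max_height: int = 8) -> Node:
    """Recursively split samples on a random feature and split value."""
    identical = all(tuple(s) == tuple(samples[0]) for s in samples)
    # Stop on a single (or no) sample, identical samples, or maximum height.
    if len(samples) <= 1 or identical or height >= max_height:
        return {"size": len(samples)}
    # Only features whose values differ can separate the samples.
    splittable = [j for j in features
                  if min(s[j] for s in samples) < max(s[j] for s in samples)]
    if not splittable:
        return {"size": len(samples)}
    j = rng.choice(splittable)
    low = min(s[j] for s in samples)
    high = max(s[j] for s in samples)
    p = rng.uniform(low, high)  # assumed way of choosing the split value
    left = [s for s in samples if s[j] < p]    # di(Aj) < p goes left
    right = [s for s in samples if s[j] >= p]  # everything else goes right
    return {"feature": j, "split": p,
            "left": build_itree(left, features, rng, height + 1, max_height),
            "right": build_itree(right, features, rng, height + 1, max_height)}


def build_pair(d_train1: List[Sample], d_train2: List[Sample],
               n_features: int, m: int, k: int, seed: int) -> Tuple[Node, Node]:
    """Build one tree per half of the training set with the same seed."""
    trees = []
    for half in (d_train1, d_train2):
        rng = random.Random(seed)  # identical seed for both halves
        subsample = rng.sample(list(half), min(m, len(half)))
        sub_features = rng.sample(range(n_features), k)
        trees.append(build_itree(subsample, sub_features, rng))
    return trees[0], trees[1]
```

Calling `build_pair` L times with different seeds would give the 2L trees of the initial forest described in S2.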
S3: train the initial forest T = {T11, T12, T21, T22, …, TL1, TL2} with DTrain1 and DTrain2 respectively, calculate the difference value between different iTrees according to the Q statistic, and calculate the accuracy value ACC of each single iTree by cross-validation;
ACC = {X1, X2, …, XL}, where Q denotes the difference matrix of the L iTrees; Qij denotes the difference value between isolation tree Ti and isolation tree Tj; and Xj denotes the accuracy value of isolation tree Tj;
If any two iTrees are independent, the value of their Q statistic is 0. The value of the Q statistic varies within [-1, 1]; the larger the Q statistic, the smaller the difference between the two iTrees. If the Q statistic of two iTrees is 1, the diversity between the two iTrees is 0.
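The text does not reproduce the formula for the Q statistic. The sketch below uses the standard pairwise Yule Q statistic for two classifiers, Q = (N11·N00 - N01·N10) / (N11·N00 + N01·N10), where N11 counts samples both trees get right, N00 samples both get wrong, and N10 and N01 samples only one of them gets right. That this is the exact definition used in the patent is an assumption, although it matches the stated properties (0 for independent trees, range [-1, 1], 1 for no diversity).

```python
from typing import List, Sequence


def q_statistic(correct_i: Sequence[bool], correct_j: Sequence[bool]) -> float:
    """Yule's Q for two trees, given per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1
        elif a and not b:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        return 0.0  # undefined case; treated as "independent" in this sketch
    return (n11 * n00 - n01 * n10) / denominator


def q_matrix(correct: List[Sequence[bool]]) -> List[List[float]]:
    """Difference matrix Q with Q[i][j] for every pair of trees."""
    size = len(correct)
    return [[1.0 if i == j else q_statistic(correct[i], correct[j])
             for j in range(size)] for i in range(size)]
```

Two trees that are right and wrong on exactly the same samples give Q = 1, while trees that never agree give Q = -1.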
S4: using a simulated annealing isolation forest algorithm, select from the initial forest T, according to the diversity and accuracy of the iTrees, the iTrees with better fitness values and combine them into an iForest. The fitness function of a single iTree is given by the following formula:
where F(Tj) denotes the fitness value of isolation tree Tj, W1 denotes the weight of accuracy, and W2 denotes the weight of diversity.
Step S4 comprises the following steps:
S41: initialization: choose a sufficiently large initial temperature t0, set the temperature t = t0, and choose an arbitrary initial solution T1;
S42: for the current temperature t, repeat steps S43 to S46;
S43: randomly perturb the current solution T1 to generate a new solution T2;
S44: calculate the increment of T2, df = F(T2) - F(T1), where F(T1) is the fitness value of isolation tree T1;
S45: if df < 0, accept T2 as the new current solution, i.e. T1 = T2; otherwise, calculate the acceptance probability p of T2 according to the Metropolis rule (a Markov chain Monte Carlo sampling method), generate a random number rand uniformly distributed on the interval (0, 1), and, if p > rand, accept T2 as the new current solution, i.e. T1 = T2; otherwise keep the current solution T1;
where κ is the Boltzmann constant and exp denotes the natural exponential function;
S46: if the set termination condition is met, output the current solution T1 as the optimal solution. The termination condition is usually that no new solution T2 has been accepted in several consecutive Metropolis chains, or that a set termination temperature has been reached. Otherwise, lower the temperature t with the attenuation function and return to step S42. The attenuation function is:
where ts is the temperature at step θ and t0 is the initial temperature;
S47: repeat steps S43 to S46, and select from the initial forest T the r iTrees with the better fitness values to form the isolation forest iForest, where r ≤ L.
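Steps S41 to S46 follow the usual simulated annealing loop, which can be sketched generically as below. Two formulas are not reproduced in the text, so the sketch substitutes common choices and labels them as assumptions: the Metropolis acceptance probability is taken as p = exp(-df / (κ·t)), and the attenuation function is replaced by geometric cooling t ← α·t. The function name `anneal` and all parameter names and defaults are hypothetical. Following S45, a candidate with df < 0 is always accepted, so F is treated here as a quantity to be reduced.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def anneal(initial: S,
           perturb: Callable[[S, random.Random], S],
           f: Callable[[S], float],
           t0: float = 100.0, t_end: float = 1e-3, alpha: float = 0.9,
           chain_length: int = 50, kappa: float = 1.0,
           max_idle_chains: int = 3, seed: int = 0) -> S:
    """Generic simulated annealing loop in the shape of S41 to S46."""
    rng = random.Random(seed)
    current = initial            # S41: arbitrary initial solution
    t = t0                       # S41: start from a high temperature
    idle_chains = 0
    # S46: stop at the end temperature or after several idle chains.
    while t > t_end and idle_chains < max_idle_chains:
        accepted = False
        for _ in range(chain_length):             # S42: one Metropolis chain
            candidate = perturb(current, rng)     # S43: random perturbation
            df = f(candidate) - f(current)        # S44: increment
            # S45: accept improvements, else accept with probability p
            # (assumed form p = exp(-df / (kappa * t))).
            if df < 0 or math.exp(-df / (kappa * t)) > rng.random():
                current = candidate
                accepted = True
        idle_chains = 0 if accepted else idle_chains + 1
        t *= alpha  # assumed geometric cooling, not the patent's function
    return current
```

As a toy usage, minimising `(x - 3) ** 2` over the integers with a perturbation that moves one step left or right settles on `x = 3`. In the setting of the patent, a solution would instead be a candidate isolation tree and `f` its fitness function.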
S310: the intermediate result output module 120 feeds the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and feeds the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result.
S320: the comprehensive result calculation module 130 substitutes the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A data card anti-fraud identification method, applied to an electronic device, characterized in that the method comprises:
receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by the predetermined terminal, extracting the user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
feeding the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and feeding the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
if the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
if the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
2. The data card anti-fraud identification method according to claim 1, characterized in that the formula is F(X) = c × X1 + b × X2, where F(X) denotes the comprehensive analysis result, X1 denotes the first analysis result, X2 denotes the second analysis result, and c and b are predetermined weights.
3. The data card anti-fraud identification method according to claim 1, characterized in that the preset-type supervised model is obtained by training a gradient boosting decision tree model, and the preset-type unsupervised model is obtained by training a simulated annealing isolation forest model.
4. The data card anti-fraud identification method according to claim 1, characterized in that the training process of the preset-type supervised model includes:
obtaining a preset number of user data samples within a preset time, and labelling each data sample with a corresponding fraud flag, the fraud flag being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
analysing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first proportion and a validation set of a second proportion;
training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
if the accuracy is greater than or equal to a preset threshold, ending the training; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
5. The data card anti-fraud identification method according to any one of claims 1 to 4, characterized in that the predetermined analysis rule is:
calculating, for each feature in each feature set, the support and/or confidence with which the corresponding fraud flag is fraud;
selecting as the key features those features whose support is greater than a preset support or greater than the average support of all features, and/or whose confidence is greater than a preset confidence or greater than the average confidence of all features.
6. An electronic device comprising a memory and a processor, characterized in that the memory stores a data card anti-fraud identification program executable on the processor, and the data card anti-fraud identification program, when executed by the processor, implements the following steps:
receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by the predetermined terminal, extracting the user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
feeding the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and feeding the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
if the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
if the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
7. The electronic device according to claim 6, characterized in that the formula is F(X) = c × X1 + b × X2, where F(X) denotes the comprehensive analysis result, X1 denotes the first analysis result, X2 denotes the second analysis result, and c and b are predetermined weights.
8. The electronic device according to claim 7, characterized in that the training process of the preset-type supervised model includes:
obtaining a preset number of user data samples within a preset time, and labelling each data sample with a corresponding fraud flag, the fraud flag being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
analysing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first proportion and a validation set of a second proportion;
training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
if the accuracy is greater than or equal to a preset threshold, ending the training; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
9. The electronic device according to claim 8, characterized in that the predetermined analysis rule is:
calculating, for each feature in each feature set, the support and/or confidence with which the corresponding fraud flag is fraud;
selecting as the key features those features whose support is greater than a preset support or greater than the average support of all features, and/or whose confidence is greater than a preset confidence or greater than the average confidence of all features.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a data card anti-fraud identification program, and the data card anti-fraud identification program, when executed by a processor, implements the steps of the data card anti-fraud identification method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910422069.6A CN110276621A (en) | 2019-05-21 | 2019-05-21 | Data card is counter to cheat recognition methods, electronic device and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110276621A (en) | 2019-09-24 |
Family
ID=67960117
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910422069.6A Pending CN110276621A (en) | 2019-05-21 | 2019-05-21 | Data card is counter to cheat recognition methods, electronic device and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276621A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||