CN110276621A - Data card anti-fraud identification method, electronic device, and readable storage medium - Google Patents

Data card anti-fraud identification method, electronic device, and readable storage medium

Info

Publication number
CN110276621A
Authority
CN
China
Prior art keywords
fraud
analysis
preset
data
preset kind
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910422069.6A
Other languages
Chinese (zh)
Inventor
卢磊
范瑞
姚晨钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN201910422069.6A priority Critical patent/CN110276621A/en
Publication of CN110276621A publication Critical patent/CN110276621A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/018Certifying business or products
    • G06Q30/0185Product, service or business identity fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to the technical field of security protection and discloses a data card anti-fraud identification method, device, and storage medium. The method analyzes preset-type feature data in user data with a pre-trained preset-type supervised model to obtain a first analysis result, analyzes the same preset-type feature data with a pre-trained preset-type unsupervised model to obtain a second analysis result, and then calculates a comprehensive analysis result from the first and second analysis results. The disclosed technical solution provides automatic data card anti-fraud identification with high precision, high stability, and low system performance requirements.

Description

Data card anti-fraud identification method, electronic device, and readable storage medium
Technical field
The present invention relates to the technical field of security protection, and in particular to a data card anti-fraud identification method, an electronic device, and a readable storage medium.
Background
At present, the business of many financial companies spans multiple financial fields such as insurance, banking, and investment. To reduce financial risk effectively, such companies perform anti-fraud identification on a customer after handling the corresponding financial business (for example, credit card business or loan business) for that customer. However, these companies usually perform anti-fraud identification manually. Although some automatic anti-fraud identification schemes are available on the market, they are generally based on a clustering algorithm or a classification algorithm alone, and suffer from technical problems such as insufficient precision, low stability, and high system performance requirements.
Therefore, how to provide an automatic anti-fraud identification scheme with high precision, high stability, and low system performance requirements has become a technical problem that urgently needs to be solved.
Summary of the invention
In view of the foregoing, the main purpose of the present invention is to provide a data card anti-fraud identification method, an electronic device, and a readable storage medium, so as to perform anti-fraud identification on customers more stably and accurately.
To achieve the above object, the present invention provides a data card anti-fraud identification method, the method comprising:
Receiving user data, sent by a predetermined terminal, on which anti-fraud analysis is to be performed; or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting the user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
Substituting the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and substituting the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
Substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
If the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
If the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
Preferably, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weight values.
Preferably, the preset-type supervised model is obtained by training a gradient boosting decision tree model, and the preset-type unsupervised model is obtained by training a simulated annealing isolation forest model.
Preferably, the training process of the preset-type supervised model includes:
Obtaining a preset number of data samples of user data within a preset time, and labeling each data sample with a corresponding fraud label, the fraud label being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
Analyzing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
Training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
If the accuracy is greater than or equal to a preset threshold, the training ends; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
Preferably, the predetermined analysis rule is:
Calculating the support and/or confidence with which each feature in each feature set corresponds to a fraud label of fraud;
Selecting, as the key features, the features whose support is greater than a preset support or greater than the average support of all features, and/or the features whose confidence is greater than a preset confidence or greater than the average confidence of all features.
To achieve the above object, the present invention also provides an electronic device including a memory and a processor. The memory stores a data card anti-fraud identification program that can run on the processor, and the data card anti-fraud identification program, when executed by the processor, implements the following steps:
Receiving user data, sent by a predetermined terminal, on which anti-fraud analysis is to be performed; or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting the user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
Substituting the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and substituting the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
Substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
If the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
If the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
Preferably, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weight values.
Preferably, the training process of the preset-type supervised model includes:
Obtaining a preset number of data samples of user data within a preset time, and labeling each data sample with a corresponding fraud label, the fraud label being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
Analyzing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
Training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
If the accuracy is greater than or equal to a preset threshold, the training ends; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
Preferably, the predetermined analysis rule is:
Calculating the support and/or confidence with which each feature in each feature set corresponds to a fraud label of fraud;
Selecting, as the key features, the features whose support is greater than a preset support or greater than the average support of all features, and/or the features whose confidence is greater than a preset confidence or greater than the average confidence of all features.
To achieve the above object, the present invention also provides a computer-readable storage medium that includes a data card anti-fraud identification program; when the data card anti-fraud identification program is executed by a processor, the steps of the above data card anti-fraud identification method are implemented.
Compared with the prior art, the present invention analyzes the preset-type feature data in the user data with a pre-trained preset-type supervised model to obtain a first analysis result, analyzes the preset-type feature data in the user data with a pre-trained preset-type unsupervised model to obtain a second analysis result, and then calculates a comprehensive analysis result from the first and second analysis results. This breaks with the habitual thinking of those skilled in the art, who usually perform identification with a supervised algorithm model alone, a habit that is especially entrenched in the financial business field. The invention makes full use of the characteristics of both models: the supervised algorithm model improves model precision, and the unsupervised algorithm captures novel forms of fraud, which ensures both accuracy and stability. At the same time, the combination of the supervised and unsupervised models effectively reduces the influence of invalid features on the model.
Brief description of the drawings
Fig. 1 is a hardware structure diagram of an embodiment of the electronic device that implements data card anti-fraud identification according to the present invention;
Fig. 2 is a functional block diagram of an embodiment of the data card anti-fraud identification program 10 in Fig. 1;
Fig. 3 is an implementation flowchart of an embodiment of the data card anti-fraud identification method of the present invention;
Fig. 4 is a schematic diagram of the application environment of an embodiment of the data card anti-fraud identification method of the present invention.
The realization of the object, the functional characteristics, and the advantages of the present invention are further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature defined with "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination shall be deemed not to exist and does not fall within the protection scope claimed by the present invention.
Referring to Fig. 1, a hardware structure diagram of an embodiment of the electronic device that implements data card anti-fraud identification according to the present invention is shown. In this embodiment, the electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a data card anti-fraud identification program 10 that is stored in the memory 11 and executable by the processor 12.
The electronic device 1 may be a terminal device with storage and computing functions, such as a server, smartphone, tablet computer, portable computer, or desktop computer. In one embodiment of the present invention, when the electronic device 1 is a server, the server may be one or more of a rack server, a blade server, a tower server, a cabinet server, and the like.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory 11 of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
The processor 12 may be a central processing unit (CPU), a microprocessor, or any other suitable data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the data card anti-fraud identification program 10.
Fig. 1 shows only the electronic device 1 with components 11-13 and the data card anti-fraud identification program 10, but it should be understood that the electronic device 1 may include more or fewer components.
In one embodiment of the present invention, the processor 12 implements the following steps when executing the data card anti-fraud identification program 10 stored in the memory 11:
Receiving user data, sent by a predetermined terminal, on which anti-fraud analysis is to be performed; or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting the user data corresponding to the user identifier from a predetermined database, the user data including one or more items of preset-type feature data;
Substituting the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and substituting the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
Substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
If the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
If the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
To better explain the functions of the components of the electronic device 1 and how they cooperate, a detailed description is given below with reference to Figs. 2-3.
Referring to Fig. 2, a functional block diagram of an embodiment of the data card anti-fraud identification program 10 in Fig. 1 is shown. The data card anti-fraud identification program 10 is divided into multiple functional modules, which are stored in the memory 11 and executed by the processor 12 to perform anti-fraud identification on customers more stably and accurately. A "module" in the present invention refers to a series of computer program instruction segments capable of completing a specific function.
In this embodiment, the data card anti-fraud identification program 10 is divided into a data acquisition module 110, an intermediate result output module 120, and a comprehensive result calculation module 130. It should be understood that this division is only intended to express more clearly the functions that the data card anti-fraud identification program 10 can realize, and does not mean that the program can or must be divided only into these three modules. Those skilled in the art will understand that, in other embodiments, the data card anti-fraud identification program 10 can easily be divided into functional modules different from those of this embodiment, which are not described here.
The data acquisition module 110 is used to receive user data, sent by a predetermined terminal, on which anti-fraud analysis is to be performed; or, after receiving an anti-fraud analysis request carrying a user identifier (for example, an ID card number or passport number) sent by a predetermined terminal, to extract the user data corresponding to the user identifier from a predetermined database (for example, an insurance business database, a banking business database, or a credit reference database). The user data includes one or more items of preset-type feature data, for example: age; gender; occupation; annual income; native place; non-repayment information; data card (for example, credit card) debt information; values information (for example, the degree of pessimism or optimism toward society, which can be obtained by having the customer fill in an evaluation questionnaire when handling business and analyzing it); credit reference information (for example, whether there is any breach of trust, which can be obtained from a predetermined credit reference database such as the personal credit reference database of the People's Bank of China); and financial fraud information (for example, whether the user has ever committed financial fraud).
The intermediate result output module 120 is used to substitute the preset-type feature data in the user data into a pre-trained preset-type supervised model (for example, an LR (Logistic Regression) model or a GBDT (Gradient Boosting Decision Tree) model) for analysis, to output a first analysis result (for example, the probability that fraud occurs, or a fraud risk score, where a higher score means a greater fraud risk), and to substitute the preset-type feature data in the user data into a pre-trained preset-type unsupervised model (for example, an SA-iForest (Simulated Annealing Isolation Forest) model) for analysis, to output a second analysis result (for example, the probability that fraud occurs, or a fraud risk score, where a higher score means a greater fraud risk).
The comprehensive result calculation module 130 is used to substitute the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result.
Optionally, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weight values.
Optionally, if the calculated comprehensive analysis result is greater than a preset threshold, fraud risk warning information in a preset format is sent to the predetermined terminal (for example, "The result of this model analysis is 'high fraud risk'; please take note"); and/or, if the calculated comprehensive analysis result is less than or equal to the preset threshold, result feedback information in a preset format is sent to the predetermined terminal (for example, "The result of this model analysis is 'low fraud risk'; please be informed").
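The weighted combination and threshold decision described for module 130 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name FusionConfig, the function names, and the numeric values of c, b, and the threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionConfig:
    """Hypothetical container for the predetermined weights and threshold."""
    c: float = 0.6          # weight of the supervised result X1 (assumed value)
    b: float = 0.4          # weight of the unsupervised result X2 (assumed value)
    threshold: float = 0.5  # preset threshold (assumed value)


def combined_score(x1: float, x2: float, cfg: FusionConfig) -> float:
    """Comprehensive analysis result F(X) = c * X1 + b * X2."""
    return cfg.c * x1 + cfg.b * x2


def build_message(x1: float, x2: float, cfg: FusionConfig) -> str:
    """Pick the message to send to the terminal from the combined score."""
    if combined_score(x1, x2, cfg) > cfg.threshold:
        # Fraud risk warning information
        return "The result of this model analysis is 'high fraud risk'; please take note"
    # Result feedback information (score less than or equal to the threshold)
    return "The result of this model analysis is 'low fraud risk'; please be informed"
```

Both inputs are assumed to be on a comparable scale (for example, probabilities in [0, 1]); if the two models emit scores on different scales, they would need to be normalized before weighting.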
Optionally, the training process of the preset-type supervised model includes:
B1. Obtain a preset number (for example, 1,000,000) of data samples of user data within a preset time (for example, within the most recent year), and label each data sample with a corresponding fraud label, the fraud label being either fraud or non-fraud; for each data sample, extract one or more items of preset-type feature data and generate a corresponding feature set (for example, the feature set of data sample i may be {Xi1, Xi2, Xi3, ..., Xim}, which includes m features of different types);
B2. Analyze each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and divide the key features corresponding to the data samples into a training set of a first ratio (for example, 40%) and a validation set of a second ratio (for example, 30%);
B3. Train the preset-type supervised model (for example, a logistic regression model) with the key feature information in the training set, and verify the accuracy of the preset-type supervised model with the key feature information in the validation set;
B4. If the accuracy is greater than or equal to a preset threshold, the training ends; or, if the accuracy is less than the preset threshold, increase the number of data samples and re-execute the above steps.
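Steps B2 to B4 amount to a split, train, validate, and retry loop. The sketch below shows that control flow only; the callables fetch_samples and fit, the doubling growth factor, and the round limit are hypothetical stand-ins, and any real model (for example, logistic regression or GBDT) would be plugged in through fit.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]          # (key features, label: 1 = fraud, 0 = non-fraud)
Predictor = Callable[[Sequence[float]], int]


def split_samples(samples: List[Sample], first_ratio: float, second_ratio: float,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """B2: shuffle, then cut a training set and a validation set by ratio."""
    if first_ratio + second_ratio > 1.0:
        raise ValueError("ratios must not sum to more than 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * first_ratio)
    n_valid = int(len(shuffled) * second_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_valid]


def accuracy(predict: Predictor, validation: List[Sample]) -> float:
    """Share of validation samples whose label is predicted correctly."""
    if not validation:
        return 0.0
    return sum(1 for x, y in validation if predict(x) == y) / len(validation)


def train_until_accurate(fetch_samples: Callable[[int], List[Sample]],
                         fit: Callable[[List[Sample]], Predictor],
                         threshold: float, start_size: int,
                         first_ratio: float = 0.4, second_ratio: float = 0.3,
                         growth: int = 2, max_rounds: int = 5) -> Tuple[Predictor, float]:
    size = start_size
    for _ in range(max_rounds):
        train, valid = split_samples(fetch_samples(size), first_ratio, second_ratio)
        predict = fit(train)             # B3: train on the training set
        acc = accuracy(predict, valid)   # B3: verify on the validation set
        if acc >= threshold:             # B4: accuracy reached, training ends
            return predict, acc
        size *= growth                   # B4: more samples, then repeat (assumed growth rule)
    raise RuntimeError("accuracy threshold not reached within max_rounds")
```

The round limit is an addition for safety; the text itself places no bound on how many times the steps are repeated.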
Optionally, the predetermined analysis rule is:
C1. Calculate the support and/or confidence with which each feature in each feature set corresponds to a fraud label of fraud (the support of a feature is the percentage of all feature sets that contain the feature; the confidence of a feature is the quotient of the feature's support and a preset support; for example, if the support of feature Xim is ai1 and the preset support is a2, the confidence of feature Xim is ai1/a2);
C2. Select, as the key features, the features whose support is greater than a preset support or greater than the average support of all features, and/or the features whose confidence is greater than a preset confidence or greater than the average confidence of all features.
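Rules C1 and C2 can be illustrated with a short sketch that follows the definitions given in the parenthetical above. Several points are assumptions: each feature set is represented as a set of feature names present in a sample, the "and/or" in C2 is read as "or", and the caller is expected to pass in whichever feature sets the rule should be computed over (for example, only those labeled fraud).

```python
from typing import Dict, List, Set


def feature_support(feature_sets: List[Set[str]]) -> Dict[str, float]:
    """C1: support = share of all feature sets that contain the feature."""
    if not feature_sets:
        return {}
    counts: Dict[str, int] = {}
    for fs in feature_sets:
        for name in fs:
            counts[name] = counts.get(name, 0) + 1
    return {name: n / len(feature_sets) for name, n in counts.items()}


def key_features(feature_sets: List[Set[str]], preset_support: float,
                 preset_confidence: float) -> Set[str]:
    """C2: keep features that pass the support test or the confidence test."""
    support = feature_support(feature_sets)
    if not support:
        return set()
    # C1: confidence = support / preset support, as defined in the text
    confidence = {name: s / preset_support for name, s in support.items()}
    mean_support = sum(support.values()) / len(support)
    mean_confidence = sum(confidence.values()) / len(confidence)
    selected: Set[str] = set()
    for name in support:
        by_support = support[name] > preset_support or support[name] > mean_support
        by_confidence = (confidence[name] > preset_confidence
                         or confidence[name] > mean_confidence)
        if by_support or by_confidence:  # "and/or" read as "or" (assumption)
            selected.add(name)
    return selected
```

Because confidence is defined here as support divided by a constant, the two tests rank features identically; they differ only through the preset cut-offs.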
Optionally, the training process of the preset-type unsupervised model includes:
S1. Build the first iTree (Isolation Tree). Generate a data sample training set DTrain = {d1, d2, ..., di, ..., dn}, where di denotes the i-th data sample, and randomly divide the training set into two parts, DTrain1 and DTrain2. Generate a feature set A = {A1, A2, ..., Ai, ..., Am} from DTrain1 and DTrain2, where Ai denotes the i-th feature in the feature set. Select a random seed, randomly select M samples from DTrain1, and randomly select k features (k ≤ m) from the feature set A to form a feature subset a = {A1, A2, ..., Ak}. Randomly select a feature Aj from a, and divide the data samples according to a split value p of feature Aj: if the value di(Aj) of feature Aj for data di is less than p, place di in the left subtree (the subtree rooted at the left child of the current node); otherwise place it in the right subtree (the subtree rooted at the right child of the current node). Build the left and right subtrees iteratively in this way until one of the following conditions is met:
(1) only one record, or multiple identical records, remain in DTrain;
(2) the tree reaches the maximum height;
Then, using the same random seed and the same method, build the first iTree model again on the DTrain2 samples.
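The tree construction in S1 can be sketched in plain Python as below. The names Node, build, and grow_itree are hypothetical. Drawing the split value p uniformly between the minimum and maximum of the chosen feature is the usual isolation-tree convention and is an assumption here, since the text does not say how p is chosen.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Row = Sequence[float]


@dataclass
class Node:
    size: int                       # number of samples that reached this node
    feature: Optional[int] = None   # index j of split feature Aj (None for a leaf)
    split: Optional[float] = None   # split value p (None for a leaf)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(rows: List[Row], features: List[int], rng: random.Random,
          max_height: int, height: int = 0) -> Node:
    # Only features that still vary can separate the rows; if none vary,
    # the rows are identical on the subset (stop condition 1).
    usable = [j for j in features
              if rows and min(r[j] for r in rows) < max(r[j] for r in rows)]
    if len(rows) <= 1 or height >= max_height or not usable:
        return Node(size=len(rows))  # leaf: condition (1) or (2) met
    j = rng.choice(usable)
    p = rng.uniform(min(r[j] for r in rows), max(r[j] for r in rows))
    left = [r for r in rows if r[j] < p]      # di(Aj) < p goes left
    right = [r for r in rows if r[j] >= p]    # everything else goes right
    return Node(len(rows), j, p,
                build(left, features, rng, max_height, height + 1),
                build(right, features, rng, max_height, height + 1))


def grow_itree(d_train: Sequence[Row], n_features: int, m: int, k: int,
               seed: int, max_height: int) -> Node:
    """Sample m rows and k features with one seed, then grow a tree."""
    rng = random.Random(seed)
    sample = rng.sample(list(d_train), m)
    features = rng.sample(range(n_features), k)
    return build(sample, features, rng, max_height)
```

Calling grow_itree with the same seed on DTrain1 and then on DTrain2 would give the pair of trainings that the text denotes Ti1 and Ti2.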
S2. Repeat S1 to build a further 2 × (L-1) trees and form the initial forest T = {T11, T12, T21, T22, ..., TL1, TL2}, where T denotes the set of 2L iTrees, Ti1 denotes the first training of the i-th iTree, and Ti2 denotes the second training of the i-th iTree;
S3. Train the initial forest T = {T11, T12, T21, T22, ..., TL1, TL2} with DTrain1 and DTrain2 respectively, calculate the difference value between different iTrees according to the Q statistic, and calculate the accuracy value ACC of each single iTree by cross-validation;
ACC = {X1, X2, ..., XL}, where Q denotes the difference matrix of the L iTrees; Qij denotes the difference value between isolation tree Ti and isolation tree Tj; and Xj denotes the accuracy value of isolation tree Tj;
If any two iTrees are independent, the value of their Q statistic is 0. The Q statistic varies within [-1, 1]; the larger its value, the smaller the difference between the two iTrees. If the Q statistic of two iTrees is 1, the degree of difference between them is 0.
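The properties listed above (0 for independent trees, range [-1, 1], 1 for no difference) match the standard pairwise Q statistic (Yule's Q) used to measure diversity between classifiers. The sketch below implements that standard form; since the text does not reproduce the patent's own expression, treating it as the same formula is an assumption. Each input is a list of booleans saying whether a tree judged a given validation sample correctly.

```python
from typing import Sequence


def q_statistic(correct_i: Sequence[bool], correct_j: Sequence[bool]) -> float:
    """Yule's Q between two trees, from per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1      # both trees correct
        elif not a and not b:
            n00 += 1      # both trees wrong
        elif a:
            n10 += 1      # only tree i correct
        else:
            n01 += 1      # only tree j correct
    numerator = n11 * n00 - n01 * n10
    denominator = n11 * n00 + n01 * n10
    # Returning 0.0 for an undefined ratio is a convention chosen for this sketch
    return numerator / denominator if denominator else 0.0
```

Evaluating this for every pair (Ti, Tj) would fill the difference matrix Q described in S3.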
S4. According to the diversity and accuracy of the iTrees, use the simulated annealing isolation forest algorithm to select iTrees with better fitness values from the initial forest T and combine them into an iForest. The fitness function of a single iTree is given by the following formula:
where F(Tj) denotes the fitness value of isolation tree Tj, W1 denotes the weight of accuracy, and W2 denotes the weight of diversity.
S4 includes the following steps:
S41. Initialization: take a sufficiently large initial temperature t0, set the temperature t = t0, and take an arbitrary initial solution T1;
S42. For the current temperature t, repeat steps S43 to S46;
S43. Randomly perturb the current solution T1 to generate a new solution T2;
S44. Calculate the increment of T2, df = F(T2) - F(T1), where F(T1) is the fitness value of isolation tree T1;
S45. If df < 0, accept T2 as the new current solution, that is, T1 = T2; otherwise, calculate the acceptance probability p of T2 according to the Metropolis rule (a Markov chain Monte Carlo sampling method) and generate a random number rand uniformly distributed on the interval (0, 1); if p > rand, accept T2 as the new current solution, that is, T1 = T2; otherwise keep the current solution T1;
where κ is the Boltzmann constant and exp denotes the natural exponential;
S46. If the set termination condition is met, output the current solution T1 as the optimal solution; the termination condition is usually that the new solution T2 has not been accepted in several consecutive Metropolis chains, or that a set termination temperature has been reached. Otherwise, decay the temperature t with the decay function and return to step S42. The decay function is:
where ts is the temperature at step θ and t0 is the initial temperature.
S47. Repeat steps S43 to S46, and select r iTrees with better fitness values from the initial forest T to combine into the isolation forest iForest, r ≤ L.
Fig. 3 is an implementation flowchart of one embodiment of the data card anti-fraud identification method of the present invention. In this embodiment, when the processor 12 executes the computer program of the data card anti-fraud identification program 10 stored in the memory 11, the data card anti-fraud identification method it implements includes steps S300 to S320. The implementation of the invention is described with reference to Fig. 4, a schematic diagram of the application environment of the data card anti-fraud identification method.
S300: the data acquisition module 110 receives user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by the predetermined terminal, extracts the user data corresponding to the user identifier from a predetermined database, the user data comprising one or more items of preset-type feature data;
Optionally, the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weights.
Optionally, if the calculated comprehensive analysis result is greater than a preset threshold, fraud risk warning information in a preset format is sent to the predetermined terminal; and/or
if the calculated comprehensive analysis result is less than or equal to the preset threshold, result feedback information in a preset format is sent to the predetermined terminal.
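The weighted combination and the threshold branch can be sketched in a few lines of Python. This is only an illustration: the weight values, the threshold value, the `FusionConfig` class and the dictionary returned by `build_response` are assumptions, since the text only says that c, b, the threshold and the message formats are preset.

```python
from dataclasses import dataclass


@dataclass
class FusionConfig:
    c: float = 0.6          # weight of the first (supervised) result X1, assumed value
    b: float = 0.4          # weight of the second (unsupervised) result X2, assumed value
    threshold: float = 0.5  # preset threshold, assumed value


def comprehensive_result(x1: float, x2: float, cfg: FusionConfig) -> float:
    """Comprehensive analysis result F(X) = c * X1 + b * X2."""
    return cfg.c * x1 + cfg.b * x2


def build_response(x1: float, x2: float, cfg: FusionConfig) -> dict:
    """Warn when the result exceeds the threshold, otherwise return plain feedback."""
    score = comprehensive_result(x1, x2, cfg)
    kind = "fraud_risk_warning" if score > cfg.threshold else "result_feedback"
    return {"type": kind, "score": score}
```

A caller would pass the two model outputs for one user, for example `build_response(0.9, 0.8, FusionConfig())`, and forward the returned message to the terminal.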
Optionally, the training process of the preset-type supervised model includes:
B1: obtain a preset number of data samples of user data within a preset time, and mark each data sample with a corresponding fraud label, the fraud label being either fraud or non-fraud; for each data sample, extract one or more items of preset-type feature data and generate a corresponding feature set;
B2: analyze each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and divide the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
B3: train the preset-type supervised model with the key feature information in the training set, and verify the accuracy of the preset-type supervised model with the key feature information in the validation set;
B4: if the accuracy is greater than or equal to a preset threshold, end the training; or, if the accuracy is less than the preset threshold, increase the number of data samples and re-execute the above steps.
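Steps B2 to B4 amount to a split, train, validate and retry loop, which the following Python sketch illustrates. The callables `fetch_samples` and `fit`, the step size `n_step` and the safety limit `max_rounds` are hypothetical names introduced here; the text does not say how samples are obtained, which learner is used at this point, or when to give up.

```python
import random


def split(samples, train_ratio, seed=0):
    """B2: shuffle and divide the samples into a training set and a validation set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, validation):
    """B3: share of validation samples whose predicted label matches the fraud label."""
    correct = sum(1 for features, label in validation if model(features) == label)
    return correct / len(validation)


def train_until_accurate(fetch_samples, fit, n_start, n_step, target_acc,
                         train_ratio=0.7, max_rounds=10):
    """B4: retrain on more samples until the accuracy reaches the preset threshold."""
    n = n_start
    for _ in range(max_rounds):  # max_rounds is an added safety limit, not in the text
        train, validation = split(fetch_samples(n), train_ratio)
        model = fit(train)       # fit returns a callable: features -> predicted label
        acc = accuracy(model, validation)
        if acc >= target_acc:
            return model, acc, n
        n += n_step
    raise RuntimeError("accuracy threshold not reached")
```

Here `fetch_samples(n)` is expected to return n pairs of (key features, fraud label), so any supervised learner can be plugged in through `fit`.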
Optionally, the predetermined analysis rule is:
C1: calculate, for each feature in each feature set, the support and/or confidence with which the corresponding fraud label is fraud;
C2: select as the key features those features whose support is greater than a preset support or greater than the average support of all features, and/or whose confidence is greater than a preset confidence or greater than the average confidence of all features.
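Rule C1/C2 can be illustrated with a short Python sketch. It assumes the usual association-rule definitions, which the text does not spell out: support is the share of all samples in which the feature occurs together with a fraud label, and confidence is the share of the samples containing the feature that are labeled fraud. Treating each sample as a set of feature names and requiring both criteria at once (the text allows "and/or") are further simplifications.

```python
def support_confidence(samples, labels):
    """C1: per-feature (support, confidence) of the rule 'feature -> fraud'.

    samples: list of sets of feature names; labels: list of bools (True = fraud).
    """
    n = len(samples)
    present, present_fraud = {}, {}
    for feats, is_fraud in zip(samples, labels):
        for f in feats:
            present[f] = present.get(f, 0) + 1
            if is_fraud:
                present_fraud[f] = present_fraud.get(f, 0) + 1
    stats = {}
    for f, count in present.items():
        hits = present_fraud.get(f, 0)
        stats[f] = (hits / n, hits / count)
    return stats


def key_features(stats, min_support=None, min_confidence=None):
    """C2: keep features above the preset floors, or above the mean when no floor is given."""
    if not stats:
        return []
    mean_s = sum(s for s, _ in stats.values()) / len(stats)
    mean_c = sum(c for _, c in stats.values()) / len(stats)
    s_floor = mean_s if min_support is None else min_support
    c_floor = mean_c if min_confidence is None else min_confidence
    return sorted(f for f, (s, c) in stats.items() if s > s_floor and c > c_floor)
```

Passing explicit floors corresponds to the "preset support" and "preset confidence" branch; leaving them as `None` corresponds to the "average of all features" branch.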
Optionally, the training process of the preset-type unsupervised model includes:
S1: first construct an iTree (Isolation Tree). Generate the data sample training set DTrain = {d1, d2, …, di, …, dn}, where di denotes the i-th data sample, and randomly divide the training set into two parts, DTrain1 and DTrain2. Generate from DTrain1 and DTrain2 the feature set A = {A1, A2, …, Ai, …, Am}, where Ai denotes the i-th feature in the feature set. Select a random seed, randomly select M samples from the DTrain1 samples, and randomly select k features from the feature set A to form the feature subset a = {A1, A2, …, Ak}, k ≤ m. Randomly select a feature Aj in a, then divide the data samples according to the split value p of feature Aj: if the value di(Aj) of feature Aj for data di is less than p, place di in the left subtree (the subtree rooted at the left child of the current node); otherwise place it in the right subtree (the subtree rooted at the right child of the current node). Construct the left and right subtrees iteratively in this way until one of the following conditions is met:
(1) only one data item, or several identical data items, remain in DTrain;
(2) the tree reaches the maximum height;
Then build the first iTree model again on the DTrain2 samples with the same random seed and the same method;
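Step S1 can be sketched in plain Python as follows. The dictionary-based tree layout, the function names and the default `max_height` are assumptions for illustration, and the split value p is drawn uniformly between the minimum and maximum of the chosen feature, as in the standard isolation forest, because the text does not say how p is chosen.

```python
import random


def build_itree(rows, features, rng, height=0, max_height=8):
    """Recursively split rows (dicts of feature -> value) on random features."""
    # Only features that still vary can separate rows; none left means identical data.
    candidates = [f for f in features
                  if min(r[f] for r in rows) < max(r[f] for r in rows)] if rows else []
    if len(rows) <= 1 or height >= max_height or not candidates:
        return {"size": len(rows)}                  # leaf node
    feat = rng.choice(candidates)
    lo, hi = min(r[feat] for r in rows), max(r[feat] for r in rows)
    p = rng.uniform(lo, hi)                         # assumed choice of split value
    left = [r for r in rows if r[feat] < p]         # di(Aj) < p goes left
    right = [r for r in rows if r[feat] >= p]       # everything else goes right
    return {"feature": feat, "split": p,
            "left": build_itree(left, features, rng, height + 1, max_height),
            "right": build_itree(right, features, rng, height + 1, max_height)}


def build_pair(d_train1, d_train2, all_features, k, n_rows, seed):
    """Build Ti1 on DTrain1 and Ti2 on DTrain2 from the same random seed."""
    trees = []
    for part in (d_train1, d_train2):
        rng = random.Random(seed)                   # same seed for both halves
        sample = rng.sample(part, min(n_rows, len(part)))
        subset = rng.sample(all_features, k)        # k features out of the m available
        trees.append(build_itree(sample, subset, rng))
    return tuple(trees)


def path_length(tree, row):
    """Number of splits a row passes before reaching a leaf (short paths suggest anomalies)."""
    depth = 0
    while "feature" in tree:
        tree = tree["left"] if row[tree["feature"]] < tree["split"] else tree["right"]
        depth += 1
    return depth
```

Calling `build_pair` L times with L different seeds would yield the 2L trees of the initial forest described in S2.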
S2: repeat S1 to construct a further 2*(L-1) trees and form the initial forest T = {T11, T12, T21, T22, …, TL1, TL2}, where T denotes the set of 2L iTrees, Ti1 denotes the first training of the i-th iTree, and Ti2 denotes the second training of the i-th iTree;
S3: train the initial forest T = {T11, T12, T21, T22, …, TL1, TL2} with DTrain1 and DTrain2 respectively, compute the difference value between different iTrees from the Q-statistic, and compute the accuracy value ACC of each individual iTree by cross-validation;
ACC = {X1, X2, …, XL}, where Q denotes the difference matrix of the L iTrees; Qij denotes the difference value between isolation tree Ti and isolation tree Tj; and Xj denotes the accuracy value of isolation tree Tj;
If any two iTrees are independent, their Q-statistic is 0. The Q-statistic varies within [-1, 1]; the larger its value, the smaller the difference between the two iTrees. If the Q-statistic of two iTrees is 1, their degree of difference is 0.
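The properties listed above (0 for independent trees, range [-1, 1], 1 for trees that never differ) match Yule's Q-statistic as commonly used to measure classifier diversity. The sketch below assumes that definition, since the formula itself is not reproduced in the text; the per-sample correctness flags as input and the convention of returning 0 when the denominator is zero are also assumptions.

```python
def q_statistic(correct_i, correct_j):
    """Yule's Q for two trees, given per-sample flags of whether each tree was correct."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1          # both correct
        elif not a and not b:
            n00 += 1          # both wrong
        elif a:
            n10 += 1          # only tree i correct
        else:
            n01 += 1          # only tree j correct
    num = n11 * n00 - n01 * n10
    den = n11 * n00 + n01 * n10
    return num / den if den else 0.0   # assumed convention for an undefined ratio


def q_matrix(correct_flags):
    """Difference matrix Q with Q[i][j] = Q-statistic of tree i and tree j."""
    size = len(correct_flags)
    return [[q_statistic(correct_flags[i], correct_flags[j]) for j in range(size)]
            for i in range(size)]
```

Two trees that are right and wrong on exactly the same samples give 1, two trees that are never right together give -1, and statistically independent trees give 0.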
S4: according to the diversity and accuracy of the iTrees, use the simulated annealing isolation forest algorithm to select from the initial forest T the iTrees with better fitness values and combine them into the iForest. The fitness function of a single iTree is given by the following formula:
where F(Tj) denotes the fitness value of isolation tree Tj, W1 denotes the weight of accuracy, and W2 denotes the weight of diversity.
S4 comprises the following steps:
S41: initialization: take a sufficiently large initial temperature t0, set the temperature t = t0, and choose an arbitrary initial solution T1;
S42: for the current temperature t, repeat steps S43 to S46;
S43: randomly perturb the current solution T1 to generate a new solution T2;
S44: compute the increment of T2, df = F(T2) - F(T1), where F(T1) is the fitness value of isolation tree T1;
S45: if df < 0, accept T2 as the new current solution, i.e. T1 = T2; otherwise compute the acceptance probability p of T2 according to the Metropolis rule (a Markov chain Monte Carlo sampling method): generate a random number rand uniformly distributed on the interval (0, 1); if p > rand, accept T2 as the new current solution, i.e. T1 = T2; otherwise keep the current solution T1;
where κ is the Boltzmann constant and exp denotes the natural exponential;
S46: if the set termination condition is met, output the current solution T1 as the optimal solution. The termination condition is usually that the new solution T2 has not been accepted in several consecutive Metropolis chains, or that a set end temperature has been reached. Otherwise, decay the temperature t by the decay function and return to step S42. The decay function is:
where ts is the temperature value at step θ and t0 is the initial temperature.
S47: repeat steps S43 to S46, and select from the initial forest T the r iTrees with better fitness values to combine into the isolation forest iForest, where r ≤ L.
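The annealing loop of S41 to S47 can be sketched in Python under several stated assumptions, because the acceptance-probability and decay formulas are not reproduced in the text. The sketch uses the standard Metropolis probability p = exp(-df / (κ·t)), a geometric decay t ← α·t with a hypothetical factor α, κ = 1 for simplicity, and a solution encoded as a list of r tree indices that is perturbed by swapping one index. It follows the text in accepting a new solution when df < 0, so lower values of the supplied `fitness` function are treated as better.

```python
import math
import random


def anneal_select(fitness, n_trees, r, t0=1.0, t_end=1e-3, alpha=0.9,
                  chain_len=20, kappa=1.0, seed=0):
    """Choose r of n_trees tree indices by simulated annealing (lower fitness wins)."""
    rng = random.Random(seed)
    current = rng.sample(range(n_trees), r)       # S41: arbitrary initial solution
    f_cur = fitness(current)
    t = t0
    while t > t_end:                              # S46: stop at the end temperature
        for _ in range(chain_len):                # S42: Metropolis chain at temperature t
            outside = [i for i in range(n_trees) if i not in current]
            if not outside:
                break                             # r == n_trees: nothing to swap in
            cand = current[:]                     # S43: swap one tree for an unused one
            cand[rng.randrange(r)] = rng.choice(outside)
            f_new = fitness(cand)
            df = f_new - f_cur                    # S44: increment
            # S45: accept if better, otherwise with the assumed Metropolis probability
            if df < 0 or math.exp(-df / (kappa * t)) > rng.random():
                current, f_cur = cand, f_new
        t *= alpha                                # assumed geometric decay
    return sorted(current), f_cur
```

In practice `fitness` would combine the accuracy values Xj and the Q-statistics of the chosen trees with the weights W1 and W2; here it is left as a caller-supplied function because the exact combination is not given.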
S310: the intermediate-result output module 120 substitutes the preset-type feature data in the user data into the pre-trained preset-type supervised model for analysis, to output a first analysis result; and substitutes the preset-type feature data in the user data into the pre-trained preset-type unsupervised model for analysis, to output a second analysis result.
S320: the comprehensive-result calculation module 130 substitutes the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. On this understanding, the part of the technical solution of the present invention that in essence contributes over the prior art can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, computer, server, network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A data card anti-fraud identification method, applied to an electronic device, characterized in that the method comprises:
receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting user data corresponding to the user identifier from a predetermined database, the user data comprising one or more items of preset-type feature data;
substituting the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and substituting the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
if the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
if the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
2. The data card anti-fraud identification method according to claim 1, characterized in that the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weights.
3. The data card anti-fraud identification method according to claim 1, characterized in that the preset-type supervised model is obtained by training a gradient boosting decision tree model, and the preset-type unsupervised model is obtained by training a simulated annealing isolation forest model.
4. The data card anti-fraud identification method according to claim 1, characterized in that the training process of the preset-type supervised model comprises:
obtaining a preset number of data samples of user data within a preset time, and marking each data sample with a corresponding fraud label, the fraud label being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
analyzing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
if the accuracy is greater than or equal to a preset threshold, ending the training; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
5. The data card anti-fraud identification method according to any one of claims 1 to 4, characterized in that the predetermined analysis rule is:
calculating, for each feature in each feature set, the support and/or confidence with which the corresponding fraud label is fraud;
selecting as the key features those features whose support is greater than a preset support or greater than the average support of all features, and/or whose confidence is greater than a preset confidence or greater than the average confidence of all features.
6. An electronic device comprising a memory and a processor, characterized in that the memory stores a data card anti-fraud identification program executable on the processor, and the data card anti-fraud identification program, when executed by the processor, implements the following steps:
receiving user data to be subjected to anti-fraud analysis sent by a predetermined terminal, or, after receiving an anti-fraud analysis request carrying a user identifier sent by a predetermined terminal, extracting user data corresponding to the user identifier from a predetermined database, the user data comprising one or more items of preset-type feature data;
substituting the preset-type feature data in the user data into a pre-trained preset-type supervised model for analysis, to output a first analysis result; and substituting the preset-type feature data in the user data into a pre-trained preset-type unsupervised model for analysis, to output a second analysis result;
substituting the first analysis result and the second analysis result into a predetermined formula to calculate a comprehensive analysis result;
if the calculated comprehensive analysis result is greater than a preset threshold, sending fraud risk warning information in a preset format to the predetermined terminal; and/or
if the calculated comprehensive analysis result is less than or equal to the preset threshold, sending result feedback information in a preset format to the predetermined terminal.
7. The electronic device according to claim 6, characterized in that the formula is: F(X) = c × X1 + b × X2, where F(X) represents the comprehensive analysis result, X1 represents the first analysis result, X2 represents the second analysis result, and c and b are predetermined weights.
8. The electronic device according to claim 7, characterized in that the training process of the preset-type supervised model comprises:
obtaining a preset number of data samples of user data within a preset time, and marking each data sample with a corresponding fraud label, the fraud label being either fraud or non-fraud; for each data sample, extracting one or more items of preset-type feature data and generating a corresponding feature set;
analyzing each feature in each feature set according to a predetermined analysis rule to extract the key features in each feature set, and dividing the key features corresponding to the data samples into a training set of a first ratio and a validation set of a second ratio;
training the preset-type supervised model with the key feature information in the training set, and verifying the accuracy of the preset-type supervised model with the key feature information in the validation set;
if the accuracy is greater than or equal to a preset threshold, ending the training; or, if the accuracy is less than the preset threshold, increasing the number of data samples and re-executing the above steps.
9. The electronic device according to claim 8, characterized in that the predetermined analysis rule is:
calculating, for each feature in each feature set, the support and/or confidence with which the corresponding fraud label is fraud;
selecting as the key features those features whose support is greater than a preset support or greater than the average support of all features, and/or whose confidence is greater than a preset confidence or greater than the average confidence of all features.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a data card anti-fraud identification program which, when executed by a processor, implements the steps of the data card anti-fraud identification method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910422069.6A CN110276621A (en) 2019-05-21 2019-05-21 Data card is counter to cheat recognition methods, electronic device and readable storage medium storing program for executing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910422069.6A CN110276621A (en) 2019-05-21 2019-05-21 Data card is counter to cheat recognition methods, electronic device and readable storage medium storing program for executing

Publications (1)

Publication Number Publication Date
CN110276621A true CN110276621A (en) 2019-09-24

Family

ID=67960117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910422069.6A Pending CN110276621A (en) 2019-05-21 2019-05-21 Data card is counter to cheat recognition methods, electronic device and readable storage medium storing program for executing

Country Status (1)

Country Link
CN (1) CN110276621A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination