CN109993538A - Identity theft detection method based on probability graph model - Google Patents

Identity theft detection method based on probability graph model

Publication number
CN109993538A
Authority
CN
China
Prior art keywords
probability
graph model
formula
feature
network payment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910148549.8A
Other languages
Chinese (zh)
Inventor
王成
胡腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201910148549.8A
Publication of CN109993538A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/382 Payment protocols; Details thereof insuring higher security of transaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4014 Identity check for transactions

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides an identity theft detection method based on a probability graph model, comprising the steps of: S1: collecting and preprocessing network payment transaction data to obtain a network payment transaction feature set; S2: building a probability graph model from the network payment transaction feature set; S3: inputting a training set and training the parameters of the probability graph model, using Bayes' theorem to obtain the conditional probability parameters of the probability graph model; S4: predicting an input prediction set using the conditional probability parameters and Bayes' theorem to obtain a prediction result. Being based on a probability graph model, the method detects network payment fraud by comprehensively modeling user behavior and attributes. The detection model can be tuned dynamically online, which improves both the accuracy of intercepting fraudulent transactions and the robustness of the model.

Description

Identity theft detection method based on probability graph model
Technical field
The present invention relates to the field of anti-fraud detection for internet financial network payment, and more particularly to an identity theft detection method based on a probability graph model.
Background art
The mobile internet is a double-edged sword: while it brings convenience to people's lives, it also brings various hidden dangers. For example, online transaction and payment platforms let people shop and pay anywhere and at any time without leaving home, but this convenience also gives illegal attackers an opportunity. By stealing a user's account information, an attacker can steal the user's private personal information, or even impersonate the user to make transactions or transfers and thereby commit fraud. To effectively safeguard the interests of users and companies, an effective network payment fraud detection system therefore needs to be established.
Several network payment fraud detection models based on machine learning, and even deep learning, already exist. The vast majority of these learning models are discriminative models based on expectation maximization. For online anti-fraud models, methods such as deep learning may outperform other approaches in effectiveness, but a deep learning model is a typical black-box model: its results are not interpretable and are therefore not sufficiently convincing.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides an identity theft detection method based on a probability graph model. Being based on a probability graph model, the method detects network payment fraud by comprehensively modeling user behavior; the detection model can be tuned dynamically online, which improves both the accuracy of intercepting fraudulent transactions and the robustness of the model.
To achieve the above object, the present invention provides an identity theft detection method based on a probability graph model, comprising the steps of:
S1: collecting and preprocessing network payment transaction data to obtain a network payment transaction feature set;
S2: building a probability graph model from the network payment transaction feature set;
S3: inputting a training set and training the parameters of the probability graph model, using Bayes' theorem to obtain the conditional probability parameters of the probability graph model;
S4: predicting an input prediction set using the conditional probability parameters and Bayes' theorem to obtain a prediction result.
Preferably, step S1 further comprises the steps of:
S11: a data cleaning step: filling in missing values, smoothing noise, and identifying and resolving inconsistencies in the network payment transaction data, so as to standardize data formats, remove abnormal data, correct errors, and remove duplicate data;
S12: a data integration step: storing the network payment transaction data from multiple data sources in a unified manner to form a database;
S13: normalizing the network payment transaction data in the database to form the network payment transaction feature set.
Preferably, step S2 further comprises the steps of:
S21: obtaining the network payment transaction feature set θ, and inputting a candidate feature set θ′, a relation set R, a label attribute Y, and a threshold λ, wherein θ′ and R are initially empty sets (θ′ = ∅, R = ∅);
S22: computing, according to formula (1), the mutual information I between a feature X_i of the network payment transaction feature set θ and the label attribute Y:
$$I(X_i; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \tag{1}$$
where X_i denotes the i-th feature; i is a natural number greater than or equal to 1; Y denotes the label attribute; x denotes a value of X_i; y denotes a value of Y; p(x, y) denotes the joint probability of x and y; p(x) is the marginal probability of x; p(y) is the marginal probability of y; and I(X_i; Y) denotes the mutual information between X_i and Y;
S22: judging whether I(X_i; Y) is greater than or equal to the preset threshold λ; if so, continuing with the subsequent steps;
S23: updating the candidate feature set θ′ according to formula (2):
θ′ := θ′ + X_i (2);
and updating the network payment transaction feature set θ according to formula (3):
θ := θ − X_i (3);
S24: obtaining the dependency r, r: X_i → Y;
S25: updating the relation set R according to formula (4);
S26: judging whether the number of features in the current candidate feature set θ′ is greater than or equal to 2; if so, continuing with the subsequent steps; otherwise, returning to step S23;
S27: computing, according to formula (5), the mutual information between each pair of features in the current candidate feature set θ′:
$$I(X_i; X_j) = \sum_{x} \sum_{x'} p(x, x') \log \frac{p(x, x')}{p(x)\, p(x')} \tag{5}$$
where X_i denotes the i-th feature in θ′ and X_j denotes the j-th feature in θ′; i and j are natural numbers greater than or equal to 1; x denotes a value of X_i; x′ denotes a value of X_j; p(x, x′) denotes the joint probability of x and x′; p(x) is the marginal probability of x; p(x′) is the marginal probability of x′; and I(X_i; X_j) denotes the mutual information between X_i and X_j; and updating the relation set R by formula (4);
S28: assigning the current θ′ to θ and emptying the set θ′; computing, by formula (5), the mutual information between each pair of features in the set θ; if I(X_i; X_j) ≥ λ, determining the dependency r between the two features according to prior knowledge, and then updating the relation set R by formula (4);
S29: repeating step S28 until θ is empty or I(X_i; X_j) ≤ λ for all features, at which point the probability graph model is obtained from the current relation set R.
Preferably, step S3 further comprises the steps of:
S31: inputting a training set, the training set comprising feature attributes and a label attribute;
S32: computing the conditional probability parameters according to formula (6):
$$p_{train}(A_i \mid B) = \frac{p(A_i)\, p(B \mid A_i)}{\sum_{j} p(A_j)\, p(B \mid A_j)} \tag{6}$$
where A_i denotes the i-th parent node of the probability graph model; B denotes a child node of A_i; p_train(A_i | B) denotes the conditional probability parameter between A_i and B; p(A_i) denotes the marginal probability of A_i; p(B | A_i) denotes the probability that B occurs given A_i; A_j denotes the j-th parent node; p(A_j) denotes the marginal probability of A_j; and p(B | A_j) denotes the probability that B occurs given A_j;
S33: judging whether formula (6) has converged; if so, continuing with the subsequent steps; otherwise, returning to step S31.
Preferably, step S4 further comprises the steps of:
S41: inputting a test set, the test set comprising the feature attribute Y′;
S42: computing a posterior probability according to formula (7), and outputting the prediction result according to the posterior probability:
$$p(Y' \mid X_1, \dots, X_n) = \frac{P(X_1, \dots, X_n \mid Y')\, P(Y')}{P(X_1, \dots, X_n)} \tag{7}$$
where p(Y′ | X_1, …, X_n) denotes the probability that Y′ occurs given X_1, …, X_n; P(X_1, …, X_n | Y′) denotes the joint probability of X_1, …, X_n given Y′; P(Y′) denotes the marginal probability of Y′; and P(X_1, …, X_n) denotes the joint probability of X_1, …, X_n.
Preferably, the method further comprises, after step S4, the step of:
S5: verifying the prediction result.
Preferably, step S5 further comprises the steps of:
S51: counting, from the prediction result of the model of formula (7), the number TP of positive-class samples judged to be positive, the number FP of negative-class samples judged to be positive, the number FN of positive-class samples judged to be negative, and the number TN of negative-class samples judged to be negative;
S52: computing the precision according to formula (8):
$$precision = \frac{TP}{TP + FP} \tag{8}$$
computing the recall according to formula (9):
$$recall = \frac{TP}{TP + FN} \tag{9}$$
and computing the disturbance rate disturb according to formula (10):
$$disturb = \frac{FP}{FP + TN} \tag{10}$$
S53: evaluating the prediction result according to the precision, the recall, and the disturbance rate.
By adopting the above technical solution, the present invention has the following beneficial effects:
A Bayesian probability graph model tends to be highly interpretable and convincing when making predictions on data. The probability graph model is trained on a training set to obtain the conditional probability parameters; when predicting on a test set, it uses prior knowledge and the conditions in the test set to obtain conditional probabilities and finally derive the posterior probability, so its results are highly convincing. Moreover, a probability graph model can handle cases in which hidden variables are present. Being based on a probability graph model improves the interpretability of the model and provides a better guarantee for detecting and intercepting fraudulent transactions and for protecting the funds of users and enterprises.
Brief description of the drawings
Fig. 1 is an overall flowchart of the identity theft detection method based on probability graph model according to an embodiment of the present invention;
Fig. 2 shows the probability graph model obtained by modeling bank data according to an embodiment of the present invention;
Fig. 3 is a partial detailed flow diagram of the identity theft detection method based on probability graph model according to an embodiment of the present invention.
Detailed description of the embodiments
A preferred embodiment of the present invention is given below with reference to Figs. 1 to 3 and described in detail, so that the functions and features of the present invention can be better understood.
Referring to Figs. 1 to 3, an identity theft detection method based on a probability graph model according to an embodiment of the present invention comprises the steps of:
S1: collecting and preprocessing network payment transaction data to obtain a network payment transaction feature set.
Step S1 further comprises the steps of:
S11: a data cleaning step: filling in missing values, smoothing noise, and identifying and resolving inconsistencies in the network payment transaction data, so as to standardize data formats, remove abnormal data, correct errors, and remove duplicate data;
S12: a data integration step: storing the network payment transaction data from multiple data sources in a unified manner to form a database;
S13: normalizing the network payment transaction data in the database to form the network payment transaction feature set.
Although internet finance now produces large amounts of rich transaction data, real-world data is generally incomplete, inconsistent, and dirty, and cannot be fed directly into the model, so the raw data must be preprocessed. (1) Data cleaning: the data is cleaned by filling in missing values, smoothing noisy data, and identifying or resolving inconsistencies. The main goals are standardizing data formats (such as times), removing abnormal data, correcting errors, and removing duplicate data. (2) Data integration: the data from multiple data sources is combined and stored in a unified manner to build a data warehouse. (3) Data transformation: the data is converted into the form required by the learning model through smoothing, aggregation, data generalization, normalization, and similar operations.
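The cleaning goals listed above can be sketched in a few lines of Python. This is a minimal illustration and not the embodiment's actual pipeline: the function name `clean_transactions`, the rule that a negative amount counts as abnormal, and the choice of the median as the fill value are all assumptions made here; only the field names come from Table 1.

```python
from statistics import median

def clean_transactions(records):
    """Clean a list of raw transaction dicts (hypothetical helper).

    Assumed rules: strip whitespace from strings, drop exact duplicates,
    drop rows with a negative amount, fill missing amounts with the median.
    """
    seen = set()
    rows = []
    for rec in records:
        # Format standardization: trim stray whitespace in string fields.
        row = {k: (v.strip() if isinstance(v, str) else v) for k, v in rec.items()}
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        amount = row.get("Transaction_Amount")
        if amount is not None and amount < 0:
            continue  # abnormal record under the assumed rule
        rows.append(row)
    # Fill missing amounts with the median of the known amounts.
    known = [r["Transaction_Amount"] for r in rows
             if r.get("Transaction_Amount") is not None]
    fill = median(known) if known else 0.0
    for r in rows:
        if r.get("Transaction_Amount") is None:
            r["Transaction_Amount"] = fill
    return rows
```

In practice the embodiment names Pandas as its data analysis tool; the standard library is used here only to keep the sketch self-contained.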
For example, the original fields of the data and their types after preprocessing are shown in Table 1.
Table 1: Original fields and types after preprocessing
| Field name | Data type | Field description | Type after preprocessing |
|---|---|---|---|
| Transaction_Time | String | Time at which the transaction occurred, accurate to the second | Integer |
| Check | String | Signature verification method of the transaction | Integer |
| Transaction_Type | String | Type of the transaction | Integer |
| Transaction_Amount | Float | Transaction amount, in RMB | Integer |
| Merchant_Code | String | Merchant number of the transaction | Integer |
| IP | String | Whether the transaction uses a frequently used IP | Integer |
| Sign | String | Label of the transaction | Integer |
As Table 1 shows, most of the available original fields are strings, whereas the probability graph model itself can only process discrete variables. Preprocessing therefore includes not only data cleaning and data integration: during data transformation, continuous floating-point values are also converted into discrete variables that the probability graph model can compute on.
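A short sketch of this conversion to integers follows. The bin edges, the timestamp format, and the helper names (`encode_amount`, `encode_time`, `encode_category`) are assumptions chosen for illustration; the source does not specify how each field is discretized.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical bin edges for Transaction_Amount, in RMB.
AMOUNT_EDGES = [10.0, 100.0, 1000.0, 10000.0]

def encode_amount(amount, edges=AMOUNT_EDGES):
    """Map a continuous amount to the index of the bin it falls into."""
    return bisect_right(edges, amount)

def encode_time(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Map a timestamp string (assumed format) to its hour of day, 0..23."""
    return datetime.strptime(timestamp, fmt).hour

def encode_category(values):
    """Map string categories to consecutive integer codes in first-seen order."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes
```

Coarser or finer bins change how many states each node of the graph has, so the edges would normally be chosen from the data distribution rather than fixed by hand.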
S2: building a probability graph model from the network payment transaction feature set.
A complete probability graph is constructed by analyzing the dependencies and independencies among the features. Constructing the probability graph amounts to constructing the joint distribution over the data features, and dependence and independence are the two main properties of that distribution. Independence properties are very important when answering queries, because they can fundamentally reduce the computational cost of inference.
In this embodiment, the algorithm environment for this step is based on Python and NumPy.
Step S2 further comprises the steps of:
S21: obtaining the network payment transaction feature set θ, and inputting a candidate feature set θ′, a relation set R, a label attribute Y, and a threshold λ, wherein θ′ and R are initially empty sets (θ′ = ∅, R = ∅);
S22: computing, according to formula (1), the mutual information I between a feature X_i of the network payment transaction feature set θ and the label attribute Y:
$$I(X_i; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \tag{1}$$
where X_i denotes the i-th feature; i is a natural number greater than or equal to 1; Y denotes the label attribute; x denotes a value of X_i; y denotes a value of Y; p(x, y) denotes the joint probability of x and y; p(x) is the marginal probability of x; p(y) is the marginal probability of y; and I(X_i; Y) denotes the mutual information between X_i and Y;
S22: judging whether I(X_i; Y) is greater than or equal to the preset threshold λ; if so, continuing with the subsequent steps;
S23: updating the candidate feature set θ′ according to formula (2):
θ′ := θ′ + X_i (2);
and updating the network payment transaction feature set θ according to formula (3):
θ := θ − X_i (3);
S24: obtaining the dependency r, r: X_i → Y;
S25: updating the relation set R according to formula (4);
S26: judging whether the number of features in the current candidate feature set θ′ is greater than or equal to 2; if so, continuing with the subsequent steps; otherwise, returning to step S23;
S27: computing, according to formula (5), the mutual information between each pair of features in the current candidate feature set θ′:
$$I(X_i; X_j) = \sum_{x} \sum_{x'} p(x, x') \log \frac{p(x, x')}{p(x)\, p(x')} \tag{5}$$
where X_i denotes the i-th feature in θ′ and X_j denotes the j-th feature in θ′; i and j are natural numbers greater than or equal to 1; x denotes a value of X_i; x′ denotes a value of X_j; p(x, x′) denotes the joint probability of x and x′; p(x) is the marginal probability of x; p(x′) is the marginal probability of x′; and I(X_i; X_j) denotes the mutual information between X_i and X_j; and updating the relation set R by formula (4);
S28: assigning the current θ′ to θ and emptying the set θ′; computing, by formula (5), the mutual information between each pair of features in the set θ; if I(X_i; X_j) ≥ λ, determining the dependency r between the two features according to prior knowledge, and then updating the relation set R by formula (4);
S29: repeating step S28 until θ is empty or I(X_i; X_j) ≤ λ for all features, at which point the probability graph model is obtained from the current relation set R.
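The structure-building steps S21 to S29 can be sketched as follows. This is a simplified illustration under stated assumptions, not the embodiment's code: mutual information is estimated from empirical frequencies with the natural logarithm, the iteration of S28 and S29 is collapsed into a single pass over feature pairs, and the direction of each feature-to-feature edge simply follows input order, whereas the method determines it from prior knowledge. The names `mutual_information` and `build_relations` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

def build_relations(features, label, threshold):
    """Select features related to the label, then link related feature pairs.

    features: dict mapping feature name -> list of discrete values
    label:    list of label values (the attribute Y)
    Returns (candidates, relations), where relations holds directed edges.
    """
    # S22-S23: keep features whose mutual information with Y reaches the threshold.
    candidates = [name for name, col in features.items()
                  if mutual_information(col, label) >= threshold]
    # S24-S25: record the dependency r: X_i -> Y for each kept feature.
    relations = [(name, "Y") for name in candidates]
    # S27-S28 (single pass): link feature pairs whose mutual information is high.
    for a, b in combinations(candidates, 2):
        if mutual_information(features[a], features[b]) >= threshold:
            relations.append((a, b))  # edge direction is a placeholder assumption
    return candidates, relations
```

The resulting edge list could then be handed to a graphical-model library to instantiate the network; that step is omitted here because it depends on the library version in use.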
S3: inputting a training set and training the parameters of the probability graph model, using Bayes' theorem to obtain the conditional probability parameters of the probability graph model.
The main function of this step is to train the parameters of the model. In essence, training a probability graph model means counting the marginal probability of each feature in the training set and using it as a condition, computing the joint probabilities of the features (that is, the posterior probabilities) and using them as a condition as well, and then applying Bayes' theorem to infer the conditional probabilities in the probability graph, which are the parameters of the model.
In this embodiment, the algorithm environment for this step is based on Python, the Pgmpy probability graph model library, and the Pandas data analysis tool.
Step S3 further comprises the steps of:
S31: inputting a training set, the training set comprising feature attributes and a label attribute;
S32: computing the conditional probability parameters according to formula (6):
$$p_{train}(A_i \mid B) = \frac{p(A_i)\, p(B \mid A_i)}{\sum_{j} p(A_j)\, p(B \mid A_j)} \tag{6}$$
where A_i denotes the i-th parent node of the probability graph model; B denotes a child node of A_i; p_train(A_i | B) denotes the conditional probability parameter between A_i and B; p(A_i) denotes the marginal probability of A_i; p(B | A_i) denotes the probability that B occurs given A_i; A_j denotes the j-th parent node; p(A_j) denotes the marginal probability of A_j; and p(B | A_j) denotes the probability that B occurs given A_j;
S33: judging whether formula (6) has converged; if so, continuing with the subsequent steps; otherwise, returning to step S31.
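The Bayes' theorem computation that formula (6) refers to can be illustrated from raw counts. The sketch below treats the A_i as mutually exclusive states of a single parent variable, which is an assumption made here so that the sum over j is a proper normalizer; the function name `bayes_parent_given_child` is hypothetical, and the convergence check of S33 is not modeled.

```python
from collections import Counter

def bayes_parent_given_child(parent_values, child_values, child_state):
    """Return {a: p(A = a | B = child_state)} estimated from paired observations."""
    n = len(parent_values)
    parent_counts = Counter(parent_values)
    pair_counts = Counter(zip(parent_values, child_values))
    # p(A_i): marginal probability of each parent state.
    prior = {a: c / n for a, c in parent_counts.items()}
    # p(B | A_i): probability of the child state given each parent state.
    likelihood = {a: pair_counts[(a, child_state)] / parent_counts[a] for a in prior}
    # Denominator: sum over j of p(A_j) * p(B | A_j).
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    if evidence == 0:
        raise ValueError("child_state never occurs in the training data")
    return {a: prior[a] * likelihood[a] / evidence for a in prior}
```

With plain counts the result equals the direct frequency of each parent state among rows where the child state occurs, which is a convenient sanity check on the arithmetic.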
S4: predicting an input prediction set using the conditional probability parameters and Bayes' theorem to obtain a prediction result.
The main function of this step is to make a judgment on unknown records: for a real-time transaction record, the model gives a prediction result, that is, it judges whether the transaction is a normal transaction or a fraudulent transaction. The prediction process also relies mainly on Bayes' theorem: taking the features in the transaction record as conditions, together with the conditional probabilities in the model, Bayes' theorem is used to infer the posterior probability of the record.
In this embodiment, the algorithm environment for this step is based on Python, the Pgmpy probability graph model library, the Pandas data analysis tool, and NumPy.
Step S4 further comprises the steps of:
S41: inputting a test set, the test set comprising the feature attribute Y′;
S42: inference with a Bayesian network means deriving the posterior probability from the conditional probabilities obtained during training and the conditions in the test set; computing the posterior probability according to formula (7), and outputting the prediction result according to the posterior probability:
$$p(Y' \mid X_1, \dots, X_n) = \frac{P(X_1, \dots, X_n \mid Y')\, P(Y')}{P(X_1, \dots, X_n)} \tag{7}$$
where p(Y′ | X_1, …, X_n) denotes the probability that Y′ occurs given X_1, …, X_n; P(X_1, …, X_n | Y′) denotes the joint probability of X_1, …, X_n given Y′; P(Y′) denotes the marginal probability of Y′; and P(X_1, …, X_n) denotes the joint probability of X_1, …, X_n.
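The posterior of formula (7) can be illustrated with a direct empirical estimate. This sketch counts full feature tuples in the training data and multiplies the three terms named in the text; it does not use the factorized conditional probability tables that a trained graph would supply, and the function name `posterior` and its return of `None` for unseen cases are assumptions.

```python
def posterior(train_rows, train_labels, query, label_value):
    """Estimate p(Y' = label_value | X_1..X_n = query) from raw counts."""
    n = len(train_rows)
    rows = [tuple(r) for r in train_rows]
    query = tuple(query)
    n_label = sum(1 for y in train_labels if y == label_value)
    n_query = sum(1 for r in rows if r == query)
    if n_label == 0 or n_query == 0:
        return None  # undefined under pure counting; smoothing would be needed
    n_both = sum(1 for r, y in zip(rows, train_labels)
                 if r == query and y == label_value)
    p_x_given_y = n_both / n_label  # P(X_1..X_n | Y')
    p_y = n_label / n               # P(Y')
    p_x = n_query / n               # P(X_1..X_n)
    return p_x_given_y * p_y / p_x
```

A transaction would then be flagged when the posterior of the fraud label exceeds a chosen cutoff; that cutoff is a tuning decision and is not given in the text.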
S5: verifying the prediction result.
Step S5 further comprises the steps of:
S51: counting, from the prediction result of the model of formula (7), the number TP of positive-class samples judged to be positive, the number FP of negative-class samples judged to be positive, the number FN of positive-class samples judged to be negative, and the number TN of negative-class samples judged to be negative;
S52: computing the precision according to formula (8):
$$precision = \frac{TP}{TP + FP} \tag{8}$$
computing the recall according to formula (9):
$$recall = \frac{TP}{TP + FN} \tag{9}$$
and computing the disturbance rate disturb according to formula (10):
$$disturb = \frac{FP}{FP + TN} \tag{10}$$
S53: evaluating the prediction result according to the precision, the recall, and the disturbance rate.
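The verification step can be sketched as below. Precision and recall follow their standard definitions; the disturbance rate is assumed here to be the false positive rate FP / (FP + TN), that is, the share of normal transactions wrongly interrupted. The helper names `confusion_counts` and `evaluate` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the fraud class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(tp, fp, fn, tn):
    """Return (precision, recall, disturb); empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    disturb = fp / (fp + tn) if fp + tn else 0.0  # assumed: false positive rate
    return precision, recall, disturb
```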
For example, detection experiments were carried out on a real internet Bank Danamon transaction dataset to obtain the recall (interception rate, True Positive Rate) when the disturbance rate (disturb) is below 1%, 0.5%, 0.1%, and 0.05%, and the performance of the method was evaluated on that basis. The method of this embodiment outperforms previous research on these metrics and in computation time, and has good robustness.
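Reporting recall under a disturbance-rate budget can be sketched as a threshold sweep over model scores. The assumptions here are that each transaction carries a fraud score, that labels are 1 for fraud and 0 for normal, and that the disturbance rate is FP / (FP + TN); the function name `recall_at_disturb` is hypothetical, and no figures from the experiment are reproduced.

```python
def recall_at_disturb(scores, labels, max_disturb):
    """Best recall over score thresholds whose disturbance rate stays below max_disturb."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        return 0.0
    best = 0.0
    for threshold in sorted(set(scores)):
        # Flag every transaction whose score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives < max_disturb:
            best = max(best, tp / positives)
    return best
```

Calling it with budgets such as 0.01, 0.005, 0.001, and 0.0005 would yield the four operating points described above.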
Referring to the probability graph model in Fig. 2, in actual use the method of this embodiment characterizes the consumption patterns of different users and the joint probability model among different features. First, when different users obtain a bank card from a bank, the card may serve a fixed purpose (for example, a card used exclusively for stock trading, or a salary card), so bank cards with different purposes may exhibit different signature verification methods. If a user's bank card is used for a particular kind of transaction (such as stock trading), the transaction times of the card show a relatively fixed regularity (for example, they correlate with the opening and closing times of the stock market); the transaction amounts of the card show a relatively high correlation (related to stock prices); and the merchants that trade with the card also show a relatively high correlation (for example, certain specific companies). Whether a frequently used IP is involved in a transaction is likewise related to the stable distribution formed by the user's transactions. The different consumption habits of users constitute the behavior distributions of different users; once a behavior pattern appears that does not match previous transactions, the transaction is very likely to be judged fraudulent. This is the interpretable logic. Compared with the black box of a traditional deep learning model, the method of this embodiment combines knowledge of the banking business with assumptions about similar users to build a probability graph model that characterizes the distribution of user behavior, and the model has very good interpretable logic.
In addition, using a probability graph model as the prediction model handles cases with hidden variables better. This is because a probability graph model can supply a conventional prior assumption based on domain knowledge: when the model itself has unobserved variables, Bayesian estimation can give a reasonable estimate by means of a state-space model, which makes the method more robust.
In the identity theft detection method based on a probability graph model according to the embodiment of the present invention, the Bayesian probability graph model tends to be highly interpretable and convincing when making predictions on data. The probability graph model is trained on a training set to obtain the conditional probability parameters; when predicting on a test set, it uses the prior probabilities and the conditions in the test set to obtain conditional probabilities and finally derive the posterior probability, so its results are highly convincing. A probability graph model can also handle cases in which hidden variables are present. Existing methods based on discriminative models can do none of these things, so the method of this embodiment has advantages that existing discriminative models lack. It overcomes the shortcomings of traditional fraud detection methods based on deep learning, improves the interpretability of the model, and provides a better guarantee for detecting and intercepting fraudulent transactions and for protecting the funds of users and enterprises.
The present invention has been described in detail above with reference to the embodiments shown in the drawings, and those skilled in the art can make many variations of the present invention based on the above description. Accordingly, certain details of the embodiments should not be construed as limiting the present invention, and the scope of protection of the present invention is the scope defined by the appended claims.

Claims (7)

1. An identity theft detection method based on a probability graph model, comprising the steps of:
S1: collecting and preprocessing network payment transaction data to obtain a network payment transaction feature set;
S2: building a probability graph model from the network payment transaction feature set;
S3: inputting a training set and training the parameters of the probability graph model, using Bayes' theorem to obtain the conditional probability parameters of the probability graph model;
S4: predicting an input prediction set using the conditional probability parameters and Bayes' theorem to obtain a prediction result.
2. The identity theft detection method based on a probability graph model according to claim 1, characterized in that step S1 further comprises the steps of:
S11: a data cleaning step: filling in missing values, smoothing noise, and identifying and resolving inconsistencies in the network payment transaction data, so as to standardize data formats, remove abnormal data, correct errors, and remove duplicate data;
S12: a data integration step: storing the network payment transaction data from multiple data sources in a unified manner to form a database;
S13: normalizing the network payment transaction data in the database to form the network payment transaction feature set.
3. the identity theft detection method according to claim 2 based on probability graph model, which is characterized in that the S2 step Suddenly further comprise step:
S21: obtaining the network payment transaction feature set θ, inputs a candidate feature set θ ', a set of relationship R, label category Property Y and threshold value λ;Wherein, θ ' ∈ Φ, R ∈ Φ, Φ indicate empty set.
S22: the feature X for obtaining the network payment transaction feature set θ is calculated according to formula (1)iWith the mutual trust of tag attributes Y Breath amount I:
Wherein, XiIndicate ith feature;I is the natural number more than or equal to 1;Y indicates tag attributes;X indicates XiValue;Y table Show the value of Y;The joint probability of p (x, y) expression x and y;P (x) is the marginal probability of x;P (y) is the marginal probability of y;I(Xi; Y X) is indicatediMutual information between Y;
S22: judge I (Xi;Y) whether it is more than or equal to preset threshold value λ;Such as it is to continue with subsequent step;
S23: the candidate feature set θ ' is updated according to formula (2):
θ ' :=θ '+Xi(2);
The network payment transaction feature set θ is updated according to formula (3):
θ :=θ-Xi(3);
S24: according to obtaining dependence r, r:Xi→Y;
S25: the set of relationship R is updated according to formula (4);
S26: judge whether the feature quantity in presently described candidate feature set θ ' is more than or equal to 2;It is such as to continue with subsequent step, Otherwise return step S23;
S27: the mutual information between feature two-by-two is calculated in presently described candidate feature set θ ' according to formula (5):
Wherein, XiIndicate the i-th feature in θ ', XjIndicate the jth feature in θ ', i, j are greater than the natural number equal to 1;X is indicated XiValue;X ' expression XjValue;The joint probability of p (x, x ') expression x and x ';P (x) is the marginal probability of x;P (x ') is x ' Marginal probability;I(Xi;Xj) indicate XiWith XjBetween mutual information;The set of relationship R is updated by formula (4);
S28: assign the current θ' to θ and empty the set θ'; calculate, by formula (5), the mutual information between each pair of features in the set θ; if I(X_i; X_j) ≥ λ, determine the dependency relation r between the two features according to prior knowledge, and then update the relation set R by formula (4);
S29: repeat step S28 until θ is empty or I(X_i; X_j) ≤ λ for all feature pairs; the probability graph model is then obtained from the current relation set R.
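The feature screening and relation-building steps above can be sketched in Python as follows. This is a simplified single pass under stated assumptions, not the claimed procedure itself: probabilities are estimated by counting, the logarithm is natural, the function names `mutual_information` and `build_relations` are hypothetical, and the edge direction between two features simply follows input order as a stand-in for the prior knowledge the claim refers to.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # counts of x values
    py = Counter(ys)               # counts of y values
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def build_relations(features, labels, lam):
    """features: dict mapping a feature name to its list of values.

    Returns (candidates, relations), where relations is a list of
    (parent, child) pairs standing in for the relation set R.
    """
    candidates, relations = [], []
    # Keep features whose mutual information with the label reaches lam.
    for name, values in features.items():
        if mutual_information(values, labels) >= lam:
            candidates.append(name)
            relations.append((name, "Y"))
    # Link candidate pairs that are themselves strongly dependent.
    for a, b in combinations(candidates, 2):
        if mutual_information(features[a], features[b]) >= lam:
            relations.append((a, b))  # direction is an assumption, see lead-in
    return candidates, relations


features = {
    "device": [0, 0, 1, 1],
    "hour": [0, 1, 0, 1],
    "ip": [0, 0, 1, 1],
}
labels = [0, 0, 1, 1]
candidates, relations = build_relations(features, labels, lam=0.1)
```

In this toy data, `hour` is independent of the label and is dropped, while `device` and `ip` are kept and linked to each other.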
4. The identity theft detection method based on a probability graph model according to claim 3, characterized in that step S3 further comprises the steps of:
S31: input a training set, the training set comprising feature attributes and a label attribute;
S32: calculate, according to formula (6), the conditional probability parameter:
$$p_{train}(A_i \mid B) = \frac{p(A_i)\,p(B \mid A_i)}{\sum_{j} p(A_j)\,p(B \mid A_j)} \tag{6}$$
wherein A_i denotes the i-th parent node of the probability graph model; B denotes a child node of A_i; p_train(A_i | B) denotes the conditional probability parameter between A_i and B; p(A_i) denotes the marginal probability of A_i; p(B | A_i) denotes the probability that B occurs given A_i; A_j denotes the j-th parent node; p(A_j) denotes the marginal probability of A_j; p(B | A_j) denotes the probability that B occurs given A_j.
S33: determine whether formula (6) converges; if so, continue with the subsequent steps; otherwise return to step S31.
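As a worked illustration of formula (6), the following Python sketch applies Bayes' rule over the parent nodes of one child node. The function name `parent_posterior` and the dictionary-based inputs are hypothetical; the claim does not say how the marginals and likelihoods are stored or estimated.

```python
def parent_posterior(prior, likelihood):
    """Return p(A_i | B) for every parent state A_i.

    prior:      dict mapping A_i to p(A_i)
    likelihood: dict mapping A_i to p(B | A_i)
    """
    # Denominator of formula (6): sum over j of p(A_j) * p(B | A_j)
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    if evidence == 0:
        raise ValueError("B has zero probability under every parent state")
    return {a: prior[a] * likelihood[a] / evidence for a in prior}


params = parent_posterior(
    prior={"a1": 0.5, "a2": 0.5},
    likelihood={"a1": 0.8, "a2": 0.2},
)
```

With equal priors, the result is proportional to the likelihoods, so `a1` receives about 0.8 and `a2` about 0.2.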
5. The identity theft detection method based on a probability graph model according to claim 4, characterized in that step S4 further comprises the steps of:
S41: input a test set, the test set comprising a feature attribute Y';
S42: calculate a posterior probability according to formula (7), and output the prediction result according to the posterior probability;
$$p(Y' \mid X_1, \ldots, X_n) = \frac{p(X_1, \ldots, X_n \mid Y')\,p(Y')}{p(X_1, \ldots, X_n)} \tag{7}$$
wherein p(Y' | X_1, …, X_n) denotes the probability that Y' occurs given X_1, …, X_n; p(X_1, …, X_n | Y') denotes the joint probability of X_1, …, X_n given Y'; p(Y') denotes the marginal probability of Y'; p(X_1, …, X_n) denotes the joint probability of X_1, …, X_n.
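The posterior in formula (7) can be illustrated with a small counting-based Python sketch. It is a simplification under stated assumptions: the three probabilities on the right-hand side are estimated by directly counting exact matches in a toy training table rather than by factorizing over the learned graph, and the name `posterior` and the data layout are hypothetical.

```python
from collections import Counter


def posterior(records, labels, query):
    """Estimate p(Y' = y | X_1..X_n = query) for every label y by counting."""
    n = len(records)
    label_counts = Counter(labels)
    # Number of training rows equal to the query, split by label.
    match_counts = Counter(
        y for r, y in zip(records, labels) if tuple(r) == tuple(query)
    )
    total_matches = sum(match_counts.values())
    if total_matches == 0:
        return {}  # the query was never observed; no estimate is possible
    p_x = total_matches / n                       # p(X_1..X_n)
    result = {}
    for y, count_y in label_counts.items():
        p_y = count_y / n                         # p(Y')
        p_x_given_y = match_counts[y] / count_y   # p(X_1..X_n | Y')
        result[y] = p_x_given_y * p_y / p_x
    return result


records = [(1, 0), (1, 0), (1, 0), (0, 1)]
labels = [1, 1, 0, 0]
post = posterior(records, labels, query=(1, 0))
prediction = max(post, key=post.get)
```

Here the query `(1, 0)` appears three times, twice with label 1, so the posterior for label 1 is about 2/3 and the prediction is 1.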
6. The identity theft detection method based on a probability graph model according to claim 5, characterized in that the following step is further included after step S4:
S5: verify the prediction result.
7. The identity theft detection method based on a probability graph model according to claim 6, characterized in that step S5 further comprises the steps of:
S51: from the prediction result, count for the model of formula (7) the number TP of positive-class samples judged positive, the number FP of negative-class samples judged positive, the number FN of positive-class samples judged negative, and the number TN of negative-class samples judged negative;
S52: calculate a precision according to formula (8):
$$precision = \frac{TP}{TP + FP} \tag{8}$$
calculate a recall according to formula (9):
$$recall = \frac{TP}{TP + FN} \tag{9}$$
calculate a disturbance rate disturb according to formula (10):
S53: evaluate the prediction result according to the precision, the recall, and the disturbance rate.
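The evaluation in steps S51 to S53 can be sketched in Python as below. Precision and recall follow their standard definitions. The disturbance-rate formula is not recoverable from the text here, so the sketch assumes it is the false positive rate FP / (FP + TN), that is, the share of legitimate transactions wrongly flagged; this is an assumption, as is the function name `evaluate`.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return confusion counts and the three evaluation rates."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # ASSUMPTION: disturbance rate taken to be FP / (FP + TN)
        "disturb": fp / (fp + tn) if fp + tn else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
```

For this toy input the counts are TP = 2, FP = 1, FN = 1, TN = 4, giving precision and recall of 2/3 and an assumed disturbance rate of 0.2.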

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910148549.8A CN109993538A (en) 2019-02-28 2019-02-28 Identity theft detection method based on probability graph model

Publications (1)

Publication Number Publication Date
CN109993538A true CN109993538A (en) 2019-07-09

Family

ID=67130436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910148549.8A Pending CN109993538A (en) 2019-02-28 2019-02-28 Identity theft detection method based on probability graph model

Country Status (1)

Country Link
CN (1) CN109993538A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150134512A1 (en) * 2013-11-13 2015-05-14 Mastercard International Incorporated System and method for detecting fraudulent network events
CN106910071A (en) * 2017-01-11 2017-06-30 中国建设银行股份有限公司 The verification method and device of user identity
CN107615326A (en) * 2015-01-20 2018-01-19 口袋医生公司 Use the healthy balance system and method for probability graph model
CN108376300A (en) * 2018-03-02 2018-08-07 江苏电力信息技术有限公司 A kind of user power utilization behavior prediction method based on probability graph model
CN108492173A (en) * 2018-03-23 2018-09-04 上海氪信信息技术有限公司 A kind of anti-Fraud Prediction method of credit card based on dual-mode network figure mining algorithm
CN109360099A (en) * 2018-10-22 2019-02-19 广东工业大学 A kind of anti-fraud method of finance based on k- nearest neighbor algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
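Chai Hongfeng et al.: "Abnormal transaction detection method based on data mining" (基于数据挖掘的异常交易检测方法), Computer Applications and Software (《计算机应用与软件》) *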

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046957A (en) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 Model embezzlement detection method, model training method and device
CN111046957B (en) * 2019-12-13 2021-03-16 支付宝(杭州)信息技术有限公司 Model embezzlement detection method, model training method and device
CN111800389A (en) * 2020-06-09 2020-10-20 同济大学 Port network intrusion detection method based on Bayesian network
CN111860647A (en) * 2020-07-21 2020-10-30 金陵科技学院 Abnormal consumption mode judgment method
CN111860647B (en) * 2020-07-21 2023-11-10 金陵科技学院 Abnormal consumption mode judging method
CN112153221A (en) * 2020-09-16 2020-12-29 北京邮电大学 Communication behavior identification method based on social network diagram calculation
CN112153221B (en) * 2020-09-16 2021-06-29 北京邮电大学 Communication behavior identification method based on social network diagram calculation

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190709)