CN113469695B - Electronic fraud transaction identification method, system and device based on kernel supervision hash model - Google Patents


Info

Publication number
CN113469695B
Authority
CN
China
Prior art keywords
hash
kernel
model
data
transaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010237091.6A
Other languages
Chinese (zh)
Other versions
CN113469695A (en)
Inventor
蒋昌俊
闫春钢
丁志军
刘关俊
张亚英
李震川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202010237091.6A
Publication of CN113469695A
Application granted
Publication of CN113469695B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides an electronic fraud transaction identification method, system and device based on a kernel supervised hash model. The method comprises the following steps: constructing a kernel supervised hash model; training the kernel supervised hash model with labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model; and identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model. The method, system and device are used to effectively identify novel fraudulent transactions.

Description

Electronic fraud transaction identification method, system and device based on kernel supervision hash model
Technical Field
The invention relates to the technical field of electronic fraud transaction identification, in particular to an electronic fraud transaction identification method, system and device based on a kernel supervision hash model.
Background
Electronic transaction fraud and fraud detection form an ongoing adversarial game. In recent years, the rapid development of machine learning and data mining has advanced electronic fraudulent transaction identification technology. Common machine learning methods, such as logistic regression, naive Bayes, decision tree and random forest models, are all based on statistical learning and need enough data samples exhibiting the features to be learned in order to learn effectively. As new technologies develop, fraudsters may devise new means of committing electronic transaction fraud. A novel fraudulent transaction may differ from known fraudulent transactions, so a machine learning model trained on known fraudulent transaction data cannot identify it effectively, which degrades the electronic fraudulent transaction identification model and causes serious economic loss.
Novel electronic fraudulent transaction data has few samples, so common machine learning methods based on statistical learning are difficult to apply. The k-nearest-neighbor (KNN) model determines the class of a test sample by searching for the training samples (reference data) closest to it. This decision rule places low demands on the number of samples of each transaction class, so it is well suited to identifying a small amount of novel fraudulent transaction data while still accurately identifying the large volume of known fraudulent transactions, and it adapts well to changes in fraud types. However, the KNN model must compute the distance between each test sample and every training sample (reference data), which is computationally expensive, and storing a large number of training samples (reference data) consumes a large amount of memory, so the KNN model is difficult to apply successfully in the electronic fraudulent transaction identification scenario.
Electronic fraudulent transaction identification has been studied for many years, but conventional identification models cannot effectively learn novel fraudulent transactions with small sample sizes. Most conventional models are based on statistical learning, and a small number of novel fraudulent transactions has no statistical weight compared with other fraudulent transaction samples, so it cannot be learned sufficiently. As a result, novel fraudulent transactions are difficult to identify accurately, and the performance of the fraudulent transaction identification model declines.
It is therefore desirable to solve the problem of how to identify novel fraudulent transactions.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a method, a system and a device for identifying electronic fraudulent transactions based on a kernel supervised hash model, which solve the prior-art problem of how to identify novel fraudulent transactions.
To achieve the above and other related objects, the present invention provides an electronic fraud transaction identification method based on a kernel supervised hash model, including the steps of: constructing a kernel supervised hash model; training the kernel supervised hash model with labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model; and identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model.
In one embodiment of the present invention, constructing the kernel supervised hash model includes the steps of: determining the number p of kernel hash functions;
defining a kernel hash function as:
Figure BDA0002431365790000021
wherein
Figure BDA0002431365790000022
Figure BDA0002431365790000023
the kernel function k(·) is a Gaussian kernel function; and each input transaction record x is mapped to a p-bit hash code c using the p kernel hash functions.
In one embodiment of the present invention, training the kernel supervised hash model with the labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model comprises the following steps: the optimization objective function of the kernel supervised hash model is:
Figure BDA0002431365790000024
where the operation $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix, and $H_t$ is expressed as:
Figure BDA0002431365790000026
where
Figure BDA0002431365790000027
and the optimization objective function of the kernel supervised hash model is optimized using a greedy algorithm and gradient descent.
In one embodiment of the invention, identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model includes the steps of: encoding all training data with the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data; encoding the test data with the kernel supervised hash model KSH(x) to obtain the hash code of the test data; computing the similarity between the hash code of the test data and the hash codes of all training data using the Hamming distance; and selecting the k training samples whose hash codes are most similar to that of the test data, and identifying the transaction type of the test data based on the transaction types of those k training samples.
To achieve the above object, the invention also provides an electronic fraud transaction identification system based on a kernel supervised hash model, comprising a definition module, a training module and an identification module. The definition module is used to construct a kernel supervised hash model; the training module is used to train the kernel supervised hash model with labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model; and the identification module is used to identify normal transactions and fraudulent transactions using the trained kernel supervised hash model.
In an embodiment of the invention, the definition module is configured to:
determine the number p of kernel hash functions;
define a kernel hash function as:
Figure BDA0002431365790000031
wherein
Figure BDA0002431365790000032
Figure BDA0002431365790000033
the kernel function k(·) is a Gaussian kernel function; and each input transaction record x is mapped to a p-bit hash code c using the p kernel hash functions.
In an embodiment of the present invention, the training module is configured as follows: the optimization objective function of the kernel supervised hash model is:
Figure BDA0002431365790000034
where the operation $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix, and $H_t$ is expressed as:
Figure BDA0002431365790000036
where
Figure BDA0002431365790000037
and the optimization objective function of the kernel supervised hash model is optimized using a greedy algorithm and gradient descent.
In an embodiment of the invention, the identification module is configured to: encode all training data with the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data; encode the test data with the kernel supervised hash model KSH(x) to obtain the hash code of the test data; compute the similarity between the hash code of the test data and the hash codes of all training data using the Hamming distance; and select the k training samples whose hash codes are most similar to that of the test data, and identify the transaction type of the test data based on the transaction types of those k training samples.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described electronic fraud transaction identification methods based on a kernel supervised hash model.
To achieve the above object, the present invention further provides an electronic fraud transaction identification device based on a kernel supervised hash model, including a processor and a memory. The memory is used to store a computer program; the processor is connected to the memory and is used to execute the computer program stored in the memory, so that the device performs any of the above-described electronic fraud transaction identification methods based on a kernel supervised hash model.
As described above, the method, system and device for identifying electronic fraudulent transactions based on a kernel supervised hash model have the following beneficial effect: they effectively identify novel fraudulent transactions.
Drawings
FIG. 1a is a flow chart of an embodiment of a method for identifying electronic fraudulent transactions based on a kernel supervised hash model according to the present invention;
FIG. 1b is a flow chart of an electronic fraud transaction identification method according to the present invention based on a kernel supervised hash model in a further embodiment;
FIG. 1c is a flow chart of an electronic fraud transaction identification method according to the present invention based on a kernel supervised hash model in yet another embodiment;
FIG. 1d is a graph showing the comparison of accuracy, recall, precision, and overall performance of an electronic fraud transaction identification method according to an embodiment of the present invention based on a kernel-supervised hash model;
FIG. 1e is a diagram showing training time and test time comparisons of the method for identifying electronic fraudulent transactions based on a kernel-supervised hash model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an electronic fraud transaction identification system based on a kernel supervised hash model according to the present invention in an embodiment;
FIG. 3 is a schematic diagram of an electronic fraud transaction identifying device based on a kernel supervised hash model according to an embodiment of the present invention.
Description of element reference numerals
21. Definition module
22. Training module
23. Identification module
31. Processor
32. Memory
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may be practiced or carried out in other embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention schematically, so the drawings show only the components related to the present invention rather than the number, shape and size of the components in actual implementation. In actual implementation, the form, number and proportion of each component may be changed arbitrarily, and the component layout may be more complicated.
The method, system and device for identifying electronic fraudulent transactions based on a kernel supervised hash model are used to effectively identify novel fraudulent transactions.
As shown in fig. 1a, in an embodiment, the method for identifying electronic fraud transaction based on a kernel supervised hash model of the present invention includes the following steps:
Step S11: construct a kernel supervised hash model.
Specifically, constructing the kernel supervised hash model comprises the following steps:
determining the number p of kernel hash functions;
defining a kernel hash function as:
Figure BDA0002431365790000051
wherein
Figure BDA0002431365790000052
Figure BDA0002431365790000053
the kernel function k(·) is a Gaussian kernel function; and each input transaction record x is mapped to a p-bit hash code c using the p kernel hash functions.
Specifically, the kernel supervised hash model maps high-dimensional original data to low-dimensional hash codes by constructing a set of kernel hash functions, and then classifies transactions in the hash space using nearest neighbor search. The invention adopts a supervised hashing method that trains a set of kernel hash functions on labeled normal and fraudulent transaction data, so that samples of the same class are as close as possible in the hash space and samples of different classes are as far apart as possible. A kernel hash function may be formally defined as:
Figure BDA0002431365790000061
where x represents a transaction record, and
Figure BDA0002431365790000062
is a set of reference samples drawn from the original labeled dataset $\{x_1, x_2, \ldots, x_v\} = X_{full}$, including a portion of the fraudulent transaction data $X'_{fraud}$ and a portion of the normal transaction data $X'_{normal}$; $a_j$ and $b$ are the coefficients and the bias term of the kernel hash function, respectively. From the rest of the dataset, t samples are drawn as the training dataset
Figure BDA0002431365790000063
Figure BDA0002431365790000064
The sign function sgn(data) is defined as sgn(data) = 1 when data > 0 and sgn(data) = -1 when data ≤ 0.
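The definition above can be sketched in code. The following is a minimal illustration, assuming the kernel hash function takes the form sgn(Σ_j a_j·k(ref_j, x) − b) used in the published kernel-based supervised hashing (KSH) method. The patent's own formula is an image that is not reproduced in this text, so the exact form, the Gaussian kernel scaling `2*sigma**2`, and all names and values (`kernel_hash_bit`, `references`, `a`, `b`) are assumptions for illustration only.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; the 2*sigma**2 scaling is an assumed convention."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sgn(value):
    """Sign function as defined in the text: 1 if value > 0, otherwise -1."""
    return 1 if value > 0 else -1

def kernel_hash_bit(x, references, a, b, sigma=1.0):
    """One hash bit: sgn(sum_j a_j * k(ref_j, x) - b) (assumed KSH form)."""
    score = sum(a_j * gaussian_kernel(r, x, sigma) for a_j, r in zip(a, references))
    return sgn(score - b)

# Hypothetical reference transactions and coefficients, for illustration only.
references = [[0.0, 0.0], [1.0, 1.0]]
a = [1.0, -1.0]
bit_near_first = kernel_hash_bit([0.1, 0.0], references, a, b=0.0)
bit_near_second = kernel_hash_bit([0.9, 1.0], references, a, b=0.0)
```

With these toy values, a transaction close to the first reference sample and one close to the second receive opposite bits, which is the behavior the supervised training is meant to produce for transactions of different classes.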
The hash code is generated by a plurality of kernel hash functions with different parameters, each of which maps the input transaction sample to one bit of the hash code. To better distinguish between classes of transactions and fully exploit the information in the training data, each kernel hash function must satisfy the following condition on the input training dataset $X_{train}$:
Figure BDA0002431365790000065
Substituting formula (1) gives:
Figure BDA0002431365790000066
The parameter b could be obtained by solving this equation, but the equation is difficult to solve, so the following approximate solution is used instead:
Figure BDA0002431365790000067
Substituting the approximate solution for b into formula (1) yields the final form of the kernel hash function:
Figure BDA0002431365790000068
where
Figure BDA0002431365790000069
can be calculated in advance. Thus, once the kernel function k(·) of a kernel hash function has been chosen, the parameter
Figure BDA00024313657900000610
uniquely determines the kernel hash function. The known labeled fraudulent and normal transaction dataset $X_{train}$ can therefore be used as a reference to learn the parameter
Figure BDA0002431365790000071
so that, after kernel function mapping, transactions of the same category receive the same sign as far as possible and transactions of different categories receive different signs.
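For orientation, the derivation described above can be written out as it appears in the published KSH method. This is a sketch under stated assumptions, not a transcription of the patent's figures: the reference-sample notation $x_{(j)}$, the reference-set size $m$ and the means $\mu_j$ are assumed symbols, while $a_j$, $b$, $k(\cdot)$ and $t$ follow the text.

```latex
\begin{aligned}
h(x) &= \operatorname{sgn}\Big(\sum_{j=1}^{m} k(x_{(j)}, x)\,a_j - b\Big)
  && \text{(kernel hash function)} \\
\sum_{i=1}^{t} h(x_i) &= 0
  && \text{(balance condition on } X_{train}\text{)} \\
b &\approx \frac{1}{t}\sum_{i=1}^{t}\sum_{j=1}^{m} k(x_{(j)}, x_i)\,a_j
  && \text{(mean used as approximate solution)} \\
h(x) &= \operatorname{sgn}\Big(\sum_{j=1}^{m}\big(k(x_{(j)}, x) - \mu_j\big)\,a_j\Big),
  \quad \mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)
  && \text{(final form)}
\end{aligned}
```

In this reading, the quantities $\mu_j$ depend only on the reference and training data, which is why they can be calculated in advance, leaving the coefficient vector as the only parameter to learn.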
The KSH method maps each input transaction record x to a p-bit hash code c using p kernel hash functions, which can be formally expressed as:
$c = \mathrm{KSH}(x) = [h_1(x), h_2(x), \ldots, h_p(x)] = [c_1, c_2, \ldots, c_p]$ (3)
The KSH model can set the number p of kernel hash functions as needed; for the electronic fraudulent transaction scenario of a typical financial institution, the suggested range is 20 ≤ p ≤ 40. The kernel hash function generally needs nonlinear learning capability, so the invention uniformly uses the Gaussian kernel function, namely
Figure BDA0002431365790000072
(the value of σ is selected at random from the range [0.5, 1.5]).
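Formula (3) can be illustrated with a short sketch that maps one transaction to a p-bit code. It assumes the mean-centered form of the kernel hash function from the published KSH method, a single σ drawn once from [0.5, 1.5], and random, untrained coefficient vectors. The data, the names `ksh_encode`, `A` and `mu`, and the Gaussian scaling are hypothetical.

```python
import math
import random

def gaussian_kernel(u, v, sigma):
    """Gaussian kernel; the 2*sigma**2 scaling is an assumed convention."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def ksh_encode(x, references, A, mu, sigma):
    """Map x to a p-bit code in {-1, 1}; A holds one coefficient vector per bit."""
    centered = [gaussian_kernel(r, x, sigma) - m for r, m in zip(references, mu)]
    return [1 if sum(a_j * c for a_j, c in zip(a, centered)) > 0 else -1 for a in A]

rng = random.Random(0)
p = 20                                   # within the suggested range 20 <= p <= 40
sigma = rng.uniform(0.5, 1.5)            # sigma chosen at random from [0.5, 1.5]
references = [[rng.random() for _ in range(4)] for _ in range(6)]   # toy reference set
train = [[rng.random() for _ in range(4)] for _ in range(30)]       # toy training set
# Kernel means over the training set, computed once in advance.
mu = [sum(gaussian_kernel(r, s, sigma) for s in train) / len(train) for r in references]
# Untrained random coefficients, standing in for the learned parameters.
A = [[rng.gauss(0.0, 1.0) for _ in references] for _ in range(p)]
code = ksh_encode(train[0], references, A, mu, sigma)
```

Each entry of `code` corresponds to one kernel hash function, so the code length equals p regardless of the dimensionality of the original transaction record.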
Step S12: train the kernel supervised hash model with the labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model.
Specifically, training the kernel supervised hash model with the labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model comprises the following steps:
the optimization objective function of the kernel supervised hash model is:
Figure BDA0002431365790000073
where the operation $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix, and $H_t$ is expressed as:
Figure BDA0002431365790000075
where
Figure BDA0002431365790000076
and the optimization objective function of the kernel supervised hash model is optimized using a greedy algorithm and gradient descent.
Specifically, the KSH model uses the dataset
Figure BDA0002431365790000077
as the reference dataset and
Figure BDA0002431365790000078
Figure BDA0002431365790000079
as the training dataset, and uses a greedy algorithm to learn, one by one, the parameter vectors corresponding to the p kernel hash functions $h_1(x), h_2(x), \ldots, h_p(x)$:
Figure BDA0002431365790000081
In the electronic fraudulent transaction identification scenario, the invention measures the similarity of any two samples in the training dataset $X_{train}$ as follows:
Figure BDA0002431365790000082
where
Figure BDA0002431365790000083
indicates that
Figure BDA0002431365790000084
and
Figure BDA0002431365790000085
belong to the same class, so their similarity $S_{i,j} = 1$ denotes similar samples;
Figure BDA0002431365790000086
indicates that
Figure BDA0002431365790000087
and
Figure BDA0002431365790000088
do not belong to the same class, so their similarity $S_{i,j} = -1$ denotes dissimilar samples. With t samples, a t × t sample similarity matrix
Figure BDA0002431365790000089
is formed, each element of which is calculated by formula (4).
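The similarity matrix described above can be built directly from the class labels. The sketch below assumes string labels `"normal"` and `"fraud"` and a hypothetical helper name `similarity_matrix`; only the rule (1 for the same class, -1 otherwise) comes from the text.

```python
def similarity_matrix(labels):
    """Return the t x t matrix with S[i][j] = 1 for same-class pairs and -1 otherwise."""
    return [[1 if li == lj else -1 for lj in labels] for li in labels]

# Toy labels for three training transactions (illustrative only).
labels = ["normal", "fraud", "normal"]
S = similarity_matrix(labels)
```

The matrix is symmetric with ones on the diagonal, since every sample belongs to the same class as itself.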
During parameter learning, the inner product of hash codes is used to compute the similarity of the hash codes of any two samples in the training dataset $X_{train}$, which can be formally expressed as:
Figure BDA00024313657900000810
where
Figure BDA00024313657900000811
denotes the Hamming distance between the hash codes of samples
Figure BDA00024313657900000812
and
Figure BDA00024313657900000813
The value range of
Figure BDA00024313657900000814
is therefore
Figure BDA00024313657900000815
and
Figure BDA00024313657900000816
can be used as the decision function for the similarity between any two samples, namely:
Figure BDA00024313657900000817
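The link between the code inner product and the Hamming distance can be checked numerically. For codes with entries in {-1, 1}, the inner product of two p-bit codes equals p minus twice their Hamming distance; this is a general identity and is not claimed to be the exact expression in the patent's figures, whose normalization is not visible in this text. The codes below are illustrative.

```python
def hamming_distance(c1, c2):
    """Number of positions in which two equal-length codes differ."""
    return sum(1 for u, v in zip(c1, c2) if u != v)

def code_inner_product(c1, c2):
    """Inner product of two codes with entries in {-1, 1}."""
    return sum(u * v for u, v in zip(c1, c2))

c_i = [1, -1, 1, 1]
c_j = [1, 1, -1, 1]
p = len(c_i)
distance = hamming_distance(c_i, c_j)
inner = code_inner_product(c_i, c_j)
normalized = inner / p   # lies in [-1, 1], comparable with similarity labels 1 / -1
```

Dividing the inner product by p gives a value between -1 and 1, which is what makes it comparable with the similarity labels defined earlier.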
Extending the decision function to the similarity among the t training samples gives:
Figure BDA00024313657900000818
where
Figure BDA00024313657900000819
denotes the matrix of hash codes of the t training samples. If the similarity matrix of the t training samples is $S_t$, the optimization objective function of the kernel supervised hash model is:
Figure BDA00024313657900000820
where the operation $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix. Optimizing formula (8) amounts to choosing suitable parameters so that the difference between the sample similarity predicted by the decision function F(·) and the actual similarity is as small as possible. Since the hash codes in $H_t$ can be computed by the kernel hash functions shown in formula (3), $H_t$ can be expressed as:
Figure BDA0002431365790000091
where
Figure BDA0002431365790000092
Substituting this expression into formula (8) and applying a simple transformation yields a more specific optimization objective function:
Figure BDA0002431365790000093
where $L(H_t)$ denotes the loss function of the optimization objective. As formula (10) shows, the inner product of the sample hash codes can be computed separately for each bit, so the optimization objective can be optimized step by step with a greedy algorithm.
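The objective can be evaluated on a toy example. The sketch below assumes the objective has the form ‖(1/p)·H·Hᵀ − S‖²_F, as in the published KSH method and consistent with the later definition P_0 = pS_t. The patent's own formula is not reproduced in this text, and the function name `ksh_objective` and the matrices are illustrative.

```python
def ksh_objective(H, S):
    """Squared Frobenius norm of (1/p) * H * H^T - S for a t x p code matrix H."""
    t, p = len(H), len(H[0])
    total = 0.0
    for i in range(t):
        for j in range(t):
            predicted = sum(H[i][k] * H[j][k] for k in range(p)) / p
            total += (predicted - S[i][j]) ** 2
    return total

# Samples 0 and 1 share a class; sample 2 belongs to the other class.
S = [[1, 1, -1], [1, 1, -1], [-1, -1, 1]]
H_good = [[1, 1], [1, 1], [-1, -1]]   # codes that agree with the labels
H_bad = [[1, 1], [-1, -1], [1, 1]]    # codes that contradict the labels
loss_good = ksh_objective(H_good, S)
loss_bad = ksh_objective(H_bad, S)
```

Codes that reproduce the label similarities give a loss of zero, while codes that contradict them are penalized, which is the quantity training tries to minimize.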
The loss function $L(H_t)$ is optimized with a greedy algorithm. Assuming that the parameters
Figure BDA0002431365790000094
have already been optimized, the optimization objective function for
Figure BDA0002431365790000095
can be expressed as:
Figure BDA0002431365790000096
where $C_{const}$ is a constant term,
Figure BDA0002431365790000097
and $P_0 = pS_t$.
The loss function of the objective to be optimized,
Figure BDA0002431365790000098
contains the sign function sgn(·), which makes direct optimization very difficult, so the smooth function
Figure BDA0002431365790000101
is used in place of the sign function sgn(·) to make the objective easier to optimize. Because the constant term $C_{const}$ of the loss function
Figure BDA0002431365790000102
does not affect the optimization objective, it is ignored, and the resulting loss function can be expressed as:
Figure BDA0002431365790000103
The corresponding optimization objective function can be expressed as:
Figure BDA0002431365790000104
The derivative of the loss function
Figure BDA0002431365790000105
with respect to the parameter
Figure BDA0002431365790000106
can be expressed as:
Figure BDA0002431365790000107
where
Figure BDA0002431365790000108
Thus, the parameter
Figure BDA0002431365790000109
of any kernel hash function can be obtained by optimizing the objective function
Figure BDA00024313657900001010
by gradient descent. For the loss function
Figure BDA00024313657900001011
if one solves
Figure BDA00024313657900001012
to compute the optimal parameter
Figure BDA00024313657900001013
the computational complexity is high. Gradient descent is therefore used to approximate the optimal solution.
Assuming the learning rate of gradient descent is α, the parameter
Figure BDA00024313657900001014
is updated during optimization as follows:
Figure BDA00024313657900001015
The updated
Figure BDA00024313657900001016
is then substituted back into formula (12), and the parameter
Figure BDA00024313657900001017
is updated again by applying formula (15) to
Figure BDA00024313657900001018
This process is repeated until the loss function
Figure BDA00024313657900001019
hardly changes any more, i.e.
Figure BDA00024313657900001020
Figure BDA00024313657900001021
($C_t$ is a very small constant threshold, e.g. 0.001), at which point
Figure BDA00024313657900001022
has reached its final value. The parameter
Figure BDA00024313657900001023
is thus obtained, and the optimization of the objective function of the kernel supervised hash model is complete.
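The gradient-descent loop for a single hash bit can be sketched as follows. This is an illustration under assumptions, not the patent's exact procedure. It uses the smooth surrogate φ(z) = 2/(1 + e^(−z)) − 1 from the published KSH method (the patent's own smooth function is an image not reproduced here) and the per-bit loss −φ(Ka)ᵀ·P·φ(Ka) with P = p·S for the first bit. The matrix `K` of centered kernel values, the names, and the toy numbers are hypothetical.

```python
import math

def phi(z):
    """Assumed smooth surrogate for sgn: 2 / (1 + exp(-z)) - 1, equal to tanh(z / 2)."""
    return math.tanh(z / 2.0)

def loss_and_grad(a, K, P):
    """Surrogate loss -b^T P b with b = phi(K a), and its gradient with respect to a."""
    t, m = len(K), len(a)
    b = [phi(sum(K[i][j] * a[j] for j in range(m))) for i in range(t)]
    Pb = [sum(P[i][l] * b[l] for l in range(t)) for i in range(t)]
    loss = -sum(b[i] * Pb[i] for i in range(t))
    w = [Pb[i] * (1.0 - b[i] ** 2) for i in range(t)]   # (P b) * (1 - b*b), elementwise
    grad = [-sum(K[i][j] * w[i] for i in range(t)) for j in range(m)]
    return loss, grad

def fit_one_bit(K, P, a0, alpha=0.1, tol=1e-3, max_iter=1000):
    """Gradient descent with learning rate alpha; stop when the loss change is below tol."""
    a = list(a0)
    loss, grad = loss_and_grad(a, K, P)
    for _ in range(max_iter):
        a = [aj - alpha * gj for aj, gj in zip(a, grad)]
        new_loss, grad = loss_and_grad(a, K, P)
        converged = abs(loss - new_loss) < tol
        loss = new_loss
        if converged:
            break
    return a, loss

# Toy centered kernel features for 4 training samples and 2 reference samples.
K = [[0.5, -0.5], [0.4, -0.4], [-0.5, 0.5], [-0.4, 0.4]]
S = [[1, 1, -1, -1], [1, 1, -1, -1], [-1, -1, 1, 1], [-1, -1, 1, 1]]
p = 1
P0 = [[p * s for s in row] for row in S]            # P_0 = p * S_t
initial_loss, _ = loss_and_grad([0.1, 0.0], K, P0)
a, final_loss = fit_one_bit(K, P0, [0.1, 0.0])
bits = [1 if sum(kij * aj for kij, aj in zip(row, a)) > 0 else -1 for row in K]
```

In the greedy scheme described above, a loop of this kind would be run once per hash function, with the matrix P updated between bits; that outer loop is omitted here to keep the sketch short.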
Step S13: identify normal transactions and fraudulent transactions using the trained kernel supervised hash model.
Specifically, identifying normal and fraudulent transactions using the trained kernel supervised hash model includes the following steps:
Obtain the trained kernel supervised hash model. Encode all training data with the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data, and store them.
Encode the test data with the trained kernel supervised hash model KSH(x) to obtain the hash code of the test data.
Compute the similarity between the hash code of the test data and the hash codes of all training data using the Hamming distance.
Select the k training samples whose hash codes are most similar to that of the test data, and identify the transaction type of the test data, fraudulent or normal, based on the transaction types of those k training samples. Considering how sparse novel fraudulent transaction data is in the training samples, the value of k is set in the range 3 to 7. For example, the 3 training samples whose hash codes are most similar to that of the test data are selected; if two of the three are normal transactions and one is a fraudulent transaction, the test data is identified as a normal transaction.
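The identification step can be sketched as a k-nearest-neighbor vote over Hamming distances. The majority vote is an assumption consistent with the example above (two normal neighbors and one fraudulent neighbor yield "normal"); the codes, the labels and the name `knn_classify` are illustrative.

```python
from collections import Counter

def hamming_distance(c1, c2):
    """Number of positions in which two equal-length codes differ."""
    return sum(1 for u, v in zip(c1, c2) if u != v)

def knn_classify(test_code, train_codes, train_labels, k=3):
    """Label a test code by majority vote among its k nearest training codes."""
    ranked = sorted(range(len(train_codes)),
                    key=lambda i: hamming_distance(test_code, train_codes[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Stored hash codes of training transactions with their labels (toy values).
train_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1], [-1, -1, 1, -1]]
train_labels = ["normal", "normal", "fraud", "fraud"]
prediction = knn_classify([1, 1, -1, 1], train_codes, train_labels, k=3)
```

Because only short codes are compared, each distance costs a handful of comparisons rather than a full distance computation in the original feature space.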
Specifically, the method further classifies by similarity using a k-nearest-neighbor (KNN) model. Hash coding resolves the high computation and storage cost of the KNN model in electronic fraudulent transaction identification, so the KNN model's ability to identify novel fraudulent transactions is fully exploited and novel fraudulent transactions can be identified effectively. In electronic fraudulent transaction identification, the KNN model is employed to determine the class of test samples. The kernel supervised hash model adopted by the invention consists of a set of kernel hash functions obtained by supervised learning on the training samples, so that samples of the same class are as close as possible in the hash space and samples of different classes are as far apart as possible. The kernel supervised hash model maps data from the original high-dimensional feature space to a low-dimensional hash space, which on the one hand reduces the dimensionality of the samples and on the other hand lets the KNN model search quickly for nearest-neighbor samples in the hash space. This resolves the high computation and storage cost of the KNN model in electronic fraudulent transaction identification and allows novel electronic fraudulent transactions to be identified accurately. Unlike conventional statistical learning methods, the KNN model determines the class of a test sample from the classes of its k nearest samples in the feature space, so a small number of novel fraudulent transactions is not ignored merely because it is small, which safeguards the identification performance for novel fraudulent transactions.
As shown in fig. 1b, in an embodiment, the method for identifying electronic fraud transaction based on a kernel supervised hash model according to the present invention includes the following steps:
A kernel supervised hash model is defined.
The kernel supervised hash model is trained by using the labeled normal data and the labeled fraud data to obtain a trained kernel supervised hash model.
Normal transactions and fraudulent transactions are identified by using the trained kernel supervised hash model, specifically as follows. All training data (high-dimensional reference (training) data) are encoded by the trained kernel supervised hash model KSH(x) to obtain the hash codes of all the training data (low-dimensional reference (training) data). The test data (test sample $x_{test}$) is encoded by the kernel supervised hash model KSH(x) to obtain the hash code of the test data. The similarity is classified by a nearest neighbor (KNN) model: the k training samples whose hash codes are most similar to the hash code of the test data are selected, and the transaction type of the test data (test sample $x_{test}$) is identified as fraudulent or normal based on the transaction types of these k training samples.
As shown in fig. 1c, in an embodiment, the method for identifying electronic fraud transaction based on a kernel supervised hash model of the present invention includes the following steps:
Step S11, constructing a kernel supervised hash model.
Specifically, the kernel supervised hash (KSH) model maps high-dimensional original data into low-dimensional hash codes by constructing a group of kernel hash functions, and then classifies transactions in the hash space by a nearest neighbor search method. The invention adopts a supervised hashing method to train a group of kernel hash functions on labeled normal and fraudulent transaction data, so that samples of the same class are as close as possible in the hash space and samples of different classes are as far apart as possible. A kernel hash function is formally defined as:
$$h(x) = \operatorname{sgn}\left(\sum_{j=1}^{m} a_j\, k(x_{(j)}, x) - b\right) \tag{1}$$
where $x$ represents a transaction record; $x_{(1)}, x_{(2)}, \dots, x_{(m)}$ are $m$ reference samples drawn from the original labeled data set $\{x_1, x_2, \dots, x_v\} = X_{full}$, including part of the fraudulent transaction data $X'_{fraud}$ and part of the normal transaction data $X'_{normal}$; $k(\cdot,\cdot)$ is the kernel function; and $a_j$ and $b$ are the coefficients and the bias term of the kernel hash function, respectively. A further $t$ samples are drawn from the remaining data as the training data set $X_{train} = \{x_1, x_2, \dots, x_t\}$.
The sign function is $\operatorname{sgn}(u) = 1$ when $u > 0$ and $\operatorname{sgn}(u) = -1$ when $u \le 0$.
The hash code is generated by a plurality of kernel hash functions with different parameters, each of which maps an input transaction sample to a 1-bit hash code. To distinguish transactions of different classes well and make full use of the information in the training data, each kernel hash function is required to satisfy the following condition on the input training data set $X_{train}$:
$$\sum_{i=1}^{t} h(x_i) = 0$$
Substituting formula (1) gives:
$$\sum_{i=1}^{t} \operatorname{sgn}\left(\sum_{j=1}^{m} a_j\, k(x_{(j)}, x_i) - b\right) = 0$$
The parameter $b$ can in principle be obtained by solving this equation, but the equation is difficult to solve, so an approximate solution is used instead:
$$b = \frac{1}{t} \sum_{i=1}^{t} \sum_{j=1}^{m} a_j\, k(x_{(j)}, x_i)$$
Substituting the approximate solution of $b$ into formula (1) gives the final form of the kernel hash function:
$$h(x) = \operatorname{sgn}\left(\sum_{j=1}^{m} a_j \left(k(x_{(j)}, x) - \mu_j\right)\right) = \operatorname{sgn}\left(\mathbf{a}^{\top} \bar{k}(x)\right) \tag{2}$$
where $\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \dots, k(x_{(m)}, x) - \mu_m]^{\top}$ and $\mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)$ can be calculated in advance. Thus, once the kernel function $k(\cdot,\cdot)$ in the kernel hash function is determined, the parameter vector $\mathbf{a} = [a_1, a_2, \dots, a_m]^{\top}$ uniquely determines the kernel hash function. The known labeled fraudulent and normal transaction data set $X_{train}$ can therefore be used as supervision to learn the parameter vector $\mathbf{a}$, so that after the kernel hash mapping, transaction data of the same class receive the same sign as far as possible and transaction data of different classes receive different signs.
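A single kernel hash bit with the approximated bias can be sketched as below. This is an illustration under stated assumptions: the bias is taken as the mean kernel response over the training set (so each kernel value is centred by a precomputed mean `mu[j]`), the Gaussian kernel is written with a `2 * sigma**2` denominator, and all names (`gaussian_kernel`, `kernel_means`, `hash_bit`) and the two-dimensional toy points are hypothetical.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); the 2*sigma^2 form is an assumption."""
    d2 = sum((u - v) ** 2 for u, v in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def kernel_means(reference, train, sigma=1.0):
    """mu[j]: mean kernel value between reference sample j and all training samples."""
    return [sum(gaussian_kernel(r, x, sigma) for x in train) / len(train)
            for r in reference]

def hash_bit(x, reference, a, mu, sigma=1.0):
    """One hash bit: sign of sum_j a[j] * (k(reference[j], x) - mu[j])."""
    s = sum(aj * (gaussian_kernel(r, x, sigma) - m)
            for aj, r, m in zip(a, reference, mu))
    return 1 if s > 0 else -1

# Toy data: two reference points that double as the training set.
reference = [(0.0, 0.0), (3.0, 3.0)]
train = [(0.0, 0.0), (3.0, 3.0)]
mu = kernel_means(reference, train)      # computed once, in advance
a = [1.0, -1.0]                          # hypothetical coefficient vector

print(hash_bit((0.0, 0.0), reference, a, mu))  # 1
print(hash_bit((3.0, 3.0), reference, a, mu))  # -1
```

Because `mu` depends only on the reference and training sets, it is computed once and reused for every transaction that is encoded later.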
The KSH method maps each input transaction record $x$ into a $p$-bit hash code $c$ by using $p$ kernel hash functions, which can be formally expressed as:
$$c = KSH(x) = [h_1(x), h_2(x), \dots, h_p(x)] = [c_1, c_2, \dots, c_p] \tag{3}$$
The number $p$ of kernel hash functions in the KSH model can be chosen as needed; for the electronic fraudulent transaction scenario of a typical financial institution, the value range of $p$ is $20 \le p \le 40$. A kernel hash function generally requires nonlinear learning capability, and the invention uniformly uses a Gaussian kernel function, namely
$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)$$
(the value of $\sigma$ is selected at random within the range $[0.5, 1.5]$).
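The mapping of one transaction record to a p-bit code can be sketched as follows. The parameter vectors here are random placeholders standing in for learned parameters, the data are synthetic, and the names `ksh_encode`, `reference`, `train` and `A` are hypothetical; only p = 20 and the range of sigma come from the text.

```python
import math
import random

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel; the 2*sigma^2 denominator is an assumed convention."""
    d2 = sum((u - v) ** 2 for u, v in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def ksh_encode(x, reference, mu, A, sigma):
    """Return the p-bit code [h_1(x), ..., h_p(x)], one bit per parameter vector in A."""
    kbar = [gaussian_kernel(r, x, sigma) - m for r, m in zip(reference, mu)]
    return [1 if sum(aj * kj for aj, kj in zip(a, kbar)) > 0 else -1 for a in A]

rng = random.Random(0)
p, m, dim = 20, 5, 3                       # p within the stated range 20..40
sigma = rng.uniform(0.5, 1.5)              # sigma drawn from [0.5, 1.5]
reference = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(m)]
train = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(30)]
mu = [sum(gaussian_kernel(r, x, sigma) for x in train) / len(train)
      for r in reference]
A = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(p)]  # untrained placeholders

code = ksh_encode(train[0], reference, mu, A, sigma)
print(len(code), code[:5])
```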
Step S12, training the kernel supervised hash model by using the labeled normal data and the labeled fraud data to obtain a trained kernel supervised hash model.
Specifically, the KSH model uses the data set $\{x_{(1)}, x_{(2)}, \dots, x_{(m)}\}$ as the reference data set and $X_{train} = \{x_1, x_2, \dots, x_t\}$ as the training data set, and uses a greedy algorithm to learn, one by one, the parameter vectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_p$ corresponding to the $p$ kernel hash functions $h_1(x), h_2(x), \dots, h_p(x)$.
In the electronic fraudulent transaction identification scenario, the invention measures the similarity of any two samples $x_i$ and $x_j$ in the training data set $X_{train}$ as follows:
$$S_{i,j} = \begin{cases} 1, & y_i = y_j \\ -1, & y_i \ne y_j \end{cases} \tag{4}$$
where $y_i$ denotes the class (fraudulent or normal) of sample $x_i$. $y_i = y_j$ indicates that $x_i$ and $x_j$ belong to the same class, so $S_{i,j} = 1$ marks the two samples as similar; $y_i \ne y_j$ indicates that they do not belong to the same class, so $S_{i,j} = -1$ marks them as dissimilar. With $t$ samples, this yields a $t \times t$ sample similarity matrix $S_t \in \{1, -1\}^{t \times t}$, each element of which is calculated by formula (4).
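As a small illustration of the pairwise similarity rule, the matrix can be built directly from class labels. The function name `similarity_matrix` and the four example labels are hypothetical.

```python
def similarity_matrix(labels):
    """S[i][j] = 1 if samples i and j share a class, otherwise -1."""
    return [[1 if yi == yj else -1 for yj in labels] for yi in labels]

labels = ["normal", "fraud", "normal", "fraud"]
S = similarity_matrix(labels)
for row in S:
    print(row)
```

The matrix is symmetric with ones on the diagonal, and its size grows as t squared, which is one reason the training set is a sample of t records rather than the full data set.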
During parameter learning, the inner product of hash codes is used to measure the similarity between the hash codes $c_i = KSH(x_i)$ and $c_j = KSH(x_j)$ of any two samples in the training data set $X_{train}$, which can be formally expressed as:
$$c_i \cdot c_j = p - 2\, D_h(c_i, c_j) \tag{5}$$
where $D_h(c_i, c_j)$ denotes the Hamming distance between the hash codes of samples $x_i$ and $x_j$. The value range of $c_i \cdot c_j$ is therefore $[-p, p]$, and $\frac{1}{p}\, c_i \cdot c_j \in [-1, 1]$ can be used as the decision function for the similarity between any two samples, namely:
$$F(c_i, c_j) = \frac{1}{p}\, c_i \cdot c_j \tag{6}$$
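For codes with entries in {+1, -1}, every agreeing bit contributes +1 to the inner product and every differing bit contributes -1, so the inner product equals p minus twice the Hamming distance. The short check below illustrates this with two hypothetical 6-bit codes; the helper names are illustrative only.

```python
def inner(ci, cj):
    """Inner product of two +1/-1 codes."""
    return sum(u * v for u, v in zip(ci, cj))

def hamming(ci, cj):
    """Number of differing bits."""
    return sum(1 for u, v in zip(ci, cj) if u != v)

def code_similarity(ci, cj):
    """Inner product scaled by the code length p, so the value lies in [-1, 1]."""
    return inner(ci, cj) / len(ci)

ci = [1, -1, 1, 1, -1, 1]
cj = [1, 1, 1, -1, -1, 1]
p = len(ci)

print(inner(ci, cj), p - 2 * hamming(ci, cj))  # 2 2
print(code_similarity(ci, cj))
```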
Extending the decision function to the similarities among all $t$ training samples gives:
$$F(H_t) = \frac{1}{p} H_t H_t^{\top} \tag{7}$$
where $H_t \in \{1, -1\}^{t \times p}$ denotes the matrix of hash codes of the $t$ training samples, one row per sample. If the similarity matrix of the $t$ training samples is $S_t$, the optimization objective function of the kernel supervised hash model is:
$$\min_{H_t} \left\lVert \frac{1}{p} H_t H_t^{\top} - S_t \right\rVert_F^2 \tag{8}$$
where $\lVert \cdot \rVert_F^2$ denotes the square of the Frobenius norm of a matrix. Optimizing formula (8) amounts to choosing parameters such that the difference between the similarity predicted by the decision function $F(\cdot)$ and the actual similarity is as small as possible. Since the hash codes in $H_t$ are calculated by the kernel hash functions as shown in formula (3), $H_t$ can be expressed as:
$$H_t = \begin{bmatrix} h_1(x_1) & \cdots & h_p(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_t) & \cdots & h_p(x_t) \end{bmatrix} = \operatorname{sgn}\left(\bar{K}_t A\right) \tag{9}$$
where $\bar{K}_t = [\bar{k}(x_1), \dots, \bar{k}(x_t)]^{\top} \in \mathbb{R}^{t \times m}$, $\bar{k}(x_i)$ is the vector of mean-centered kernel values between $x_i$ and the $m$ reference samples, and $A = [\mathbf{a}_1, \dots, \mathbf{a}_p] \in \mathbb{R}^{m \times p}$. Substituting this into formula (8) and applying a simple transformation gives a more specific optimization objective function:
$$\min_{A} L(H_t) = \left\lVert \sum_{k=1}^{p} \operatorname{sgn}\left(\bar{K}_t \mathbf{a}_k\right) \operatorname{sgn}\left(\bar{K}_t \mathbf{a}_k\right)^{\top} - p\, S_t \right\rVert_F^2 \tag{10}$$
where $L(H_t)$ denotes the loss function of the optimization objective. As formula (10) shows, the inner product of the sample hash codes decomposes into a sum of per-bit terms, so the objective can be optimized bit by bit with a greedy algorithm.
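The objective can be illustrated numerically. The sketch below assumes the loss is the squared Frobenius norm of the difference between the scaled code inner products and the similarity matrix; the function name `ksh_loss` and the toy codes are hypothetical.

```python
def ksh_loss(H, S):
    """Squared Frobenius norm of (1/p) * H H^T - S for a t x p code matrix H."""
    t, p = len(H), len(H[0])
    total = 0.0
    for i in range(t):
        for j in range(t):
            predicted = sum(H[i][k] * H[j][k] for k in range(p)) / p
            total += (predicted - S[i][j]) ** 2
    return total

# Two classes encoded as +1 / -1; S[i][j] is 1 for same class, -1 otherwise.
y = [1, -1, 1, -1]
S = [[yi * yj for yj in y] for yi in y]

perfect = [[yi] * 3 for yi in y]     # every bit equals the class sign
all_ones = [[1] * 3 for _ in y]      # codes carry no class information

print(ksh_loss(perfect, S))   # 0.0
print(ksh_loss(all_ones, S))  # 32.0
```

Codes that separate the two classes in every bit reproduce the similarity matrix exactly, while uninformative codes are penalised on every cross-class pair (8 pairs, each contributing 4).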
A greedy algorithm is used to optimize the loss function $L(H_t)$. Assuming that the parameters $\mathbf{a}_1^*, \dots, \mathbf{a}_{k-1}^*$ have already been optimized, the optimization objective for $\mathbf{a}_k$ can be expressed as:
$$\min_{\mathbf{a}_k} \left\lVert \operatorname{sgn}(\bar{K}_t \mathbf{a}_k) \operatorname{sgn}(\bar{K}_t \mathbf{a}_k)^{\top} - P_{k-1} \right\rVert_F^2 = -2 \operatorname{sgn}(\bar{K}_t \mathbf{a}_k)^{\top} P_{k-1} \operatorname{sgn}(\bar{K}_t \mathbf{a}_k) + C_{const} \tag{11}$$
where $C_{const}$ is a constant term, $P_{k-1} = p\, S_t - \sum_{i=1}^{k-1} \operatorname{sgn}(\bar{K}_t \mathbf{a}_i^*) \operatorname{sgn}(\bar{K}_t \mathbf{a}_i^*)^{\top}$, and $P_0 = p S_t$.
Because the loss function to be optimized, $g(\mathbf{a}_k) = -\operatorname{sgn}(\bar{K}_t \mathbf{a}_k)^{\top} P_{k-1} \operatorname{sgn}(\bar{K}_t \mathbf{a}_k)$, contains the sign function $\operatorname{sgn}(\cdot)$, direct optimization is very difficult, so the smooth function $\varphi(u) = \frac{2}{1 + e^{-u}} - 1$ can be used in place of $\operatorname{sgn}(\cdot)$ to reduce the difficulty of optimizing the objective. Since the constant term $C_{const}$ has no effect on the optimization objective, it is ignored, and the resulting loss function can be expressed as:
$$\tilde{g}(\mathbf{a}_k) = -\varphi(\bar{K}_t \mathbf{a}_k)^{\top} P_{k-1}\, \varphi(\bar{K}_t \mathbf{a}_k) \tag{12}$$
The corresponding optimization objective function can be expressed as:
$$\min_{\mathbf{a}_k} \tilde{g}(\mathbf{a}_k) \tag{13}$$
The derivative of the loss function $\tilde{g}(\mathbf{a}_k)$ with respect to the parameter $\mathbf{a}_k$ can be expressed as:
$$\nabla \tilde{g}(\mathbf{a}_k) = -\bar{K}_t^{\top} \left( (P_{k-1} \mathbf{b}_k) \odot (\mathbf{1} - \mathbf{b}_k \odot \mathbf{b}_k) \right) \tag{14}$$
where $\mathbf{b}_k = \varphi(\bar{K}_t \mathbf{a}_k)$, $\varphi$ is applied elementwise, and $\odot$ denotes the elementwise product. For the loss function $\tilde{g}(\mathbf{a}_k)$, computing the optimal parameter $\mathbf{a}_k^*$ by solving $\nabla \tilde{g}(\mathbf{a}_k) = 0$ directly has high computational complexity. Therefore, the gradient descent method can be used to approximate the optimal solution.
Assuming that the learning rate of the gradient descent method is $\alpha$, the parameter $\mathbf{a}_k$ is updated during optimization as follows:
$$\mathbf{a}_k \leftarrow \mathbf{a}_k - \alpha\, \nabla \tilde{g}(\mathbf{a}_k) \tag{15}$$
The updated $\mathbf{a}_k$ is then substituted back into formula (12) to recompute the loss $\tilde{g}(\mathbf{a}_k)$, and formula (15) is applied again to update $\mathbf{a}_k$. This process is repeated until the loss function $\tilde{g}(\mathbf{a}_k)$ hardly changes any more, i.e.
$$\left| \tilde{g}\left(\mathbf{a}_k^{new}\right) - \tilde{g}\left(\mathbf{a}_k^{old}\right) \right| < C_t$$
($C_t$ is a very small constant threshold, e.g. 0.001). The final value of $\mathbf{a}_k$ is then taken as the optimized parameter $\mathbf{a}_k^*$. Once all $p$ parameter vectors have been obtained in this way, the optimization of the objective function of the kernel supervised hash model is complete.
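The greedy, bit-by-bit training loop can be sketched in plain Python as below. Several points are assumptions rather than details from the text: the smooth surrogate is taken to be 2 / (1 + exp(-u)) - 1 (equal to tanh(u / 2)), the per-bit loss is -b^T P b with b the smoothed codes, the step size is halved whenever a step would increase the loss (the text describes a fixed learning rate), and the initial vectors, the toy matrix `Kbar` and all function names are hypothetical.

```python
import math

def phi(u):
    """Smooth stand-in for sgn: 2 / (1 + exp(-u)) - 1, i.e. tanh(u / 2)."""
    return math.tanh(u / 2.0)

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def cost(Kbar, P, a):
    """Relaxed per-bit loss -b^T P b with b = phi(Kbar a)."""
    b = [phi(u) for u in matvec(Kbar, a)]
    return -sum(bi * pbi for bi, pbi in zip(b, matvec(P, b)))

def grad(Kbar, P, a):
    """Gradient of cost: -Kbar^T ((P b) * (1 - b * b)), products taken elementwise."""
    b = [phi(u) for u in matvec(Kbar, a)]
    w = [pbi * (1.0 - bi * bi) for pbi, bi in zip(matvec(P, b), b)]
    return [-sum(Kbar[i][j] * w[i] for i in range(len(Kbar)))
            for j in range(len(a))]

def fit_bit(Kbar, P, a, alpha=0.1, tol=1e-3, max_iter=500):
    """Gradient descent for one bit; stops when the loss changes by less than tol."""
    current = cost(Kbar, P, a)
    for _ in range(max_iter):
        trial = [aj - alpha * gj for aj, gj in zip(a, grad(Kbar, P, a))]
        new = cost(Kbar, P, trial)
        if new > current:          # assumption: shrink the step instead of accepting it
            alpha *= 0.5
            continue
        a, delta, current = trial, current - new, new
        if delta < tol:
            break
    return a

def train_ksh(Kbar, S, p, init):
    """Learn p parameter vectors one at a time, updating the residual after each bit."""
    t = len(S)
    P = [[p * S[i][j] for j in range(t)] for i in range(t)]   # P_0 = p * S
    A = []
    for k in range(p):
        a = fit_bit(Kbar, P, list(init[k]))
        h = [1 if u > 0 else -1 for u in matvec(Kbar, a)]
        P = [[P[i][j] - h[i] * h[j] for j in range(t)] for i in range(t)]
        A.append(a)
    return A

# Toy problem: 4 samples, 2 reference points, classes +, -, +, -.
Kbar = [[0.4, -0.3], [-0.4, 0.3], [0.35, -0.25], [-0.35, 0.25]]
y = [1, -1, 1, -1]
S = [[yi * yj for yj in y] for yi in y]
init = [[0.1, -0.1], [0.2, 0.1]]
A = train_ksh(Kbar, S, p=2, init=init)
print(A)
```

Accepting only non-increasing steps makes the loss monotone in this sketch, which keeps the stopping rule well defined even when the initial learning rate is too large.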
Step S13, identifying normal transactions and fraudulent transactions by using the trained kernel supervised hash model.
Specifically, identifying normal and fraudulent transactions by using the trained kernel supervised hash model includes the following steps:
The trained kernel supervised hash model is obtained. All training data are encoded by the trained kernel supervised hash model KSH(x) to obtain the hash codes of all the training data, and the hash codes are stored.
The test data is encoded by the trained kernel supervised hash model KSH(x) to obtain the hash code of the test data.
The similarity between the hash code of the test data and the hash codes of all the training data is calculated by using the Hamming distance.
The k training samples whose hash codes are most similar to the hash code of the test data are selected, and the transaction type of the test data (fraudulent or normal) is identified from the transaction types of these k training samples. Specifically, considering the sparsity of novel fraudulent transaction data in the training samples, the value of k is set in the range 3 to 7. For example, the 3 training samples whose hash codes are most similar to that of the test data are selected; if two of them are normal transactions and one is a fraudulent transaction, the test data is identified as a normal transaction.
Specifically, the similarity is classified by using a k-nearest-neighbor (KNN) model. Hash coding resolves the high computation and storage cost of the KNN model in electronic fraudulent transaction identification, so that the ability of the KNN model to identify novel fraudulent transactions is fully exploited and novel fraudulent transactions can be effectively identified. To apply the KNN model to electronic fraudulent transaction identification, the KNN model is used to determine the class of test samples. The kernel supervised hash model adopted by the invention consists of a group of kernel hash functions obtained by supervised learning on training samples, so that samples of the same class are as close as possible in the hash space and samples of different classes are as far apart as possible. The kernel supervised hash model maps data from the original high-dimensional feature space to a low-dimensional hash space. On the one hand this reduces the dimension of each sample; on the other hand it lets the KNN model search for nearest neighbors quickly in the hash space, which resolves the high computation and storage cost of the KNN model and allows novel electronic fraudulent transactions to be accurately identified. Unlike conventional statistical learning methods, the KNN model determines the class of a test sample from the classes of its k nearest samples in the feature space, so novel fraudulent transactions are not ignored merely because they are few in number, which safeguards identification performance on novel fraudulent transactions.
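One way the storage and computation savings of hash codes can be realised is to pack each code into a single integer and compare codes with XOR and a bit count. This packing scheme is an illustrative assumption, not something the text prescribes, and the names `pack` and `hamming_packed` are hypothetical.

```python
def pack(code):
    """Pack a +1/-1 code into an int, one bit per hash function (+1 -> 1, -1 -> 0)."""
    value = 0
    for c in code:
        value = (value << 1) | (1 if c > 0 else 0)
    return value

def hamming_packed(u, v):
    """Hamming distance between two packed codes: count the set bits of u XOR v."""
    return bin(u ^ v).count("1")

a = pack([1, -1, 1, 1])    # binary 1011
b = pack([1, 1, 1, -1])    # binary 1110
print(a, b, hamming_packed(a, b))  # 11 14 2
```

A code of 20 to 40 bits fits in one machine word, compared with the 43 numeric features of each original transaction record in the data set described later.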
An actual test was carried out on transaction data from a bank in China. The data set contains approximately 3.5 million transaction records from April to June 2017, all labeled by bank professionals. Table 1 shows the basic information of this data set.
TABLE 1 Electronic transaction data information
Month      Data volume    Number of features    Imbalance ratio
2017-04    1,243,035      43                    1.07%
2017-05    1,216,299      43                    2.22%
2017-06    1,042,714      43                    2.39%
To demonstrate the performance and efficiency of the method of the invention in identifying fraudulent transactions, the April and May data of this data set were used as the training set: 80% of the fraudulent transaction data and 80% of the normal transaction data were randomly extracted as the reference data set (reference data) for KSH model training, and the remaining portion was used as the validation set for adjusting model parameters. The June data was used as the test set of the models (test data).
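A per-class random split of this kind can be sketched as follows. The function name `stratified_split`, the seed, and the 15 toy labels are hypothetical; only the 80% fraction comes from the text.

```python
import random

def stratified_split(labels, frac=0.8, seed=0):
    """Randomly pick `frac` of the indices of each class; return (selected, remaining)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    selected, remaining = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(frac * len(idxs))
        selected.extend(idxs[:cut])
        remaining.extend(idxs[cut:])
    return sorted(selected), sorted(remaining)

labels = ["normal"] * 10 + ["fraud"] * 5
reference_idx, validation_idx = stratified_split(labels)
print(len(reference_idx), len(validation_idx))  # 12 3
```

Sampling each class separately keeps the rare fraudulent class represented in both parts, which matters when the imbalance ratio is only a few percent.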
The electronic fraudulent transaction identification method based on the kernel supervised hash model of the invention was compared experimentally with 3 common fraudulent transaction identification methods:
1) Random Forests (RF);
2) Support Vector Machine (SVM);
3) K-Nearest Neighbor (KNN);
As shown in fig. 1d (the bars are, from left to right, SVM/RF/KNN/KSH), although the method proposed by the invention has lower precision than the SVM model, its recall is higher than that of the SVM model, so the KSH model has better overall performance than the other models (the highest F1 value, F1-Measure). The four methods do not differ much in accuracy. The experimental results in fig. 1e show the training-stage time and the testing-stage time of the different models (in each case SVM/RF/KNN/KSH from left to right). Although the original KNN model requires no training, its testing stage is computationally intensive and time-consuming. The training and testing times of the KSH model proposed by the invention are close to those of the RF model and shorter than those of the SVM model. Therefore, the KSH model proposed by the invention can identify fraudulent transactions efficiently and significantly reduces the computing resource consumption of the original KNN model when applied to electronic fraudulent transaction identification.
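For reference, precision, recall and the F1 value are related as in the sketch below, with fraudulent transactions treated as the positive class. The counts are hypothetical and are not the experimental results of fig. 1d.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, a model with somewhat lower precision can still reach a higher F1 value if its recall is sufficiently higher.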
As shown in fig. 2, in one embodiment, the electronic fraud transaction identification system based on the kernel supervised hash model of the present invention includes a definition module 21, a training module 22, and an identification module 23.
The definition module 21 is used for constructing a kernel supervised hash model; the training module 22 is used for training the kernel supervised hash model by using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model; the identification module 23 is used for identifying normal transactions and fraudulent transactions by using the trained kernel supervised hash model.
In an embodiment of the present invention, the definition module 21 is configured to: determine the number p of the kernel hash functions; define a kernel hash function as:
$$h(x) = \operatorname{sgn}\left(\sum_{j=1}^{m} a_j \left(k(x_{(j)}, x) - \mu_j\right)\right) = \operatorname{sgn}\left(\mathbf{a}^{\top} \bar{k}(x)\right)$$
wherein
$$\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \dots, k(x_{(m)}, x) - \mu_m]^{\top}$$
$$\mu_j = \frac{1}{t} \sum_{i=1}^{t} k(x_{(j)}, x_i)$$
the kernel function k(·) adopts a Gaussian kernel function; each input transaction record x is mapped into a p-bit hash code c by using p kernel hash functions.
In one embodiment of the present invention, the training module 22 is configured as follows. The optimization objective function of the kernel supervised hash model is:
$$\min_{H_t} \left\lVert \frac{1}{p} H_t H_t^{\top} - S_t \right\rVert_F^2$$
wherein $\lVert \cdot \rVert_F^2$ represents the square of the Frobenius norm of a matrix, and $H_t$ is expressed as:
$$H_t = \begin{bmatrix} h_1(x_1) & \cdots & h_p(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_t) & \cdots & h_p(x_t) \end{bmatrix} = \operatorname{sgn}\left(\bar{K}_t A\right)$$
wherein $\bar{K}_t = [\bar{k}(x_1), \dots, \bar{k}(x_t)]^{\top} \in \mathbb{R}^{t \times m}$ and $A = [\mathbf{a}_1, \dots, \mathbf{a}_p] \in \mathbb{R}^{m \times p}$.
The optimization objective function of the kernel supervised hash model is optimized by using a greedy algorithm and a gradient descent method.
In an embodiment of the present invention, the identification module 23 is configured to: encode all training data through the trained kernel supervised hash model KSH(x) to obtain the hash codes of all the training data; encode the test data through the kernel supervised hash model KSH(x) to obtain the hash code of the test data; calculate the similarity between the hash code of the test data and the hash codes of all the training data by using the Hamming distance; select the k training samples whose hash codes are most similar to the hash code of the test data, and identify the transaction type of the test data based on the transaction types of these k training samples.
It should be noted that, the structures and principles of the definition module 21, the training module 22 and the identification module 23 are in one-to-one correspondence with the steps in the above-mentioned electronic fraud transaction identification method based on the kernel supervised hash model, so that the description thereof is omitted here.
It should be understood that the division of the above system into modules is merely a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, a given module may be a separately arranged processing element, may be integrated in a chip of the device, or may be stored in a memory of the device in the form of program code whose function is invoked and executed by a processing element of the device. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), one or more digital signal processors (Digital Signal Processor, abbreviated as DSP), or one or more field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In an embodiment of the present invention, the present invention further includes a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements any of the above-described electronic fraud transaction identification methods based on a kernel supervised hash model.
Those of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described above may be performed by hardware under the control of a computer program. The computer program may be stored in a computer readable storage medium. When executed, the program performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
As shown in fig. 3, in an embodiment, the electronic fraud transaction identifying device based on the kernel supervised hash model of the present invention includes: a processor 31 and a memory 32; the memory 32 is used for storing a computer program; the processor 31 is connected to the memory 32 and is configured to execute a computer program stored in the memory 32, so that the electronic fraud transaction identifying device based on the kernel supervised hash model performs any of the electronic fraud transaction identifying methods based on the kernel supervised hash model.
Specifically, the memory 32 includes: various media capable of storing program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (Central Processing Unit, abbreviated as CPU), a network processor (Network Processor, abbreviated as NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the method, system and device for identifying electronic fraudulent transactions based on the kernel supervised hash model of the invention can effectively identify novel fraudulent transactions. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles of the present invention and its effectiveness, and are not intended to limit the invention. Modifications and variations may be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the invention. Accordingly, it is intended that all equivalent modifications and variations of the invention be covered by the claims, which are within the ordinary skill of the art, be within the spirit and scope of the present disclosure.

Claims (7)

1. An electronic fraud transaction identification method based on a kernel supervision hash model is characterized by comprising the following steps:
constructing a kernel supervised hash model; the kernel supervised hash model is used for mapping high-dimensional original data into low-dimensional hash codes by constructing a group of kernel hash functions, and classifying transactions in a hash space by adopting a nearest neighbor searching method; the hash code is generated by a plurality of kernel hash functions with different parameters;
training the kernel supervised hash model by using labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model; comprising the following steps: establishing an optimization objective function of the kernel supervised hash model; optimizing the optimization objective function of the kernel supervised hash model by using a greedy algorithm and a gradient descent method;
identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model; the method comprises the following steps:
coding all training data through the trained kernel supervised hash model KSH (x) to obtain hash codes of all the training data;
encoding test data through the kernel supervised hash model KSH (x) to obtain hash codes of the test data;
calculating the similarity between the hash codes of the test data and the hash codes of all training data by adopting a Hamming distance, and classifying the similarity based on a KNN model;
selecting the k training data whose hash codes are most similar to the hash code of the test data, and identifying the transaction type of the test data based on the transaction types of the k training data.
2. The method for identifying electronic fraudulent transactions based on a kernel supervised hash model according to claim 1, wherein the constructing the kernel supervised hash model includes the steps of:
determining the number p of the kernel hash functions;
defining a kernel hash function as:
$$h(x) = \operatorname{sgn}\left(\sum_{j=1}^{m} a_j \left(k(x_{(j)}, x) - \mu_j\right)\right) = \operatorname{sgn}\left(\mathbf{a}^{\top} \bar{k}(x)\right)$$
wherein,
$$\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \dots, k(x_{(m)}, x) - \mu_m]^{\top}$$
$$\mu_j = \frac{1}{t} \sum_{i=1}^{t} k(x_{(j)}, x_i)$$
the kernel function k (·) adopts a gaussian kernel function;
each transaction data x input is mapped into a p-bit hash code c by using p kernel hash functions.
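3. The method for identifying electronic fraudulent transactions based on a kernel supervised hash model as set forth in claim 2, wherein,
the optimization objective function of the kernel supervised hash model is as follows:
$$\min_{H_t} \left\lVert \frac{1}{p} H_t H_t^{\top} - S_t \right\rVert_F^2$$
wherein the operation $\lVert \cdot \rVert_F^2$ represents the square of the Frobenius norm of the matrix, and $H_t$ is expressed as:
$$H_t = \begin{bmatrix} h_1(x_1) & \cdots & h_p(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_t) & \cdots & h_p(x_t) \end{bmatrix} = \operatorname{sgn}\left(\bar{K}_t A\right)$$
wherein $\bar{K}_t = [\bar{k}(x_1), \dots, \bar{k}(x_t)]^{\top} \in \mathbb{R}^{t \times m}$ and $A = [\mathbf{a}_1, \dots, \mathbf{a}_p] \in \mathbb{R}^{m \times p}$.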
4. An electronic fraud transaction identification system based on a kernel supervised hash model, comprising: the system comprises a definition module, a training module and an identification module;
the definition module is used for constructing a kernel supervised hash model; the kernel supervised hash model is used for mapping high-dimensional original data into low-dimensional hash codes by constructing a group of kernel hash functions, and classifying transactions in a hash space by adopting a nearest neighbor searching method; the hash code is generated by a plurality of kernel hash functions with different parameters;
the training module is used for training the kernel supervised hash model by using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model; comprising the following steps: establishing an optimization objective function of the kernel supervised hash model; optimizing the optimization objective function of the kernel supervised hash model by using a greedy algorithm and a gradient descent method;
the identification module is used for identifying normal transactions and fraudulent transactions by using the trained kernel supervised hash model; the method comprises the following steps:
coding all training data through the trained kernel supervised hash model KSH (x) to obtain hash codes of all the training data;
encoding test data through the kernel supervision hash model KSH (x) to obtain hash codes of the test data;
calculating the similarity between the hash codes of the test data and the hash codes of all training data by adopting a Hamming distance, and classifying the similarity based on a KNN model;
k training data closest to the similarity of hash codes of the test data are selected, and the transaction type of the test data is identified based on the transaction types of the k training data.
5. The system for identifying electronic fraudulent transactions based on a kernel supervised hash model of claim 4, wherein said definition module is configured to:
determining the number p of the kernel hash functions;
defining a kernel hash function as:
$$h(x) = \operatorname{sgn}\left(\sum_{j=1}^{m} a_j \left(k(x_{(j)}, x) - \mu_j\right)\right) = \operatorname{sgn}\left(\mathbf{a}^{\top} \bar{k}(x)\right)$$
wherein,
$$\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \dots, k(x_{(m)}, x) - \mu_m]^{\top}$$
$$\mu_j = \frac{1}{t} \sum_{i=1}^{t} k(x_{(j)}, x_i)$$
the kernel function k (·) adopts a gaussian kernel function;
each transaction data x input is mapped into a p-bit hash code c by using p kernel hash functions.
6. A computer readable storage medium having stored thereon a computer program, wherein the computer program is executed by a processor to implement the method of identifying electronic fraudulent transactions based on a kernel supervised hash model of any of claims 1 to 3.
7. An electronic fraudulent transaction identification device based on a kernel supervised hash model, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is connected to the memory for executing the computer program stored in the memory, so that the electronic fraud transaction identifying device based on the kernel supervised hash model executes the electronic fraud transaction identifying method based on the kernel supervised hash model according to any one of claims 1 to 3.
CN202010237091.6A 2020-03-30 2020-03-30 Electronic fraud transaction identification method, system and device based on kernel supervision hash model Active CN113469695B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010237091.6A CN113469695B (en) 2020-03-30 2020-03-30 Electronic fraud transaction identification method, system and device based on kernel supervision hash model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010237091.6A CN113469695B (en) 2020-03-30 2020-03-30 Electronic fraud transaction identification method, system and device based on kernel supervision hash model

Publications (2)

Publication Number Publication Date
CN113469695A CN113469695A (en) 2021-10-01
CN113469695B CN113469695B (en) 2023-06-30

Family

ID=77865021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010237091.6A Active CN113469695B (en) 2020-03-30 2020-03-30 Electronic fraud transaction identification method, system and device based on kernel supervision hash model

Country Status (1)

Country Link
CN (1) CN113469695B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868743A (en) * 2016-05-31 2016-08-17 天津中科智能识别产业技术研究院有限公司 Face retrieval method based on rapid supervised discrete hashing
CN107657453A (en) * 2016-07-25 2018-02-02 平安科技(深圳)有限公司 Cheat recognition methods and the device of data
CN108304573A (en) * 2018-02-24 2018-07-20 江苏测联空间大数据应用研究中心有限公司 Target retrieval method based on convolutional neural networks and supervision core Hash
CN108596630A (en) * 2018-04-28 2018-09-28 招商银行股份有限公司 Fraudulent trading recognition methods, system and storage medium based on deep learning
CN108629593A (en) * 2018-04-28 2018-10-09 招商银行股份有限公司 Fraudulent trading recognition methods, system and storage medium based on deep learning
CN110378699A (en) * 2019-07-25 2019-10-25 中国工商银行股份有限公司 A kind of anti-fraud method, apparatus and system of transaction
CN110827036A (en) * 2019-11-07 2020-02-21 深圳乐信软件技术有限公司 Method, device, equipment and storage medium for detecting fraudulent transactions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Credit Card Fraud Detection via Kernel-Based Supervised Hashing; Zhenchuan Li, Guanjun Liu, Shuo Wang et al.; 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation; 2018-12-06; pp. 1250-1253 *

Also Published As

Publication number Publication date
CN113469695A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
Chen et al. Semi-supervised feature selection via sparse rescaled linear square regression
Sevakula et al. Transfer learning for molecular cancer classification using deep neural networks
US20220382553A1 (en) Fine-grained image recognition method and apparatus using graph structure represented high-order relation discovery
Sathya et al. [Retracted] Cancer Categorization Using Genetic Algorithm to Identify Biomarker Genes
US20140279738A1 (en) Non-Linear Classification of Text Samples
Chakraborty et al. Simultaneous variable weighting and determining the number of clusters—A weighted Gaussian means algorithm
CN103164701B (en) Handwritten Numeral Recognition Method and device
CN112199536A (en) Cross-modality-based rapid multi-label image classification method and system
CN110880007A (en) Automatic selection method and system for machine learning algorithm
Chu et al. Stacked Similarity-Aware Autoencoders.
CN109299263A (en) File classification method, electronic equipment and computer program product
CN114091603A (en) Spatial transcriptome cell clustering and analyzing method
CN111782804A (en) TextCNN-based same-distribution text data selection method, system and storage medium
Liu et al. Evolutionary compact embedding for large-scale image classification
Malekipirbazari et al. Performance comparison of feature selection and extraction methods with random instance selection
CN115080749A (en) Weak supervision text classification method, system and device based on self-supervision training
Chen et al. Active learning for unbalanced data in the challenge with multiple models and biasing
Mandal et al. Unsupervised non-redundant feature selection: a graph-theoretic approach
Termritthikun et al. Evolutionary neural architecture search based on efficient CNN models population for image classification
Rajpal et al. Ensemble of deep learning and machine learning approach for classification of handwritten Hindi numerals
CN111553442B (en) Optimization method and system for classifier chain tag sequence
CN113469695B (en) Electronic fraud transaction identification method, system and device based on kernel supervision hash model
Yuan et al. Context-aware clustering
Marconi et al. Hyperbolic manifold regression
Yap et al. Neural information processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant