CN113469695B

CN113469695B - Electronic fraud transaction identification method, system and device based on kernel supervision hash model

Info

Publication number: CN113469695B
Application number: CN202010237091.6A
Authority: CN
Inventors: 蒋昌俊; 闫春钢; 丁志军; 刘关俊; 张亚英; 李震川
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-03-30
Filing date: 2020-03-30
Publication date: 2023-06-30
Anticipated expiration: 2040-03-30
Also published as: CN113469695A

Abstract

The invention provides an electronic fraud transaction identification method, system and device based on a kernel supervised hash model. The method comprises the following steps: constructing a kernel supervised hash model; training the kernel supervised hash model with labeled normal transaction data and labeled fraudulent transaction data to obtain a trained kernel supervised hash model; and identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model. The method, system and device are used to effectively identify novel fraudulent transactions.

Description

Electronic fraud transaction identification method, system and device based on kernel supervision hash model

Technical Field

The invention relates to the technical field of electronic fraud transaction identification, in particular to an electronic fraud transaction identification method, system and device based on a kernel supervision hash model.

Background

Electronic transaction fraud and fraud detection are locked in a continuous adversarial game. In recent years, the rapid development of machine learning and data mining has driven advances in electronic fraudulent transaction identification. Common machine learning methods, such as logistic regression, naive Bayes, decision trees and random forests, are all based on statistical learning and need enough data samples exhibiting the features to be learned in order to learn effectively. As new technologies develop, fraudsters deliberately devise new means of committing fraud. A novel fraudulent transaction may differ from known fraudulent transactions, so a machine learning model trained on known fraudulent transaction data cannot identify it effectively, which degrades the electronic fraudulent transaction identification model and causes serious economic loss.

Novel electronic fraudulent transaction data has few samples, so common machine learning methods based on statistical learning are difficult to apply. The k-nearest neighbor (KNN) model determines the class of a test sample by searching for the training samples (reference data) closest to it. This decision mode places low requirements on the number of samples of each transaction type, so it is well suited to identifying a small amount of novel fraudulent transaction data while still accurately identifying the large volume of known fraudulent transactions, and it adapts well to changes in fraud types. However, the KNN model must calculate the distance between each test sample and every training sample (reference data), so its computational cost is very high, and storing a large number of training samples (reference data) consumes a large amount of memory. The KNN model is therefore difficult to apply successfully in the electronic fraudulent transaction identification scenario.

Electronic fraudulent transaction identification has been studied for many years, but conventional fraudulent transaction identification models cannot effectively learn novel fraudulent transactions with a small sample size. Most conventional models are based on statistical learning, and a small number of novel fraudulent transactions have no statistical advantage over other fraudulent transaction samples, so they cannot be learned sufficiently. As a result, novel fraudulent transactions are difficult to identify accurately and the performance of the identification model declines.

It is therefore desirable to be able to address the problem of how to identify new fraudulent transactions.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a method, a system and a device for identifying electronic fraudulent transactions based on a kernel supervised hash model, which are used for solving the problem of how to identify novel fraudulent transactions in the prior art.

To achieve the above and other related objects, the present invention provides an electronic fraud transaction identification method based on a kernel supervised hash model, including the steps of: constructing a kernel supervised hash model; training the kernel supervised hash model using labeled normal transaction data and labeled fraudulent transaction data to obtain a trained kernel supervised hash model; and identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model.

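In one embodiment of the present invention, constructing the kernel supervised hash model includes the steps of: determining the number p of kernel hash functions;

defining a kernel hash function as:

$$h(x) = \mathrm{sgn}\left(a^{T}\bar{k}(x)\right)$$

wherein $\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \ldots, k(x_{(m)}, x) - \mu_m]^{T}$, $\mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)$, $a = [a_1, \ldots, a_m]^{T}$ is the coefficient vector, $x_{(1)}, \ldots, x_{(m)}$ are the reference samples and $x_1, \ldots, x_t$ are the training samples;

the kernel function k(·) is a Gaussian kernel function; each input transaction data x is mapped into a p-bit hash code c by the p kernel hash functions.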

In one embodiment of the present invention, training the kernel supervised hash model using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model comprises the steps of: the optimization objective function of the kernel supervised hash model is as follows:

$$\min_{H_t} \left\| \frac{1}{p} H_t H_t^{T} - S^{t} \right\|_F^{2}$$

wherein the operation $\|\cdot\|_F^{2}$ represents the square of the Frobenius norm of a matrix, $S^{t}$ is the similarity matrix of the t training samples, and $H_t$ is expressed as:

$$H_t = \mathrm{sgn}\left(\bar{K}_t A\right)$$

wherein $\bar{K}_t = [\bar{k}(x_1), \ldots, \bar{k}(x_t)]^{T}$ and $A = [a^{1}, \ldots, a^{p}]$;

and optimizing the optimization objective function of the kernel supervised hash model by using a greedy algorithm and a gradient descent method.

In one embodiment of the invention, identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model includes the steps of: encoding all training data with the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data; encoding the test data with the kernel supervised hash model KSH(x) to obtain the hash code of the test data; calculating the similarity between the hash code of the test data and the hash codes of all training data using the Hamming distance; and selecting the k training data whose hash codes are most similar to the hash code of the test data and identifying the transaction type of the test data based on the transaction types of these k training data.

To achieve the above object, the invention also provides an electronic fraud transaction identification system based on the kernel supervised hash model, comprising a definition module, a training module and an identification module; the definition module is used for constructing a kernel supervised hash model; the training module is used for training the kernel supervised hash model using labeled normal transaction data and labeled fraudulent transaction data to obtain a trained kernel supervised hash model; the identification module is used for identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model.

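In an embodiment of the invention, the definition module is configured to:

determine the number p of kernel hash functions;

define a kernel hash function as:

$$h(x) = \mathrm{sgn}\left(a^{T}\bar{k}(x)\right)$$

wherein $\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \ldots, k(x_{(m)}, x) - \mu_m]^{T}$, $\mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)$, $a = [a_1, \ldots, a_m]^{T}$ is the coefficient vector, $x_{(1)}, \ldots, x_{(m)}$ are the reference samples and $x_1, \ldots, x_t$ are the training samples.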

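In an embodiment of the present invention, the training module is configured to use the following optimization objective function of the kernel supervised hash model:

$$\min_{H_t} \left\| \frac{1}{p} H_t H_t^{T} - S^{t} \right\|_F^{2}$$

wherein the operation $\|\cdot\|_F^{2}$ represents the square of the Frobenius norm of a matrix, $S^{t}$ is the similarity matrix of the t training samples, and $H_t$ is expressed as:

$$H_t = \mathrm{sgn}\left(\bar{K}_t A\right)$$

wherein $\bar{K}_t = [\bar{k}(x_1), \ldots, \bar{k}(x_t)]^{T}$ and $A = [a^{1}, \ldots, a^{p}]$.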

In an embodiment of the invention, the identification module is configured to: encode all training data with the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data; encode the test data with the kernel supervised hash model KSH(x) to obtain the hash code of the test data; calculate the similarity between the hash code of the test data and the hash codes of all training data using the Hamming distance; and select the k training data whose hash codes are most similar to the hash code of the test data and identify the transaction type of the test data based on the transaction types of these k training data.

To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described electronic fraud transaction identification methods based on a kernel supervised hash model.

To achieve the above object, the present invention further provides an electronic fraud transaction identification device based on a kernel supervised hash model, including: a processor and a memory; the memory is used for storing a computer program; the processor is connected with the memory and is used for executing the computer program stored in the memory, so that the electronic fraud transaction identification device based on the kernel supervised hash model executes any of the above electronic fraud transaction identification methods based on the kernel supervised hash model.

As described above, the method, system and device for identifying electronic fraud transactions based on the kernel supervised hash model have the following beneficial effect: novel fraudulent transactions can be identified effectively.

Drawings

FIG. 1a is a flow chart of an embodiment of a method for identifying electronic fraudulent transactions based on a kernel supervised hash model according to the present invention;

FIG. 1b is a flow chart of an electronic fraud transaction identification method according to the present invention based on a kernel supervised hash model in a further embodiment;

FIG. 1c is a flow chart of an electronic fraud transaction identification method according to the present invention based on a kernel supervised hash model in yet another embodiment;

FIG. 1d is a graph showing the comparison of accuracy, recall, precision, and overall performance of an electronic fraud transaction identification method according to an embodiment of the present invention based on a kernel-supervised hash model;

FIG. 1e is a diagram showing training time and test time comparisons of the method for identifying electronic fraudulent transactions based on a kernel-supervised hash model according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an electronic fraud transaction identification system based on a kernel supervised hash model according to the present invention in an embodiment;

FIG. 3 is a schematic diagram of an electronic fraud transaction identifying device based on a kernel supervised hash model according to an embodiment of the present invention.

Description of element reference numerals

21. Definition module

22. Training module

23. Identification module

31. Processor

32. Memory

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.

It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, so that only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, the form, number and proportion of each component in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.

The method, the system and the device for identifying the electronic fraudulent transaction based on the kernel supervision hash model are used for effectively identifying novel fraudulent transaction.

As shown in fig. 1a, in an embodiment, the method for identifying electronic fraud transaction based on a kernel supervised hash model of the present invention includes the following steps:

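Step S11: constructing a kernel supervised hash model.

Specifically, constructing the kernel supervised hash model comprises the following steps:

determining the number p of kernel hash functions;

defining a kernel hash function as:

$$h(x) = \mathrm{sgn}\left(a^{T}\bar{k}(x)\right)$$

wherein $\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \ldots, k(x_{(m)}, x) - \mu_m]^{T}$, $\mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)$, $a = [a_1, \ldots, a_m]^{T}$ is the coefficient vector, $x_{(1)}, \ldots, x_{(m)}$ are the reference samples and $x_1, \ldots, x_t$ are the training samples.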

Specifically, the kernel supervised hash (KSH) model maps high-dimensional original data into low-dimensional hash codes by constructing a set of kernel hash functions, and then classifies transactions in the hash space using nearest neighbor search. The invention adopts a supervised hashing method: the set of kernel hash functions is trained on labeled normal and fraudulent transaction data so that samples of the same class are as close as possible in the hash space and samples of different classes are as far apart as possible. A kernel hash function may be formally defined as:

$$h(x) = \mathrm{sgn}\left(\sum_{j=1}^{m} k(x_{(j)}, x)\, a_j - b\right) \qquad (1)$$

where x represents a transaction record; $X_{ref} = \{x_{(1)}, x_{(2)}, \ldots, x_{(m)}\}$ is a reference data set of m samples drawn from the original labeled data set $\{x_1, x_2, \ldots, x_v\} = X_{full}$, including a portion of the fraudulent transaction data $X'_{fraud}$ and a portion of the normal transaction data $X'_{normal}$; and $a_j$ and b are the coefficients and the bias term of the kernel hash function, respectively. A further t samples are drawn from the remaining data as the training data set $X_{train} = \{x_1, x_2, \ldots, x_t\}$.

The sign function is sgn(data) = 1 when data > 0 and sgn(data) = -1 when data ≤ 0.

The hash code is generated by several kernel hash functions with different parameters, each of which maps the input transaction sample to one bit of the hash code. To distinguish different classes of transactions well and make full use of the information in the training data, each kernel hash function must satisfy the following condition on the input training data set $X_{train}$:

$$\sum_{i=1}^{t} h(x_i) = 0$$

Substituting formula (1) gives:

$$\sum_{i=1}^{t} \mathrm{sgn}\left(\sum_{j=1}^{m} k(x_{(j)}, x_i)\, a_j - b\right) = 0$$

The parameter b could be obtained by solving this equation, but the equation is difficult to solve, so an approximate solution is used instead:

$$b = \frac{1}{t}\sum_{i=1}^{t}\sum_{j=1}^{m} k(x_{(j)}, x_i)\, a_j$$

Substituting the approximate solution of b into formula (1) yields the final form of the kernel hash function:

$$h(x) = \mathrm{sgn}\left(\sum_{j=1}^{m}\left(k(x_{(j)}, x) - \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)\right) a_j\right) = \mathrm{sgn}\left(a^{T}\bar{k}(x)\right) \qquad (2)$$

where $a = [a_1, \ldots, a_m]^{T}$, $\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \ldots, k(x_{(m)}, x) - \mu_m]^{T}$, and $\mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)$ can be calculated in advance. Thus, once the kernel function k(·) in the kernel hash function has been chosen, the kernel hash function is uniquely determined by the parameter vector a. The known labeled fraudulent and normal transaction data set $X_{train}$ can therefore be used as supervision to learn the parameter a, so that after the kernel function mapping, transaction data of the same class obtain the same sign as far as possible and transaction data of different classes obtain different signs.

The KSH method maps each input transaction data x into a p-bit hash code c by using p kernel hash functions, which can be formally expressed as:

$$c = KSH(x) = [h^{1}(x), h^{2}(x), \ldots, h^{p}(x)] = [c_1, c_2, \ldots, c_p] \qquad (3)$$

The number p of kernel hash functions in the KSH model can be chosen as needed; for the electronic fraud transaction scenario of a typical financial institution, the suggested range is $20 \le p \le 40$. A kernel hash function generally requires nonlinear learning capability, and the invention uniformly uses the Gaussian kernel function, namely

$$k(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma^{2}}\right)$$

(the value of σ is randomly selected from the range [0.5, 1.5]).
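The encoding in formula (3) can be illustrated with a minimal Python sketch. It assumes the standard Gaussian kernel form exp(-||u - v||^2 / (2σ^2)) and the mean-centered kernel features of the usual KSH formulation; the function names, the toy reference and training samples, and the random (untrained) coefficient vectors are all hypothetical and only show how p sign bits are produced for one transaction.

```python
import math
import random

def gaussian_kernel(u, v, sigma=1.0):
    """Assumed kernel form: exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_means(reference, train, sigma=1.0):
    """mu_j: mean kernel value between reference sample j and the training set."""
    return [sum(gaussian_kernel(r, x, sigma) for x in train) / len(train)
            for r in reference]

def centered_features(x, reference, mu, sigma=1.0):
    """k_bar(x): kernel values against the reference set, shifted by mu."""
    return [gaussian_kernel(r, x, sigma) - m for r, m in zip(reference, mu)]

def sgn(value):
    """Sign convention from the text: +1 if value > 0, otherwise -1."""
    return 1 if value > 0 else -1

def ksh_encode(x, reference, mu, coeffs, sigma=1.0):
    """Map x to a p-bit code; coeffs holds one coefficient vector per bit."""
    k_bar = centered_features(x, reference, mu, sigma)
    return [sgn(sum(a_j * k_j for a_j, k_j in zip(a, k_bar))) for a in coeffs]

# Hypothetical toy data: 3 reference samples, 4 training samples, p = 8 bits.
random.seed(0)
reference = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
train = [[0.1, 0.2], [0.9, 1.1], [3.8, 4.1], [4.2, 3.9]]
mu = kernel_means(reference, train)
coeffs = [[random.uniform(-1, 1) for _ in reference] for _ in range(8)]  # untrained
code = ksh_encode([0.5, 0.5], reference, mu, coeffs)
print(code)
```

Because mu is precomputed once from the training set, encoding a new transaction only needs m kernel evaluations and p dot products.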

Step S12: training the kernel supervised hash model using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model.

Specifically, training the kernel supervised hash model using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model comprises the following steps:

the optimization objective function of the kernel supervised hash model is as follows:

$$\min_{H_t} \left\| \frac{1}{p} H_t H_t^{T} - S^{t} \right\|_F^{2}$$

wherein the operation $\|\cdot\|_F^{2}$ represents the square of the Frobenius norm of a matrix, $S^{t}$ is the similarity matrix of the t training samples, and $H_t$ is expressed as:

$$H_t = \mathrm{sgn}\left(\bar{K}_t A\right)$$

wherein $\bar{K}_t = [\bar{k}(x_1), \ldots, \bar{k}(x_t)]^{T}$ and $A = [a^{1}, \ldots, a^{p}]$.

In particular, the KSH model uses the data set $X_{ref} = \{x_{(1)}, \ldots, x_{(m)}\}$ as the reference data set and the data set $X_{train} = \{x_1, \ldots, x_t\}$ as the training data set, and learns the parameter vectors $a^{1}, a^{2}, \ldots, a^{p}$ corresponding to the p kernel hash functions $h^{1}(x), h^{2}(x), \ldots, h^{p}(x)$ one by one using a greedy algorithm.

In the electronic fraudulent transaction identification scenario, the invention measures the similarity of any two samples in the training data set $X_{train}$ as follows:

$$S_{i,j} = \begin{cases} 1, & \mathrm{label}(x_i) = \mathrm{label}(x_j) \\ -1, & \mathrm{label}(x_i) \neq \mathrm{label}(x_j) \end{cases} \qquad (4)$$

where $\mathrm{label}(x_i) = \mathrm{label}(x_j)$ indicates that $x_i$ and $x_j$ belong to the same class, so their similarity $S_{i,j} = 1$ represents similar samples; $\mathrm{label}(x_i) \neq \mathrm{label}(x_j)$ indicates that $x_i$ and $x_j$ do not belong to the same class, so their similarity $S_{i,j} = -1$ represents dissimilar samples. If there are t samples, a t×t sample similarity matrix $S^{t} \in \{1, -1\}^{t \times t}$ is formed. Each element in the sample similarity matrix is calculated by formula (4).
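As a small illustration of the pairwise rule in formula (4), the following Python sketch builds the t×t similarity matrix from class labels. The function name and the three toy labels are hypothetical.

```python
def similarity_matrix(labels):
    """S[i][j] = 1 if samples i and j share a class label, otherwise -1."""
    return [[1 if a == b else -1 for b in labels] for a in labels]

# Hypothetical labels for t = 3 training transactions.
labels = ["normal", "fraud", "normal"]
S = similarity_matrix(labels)
for row in S:
    print(row)
```

The matrix is symmetric with ones on the diagonal, which is what lets the later derivation treat it as a symmetric target for the code inner products.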

During parameter learning, the inner product of hash codes is used to calculate the similarity between the hash codes of any two samples in the training data set $X_{train}$, which can be formally expressed as:

$$c_i \cdot c_j = KSH(x_i)\, KSH(x_j)^{T} = p - 2 D_h(c_i, c_j) \qquad (5)$$

where $D_h(c_i, c_j)$ represents the Hamming distance between the hash codes $c_i$ and $c_j$ corresponding to samples $x_i$ and $x_j$. The value range of $c_i \cdot c_j$ is then $[-p, p]$, and $\frac{1}{p}\, c_i \cdot c_j \in [-1, 1]$ can be used as a decision function for the similarity between any two samples, namely:

$$F(x_i, x_j) = \frac{1}{p}\, c_i \cdot c_j \qquad (6)$$

Extending the decision function to the similarity among the t training samples gives:

$$F(X_{train}) = \frac{1}{p} H_t H_t^{T} \qquad (7)$$

where $H_t \in \{1, -1\}^{t \times p}$ represents the matrix of hash codes of the t training samples. If the similarity matrix of the t training samples is $S^{t}$, the optimization objective function of the kernel supervised hash model is:

$$\min_{H_t} \left\| \frac{1}{p} H_t H_t^{T} - S^{t} \right\|_F^{2} \qquad (8)$$

where the operation $\|\cdot\|_F^{2}$ represents the square of the Frobenius norm of a matrix. Optimizing formula (8) amounts to choosing suitable parameters such that the difference between the similarity predicted by the decision function F(·) and the actual similarity is as small as possible. Since the hash codes in $H_t$ are calculated by the kernel hash functions shown in formula (3), $H_t$ can be expressed as:

$$H_t = \begin{bmatrix} h^{1}(x_1) & \cdots & h^{p}(x_1) \\ \vdots & \ddots & \vdots \\ h^{1}(x_t) & \cdots & h^{p}(x_t) \end{bmatrix} = \mathrm{sgn}\left(\bar{K}_t A\right) \qquad (9)$$

where $\bar{K}_t = [\bar{k}(x_1), \ldots, \bar{k}(x_t)]^{T} \in \mathbb{R}^{t \times m}$ and $A = [a^{1}, \ldots, a^{p}] \in \mathbb{R}^{m \times p}$.

Substituting this into formula (8) and making a simple transformation gives a more specific optimization objective function:

$$\min_{A} L(H_t) = \left\| \sum_{k=1}^{p} \mathrm{sgn}\left(\bar{K}_t a^{k}\right) \mathrm{sgn}\left(\bar{K}_t a^{k}\right)^{T} - p S^{t} \right\|_F^{2} \qquad (10)$$

where $L(H_t)$ represents the loss function of the optimization objective function. As formula (10) shows, the inner product of the sample hash codes can be calculated separately for each bit, so the optimization objective can be optimized step by step using a greedy algorithm.
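The relation between code inner products and Hamming distance, and the Frobenius-norm objective of formula (8), can be checked numerically with a short Python sketch. The helper names and the 4-bit toy codes are hypothetical; codes use the values +1 and -1 as in the text.

```python
def hamming(c1, c2):
    """Number of bit positions where two +1/-1 codes differ."""
    return sum(1 for a, b in zip(c1, c2) if a != b)

def inner(c1, c2):
    """Inner product of two +1/-1 codes; equals p - 2 * hamming(c1, c2)."""
    return sum(a * b for a, b in zip(c1, c2))

def ksh_objective(codes, S):
    """Squared Frobenius norm of (1/p) * H H^T - S for a list of p-bit codes."""
    p = len(codes[0])
    n = len(codes)
    return sum((inner(codes[i], codes[j]) / p - S[i][j]) ** 2
               for i in range(n) for j in range(n))

# Hypothetical target: samples 0 and 1 share a class, sample 2 does not.
S = [[1, 1, -1], [1, 1, -1], [-1, -1, 1]]
perfect = [[1, 1, -1, 1], [1, 1, -1, 1], [-1, -1, 1, -1]]
imperfect = [[1, 1, -1, 1], [1, -1, -1, 1], [-1, -1, 1, -1]]

print(inner(perfect[0], perfect[2]), 4 - 2 * hamming(perfect[0], perfect[2]))
print(ksh_objective(perfect, S), ksh_objective(imperfect, S))
```

Codes that are identical within a class and complementary across classes drive the objective to zero, while flipping a single bit of one sample raises it.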

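The loss function $L(H_t)$ is optimized using a greedy algorithm. Assuming the parameters $a^{1*}, \ldots, a^{(k-1)*}$ have already been optimized, the optimization objective function of $a^{k}$ can be expressed as:

$$\min_{a^{k}} L(a^{k}) = \left\| \mathrm{sgn}\left(\bar{K}_t a^{k}\right) \mathrm{sgn}\left(\bar{K}_t a^{k}\right)^{T} - P_{k-1} \right\|_F^{2} = -2\, \mathrm{sgn}\left(\bar{K}_t a^{k}\right)^{T} P_{k-1}\, \mathrm{sgn}\left(\bar{K}_t a^{k}\right) + C_{const} \qquad (11)$$

where $C_{const}$ is a constant term, $P_{k-1} = p S^{t} - \sum_{r=1}^{k-1} \mathrm{sgn}\left(\bar{K}_t a^{r*}\right) \mathrm{sgn}\left(\bar{K}_t a^{r*}\right)^{T}$, and $P_0 = p S^{t}$.

Because the loss function $L(a^{k})$ of the objective function to be optimized contains the sign function sgn(·), direct optimization is very difficult, so the smooth function $\varphi(z) = \frac{2}{1 + e^{-z}} - 1$ can be used to replace the sign function sgn(·) and reduce the optimization difficulty of the objective function. Since the constant term $C_{const}$ of the loss function $L(a^{k})$ and the constant factor 2 do not affect the optimization objective, they are ignored, and the processed loss function can be expressed as:

$$\tilde{L}(a^{k}) = -\varphi\left(\bar{K}_t a^{k}\right)^{T} P_{k-1}\, \varphi\left(\bar{K}_t a^{k}\right) \qquad (12)$$

Its corresponding optimization objective function can be expressed as:

$$\min_{a^{k}} \tilde{L}(a^{k}) \qquad (13)$$

The derivative of the loss function $\tilde{L}(a^{k})$ with respect to the parameter $a^{k}$ can be expressed as:

$$\nabla \tilde{L}(a^{k}) = -\bar{K}_t^{T} \left( \left(P_{k-1} b^{k}\right) \odot \left(\mathbf{1} - b^{k} \odot b^{k}\right) \right) \qquad (14)$$

where $b^{k} = \varphi\left(\bar{K}_t a^{k}\right)$ and $\odot$ denotes the element-wise product.

Thus the parameter $a^{k}$ of any kernel hash function can be obtained by optimizing the objective function (13) by means of gradient descent. For the loss function $\tilde{L}(a^{k})$, calculating the optimal parameter $a^{k*}$ by directly solving $\nabla \tilde{L}(a^{k}) = 0$ has high computational complexity. Thus, the gradient descent method is used to approximate the optimal solution.

Assuming that the learning rate in the gradient descent method is α, the parameter $a^{k}$ is updated during optimization as follows:

$$a^{k} \leftarrow a^{k} - \alpha\, \nabla \tilde{L}(a^{k}) \qquad (15)$$

Then the updated $a^{k}$ is substituted back into formula (12) to evaluate the loss, the derivative is recomputed with formula (14), and $a^{k}$ is updated again with formula (15). This process is repeated until the loss function $\tilde{L}(a^{k})$ hardly changes any more, i.e. the change in $\tilde{L}(a^{k})$ between two successive iterations is smaller than $C_t$ ($C_t$ is a very small constant threshold, e.g. 0.001). The final $a^{k}$ is then taken as the optimal parameter $a^{k*}$.

The optimization of the optimization objective function of the kernel supervised hash model is thereby completed.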
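The greedy, bit-by-bit training with gradient descent described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the sigmoid-shaped smoothing function 2/(1 + e^(-z)) - 1 (computed as tanh(z/2), which is the same function), a residual matrix that starts at p·S and has the outer product of each learned bit subtracted, and a stopping threshold playing the role of C_t. The function names, the learning rate and the toy centered kernel matrix `K_bar` are hypothetical.

```python
import math
import random

def smooth_sign(z):
    """Smooth surrogate for sgn: 2 / (1 + exp(-z)) - 1, written as tanh(z / 2)."""
    return math.tanh(z / 2.0)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def surrogate_loss(K_bar, P, a):
    """-b^T P b with b = smooth_sign(K_bar a)."""
    b = [smooth_sign(z) for z in matvec(K_bar, a)]
    return -sum(bi * pbi for bi, pbi in zip(b, matvec(P, b)))

def surrogate_grad(K_bar, P, a):
    """Gradient of the surrogate loss: -K_bar^T ((P b) * (1 - b * b)), P symmetric."""
    b = [smooth_sign(z) for z in matvec(K_bar, a)]
    w = [pbi * (1.0 - bi * bi) for pbi, bi in zip(matvec(P, b), b)]
    return [-sum(row[j] * wi for row, wi in zip(K_bar, w)) for j in range(len(a))]

def train_ksh(K_bar, S, p, alpha=0.05, tol=1e-3, max_iter=500, seed=0):
    """Learn p coefficient vectors one at a time (greedy), each by gradient descent."""
    rng = random.Random(seed)
    t, m = len(K_bar), len(K_bar[0])
    P = [[p * s for s in row] for row in S]          # initial residual: p * S
    A = []
    for _ in range(p):
        a = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        prev = surrogate_loss(K_bar, P, a)
        for _ in range(max_iter):
            grad = surrogate_grad(K_bar, P, a)
            a = [aj - alpha * gj for aj, gj in zip(a, grad)]
            cur = surrogate_loss(K_bar, P, a)
            if abs(prev - cur) < tol:                # loss hardly changes any more
                break
            prev = cur
        A.append(a)
        h = [1 if z > 0 else -1 for z in matvec(K_bar, a)]
        P = [[P[i][j] - h[i] * h[j] for j in range(t)] for i in range(t)]
    return A

def encode_rows(K_bar, A):
    """Hash codes (+1/-1) of every row of K_bar under the learned vectors."""
    cols = [matvec(K_bar, a) for a in A]
    return [[1 if col[i] > 0 else -1 for col in cols] for i in range(len(K_bar))]

# Hypothetical toy problem: 4 samples, 2 centered kernel features, 2 classes.
K_bar = [[1.0, -1.0], [0.5, -0.5], [-1.0, 1.0], [-0.5, 0.5]]
labels = [0, 0, 1, 1]
S = [[1 if u == v else -1 for v in labels] for u in labels]
A = train_ksh(K_bar, S, p=3)
codes = encode_rows(K_bar, A)
print(codes)
```

On this toy problem the two classes are separable along one direction, so each learned bit assigns one sign to the first class and the opposite sign to the second; real transaction data would need the full kernel features and a tuned learning rate.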

Step S13: identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model.

Specifically, identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model includes the following steps:

The trained kernel supervised hash model is obtained. All training data are encoded by the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data, which are stored.

The test data are encoded by the trained kernel supervised hash model KSH(x) to obtain the hash code of the test data.

The similarity between the hash code of the test data and the hash codes of all training data is calculated using the Hamming distance.

The k training data whose hash codes are most similar to the hash code of the test data are selected, and the transaction type of the test data (fraudulent or normal) is identified based on the transaction types of these k training data. Specifically, considering the sparsity of novel fraudulent transaction data in the training samples, the value range of k is set to 3 to 7. For example, if the 3 training data whose hash codes are most similar to that of the test data are selected, and two of them are normal transactions and one is a fraudulent transaction, the transaction type of the test data is identified as a normal transaction.
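The identification step can be sketched as a k-nearest-neighbor vote over stored hash codes using the Hamming distance. The sketch below assumes a simple majority vote among the k nearest codes, as the three-neighbor example in the text suggests; the function names, the 4-bit codes and the labels are hypothetical toy values.

```python
from collections import Counter

def hamming(c1, c2):
    """Number of bit positions where two +1/-1 codes differ."""
    return sum(1 for a, b in zip(c1, c2) if a != b)

def knn_predict(test_code, train_codes, train_labels, k=3):
    """Majority label among the k stored codes closest in Hamming distance."""
    order = sorted(range(len(train_codes)),
                   key=lambda i: hamming(test_code, train_codes[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored hash codes of training transactions and their labels.
train_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1],
               [-1, -1, -1, 1], [1, 1, -1, -1]]
train_labels = ["normal", "normal", "fraud", "fraud", "fraud"]

# Nearest three codes to the first query are two normal and one fraud.
print(knn_predict([1, 1, 1, 1], train_codes, train_labels, k=3))
print(knn_predict([-1, -1, -1, -1], train_codes, train_labels, k=3))
```

In a production setting the codes would typically be packed into integers so that the Hamming distance becomes an XOR followed by a bit count, which is where the speed and memory savings over KNN on raw features come from.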

Specifically, the method classifies by similarity using the k-nearest neighbor (KNN) model. Hash coding solves the problem that the KNN model has high computation and storage costs in electronic fraudulent transaction identification, so the strength of the KNN model in identifying novel fraudulent transactions is fully exploited and novel fraudulent transactions can be identified effectively. The KNN model is used to determine the class of a test sample. The kernel supervised hash model adopted by the invention consists of a set of kernel hash functions obtained by supervised learning on training samples, so that samples of the same class are as similar as possible in the hash space and samples of different classes are as far apart as possible. The kernel supervised hash model maps data from the original high-dimensional feature space to the low-dimensional hash space. On the one hand this reduces the dimension of the samples; on the other hand it lets the KNN model quickly search for nearest neighbor samples in the hash space, which solves the problem of high computation and storage costs of the KNN model in electronic fraud transaction identification and allows novel electronic fraudulent transactions to be identified accurately. Unlike conventional statistical learning methods, the KNN model determines the class of a test sample from the classes of its k nearest samples in the feature space, so a small number of novel fraudulent transactions are not ignored merely because they are few, which ensures identification performance on novel fraudulent transactions.

As shown in fig. 1b, in an embodiment, the method for identifying electronic fraud transaction based on a kernel supervised hash model according to the present invention includes the following steps:

A kernel supervised hash model is defined.

The kernel supervised hash model is trained using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model.

Normal transactions and fraudulent transactions are identified using the trained kernel supervised hash model, specifically as follows: all training data (high-dimensional reference (training) data) are encoded by the trained kernel supervised hash model KSH(x) to obtain the hash codes of all training data (low-dimensional reference (training) data). The test data (test sample $x_{test}$) is encoded by the kernel supervised hash model KSH(x) to obtain the hash code of the test data (test sample $x_{test}$). Classification by similarity is then performed by the nearest neighbor model KNN: the k training data whose hash codes are most similar to the hash code of the test data are selected, and the transaction type of the test data (test sample $x_{test}$), fraudulent or normal, is identified based on the transaction types of these k training data.

As shown in fig. 1c, in an embodiment, the method for identifying electronic fraud transaction based on a kernel supervised hash model of the present invention includes the following steps:

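Step S11: constructing a kernel supervised hash model.

Specifically, the kernel supervised hash (KSH) model maps high-dimensional original data into low-dimensional hash codes by constructing a set of kernel hash functions, and then classifies transactions in the hash space using nearest neighbor search. The invention adopts a supervised hashing method: the set of kernel hash functions is trained on labeled normal and fraudulent transaction data so that samples of the same class are as close as possible in the hash space and samples of different classes are as far apart as possible. A kernel hash function is formally defined as:

$$h(x) = \mathrm{sgn}\left(\sum_{j=1}^{m} k(x_{(j)}, x)\, a_j - b\right) \qquad (1)$$

where x represents a transaction record; $X_{ref} = \{x_{(1)}, x_{(2)}, \ldots, x_{(m)}\}$ is a reference data set of m samples drawn from the original labeled data set $\{x_1, x_2, \ldots, x_v\} = X_{full}$, including a portion of the fraudulent transaction data $X'_{fraud}$ and a portion of the normal transaction data $X'_{normal}$; and $a_j$ and b are the coefficients and the bias term of the kernel hash function, respectively. A further t samples are drawn from the remaining data as the training data set $X_{train} = \{x_1, x_2, \ldots, x_t\}$.

Each kernel hash function must satisfy the condition $\sum_{i=1}^{t} h(x_i) = 0$ on the training data set. Substituting formula (1) gives:

$$\sum_{i=1}^{t} \mathrm{sgn}\left(\sum_{j=1}^{m} k(x_{(j)}, x_i)\, a_j - b\right) = 0$$

This equation is difficult to solve, so the approximate solution $b = \frac{1}{t}\sum_{i=1}^{t}\sum_{j=1}^{m} k(x_{(j)}, x_i)\, a_j$ is used. Substituting the approximate solution of b into formula (1) yields the final form of the kernel hash function:

$$h(x) = \mathrm{sgn}\left(\sum_{j=1}^{m}\left(k(x_{(j)}, x) - \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)\right) a_j\right) = \mathrm{sgn}\left(a^{T}\bar{k}(x)\right) \qquad (2)$$

where $a = [a_1, \ldots, a_m]^{T}$, $\bar{k}(x) = [k(x_{(1)}, x) - \mu_1, \ldots, k(x_{(m)}, x) - \mu_m]^{T}$, and $\mu_j = \frac{1}{t}\sum_{i=1}^{t} k(x_{(j)}, x_i)$ can be calculated in advance. Thus, once the kernel function k(·) in the kernel hash function has been chosen, the kernel hash function is uniquely determined by the parameter vector a. Each input transaction data x is mapped into a p-bit hash code c:

$$c = KSH(x) = [h^{1}(x), h^{2}(x), \ldots, h^{p}(x)] = [c_1, c_2, \ldots, c_p] \qquad (3)$$

The number p of kernel hash functions in the KSH model can be chosen as needed; for the electronic fraud transaction scenario of a typical financial institution, the value range of p is $20 \le p \le 40$. A kernel hash function generally requires nonlinear learning capability, and the invention uniformly uses the Gaussian kernel function, namely

$$k(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma^{2}}\right)$$

(the value of σ is randomly selected from the range [0.5, 1.5]).

In particular, the KSH model uses the data set $X_{ref}$ as the reference data set and the data set $X_{train}$ as the training data set, and learns the parameter vectors $a^{1}, a^{2}, \ldots, a^{p}$ of the p kernel hash functions one by one using a greedy algorithm.

In the electronic fraudulent transaction identification scenario, the invention measures the similarity of any two samples in the training data set $X_{train}$ as follows:

$$S_{i,j} = \begin{cases} 1, & \mathrm{label}(x_i) = \mathrm{label}(x_j) \\ -1, & \mathrm{label}(x_i) \neq \mathrm{label}(x_j) \end{cases} \qquad (4)$$

where $\mathrm{label}(x_i) = \mathrm{label}(x_j)$ indicates that $x_i$ and $x_j$ belong to the same class and $\mathrm{label}(x_i) \neq \mathrm{label}(x_j)$ indicates that they do not. Each element in the sample similarity matrix $S^{t}$ is calculated by formula (4).

The similarity between the hash codes of any two samples is calculated with the inner product of the hash codes:

$$c_i \cdot c_j = KSH(x_i)\, KSH(x_j)^{T} = p - 2 D_h(c_i, c_j) \qquad (5)$$

where $D_h(c_i, c_j)$ represents the Hamming distance between the hash codes corresponding to samples $x_i$ and $x_j$. The value range of $c_i \cdot c_j$ is then $[-p, p]$, and $\frac{1}{p}\, c_i \cdot c_j \in [-1, 1]$ can be used as a decision function for the similarity between any two samples, namely:

$$F(x_i, x_j) = \frac{1}{p}\, c_i \cdot c_j \qquad (6)$$

Extending the decision function to the t training samples gives:

$$F(X_{train}) = \frac{1}{p} H_t H_t^{T} \qquad (7)$$

where $H_t \in \{1, -1\}^{t \times p}$ represents the matrix of hash codes of the t training samples. The optimization objective function of the kernel supervised hash model is:

$$\min_{H_t} \left\| \frac{1}{p} H_t H_t^{T} - S^{t} \right\|_F^{2} \qquad (8)$$

where the operation $\|\cdot\|_F^{2}$ represents the square of the Frobenius norm of a matrix. Optimizing formula (8) amounts to choosing suitable parameters such that the difference between the similarity predicted by the decision function F(·) and the actual similarity is as small as possible. Since the hash codes in $H_t$ are calculated by the kernel hash functions shown in formula (3), $H_t$ can be expressed as:

$$H_t = \mathrm{sgn}\left(\bar{K}_t A\right) \qquad (9)$$

where $\bar{K}_t = [\bar{k}(x_1), \ldots, \bar{k}(x_t)]^{T}$ and $A = [a^{1}, \ldots, a^{p}]$. Substituting this into formula (8) gives a more specific optimization objective function:

$$\min_{A} L(H_t) = \left\| \sum_{k=1}^{p} \mathrm{sgn}\left(\bar{K}_t a^{k}\right) \mathrm{sgn}\left(\bar{K}_t a^{k}\right)^{T} - p S^{t} \right\|_F^{2} \qquad (10)$$

Assuming the parameters $a^{1*}, \ldots, a^{(k-1)*}$ have already been optimized, the optimization objective function of $a^{k}$ can be expressed as:

$$\min_{a^{k}} L(a^{k}) = \left\| \mathrm{sgn}\left(\bar{K}_t a^{k}\right) \mathrm{sgn}\left(\bar{K}_t a^{k}\right)^{T} - P_{k-1} \right\|_F^{2} = -2\, \mathrm{sgn}\left(\bar{K}_t a^{k}\right)^{T} P_{k-1}\, \mathrm{sgn}\left(\bar{K}_t a^{k}\right) + C_{const} \qquad (11)$$

where $C_{const}$ is a constant term, $P_{k-1} = p S^{t} - \sum_{r=1}^{k-1} \mathrm{sgn}\left(\bar{K}_t a^{r*}\right) \mathrm{sgn}\left(\bar{K}_t a^{r*}\right)^{T}$, and $P_0 = p S^{t}$.

The sign function sgn(·) is replaced by the smooth function $\varphi(z) = \frac{2}{1 + e^{-z}} - 1$. Since the constant term $C_{const}$ of the loss function of the objective function to be optimized and the constant factor 2 have no effect on the optimization objective, they are ignored, and the processed loss function can be expressed as:

$$\tilde{L}(a^{k}) = -\varphi\left(\bar{K}_t a^{k}\right)^{T} P_{k-1}\, \varphi\left(\bar{K}_t a^{k}\right) \qquad (12)$$

Its corresponding optimization objective function can be expressed as:

$$\min_{a^{k}} \tilde{L}(a^{k}) \qquad (13)$$

The derivative of the loss function $\tilde{L}(a^{k})$ with respect to the parameter $a^{k}$ can be expressed as:

$$\nabla \tilde{L}(a^{k}) = -\bar{K}_t^{T} \left( \left(P_{k-1} b^{k}\right) \odot \left(\mathbf{1} - b^{k} \odot b^{k}\right) \right) \qquad (14)$$

where $b^{k} = \varphi\left(\bar{K}_t a^{k}\right)$ and $\odot$ denotes the element-wise product.

For the loss function $\tilde{L}(a^{k})$, calculating the optimal parameter $a^{k*}$ by directly solving $\nabla \tilde{L}(a^{k}) = 0$ has high computational complexity, so gradient descent with learning rate α is used. The parameter $a^{k}$ is updated during optimization as follows:

$$a^{k} \leftarrow a^{k} - \alpha\, \nabla \tilde{L}(a^{k}) \qquad (15)$$

Then the updated $a^{k}$ is substituted back into formula (12) to evaluate the loss, the derivative is recomputed with formula (14), and $a^{k}$ is updated again with formula (15). This process is repeated until the loss function $\tilde{L}(a^{k})$ hardly changes any more, i.e. the change in $\tilde{L}(a^{k})$ between two successive iterations is smaller than $C_t$ ($C_t$ is a very small constant threshold, e.g. 0.001). The final $a^{k}$ is then taken as the optimal parameter $a^{k*}$.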

Taking transaction data from a bank in China as an example, an actual test was carried out. The data set contains approximately 3.5 million transaction records from April to June 2017, all labeled by professionals at the bank. Table 1 shows the basic information of this data set.

Table 1. Electronic transaction data information

Month	Data volume	Number of features	Imbalance ratio
2017-04	1,243,035	43	1.07%
2017-05	1,216,299	43	2.22%
2017-06	1,042,714	43	2.39%

To demonstrate the good performance and high efficiency of the method of the invention in identifying fraudulent transactions, the April and May data of this data set were used as the training set. From the fraudulent transaction data and the normal transaction data, 80% of each was randomly extracted as the reference data set (reference data) used for KSH model training, and the remaining portion was used as the validation set for adjusting model parameters. The June data was used as the test set (test data) of the model.

The electronic fraud transaction identification method based on the kernel supervised hash model of the invention was compared experimentally with three common fraud transaction identification methods:

1) Random Forests (RF);

2) Support Vector Machine (SVM);

3) K-Nearest Neighbor (KNN).

As shown in Fig. 1d (bars from left to right: SVM, RF, KNN, KSH), although the electronic fraud transaction identification method based on the kernel supervised hash model proposed by the invention has lower precision than the SVM model, its recall is higher than that of the SVM model, so the KSH model achieves the best overall performance (the highest F1 value, F1-Measure) among the compared models, while the four methods differ little in accuracy. Fig. 1e shows the training-stage and testing-stage times of the different models (in each group, from left to right: SVM, RF, KNN, KSH). Although the original KNN model requires no training, its testing stage is computationally intensive and time-consuming. The training and testing times of the KSH model proposed by the invention are close to those of the RF model and shorter than those of the SVM model. Therefore, the KSH model proposed by the invention can identify fraudulent transactions efficiently, and it significantly reduces the computing resources that the original KNN model consumes when applied to electronic fraudulent transaction identification.
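For reference, the precision, recall and F1 measures used in this comparison can be computed as in the following sketch. The label vectors are made-up toy values, not experimental results, and the function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels (assumption): 1 = fraudulent, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, a model with somewhat lower precision can still obtain the higher F1 value if its recall is sufficiently higher.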

As shown in fig. 2, in one embodiment, the electronic fraud transaction identification system based on the kernel supervised hash model of the present invention includes a definition module 21, a training module 22, and an identification module 23.

The definition module 21 is used for constructing a kernel supervised hash model; the training module 22 is used for training the kernel supervised hash model by using labeled normal transaction data and labeled fraudulent transaction data to obtain a trained kernel supervised hash model; the identification module 23 is used for identifying normal transactions and fraudulent transactions by using the trained kernel supervised hash model.

In an embodiment of the present invention, the definition module 21 is configured to: determine the number p of kernel hash functions; and define the kernel hash function as:

wherein

the kernel function k(·) is a Gaussian kernel function; each input transaction datum x is mapped into a p-bit hash code c by using the p kernel hash functions.
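The mapping of a transaction x to a p-bit hash code through Gaussian-kernel hash functions can be illustrated with the sketch below. The exact definition of the kernel hash function is not reproduced in this text, so the form used here (the sign of a weighted sum of kernel values against a set of anchor points, minus a bias) is an assumption borrowed from the general kernel-hashing literature; the anchors, coefficients, biases and `sigma` are hypothetical values.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_hash_code(x, anchors, coeffs, biases, sigma=1.0):
    """Map x to a p-bit code with bits in {-1, +1}.

    Bit k is sign(sum_j coeffs[k][j] * k(anchors[j], x) - biases[k]);
    this functional form is assumed, not taken from the patent text.
    """
    k_vals = [gaussian_kernel(a, x, sigma) for a in anchors]
    code = []
    for a_k, b_k in zip(coeffs, biases):
        score = sum(w * kv for w, kv in zip(a_k, k_vals)) - b_k
        code.append(1 if score >= 0 else -1)
    return code


# Hypothetical parameters: 2 anchor points, p = 3 hash functions.
anchors = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]
biases = [0.0, 1.0, 0.0]
code = kernel_hash_code([0.1, 0.0], anchors, coeffs, biases)
print(code)
```

Each hash function has its own coefficients and bias, which is what allows the p bits of one code to carry different information about the same transaction.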

In one embodiment of the present invention, the training module 22 is configured to use the following optimization objective function of the kernel supervised hash model:

wherein the operation $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix, and $H_t$ is expressed as:

wherein

In an embodiment of the present invention, the identification module 23 is configured to: encode all training data through the trained kernel supervised hash model KSH(x) to obtain the hash codes of all the training data; encode the test data through the kernel supervised hash model KSH(x) to obtain the hash codes of the test data; calculate the similarity between the hash codes of the test data and the hash codes of all the training data by using the Hamming distance; and select the k training data whose hash codes are most similar to the hash code of the test data, and identify the transaction type of the test data based on the transaction types of these k training data.
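The identification step (Hamming distance between hash codes followed by a k-nearest-neighbor decision) can be sketched as follows. Majority voting among the k neighbors is an assumption, since the text only states that the type is identified based on the types of the k training data; the codes and labels are toy values.

```python
from collections import Counter


def hamming(c1, c2):
    """Number of positions at which two equal-length hash codes differ."""
    return sum(b1 != b2 for b1, b2 in zip(c1, c2))


def knn_classify(test_code, train_codes, train_labels, k=3):
    """Label a test code by majority vote among its k nearest training codes."""
    order = sorted(
        range(len(train_codes)),
        key=lambda i: hamming(test_code, train_codes[i]),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 4-bit hash codes and labels (hypothetical values).
train_codes = [
    [1, 1, -1, -1], [1, 1, -1, 1], [1, -1, -1, -1],
    [-1, -1, 1, 1], [-1, -1, 1, -1], [-1, 1, 1, 1],
]
train_labels = ["fraud", "fraud", "fraud", "normal", "normal", "normal"]

pred_a = knn_classify([1, 1, -1, -1], train_codes, train_labels)
pred_b = knn_classify([-1, -1, 1, 1], train_codes, train_labels)
print(pred_a, pred_b)
```

Comparing short binary codes is much cheaper than computing distances in the original 43-dimensional feature space, which is the source of the efficiency gain over the original KNN model reported above.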

It should be noted that the structures and principles of the definition module 21, the training module 22 and the identification module 23 correspond one-to-one to the steps of the above-mentioned electronic fraud transaction identification method based on the kernel supervised hash model, so their description is not repeated here.

It should be understood that the division of the above system into modules is merely a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the x module may be a separately arranged processing element, may be integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, with the function of the x module being called and executed by a processing element of the apparatus. The implementation of the other modules is similar. In addition, all or part of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), one or more digital signal processors (Digital Signal Processor, abbreviated as DSP), or one or more field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, abbreviated as CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

In an embodiment of the present invention, the present invention further includes a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements any of the above-described electronic fraud transaction identification methods based on a kernel supervised hash model.

Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware related to a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

As shown in fig. 3, in an embodiment, the electronic fraud transaction identifying device based on the kernel supervised hash model of the present invention includes: a processor 31 and a memory 32; the memory 32 is used for storing a computer program; the processor 31 is connected to the memory 32 and is configured to execute a computer program stored in the memory 32, so that the electronic fraud transaction identifying device based on the kernel supervised hash model performs any of the electronic fraud transaction identifying methods based on the kernel supervised hash model.

Specifically, the memory 32 includes various media capable of storing program code, such as ROM, RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.

Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (Central Processing Unit, abbreviated as CPU), a network processor (Network Processor, abbreviated as NP), etc.; it may also be a digital signal processor (Digital Signal Processor, abbreviated as DSP), an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), a field programmable gate array (Field Programmable Gate Array, abbreviated as FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In summary, the electronic fraud transaction identification method, system and device based on the kernel supervised hash model of the invention are used for effectively identifying novel fraudulent transactions. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The above embodiments merely illustrate the principles of the present invention and its effects, and are not intended to limit the invention. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the invention.

Claims

1. An electronic fraud transaction identification method based on a kernel supervision hash model is characterized by comprising the following steps:

constructing a kernel supervised hash model; the kernel supervised hash model is used for mapping high-dimensional original data into low-dimensional hash codes by constructing a group of kernel hash functions, and classifying transactions in a hash space by adopting a nearest neighbor searching method; the hash code is generated by a plurality of kernel hash functions with different parameters;

training the kernel supervised hash model by using labeled normal transaction data and fraudulent transaction data to obtain a trained kernel supervised hash model; comprising the following steps: establishing an optimization objective function of the kernel supervised hash model; optimizing the optimization objective function of the kernel supervised hash model by using a greedy algorithm and a gradient descent method;

identifying normal transactions and fraudulent transactions using the trained kernel supervised hash model; the method comprises the following steps:

coding all training data through the trained kernel supervised hash model KSH (x) to obtain hash codes of all the training data;

encoding test data through the kernel supervision hash model KSH (x) to obtain hash codes of the test data;

calculating the similarity between the hash codes of the test data and the hash codes of all training data by adopting a Hamming distance, and classifying the similarity based on a KNN model;

k training data closest to the similarity of hash codes of the test data are selected, and the transaction type of the test data is identified based on the transaction types of the k training data.

2. The method for identifying electronic fraudulent transactions based on a kernel supervised hash model according to claim 1, wherein the constructing the kernel supervised hash model includes the steps of:

determining the number p of the kernel hash functions;

defining a kernel hash function as:

wherein

the kernel function k (·) adopts a gaussian kernel function;

each transaction data x input is mapped into a p-bit hash code c by using p kernel hash functions.

3. The method for identifying electronic fraudulent transactions based on a kernel supervised hash model as set forth in claim 2, wherein,

the optimization objective function of the kernel supervision hash model is as follows:

wherein the operation $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix, and $H_t$ is expressed as:

wherein

4. An electronic fraud transaction identification system based on a kernel supervised hash model, comprising: the system comprises a definition module, a training module and an identification module;

the definition module is used for constructing a kernel supervised hash model; the kernel supervised hash model is used for mapping high-dimensional original data into low-dimensional hash codes by constructing a group of kernel hash functions, and classifying transactions in a hash space by adopting a nearest neighbor searching method; the hash code is generated by a plurality of kernel hash functions with different parameters;

the training module is used for training the kernel supervised hash model by using the labeled normal transaction data and the labeled fraudulent transaction data to obtain a trained kernel supervised hash model; comprising the following steps: establishing an optimization objective function of the kernel supervised hash model; optimizing the optimization objective function of the kernel supervised hash model by using a greedy algorithm and a gradient descent method;

the identification module is used for identifying normal transactions and fraudulent transactions by using the trained kernel supervised hash model; the method comprises the following steps:

5. The system for identifying electronic fraudulent transactions based on a kernel supervised hash model of claim 4, wherein said definition module is configured to:

determining the number p of the kernel hash functions;

defining a kernel hash function as:

wherein

the kernel function k (·) adopts a gaussian kernel function;

6. A computer readable storage medium having stored thereon a computer program, wherein the computer program is executed by a processor to implement the method of identifying electronic fraudulent transactions based on a kernel supervised hash model of any of claims 1 to 3.

7. An electronic fraudulent transaction identification device based on a kernel supervised hash model, comprising: a processor and a memory;

the memory is used for storing a computer program;

the processor is connected to the memory and is used for executing the computer program stored in the memory, so that the electronic fraud transaction identification device based on the kernel supervised hash model executes the electronic fraud transaction identification method based on the kernel supervised hash model according to any one of claims 1 to 3.