CN109544190A - Fraud identification model training method, fraud identification method, and apparatus - Google Patents
- Publication number
- CN109544190A (Application CN201811432681.3A)
- Authority
- CN
- China
- Prior art keywords
- sample
- training
- users
- fraud
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application provides a fraud identification model training method, a fraud identification method, and corresponding apparatuses. The training method includes: based on the acquired historical operation information of first sample users and second sample users, and the fraud annotation information of the second sample users, constructing first feature vectors for the first sample users and second feature vectors for the second sample users; performing unsupervised pre-training of a first neural network using the first feature vectors; and performing supervised training of the pre-trained first neural network and a classifier using the second feature vectors and the corresponding fraud annotation information, thereby obtaining a fraud identification model. Embodiments of the present application can train a fraud identification model from a large amount of unlabeled sample data together with a small amount of labeled sample data, which reduces the manual labeling workload during model training while improving both the efficiency of model training and the recognition accuracy of the fraud identification model.
Description
Technical field
This application relates to the field of machine learning, and in particular to a fraud identification model training method, a fraud identification method, and corresponding apparatuses.
Background art
With the rapid development of the Internet and the spread of smart terminals, people enjoy great convenience when remotely handling balance inquiries, transfers, shopping payments, wealth-management purchases, and other business through electronic banking channels. At any time and in any place, without visiting a bank counter, a user can complete remittances and transfers, conversion between demand and time deposits, statement and transaction-detail inquiries, credit-card repayment, wealth-management and fund purchases, utility-bill payments, and many other financial services with a few simple gestures, greatly improving efficiency. However, while electronic banking provides convenient services for users, it also introduces many security risks.
Surveys indicate that cybercrime causes up to 445 billion US dollars of economic loss worldwide every year, is becoming increasingly sophisticated, and is spreading into different industries. In China, the underground industry built around online fraud is estimated to exceed 110 billion yuan in scale, with more than 1.6 million practitioners. According to data released by the Internet Society of China, 63.4% of netizens have had call records, online shopping records, and similar information leaked, and 78.2% have had personally identifiable information leaked. Fraudsters use stolen victim information to carry out brute-force attacks, account-information theft and misuse, and theft or diversion of funds; the leakage of personal information makes precisely targeted scams possible, and the amount stolen per scam keeps rising. Fraud has evolved from isolated individual acts into a well-organized underground industry with professional division of labor, posing a severe challenge to banks developing online financial services.
To enhance the security of electronic banking, the prior art trains machine-learning models using traditional supervised training methods. Supervised training, however, requires labeled samples, and the labeling work is done entirely by hand, which is time-consuming and laborious. If, instead, only a small amount of labeled sample data is used to train the model, the scarcity of sample data leads to low recognition accuracy of the resulting fraud identification model.
Summary of the invention
In view of this, embodiments of the present application aim to provide a fraud identification model training method, a fraud identification method, and corresponding apparatuses that can train a fraud identification model from a large amount of unlabeled sample data and a small amount of labeled sample data, reducing the manual labeling workload during model training while improving the efficiency of model training and the recognition accuracy of the fraud identification model.
In a first aspect, an embodiment of the present application provides a fraud identification model training method, comprising:
acquiring historical operation information of a plurality of first sample users; and acquiring historical operation information of a plurality of second sample users together with fraud annotation information corresponding to each second sample user;
constructing, from the historical operation information of the first sample users, first feature vectors that characterize the first sample users' operation-behaviour features; and constructing, from the historical operation information of the second sample users, second feature vectors that characterize the second sample users' operation-behaviour features;
inputting the first feature vectors into a first neural network and a second neural network of symmetric structure, to perform unsupervised pre-training of the first neural network; wherein the first neural network encodes the first feature vectors, and the second neural network decodes the encoded first feature vectors;
inputting the second feature vectors into the pre-trained first neural network and a classifier, and performing supervised training of the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information, to obtain a fraud identification model.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation of the first aspect, wherein constructing the first feature vectors and the second feature vectors comprises:
for each first sample user, determining, from that user's historical operation information, the user's feature values under a plurality of preset operation-behaviour features;
constructing, from the first sample user's feature values under the plurality of preset operation-behaviour features, a first feature vector that characterizes that user's operation-behaviour features; and
for each second sample user, determining, from that user's historical operation information, the user's feature values under the plurality of preset operation-behaviour features;
constructing, from the second sample user's feature values under the plurality of preset operation-behaviour features, a second feature vector that characterizes that user's operation-behaviour features.
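The per-user construction above can be sketched as follows. This is a minimal illustration in Python; the preset feature names and the dictionary-based history format are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Hypothetical preset operation-behaviour features (names are illustrative only).
PRESET_FEATURES = ["login_count", "transfer_count", "avg_transfer_amount",
                   "night_login_ratio", "new_device_ratio"]

def build_feature_vector(history, preset_features=PRESET_FEATURES):
    """Determine the user's feature value under each preset operation-behaviour
    feature from the historical operation information (0.0 when the history
    contains no such record), and stack the values into one fixed-length vector."""
    return np.array([float(history.get(f, 0.0)) for f in preset_features])

# One unlabeled (first-sample) user and one labeled (second-sample) user:
first_vec = build_feature_vector({"login_count": 12, "transfer_count": 3})
second_vec = build_feature_vector({"login_count": 2, "avg_transfer_amount": 9500.0,
                                   "new_device_ratio": 1.0})
```

Both vectors share the same fixed length, so the same networks can consume unlabeled and labeled samples alike.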
With reference to the first aspect, an embodiment of the present application provides a second possible implementation of the first aspect, wherein inputting the first feature vectors into the first neural network and the second neural network of symmetric structure to perform unsupervised pre-training of the first neural network comprises:
inputting the first feature vectors into the first neural network, and obtaining the encoded feature vector output by at least one target encoding layer of the first neural network;
inputting the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtaining the decoded feature vector output by the target decoding layer of the second neural network that corresponds to the target encoding layer;
performing the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector; and
completing the unsupervised pre-training of the first neural network by performing multiple rounds of such training on the first neural network and the second neural network.
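The encode-then-decode pre-training above is, in effect, an autoencoder with a mirrored encoder and decoder. The following is a toy NumPy sketch under stated assumptions (one encoding layer, one decoding layer, random stand-in data, mean-squared reconstruction loss); all sizes and hyperparameters are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_code = 5, 2
W_enc = rng.normal(0, 0.1, (d_in, d_code))   # "first neural network": one encoding layer
W_dec = rng.normal(0, 0.1, (d_code, d_in))   # "second neural network": mirrored decoding layer

def forward(X):
    code = np.tanh(X @ W_enc)                # encoded feature vectors
    return code, code @ W_dec                # decoded feature vectors

X = rng.normal(0, 1, (64, d_in))             # stand-in for the first feature vectors
lr, losses = 0.05, []
for _ in range(200):                         # multiple rounds of unsupervised training
    code, X_hat = forward(X)
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))  # mean-squared reconstruction loss
    g_out = 2 * err / err.size               # gradient of the loss w.r.t. X_hat
    g_code = (g_out @ W_dec.T) * (1 - code ** 2)   # backprop through decoder and tanh
    W_dec -= lr * (code.T @ g_out)           # adjust decoder parameters
    W_enc -= lr * (X.T @ g_code)             # adjust encoder parameters
```

After pre-training, only the encoder (`W_enc`) is carried forward to the supervised stage; the decoder exists solely to supply the reconstruction signal.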
With reference to the second possible implementation of the first aspect, an embodiment of the present application provides a third possible implementation of the first aspect, wherein performing the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector comprises:
taking any first sample user that has not yet completed the current round of training as the target first sample user, and determining the target first sample user's loss in the current round from that user's encoded feature vector and decoded feature vector;
adjusting the parameters of the first neural network and the second neural network according to the target first sample user's loss in the current round;
marking the target first sample user as having completed training, taking any other first sample user that has not yet completed the current round of training as the new target first sample user, obtaining the new target first sample user's encoded and decoded feature vectors using the parameter-adjusted first and second neural networks, and returning to the step of determining the target first sample user's loss in the current round;
until all first sample users have completed the current round of training, at which point the current round of training of the first neural network and the second neural network is complete.
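The round structure above — pick an untrained target user, compute that user's loss, adjust the parameters, then move to the next user until the round is done — is per-sample (batch-size-1) gradient descent over one epoch. A toy sketch, with the two networks collapsed into a single linear map purely for brevity (sizes, seed, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

d = 4
W = rng.normal(0, 0.1, (d, d))                     # stand-in for the networks' parameters
users = [rng.normal(0, 1, d) for _ in range(10)]   # first sample users' feature vectors

def one_round(W, users, lr=0.05):
    """One round: each not-yet-trained user becomes the target in turn; that
    user's reconstruction loss is computed and the parameters are adjusted
    from it before moving on to the next user."""
    per_user_loss = []
    for x in users:
        err = x @ W - x                      # decoded minus original feature vector
        per_user_loss.append(float(err @ err))
        W = W - lr * np.outer(x, 2 * err)    # adjust parameters from this user's loss
    return W, per_user_loss

W, losses_round1 = one_round(W, users)
W, losses_round2 = one_round(W, users)       # a later round over the same users
```

Across rounds, the per-user losses shrink as the parameters converge, which is what terminating after "multiple rounds" relies on.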
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation of the first aspect, wherein inputting the second feature vectors into the pre-trained first neural network and the classifier, and performing supervised training of the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information to obtain the fraud identification model, comprises:
inputting the second feature vectors into the pre-trained first neural network and the classifier, and obtaining a fraud recognition result for each second sample user;
performing the current round of supervised training of the pre-trained first neural network and the classifier according to each second sample user's fraud recognition result and that user's fraud annotation information; and
obtaining the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
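The supervised stage above can be sketched as an encoder (standing in for the pre-trained first neural network) feeding a logistic classifier, with both updated from the cross-entropy between the predicted fraud probability and the fraud label. This is a toy NumPy version on synthetic data; the encoder here is randomly initialized rather than actually pre-trained, and all sizes, data, and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_code, n = 5, 3, 200
W_enc = rng.normal(0, 0.3, (d_in, d_code))   # stand-in for the pre-trained encoder
w_clf = np.zeros(d_code)                     # classifier weights

X = rng.normal(0, 1, (n, d_in))              # synthetic second feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # synthetic fraud annotation (labels)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, ce_losses = 0.5, []
for _ in range(300):                         # multiple rounds of supervised training
    H = np.tanh(X @ W_enc)                   # encoder output
    p = sigmoid(H @ w_clf)                   # fraud recognition result (probability)
    ce_losses.append(float(-np.mean(y * np.log(p + 1e-9)
                                    + (1 - y) * np.log(1 - p + 1e-9))))
    g = (p - y) / n                          # grad of mean cross-entropy w.r.t. logits
    dH = np.outer(g, w_clf) * (1 - H ** 2)   # backprop through classifier and tanh
    w_clf -= lr * (H.T @ g)                  # adjust classifier parameters
    W_enc -= lr * (X.T @ dH)                 # fine-tune the encoder parameters

accuracy = float(np.mean((p > 0.5) == y))
```

Because the encoder parameters are also updated, the supervised stage both trains the classifier and fine-tunes the pre-trained network, as the summary describes.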
With reference to the fourth possible implementation of the first aspect, an embodiment of the present application provides a fifth possible implementation of the first aspect, wherein performing the current round of supervised training of the pre-trained first neural network and the classifier according to each second sample user's fraud recognition result and fraud annotation information comprises:
taking any second sample user that has not yet completed the current round of training as the target second sample user, and determining the target second sample user's cross-entropy loss in the current round from that user's fraud recognition result and fraud annotation information;
adjusting the parameters of the first neural network and the classifier according to the target second sample user's cross-entropy loss in the current round;
marking the target second sample user as having completed training, taking any other second sample user that has not yet completed the current round of training as the new target second sample user, obtaining the new target second sample user's fraud recognition result using the parameter-adjusted first neural network and classifier, and returning to the step of determining the target second sample user's cross-entropy loss in the current round;
until all second sample users have completed the current round of training, at which point the current round of supervised training of the pre-trained first neural network and the classifier is complete.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present application provides a sixth possible implementation of the first aspect, further comprising, after completing the current round of supervised training of the pre-trained first neural network and the classifier:
detecting whether the current round has reached a preset number of rounds; if so, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, validating the first neural network and the classifier obtained in the current round on a test set; if the number of test data items whose cross-entropy loss does not exceed a preset cross-entropy-loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, comparing each second sample user's cross-entropy loss in the current round with that second sample user's cross-entropy loss in the previous round; if the number of second sample users whose cross-entropy loss in the current round exceeds their cross-entropy loss in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
In a second aspect, an embodiment of the present application provides a fraud identification method, comprising:
when an operation behaviour of a user to be detected occurs, acquiring the historical operation information of the user to be detected;
constructing, from the historical operation information of the user to be detected, a target feature vector that characterizes the operation-behaviour features of the user to be detected; and
inputting the target feature vector into a fraud identification model trained with the fraud identification model training method of the first aspect or any one of its first to sixth possible implementations, and obtaining the probability that the operation behaviour of the user to be detected is fraudulent.
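The identification step above is a single forward pass: the target feature vector goes through the trained first neural network (an encoder) and the classifier, producing a fraud probability. A minimal sketch; the tanh encoder, sigmoid classifier, sizes, and the random placeholder weights are all illustrative assumptions standing in for the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
W_enc = rng.normal(0, 0.3, (5, 3))   # placeholder for the trained encoder
w_clf = rng.normal(0, 0.3, 3)        # placeholder for the trained classifier

def fraud_probability(target_vec, W_enc, w_clf):
    """Forward pass: encode the target feature vector, then map the code to
    the probability that the detected operation behaviour is fraudulent."""
    h = np.tanh(target_vec @ W_enc)
    return float(1.0 / (1.0 + np.exp(-(h @ w_clf))))

p = fraud_probability(np.array([1.0, 0.0, 2.0, 0.5, 0.0]), W_enc, w_clf)
```

A deployment would compare `p` against a decision threshold chosen for the desired trade-off between missed fraud and false alarms.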
In a third aspect, an embodiment of the present application provides a fraud identification model training apparatus, comprising:
a first acquisition module, configured to acquire historical operation information of a plurality of first sample users, and to acquire historical operation information of a plurality of second sample users together with fraud annotation information corresponding to each second sample user;
a first construction module, configured to construct, from the historical operation information of the first sample users, first feature vectors that characterize the first sample users' operation-behaviour features, and to construct, from the historical operation information of the second sample users, second feature vectors that characterize the second sample users' operation-behaviour features;
a pre-training module, configured to input the first feature vectors into a first neural network and a second neural network of symmetric structure and perform unsupervised pre-training of the first neural network, wherein the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors; and
a training module, configured to input the second feature vectors into the pre-trained first neural network and a classifier, and to perform supervised training of the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information, to obtain a fraud identification model.
In conjunction with the third aspect, an embodiment of the present application provides a first possible implementation of the third aspect, wherein the first construction module is specifically configured to:
for each first sample user, determine, from that user's historical operation information, the user's feature values under a plurality of preset operation-behaviour features;
construct, from the first sample user's feature values under the plurality of preset operation-behaviour features, a first feature vector that characterizes that user's operation-behaviour features; and
for each second sample user, determine, from that user's historical operation information, the user's feature values under the plurality of preset operation-behaviour features;
construct, from the second sample user's feature values under the plurality of preset operation-behaviour features, a second feature vector that characterizes that user's operation-behaviour features.
In conjunction with the third aspect, an embodiment of the present application provides a second possible implementation of the third aspect, wherein the pre-training module is specifically configured to:
input the first feature vectors into the first neural network, and obtain the encoded feature vector output by at least one target encoding layer of the first neural network;
input the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtain the decoded feature vector output by the target decoding layer of the second neural network that corresponds to the target encoding layer;
perform the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector; and
complete the unsupervised pre-training of the first neural network by performing multiple rounds of such training on the first neural network and the second neural network.
In conjunction with the second possible implementation of the third aspect, an embodiment of the present application provides a third possible implementation of the third aspect, wherein the pre-training module is specifically configured to perform the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector in the following manner:
take any first sample user that has not yet completed the current round of training as the target first sample user, and determine the target first sample user's loss in the current round from that user's encoded feature vector and decoded feature vector;
adjust the parameters of the first neural network and the second neural network according to the target first sample user's loss in the current round;
mark the target first sample user as having completed training, take any other first sample user that has not yet completed the current round of training as the new target first sample user, obtain the new target first sample user's encoded and decoded feature vectors using the parameter-adjusted first and second neural networks, and return to the step of determining the target first sample user's loss in the current round;
until all first sample users have completed the current round of training, at which point the current round of training of the first neural network and the second neural network is complete.
In conjunction with the third aspect, an embodiment of the present application provides a fourth possible implementation of the third aspect, wherein the training module is specifically configured to:
input the second feature vectors into the pre-trained first neural network and the classifier, and obtain a fraud recognition result for each second sample user;
perform the current round of supervised training of the pre-trained first neural network and the classifier according to each second sample user's fraud recognition result and that user's fraud annotation information; and
obtain the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
In conjunction with the fourth possible implementation of the third aspect, an embodiment of the present application provides a fifth possible implementation of the third aspect, wherein the training module is specifically configured to perform the current round of supervised training of the pre-trained first neural network and the classifier, according to each second sample user's fraud recognition result and fraud annotation information, in the following manner:
take any second sample user that has not yet completed the current round of training as the target second sample user, and determine the target second sample user's cross-entropy loss in the current round from that user's fraud recognition result and fraud annotation information;
adjust the parameters of the first neural network and the classifier according to the target second sample user's cross-entropy loss in the current round;
mark the target second sample user as having completed training, take any other second sample user that has not yet completed the current round of training as the new target second sample user, obtain the new target second sample user's fraud recognition result using the parameter-adjusted first neural network and classifier, and return to the step of determining the target second sample user's cross-entropy loss in the current round;
until all second sample users have completed the current round of training, at which point the current round of supervised training of the pre-trained first neural network and the classifier is complete.
In conjunction with the fifth possible implementation of the third aspect, an embodiment of the present application provides a sixth possible implementation of the third aspect, wherein the training module, after completing the current round of supervised training of the pre-trained first neural network and the classifier, is further configured to:
detect whether the current round has reached a preset number of rounds; if so, stop the training of the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, validate the first neural network and the classifier obtained in the current round on a test set; if the number of test data items whose cross-entropy loss does not exceed a preset cross-entropy-loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stop the training of the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, compare each second sample user's cross-entropy loss in the current round with that second sample user's cross-entropy loss in the previous round; if the number of second sample users whose cross-entropy loss in the current round exceeds their cross-entropy loss in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stop the training of the first neural network and the classifier, and take the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
In a fourth aspect, an embodiment of the present application provides a fraud identification apparatus, comprising:
a second acquisition module, configured to acquire, when an operation behaviour of a user to be detected occurs, the historical operation information of the user to be detected;
a second construction module, configured to construct, from the historical operation information of the user to be detected, a target feature vector that characterizes the operation-behaviour features of the user to be detected; and
a fraud-recognition-result acquisition module, configured to input the target feature vector into a fraud identification model trained with the fraud identification model training method of the first aspect or any one of its first to sixth possible implementations, and to obtain the probability that the operation behaviour of the user to be detected is fraudulent.
Embodiments of the present application first use the first feature vectors of the first sample users to train, without supervision, a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded vectors, so that during encoding and decoding the first neural network learns the features of each first sample user. The second feature vectors and fraud annotation information of the second sample users are then used for supervised training of the first neural network and a classifier: the supervised stage fine-tunes the pre-trained first neural network to improve its precision and completes the training of the classifier, finally yielding a fraud identification model whose accuracy meets the usage requirements. This reduces the manual labeling workload on sample data during model training while improving the efficiency of model training and the recognition accuracy of the fraud identification model.
To make the above objects, features, and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of a fraud identification model training method provided by an embodiment of the present application;
Fig. 2 shows a flowchart of constructing feature vectors in the fraud identification model training method provided by an embodiment of the present application;
Fig. 3 shows a detailed flowchart of constructing feature vectors in the fraud identification model training method provided by an embodiment of the present application;
Fig. 4 shows a structural schematic diagram of a first neural network and a second neural network provided by an embodiment of the present application;
Fig. 5 shows a flowchart of performing unsupervised pre-training on the first neural network in the fraud identification model training method provided by an embodiment of the present application;
Fig. 6 shows a flowchart of performing a current round of training on the first neural network and the second neural network in the fraud identification model training method provided by an embodiment of the present application;
Fig. 7 shows a structural schematic diagram of a first neural network and a classifier provided by an embodiment of the present application;
Fig. 8 shows a flowchart of performing supervised training on the pre-trained first neural network and the classifier in the fraud identification model training method provided by an embodiment of the present application;
Fig. 9 shows a flowchart of performing a current round of supervised training on the pre-trained first neural network and the classifier in the fraud identification model training method provided by an embodiment of the present application;
Fig. 10 shows a flowchart of a fraud identification method provided by an embodiment of the present application;
Fig. 11 shows a structural schematic diagram of a fraud identification model training device provided by an embodiment of the present application;
Fig. 12 shows a structural schematic diagram of a fraud identification device provided by an embodiment of the present application;
Fig. 13 shows a structural schematic diagram of a fraud identification system provided by an embodiment of the present application;
Fig. 14 shows a schematic diagram of the application process of the fraud identification system provided by an embodiment of the present application;
Fig. 15 shows a structural schematic diagram of a computer device provided by an embodiment of the present application;
Fig. 16 shows a structural schematic diagram of another computer device provided by an embodiment of the present application.
Specific embodiment
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative work shall fall within the protection scope of the present application.
At present, in order to enhance the security of electronic banking, machine learning models are trained with the traditional supervised training method of the prior art, which suffers from low efficiency and low recognition accuracy. On this basis, the fraud identification model training method, fraud identification method and device provided by the present application can obtain a fraud identification model by training on a large amount of unannotated sample data together with a small amount of annotated sample data, which reduces the workload of manually annotating sample data during model training while improving the efficiency of model training and the recognition accuracy of the fraud identification model.
To facilitate understanding of the present embodiment, a fraud identification model training method disclosed in an embodiment of the present application is first described in detail.
As shown in Fig. 1, the fraud identification model training method provided by an embodiment of the present application includes S101 to S104:
S101: Obtain historical operation information of a plurality of first sample users; and obtain historical operation information of a plurality of second sample users and the fraud annotation information corresponding to each second sample user.
Here, the first sample users are unannotated sample users and the second sample users are annotated sample users. Illustratively, the number of unannotated first sample users and the number of annotated second sample users may be the same or different.
If the number of unannotated first sample users is greater than the number of annotated second sample users, then when the first neural network is pre-trained based on the historical operation information of the first sample users, the parameters of the first neural network can approach the final training result; the historical operation information of the second sample users is then used to perform supervised training on the pre-trained first neural network, so that the parameters of the first neural network are adjusted and the training result is obtained.
The process of pre-training the first neural network based on the historical operation information of the first sample users can be regarded as assigning initial values to the first neural network before supervised training. Since this process makes the initial values of the first neural network during supervised training closer to the final training result than the random assignment of the prior art, the training result can be obtained faster when the historical operation information of the second sample users is used to perform supervised training on the pre-trained first neural network. Since the process of unsupervised training is simpler than that of supervised training, the training process is further accelerated.
In one possible scenario, when judging whether fraudulent behaviour has occurred for a sample user, a comprehensive judgment must be made according to the historical operation information of the sample user over a period of time; the judgment cannot be made merely from a single operation. Moreover, whether fraud has occurred is generally known only some time after the sample user completes the operation, for example by judging whether a victim appears after a period of time. It is therefore necessary to obtain the historical operation information of each second sample user within a first historical time period, and the fraud annotation information indicating whether each second sample user committed fraud within a second historical time period.
Optionally, the historical operation information may come from different banking channels; for example, the banking channels include at least direct banking, WeChat banking, quick payment, mobile banking, online banking and the like.
Illustratively, historical operations may include a variety of different types of operations, such as basic operations and business operations. Basic operations include registration and login, because any business operation flow in any banking channel necessarily contains these two operations; they can be regarded as the basis and premise of all other operations. Business operations may include transfer, modifying the transfer limit, payment, cash withdrawal and the like; depending on the requests of different users in different banking channels, business operations may have different business logic and operating characteristics, and they directly reflect the purpose of the user's operation request.
Illustratively, historical operation information is information about various historical operations. For example, the information of a registration operation includes the number of accounts registered on the same device within 7 days, the number of phone numbers used for registration on the same device within 1 day, etc.; the information of a login operation includes the number of accounts logged in on the same device within 1 day, whether a login was from an uncommon device, etc.; the information of a transfer operation includes whether a single transfer amount exceeds 100,000, whether the receiving account is in a blacklist, etc. More examples are shown in Tables 1-1, 1-2, 1-3, 1-4 and 1-5.
S102: According to the historical operation information of the first sample users, construct first feature vectors that can be used to characterize the operation behaviour features of the first sample users; and according to the historical operation information of the second sample users, construct second feature vectors that can be used to characterize the operation behaviour features of the second sample users.
It should be noted that, because the form of historical operation information is not standardized, it is not suitable for automatic processing by a computer; vectorization of the data converts data of non-standard format into a format-consistent form convenient for computer processing. Therefore, feature vectors that can be used to characterize the operation behaviour features of the sample users need to be constructed according to the historical operation information.
As shown in Fig. 2, in specific implementation, the embodiment of the present application constructs the feature vectors in the following manner:
S201: For each first sample user, determine, according to the historical operation information of the first sample user, the feature values of the first sample user under a plurality of preset operation behaviour features.
S202: According to the feature values of the first sample user under the plurality of preset operation behaviour features, construct a first feature vector that can be used to characterize the operation behaviour features of the first sample user.
S203: For each second sample user, determine, according to the historical operation information of the second sample user, the feature values of the second sample user under the plurality of preset operation behaviour features.
S204: According to the feature values of the second sample user under the plurality of preset operation behaviour features, construct a second feature vector that can be used to characterize the operation behaviour features of the second sample user.
There is no required order of execution between step S201 and step S203.
In specific implementation of steps S201 and S203, the embodiment of the present application determines the feature values of a first sample user / second sample user under the plurality of preset operation behaviour features in the following manner:
Here, the historical operation information of a sample user contains a plurality of preset operation behaviour features. The embodiment of the present application provides a specific example: Tables 1-1, 1-2, 1-3, 1-4 and 1-5 show the preset operation behaviour features contained in basic operations and business operations, and the feature value category of each preset operation behaviour feature. For a numerical feature, its corresponding numerical value is used directly; for a categorical feature, one-hot encoding is used, i.e. each categorical feature corresponds to a vector composed of 0s and 1s, the number of categories corresponds to the dimension of the vector, and each category corresponds to one dimension of the vector. When the preset operation behaviour feature takes a certain category, the vector position corresponding to that category is set to 1 and all other positions are set to 0. For example, the preset operation behaviour feature "whether the device was tampered with at registration" contains two categories, "tampered" and "not tampered", so this feature is encoded with a two-bit one-hot code; suppose "tampered" is "10", then "not tampered" is "01".
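As an illustrative sketch (not the exact implementation of the embodiment), the one-hot encoding described above can be expressed in Python as follows; the category names mirror the "tampered / not tampered" example in the text:

```python
def one_hot(value, categories):
    """Encode a categorical feature value as a 0/1 vector with one
    dimension per category; the position of the value's category is 1,
    all other positions are 0."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

# Two-category feature "whether the device was tampered with at registration":
categories = ["tampered", "not tampered"]
print(one_hot("tampered", categories))      # [1, 0], i.e. "10"
print(one_hot("not tampered", categories))  # [0, 1], i.e. "01"
```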
Table 1-1 Preset operation behaviour features of the registration operation
Preset operation behaviour feature | Feature value category |
Whether the device was tampered with at registration | Binary categorical feature |
Whether a simulator was used at registration | Binary categorical feature |
Whether the device was jailbroken at registration | Binary categorical feature |
Number of accounts registered on the device within 1 day | Numerical feature |
Number of accounts registered on the device within 7 days | Numerical feature |
Number of phone numbers used for registration on the same device within 1 day | Numerical feature |
Number of phone numbers used for registration on the same device within 7 days | Numerical feature |
Number of registration attempts by the same phone number within 1 day | Numerical feature |
Number of IPs used by the same registration phone number within 1 day | Numerical feature |
Number of IPs used by the same registration phone number within 7 days | Numerical feature |
Number of accounts registered from the same IP within 1 day | Numerical feature |
Number of accounts registered from the same IP within 7 days | Numerical feature |
Table 1-2 Preset operation behaviour features of the login operation
Table 1-3 Preset operation behaviour features of the transfer operation
Preset operation behaviour feature | Feature value category |
Whether the receiving account is in a blacklist | Binary categorical feature |
Whether the transfer was made in a sensitive time period | Binary categorical feature |
Percentage of the account's current transfer amount in the total transfer amount of 6 months | Numerical feature |
Number of transfers by the same account within 1 hour | Numerical feature |
Whether a single transfer amount is greater than 100,000 | Binary categorical feature |
Total transfer amount of the account within 1 day | Numerical feature |
Number of transaction password errors of the account within 1 day | Numerical feature |
Number of transfers by the user to personal accounts | Numerical feature |
Table 1-4 Preset operation behaviour features of the payment operation
Table 1-5 Preset operation behaviour features of the consumption operation
Preset operation behaviour feature | Feature value category |
Whether a first consumption occurred within one week of card opening | Binary categorical feature |
Whether the consumption occurred in a sensitive time period | Binary categorical feature |
Number of remote consumptions of the account within 1 hour | Numerical feature |
Total number of consumptions of the account within 1 hour | Numerical feature |
Cumulative consumption amount of the user within 1 day | Numerical feature |
Total transfer amount of the account within 1 day | Numerical feature |
Remote consumption amount of the account within 1 day | Numerical feature |
Number of transaction password errors of the user within 1 day | Numerical feature |
After the feature values of the first sample users / second sample users under the plurality of preset operation behaviour features are determined in the above manner, as shown in Fig. 3, in specific implementation of steps S202 and S204, the embodiment of the present application constructs the first feature vectors / second feature vectors in the following manner:
S301: According to the feature values of a first sample user / second sample user under the plurality of preset operation behaviour features, form the initial first feature vector / initial second feature vector of the first sample user / second sample user.
S302: Perform data cleaning on the initial first feature vectors / initial second feature vectors to obtain cleaned first feature vectors / cleaned second feature vectors.
Here, because the historical operation information data may be corrupted or lost during collection and transmission, the role of step S302 is to remove feature values with an abnormal feature distribution, and to perform feature value filling on the preset operation behaviour features whose values are missing in individual samples.
In specific implementation, optionally, an isolation forest (IForest) model can be used to remove feature values with an abnormal feature distribution. Here, the isolation forest model defines an anomaly as a point that is "more likely to be isolated (separated)", which can be understood as a point that is sparsely distributed and far from the high-density group. The isolation forest model is composed of many random decision trees; when splitting a leaf node, each decision tree randomly selects a target feature from the full feature set and randomly selects a threshold within that target feature to perform the node split. After a tree is generated, each original data sample uniquely corresponds to a leaf node in the tree, and the leaf node corresponding to an anomalous sample usually lies at a relatively shallow level, since an anomaly can be isolated with only a few splits.
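The isolation idea above can be sketched minimally in Python. This is a didactic illustration on toy two-dimensional data, not the cleaning step of the embodiment; the cluster and outlier coordinates are hypothetical:

```python
import random

def grow_tree(points, depth=0, max_depth=10):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = random.randrange(len(points[0]))
    vals = [p[dim] for p in points]
    lo, hi = min(vals), max(vals)
    if lo == hi:
        return ("leaf", len(points))
    split = random.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            grow_tree(left, depth + 1, max_depth),
            grow_tree(right, depth + 1, max_depth))

def path_length(tree, point, depth=0):
    """Depth of the leaf the point falls into; anomalies end up shallow."""
    if tree[0] == "leaf":
        return depth
    _, dim, split, left, right = tree
    branch = left if point[dim] < split else right
    return path_length(branch, point, depth + 1)

random.seed(0)
# Dense cluster around the origin plus one obvious outlier.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
outlier = (12.0, 12.0)
data.append(outlier)

forest = [grow_tree(data) for _ in range(50)]
avg = lambda p: sum(path_length(t, p) for t in forest) / len(forest)
inlier_avg = sum(avg(p) for p in data[:20]) / 20
print(avg(outlier) < inlier_avg)  # the outlier is isolated in fewer splits
```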
Optionally, when performing feature value filling on the preset operation behaviour features whose values are missing in individual samples: if the feature value category of the missing preset operation behaviour feature is a categorical feature, the category that appears most frequently for that preset operation behaviour feature in the historical operation information data of all sample users can be used directly as its feature value; if the feature value category of the missing preset operation behaviour feature is a numerical feature, the average of all feature values corresponding to that preset operation behaviour feature in the historical operation information data of all sample users can be used directly as its feature value.
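The mode/mean filling strategy above can be sketched as follows; the column values are hypothetical and `None` stands for a missing entry:

```python
from collections import Counter

def fill_missing(column, kind):
    """Fill None entries: the most frequent category for categorical
    features, the mean of observed values for numerical features."""
    observed = [v for v in column if v is not None]
    if kind == "categorical":
        fill = Counter(observed).most_common(1)[0][0]
    else:  # numerical
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]

# Hypothetical feature columns across all sample users.
print(fill_missing(["tampered", "not tampered", None, "not tampered"],
                   "categorical"))  # missing entry becomes "not tampered"
print(fill_missing([1.0, 3.0, None], "numerical"))  # [1.0, 3.0, 2.0]
```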
S303: Perform data enhancement on the cleaned first feature vectors / cleaned second feature vectors.
Under normal circumstances, the number of positive samples (samples in which no fraudulent behaviour occurred) is far greater than the number of negative samples (samples in which fraudulent behaviour occurred); that is, the numbers of positive and negative samples are very unbalanced, and unbalanced samples bring great difficulty to the training of the model. It is therefore necessary to perform data enhancement on the cleaned first feature vectors / cleaned second feature vectors.
In specific implementation, optionally, the Synthetic Minority Oversampling Technique (SMOTE) can be used to expand the cleaned first feature vectors / cleaned second feature vectors of the negative sample users. The SMOTE algorithm maps the cleaned first feature vectors / cleaned second feature vectors of all negative sample users into a feature space, so that each cleaned first feature vector / cleaned second feature vector corresponds to a point in this space; a point on the line segment between any two such points serves as the cleaned first feature vector / cleaned second feature vector of a newly generated negative sample user. Repeating this operation can generate the cleaned first feature vectors / cleaned second feature vectors of any number of negative sample users. Finally, the ratio between the cleaned first feature vectors / cleaned second feature vectors of the negative sample users (including the newly generated ones) and those of the positive sample users is controlled to reach a preset ratio; for example, the preset ratio can be 1:3 or 1:4.
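A minimal sketch of the SMOTE-style interpolation described above, assuming hypothetical negative sample points and a 1:4 preset ratio; a full SMOTE implementation would interpolate between nearest neighbours rather than arbitrary pairs:

```python
import random

def smote_expand(negatives, target_count):
    """Generate synthetic negative samples as random points on the line
    segment between two existing negative samples, until the negative
    set reaches target_count."""
    synthetic = []
    while len(negatives) + len(synthetic) < target_count:
        a, b = random.sample(negatives, 2)
        t = random.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return negatives + synthetic

random.seed(1)
negatives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical fraud samples
positives_count = 120
# Expand negatives until the negative:positive ratio reaches 1:4.
expanded = smote_expand(negatives, positives_count // 4)
print(len(expanded))  # 30
```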
S304: Perform data dimensionality reduction on the cleaned first feature vectors / cleaned second feature vectors to obtain dimensionality-reduced first feature vectors / dimensionality-reduced second feature vectors.
Here, data dimensionality reduction removes the feature values of lower importance in the cleaned first feature vectors / cleaned second feature vectors, which is beneficial both to the speed of model training and to the recognition accuracy of the model.
In specific implementation, optionally, Principal Component Analysis (PCA) can be used to perform data dimensionality reduction on the cleaned first feature vectors / cleaned second feature vectors. The PCA method applies a linear transformation to the original features, mapping the original high-dimensional features to low-dimensional features, so that the correlation between the transformed features is lower and the essential information of the data is better reflected.
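A minimal PCA sketch using power iteration on the covariance matrix to recover the first principal direction; this only illustrates the linear-transformation idea on hypothetical data, not a full PCA pipeline:

```python
def pca_first_component(X, iters=300):
    """Power iteration on the sample covariance matrix: converges to the
    direction of greatest variance (the first principal component)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points lying along the line y = 2x: the first component should point
# in the direction (1, 2) (up to scale).
X = [(float(t), 2.0 * t) for t in range(-5, 6)]
v = pca_first_component(X)
print(round(v[1] / v[0], 6))  # 2.0
```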
S305: Perform a data standardization operation on the dimensionality-reduced first feature vectors / dimensionality-reduced second feature vectors to obtain the final first feature vectors / second feature vectors.
Here, the purpose of the data standardization operation is to map the feature values of each preset operation behaviour feature to the same range; doing so eliminates the influence of the different scales of different preset operation behaviour features and is more conducive to model training.
In specific implementation, optionally, the (0,1) standardization method can be used, i.e. all preset operation behaviour features are converted into standard data with a mean of 0 and a variance of 1.
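The (0,1) standardization (zero mean, unit variance) can be sketched as follows; the column values are hypothetical:

```python
def standardize(column):
    """Map a feature column to mean 0 and variance 1 (z-score)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5
    return [(x - mean) / std for x in column]

col = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature values across users
z = standardize(col)
print([round(x, 6) for x in z])
```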
After the final first feature vectors / second feature vectors are obtained through the above steps, the fraud identification model training method provided by the embodiments of the present application further includes the following S103 and S104:
S103: Input the first feature vectors into the first neural network and the second neural network of symmetrical structure, and perform unsupervised pre-training on the first neural network.
Wherein, the first neural network is used to encode the first feature vectors, and the second neural network is used to decode the encoded first feature vectors.
Illustratively, as shown in Fig. 4, the embodiment of the present application provides a structural schematic diagram of a first neural network and a second neural network. As can be seen from Fig. 4, layers L1, L2 and L3 constitute the first neural network, and layers L3, L4 and L5 constitute the second neural network. Layers L1 and L5 are symmetrical in structure, and layers L2 and L4 are symmetrical in structure, i.e. they have the same number of neurons. Layers L1 and L2 are encoding layers, layers L4 and L5 are decoding layers, and layer L3 serves both as the output layer of the first neural network and as the input layer of the second neural network. From layer L1 to layer L3, the number of neurons decreases gradually; it can decrease in a fixed proportion, for example the numbers of neurons from L1 to L3 are 2m, m and m/2 respectively, or it can decrease out of proportion. From layer L3 to layer L5, the number of neurons increases gradually; it can increase in a fixed proportion, for example the numbers of neurons from L3 to L5 are m/2, m and 2m respectively, or it can increase out of proportion. Layer L1, as the input layer of the whole neural network model, is used to input the first feature vectors obtained in step S102. Optionally, each of the layers L1 to L5 is a fully connected layer, which can fully learn the essential information of the first feature vectors and guarantee the accuracy of model training.
It can be appreciated that the structure of the first neural network and the second neural network in Fig. 4 is only exemplary; the number of layers of the first neural network and the second neural network can also be increased. For example, the first neural network includes layers L1, L2, L3 and L4, and correspondingly the second neural network includes layers L4, L5, L6 and L7, where layers L1 and L7 are symmetrical in structure, layers L2 and L6 are symmetrical in structure, layers L3 and L5 are symmetrical in structure, and layer L4 is the output layer of the first neural network and the input layer of the second neural network.
The role of the first neural network is to encode the first feature vectors, performing feature transformation and compression of the feature dimension on the first feature vectors, with the aim of removing the noise in the first feature vectors and extracting the most essential information reflecting the operation behaviour of the first sample users; the role of the second neural network is to decode the encoded first feature vectors, restoring the first feature vectors compressed by the encoding of the first neural network.
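The symmetric encoder-decoder structure above can be sketched minimally as a forward pass in Python; the layer sizes 2m, m, m/2 follow the example in the text, while the tanh activation and random initialization are assumptions for illustration only:

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for an n_in -> n_out layer."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
m = 4
# Encoder 2m -> m -> m/2 (L1, L2, L3) mirrored by decoder m/2 -> m -> 2m.
sizes = [2 * m, m, m // 2, m, 2 * m]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

x = [rng.uniform(-1, 1) for _ in range(2 * m)]  # a first feature vector
h = x
activations = [x]
for w, b in layers:
    h = dense(h, w, b)
    activations.append(h)

code, reconstruction = activations[2], activations[4]
print(len(code), len(reconstruction))  # 2 8
```

The compressed code has m/2 dimensions while the reconstruction returns to the input dimension 2m, mirroring the compression-then-restoration role described above.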
As shown in Fig. 5, in specific implementation, the embodiment of the present application performs unsupervised pre-training on the first neural network specifically in the following manner:
S501: Input the first feature vectors into the first neural network, and obtain the encoded feature vectors output by at least one target encoding layer of the first neural network.
Optionally, the target encoding layers can be all the encoding layers, or specified ones of the encoding layers. For example, if the first neural network includes layers L1, L2, L3 and L4, and correspondingly the second neural network includes layers L4, L5, L6 and L7, where layers L1 and L7, layers L2 and L6, and layers L3 and L5 are respectively symmetrical in structure and layer L4 is the output layer of the first neural network and the input layer of the second neural network, then the target encoding layers can be all the encoding layers, i.e. layers L1, L2 and L3, or the target encoding layers can be specified encoding layers, i.e. some of the encoding layers L1, L2 and L3.
S502: Input the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtain the decoded feature vectors output by the target decoding layers of the second neural network corresponding to the target encoding layers. Here, each target decoding layer corresponds to a target encoding layer; for example, in Fig. 4, layer L1 corresponds to layer L5 and layer L2 corresponds to layer L4.
S503: Perform the current round of training on the first neural network and the second neural network according to the encoded feature vectors and the decoded feature vectors.
During one round of training, the network is trained by means of gradient descent according to the loss function of the network model. In specific implementation, optionally, the encoded feature vectors and decoded feature vectors of one first sample user are first used to obtain the loss of that first sample user, and the parameters of the first neural network and the second neural network are adjusted accordingly; the first neural network and the second neural network with adjusted parameters are then used to obtain the loss of the next first sample user, and the parameters of the first neural network and the second neural network are adjusted again, until the encoded feature vectors and decoded feature vectors of all first sample users have been used to train the first neural network and the second neural network, completing the current round of training of the first neural network and the second neural network.
Illustratively, when the target encoding layers are layers L1, L2 and L3 and the target decoding layers are layers L5, L6 and L7, where layer L1 corresponds to layer L7, layer L2 corresponds to layer L6 and layer L3 corresponds to layer L5, the loss function of the whole unsupervised network formed by the first neural network and the second neural network is shown in formula (1):

Formula (1): Loss = (1/n) * Σ_{i=1}^{n} ( ||X_L1(i) − X_L7(i)||² + γ||X_L2(i) − X_L6(i)||² + β||X_L3(i) − X_L5(i)||² )

Wherein, X_L1(i) is the encoded feature vector of the i-th first sample user at layer L1 and X_L7(i) is the decoded feature vector of that user at layer L7, and likewise for the other layer pairs; i indexes the i-th target first sample user; n is the number of first sample users; and γ and β are floating-point values between [0, 1] representing the weights of the respective pairs of intermediate target encoding layer and intermediate target decoding layer.
As shown in Fig. 6, in specific implementation, the embodiment of the present application performs the current round of training on the first neural network and the second neural network specifically in the following manner:
S601: Take any first sample user that has not yet completed the current round of training as the target first sample user, and determine the loss of the target first sample user in the current round according to the encoded feature vectors and decoded feature vectors of the target first sample user.
S602: Adjust the parameters of the first neural network and the second neural network according to the loss of the target first sample user in the current round.
S603: Take the target first sample user as a first sample user that has completed training, and take any other first sample user that has not yet completed the current round of training as the new target first sample user.
S604: Using the first neural network and the second neural network with adjusted parameters, obtain the encoded feature vectors and decoded feature vectors of the new target first sample user, and return to the step of determining the loss of the target first sample user in the current round according to the encoded feature vectors and decoded feature vectors of the target first sample user.
S605: When all first sample users have completed the current round of training, the current round of training of the first neural network and the second neural network is completed.
S504: Complete the unsupervised pre-training of the first neural network by performing multiple rounds of training on the first neural network and the second neural network.
After the first neural network is pre-trained in an unsupervised manner using a large number of unlabelled first sample users in the above way, the feature information of the many first sample users has been learned by the first neural network. Through the following step S104, the embodiment of the present application uses a small number of labelled second sample users to perform supervised training on the first neural network and the classifier, so as to obtain the trained first neural network and classifier as the fraud identification model.
S104: being input to first nerves network and classifier by pre-training for second feature vector, special based on second
The corresponding fraud markup information of vector sum is levied, to the first nerves network and classifier progress Training Jing Guo pre-training,
Obtain fraud identification model.
Wherein, the first nerves network by pre-training is used to carry out feature learning to second feature vector;Classifier is used
Classify in second feature vector.Such as can be using sigmoid function as classifier, it can be with by sigmoid function
The operation behavior for obtaining the second sample of users is the probability of fraud.
Illustratively, referring to Figure 7, the embodiment of the present application provides a schematic structural diagram of the first neural network and the classifier: layers L1, L2 and L3 form the first neural network, and layer L6 is the classifier.
In a specific implementation, referring to Figure 8, the embodiment of the present application performs supervised training on the pre-trained first neural network and the classifier in the following manner:
S801: Input the second feature vectors into the pre-trained first neural network and the classifier to obtain a fraud recognition result for each second sample user.
S802: Perform the current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud annotation information of that second sample user.
During one round of training, the network is trained by gradient descent according to the loss function of the network model. In a specific implementation, optionally, the cross-entropy loss of one second sample user is obtained first, and the parameters of the first neural network and the classifier are adjusted; the cross-entropy loss of the next second sample user is then obtained using the first neural network and the classifier with adjusted parameters, and the parameters are adjusted again; this continues until the cross-entropy losses of all second sample users have been used, which completes the current round of training of the first neural network and the classifier.
Specifically, the cross-entropy loss function of the whole supervised network formed by the first neural network and the classifier is given by formula (2):

Formula (2): L = -(1/m) Σ_{i=1}^{m} [ y_i · log σ(x_i) + (1 - y_i) · log(1 - σ(x_i)) ]

where x_i is the vector value of the i-th second sample user, y_i is the fraud annotation information of the i-th second sample user, m is the number of second sample users, and σ(x_i) is the sigmoid function. In general y_i takes the value 0 or 1; for example, y_i = 1 indicates that the fraud annotation information is a fraudulent behavior, and 0 indicates that it is a normal behavior.
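Formula (2) can be checked numerically. In this sketch x_i is treated as the scalar score fed to the sigmoid, an illustrative simplification of the network output for user i:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(xs, ys):
    """Formula (2): mean binary cross-entropy over the m labeled samples."""
    m = len(xs)
    return -sum(y * math.log(sigmoid(x)) + (1 - y) * math.log(1 - sigmoid(x))
                for x, y in zip(xs, ys)) / m

# A confident correct score gives a small loss, a confident wrong one a large loss
print(round(cross_entropy([4.0], [1]), 4))   # → 0.0181
print(round(cross_entropy([4.0], [0]), 4))   # → 4.0181
```

The asymmetry shown in the two prints is why minimizing this loss pushes the model's fraud probability toward the annotation y_i.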
In a specific implementation, referring to Figure 9, the embodiment of the present application performs the current round of supervised training on the pre-trained first neural network and the classifier in the following manner:
S901: Take any second sample user who has not yet completed training in the current round as the target second sample user, and determine the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user.
S902: Adjust the parameters of the first neural network and the classifier according to the cross-entropy loss of the target second sample user in the current round.
S903: Take the target second sample user as a second sample user whose training is completed, and take any other second sample user whose training in the current round is not yet completed as the new target second sample user.
S904: Using the first neural network and the classifier with adjusted parameters, obtain the fraud recognition result of the new target second sample user, and return to the step of determining the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user.
S905: When all second sample users have completed training in the current round, the current round of supervised training of the pre-trained first neural network and the classifier is completed.
S803: By performing multiple rounds of supervised training on the first neural network and the classifier, the fraud identification model is obtained.
Optionally, after the current round of training of the first neural network and the classifier is completed through step S905, the embodiment of the present application can decide whether to stop training by any one of the following three modes:
Mode one: detect whether the current round has reached a preset number of rounds; if so, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model.
In a specific implementation, a preset number of training rounds can be set in advance before model training. If the current round is detected to have reached the preset number of rounds, the training of the first neural network and the classifier is stopped, and the first neural network and the classifier obtained in the last round of training serve as the fraud identification model.
Mode two: verify the first neural network and the classifier obtained in the current round with a test set; if the number of test data items whose cross-entropy loss is not greater than a preset cross-entropy loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model.
During model training, the value of the cross-entropy loss is required to decrease gradually. Therefore, when the first neural network and the classifier obtained in the current round are verified with the test set, if the number of test data items whose cross-entropy loss is not greater than the preset cross-entropy loss threshold reaches a certain preset ratio, for example 90% or 95%, the training of the first neural network and the classifier is stopped, and the first neural network and the classifier obtained in the last round of training serve as the fraud identification model.
Mode three: compare the cross-entropy loss of each second sample user in the current round with the cross-entropy loss of the corresponding second sample user in the previous round; if the number of second sample users whose cross-entropy loss in the current round is greater than that in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
During model training, the value of the cross-entropy loss is required to decrease gradually, and the first neural network and the classifier obtained when the cross-entropy loss is smallest serve as the fraud identification model.
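The three stopping modes above can be sketched as one check run after each round. All threshold values here are illustrative assumptions, not values fixed by the patent:

```python
def should_stop(round_idx, losses, prev_losses, *,
                max_rounds=50, loss_threshold=0.1,
                pass_ratio=0.9, worse_ratio=0.5):
    """losses: this round's per-sample cross-entropy losses (e.g. on a test set);
    prev_losses: the previous round's losses for the same samples, or None."""
    # Mode one: a preset number of training rounds has been reached
    if round_idx >= max_rounds:
        return True
    # Mode two: enough samples already have a sufficiently small cross-entropy loss
    ok = sum(1 for l in losses if l <= loss_threshold)
    if ok / len(losses) > pass_ratio:
        return True
    # Mode three: too many samples got worse than in the previous round, so stop
    # (and in the patent, keep the previous round's model)
    if prev_losses is not None:
        worse = sum(1 for l, p in zip(losses, prev_losses) if l > p)
        if worse / len(losses) >= worse_ratio:
            return True
    return False

print(should_stop(3, [0.05, 0.08, 0.2], [0.1, 0.1, 0.1]))   # → False: keep training
print(should_stop(3, [0.05, 0.05, 0.05], [0.1, 0.1, 0.1]))  # → True: mode two fires
```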
In the fraud identification model training method provided by the embodiments of the present application, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Referring to Figure 10, the embodiment of the present application further provides a fraud recognition method, comprising:
S1001: When a user to be detected performs an operation behavior, obtain the historical operation information of the user to be detected.
S1002: According to the historical operation information of the user to be detected, construct a target feature vector capable of characterizing the operation behavior features of the user to be detected.
In a specific implementation, the target feature vector capable of characterizing the operation behavior features of the user to be detected is constructed with reference to the method of step S102 of the present application.
S1003: Input the target feature vector into the fraud identification model obtained by training with the fraud identification model training method provided by the present application, to obtain the probability that the operation behavior of the user to be detected is a fraudulent behavior.
For example, suppose that during model training the fraud annotation information used 1 to indicate a fraudulent behavior and 0 to indicate a normal behavior. When the obtained probability that the operation behavior of the user to be detected is a fraudulent behavior is greater than a preset probability threshold, the operation behavior of the user to be detected is regarded as a fraudulent behavior; when that probability is not greater than the preset probability threshold, the operation behavior of the user to be detected is regarded as a normal behavior. The preset probability threshold can take values such as 0.5 or 0.6.
When the embodiment of the present application detects that the operation behavior of the user to be detected is a fraudulent behavior, an interception operation is performed on the user's current operation behavior, and this interception information, together with all historical operation information relevant to the user recorded inside the banking channel, is saved into a dedicated database as the historical operation information of a new labeled second sample user. When it is detected that the operation behavior of the user to be detected is a normal behavior, the user's operation request is forwarded to the actual business system of the banking channel, and the operation request is processed normally.
In the fraud recognition method provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Based on the same application concept, the embodiment of the present application further provides a fraud identification model training apparatus corresponding to the fraud identification model training method. Since the principle by which the apparatus in the embodiment of the present application solves the problem is similar to that of the above fraud identification model training method of the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to Figure 11, the fraud identification model training apparatus provided by the embodiment of the present application comprises:
a first obtaining module 111, configured to obtain the historical operation information of a plurality of first sample users, and to obtain the historical operation information of a plurality of second sample users and the fraud annotation information corresponding to each second sample user;
a first building module 112, configured to construct, according to the historical operation information of the first sample users, first feature vectors capable of characterizing the operation behavior features of the first sample users, and to construct, according to the historical operation information of the second sample users, second feature vectors capable of characterizing the operation behavior features of the second sample users;
a pre-training module 113, configured to input the first feature vectors into a first neural network and a second neural network of symmetric structure and perform unsupervised pre-training on the first neural network, wherein the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors;
a training module 114, configured to input the second feature vectors into the pre-trained first neural network and a classifier, and perform supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information, to obtain the fraud identification model.
Optionally, the first building module 112 is specifically configured to: for each first sample user, determine, according to the historical operation information of the first sample user, the feature values of the first sample user under a plurality of preset operation behavior features, and construct, according to those feature values, a first feature vector capable of characterizing the operation behavior features of the first sample user; and
for each second sample user, determine, according to the historical operation information of the second sample user, the feature values of the second sample user under the plurality of preset operation behavior features, and construct, according to those feature values, a second feature vector capable of characterizing the operation behavior features of the second sample user.
Optionally, the pre-training module 113 is specifically configured to: input the first feature vectors into the first neural network and obtain the encoded feature vector output by at least one target encoding layer in the first neural network; input the encoded feature vector output by the last encoding layer of the first neural network into the second neural network and obtain the decoded feature vector output by the target decoding layer corresponding to the target encoding layer in the second neural network; perform the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector; and complete the unsupervised pre-training of the first neural network by performing multiple rounds of training on the first neural network and the second neural network.
Optionally, the pre-training module 113 is specifically configured to perform the current round of training of the first neural network and the second neural network in the following manner:
take any first sample user who has not yet completed training in the current round as the target first sample user, and determine the loss of the target first sample user in the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
adjust the parameters of the first neural network and the second neural network according to the loss of the target first sample user in the current round;
take the target first sample user as a first sample user whose training is completed, and take any other first sample user whose training in the current round is not yet completed as the new target first sample user;
using the first neural network and the second neural network with adjusted parameters, obtain the encoded feature vector and the decoded feature vector of the new target first sample user, and return to the step of determining the loss of the target first sample user in the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
when all first sample users have completed training in the current round, the current round of training of the first neural network and the second neural network is completed.
Optionally, the training module 114 is specifically configured to: input the second feature vectors into the pre-trained first neural network and the classifier to obtain the fraud recognition result of each second sample user; perform the current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud annotation information of that second sample user; and obtain the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
Optionally, the training module 114 is specifically configured to perform the current round of supervised training on the pre-trained first neural network and the classifier in the following manner:
take any second sample user who has not yet completed training in the current round as the target second sample user, and determine the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user;
adjust the parameters of the first neural network and the classifier according to the cross-entropy loss of the target second sample user in the current round;
take the target second sample user as a second sample user whose training is completed, and take any other second sample user whose training in the current round is not yet completed as the new target second sample user;
using the first neural network and the classifier with adjusted parameters, obtain the fraud recognition result of the new target second sample user, and return to the step of determining the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user;
when all second sample users have completed training in the current round, the current round of supervised training of the pre-trained first neural network and the classifier is completed.
Optionally, the training module 114 is specifically configured to, after the current round of supervised training of the pre-trained first neural network and the classifier is completed:
detect whether the current round has reached a preset number of rounds; if so, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
alternatively,
verify the first neural network and the classifier obtained in the current round with a test set; if the number of test data items whose cross-entropy loss is not greater than a preset cross-entropy loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
alternatively,
compare the cross-entropy loss of each second sample user in the current round with the cross-entropy loss of the corresponding second sample user in the previous round; if the number of second sample users whose cross-entropy loss in the current round is greater than that in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
In the fraud identification model training apparatus provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Based on the same application concept, the embodiment of the present application further provides a fraud identification apparatus corresponding to the fraud recognition method. Since the principle by which the apparatus in the embodiment of the present application solves the problem is similar to that of the above fraud recognition method of the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to Figure 12, the fraud identification apparatus provided by the embodiment of the present application comprises:
a second obtaining module 121, configured to obtain, when a user to be detected performs an operation behavior, the historical operation information of the user to be detected;
a second building module 122, configured to construct, according to the historical operation information of the user to be detected, a target feature vector capable of characterizing the operation behavior features of the user to be detected;
a fraud recognition result obtaining module 123, configured to input the target feature vector into the fraud identification model obtained by training with the fraud identification model training method provided by the present application, to obtain the probability that the operation behavior of the user to be detected is a fraudulent behavior.
In the fraud identification apparatus provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Referring to Figure 13, another embodiment of the present application further provides a fraud identification system, comprising: a timer 131, the fraud identification model training apparatus 132 provided by the present application, and the fraud identification apparatus 133 provided by the present application; the timer 131, the fraud identification model training apparatus 132 and the fraud identification apparatus 133 are connected in sequence;
the fraud identification model training apparatus 132 is configured to obtain the fraud identification model;
the timer 131 is configured to periodically trigger the fraud identification model training apparatus to re-obtain a new fraud identification model;
the fraud identification apparatus 133 is configured to obtain, according to the fraud identification model obtained by the fraud identification model training apparatus, the probability that the operation behavior of a user to be detected is a fraudulent behavior.
In the following, referring to Figure 14, the present application provides a specific embodiment illustrating the application process of the fraud identification system provided by the present application.
Here, mobile banking is taken as an example to illustrate the application process of the banking channel fraud identification system.
As can clearly be seen in Figure 14, in the anti-fraud process of the banking channel, the fraud identification system is the core module. The fraud identification system is connected to the mobile banking business system; it receives the user operation behaviors sent from the mobile banking business system and assesses the risk value of each operation behavior (that is, it performs fraud identification on the operation behavior and obtains the probability value that the operation behavior is a fraudulent behavior).
If the risk assessment result is a fraudulent operation, the risk assessment result is fed back to the mobile banking business system; the mobile banking business system can perform an interception operation according to the feedback result, and write this interception information, together with all historical operation information relevant to the user recorded inside the mobile banking system, into the corresponding mobile banking database as sample data.
If the risk assessment result is a normal operation, the user's operation behavior is simply forwarded to the next business system of mobile banking, that is, to the user's normal processing flows in the mobile banking business system.
When the banking channel database has accumulated a certain amount of training data within a period of time (this data comes from sources including online identification, user feedback, expert annotation, or black-market databases of other channels), the new training data can be sent to the banking channel fraud identification system. The timer inside the system periodically starts the model training process and updates the online fraud identification model, which ensures that the online fraud identification model stays current and improves the model recognition accuracy.
In the fraud identification system provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
The embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above fraud identification model training method are executed.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above fraud identification model training method can be executed, so that the fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data; while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
The embodiment of the present application further provides another computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the fraud recognition method in the above method embodiments are executed.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above fraud recognition method can be executed, so that the fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data; while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Corresponding to the fraud identification model training method in Figure 1, the embodiment of the present application further provides a computer device, as shown in Figure 15. The device comprises a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and runnable on the processor 2000, wherein the processor 2000, when executing the computer program, implements the steps of the above fraud identification model training method.
Specifically, the memory 1000 and the processor 2000 can be a general memory and processor, which are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, the above fraud identification model training method can be executed, so that the fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data; while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Corresponding to the fraud recognition method in Fig. 10, an embodiment of the present application further provides another computer device. As shown in Fig. 16, the device includes a memory 3000, a processor 4000, and a computer program stored in the memory 3000 and runnable on the processor 4000, wherein the processor 4000 implements the steps of the above fraud recognition method when executing the computer program.
Specifically, the memory 3000 and the processor 4000 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 4000 runs the computer program stored in the memory 3000, the above fraud recognition method can be executed, so that a fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data. This reduces the workload of manually labeling sample data during model training, while improving both the efficiency of model training and the recognition accuracy of the fraud identification model.
The computer program products of the fraud identification model training method, the fraud recognition method, and the apparatuses provided by the embodiments of the present application include a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the methods described in the preceding method embodiments; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
In all examples shown and described herein, any specific value should be interpreted as merely illustrative rather than limiting; other examples of the exemplary embodiments may therefore have different values.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely exemplary.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the embodiments described above are merely specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may still, within the technical scope disclosed by the present application, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A fraud identification model training method, characterized in that it comprises:
obtaining historical operation information of a plurality of first sample users, and obtaining historical operation information of a plurality of second sample users together with fraud label information corresponding to each second sample user;
constructing, according to the historical operation information of the first sample users, first feature vectors capable of characterizing the operation behavior features of the first sample users, and constructing, according to the historical operation information of the second sample users, second feature vectors capable of characterizing the operation behavior features of the second sample users;
inputting the first feature vectors into a first neural network and a second neural network of symmetric structure to perform unsupervised pre-training on the first neural network, wherein the first neural network is configured to encode the first feature vectors, and the second neural network is configured to decode the encoded first feature vectors;
inputting the second feature vectors into the pre-trained first neural network and a classifier, and performing supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud label information, to obtain a fraud identification model.
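As a rough illustration only (not part of the claims), the two-phase scheme of claim 1 can be sketched as a linear autoencoder pre-trained on many unlabeled feature vectors, followed by a logistic classifier fine-tuned on a few labeled ones. All data, layer sizes, and learning rates below are invented; for brevity only the classifier is updated in the second phase, whereas the claim also updates the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(200, 8))   # many first-sample feature vectors
X_labeled = rng.normal(size=(20, 8))      # few second-sample feature vectors
y_labeled = (X_labeled[:, 0] > 0).astype(float)  # toy fraud labels

W_enc = rng.normal(scale=0.1, size=(8, 4))  # "first network": encoder
W_dec = rng.normal(scale=0.1, size=(4, 8))  # "second network": mirrored decoder

# Phase 1: unsupervised pre-training -- reconstruct the unlabeled vectors.
for _ in range(500):
    H = X_unlabeled @ W_enc
    err = H @ W_dec - X_unlabeled            # reconstruction error
    W_dec -= 0.05 * H.T @ err / len(X_unlabeled)
    W_enc -= 0.05 * X_unlabeled.T @ (err @ W_dec.T) / len(X_unlabeled)

# Phase 2: supervised fine-tuning -- a logistic classifier on top of the
# (here frozen) pre-trained encoder, driven by the fraud labels.
w_clf = np.zeros(4)
for _ in range(500):
    z = X_labeled @ W_enc
    p = 1.0 / (1.0 + np.exp(-(z @ w_clf)))
    w_clf -= 0.5 * z.T @ (p - y_labeled) / len(X_labeled)

recon_loss = float(np.mean((X_unlabeled @ W_enc @ W_dec - X_unlabeled) ** 2))
```

After pre-training, the reconstruction loss is well below the trivial all-zero baseline, and `p` holds the classifier's fraud probabilities for the labeled users.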
2. The method according to claim 1, wherein constructing the first feature vectors capable of characterizing the operation behavior features of the first sample users according to the historical operation information of the first sample users, and constructing the second feature vectors capable of characterizing the operation behavior features of the second sample users according to the historical operation information of the second sample users, comprises:
for each first sample user, determining, according to the historical operation information of the first sample user, feature values of the first sample user under a plurality of preset operation behavior features;
constructing, according to the feature values of the first sample user under the plurality of preset operation behavior features, a first feature vector capable of characterizing the operation behavior features of the first sample user; and
for each second sample user, determining, according to the historical operation information of the second sample user, feature values of the second sample user under the plurality of preset operation behavior features;
constructing, according to the feature values of the second sample user under the plurality of preset operation behavior features, a second feature vector capable of characterizing the operation behavior features of the second sample user.
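For illustration, the feature construction of claim 2 amounts to evaluating each user's history under a fixed set of preset behavior features. The concrete features below (operation count, night-time ratio, transfer count) are hypothetical examples; the patent leaves the feature set open.

```python
def build_feature_vector(history):
    """history: list of (action, hour_of_day) tuples from a user's records."""
    n = len(history)
    # feature values under three assumed preset operation-behavior features
    night_ratio = sum(1 for _, h in history if h < 6) / max(n, 1)
    transfers = sum(1 for a, _ in history if a == "transfer")
    return [float(n), night_ratio, float(transfers)]
```

The same vectorizer is applied to both first and second sample users, so the unlabeled and labeled vectors live in the same feature space.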
3. The method according to claim 1, wherein inputting the first feature vectors into the first neural network and the second neural network of symmetric structure to perform unsupervised pre-training on the first neural network comprises:
inputting the first feature vectors into the first neural network, and obtaining encoded feature vectors output by at least one target encoding layer in the first neural network;
inputting the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtaining decoded feature vectors output by target decoding layers in the second neural network corresponding to the target encoding layers;
performing a current round of training on the first neural network and the second neural network according to the encoded feature vectors and the decoded feature vectors; and
completing the unsupervised pre-training of the first neural network by performing multiple rounds of training on the first neural network and the second neural network.
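A minimal forward pass can show the layer matching of claim 3: each target encoding layer is paired with the symmetric decoding layer, and the round's loss compares each matched pair. The two-layer sizes and the tanh nonlinearity below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=6)                      # one first feature vector
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(4, 2))  # encoder weights
V2, V1 = rng.normal(size=(2, 4)), rng.normal(size=(4, 6))  # mirrored decoder

h1 = np.tanh(x @ W1)    # target encoding layer
h2 = np.tanh(h1 @ W2)   # last encoding layer, fed into the second network
d2 = np.tanh(h2 @ V2)   # decoding layer matched with the target encoding layer
d1 = d2 @ V1            # final decoded vector matched with the input

# loss for this round: one term per matched encode/decode pair
loss = float(np.mean((h1 - d2) ** 2) + np.mean((x - d1) ** 2))
```

Minimizing this summed per-layer reconstruction loss over many rounds constitutes the unsupervised pre-training of the first network.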
4. The method according to claim 3, wherein performing the current round of training on the first neural network and the second neural network according to the encoded feature vectors and the decoded feature vectors comprises:
taking any first sample user that has not yet completed the current round of training as a target first sample user, and determining the loss of the target first sample user for the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
adjusting the parameters of the first neural network and the second neural network according to the loss of the target first sample user for the current round;
taking the target first sample user as a first sample user that has completed training, and taking any other first sample user that has not yet completed the current round of training as a new target first sample user;
obtaining the encoded feature vector and the decoded feature vector of the new target first sample user by using the first neural network and the second neural network whose parameters have been adjusted, and returning to the step of determining the loss of the target first sample user for the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
until all first sample users have completed the current round of training, whereupon the current round of training of the first neural network and the second neural network is completed.
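Structurally, the round of claim 4 is per-user stochastic training: users are taken one at a time, the loss is computed from that user's encoded and decoded vectors, and the parameters are adjusted before the next user is processed. The encode/decode/update callables below are stand-ins, not the patent's networks.

```python
def run_round(users, encode, decode, update):
    pending = list(users)
    while pending:                      # until every user finishes this round
        u = pending.pop(0)              # current target first sample user
        decoded = decode(encode(u))
        loss = sum((a - b) ** 2 for a, b in zip(u, decoded))
        update(loss)                    # adjust both networks' parameters

state = {"scale": 0.5, "updates": 0}
encode = lambda v: [state["scale"] * x for x in v]
decode = lambda h: [x / state["scale"] for x in h]  # exact inverse here

def update(loss):
    state["updates"] += 1               # placeholder parameter adjustment

run_round([[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]], encode, decode, update)
```

One parameter update occurs per user, so a round over three users performs three updates.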
5. The method according to claim 1, wherein inputting the second feature vectors into the pre-trained first neural network and the classifier, and performing supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud label information to obtain the fraud identification model, comprises:
inputting the second feature vectors into the pre-trained first neural network and the classifier to obtain fraud recognition results for the second sample users;
performing a current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud label information of that second sample user; and
obtaining the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
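The supervised rounds of claim 5 can be illustrated with per-user cross-entropy updates of a logistic classifier over encoded features. The encoded vectors and labels below are invented toy values, and only the classifier weights are trained here for brevity.

```python
import math

Z = [[0.2, 1.1], [1.3, -0.4], [-0.9, 0.8], [0.8, 0.4]]  # toy encoded features
y = [1, 0, 1, 0]                      # 1 = labelled as fraud

w = [0.0, 0.0]
for _ in range(200):                  # multiple supervised training rounds
    for z, t in zip(Z, y):            # one second sample user at a time
        p = 1 / (1 + math.exp(-(w[0] * z[0] + w[1] * z[1])))
        g = p - t                     # cross-entropy gradient for this user
        w[0] -= 0.5 * g * z[0]
        w[1] -= 0.5 * g * z[1]

preds = [1 / (1 + math.exp(-(w[0] * z[0] + w[1] * z[1]))) > 0.5 for z in Z]
```

On this linearly separable toy data, the per-user updates drive the classifier to reproduce the fraud labels.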
6. The method according to claim 5, wherein performing the current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud label information of that second sample user comprises:
taking any second sample user that has not yet completed the current round of training as a target second sample user, and determining the cross-entropy loss of the target second sample user for the current round according to the fraud recognition result and the fraud label information of the target second sample user;
adjusting the parameters of the first neural network and the classifier according to the cross-entropy loss of the target second sample user for the current round;
taking the target second sample user as a second sample user that has completed training, and taking any other second sample user that has not yet completed the current round of training as a new target second sample user;
obtaining the fraud recognition result of the new target second sample user by using the first neural network and the classifier whose parameters have been adjusted, and returning to the step of determining the cross-entropy loss of the target second sample user for the current round according to the fraud recognition result and the fraud label information of the target second sample user;
until all second sample users have completed the current round of training, whereupon the current round of supervised training of the pre-trained first neural network and the classifier is completed.
7. The method according to claim 6, wherein, after completing the current round of supervised training on the pre-trained first neural network and the classifier, the method further comprises:
detecting whether the current round reaches a preset number of rounds; if so, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or,
verifying the first neural network and the classifier obtained in the current round using a test set; if the number of test data items in the test set whose cross-entropy loss does not exceed a preset cross-entropy loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or,
comparing, in turn, the cross-entropy loss of each second sample user in the current round with the cross-entropy loss of the corresponding second sample user in the previous round; if the number of second sample users whose cross-entropy loss in the current round is greater than their cross-entropy loss in the previous round, as a percentage of the total number of second sample users, reaches a preset second percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
8. A fraud recognition method, characterized in that it comprises:
obtaining historical operation information of a user to be detected when the user to be detected performs an operation behavior;
constructing, according to the historical operation information of the user to be detected, a target feature vector capable of characterizing the operation behavior features of the user to be detected; and
inputting the target feature vector into a fraud identification model trained by the fraud identification model training method according to any one of claims 1 to 7, to obtain the probability that the operation behavior of the user to be detected is fraudulent.
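End-to-end inference per claim 8 is feature construction followed by a forward pass through the trained model. The feature extraction, encoder rows, and classifier weights below are invented placeholders, not values from the patent.

```python
import math

def fraud_probability(history, enc_rows, clf_w):
    # hypothetical target feature vector: operation count and transfer count
    feats = [float(len(history)),
             float(sum(1 for a in history if a == "transfer"))]
    # encoder (as a weight matrix) followed by a linear classifier
    z = [sum(w * f for w, f in zip(row, feats)) for row in enc_rows]
    s = sum(w * v for w, v in zip(clf_w, z))
    return 1 / (1 + math.exp(-s))       # probability the behavior is fraud

p = fraud_probability(["login", "transfer"],
                      [[0.1, 0.2], [0.0, 0.1]], [1.0, -1.0])
```

The output is a probability in (0, 1) that can be thresholded or ranked by downstream risk controls.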
9. A fraud identification model training apparatus, characterized in that it comprises:
a first obtaining module, configured to obtain historical operation information of a plurality of first sample users, and to obtain historical operation information of a plurality of second sample users together with fraud label information corresponding to each second sample user;
a first construction module, configured to construct, according to the historical operation information of the first sample users, first feature vectors capable of characterizing the operation behavior features of the first sample users, and to construct, according to the historical operation information of the second sample users, second feature vectors capable of characterizing the operation behavior features of the second sample users;
a pre-training module, configured to input the first feature vectors into a first neural network and a second neural network of symmetric structure to perform unsupervised pre-training on the first neural network, wherein the first neural network is configured to encode the first feature vectors and the second neural network is configured to decode the encoded first feature vectors; and
a supervised training module, configured to input the second feature vectors into the pre-trained first neural network and a classifier, and to perform supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud label information, to obtain a fraud identification model.
10. A fraud identification apparatus, characterized in that it comprises:
a second obtaining module, configured to obtain historical operation information of a user to be detected when the user to be detected performs an operation behavior;
a second construction module, configured to construct, according to the historical operation information of the user to be detected, a target feature vector capable of characterizing the operation behavior features of the user to be detected; and
a fraud recognition result obtaining module, configured to input the target feature vector into a fraud identification model trained by the fraud identification model training method according to any one of claims 1 to 7, to obtain the probability that the operation behavior of the user to be detected is fraudulent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811432681.3A CN109544190A (en) | 2018-11-28 | 2018-11-28 | A kind of fraud identification model training method, fraud recognition methods and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109544190A true CN109544190A (en) | 2019-03-29 |
Family
ID=65850677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811432681.3A Pending CN109544190A (en) | 2018-11-28 | 2018-11-28 | A kind of fraud identification model training method, fraud recognition methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109544190A (en) |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245302A (en) * | 2019-05-24 | 2019-09-17 | 阿里巴巴集团控股有限公司 | The strategy-generating method and device and electronic equipment of fraud case for identification |
CN110245302B (en) * | 2019-05-24 | 2023-08-08 | 创新先进技术有限公司 | Policy generation method and device for identifying fraudulent cases and electronic equipment |
CN110348190A (en) * | 2019-06-29 | 2019-10-18 | 上海淇毓信息科技有限公司 | User equipment ownership judgment method and device based on user's operation behavior |
CN110427971A (en) * | 2019-07-05 | 2019-11-08 | 五八有限公司 | Recognition methods, device, server and the storage medium of user and IP |
CN110503198A (en) * | 2019-07-23 | 2019-11-26 | 平安科技(深圳)有限公司 | Obtain method, apparatus, equipment and the storage medium of neural network test report |
CN110597984B (en) * | 2019-08-12 | 2022-05-20 | 大箴(杭州)科技有限公司 | Method and device for determining abnormal behavior user information, storage medium and terminal |
CN110597984A (en) * | 2019-08-12 | 2019-12-20 | 大箴(杭州)科技有限公司 | Method and device for determining abnormal behavior user information, storage medium and terminal |
CN110705585A (en) * | 2019-08-22 | 2020-01-17 | 深圳壹账通智能科技有限公司 | Network fraud identification method and device, computer device and storage medium |
CN110738396A (en) * | 2019-09-18 | 2020-01-31 | 阿里巴巴集团控股有限公司 | method, device and equipment for extracting characteristics of equipment |
CN110880117A (en) * | 2019-10-31 | 2020-03-13 | 北京三快在线科技有限公司 | False service identification method, device, equipment and storage medium |
CN111126481A (en) * | 2019-12-20 | 2020-05-08 | 湖南千视通信息科技有限公司 | Training method and device of neural network model |
CN111222026A (en) * | 2020-01-09 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Training method of user category identification model and user category identification method |
CN111222026B (en) * | 2020-01-09 | 2023-07-14 | 支付宝(杭州)信息技术有限公司 | Training method of user category recognition model and user category recognition method |
WO2021159775A1 (en) * | 2020-02-11 | 2021-08-19 | 腾讯科技(深圳)有限公司 | Training method and device for audio separation network, audio separation method and device, and medium |
CN111275546A (en) * | 2020-02-24 | 2020-06-12 | 中国工商银行股份有限公司 | Financial client fraud risk identification method and device |
CN111275546B (en) * | 2020-02-24 | 2023-08-18 | 中国工商银行股份有限公司 | Financial customer fraud risk identification method and device |
CN111539309A (en) * | 2020-04-21 | 2020-08-14 | 广州云从鼎望科技有限公司 | Data processing method, system, platform, equipment and medium based on OCR |
CN111641608A (en) * | 2020-05-18 | 2020-09-08 | 咪咕动漫有限公司 | Abnormal user identification method and device, electronic equipment and storage medium |
CN111881991B (en) * | 2020-08-03 | 2023-11-10 | 联仁健康医疗大数据科技股份有限公司 | Method and device for identifying fraud and electronic equipment |
CN111881991A (en) * | 2020-08-03 | 2020-11-03 | 联仁健康医疗大数据科技股份有限公司 | Method and device for identifying fraud and electronic equipment |
CN112417293A (en) * | 2020-12-03 | 2021-02-26 | 京东数字科技控股股份有限公司 | Information pushing method and system, model training method and related equipment |
CN112634026A (en) * | 2020-12-30 | 2021-04-09 | 四川新网银行股份有限公司 | Credit fraud identification method based on user page operation behavior |
CN113160233A (en) * | 2021-04-02 | 2021-07-23 | 易普森智慧健康科技(深圳)有限公司 | Method for training example segmentation neural network model by using sparse labeled data set |
CN112733045A (en) * | 2021-04-06 | 2021-04-30 | 北京轻松筹信息技术有限公司 | User behavior analysis method and device and electronic equipment |
CN112733045B (en) * | 2021-04-06 | 2021-06-22 | 北京轻松筹信息技术有限公司 | User behavior analysis method and device and electronic equipment |
CN112967134B (en) * | 2021-05-19 | 2021-09-21 | 北京轻松筹信息技术有限公司 | Network training method, risk user identification method, device, equipment and medium |
CN112967134A (en) * | 2021-05-19 | 2021-06-15 | 北京轻松筹信息技术有限公司 | Network training method, risk user identification method, device, equipment and medium |
CN113256434A (en) * | 2021-06-08 | 2021-08-13 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for recognizing vehicle insurance claim settlement behaviors |
CN113837303A (en) * | 2021-09-29 | 2021-12-24 | 中国联合网络通信集团有限公司 | Black product user identification method, TEE node and computer readable storage medium |
CN114143786A (en) * | 2021-11-29 | 2022-03-04 | 爱浦路网络技术(北京)有限公司 | User identification method, system, device and storage medium based on 5G |
WO2024020773A1 (en) * | 2022-07-26 | 2024-02-01 | 江苏树实科技有限公司 | Model generation method, image classification method, controller, and electronic device |
CN115550506A (en) * | 2022-09-27 | 2022-12-30 | 中国电信股份有限公司 | Training of user recognition model, user recognition method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190329