CN109544190A - Fraud identification model training method, fraud identification method, and apparatus - Google Patents
- Publication number
- CN109544190A (Application CN201811432681.3A)
- Authority
- CN
- China
- Prior art keywords
- sample
- training
- users
- fraud
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application provides a fraud identification model training method, a fraud identification method, and corresponding apparatuses. The training method includes: based on the acquired historical operation information of first sample users and second sample users, and the fraud annotation information of the second sample users, constructing first feature vectors for the first sample users and second feature vectors for the second sample users; performing unsupervised pre-training of a first neural network using the first feature vectors; and performing supervised training of the pre-trained first neural network and a classifier using the second feature vectors and the corresponding fraud annotation information, thereby obtaining a fraud identification model. Embodiments of the present application can train a fraud identification model from a large amount of unlabeled sample data together with a small amount of labeled sample data, which reduces the manual labeling workload during model training while improving both the efficiency of model training and the recognition accuracy of the fraud identification model.
Description
Technical field
This application relates to the field of machine learning, and in particular to a fraud identification model training method, a fraud identification method, and corresponding apparatuses.
Background art
With the rapid development of the Internet and the spread of smart terminals, people enjoy great convenience when remotely handling balance inquiries, transfers, shopping payments, wealth-management purchases, and other business through electronic banking channels. At any time and in any place, without visiting a bank counter, a user can complete remittances and transfers, conversion between demand and time deposits, statement and transaction-detail inquiries, credit-card repayment, wealth-management and fund purchases, utility-bill payments, and many other financial services with a few simple gestures, greatly improving efficiency. However, while electronic banking provides convenient services for users, it also introduces many security risks.
Surveys indicate that cybercrime causes up to 445 billion US dollars of economic loss worldwide every year, is becoming increasingly sophisticated, and is spreading into different industries. In China, the underground industry built around online fraud is estimated to exceed 110 billion yuan in scale, with more than 1.6 million practitioners. According to data released by the Internet Society of China, 63.4% of netizens have had call records, online shopping records, and similar information leaked, and 78.2% have had personally identifiable information leaked. Fraudsters use stolen victim information to carry out brute-force attacks, account-information theft and misuse, and theft or diversion of funds; the leakage of personal information makes precisely targeted scams possible, and the amount stolen per scam keeps rising. Fraud has evolved from isolated individual acts into a well-organized underground industry with professional division of labor, posing a severe challenge to banks developing online financial services.
To enhance the security of electronic banking, the prior art trains machine-learning models using traditional supervised training methods. Supervised training, however, requires labeled samples, and the labeling work is done entirely by hand, which is time-consuming and laborious. If, instead, only a small amount of labeled sample data is used to train the model, the scarcity of sample data leads to low recognition accuracy of the resulting fraud identification model.
Summary of the invention
In view of this, embodiments of the present application aim to provide a fraud identification model training method, a fraud identification method, and corresponding apparatuses that can train a fraud identification model from a large amount of unlabeled sample data and a small amount of labeled sample data, reducing the manual labeling workload during model training while improving the efficiency of model training and the recognition accuracy of the fraud identification model.
In a first aspect, an embodiment of the present application provides a fraud identification model training method, comprising:
acquiring historical operation information of a plurality of first sample users; and acquiring historical operation information of a plurality of second sample users together with fraud annotation information corresponding to each second sample user;
constructing, from the historical operation information of the first sample users, first feature vectors that characterize the first sample users' operation-behaviour features; and constructing, from the historical operation information of the second sample users, second feature vectors that characterize the second sample users' operation-behaviour features;
inputting the first feature vectors into a first neural network and a second neural network of symmetric structure, to perform unsupervised pre-training of the first neural network; wherein the first neural network encodes the first feature vectors, and the second neural network decodes the encoded first feature vectors;
inputting the second feature vectors into the pre-trained first neural network and a classifier, and performing supervised training of the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information, to obtain a fraud identification model.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation of the first aspect, wherein constructing the first feature vectors and the second feature vectors comprises:
for each first sample user, determining, from that user's historical operation information, the user's feature values under a plurality of preset operation-behaviour features;
constructing, from the first sample user's feature values under the plurality of preset operation-behaviour features, a first feature vector that characterizes that user's operation-behaviour features; and
for each second sample user, determining, from that user's historical operation information, the user's feature values under the plurality of preset operation-behaviour features;
constructing, from the second sample user's feature values under the plurality of preset operation-behaviour features, a second feature vector that characterizes that user's operation-behaviour features.
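The per-user construction above can be sketched as follows. This is a minimal illustration in Python; the preset feature names and the dictionary-based history format are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Hypothetical preset operation-behaviour features (names are illustrative only).
PRESET_FEATURES = ["login_count", "transfer_count", "avg_transfer_amount",
                   "night_login_ratio", "new_device_ratio"]

def build_feature_vector(history, preset_features=PRESET_FEATURES):
    """Determine the user's feature value under each preset operation-behaviour
    feature from the historical operation information (0.0 when the history
    contains no such record), and stack the values into one fixed-length vector."""
    return np.array([float(history.get(f, 0.0)) for f in preset_features])

# One unlabeled (first-sample) user and one labeled (second-sample) user:
first_vec = build_feature_vector({"login_count": 12, "transfer_count": 3})
second_vec = build_feature_vector({"login_count": 2, "avg_transfer_amount": 9500.0,
                                   "new_device_ratio": 1.0})
```

Both vectors share the same fixed length, so the same networks can consume unlabeled and labeled samples alike.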
With reference to the first aspect, an embodiment of the present application provides a second possible implementation of the first aspect, wherein inputting the first feature vectors into the first neural network and the second neural network of symmetric structure to perform unsupervised pre-training of the first neural network comprises:
inputting the first feature vectors into the first neural network, and obtaining the encoded feature vector output by at least one target encoding layer of the first neural network;
inputting the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtaining the decoded feature vector output by the target decoding layer of the second neural network that corresponds to the target encoding layer;
performing the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector; and
completing the unsupervised pre-training of the first neural network by performing multiple rounds of such training on the first neural network and the second neural network.
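The encode-then-decode pre-training above is, in effect, an autoencoder with a mirrored encoder and decoder. The following is a toy NumPy sketch under stated assumptions (one encoding layer, one decoding layer, random stand-in data, mean-squared reconstruction loss); all sizes and hyperparameters are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_code = 5, 2
W_enc = rng.normal(0, 0.1, (d_in, d_code))   # "first neural network": one encoding layer
W_dec = rng.normal(0, 0.1, (d_code, d_in))   # "second neural network": mirrored decoding layer

def forward(X):
    code = np.tanh(X @ W_enc)                # encoded feature vectors
    return code, code @ W_dec                # decoded feature vectors

X = rng.normal(0, 1, (64, d_in))             # stand-in for the first feature vectors
lr, losses = 0.05, []
for _ in range(200):                         # multiple rounds of unsupervised training
    code, X_hat = forward(X)
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))  # mean-squared reconstruction loss
    g_out = 2 * err / err.size               # gradient of the loss w.r.t. X_hat
    g_code = (g_out @ W_dec.T) * (1 - code ** 2)   # backprop through decoder and tanh
    W_dec -= lr * (code.T @ g_out)           # adjust decoder parameters
    W_enc -= lr * (X.T @ g_code)             # adjust encoder parameters
```

After pre-training, only the encoder (`W_enc`) is carried forward to the supervised stage; the decoder exists solely to supply the reconstruction signal.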
With reference to the second possible implementation of the first aspect, an embodiment of the present application provides a third possible implementation of the first aspect, wherein performing the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector comprises:
taking any first sample user that has not yet completed the current round of training as the target first sample user, and determining the target first sample user's loss in the current round from that user's encoded feature vector and decoded feature vector;
adjusting the parameters of the first neural network and the second neural network according to the target first sample user's loss in the current round;
marking the target first sample user as having completed training, taking any other first sample user that has not yet completed the current round of training as the new target first sample user, obtaining the new target first sample user's encoded and decoded feature vectors using the parameter-adjusted first and second neural networks, and returning to the step of determining the target first sample user's loss in the current round;
until all first sample users have completed the current round of training, at which point the current round of training of the first neural network and the second neural network is complete.
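The round structure above — pick an untrained target user, compute that user's loss, adjust the parameters, then move to the next user until the round is done — is per-sample (batch-size-1) gradient descent over one epoch. A toy sketch, with the two networks collapsed into a single linear map purely for brevity (sizes, seed, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

d = 4
W = rng.normal(0, 0.1, (d, d))                     # stand-in for the networks' parameters
users = [rng.normal(0, 1, d) for _ in range(10)]   # first sample users' feature vectors

def one_round(W, users, lr=0.05):
    """One round: each not-yet-trained user becomes the target in turn; that
    user's reconstruction loss is computed and the parameters are adjusted
    from it before moving on to the next user."""
    per_user_loss = []
    for x in users:
        err = x @ W - x                      # decoded minus original feature vector
        per_user_loss.append(float(err @ err))
        W = W - lr * np.outer(x, 2 * err)    # adjust parameters from this user's loss
    return W, per_user_loss

W, losses_round1 = one_round(W, users)
W, losses_round2 = one_round(W, users)       # a later round over the same users
```

Across rounds, the per-user losses shrink as the parameters converge, which is what terminating after "multiple rounds" relies on.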
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation of the first aspect, wherein inputting the second feature vectors into the pre-trained first neural network and the classifier, and performing supervised training of the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information to obtain the fraud identification model, comprises:
inputting the second feature vectors into the pre-trained first neural network and the classifier, and obtaining a fraud recognition result for each second sample user;
performing the current round of supervised training of the pre-trained first neural network and the classifier according to each second sample user's fraud recognition result and that user's fraud annotation information; and
obtaining the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
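The supervised stage above can be sketched as an encoder (standing in for the pre-trained first neural network) feeding a logistic classifier, with both updated from the cross-entropy between the predicted fraud probability and the fraud label. This is a toy NumPy version on synthetic data; the encoder here is randomly initialized rather than actually pre-trained, and all sizes, data, and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_code, n = 5, 3, 200
W_enc = rng.normal(0, 0.3, (d_in, d_code))   # stand-in for the pre-trained encoder
w_clf = np.zeros(d_code)                     # classifier weights

X = rng.normal(0, 1, (n, d_in))              # synthetic second feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # synthetic fraud annotation (labels)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, ce_losses = 0.5, []
for _ in range(300):                         # multiple rounds of supervised training
    H = np.tanh(X @ W_enc)                   # encoder output
    p = sigmoid(H @ w_clf)                   # fraud recognition result (probability)
    ce_losses.append(float(-np.mean(y * np.log(p + 1e-9)
                                    + (1 - y) * np.log(1 - p + 1e-9))))
    g = (p - y) / n                          # grad of mean cross-entropy w.r.t. logits
    dH = np.outer(g, w_clf) * (1 - H ** 2)   # backprop through classifier and tanh
    w_clf -= lr * (H.T @ g)                  # adjust classifier parameters
    W_enc -= lr * (X.T @ dH)                 # fine-tune the encoder parameters

accuracy = float(np.mean((p > 0.5) == y))
```

Because the encoder parameters are also updated, the supervised stage both trains the classifier and fine-tunes the pre-trained network, as the summary describes.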
With reference to the fourth possible implementation of the first aspect, an embodiment of the present application provides a fifth possible implementation of the first aspect, wherein performing the current round of supervised training of the pre-trained first neural network and the classifier according to each second sample user's fraud recognition result and fraud annotation information comprises:
taking any second sample user that has not yet completed the current round of training as the target second sample user, and determining the target second sample user's cross-entropy loss in the current round from that user's fraud recognition result and fraud annotation information;
adjusting the parameters of the first neural network and the classifier according to the target second sample user's cross-entropy loss in the current round;
marking the target second sample user as having completed training, taking any other second sample user that has not yet completed the current round of training as the new target second sample user, obtaining the new target second sample user's fraud recognition result using the parameter-adjusted first neural network and classifier, and returning to the step of determining the target second sample user's cross-entropy loss in the current round;
until all second sample users have completed the current round of training, at which point the current round of supervised training of the pre-trained first neural network and the classifier is complete.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present application provides a sixth possible implementation of the first aspect, further comprising, after completing the current round of supervised training of the pre-trained first neural network and the classifier:
detecting whether the current round has reached a preset number of rounds; if so, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, validating the first neural network and the classifier obtained in the current round on a test set; if the number of test data items whose cross-entropy loss does not exceed a preset cross-entropy-loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, comparing each second sample user's cross-entropy loss in the current round with that second sample user's cross-entropy loss in the previous round; if the number of second sample users whose cross-entropy loss in the current round exceeds their cross-entropy loss in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
In a second aspect, an embodiment of the present application provides a fraud identification method, comprising:
when an operation behaviour of a user to be detected occurs, acquiring the historical operation information of the user to be detected;
constructing, from the historical operation information of the user to be detected, a target feature vector that characterizes the operation-behaviour features of the user to be detected; and
inputting the target feature vector into a fraud identification model trained with the fraud identification model training method of the first aspect or any one of its first to sixth possible implementations, and obtaining the probability that the operation behaviour of the user to be detected is fraudulent.
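The identification step above is a single forward pass: the target feature vector goes through the trained first neural network (an encoder) and the classifier, producing a fraud probability. A minimal sketch; the tanh encoder, sigmoid classifier, sizes, and the random placeholder weights are all illustrative assumptions standing in for the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
W_enc = rng.normal(0, 0.3, (5, 3))   # placeholder for the trained encoder
w_clf = rng.normal(0, 0.3, 3)        # placeholder for the trained classifier

def fraud_probability(target_vec, W_enc, w_clf):
    """Forward pass: encode the target feature vector, then map the code to
    the probability that the detected operation behaviour is fraudulent."""
    h = np.tanh(target_vec @ W_enc)
    return float(1.0 / (1.0 + np.exp(-(h @ w_clf))))

p = fraud_probability(np.array([1.0, 0.0, 2.0, 0.5, 0.0]), W_enc, w_clf)
```

A deployment would compare `p` against a decision threshold chosen for the desired trade-off between missed fraud and false alarms.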
In a third aspect, an embodiment of the present application provides a fraud identification model training apparatus, comprising:
a first acquisition module, configured to acquire historical operation information of a plurality of first sample users, and to acquire historical operation information of a plurality of second sample users together with fraud annotation information corresponding to each second sample user;
a first construction module, configured to construct, from the historical operation information of the first sample users, first feature vectors that characterize the first sample users' operation-behaviour features, and to construct, from the historical operation information of the second sample users, second feature vectors that characterize the second sample users' operation-behaviour features;
a pre-training module, configured to input the first feature vectors into a first neural network and a second neural network of symmetric structure and perform unsupervised pre-training of the first neural network, wherein the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors; and
a training module, configured to input the second feature vectors into the pre-trained first neural network and a classifier, and to perform supervised training of the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information, to obtain a fraud identification model.
In conjunction with the third aspect, an embodiment of the present application provides a first possible implementation of the third aspect, wherein the first construction module is specifically configured to:
for each first sample user, determine, from that user's historical operation information, the user's feature values under a plurality of preset operation-behaviour features;
construct, from the first sample user's feature values under the plurality of preset operation-behaviour features, a first feature vector that characterizes that user's operation-behaviour features; and
for each second sample user, determine, from that user's historical operation information, the user's feature values under the plurality of preset operation-behaviour features;
construct, from the second sample user's feature values under the plurality of preset operation-behaviour features, a second feature vector that characterizes that user's operation-behaviour features.
In conjunction with the third aspect, an embodiment of the present application provides a second possible implementation of the third aspect, wherein the pre-training module is specifically configured to:
input the first feature vectors into the first neural network, and obtain the encoded feature vector output by at least one target encoding layer of the first neural network;
input the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtain the decoded feature vector output by the target decoding layer of the second neural network that corresponds to the target encoding layer;
perform the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector; and
complete the unsupervised pre-training of the first neural network by performing multiple rounds of such training on the first neural network and the second neural network.
In conjunction with the second possible implementation of the third aspect, an embodiment of the present application provides a third possible implementation of the third aspect, wherein the pre-training module is specifically configured to perform the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector in the following manner:
take any first sample user that has not yet completed the current round of training as the target first sample user, and determine the target first sample user's loss in the current round from that user's encoded feature vector and decoded feature vector;
adjust the parameters of the first neural network and the second neural network according to the target first sample user's loss in the current round;
mark the target first sample user as having completed training, take any other first sample user that has not yet completed the current round of training as the new target first sample user, obtain the new target first sample user's encoded and decoded feature vectors using the parameter-adjusted first and second neural networks, and return to the step of determining the target first sample user's loss in the current round;
until all first sample users have completed the current round of training, at which point the current round of training of the first neural network and the second neural network is complete.
In conjunction with the third aspect, an embodiment of the present application provides a fourth possible implementation of the third aspect, wherein the training module is specifically configured to:
input the second feature vectors into the pre-trained first neural network and the classifier, and obtain a fraud recognition result for each second sample user;
perform the current round of supervised training of the pre-trained first neural network and the classifier according to each second sample user's fraud recognition result and that user's fraud annotation information; and
obtain the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
In conjunction with the fourth possible implementation of the third aspect, an embodiment of the present application provides a fifth possible implementation of the third aspect, wherein the training module is specifically configured to perform the current round of supervised training of the pre-trained first neural network and the classifier, according to each second sample user's fraud recognition result and fraud annotation information, in the following manner:
take any second sample user that has not yet completed the current round of training as the target second sample user, and determine the target second sample user's cross-entropy loss in the current round from that user's fraud recognition result and fraud annotation information;
adjust the parameters of the first neural network and the classifier according to the target second sample user's cross-entropy loss in the current round;
mark the target second sample user as having completed training, take any other second sample user that has not yet completed the current round of training as the new target second sample user, obtain the new target second sample user's fraud recognition result using the parameter-adjusted first neural network and classifier, and return to the step of determining the target second sample user's cross-entropy loss in the current round;
until all second sample users have completed the current round of training, at which point the current round of supervised training of the pre-trained first neural network and the classifier is complete.
In conjunction with the fifth possible implementation of the third aspect, an embodiment of the present application provides a sixth possible implementation of the third aspect, wherein the training module, after completing the current round of supervised training of the pre-trained first neural network and the classifier, is further configured to:
detect whether the current round has reached a preset number of rounds; if so, stop the training of the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, validate the first neural network and the classifier obtained in the current round on a test set; if the number of test data items whose cross-entropy loss does not exceed a preset cross-entropy-loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stop the training of the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or, compare each second sample user's cross-entropy loss in the current round with that second sample user's cross-entropy loss in the previous round; if the number of second sample users whose cross-entropy loss in the current round exceeds their cross-entropy loss in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stop the training of the first neural network and the classifier, and take the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
In a fourth aspect, an embodiment of the present application provides a fraud identification apparatus, comprising:
a second acquisition module, configured to acquire, when an operation behaviour of a user to be detected occurs, the historical operation information of the user to be detected;
a second construction module, configured to construct, from the historical operation information of the user to be detected, a target feature vector that characterizes the operation-behaviour features of the user to be detected; and
a fraud-recognition-result acquisition module, configured to input the target feature vector into a fraud identification model trained with the fraud identification model training method of the first aspect or any one of its first to sixth possible implementations, and to obtain the probability that the operation behaviour of the user to be detected is fraudulent.
Embodiments of the present application first use the first feature vectors of the first sample users to train, without supervision, a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded vectors, so that during encoding and decoding the first neural network learns the features of each first sample user. The second feature vectors and fraud annotation information of the second sample users are then used for supervised training of the first neural network and a classifier: the supervised stage fine-tunes the pre-trained first neural network to improve its precision and completes the training of the classifier, finally yielding a fraud identification model whose accuracy meets the usage requirements. This reduces the manual labeling workload on sample data during model training while improving the efficiency of model training and the recognition accuracy of the fraud identification model.
To make the above objects, features, and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of a fraud identification model training method provided by an embodiment of the present application;
Fig. 2 shows a flowchart of constructing feature vectors in the fraud identification model training method provided by an embodiment of the present application;
Fig. 3 shows a detailed flowchart of constructing feature vectors in the fraud identification model training method provided by an embodiment of the present application;
Fig. 4 shows a structural schematic diagram of a first neural network and a second neural network provided by an embodiment of the present application;
Fig. 5 shows a flowchart of performing unsupervised pre-training on the first neural network in the fraud identification model training method provided by an embodiment of the present application;
Fig. 6 shows a flowchart of performing a current round of training on the first neural network and the second neural network in the fraud identification model training method provided by an embodiment of the present application;
Fig. 7 shows a structural schematic diagram of a first neural network and a classifier provided by an embodiment of the present application;
Fig. 8 shows a flowchart of performing supervised training on the pre-trained first neural network and the classifier in the fraud identification model training method provided by an embodiment of the present application;
Fig. 9 shows a flowchart of performing a current round of supervised training on the pre-trained first neural network and the classifier in the fraud identification model training method provided by an embodiment of the present application;
Fig. 10 shows a flowchart of a fraud identification method provided by an embodiment of the present application;
Fig. 11 shows a structural schematic diagram of a fraud identification model training device provided by an embodiment of the present application;
Fig. 12 shows a structural schematic diagram of a fraud identification device provided by an embodiment of the present application;
Fig. 13 shows a structural schematic diagram of a fraud identification system provided by an embodiment of the present application;
Fig. 14 shows a schematic diagram of the application process of the fraud identification system provided by an embodiment of the present application;
Fig. 15 shows a structural schematic diagram of a computer device provided by an embodiment of the present application;
Fig. 16 shows a structural schematic diagram of another computer device provided by an embodiment of the present application.
Specific embodiment
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative work shall fall within the protection scope of the present application.
At present, in order to enhance the security of electronic banking, machine learning models are trained with the traditional supervised training method of the prior art, which suffers from low efficiency and low recognition accuracy. On this basis, the fraud identification model training method, fraud identification method and device provided by the present application can obtain a fraud identification model by training on a large amount of unannotated sample data together with a small amount of annotated sample data, which reduces the workload of manually annotating sample data during model training while improving the efficiency of model training and the recognition accuracy of the fraud identification model.
To facilitate understanding of the present embodiment, a fraud identification model training method disclosed in an embodiment of the present application is first described in detail.
As shown in Fig. 1, the fraud identification model training method provided by an embodiment of the present application includes S101 to S104:
S101: Obtain historical operation information of a plurality of first sample users; and obtain historical operation information of a plurality of second sample users and the fraud annotation information corresponding to each second sample user.
Here, the first sample users are unannotated sample users and the second sample users are annotated sample users. Illustratively, the number of unannotated first sample users and the number of annotated second sample users may be the same or different.
If the number of unannotated first sample users is greater than the number of annotated second sample users, then when the first neural network is pre-trained based on the historical operation information of the first sample users, the parameters of the first neural network can approach the final training result; the historical operation information of the second sample users is then used to perform supervised training on the pre-trained first neural network, so that the parameters of the first neural network are adjusted and the training result is obtained.
The process of pre-training the first neural network based on the historical operation information of the first sample users can be regarded as assigning initial values to the first neural network before supervised training. Since this process makes the initial values of the first neural network during supervised training closer to the final training result than the random assignment of the prior art, the training result can be obtained faster when the historical operation information of the second sample users is used to perform supervised training on the pre-trained first neural network. Since the process of unsupervised training is simpler than that of supervised training, the training process is further accelerated.
In one possible scenario, when judging whether fraudulent behaviour has occurred for a sample user, a comprehensive judgment must be made according to the historical operation information of the sample user over a period of time; the judgment cannot be made merely from a single operation. Moreover, whether fraud has occurred is generally known only some time after the sample user completes the operation, for example by judging whether a victim appears after a period of time. It is therefore necessary to obtain the historical operation information of each second sample user within a first historical time period, and the fraud annotation information indicating whether each second sample user committed fraud within a second historical time period.
Optionally, the historical operation information may come from different banking channels; for example, the banking channels include at least direct banking, WeChat banking, quick payment, mobile banking, online banking and the like.
Illustratively, historical operations may include a variety of different types of operations, such as basic operations and business operations. Basic operations include registration and login, because any business operation flow in any banking channel necessarily contains these two operations; they can be regarded as the basis and premise of all other operations. Business operations may include transfer, modifying the transfer limit, payment, cash withdrawal and the like; depending on the requests of different users in different banking channels, business operations may have different business logic and operating characteristics, and they directly reflect the purpose of the user's operation request.
Illustratively, historical operation information is information about various historical operations. For example, the information of a registration operation includes the number of accounts registered on the same device within 7 days, the number of phone numbers used for registration on the same device within 1 day, etc.; the information of a login operation includes the number of accounts logged in on the same device within 1 day, whether a login was from an uncommon device, etc.; the information of a transfer operation includes whether a single transfer amount exceeds 100,000, whether the receiving account is in a blacklist, etc. More examples are shown in Tables 1-1, 1-2, 1-3, 1-4 and 1-5.
S102: According to the historical operation information of the first sample users, construct first feature vectors that can be used to characterize the operation behaviour features of the first sample users; and according to the historical operation information of the second sample users, construct second feature vectors that can be used to characterize the operation behaviour features of the second sample users.
It should be noted that, because the form of historical operation information is not standardized, it is not suitable for automatic processing by a computer; vectorization of the data converts data of non-standard format into a format-consistent form convenient for computer processing. Therefore, feature vectors that can be used to characterize the operation behaviour features of the sample users need to be constructed according to the historical operation information.
As shown in Fig. 2, in specific implementation, the embodiment of the present application constructs the feature vectors in the following manner:
S201: For each first sample user, determine, according to the historical operation information of the first sample user, the feature values of the first sample user under a plurality of preset operation behaviour features.
S202: According to the feature values of the first sample user under the plurality of preset operation behaviour features, construct a first feature vector that can be used to characterize the operation behaviour features of the first sample user.
S203: For each second sample user, determine, according to the historical operation information of the second sample user, the feature values of the second sample user under the plurality of preset operation behaviour features.
S204: According to the feature values of the second sample user under the plurality of preset operation behaviour features, construct a second feature vector that can be used to characterize the operation behaviour features of the second sample user.
There is no required order of execution between step S201 and step S203.
In specific implementation of steps S201 and S203, the embodiment of the present application determines the feature values of a first sample user / second sample user under the plurality of preset operation behaviour features in the following manner:
Here, the historical operation information of a sample user contains a plurality of preset operation behaviour features. The embodiment of the present application provides a specific example: Tables 1-1, 1-2, 1-3, 1-4 and 1-5 show the preset operation behaviour features contained in basic operations and business operations, and the feature value category of each preset operation behaviour feature. For a numerical feature, its corresponding numerical value is used directly; for a categorical feature, one-hot encoding is used, i.e. each categorical feature corresponds to a vector composed of 0s and 1s, the number of categories corresponds to the dimension of the vector, and each category corresponds to one dimension of the vector. When the preset operation behaviour feature takes a certain category, the vector position corresponding to that category is set to 1 and all other positions are set to 0. For example, the preset operation behaviour feature "whether the device was tampered with at registration" contains two categories, "tampered" and "not tampered", so this feature is encoded with a two-bit one-hot code; suppose "tampered" is "10", then "not tampered" is "01".
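As an illustrative sketch (not the exact implementation of the embodiment), the one-hot encoding described above can be expressed in Python as follows; the category names mirror the "tampered / not tampered" example in the text:

```python
def one_hot(value, categories):
    """Encode a categorical feature value as a 0/1 vector with one
    dimension per category; the position of the value's category is 1,
    all other positions are 0."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

# Two-category feature "whether the device was tampered with at registration":
categories = ["tampered", "not tampered"]
print(one_hot("tampered", categories))      # [1, 0], i.e. "10"
print(one_hot("not tampered", categories))  # [0, 1], i.e. "01"
```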
Table 1-1 Preset operation behaviour features of the registration operation
Preset operation behaviour feature | Feature value category |
Whether the device was tampered with at registration | Binary categorical feature |
Whether a simulator was used at registration | Binary categorical feature |
Whether the device was jailbroken at registration | Binary categorical feature |
Number of accounts registered on the device within 1 day | Numerical feature |
Number of accounts registered on the device within 7 days | Numerical feature |
Number of phone numbers used for registration on the same device within 1 day | Numerical feature |
Number of phone numbers used for registration on the same device within 7 days | Numerical feature |
Number of registration attempts by the same phone number within 1 day | Numerical feature |
Number of IPs used by the same registration phone number within 1 day | Numerical feature |
Number of IPs used by the same registration phone number within 7 days | Numerical feature |
Number of accounts registered from the same IP within 1 day | Numerical feature |
Number of accounts registered from the same IP within 7 days | Numerical feature |
Table 1-2 Preset operation behaviour features of the login operation
Table 1-3 Preset operation behaviour features of the transfer operation
Preset operation behaviour feature | Feature value category |
Whether the receiving account is in a blacklist | Binary categorical feature |
Whether the transfer was made in a sensitive time period | Binary categorical feature |
Percentage of the account's current transfer amount in the total transfer amount of 6 months | Numerical feature |
Number of transfers by the same account within 1 hour | Numerical feature |
Whether a single transfer amount is greater than 100,000 | Binary categorical feature |
Total transfer amount of the account within 1 day | Numerical feature |
Number of transaction password errors of the account within 1 day | Numerical feature |
Number of transfers by the user to personal accounts | Numerical feature |
Table 1-4 Preset operation behaviour features of the payment operation
Table 1-5 Preset operation behaviour features of the consumption operation
Preset operation behaviour feature | Feature value category |
Whether a first consumption occurred within one week of card opening | Binary categorical feature |
Whether the consumption occurred in a sensitive time period | Binary categorical feature |
Number of remote consumptions of the account within 1 hour | Numerical feature |
Total number of consumptions of the account within 1 hour | Numerical feature |
Cumulative consumption amount of the user within 1 day | Numerical feature |
Total transfer amount of the account within 1 day | Numerical feature |
Remote consumption amount of the account within 1 day | Numerical feature |
Number of transaction password errors of the user within 1 day | Numerical feature |
After the feature values of the first sample users / second sample users under the plurality of preset operation behaviour features are determined in the above manner, as shown in Fig. 3, in specific implementation of steps S202 and S204, the embodiment of the present application constructs the first feature vectors / second feature vectors in the following manner:
S301: According to the feature values of a first sample user / second sample user under the plurality of preset operation behaviour features, form the initial first feature vector / initial second feature vector of the first sample user / second sample user.
S302: Perform data cleaning on the initial first feature vectors / initial second feature vectors to obtain cleaned first feature vectors / cleaned second feature vectors.
Here, because the historical operation information data may be corrupted or lost during collection and transmission, the role of step S302 is to remove feature values with an abnormal feature distribution, and to perform feature value filling on the preset operation behaviour features whose values are missing in individual samples.
In specific implementation, optionally, an isolation forest (IForest) model can be used to remove feature values with an abnormal feature distribution. Here, the isolation forest model defines an anomaly as a point that is "more likely to be isolated (separated)", which can be understood as a point that is sparsely distributed and far from the high-density group. The isolation forest model is composed of many random decision trees; when splitting a leaf node, each decision tree randomly selects a target feature from the full feature set and randomly selects a threshold within that target feature to perform the node split. After a tree is generated, each original data sample uniquely corresponds to a leaf node in the tree, and the leaf node corresponding to an anomalous sample usually lies at a relatively shallow level, since an anomaly can be isolated with only a few splits.
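The isolation idea above can be sketched minimally in Python. This is a didactic illustration on toy two-dimensional data, not the cleaning step of the embodiment; the cluster and outlier coordinates are hypothetical:

```python
import random

def grow_tree(points, depth=0, max_depth=10):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = random.randrange(len(points[0]))
    vals = [p[dim] for p in points]
    lo, hi = min(vals), max(vals)
    if lo == hi:
        return ("leaf", len(points))
    split = random.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            grow_tree(left, depth + 1, max_depth),
            grow_tree(right, depth + 1, max_depth))

def path_length(tree, point, depth=0):
    """Depth of the leaf the point falls into; anomalies end up shallow."""
    if tree[0] == "leaf":
        return depth
    _, dim, split, left, right = tree
    branch = left if point[dim] < split else right
    return path_length(branch, point, depth + 1)

random.seed(0)
# Dense cluster around the origin plus one obvious outlier.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
outlier = (12.0, 12.0)
data.append(outlier)

forest = [grow_tree(data) for _ in range(50)]
avg = lambda p: sum(path_length(t, p) for t in forest) / len(forest)
inlier_avg = sum(avg(p) for p in data[:20]) / 20
print(avg(outlier) < inlier_avg)  # the outlier is isolated in fewer splits
```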
Optionally, when performing feature value filling on the preset operation behaviour features whose values are missing in individual samples: if the feature value category of the missing preset operation behaviour feature is a categorical feature, the category that appears most frequently for that preset operation behaviour feature in the historical operation information data of all sample users can be used directly as its feature value; if the feature value category of the missing preset operation behaviour feature is a numerical feature, the average of all feature values corresponding to that preset operation behaviour feature in the historical operation information data of all sample users can be used directly as its feature value.
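The mode/mean filling strategy above can be sketched as follows; the column values are hypothetical and `None` stands for a missing entry:

```python
from collections import Counter

def fill_missing(column, kind):
    """Fill None entries: the most frequent category for categorical
    features, the mean of observed values for numerical features."""
    observed = [v for v in column if v is not None]
    if kind == "categorical":
        fill = Counter(observed).most_common(1)[0][0]
    else:  # numerical
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]

# Hypothetical feature columns across all sample users.
print(fill_missing(["tampered", "not tampered", None, "not tampered"],
                   "categorical"))  # missing entry becomes "not tampered"
print(fill_missing([1.0, 3.0, None], "numerical"))  # [1.0, 3.0, 2.0]
```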
S303: Perform data enhancement on the cleaned first feature vectors / cleaned second feature vectors.
Under normal circumstances, the number of positive samples (samples in which no fraudulent behaviour occurred) is far greater than the number of negative samples (samples in which fraudulent behaviour occurred); that is, the numbers of positive and negative samples are very unbalanced, and unbalanced samples bring great difficulty to the training of the model. It is therefore necessary to perform data enhancement on the cleaned first feature vectors / cleaned second feature vectors.
In specific implementation, optionally, the Synthetic Minority Oversampling Technique (SMOTE) can be used to expand the cleaned first feature vectors / cleaned second feature vectors of the negative sample users. The SMOTE algorithm maps the cleaned first feature vectors / cleaned second feature vectors of all negative sample users into a feature space, so that each cleaned first feature vector / cleaned second feature vector corresponds to a point in this space; a point on the line segment between any two such points serves as the cleaned first feature vector / cleaned second feature vector of a newly generated negative sample user. Repeating this operation can generate the cleaned first feature vectors / cleaned second feature vectors of any number of negative sample users. Finally, the ratio between the cleaned first feature vectors / cleaned second feature vectors of the negative sample users (including the newly generated ones) and those of the positive sample users is controlled to reach a preset ratio; for example, the preset ratio can be 1:3 or 1:4.
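A minimal sketch of the SMOTE-style interpolation described above, assuming hypothetical negative sample points and a 1:4 preset ratio; a full SMOTE implementation would interpolate between nearest neighbours rather than arbitrary pairs:

```python
import random

def smote_expand(negatives, target_count):
    """Generate synthetic negative samples as random points on the line
    segment between two existing negative samples, until the negative
    set reaches target_count."""
    synthetic = []
    while len(negatives) + len(synthetic) < target_count:
        a, b = random.sample(negatives, 2)
        t = random.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return negatives + synthetic

random.seed(1)
negatives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical fraud samples
positives_count = 120
# Expand negatives until the negative:positive ratio reaches 1:4.
expanded = smote_expand(negatives, positives_count // 4)
print(len(expanded))  # 30
```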
S304: Perform data dimensionality reduction on the cleaned first feature vectors / cleaned second feature vectors to obtain dimensionality-reduced first feature vectors / dimensionality-reduced second feature vectors.
Here, data dimensionality reduction removes the feature values of lower importance in the cleaned first feature vectors / cleaned second feature vectors, which is beneficial both to the speed of model training and to the recognition accuracy of the model.
In specific implementation, optionally, Principal Component Analysis (PCA) can be used to perform data dimensionality reduction on the cleaned first feature vectors / cleaned second feature vectors. The PCA method applies a linear transformation to the original features, mapping the original high-dimensional features to low-dimensional features, so that the correlation between the transformed features is lower and the essential information of the data is better reflected.
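A minimal PCA sketch using power iteration on the covariance matrix to recover the first principal direction; this only illustrates the linear-transformation idea on hypothetical data, not a full PCA pipeline:

```python
def pca_first_component(X, iters=300):
    """Power iteration on the sample covariance matrix: converges to the
    direction of greatest variance (the first principal component)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points lying along the line y = 2x: the first component should point
# in the direction (1, 2) (up to scale).
X = [(float(t), 2.0 * t) for t in range(-5, 6)]
v = pca_first_component(X)
print(round(v[1] / v[0], 6))  # 2.0
```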
S305: Perform a data standardization operation on the dimensionality-reduced first feature vectors / dimensionality-reduced second feature vectors to obtain the final first feature vectors / second feature vectors.
Here, the purpose of the data standardization operation is to map the feature values of each preset operation behaviour feature to the same range; doing so eliminates the influence of the different scales of different preset operation behaviour features and is more conducive to model training.
In specific implementation, optionally, the (0,1) standardization method can be used, i.e. all preset operation behaviour features are converted into standard data with a mean of 0 and a variance of 1.
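The (0,1) standardization (zero mean, unit variance) can be sketched as follows; the column values are hypothetical:

```python
def standardize(column):
    """Map a feature column to mean 0 and variance 1 (z-score)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5
    return [(x - mean) / std for x in column]

col = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature values across users
z = standardize(col)
print([round(x, 6) for x in z])
```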
After the final first feature vectors / second feature vectors are obtained through the above steps, the fraud identification model training method provided by the embodiments of the present application further includes the following S103 and S104:
S103: Input the first feature vectors into the first neural network and the second neural network of symmetrical structure, and perform unsupervised pre-training on the first neural network.
Wherein, the first neural network is used to encode the first feature vectors, and the second neural network is used to decode the encoded first feature vectors.
Illustratively, as shown in Fig. 4, the embodiment of the present application provides a structural schematic diagram of a first neural network and a second neural network. As can be seen from Fig. 4, layers L1, L2 and L3 constitute the first neural network, and layers L3, L4 and L5 constitute the second neural network. Layers L1 and L5 are symmetrical in structure, and layers L2 and L4 are symmetrical in structure, i.e. they have the same number of neurons. Layers L1 and L2 are encoding layers, layers L4 and L5 are decoding layers, and layer L3 serves both as the output layer of the first neural network and as the input layer of the second neural network. From layer L1 to layer L3, the number of neurons decreases gradually; it can decrease in a fixed proportion, for example the numbers of neurons from L1 to L3 are 2m, m and m/2 respectively, or it can decrease out of proportion. From layer L3 to layer L5, the number of neurons increases gradually; it can increase in a fixed proportion, for example the numbers of neurons from L3 to L5 are m/2, m and 2m respectively, or it can increase out of proportion. Layer L1, as the input layer of the whole neural network model, is used to input the first feature vectors obtained in step S102. Optionally, each of the layers L1 to L5 is a fully connected layer, which can fully learn the essential information of the first feature vectors and guarantee the accuracy of model training.
It can be appreciated that the structure of the first neural network and the second neural network in Fig. 4 is only exemplary; the number of layers of the first neural network and the second neural network can also be increased. For example, the first neural network includes layers L1, L2, L3 and L4, and correspondingly the second neural network includes layers L4, L5, L6 and L7, where layers L1 and L7 are symmetrical in structure, layers L2 and L6 are symmetrical in structure, layers L3 and L5 are symmetrical in structure, and layer L4 is the output layer of the first neural network and the input layer of the second neural network.
The role of the first neural network is to encode the first feature vectors, performing feature transformation and compression of the feature dimension on the first feature vectors, with the aim of removing the noise in the first feature vectors and extracting the most essential information reflecting the operation behaviour of the first sample users; the role of the second neural network is to decode the encoded first feature vectors, restoring the first feature vectors compressed by the encoding of the first neural network.
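The symmetric encoder-decoder structure above can be sketched minimally as a forward pass in Python; the layer sizes 2m, m, m/2 follow the example in the text, while the tanh activation and random initialization are assumptions for illustration only:

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for an n_in -> n_out layer."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
m = 4
# Encoder 2m -> m -> m/2 (L1, L2, L3) mirrored by decoder m/2 -> m -> 2m.
sizes = [2 * m, m, m // 2, m, 2 * m]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

x = [rng.uniform(-1, 1) for _ in range(2 * m)]  # a first feature vector
h = x
activations = [x]
for w, b in layers:
    h = dense(h, w, b)
    activations.append(h)

code, reconstruction = activations[2], activations[4]
print(len(code), len(reconstruction))  # 2 8
```

The compressed code has m/2 dimensions while the reconstruction returns to the input dimension 2m, mirroring the compression-then-restoration role described above.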
As shown in Fig. 5, in specific implementation, the embodiment of the present application performs unsupervised pre-training on the first neural network specifically in the following manner:
S501: Input the first feature vectors into the first neural network, and obtain the encoded feature vectors output by at least one target encoding layer of the first neural network.
Optionally, the target encoding layers can be all the encoding layers, or specified ones of the encoding layers. For example, if the first neural network includes layers L1, L2, L3 and L4, and correspondingly the second neural network includes layers L4, L5, L6 and L7, where layers L1 and L7, layers L2 and L6, and layers L3 and L5 are respectively symmetrical in structure and layer L4 is the output layer of the first neural network and the input layer of the second neural network, then the target encoding layers can be all the encoding layers, i.e. layers L1, L2 and L3, or the target encoding layers can be specified encoding layers, i.e. some of the encoding layers L1, L2 and L3.
S502: Input the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtain the decoded feature vectors output by the target decoding layers of the second neural network corresponding to the target encoding layers. Here, each target decoding layer corresponds to a target encoding layer; for example, in Fig. 4, layer L1 corresponds to layer L5 and layer L2 corresponds to layer L4.
S503: Perform the current round of training on the first neural network and the second neural network according to the encoded feature vectors and the decoded feature vectors.
During one round of training, the network is trained by means of gradient descent according to the loss function of the network model. In specific implementation, optionally, the encoded feature vectors and decoded feature vectors of one first sample user are first used to obtain the loss of that first sample user, and the parameters of the first neural network and the second neural network are adjusted accordingly; the first neural network and the second neural network with adjusted parameters are then used to obtain the loss of the next first sample user, and the parameters of the first neural network and the second neural network are adjusted again, until the encoded feature vectors and decoded feature vectors of all first sample users have been used to train the first neural network and the second neural network, completing the current round of training of the first neural network and the second neural network.
Illustratively, when the target encoding layers are layers L1, L2 and L3 and the target decoding layers are layers L5, L6 and L7, where layer L1 corresponds to layer L7, layer L2 corresponds to layer L6 and layer L3 corresponds to layer L5, the loss function of the whole unsupervised network formed by the first neural network and the second neural network is shown in formula (1):

Formula (1): Loss = (1/n) * Σ_{i=1}^{n} ( ||X_L1(i) − X_L7(i)||² + γ||X_L2(i) − X_L6(i)||² + β||X_L3(i) − X_L5(i)||² )

Wherein, X_L1(i) is the encoded feature vector of the i-th first sample user at layer L1 and X_L7(i) is the decoded feature vector of that user at layer L7, and likewise for the other layer pairs; i indexes the i-th target first sample user; n is the number of first sample users; and γ and β are floating-point values between [0, 1] representing the weights of the respective pairs of intermediate target encoding layer and intermediate target decoding layer.
As shown in Fig. 6, in specific implementation, the embodiment of the present application performs the current round of training on the first neural network and the second neural network specifically in the following manner:
S601: Take any first sample user that has not yet completed the current round of training as the target first sample user, and determine the loss of the target first sample user in the current round according to the encoded feature vectors and decoded feature vectors of the target first sample user.
S602: Adjust the parameters of the first neural network and the second neural network according to the loss of the target first sample user in the current round.
S603: Take the target first sample user as a first sample user that has completed training, and take any other first sample user that has not yet completed the current round of training as the new target first sample user.
S604: Using the first neural network and the second neural network with adjusted parameters, obtain the encoded feature vectors and decoded feature vectors of the new target first sample user, and return to the step of determining the loss of the target first sample user in the current round according to the encoded feature vectors and decoded feature vectors of the target first sample user.
S605: When all first sample users have completed the current round of training, the current round of training of the first neural network and the second neural network is completed.
S504: Complete the unsupervised pre-training of the first neural network by performing multiple rounds of training on the first neural network and the second neural network.
After the first neural network is pre-trained in an unsupervised manner using a large number of unlabelled first sample users in the above way, the feature information of the many first sample users has been learned by the first neural network. Through the following step S104, the embodiment of the present application uses a small number of labelled second sample users to perform supervised training on the first neural network and the classifier, so as to obtain the trained first neural network and classifier as the fraud identification model.
S104: being input to first nerves network and classifier by pre-training for second feature vector, special based on second
The corresponding fraud markup information of vector sum is levied, to the first nerves network and classifier progress Training Jing Guo pre-training,
Obtain fraud identification model.
Wherein, the first nerves network by pre-training is used to carry out feature learning to second feature vector;Classifier is used
Classify in second feature vector.Such as can be using sigmoid function as classifier, it can be with by sigmoid function
The operation behavior for obtaining the second sample of users is the probability of fraud.
Illustratively, referring to Figure 7, the embodiment of the present application provides a schematic structural diagram of the first neural network and the classifier: layers L1, L2 and L3 form the first neural network, and layer L6 is the classifier.
In a specific implementation, referring to Figure 8, the embodiment of the present application performs supervised training on the pre-trained first neural network and the classifier in the following manner:
S801: Input the second feature vectors into the pre-trained first neural network and the classifier to obtain a fraud recognition result for each second sample user.
S802: Perform the current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud annotation information of that second sample user.
During one round of training, the network is trained by gradient descent according to the loss function of the network model. In a specific implementation, optionally, the cross-entropy loss of one second sample user is obtained first, and the parameters of the first neural network and the classifier are adjusted; the cross-entropy loss of the next second sample user is then obtained using the first neural network and the classifier with adjusted parameters, and the parameters are adjusted again; this continues until the cross-entropy losses of all second sample users have been used, which completes the current round of training of the first neural network and the classifier.
Specifically, the cross-entropy loss function of the whole supervised network formed by the first neural network and the classifier is given by formula (2):

Formula (2): L = -(1/m) Σ_{i=1}^{m} [ y_i · log σ(x_i) + (1 - y_i) · log(1 - σ(x_i)) ]

where x_i is the vector value of the i-th second sample user, y_i is the fraud annotation information of the i-th second sample user, m is the number of second sample users, and σ(x_i) is the sigmoid function. In general y_i takes the value 0 or 1; for example, y_i = 1 indicates that the fraud annotation information is a fraudulent behavior, and 0 indicates that it is a normal behavior.
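Formula (2) can be checked numerically. In this sketch x_i is treated as the scalar score fed to the sigmoid, an illustrative simplification of the network output for user i:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(xs, ys):
    """Formula (2): mean binary cross-entropy over the m labeled samples."""
    m = len(xs)
    return -sum(y * math.log(sigmoid(x)) + (1 - y) * math.log(1 - sigmoid(x))
                for x, y in zip(xs, ys)) / m

# A confident correct score gives a small loss, a confident wrong one a large loss
print(round(cross_entropy([4.0], [1]), 4))   # → 0.0181
print(round(cross_entropy([4.0], [0]), 4))   # → 4.0181
```

The asymmetry shown in the two prints is why minimizing this loss pushes the model's fraud probability toward the annotation y_i.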
In a specific implementation, referring to Figure 9, the embodiment of the present application performs the current round of supervised training on the pre-trained first neural network and the classifier in the following manner:
S901: Take any second sample user who has not yet completed training in the current round as the target second sample user, and determine the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user.
S902: Adjust the parameters of the first neural network and the classifier according to the cross-entropy loss of the target second sample user in the current round.
S903: Take the target second sample user as a second sample user whose training is completed, and take any other second sample user whose training in the current round is not yet completed as the new target second sample user.
S904: Using the first neural network and the classifier with adjusted parameters, obtain the fraud recognition result of the new target second sample user, and return to the step of determining the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user.
S905: When all second sample users have completed training in the current round, the current round of supervised training of the pre-trained first neural network and the classifier is completed.
S803: By performing multiple rounds of supervised training on the first neural network and the classifier, the fraud identification model is obtained.
Optionally, after the current round of training of the first neural network and the classifier is completed through step S905, the embodiment of the present application can decide whether to stop training by any one of the following three modes:
Mode one: detect whether the current round has reached a preset number of rounds; if so, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model.
In a specific implementation, a preset number of training rounds can be set in advance before model training. If the current round is detected to have reached the preset number of rounds, the training of the first neural network and the classifier is stopped, and the first neural network and the classifier obtained in the last round of training serve as the fraud identification model.
Mode two: verify the first neural network and the classifier obtained in the current round with a test set; if the number of test data items whose cross-entropy loss is not greater than a preset cross-entropy loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model.
During model training, the value of the cross-entropy loss is required to decrease gradually. Therefore, when the first neural network and the classifier obtained in the current round are verified with the test set, if the number of test data items whose cross-entropy loss is not greater than the preset cross-entropy loss threshold reaches a certain preset ratio, for example 90% or 95%, the training of the first neural network and the classifier is stopped, and the first neural network and the classifier obtained in the last round of training serve as the fraud identification model.
Mode three: compare the cross-entropy loss of each second sample user in the current round with the cross-entropy loss of the corresponding second sample user in the previous round; if the number of second sample users whose cross-entropy loss in the current round is greater than that in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
During model training, the value of the cross-entropy loss is required to decrease gradually, and the first neural network and the classifier obtained when the cross-entropy loss is smallest serve as the fraud identification model.
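The three stopping modes above can be sketched as one check run after each round. All threshold values here are illustrative assumptions, not values fixed by the patent:

```python
def should_stop(round_idx, losses, prev_losses, *,
                max_rounds=50, loss_threshold=0.1,
                pass_ratio=0.9, worse_ratio=0.5):
    """losses: this round's per-sample cross-entropy losses (e.g. on a test set);
    prev_losses: the previous round's losses for the same samples, or None."""
    # Mode one: a preset number of training rounds has been reached
    if round_idx >= max_rounds:
        return True
    # Mode two: enough samples already have a sufficiently small cross-entropy loss
    ok = sum(1 for l in losses if l <= loss_threshold)
    if ok / len(losses) > pass_ratio:
        return True
    # Mode three: too many samples got worse than in the previous round, so stop
    # (and in the patent, keep the previous round's model)
    if prev_losses is not None:
        worse = sum(1 for l, p in zip(losses, prev_losses) if l > p)
        if worse / len(losses) >= worse_ratio:
            return True
    return False

print(should_stop(3, [0.05, 0.08, 0.2], [0.1, 0.1, 0.1]))   # → False: keep training
print(should_stop(3, [0.05, 0.05, 0.05], [0.1, 0.1, 0.1]))  # → True: mode two fires
```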
In the fraud identification model training method provided by the embodiments of the present application, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Referring to Figure 10, the embodiment of the present application further provides a fraud recognition method, comprising:
S1001: When a user to be detected performs an operation behavior, obtain the historical operation information of the user to be detected.
S1002: According to the historical operation information of the user to be detected, construct a target feature vector capable of characterizing the operation behavior features of the user to be detected.
In a specific implementation, the target feature vector capable of characterizing the operation behavior features of the user to be detected is constructed with reference to the method of step S102 of the present application.
S1003: Input the target feature vector into the fraud identification model obtained by training with the fraud identification model training method provided by the present application, to obtain the probability that the operation behavior of the user to be detected is a fraudulent behavior.
For example, suppose that during model training the fraud annotation information used 1 to indicate a fraudulent behavior and 0 to indicate a normal behavior. When the obtained probability that the operation behavior of the user to be detected is a fraudulent behavior is greater than a preset probability threshold, the operation behavior of the user to be detected is regarded as a fraudulent behavior; when that probability is not greater than the preset probability threshold, the operation behavior of the user to be detected is regarded as a normal behavior. The preset probability threshold can take values such as 0.5 or 0.6.
When the embodiment of the present application detects that the operation behavior of the user to be detected is a fraudulent behavior, an interception operation is performed on the user's current operation behavior, and this interception information, together with all historical operation information relevant to the user recorded inside the banking channel, is saved into a dedicated database as the historical operation information of a new labeled second sample user. When it is detected that the operation behavior of the user to be detected is a normal behavior, the user's operation request is forwarded to the actual business system of the banking channel, and the operation request is processed normally.
In the fraud recognition method provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Based on the same application concept, the embodiment of the present application further provides a fraud identification model training apparatus corresponding to the fraud identification model training method. Since the principle by which the apparatus in the embodiment of the present application solves the problem is similar to that of the above fraud identification model training method of the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to Figure 11, the fraud identification model training apparatus provided by the embodiment of the present application comprises:
a first obtaining module 111, configured to obtain the historical operation information of a plurality of first sample users, and to obtain the historical operation information of a plurality of second sample users and the fraud annotation information corresponding to each second sample user;
a first building module 112, configured to construct, according to the historical operation information of the first sample users, first feature vectors capable of characterizing the operation behavior features of the first sample users, and to construct, according to the historical operation information of the second sample users, second feature vectors capable of characterizing the operation behavior features of the second sample users;
a pre-training module 113, configured to input the first feature vectors into a first neural network and a second neural network of symmetric structure and perform unsupervised pre-training on the first neural network, wherein the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors;
a training module 114, configured to input the second feature vectors into the pre-trained first neural network and a classifier, and perform supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud annotation information, to obtain the fraud identification model.
Optionally, the first building module 112 is specifically configured to: for each first sample user, determine, according to the historical operation information of the first sample user, the feature values of the first sample user under a plurality of preset operation behavior features, and construct, according to those feature values, a first feature vector capable of characterizing the operation behavior features of the first sample user; and
for each second sample user, determine, according to the historical operation information of the second sample user, the feature values of the second sample user under the plurality of preset operation behavior features, and construct, according to those feature values, a second feature vector capable of characterizing the operation behavior features of the second sample user.
Optionally, the pre-training module 113 is specifically configured to: input the first feature vectors into the first neural network and obtain the encoded feature vector output by at least one target encoding layer in the first neural network; input the encoded feature vector output by the last encoding layer of the first neural network into the second neural network and obtain the decoded feature vector output by the target decoding layer corresponding to the target encoding layer in the second neural network; perform the current round of training of the first neural network and the second neural network according to the encoded feature vector and the decoded feature vector; and complete the unsupervised pre-training of the first neural network by performing multiple rounds of training on the first neural network and the second neural network.
Optionally, the pre-training module 113 is specifically configured to perform the current round of training of the first neural network and the second neural network in the following manner:
take any first sample user who has not yet completed training in the current round as the target first sample user, and determine the loss of the target first sample user in the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
adjust the parameters of the first neural network and the second neural network according to the loss of the target first sample user in the current round;
take the target first sample user as a first sample user whose training is completed, and take any other first sample user whose training in the current round is not yet completed as the new target first sample user;
using the first neural network and the second neural network with adjusted parameters, obtain the encoded feature vector and the decoded feature vector of the new target first sample user, and return to the step of determining the loss of the target first sample user in the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
when all first sample users have completed training in the current round, the current round of training of the first neural network and the second neural network is completed.
Optionally, the training module 114 is specifically configured to: input the second feature vectors into the pre-trained first neural network and the classifier to obtain the fraud recognition result of each second sample user; perform the current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud annotation information of that second sample user; and obtain the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
Optionally, the training module 114 is specifically configured to perform the current round of supervised training on the pre-trained first neural network and the classifier in the following manner:
take any second sample user who has not yet completed training in the current round as the target second sample user, and determine the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user;
adjust the parameters of the first neural network and the classifier according to the cross-entropy loss of the target second sample user in the current round;
take the target second sample user as a second sample user whose training is completed, and take any other second sample user whose training in the current round is not yet completed as the new target second sample user;
using the first neural network and the classifier with adjusted parameters, obtain the fraud recognition result of the new target second sample user, and return to the step of determining the cross-entropy loss of the target second sample user in the current round according to the fraud recognition result and the fraud annotation information of the target second sample user;
when all second sample users have completed training in the current round, the current round of supervised training of the pre-trained first neural network and the classifier is completed.
Optionally, the training module 114 is specifically configured to, after the current round of supervised training of the pre-trained first neural network and the classifier is completed:
detect whether the current round has reached a preset number of rounds; if so, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
alternatively,
verify the first neural network and the classifier obtained in the current round with a test set; if the number of test data items whose cross-entropy loss is not greater than a preset cross-entropy loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the last round of training as the fraud identification model;
alternatively,
compare the cross-entropy loss of each second sample user in the current round with the cross-entropy loss of the corresponding second sample user in the previous round; if the number of second sample users whose cross-entropy loss in the current round is greater than that in the previous round, as a percentage of all second sample users, reaches a preset second percentage threshold, stop training the first neural network and the classifier, and take the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
In the fraud identification model training apparatus provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Based on the same application concept, the embodiment of the present application further provides a fraud identification apparatus corresponding to the fraud recognition method. Since the principle by which the apparatus in the embodiment of the present application solves the problem is similar to that of the above fraud recognition method of the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to Figure 12, the fraud identification apparatus provided by the embodiment of the present application comprises:
a second obtaining module 121, configured to obtain, when a user to be detected performs an operation behavior, the historical operation information of the user to be detected;
a second building module 122, configured to construct, according to the historical operation information of the user to be detected, a target feature vector capable of characterizing the operation behavior features of the user to be detected;
a fraud recognition result obtaining module 123, configured to input the target feature vector into the fraud identification model obtained by training with the fraud identification model training method provided by the present application, to obtain the probability that the operation behavior of the user to be detected is a fraudulent behavior.
In the fraud identification apparatus provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Referring to Figure 13, another embodiment of the present application further provides a fraud identification system, comprising: a timer 131, the fraud identification model training apparatus 132 provided by the present application, and the fraud identification apparatus 133 provided by the present application; the timer 131, the fraud identification model training apparatus 132 and the fraud identification apparatus 133 are connected in sequence;
the fraud identification model training apparatus 132 is configured to obtain the fraud identification model;
the timer 131 is configured to periodically trigger the fraud identification model training apparatus to re-obtain a new fraud identification model;
the fraud identification apparatus 133 is configured to obtain, according to the fraud identification model obtained by the fraud identification model training apparatus, the probability that the operation behavior of a user to be detected is a fraudulent behavior.
In the following, referring to Figure 14, the present application provides a specific embodiment illustrating the application process of the fraud identification system provided by the present application.
Here, mobile banking is taken as an example to illustrate the application process of the banking channel fraud identification system.
As can clearly be seen in Figure 14, in the anti-fraud process of the banking channel, the fraud identification system is the core module. The fraud identification system is connected to the mobile banking business system; it receives the user operation behaviors sent from the mobile banking business system and assesses the risk value of each operation behavior (that is, it performs fraud identification on the operation behavior and obtains the probability value that the operation behavior is a fraudulent behavior).
If the risk assessment result is a fraudulent operation, the risk assessment result is fed back to the mobile banking business system; the mobile banking business system can perform an interception operation according to the feedback result, and write this interception information, together with all historical operation information relevant to the user recorded inside the mobile banking system, into the corresponding mobile banking database as sample data.
If the risk assessment result is a normal operation, the user's operation behavior is simply forwarded to the next business system of mobile banking, that is, to the user's normal processing flows in the mobile banking business system.
When the banking channel database has accumulated a certain amount of training data within a period of time (this data comes from sources including online identification, user feedback, expert annotation, or black-market databases of other channels), the new training data can be sent to the banking channel fraud identification system. The timer inside the system periodically starts the model training process and updates the online fraud identification model, which ensures that the online fraud identification model stays current and improves the model recognition accuracy.
In the fraud identification system provided by the embodiments of the present application, when the fraud identification model is trained, the first feature vectors of the first sample users are first used to perform unsupervised training on a first neural network and a second neural network of symmetric structure: the first neural network encodes the first feature vectors and the second neural network decodes the encoded first feature vectors, so that in the course of encoding and decoding the first neural network learns the features of each first sample user. Then the second feature vectors and the fraud annotation information of the second sample users are used to perform supervised training on the first neural network and a classifier, further adjusting the pre-trained first neural network in a supervised manner so as to improve its precision, and completing the training of the classifier, finally obtaining a fraud identification model whose precision meets usage requirements. In this way, while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
The embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above fraud identification model training method are executed.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above fraud identification model training method can be executed, so that the fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data; while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
The embodiment of the present application further provides another computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the fraud recognition method in the above method embodiments are executed.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above fraud recognition method can be executed, so that the fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data; while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Corresponding to the fraud identification model training method in Figure 1, the embodiment of the present application further provides a computer device, as shown in Figure 15. The device comprises a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and runnable on the processor 2000, wherein the processor 2000, when executing the computer program, implements the steps of the above fraud identification model training method.
Specifically, the memory 1000 and the processor 2000 can be a general memory and processor, which are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, the above fraud identification model training method can be executed, so that the fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data; while the workload of manually annotating sample data during model training is reduced, the efficiency of model training and the recognition accuracy of the fraud identification model can be improved.
Corresponding to the fraud recognition method in Fig. 10, an embodiment of the present application further provides another computer device. As shown in Fig. 16, the device includes a memory 3000, a processor 4000, and a computer program stored in the memory 3000 and runnable on the processor 4000, wherein the processor 4000 implements the steps of the above fraud recognition method when executing the computer program.
Specifically, the memory 3000 and the processor 4000 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 4000 runs the computer program stored in the memory 3000, the above fraud recognition method can be executed, so that a fraud identification model is obtained by training with a large amount of unlabeled sample data and a small amount of labeled sample data. This reduces the workload of manually labeling sample data during model training, while improving both the efficiency of model training and the recognition accuracy of the fraud identification model.
The computer program products of the fraud identification model training method, the fraud recognition method, and the apparatuses provided by the embodiments of the present application include a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the methods described in the preceding method embodiments; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
In all examples shown and described herein, any specific value should be interpreted as merely illustrative rather than limiting; other examples of the exemplary embodiments may therefore have different values.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely exemplary.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the embodiments described above are merely specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may still, within the technical scope disclosed by the present application, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A fraud identification model training method, characterized in that it comprises:
obtaining historical operation information of a plurality of first sample users, and obtaining historical operation information of a plurality of second sample users together with fraud label information corresponding to each second sample user;
constructing, according to the historical operation information of the first sample users, first feature vectors capable of characterizing the operation behavior features of the first sample users, and constructing, according to the historical operation information of the second sample users, second feature vectors capable of characterizing the operation behavior features of the second sample users;
inputting the first feature vectors into a first neural network and a second neural network of symmetric structure to perform unsupervised pre-training on the first neural network, wherein the first neural network is configured to encode the first feature vectors, and the second neural network is configured to decode the encoded first feature vectors;
inputting the second feature vectors into the pre-trained first neural network and a classifier, and performing supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud label information, to obtain a fraud identification model.
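As a rough illustration only (not part of the claims), the two-phase scheme of claim 1 can be sketched as a linear autoencoder pre-trained on many unlabeled feature vectors, followed by a logistic classifier fine-tuned on a few labeled ones. All data, layer sizes, and learning rates below are invented; for brevity only the classifier is updated in the second phase, whereas the claim also updates the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(200, 8))   # many first-sample feature vectors
X_labeled = rng.normal(size=(20, 8))      # few second-sample feature vectors
y_labeled = (X_labeled[:, 0] > 0).astype(float)  # toy fraud labels

W_enc = rng.normal(scale=0.1, size=(8, 4))  # "first network": encoder
W_dec = rng.normal(scale=0.1, size=(4, 8))  # "second network": mirrored decoder

# Phase 1: unsupervised pre-training -- reconstruct the unlabeled vectors.
for _ in range(500):
    H = X_unlabeled @ W_enc
    err = H @ W_dec - X_unlabeled            # reconstruction error
    W_dec -= 0.05 * H.T @ err / len(X_unlabeled)
    W_enc -= 0.05 * X_unlabeled.T @ (err @ W_dec.T) / len(X_unlabeled)

# Phase 2: supervised fine-tuning -- a logistic classifier on top of the
# (here frozen) pre-trained encoder, driven by the fraud labels.
w_clf = np.zeros(4)
for _ in range(500):
    z = X_labeled @ W_enc
    p = 1.0 / (1.0 + np.exp(-(z @ w_clf)))
    w_clf -= 0.5 * z.T @ (p - y_labeled) / len(X_labeled)

recon_loss = float(np.mean((X_unlabeled @ W_enc @ W_dec - X_unlabeled) ** 2))
```

After pre-training, the reconstruction loss is well below the trivial all-zero baseline, and `p` holds the classifier's fraud probabilities for the labeled users.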
2. The method according to claim 1, wherein constructing the first feature vectors capable of characterizing the operation behavior features of the first sample users according to the historical operation information of the first sample users, and constructing the second feature vectors capable of characterizing the operation behavior features of the second sample users according to the historical operation information of the second sample users, comprises:
for each first sample user, determining, according to the historical operation information of the first sample user, feature values of the first sample user under a plurality of preset operation behavior features;
constructing, according to the feature values of the first sample user under the plurality of preset operation behavior features, a first feature vector capable of characterizing the operation behavior features of the first sample user; and
for each second sample user, determining, according to the historical operation information of the second sample user, feature values of the second sample user under the plurality of preset operation behavior features;
constructing, according to the feature values of the second sample user under the plurality of preset operation behavior features, a second feature vector capable of characterizing the operation behavior features of the second sample user.
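For illustration, the feature construction of claim 2 amounts to evaluating each user's history under a fixed set of preset behavior features. The concrete features below (operation count, night-time ratio, transfer count) are hypothetical examples; the patent leaves the feature set open.

```python
def build_feature_vector(history):
    """history: list of (action, hour_of_day) tuples from a user's records."""
    n = len(history)
    # feature values under three assumed preset operation-behavior features
    night_ratio = sum(1 for _, h in history if h < 6) / max(n, 1)
    transfers = sum(1 for a, _ in history if a == "transfer")
    return [float(n), night_ratio, float(transfers)]
```

The same vectorizer is applied to both first and second sample users, so the unlabeled and labeled vectors live in the same feature space.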
3. The method according to claim 1, wherein inputting the first feature vectors into the first neural network and the second neural network of symmetric structure to perform unsupervised pre-training on the first neural network comprises:
inputting the first feature vectors into the first neural network, and obtaining encoded feature vectors output by at least one target encoding layer in the first neural network;
inputting the encoded feature vector output by the last encoding layer of the first neural network into the second neural network, and obtaining decoded feature vectors output by target decoding layers in the second neural network corresponding to the target encoding layers;
performing a current round of training on the first neural network and the second neural network according to the encoded feature vectors and the decoded feature vectors; and
completing the unsupervised pre-training of the first neural network by performing multiple rounds of training on the first neural network and the second neural network.
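A minimal forward pass can show the layer matching of claim 3: each target encoding layer is paired with the symmetric decoding layer, and the round's loss compares each matched pair. The two-layer sizes and the tanh nonlinearity below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=6)                      # one first feature vector
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(4, 2))  # encoder weights
V2, V1 = rng.normal(size=(2, 4)), rng.normal(size=(4, 6))  # mirrored decoder

h1 = np.tanh(x @ W1)    # target encoding layer
h2 = np.tanh(h1 @ W2)   # last encoding layer, fed into the second network
d2 = np.tanh(h2 @ V2)   # decoding layer matched with the target encoding layer
d1 = d2 @ V1            # final decoded vector matched with the input

# loss for this round: one term per matched encode/decode pair
loss = float(np.mean((h1 - d2) ** 2) + np.mean((x - d1) ** 2))
```

Minimizing this summed per-layer reconstruction loss over many rounds constitutes the unsupervised pre-training of the first network.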
4. The method according to claim 3, wherein performing the current round of training on the first neural network and the second neural network according to the encoded feature vectors and the decoded feature vectors comprises:
taking any first sample user that has not yet completed the current round of training as a target first sample user, and determining the loss of the target first sample user for the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
adjusting the parameters of the first neural network and the second neural network according to the loss of the target first sample user for the current round;
taking the target first sample user as a first sample user that has completed training, and taking any other first sample user that has not yet completed the current round of training as a new target first sample user;
obtaining the encoded feature vector and the decoded feature vector of the new target first sample user by using the first neural network and the second neural network whose parameters have been adjusted, and returning to the step of determining the loss of the target first sample user for the current round according to the encoded feature vector and the decoded feature vector of the target first sample user;
until all first sample users have completed the current round of training, whereupon the current round of training of the first neural network and the second neural network is completed.
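Structurally, the round of claim 4 is per-user stochastic training: users are taken one at a time, the loss is computed from that user's encoded and decoded vectors, and the parameters are adjusted before the next user is processed. The encode/decode/update callables below are stand-ins, not the patent's networks.

```python
def run_round(users, encode, decode, update):
    pending = list(users)
    while pending:                      # until every user finishes this round
        u = pending.pop(0)              # current target first sample user
        decoded = decode(encode(u))
        loss = sum((a - b) ** 2 for a, b in zip(u, decoded))
        update(loss)                    # adjust both networks' parameters

state = {"scale": 0.5, "updates": 0}
encode = lambda v: [state["scale"] * x for x in v]
decode = lambda h: [x / state["scale"] for x in h]  # exact inverse here

def update(loss):
    state["updates"] += 1               # placeholder parameter adjustment

run_round([[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]], encode, decode, update)
```

One parameter update occurs per user, so a round over three users performs three updates.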
5. The method according to claim 1, wherein inputting the second feature vectors into the pre-trained first neural network and the classifier, and performing supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud label information to obtain the fraud identification model, comprises:
inputting the second feature vectors into the pre-trained first neural network and the classifier to obtain fraud recognition results for the second sample users;
performing a current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud label information of that second sample user; and
obtaining the fraud identification model by performing multiple rounds of supervised training on the first neural network and the classifier.
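The supervised rounds of claim 5 can be illustrated with per-user cross-entropy updates of a logistic classifier over encoded features. The encoded vectors and labels below are invented toy values, and only the classifier weights are trained here for brevity.

```python
import math

Z = [[0.2, 1.1], [1.3, -0.4], [-0.9, 0.8], [0.8, 0.4]]  # toy encoded features
y = [1, 0, 1, 0]                      # 1 = labelled as fraud

w = [0.0, 0.0]
for _ in range(200):                  # multiple supervised training rounds
    for z, t in zip(Z, y):            # one second sample user at a time
        p = 1 / (1 + math.exp(-(w[0] * z[0] + w[1] * z[1])))
        g = p - t                     # cross-entropy gradient for this user
        w[0] -= 0.5 * g * z[0]
        w[1] -= 0.5 * g * z[1]

preds = [1 / (1 + math.exp(-(w[0] * z[0] + w[1] * z[1]))) > 0.5 for z in Z]
```

On this linearly separable toy data, the per-user updates drive the classifier to reproduce the fraud labels.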
6. The method according to claim 5, wherein performing the current round of supervised training on the pre-trained first neural network and the classifier according to the fraud recognition result of each second sample user and the fraud label information of that second sample user comprises:
taking any second sample user that has not yet completed the current round of training as a target second sample user, and determining the cross-entropy loss of the target second sample user for the current round according to the fraud recognition result and the fraud label information of the target second sample user;
adjusting the parameters of the first neural network and the classifier according to the cross-entropy loss of the target second sample user for the current round;
taking the target second sample user as a second sample user that has completed training, and taking any other second sample user that has not yet completed the current round of training as a new target second sample user;
obtaining the fraud recognition result of the new target second sample user by using the first neural network and the classifier whose parameters have been adjusted, and returning to the step of determining the cross-entropy loss of the target second sample user for the current round according to the fraud recognition result and the fraud label information of the target second sample user;
until all second sample users have completed the current round of training, whereupon the current round of supervised training of the pre-trained first neural network and the classifier is completed.
7. The method according to claim 6, wherein, after completing the current round of supervised training on the pre-trained first neural network and the classifier, the method further comprises:
detecting whether the current round reaches a preset number of rounds; if so, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or,
verifying the first neural network and the classifier obtained in the current round using a test set; if the number of test data items in the test set whose cross-entropy loss does not exceed a preset cross-entropy loss threshold, as a percentage of the total number of test data items in the test set, is greater than a preset first percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the last round of training as the fraud identification model;
or,
comparing, in turn, the cross-entropy loss of each second sample user in the current round with the cross-entropy loss of the corresponding second sample user in the previous round; if the number of second sample users whose cross-entropy loss in the current round is greater than their cross-entropy loss in the previous round, as a percentage of the total number of second sample users, reaches a preset second percentage threshold, stopping the training of the first neural network and the classifier, and taking the first neural network and the classifier obtained in the previous round of training as the fraud identification model.
8. A fraud recognition method, characterized in that it comprises:
obtaining historical operation information of a user to be detected when the user to be detected performs an operation behavior;
constructing, according to the historical operation information of the user to be detected, a target feature vector capable of characterizing the operation behavior features of the user to be detected; and
inputting the target feature vector into a fraud identification model trained by the fraud identification model training method according to any one of claims 1 to 7, to obtain the probability that the operation behavior of the user to be detected is fraudulent.
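End-to-end inference per claim 8 is feature construction followed by a forward pass through the trained model. The feature extraction, encoder rows, and classifier weights below are invented placeholders, not values from the patent.

```python
import math

def fraud_probability(history, enc_rows, clf_w):
    # hypothetical target feature vector: operation count and transfer count
    feats = [float(len(history)),
             float(sum(1 for a in history if a == "transfer"))]
    # encoder (as a weight matrix) followed by a linear classifier
    z = [sum(w * f for w, f in zip(row, feats)) for row in enc_rows]
    s = sum(w * v for w, v in zip(clf_w, z))
    return 1 / (1 + math.exp(-s))       # probability the behavior is fraud

p = fraud_probability(["login", "transfer"],
                      [[0.1, 0.2], [0.0, 0.1]], [1.0, -1.0])
```

The output is a probability in (0, 1) that can be thresholded or ranked by downstream risk controls.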
9. A fraud identification model training apparatus, characterized in that it comprises:
a first obtaining module, configured to obtain historical operation information of a plurality of first sample users, and to obtain historical operation information of a plurality of second sample users together with fraud label information corresponding to each second sample user;
a first construction module, configured to construct, according to the historical operation information of the first sample users, first feature vectors capable of characterizing the operation behavior features of the first sample users, and to construct, according to the historical operation information of the second sample users, second feature vectors capable of characterizing the operation behavior features of the second sample users;
a pre-training module, configured to input the first feature vectors into a first neural network and a second neural network of symmetric structure to perform unsupervised pre-training on the first neural network, wherein the first neural network is configured to encode the first feature vectors and the second neural network is configured to decode the encoded first feature vectors; and
a supervised training module, configured to input the second feature vectors into the pre-trained first neural network and a classifier, and to perform supervised training on the pre-trained first neural network and the classifier based on the second feature vectors and the corresponding fraud label information, to obtain a fraud identification model.
10. A fraud identification apparatus, characterized in that it comprises:
a second obtaining module, configured to obtain historical operation information of a user to be detected when the user to be detected performs an operation behavior;
a second construction module, configured to construct, according to the historical operation information of the user to be detected, a target feature vector capable of characterizing the operation behavior features of the user to be detected; and
a fraud recognition result obtaining module, configured to input the target feature vector into a fraud identification model trained by the fraud identification model training method according to any one of claims 1 to 7, to obtain the probability that the operation behavior of the user to be detected is fraudulent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811432681.3A CN109544190A (en) | 2018-11-28 | 2018-11-28 | A kind of fraud identification model training method, fraud recognition methods and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109544190A true CN109544190A (en) | 2019-03-29 |
Family
ID=65850677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811432681.3A Pending CN109544190A (en) | 2018-11-28 | 2018-11-28 | A kind of fraud identification model training method, fraud recognition methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109544190A (en) |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245302A (en) * | 2019-05-24 | 2019-09-17 | 阿里巴巴集团控股有限公司 | The strategy-generating method and device and electronic equipment of fraud case for identification |
CN110245302B (en) * | 2019-05-24 | 2023-08-08 | 创新先进技术有限公司 | Policy generation method and device for identifying fraudulent cases and electronic equipment |
CN110348190A (en) * | 2019-06-29 | 2019-10-18 | 上海淇毓信息科技有限公司 | User equipment ownership judgment method and device based on user's operation behavior |
CN110427971A (en) * | 2019-07-05 | 2019-11-08 | 五八有限公司 | Recognition methods, device, server and the storage medium of user and IP |
CN110503198A (en) * | 2019-07-23 | 2019-11-26 | 平安科技(深圳)有限公司 | Obtain method, apparatus, equipment and the storage medium of neural network test report |
CN110597984B (en) * | 2019-08-12 | 2022-05-20 | 大箴(杭州)科技有限公司 | Method and device for determining abnormal behavior user information, storage medium and terminal |
CN110597984A (en) * | 2019-08-12 | 2019-12-20 | 大箴(杭州)科技有限公司 | Method and device for determining abnormal behavior user information, storage medium and terminal |
CN110705585A (en) * | 2019-08-22 | 2020-01-17 | 深圳壹账通智能科技有限公司 | Network fraud identification method and device, computer device and storage medium |
CN110738396A (en) * | 2019-09-18 | 2020-01-31 | 阿里巴巴集团控股有限公司 | method, device and equipment for extracting characteristics of equipment |
CN110880117A (en) * | 2019-10-31 | 2020-03-13 | 北京三快在线科技有限公司 | False service identification method, device, equipment and storage medium |
CN111126481A (en) * | 2019-12-20 | 2020-05-08 | 湖南千视通信息科技有限公司 | Training method and device of neural network model |
CN111222026A (en) * | 2020-01-09 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Training method of user category identification model and user category identification method |
CN111222026B (en) * | 2020-01-09 | 2023-07-14 | 支付宝(杭州)信息技术有限公司 | Training method of user category recognition model and user category recognition method |
WO2021159775A1 (en) * | 2020-02-11 | 2021-08-19 | 腾讯科技(深圳)有限公司 | Training method and device for audio separation network, audio separation method and device, and medium |
CN111275546A (en) * | 2020-02-24 | 2020-06-12 | 中国工商银行股份有限公司 | Financial client fraud risk identification method and device |
CN111275546B (en) * | 2020-02-24 | 2023-08-18 | 中国工商银行股份有限公司 | Financial customer fraud risk identification method and device |
CN111539309A (en) * | 2020-04-21 | 2020-08-14 | 广州云从鼎望科技有限公司 | Data processing method, system, platform, equipment and medium based on OCR |
CN111641608A (en) * | 2020-05-18 | 2020-09-08 | 咪咕动漫有限公司 | Abnormal user identification method and device, electronic equipment and storage medium |
CN111881991B (en) * | 2020-08-03 | 2023-11-10 | 联仁健康医疗大数据科技股份有限公司 | Method and device for identifying fraud and electronic equipment |
CN111881991A (en) * | 2020-08-03 | 2020-11-03 | 联仁健康医疗大数据科技股份有限公司 | Method and device for identifying fraud and electronic equipment |
CN112417293A (en) * | 2020-12-03 | 2021-02-26 | 京东数字科技控股股份有限公司 | Information pushing method and system, model training method and related equipment |
CN112634026A (en) * | 2020-12-30 | 2021-04-09 | 四川新网银行股份有限公司 | Credit fraud identification method based on user page operation behavior |
CN113160233A (en) * | 2021-04-02 | 2021-07-23 | 易普森智慧健康科技(深圳)有限公司 | Method for training example segmentation neural network model by using sparse labeled data set |
CN112733045A (en) * | 2021-04-06 | 2021-04-30 | 北京轻松筹信息技术有限公司 | User behavior analysis method and device and electronic equipment |
CN112733045B (en) * | 2021-04-06 | 2021-06-22 | 北京轻松筹信息技术有限公司 | User behavior analysis method and device and electronic equipment |
CN112967134B (en) * | 2021-05-19 | 2021-09-21 | 北京轻松筹信息技术有限公司 | Network training method, risk user identification method, device, equipment and medium |
CN112967134A (en) * | 2021-05-19 | 2021-06-15 | 北京轻松筹信息技术有限公司 | Network training method, risk user identification method, device, equipment and medium |
CN113256434A (en) * | 2021-06-08 | 2021-08-13 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for recognizing vehicle insurance claim settlement behaviors |
CN113837303A (en) * | 2021-09-29 | 2021-12-24 | 中国联合网络通信集团有限公司 | Black product user identification method, TEE node and computer readable storage medium |
CN114143786A (en) * | 2021-11-29 | 2022-03-04 | 爱浦路网络技术(北京)有限公司 | User identification method, system, device and storage medium based on 5G |
WO2024020773A1 (en) * | 2022-07-26 | 2024-02-01 | 江苏树实科技有限公司 | Model generation method, image classification method, controller, and electronic device |
CN115550506A (en) * | 2022-09-27 | 2022-12-30 | 中国电信股份有限公司 | Training of user recognition model, user recognition method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190329