Detailed Description of the Embodiments
Exemplary embodiments of the present disclosure are described more fully below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 is a flowchart of Embodiment 1 of the fraud business identification method provided by the present invention. As shown in Fig. 1, the method includes the following steps:
Step 101: receive business data, and fuse the business data with basic data to obtain fused business data.
Business data relating to transactions is received from a client (a bank, a payment company, or the like). Before the business data is processed, it is first filtered to remove useless fields. The business data is then fused with basic data obtained from third-party statistics to obtain fused business data. Basic data is data that supplements the description of the business data. For example, information about where a transaction originated can be obtained by mapping its IP (Internet Protocol) address to latitude and longitude or to province, city, and county; information about the device on which it originated can be obtained by mapping its MAC (Media Access Control) address to a specific device; and information about the user who originated it can be obtained by mapping a mobile phone number to a particular user (for example, a user of a bank or payment company). Fusing the business data supplied by the client with basic data solves the problem that business data alone is one-dimensional and carries little information. Supplementing the business data along multiple dimensions makes it more detailed and more complete, which makes data analysis more accurate and facilitates the subsequent identification decision.
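The filtering and fusion described in Step 101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the method: the lookup tables `IP_REGIONS`, `MAC_DEVICES`, and `PHONE_USERS`, the field names (`ip`, `mac`, `phone`), and the set of useless fields are all hypothetical stand-ins for third-party basic data. A real deployment would query an IP geolocation database, a device registry, and a user directory rather than in-memory dictionaries.

```python
import ipaddress

# Hypothetical basic data; in practice this comes from third-party statistics.
IP_REGIONS = [
    (ipaddress.ip_network("203.0.113.0/24"),
     {"province": "Province A", "city": "City A", "lat": 31.2, "lon": 121.5}),
]
MAC_DEVICES = {"00:1a:2b:3c:4d:5e": "Android phone, model X"}
PHONE_USERS = {"13800000000": "user-001"}

# Hypothetical fields that carry no value for fraud identification.
USELESS_FIELDS = {"client_debug_info", "padding"}


def fuse(record: dict) -> dict:
    """Filter out useless fields, then enrich the record with basic data."""
    fused = {k: v for k, v in record.items() if k not in USELESS_FIELDS}

    # Location: map the IP address to region and latitude/longitude.
    ip = fused.get("ip")
    if ip:
        addr = ipaddress.ip_address(ip)
        for network, region in IP_REGIONS:
            if addr in network:
                fused.update(region)
                break

    # Device: map the MAC address (case-insensitively) to a concrete device.
    mac = str(fused.get("mac", "")).lower()
    if mac in MAC_DEVICES:
        fused["device"] = MAC_DEVICES[mac]

    # User: map the mobile phone number to a particular user.
    phone = fused.get("phone")
    if phone in PHONE_USERS:
        fused["user_id"] = PHONE_USERS[phone]

    return fused
```

For example, a record carrying `ip="203.0.113.7"` keeps its original fields (minus any useless ones) and gains the province, city, and coordinates of the matching network.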
Step 102: call a rule engine and/or a classifier model to identify the fused business data and obtain a determination result.
The method provides two identification tools: a rule engine and a classifier model. The rule engine holds rules defined for specific businesses, and each business may define many rules. These rules may be blacklist rules, whitelist rules, or dimension rules. A dimension is a field, or a combination of several fields, among the fields of a specific business, or an extension of a field. Examples include email address, IP address, and MAC address, as well as account access frequency. After the fused business data is input into the rule engine, each dimension of the fused business data is scored, the scores of all dimensions are combined into a fraud score for the fused business data, and a determination result is obtained from that fraud score, i.e., it is determined whether the business data is fraud data. The classifier model is based on a classification algorithm: classifiers are trained using known business data as training data and are stored in a classifier model database. When the system calls the classifier model, the fused business data is input into the selected classifier to obtain a classification result, which determines whether the business data is fraud data.
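The rule-engine half of Step 102 amounts to per-dimension scoring followed by aggregation. Below is a minimal sketch under several assumptions: each dimension score lies on a hypothetical 0 to 100 scale, the scores are combined by a weighted mean, whitelist hits take precedence over blacklist hits, and the 60-point cut-off is illustrative. The description does not specify any of these choices. The names `DimensionRule` and `RuleEngine` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class DimensionRule:
    """Scores one dimension of a record on a 0-100 scale (hypothetical scale)."""
    name: str
    score: Callable[[dict], float]
    weight: float = 1.0


@dataclass
class RuleEngine:
    blacklist: Dict[str, Set[str]] = field(default_factory=dict)
    whitelist: Dict[str, Set[str]] = field(default_factory=dict)
    rules: List[DimensionRule] = field(default_factory=list)
    threshold: float = 60.0  # hypothetical fraud-score cut-off

    def fraud_score(self, record: dict) -> float:
        """Weighted mean of the per-dimension scores."""
        total_weight = sum(r.weight for r in self.rules)
        if total_weight == 0:
            return 0.0
        return sum(r.weight * r.score(record) for r in self.rules) / total_weight

    def identify(self, record: dict) -> Tuple[float, bool]:
        """Return (fraud score, is_fraud) for one fused record."""
        # Assumption: whitelist hits take precedence over blacklist hits.
        for fname, values in self.whitelist.items():
            if record.get(fname) in values:
                return 0.0, False
        for fname, values in self.blacklist.items():
            if record.get(fname) in values:
                return 100.0, True
        score = self.fraud_score(record)
        return score, score >= self.threshold
```

In this sketch a dimension such as account access frequency is simply read from the record. In practice it would more likely be computed from historical data before scoring.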
If the system has both a rule engine and a classifier model, either tool can be selected for identification according to the client's settings. If the client selects the rule engine, this step calls the rule engine directly; if the client selects the classifier model, this step calls the classifier model directly. If the client selects both, the client must set a preset weight for each tool. During identification, the two recognition results are then combined according to their preset weights to generate the final determination result.
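The selection logic and the weighted combination of the two recognition results can be illustrated with a short function. This sketch assumes that both tools return a fraud score on the same 0 to 100 scale and that the combination is a weighted average. The text says only that results are "processed according to the preset weights". The function name, the mode strings, and the threshold are hypothetical.

```python
from typing import Callable, Tuple

Scorer = Callable[[dict], float]


def identify(record: dict, mode: str, rule_score: Scorer, model_score: Scorer,
             weights: Tuple[float, float] = (1.0, 1.0),
             threshold: float = 60.0) -> Tuple[float, bool]:
    """Pick the identification tool(s) according to the client's setting.

    mode is "rules", "model" or "both"; both scorers are assumed to
    return a fraud score on the same 0-100 scale.
    """
    if mode == "rules":
        score = rule_score(record)
    elif mode == "model":
        score = model_score(record)
    elif mode == "both":
        w_rule, w_model = weights
        total = w_rule + w_model
        if total <= 0:
            raise ValueError("preset weights must sum to a positive value")
        # Weighted average of the two recognition results.
        score = (w_rule * rule_score(record) + w_model * model_score(record)) / total
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return score, score >= threshold
```

For example, with a rule score of 80, a model score of 40, and preset weights of 1 and 3, the combined score is 50, which falls below the illustrative 60-point threshold.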
Step 103: set an identification label on the business data according to the determination result, and return the labeled business data to the caller.
An identification label is set on the business data according to the determination result. The label may be P, R, or D. P means pass, i.e., the business data is normal data. D means the business data is fraud data. R means undetermined, i.e., further manual review is required for confirmation. The labeled business data is returned to the caller.
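The text defines the meaning of the three labels but not how a determination result is mapped onto them. One plausible reading, sketched below as an assumption, is to compare a 0 to 100 fraud score with two hypothetical thresholds, so that an intermediate band is routed to manual review (R). The names `Label`, `assign_label`, and `tag`, as well as the threshold values, are illustrative only.

```python
from enum import Enum


class Label(str, Enum):
    PASS = "P"    # normal data
    REVIEW = "R"  # undetermined, needs manual review
    FRAUD = "D"   # fraud data


def assign_label(fraud_score: float, review_at: float = 40.0,
                 fraud_at: float = 70.0) -> Label:
    """Map a 0-100 fraud score onto the three labels (thresholds hypothetical)."""
    if fraud_score >= fraud_at:
        return Label.FRAUD
    if fraud_score >= review_at:
        return Label.REVIEW
    return Label.PASS


def tag(record: dict, fraud_score: float) -> dict:
    """Return a copy of the business data with its identification label set."""
    return {**record, "label": assign_label(fraud_score).value}
```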
With the fraud business identification method provided by this embodiment, the rule engine can identify known fraud, and the classifier model can predict unknown fraud. Rules in the rule engine can be defined flexibly according to the client's industry, which helps the system make decisions. The classifier model can use a classification algorithm that is trained daily on large sample data sets and saved to the database, which improves the accuracy of model identification.
Fig. 2 is a flowchart of Embodiment 2 of the fraud business identification method provided by the present invention. As shown in Fig. 2, the method includes the following steps:
Step 201: receive business data, and fuse the business data with basic data to obtain fused business data.
Business data relating to transactions is received from a client (a bank, a payment company, or the like). Before the business data is processed, it is first filtered to remove useless fields. The business data is then fused with basic data obtained from third-party statistics to obtain fused business data. Basic data is data that supplements the description of the business data. For example, information about where a transaction originated can be obtained by mapping its IP address to latitude and longitude or to province, city, and county; information about the device on which it originated can be obtained by mapping its MAC address to a specific device; and information about the user who originated it can be obtained by mapping a mobile phone number to a particular user (for example, a user of a bank or payment company). Fusing the business data supplied by the client with basic data lets the business data express its information in more detail, which facilitates the subsequent identification decision.
Step 202: call the rule engine to identify the fused business data and obtain a determination result.
When the system is first put into use, it has no classifier model yet. In this initial stage, therefore, the rule engine is called first to identify the fused business data. Rules for specific businesses have been predefined in the rule engine, and each business may define many rules. These rules may be blacklist rules, whitelist rules, or dimension rules. A dimension is a field, or a combination of several fields, among the fields of a specific business, or an extension of a field. Examples include email address, IP address, and MAC address, as well as account access frequency. After the fused business data is input into the rule engine, each dimension of the fused business data is scored, the scores of all dimensions are combined into a fraud score for the fused business data, and a determination result is obtained from that fraud score, i.e., it is determined whether the business data is fraud data.
Step 203: set an identification label on the business data according to the determination result, and return the labeled business data to the caller.
An identification label is set on the business data according to the determination result. The label may be P, R, or D. P means pass, i.e., the business data is normal data. D means the business data is fraud data. R means undetermined, i.e., further manual review is required for confirmation. The labeled business data is returned to the caller.
After steps 201 to 203 have been executed for some time, a large amount of labeled business data is available. The business data labeled P or D can be used as training data to train the classifier model.
Step 204: train the classifier model using the business data identified by the rule engine as training data.
The business data identified in steps 201 to 203 is used as training data to train the classifier model. First, the training data is preprocessed to filter out useless fields and/or content that has not been identified. Then a training data set is built and passed as a parameter to the classification algorithm, and training yields a classifier. The trained classifier is evaluated, and a classifier that passes the evaluation system is stored in the classifier model database for later calls.
The evaluation system includes one or more of the following criteria: accuracy, misjudgment rate, training time, and the ROC (Receiver Operating Characteristic) curve. Classifier training and evaluation are triggered offline. This avoids making real-time business data wait for a classifier to be created and better meets real-time requirements.
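The offline training-and-evaluation loop of Step 204 can be sketched in plain Python. Several assumptions apply. Logistic regression is used because it is one of the algorithms the embodiment lists, and it is hand-written here only to keep the example self-contained. The feature fields, the every-fifth-record hold-out split, and all acceptance thresholds are hypothetical. The "misjudgment rate" is interpreted as the false-positive rate. `MODEL_STORE` stands in for the classifier model database. Records labeled R are dropped because they have not yet been confirmed.

```python
import math
import time

# Hypothetical numeric feature fields kept after preprocessing.
FEATURES = ["amount_k", "logins_last_hour"]
MODEL_STORE = {}  # stand-in for the classifier model database


def preprocess(records):
    """Keep only P/D-labeled records and the feature fields (R awaits review)."""
    X, y = [], []
    for r in records:
        if r.get("label") not in ("P", "D"):
            continue
        X.append([float(r.get(f, 0.0)) for f in FEATURES])
        y.append(1 if r["label"] == "D" else 0)
    return X, y


def predict_proba(model, x):
    w, b = model
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.05, epochs=300):
    """Plain stochastic-gradient logistic regression."""
    model = ([0.0] * len(X[0]), 0.0)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(model, xi) - yi
            w, b = model
            model = ([wj - lr * g * xj for wj, xj in zip(w, xi)], b - lr * g)
    return model


def roc_auc(scores, y):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(model, X, y, threshold=0.5):
    scores = [predict_proba(model, x) for x in X]
    pred = [1 if s >= threshold else 0 for s in scores]
    negatives = y.count(0)
    false_pos = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
    return {
        "accuracy": sum(p == t for p, t in zip(pred, y)) / len(y),
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        "auc": roc_auc(scores, y),
    }


def train_and_register(name, records, min_accuracy=0.9, max_fpr=0.05,
                       min_auc=0.9, max_train_seconds=60.0):
    """Offline job: train, evaluate on a hold-out set, store if it passes."""
    X, y = preprocess(records)
    # Deterministic hold-out: every fifth record is used only for evaluation.
    train = [i for i in range(len(X)) if i % 5 != 0]
    test = [i for i in range(len(X)) if i % 5 == 0]
    start = time.perf_counter()
    model = train_logreg([X[i] for i in train], [y[i] for i in train])
    elapsed = time.perf_counter() - start
    metrics = evaluate(model, [X[i] for i in test], [y[i] for i in test])
    metrics["train_seconds"] = elapsed
    passed = (metrics["accuracy"] >= min_accuracy
              and metrics["false_positive_rate"] <= max_fpr
              and metrics["auc"] >= min_auc  # NaN (one-class hold-out) fails
              and elapsed <= max_train_seconds)
    if passed:
        MODEL_STORE[name] = {"model": model, "metrics": metrics}
    return passed, metrics
```

Only a classifier that passes every criterion is registered, which corresponds to "stored in the classifier model database for calling". A production system would more likely use a library implementation and a stratified split.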
The algorithm used by the classifier model in this embodiment may be a decision tree algorithm, a naive Bayes algorithm, a neural network algorithm, or a logistic regression algorithm; the present invention is not limited in this respect.
Step 205: receive new business data, and fuse the new business data with basic data to obtain fused business data. For the fusion process, see the detailed description of step 201.
Step 206: call the classifier model to identify the fused business data and obtain a determination result.
The classifier model trained in step 204 is used to identify the new fused business data. Specifically, an evaluated classifier can be called according to the client's selection to classify the fused business data. Preferably, the classifier with the highest accuracy is selected to classify the fused business data.
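The selection rule in Step 206 (the client's choice first, otherwise highest accuracy) can be expressed in a few lines. The sketch assumes a hypothetical model store that maps classifier names to entries, each carrying a `metrics` dictionary with an `accuracy` key. Neither that layout nor the name `select_classifier` comes from the description.

```python
from typing import Optional, Tuple


def select_classifier(model_store: dict,
                      preferred: Optional[str] = None) -> Tuple[str, dict]:
    """Return (name, entry) of the classifier used for identification.

    A classifier named by the client wins; otherwise the evaluated
    classifier with the highest accuracy is chosen.
    """
    if not model_store:
        raise LookupError("no evaluated classifier is available yet")
    if preferred is not None:
        return preferred, model_store[preferred]
    name = max(model_store, key=lambda n: model_store[n]["metrics"]["accuracy"])
    return name, model_store[name]
```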
Step 207: set an identification label on the new business data according to the determination result, and return the labeled new business data to the caller.
Step 208: generate new rules from the fraud business data identified by the classifier model, and add the new rules to the rule engine.
Through steps 205 to 207, the classifier model can predict new kinds of fraud. In this step, new rules are generated from the fraud business data identified by the classifier model and added to the rule engine. This feeds back into the rule engine, so that both the classifier model and the rule engine are kept up to date.
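The description does not say how new rules are derived from classifier-flagged fraud. One simple approach, assumed here purely for illustration, is to promote field values that recur across several flagged records (for example the same IP address or device) to blacklist entries, skipping whitelisted values. The field names, the `min_count` threshold, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def generate_blacklist_rules(fraud_records: Iterable[dict],
                             fields=("ip", "device"),
                             min_count: int = 3,
                             whitelist: Optional[Dict[str, Set[str]]] = None
                             ) -> Dict[str, Set[str]]:
    """Turn values that recur across classifier-flagged fraud into blacklist entries."""
    records = list(fraud_records)
    whitelist = whitelist or {}
    new_rules: Dict[str, Set[str]] = {}
    for fname in fields:
        counts = Counter(r[fname] for r in records if r.get(fname) is not None)
        values = {v for v, c in counts.items()
                  if c >= min_count and v not in whitelist.get(fname, set())}
        if values:
            new_rules[fname] = values
    return new_rules


def add_to_rule_engine(blacklist: Dict[str, Set[str]],
                       new_rules: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Merge generated entries into the rule engine's blacklist in place."""
    for fname, values in new_rules.items():
        blacklist.setdefault(fname, set()).update(values)
    return blacklist
```

Requiring a minimum count keeps a single misclassified record from turning into a blacklist rule. A real system would probably also route generated rules through the online rule test before applying them.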
In this embodiment, the classifier model is trained on business data that has already been labeled, so when the system is first used it has no classifier model. The rule engine must therefore be used first to process business data, and the business data is labeled after rule-engine processing. After the rule engine has been used for a period of time, the accumulated labeled business data can be used to train the classifier model. Once both the rule engine and the classifier model are available, which one to use can be set according to the client's needs. If the client selects the rule engine, the rule engine is called directly; if the client selects the classifier model, the classifier model is called directly. If the client selects both, the client sets a preset weight for each tool, and during identification the recognition results are combined according to those weights to generate the final determination result.
Further, in this embodiment the labeled business data is also written to a database. When an auditor disagrees with an identification label, the data can be retrieved from the database and modified at any time. This ensures the correctness of the model training data and improves the accuracy of business data identification.
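Persisting labeled data and letting an auditor override a label can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for whatever database the system actually uses. The table name `labeled_business_data`, its columns, and the `audited` flag are hypothetical. The last query reflects the earlier statement that only P and D records serve as training data.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labeled_business_data ("
        " id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " label TEXT NOT NULL CHECK (label IN ('P', 'R', 'D')),"
        " audited INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def save_labeled(conn, record_id: str, payload_json: str, label: str) -> None:
    """Persist a record together with the label the system assigned."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO labeled_business_data (id, payload, label)"
            " VALUES (?, ?, ?)",
            (record_id, payload_json, label),
        )


def audit_correct(conn, record_id: str, new_label: str) -> bool:
    """Let an auditor overwrite a disputed label; returns False if no such record."""
    with conn:
        cur = conn.execute(
            "UPDATE labeled_business_data SET label = ?, audited = 1 WHERE id = ?",
            (new_label, record_id),
        )
    return cur.rowcount == 1


def training_rows(conn):
    """Only confirmed labels (P and D) are used as training data."""
    return conn.execute(
        "SELECT payload, label FROM labeled_business_data"
        " WHERE label IN ('P', 'D') ORDER BY id"
    ).fetchall()
```

Because corrections are made in place, the next offline training run picks up the auditor's labels automatically.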
In addition, the method can display recognition results and the corresponding rules on an intuitive graphical user page. Clients can conveniently edit various rules on that page (including creating, modifying, deleting, searching, testing, and applying them) without needing professional rule-editing staff. The page can also clearly show the purpose and priority of each rule and, when a rule is triggered, display the reason it was triggered. Clients can also test rules online. During a test, business data is identified according to the normal flow, but the recognition results are not written to the database. If the test results satisfy the client, the rule can be applied online at any time to prevent potential risks promptly.
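The difference between the normal flow and an online rule test is that a test does not write results to the database. The sketch below illustrates this as a minimal model under assumptions: `Rule`, `RuleConsole`, the in-memory `results_db` list, and the "name: purpose" trigger-reason format are all hypothetical, and a real system would sit behind the graphical page rather than a Python object.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    name: str
    purpose: str      # shown on the page next to the rule
    priority: int     # higher value is listed first
    matches: Callable[[dict], bool]


@dataclass
class RuleConsole:
    live_rules: List[Rule] = field(default_factory=list)
    results_db: List[dict] = field(default_factory=list)  # stand-in for the database

    def _triggered(self, rules: List[Rule], record: dict) -> List[str]:
        hits = sorted((r for r in rules if r.matches(record)),
                      key=lambda r: r.priority, reverse=True)
        return [f"{r.name}: {r.purpose}" for r in hits]  # trigger reasons for the page

    def identify(self, record: dict) -> List[str]:
        """Normal flow: identify and persist the result."""
        reasons = self._triggered(self.live_rules, record)
        self.results_db.append({"record": record, "triggered": reasons})
        return reasons

    def test_rule(self, candidate: Rule, records: List[dict]) -> List[List[str]]:
        """Online test: same flow, but nothing is written to results_db."""
        return [self._triggered(self.live_rules + [candidate], r) for r in records]

    def publish(self, rule: Rule) -> None:
        """Apply a satisfactory rule online."""
        self.live_rules.append(rule)
```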
With the fraud business identification method provided by this embodiment, the rule engine can identify known fraud, and the classifier model can predict unknown fraud. Rules in the rule engine can be defined flexibly according to the client's industry, which helps the system make decisions. The classifier model can use a classification algorithm that is trained daily on large sample data sets and saved to the database, which improves the accuracy of model identification. This method combines the advantages of the two identification tools. In the first stage, business data is identified by the initial rule engine, which accumulates data for training the classifier model. In the second stage, business data is identified by the trained classifier model; when new fraud is identified, new rules can be generated and added to the rule engine. This feedback keeps both the classifier model and the rule engine up to date and solves the problem that a rule engine cannot be updated in real time. In turn, the rule engine provides more accurate training data to the classifier model, so that the classifier model can distinguish fraud from non-fraud more accurately.
Fig. 3 is a functional block diagram of an embodiment of the fraud business identification device provided by the present invention. As shown in Fig. 3, the device includes a data fusion module 31, a rule identification module 32, a model identification module 33, and a label setting module 34.
The data fusion module 31 is configured to receive business data and fuse the business data with basic data to obtain fused business data.
The data fusion module 31 receives business data relating to transactions from a client (a bank, a payment company, or the like). Before the business data is processed, it is first filtered to remove useless fields. The business data is then fused with basic data obtained from third-party statistics to obtain fused business data. Basic data is data that supplements the description of the business data. For example, information about where a transaction originated can be obtained by mapping its IP address to latitude and longitude or to province, city, and county; information about the device on which it originated can be obtained by mapping its MAC address to a specific device; and information about the user who originated it can be obtained by mapping a mobile phone number to a particular user (for example, a user of a bank or payment company). Fusing the business data supplied by the client with basic data solves the problem that business data alone is one-dimensional and carries little information. Supplementing the business data along multiple dimensions makes it more detailed and more complete, which makes data analysis more accurate and facilitates the subsequent identification decision.
The rule identification module 32 is configured to call the rule engine to identify the fused business data and obtain a determination result.
The rule engine holds rules defined for specific businesses, and each business may define many rules. These rules may be blacklist rules, whitelist rules, or dimension rules. A dimension is a field, or a combination of several fields, among the fields of a specific business, or an extension of a field. Examples include email address, IP address, and MAC address, as well as account access frequency. The rule identification module 32 is further configured to: call the rule engine to score each dimension of the fused business data; combine the scores of all dimensions into a fraud score for the fused business data; and obtain a determination result from that fraud score, i.e., determine whether the business data is fraud data.
The model identification module 33 is configured to call the classifier model to identify the fused business data and obtain a determination result.
The label setting module 34 is configured to set an identification label on the business data according to the determination result and return the labeled business data to the caller.
Further, the device also includes a model training module 35, configured to train the classifier model using the business data identified by the rule engine as training data.
The model training module 35 is further configured to: preprocess the training data to filter out useless fields and/or content that has not been identified; build a training data set and pass it as a parameter to the classification algorithm to train a classifier; and evaluate the trained classifier, storing a classifier that passes the evaluation system in the classifier model database for later calls. The evaluation system includes one or more of the following criteria: accuracy, misjudgment rate, training time, and the ROC curve. Classifier training and evaluation are triggered offline. This avoids making real-time business data wait for a classifier to be created and better meets real-time requirements.
The algorithm used by the classifier model in this embodiment may be a decision tree algorithm, a naive Bayes algorithm, a neural network algorithm, or a logistic regression algorithm; the present invention is not limited in this respect.
Further, the device also includes a rule generation module 36, configured to generate new rules from the fraud business data identified by the classifier model and add the new rules to the rule engine. This feeds back into the rule engine, so that both the classifier model and the rule engine are kept up to date.
Further, the device also includes a comprehensive identification module 37, configured to obtain the preset weights of the rule engine and the classifier model. When both the rule engine and the classifier model are called to identify the fused business data, module 37 combines the recognition results according to their respective preset weights to obtain the determination result.
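How the online modules of Fig. 3 might be wired together can be shown as a small composition. This sketch is hypothetical in several respects. Modules 31, 32, and 33 are passed in as plain callables. Module 37 is reduced to a weighted average of 0 to 100 scores. Module 34 uses illustrative label thresholds. Modules 35 and 36 (training and rule generation) are assumed to run offline and are therefore omitted. The class name `FraudIdentificationDevice` is not from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Scorer = Callable[[dict], float]


@dataclass
class FraudIdentificationDevice:
    """Online path of Fig. 3; modules 35 and 36 are assumed to run offline."""
    fuse: Callable[[dict], dict]               # data fusion module 31
    rule_score: Scorer                         # rule identification module 32
    model_score: Optional[Scorer] = None       # model identification module 33
    weights: Tuple[float, float] = (1.0, 1.0)  # comprehensive identification module 37
    review_at: float = 40.0                    # hypothetical label thresholds
    fraud_at: float = 70.0

    def _score(self, fused: dict) -> float:
        if self.model_score is None:
            return self.rule_score(fused)
        w_rule, w_model = self.weights
        return (w_rule * self.rule_score(fused)
                + w_model * self.model_score(fused)) / (w_rule + w_model)

    def _label(self, score: float) -> str:     # label setting module 34
        if score >= self.fraud_at:
            return "D"
        return "R" if score >= self.review_at else "P"

    def handle(self, record: dict) -> dict:
        """Fuse, identify, label, and return the business data to the caller."""
        fused = self.fuse(record)
        return {**record, "label": self._label(self._score(fused))}
```

Injecting each module as a callable keeps the modules independently replaceable, which matches the later remark that modules may be combined or split across devices.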
Further, the device also writes the labeled business data to a database. When an auditor disagrees with an identification label, the data can be retrieved from the database and modified at any time. This ensures the correctness of the model training data and improves the accuracy of business data identification.
In addition, the device can display recognition results and the corresponding rules on an intuitive graphical user page. Clients can conveniently edit various rules on that page (including creating, modifying, deleting, searching, testing, and applying them) without needing professional rule-editing staff. The page can also clearly show the purpose and priority of each rule and, when a rule is triggered, display the reason it was triggered. Clients can also test rules online. During a test, business data is identified according to the normal flow, but the recognition results are not written to the database. If the test results satisfy the client, the rule can be applied online at any time to prevent potential risks promptly.
With the fraud business identification device provided by this embodiment, the rule engine can identify known fraud, and the classifier model can predict unknown fraud. Rules in the rule engine can be defined flexibly according to the client's industry, which helps the system make decisions. The classifier model can use a classification algorithm that is trained daily on large sample data sets and saved to the database, which improves the accuracy of model identification. The device combines the advantages of the two identification tools. In the first stage, business data is identified by the initial rule engine, which accumulates data for training the classifier model. In the second stage, business data is identified by the trained classifier model; when new fraud is identified, new rules can be generated and added to the rule engine. This feedback keeps both the classifier model and the rule engine up to date and solves the problem that a rule engine cannot be updated in real time. In turn, the rule engine provides more accurate training data to the classifier model, so that the classifier model can distinguish fraud from non-fraud more accurately.
The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages, and that the description above of a specific language is provided to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. However, it will be understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the above description of exemplary embodiments sometimes groups various features of the invention together in a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. Modules, units, or components in an embodiment may be combined into one module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device according to embodiments of the present invention. The present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
The invention discloses: A1. A fraud business identification method, comprising:
receiving business data, and fusing the business data with basic data to obtain fused business data;
calling a rule engine and/or a classifier model to identify the fused business data and obtain a determination result; and
setting an identification label on the business data according to the determination result, and returning the labeled business data to a caller.
A2. The method according to A1, wherein, after the rule engine is called to identify the fused business data, the determination result is obtained, and the identification label is set on the business data according to the determination result, the method further comprises: training the classifier model using the business data identified by the rule engine as training data.
A3. The method according to A2, wherein training the classifier model further comprises:
preprocessing the training data to filter out useless fields and/or unidentified content in the training data;
building a training data set and passing it as a parameter to a classification algorithm to train a classifier; and
evaluating the trained classifier, and storing a classifier that passes an evaluation system in a classifier model database for calling.
A4. The method according to A3, wherein the evaluation system includes one or more of the following evaluation criteria: accuracy, misjudgment rate, training time, and ROC curve.
A5. The method according to A1, wherein, after the classifier model is called to identify the fused business data, the determination result is obtained, and the identification label is set on the business data according to the determination result, the method further comprises: generating new rules according to the fraud business data identified by the classifier model, and adding the new rules to the rule engine.
A6. The method according to A1 or A5, wherein the rules used by the rule engine include one or more of the following: blacklist rules, whitelist rules, and dimension rules.
A7. The method according to A1, wherein calling the rule engine to identify the fused business data and obtain a determination result further comprises:
calling the rule engine to score each dimension data of the fused business data;
combining the scores of the dimension data to obtain a fraud score for the fused business data; and
obtaining the determination result according to the fraud score of the fused business data.
A8. The method according to A7, wherein the dimension data is specifically a preset field, or a combination of preset fields, of the fused business data.
A9. The method according to A1, wherein calling the rule engine and/or the classifier model to identify the fused business data and obtain a determination result further comprises:
obtaining preset weights of the rule engine and the classifier model; and
when the rule engine and the classifier model are both called to identify the fused business data, processing the recognition results according to their respective preset weights to obtain the determination result.
A10. The method according to A1, wherein the basic data includes information related to the location where the business originated, and/or information related to the device on which the business originated, and/or information related to the user who originated the business.
The invention also discloses: B11. A fraud business identification device, comprising:
a data fusion module, configured to receive business data and fuse the business data with basic data to obtain fused business data;
a rule identification module, configured to call a rule engine to identify the fused business data and obtain a determination result;
a model identification module, configured to call a classifier model to identify the fused business data and obtain a determination result; and
a label setting module, configured to set an identification label on the business data according to the determination result and return the labeled business data to a caller.
B12. The device according to B11, further comprising a model training module, configured to train the classifier model using the business data identified by the rule engine as training data.
B13. The device according to B12, wherein the model training module is further configured to:
preprocess the training data to filter out useless fields and/or unidentified content in the training data;
build a training data set and pass it as a parameter to a classification algorithm to train a classifier; and
evaluate the trained classifier, and store a classifier that passes an evaluation system in a classifier model database for calling.
B14. The device according to B13, wherein the evaluation system includes one or more of the following evaluation criteria: accuracy, misjudgment rate, training time, and ROC curve.
B15. The device according to B11, further comprising a rule generation module, configured to generate new rules according to the fraud business data identified by the classifier model and add the new rules to the rule engine.
B16. The device according to B11 or B15, wherein the rules used by the rule engine include one or more of the following: blacklist rules, whitelist rules, and dimension rules.
B17. The device according to B11, wherein the rule identification module is further configured to: call the rule engine to score each dimension data of the fused business data; combine the scores of the dimension data to obtain a fraud score for the fused business data; and obtain the determination result according to the fraud score of the fused business data.
B18. The device according to B17, wherein the dimension data is specifically a preset field, or a combination of preset fields, of the fused business data.
B19. The device according to B11, further comprising a comprehensive identification module, configured to obtain preset weights of the rule engine and the classifier model and, when the rule engine and the classifier model are both called to identify the fused business data, process the recognition results according to their respective preset weights to obtain the determination result.
B20. The device according to B11, wherein the basic data includes information related to the location where the business originated, and/or information related to the device on which the business originated, and/or information related to the user who originated the business.