CN109544150A - Classification model generation method and apparatus, computing device, and storage medium

Classification model generation method and apparatus, computing device, and storage medium

Info

Publication number
CN109544150A
CN109544150A CN201811174157.0A CN201811174157A CN109544150A CN 109544150 A CN109544150 A CN 109544150A CN 201811174157 A CN201811174157 A CN 201811174157A CN 109544150 A CN109544150 A CN 109544150A
Authority
CN
China
Prior art keywords
variable
sample label
probability
label
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811174157.0A
Other languages
Chinese (zh)
Inventor
贾思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811174157.0A
Publication of CN109544150A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/382 Payment protocols; Details thereof insuring higher security of transaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

This specification provides a classification model generation method and apparatus, a computing device, and a storage medium. The method includes: obtaining multiple pieces of sample data and the white sample label or black sample label corresponding to each piece of sample data; extracting the variables that constitute each piece of sample data and the variable values corresponding to those variables; and training a classification model using the variables, the corresponding variable values, the white sample labels, and the black sample labels to obtain the classification model. The classification model outputs a first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.

Description

Classification model generation method and apparatus, computing device, and storage medium
Technical field
This application relates to the field of machine learning in computing, and in particular to a classification model generation method and apparatus, a computing device, and a storage medium.
Background
In international credit card payment transactions, the business team needs to classify fraudulent transactions (for example, those with a Chargeback) into three main types: ATO (account theft), Non Fraud (not genuine fraud, such as FF), and Stolen Card (card theft). This lets the team track the risk level of each fraud type and apply targeted prevention and control to high-risk types, with the ultimate aim of intercepting risk and lowering fraud metrics.
At present, fraudulent transactions are classified by a semi-automatic, semi-manual process, which demands considerable business experience and labor cost and yields poor processing efficiency and accuracy.
Summary of the invention
In view of this, embodiments of this specification provide a classification model generation method and apparatus, a computing device, and a storage medium, to address technical defects in the prior art.
An embodiment of this specification discloses a classification model generation method, comprising:
obtaining multiple pieces of sample data and the white sample label or black sample label corresponding to each piece of sample data;
extracting the variables that constitute each piece of sample data and the variable values corresponding to the variables; and
training a classification model using the variables, the variable values corresponding to the variables, the white sample labels, and the black sample labels to obtain the classification model, where the classification model outputs a first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.
In another aspect, an embodiment of this specification also discloses a classification model generation apparatus, comprising:
a first obtaining module, configured to obtain multiple pieces of sample data and the white sample label or black sample label corresponding to each piece of sample data;
a first extraction module, configured to extract the variables that constitute each piece of sample data and the variable values corresponding to the variables; and
a first training module, configured to train a classification model using the variables, the variable values corresponding to the variables, the white sample labels, and the black sample labels to obtain the classification model, where the classification model outputs a first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.
In another aspect, an embodiment of this application discloses a computing device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor, when executing the instructions, implements the steps of the classification model generation method described above.
In another aspect, an embodiment of this application discloses a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the classification model generation method described above.
This specification provides a classification model generation method and apparatus, a computing device, and a storage medium. The method includes: obtaining multiple pieces of sample data and the white sample label or black sample label corresponding to each piece of sample data; extracting the variables that constitute each piece of sample data and the variable values corresponding to the variables; and training a classification model using the variables, the corresponding variable values, the white sample labels, and the black sample labels to obtain the classification model, where the classification model outputs a first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.
Brief description of the drawings
Fig. 1 is a schematic diagram of a classification model generation method provided by an embodiment of this specification;
Fig. 2 is a schematic diagram of a decision tree model generation method provided by an embodiment of this specification;
Fig. 3 is a flowchart of a classification model generation method provided by an embodiment of this specification;
Fig. 4 is a schematic diagram of a variable structure provided by an embodiment of this specification;
Fig. 5 is a schematic diagram of the output parameters of a classification model generation method provided by an embodiment of this specification;
Fig. 6 is a flowchart of a classification model generation method provided by an embodiment of this specification;
Fig. 7 is a flowchart of a classification model generation method provided by an embodiment of this specification;
Fig. 8 is a structural diagram of a classification model generation apparatus provided by an embodiment of this specification;
Fig. 9 is a structural block diagram of a computing device provided by an embodiment of this specification.
Detailed description of embodiments
Many specific details are set forth in the following description to provide a thorough understanding of this application. However, this application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its essence; this application is therefore not limited by the specific implementations disclosed below.
The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit the one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, and so on may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms only distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, a first may also be called a second and, similarly, a second may also be called a first. Depending on the context, the word "if" as used here can be interpreted as "when", "upon", or "in response to determining".
First, the terms used in one or more embodiments of the present invention are explained.
International credit card payment: payment on an e-commerce platform using an international card (a non-CN card).
Fraudulent transaction: the cardholder files a refusal to pay (a Chargeback) with the card issuer, claiming that the transaction was not made in person by the cardholder; such a transaction is commonly called a fraudulent transaction.
ATO: account theft; the fraudster steals an account that trades normally on the platform and uses it to trade.
Non Fraud: although the cardholder files a refusal to pay, the transaction is judged not to be genuine fraud based on information such as trading behavior; the cause may be, for example, that the goods did not match their description.
Stolen Card: card theft; the fraudster logs in to an account on the platform and trades using a stolen card.
Infocode: an interval of a model variable that influences the model's score.
This specification provides a classification model generation method and apparatus, a computing device, and a storage medium, which are described in detail one by one in the following embodiments.
Fig. 1 is a flowchart of the classification model generation method of one or more embodiments of this specification.
The classification model has input parameters and output parameters. To produce the input parameters, multiple pieces of sample data and the white sample label or black sample label corresponding to each piece are obtained; the variables that constitute each piece of sample data and the variable values corresponding to those variables are then extracted and used as the input parameters of the classification model.
The output parameters include the first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.
In practice, the classification model can be applied to classifying fraudulent transactions in international credit card payment. The input parameters may be derived from multiple historical fraudulent transactions that have already been classified, each of which corresponds to a label such as 0 or 1. The variables of each fraudulent transaction and the variable value corresponding to each variable are extracted, and these variables and variable values are the input parameters.
The output parameters may include the probability of the degree of association between the labels and different combinations of the variables extracted from all the fraudulent transactions, that is, the probability that a given label occurs for a given combination of the variables.
In one or more embodiments of this specification, the classification model can reconstruct the behavior at the time the fraud was returned. For each fraudulent transaction received, training uses all the information available when the fraud was returned; if an account has multiple fraudulent transactions returned, a classification label is given to every one of them.
Fig. 2 is a flowchart of the decision tree model generation method used in the classification model generation method provided by one or more embodiments of this specification.
The decision tree model has input parameters and output parameters. The input parameters include the different combinations of the variables extracted from the obtained sample data, the variable intervals corresponding to those combinations, and the first probability, output by the classification model of the above embodiment, of the degree of association between the different combinations of the variables and the white sample label and the black sample label.
The output parameters include a second probability of the degree of association between the variable interval and the first probability.
In practice, the decision tree model can be applied to classifying fraudulent transactions in international credit card payment. The input parameters may include the variables of each of the extracted historical fraudulent transactions, the variable intervals corresponding to the different combinations of the variables, and the first probability, output by the classification model of the above embodiment, of the degree of association between the different combinations of the variables and the white sample label and the black sample label.
The output parameters may include the second probability with which the variable interval influences the first probability.
In one or more embodiments of this specification, the decision tree model performs secondary training on the results of the classification model, and the classification model is optimized according to the results of the secondary training until it achieves the expected effect, for example until its results are more accurate.
Referring to Fig. 3, one or more embodiments of this specification provide a classification model generation method, including steps 302 to 306.
Step 302: obtain multiple pieces of sample data and the white sample label or black sample label corresponding to each piece of sample data.
In one or more embodiments of this specification, the sample data includes but is not limited to historical fraudulent transactions whose classification has already been completed in the semi-automatic, semi-manual way. The sample data includes sample data with the same classification and sample data with different classifications.
For example, fraudulent transactions have three main types: ATO (account theft), Non Fraud (not genuine fraud, such as FF), and Stolen Card (card theft). The sample data with the same classification may all be of the ATO class or all of the Non Fraud class. The type of this sample data can be selected according to actual needs, and this application places no limit on it.
The white sample label or black sample label corresponding to each piece of sample data can be understood as follows: sample data of the ATO class corresponds to the black sample label, and sample data not of the ATO class corresponds to the white sample label. In practice, to simplify model training, the white sample label can be replaced by the number 0 and the black sample label by the number 1.
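The labelling convention in step 302 can be sketched as follows. This is a minimal illustration only: the record layout and the field name `fraud_type` are assumptions introduced here, not part of the specification.

```python
# Labelling sketch: black label (1) for the target fraud type, white label (0) otherwise.
# The dict layout and the "fraud_type" field are hypothetical.
BLACK_LABEL = 1  # sample belongs to the target class (here: ATO)
WHITE_LABEL = 0  # sample does not belong to the target class


def label_samples(samples, target_type="ATO"):
    """Return one 0/1 label per sample for a single target fraud type."""
    return [
        BLACK_LABEL if sample["fraud_type"] == target_type else WHITE_LABEL
        for sample in samples
    ]


samples = [
    {"txn_id": "t1", "fraud_type": "ATO"},
    {"txn_id": "t2", "fraud_type": "Non Fraud"},
    {"txn_id": "t3", "fraud_type": "Stolen Card"},
]
labels = label_samples(samples)
```

Calling `label_samples(samples, "Stolen Card")` on the same data would give the labels for a model that targets a different fraud type.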
Step 304: extract the variables that constitute each piece of sample data and the variable values corresponding to the variables.
In one or more embodiments of this specification, each piece of sample data includes variables and the variable value corresponding to each variable.
In practice, different variables call for different ways of extracting the variables that constitute each piece of sample data and their corresponding variable values.
For example, in one or more embodiments of this specification, if the variables include business variables and business data,
then extracting the variables that constitute each piece of sample data and the variable values corresponding to the variables includes:
extracting the business variables that constitute each piece of sample data and the variable values corresponding to the business variables; and
determining the business data according to the business variables and their corresponding variable values.
In practice, the variables may also include content other than business variables and business data and can be configured according to actual needs; this application places no limit on this.
In addition, in practice, different business variables call for different ways of determining the business data from the business variables and their corresponding variable values.
For example, in one or more embodiments of this specification, if the business variables include business attribute variables,
then determining the business data according to the business variables and their corresponding variable values includes:
choosing any two subjects of the same business attribute under preset conditions as variable subjects;
choosing a feature of the business attribute subject as the variable object; and
determining the business data according to the result of comparing the variable subjects on the same variable object.
In practice, the business variables may also include content other than business attribute variables and can be configured according to actual needs; this application places no limit on this.
Referring to Fig. 4, the business variables and business data included in the variables are described in detail, taking a transaction business as an example.
If the business is a transaction business, the business variables are transaction attribute variables, such as the transaction attributes themselves (device, card, IP address, shipping address, and so on). The business variables may also include whether the account email has changed, the account's blocked history, the number of shipping addresses on the account, and so on.
The business variables involve variable subjects and variable objects. The variable subjects include the device, the card, and other transaction attributes themselves; the variable objects include features of the transaction attribute subject, such as the number of successful transactions and the number of product categories of successful transactions.
Based on the business variables, two devices of the same transaction attribute, taken at different points in a time sequence or under different conditions, are selected as variable subjects; for example, the device under a fraud condition is variable subject A and the device under a non-fraud condition is variable subject B. A feature of the transaction attribute subject is then chosen as the variable object, and the business data is determined from the result of comparing the two variable subjects on that same variable object, for example the result of comparing the device's transaction count under the fraud condition with its transaction count under the non-fraud condition. In practice, the business data can also be the result of comparing the maximum transaction amount of the umid used by a fraudulent transaction with the maximum transaction amount of the umid used by the transaction preceding it.
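The comparison of two variable subjects on the same variable object can be sketched as below. The specification does not say how the comparison result is computed, so the plain difference used here is an assumption, as are the field names `txn_count` and `max_amount` and the sample values.

```python
# Business-data sketch: compare two variable subjects on one variable object.
# Using a simple difference as the "comparison result" is an assumption.
def comparison_feature(subject_a, subject_b, variable_object):
    """Return subject_a's value minus subject_b's value for one feature."""
    return subject_a[variable_object] - subject_b[variable_object]


# Hypothetical feature values for the device (umid) used by a fraudulent
# transaction and the device used by the transaction before it.
device_current = {"txn_count": 12, "max_amount": 950.0}
device_previous = {"txn_count": 3, "max_amount": 120.0}

business_data = {
    "txn_count_diff": comparison_feature(device_current, device_previous, "txn_count"),
    "max_amount_diff": comparison_feature(device_current, device_previous, "max_amount"),
}
```

A ratio or a boolean "greater than" flag would serve equally well as the comparison result; the choice depends on what the downstream model handles best.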
Step 306: train a classification model using the variables, the variable values corresponding to the variables, the white sample labels, and the black sample labels to obtain the classification model, where the classification model outputs a first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.
In one or more embodiments of this specification, the classification model may include but is not limited to a random forest model.
Different variables taking different values produce many different combinations. Each combination has a degree of association with the white sample label and the black sample label, and the random forest model outputs the first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label.
For example, referring to Fig. 5, the white sample label is 0 and the black sample label is 1. The random forest outputs a prediction result and a prediction score: the prediction result is 1 or 0, and the prediction score is the probability that the prediction result occurs. The first probability may be taken as the probability of the most likely result. For example, the prediction result is 1 with a prediction score of 1, or the prediction result is 0 with a prediction score of 1.
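The relationship between the prediction result and the prediction score can be sketched as below. This assumes a forest that decides by majority vote over its trees, which is one common formulation and not necessarily the one used in the specification; the function name and vote lists are illustrative.

```python
# Forest-output sketch: derive (prediction result, prediction score) from
# per-tree votes. Majority voting is an assumption, not the patent's stated method.
def forest_output(tree_votes):
    """tree_votes is a list of 0/1 votes, one per tree."""
    p_black = sum(tree_votes) / len(tree_votes)  # share of trees voting 1
    prediction = 1 if p_black >= 0.5 else 0
    # The score is the probability of the predicted (most likely) result.
    score = p_black if prediction == 1 else 1.0 - p_black
    return prediction, score


result_black = forest_output([1, 1, 1, 0])  # three of four trees vote 1
result_white = forest_output([0, 0, 0, 1])  # three of four trees vote 0
```

Under this convention the score never falls below 0.5, because it always refers to the more likely of the two labels.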
In one or more embodiments of this specification, if the sample data is fraudulent transactions, the classification model returns a score, that is, a probability, for each fraudulent transaction; if an account has multiple fraudulent transactions, every transaction has its own score. The classification of the sample data is finally determined from the result returned by the classification model and the threshold set for the classification label.
In one or more embodiments of this specification, the classification model generation method further includes the following steps:
Obtain a classification label, where the classification label corresponds to a preset threshold.
Determine the classification label corresponding to the different combinations of the variables according to the first probability and the preset threshold of the classification label.
In one or more embodiments of this specification, the classification labels include ATO, Non Fraud, and so on. The preset threshold is configured according to the classification label and actual needs; for example, the preset threshold for the classification label ATO is 0~1.
In practice, one classification model can give a result for only one classification label. To predict other classification labels, different types of sample data can be chosen and a classification model trained specifically for them.
The following illustration takes a trained classification model that predicts whether the classification label is ATO. Suppose the classification model predicts whether a fraudulent transaction is ATO, and the preset threshold for the classification label ATO is set to 0~1. When the classification model is applied, if it outputs 0.2 as the probability that a fraudulent transaction is ATO, the classification label of that transaction can be determined to be ATO; if it outputs 1.4, the classification label of that transaction can be determined not to be ATO.
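The threshold check in this illustration can be sketched as below, reading the preset threshold 0~1 as an interval and reusing the example values 0.2 and 1.4 from the text. Treating both interval bounds as inclusive is an assumption, and `assign_label` is a hypothetical helper name.

```python
# Threshold sketch: a score inside the label's preset interval gets that label.
# Inclusive bounds are an assumption; the text only gives the interval 0~1.
def assign_label(score, label, threshold):
    """Return the label if the score lies in [low, high], otherwise None."""
    low, high = threshold
    return label if low <= score <= high else None


ATO_THRESHOLD = (0.0, 1.0)
inside = assign_label(0.2, "ATO", ATO_THRESHOLD)   # score from the text's first case
outside = assign_label(1.4, "ATO", ATO_THRESHOLD)  # score from the text's second case
```

Because one model covers one classification label, a transaction that returns `None` here would be passed to the model trained for another label.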
In one or more embodiments of this specification, the classification model is applied to fraudulent transaction classification. Besides the transaction attribute variables themselves (the business variables), the variables of the fraudulent transaction classification model also include dynamic feature variables of the transaction attributes (the business data), which reconstruct trading behavior more closely and improve the accuracy of fraudulent transaction classification. In addition, the final output of the fraudulent transaction classification model is a score for each fraudulent transaction, and score thresholds for the different classifications are set according to the model's results. Fraudulent transactions can therefore be classified automatically, which improves classification efficiency and coverage, and because the model's variables and scoring standard are unified, the accuracy and stability of the classification are ensured.
Referring to Fig. 6, one or more embodiments of this specification provide the steps for training the decision tree model in a classification model generation method, including steps 602 to 604.
Step 602: obtain the first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label, and the variable intervals corresponding to the different combinations of the variables.
In one or more embodiments of this specification, the input parameters of the decision tree model include the first probability obtained from the above classification model and the variable intervals set for the different combinations of the variables.
For example, if the variables are umid_is_fraud and umid_lifespan, the combinations of the variables include umid_is_fraud together with umid_lifespan, and the variable interval set for this combination can be umid_is_fraud=0 and umid_lifespan>90.
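A variable interval such as umid_is_fraud=0 and umid_lifespan>90 can be represented as one condition per variable, as in the sketch below. The dictionary-of-predicates representation and the sample values are assumptions chosen for illustration; only the two variable names and their bounds come from the text.

```python
# Variable-interval sketch: one predicate per variable in the combination.
variable_interval = {
    "umid_is_fraud": lambda value: value == 0,   # no fraud record on the device
    "umid_lifespan": lambda value: value > 90,   # device in use for more than 90 days
}


def in_interval(sample, interval):
    """Return True when every variable of the sample satisfies its condition."""
    return all(check(sample[name]) for name, check in interval.items())


old_clean_device = {"umid_is_fraud": 0, "umid_lifespan": 120}
new_clean_device = {"umid_is_fraud": 0, "umid_lifespan": 30}
matches = [in_interval(s, variable_interval) for s in (old_clean_device, new_clean_device)]
```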
Step 604: train a decision tree model using the first probability and the variable intervals to obtain the decision tree model, where the decision tree model outputs a second probability of the degree of association between the variable interval and the first probability.
In one or more embodiments of this specification, the second probability is the result of the variable interval's influence on the first probability. In practice, for ease of understanding, the second probability can be expressed as an Infocode.
Because the classification model is complex and uses many variables, the scores it outputs may be inaccurate, and judging how the several hundred variables input to the classification model influence the score is hard to do automatically.
In one or more embodiments of this specification, the complex model is simplified: a decision tree model, whose operation and results are relatively simple and direct, is selected, and secondary training is performed on the training results and variables of the classification model to generate intuitive, easily understood infocodes. For example, if a leaf node of the decision tree indicates that the probability of ATO is small, the nodes preceding that leaf are traced back from the leaf, and a rule is generated to represent the decision path. Here the path is: probability of ATO is 20% -> lifespan greater than 90 days -> is_fraud=0 -> umid, and the rule is ATO_ratio=0.2 and umid_is_fraud=0 and umid_lifespan>90. This is easily read as: when a device has no fraud record and has been in use for more than 90 days, the probability of ATO is 20%. The 20% probability of ATO is the training result of the classification model, umid_is_fraud and umid_lifespan are the variables, and umid_is_fraud=0 and umid_lifespan>90 is the variable combination together with its corresponding variable interval.
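Tracing back from a leaf to build the rule can be sketched as below. The tree here is a hand-built dictionary with parent pointers, which is an assumed representation rather than the structure of any particular decision tree library; the node names are hypothetical, and the conditions and the 0.2 ratio are taken from the example in the text.

```python
# Infocode sketch: walk from a leaf back to the root and emit the path as a rule.
# The parent-pointer dictionary is an assumed tree representation.
tree = {
    "root": {"parent": None, "condition": None},
    "node_1": {"parent": "root", "condition": "umid_is_fraud=0"},
    "leaf_1": {"parent": "node_1", "condition": "umid_lifespan>90", "ATO_ratio": 0.2},
}


def leaf_to_rule(tree, leaf_id, label="ATO"):
    """Collect the conditions on the path to the leaf and join them into one rule."""
    conditions = []
    node_id = leaf_id
    while node_id is not None:  # trace back until the root has been passed
        condition = tree[node_id]["condition"]
        if condition is not None:
            conditions.append(condition)
        node_id = tree[node_id]["parent"]
    conditions.reverse()  # restore root-to-leaf order
    ratio = tree[leaf_id][f"{label}_ratio"]
    return " and ".join([f"{label}_ratio={ratio}"] + conditions)


rule = leaf_to_rule(tree, "leaf_1")
```

For the tree above, `rule` is the string `ATO_ratio=0.2 and umid_is_fraud=0 and umid_lifespan>90`, matching the rule given in the text.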
In one or more embodiments of this specification, the training results of the classification model have certain accuracy problems, which affect the effect of classification model training, and the accuracy of the variables also affects the performance of the classification model. Secondary training is therefore performed on the training results and variables of the classification model to output infocodes, which are used to optimize the sample data and variables of the classification model so that its training results become more accurate.
Referring to Fig. 7, one or more embodiments of this specification provide a classification model generation method, including steps 702 to 712.
Step 702: obtain multiple pieces of sample data and the white sample label or black sample label corresponding to each piece of sample data.
Step 704: extract the variables that constitute each piece of sample data and the variable values corresponding to the variables.
Step 706: train a classification model using the variables, the variable values corresponding to the variables, the white sample labels, and the black sample labels to obtain the classification model, where the classification model outputs a first probability of the degree of association between different combinations of the variables and the white sample label and the black sample label.
Step 708: obtain the first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label, and the variable intervals corresponding to the different combinations of the variables.
Step 710: train a decision tree model using the first probability and the variable intervals to obtain the decision tree model, where the decision tree model outputs a second probability of the degree of association between the variable interval and the first probability.
In one or more embodiments of this specification, the infocodes output by the decision tree model can be extracted and judged to determine whether their significant variable intervals conform to business logic, and optimization suggestions are given on whether and how to adjust them.
For example, one data item reads: when a device has no fraud record and has been in use for more than 90 days, the probability of ATO is 20%. If it is judged that the probability of this data being ATO is low, then umid_is_fraud=0 and umid_lifespan>90 is not a significant variable interval for judging whether a transaction is ATO. If it is judged that the probability of ATO may be high when umid_is_fraud=0 and umid_lifespan>20, the variable interval can be adjusted accordingly.
Step 712: judge whether the second probability is accurate; if not, adjust the white sample label or black sample label corresponding to the sample data, or the variable intervals corresponding to the different combinations of the variables.
In one or more embodiments of this specification, if the second probability is inaccurate, the labeling result of the infocode in the classification model is corrected, that is, the white sample label or black sample label corresponding to the sample data is adjusted, which improves the accuracy of the classification model's sample data.
For example, suppose an infocode has a large influence in the classification model and causes a transaction to be classified as ATO, so that when the transaction is input as sample data for classification model training, it corresponds to the black sample label 1. If the infocode is later found to be inaccurate, the black sample label 1 of the transaction can be changed to the white sample label 0.
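The label correction described here can be sketched as below. The record layout, the `infocode` field that ties a sample to the rule that drove its label, and the sample values are all assumptions made for illustration.

```python
# Label-correction sketch: flip black labels (1) to white labels (0) for samples
# whose label was driven by an infocode judged inaccurate. Field names are hypothetical.
def correct_labels(samples, inaccurate_infocodes):
    """Return corrected copies of the samples; the input list is left unchanged."""
    corrected = []
    for sample in samples:
        sample = dict(sample)  # copy so the original record is not mutated
        if sample["label"] == 1 and sample["infocode"] in inaccurate_infocodes:
            sample["label"] = 0
        corrected.append(sample)
    return corrected


training_samples = [
    {"txn_id": "t1", "label": 1, "infocode": "umid_is_fraud=0 and umid_lifespan>90"},
    {"txn_id": "t2", "label": 1, "infocode": "umid_is_fraud=1"},
    {"txn_id": "t3", "label": 0, "infocode": "umid_is_fraud=0 and umid_lifespan>90"},
]
relabelled = correct_labels(training_samples, {"umid_is_fraud=0 and umid_lifespan>90"})
new_labels = [sample["label"] for sample in relabelled]
```

The corrected samples would then be fed back into the next round of classification model training.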
If the second probability is inaccurate, the variable ranges corresponding to the different combinations of the variables are adjusted. That is, where the infocode is judged to be inaccurate and its significant variable range does not conform to the business logic, the variable range is adjusted or replaced with another variable range.
For example, take the usage duration of a device as the variable range: the initial logic is that the longer the usage duration, the more likely the account is stolen; after the variable range is adjusted, the longer the usage duration, the less likely the account is stolen.
In one or more embodiments of this specification, the decision tree model is used to optimize the training sample data and the variable ranges of the classification model until the training result of the classification model reaches its best effect, so that the output of the classification model is more accurate.
In one or more embodiments of this specification, the training result of the classification model is trained a second time with the decision tree model to generate infocodes that are more intuitive and easier to understand, and the training sample data and variables of the classification model are then optimized according to these infocodes.
This greatly improves the efficiency of optimizing the training sample data and saves considerable manpower, shortens the cycle formerly needed to adjust the performance of the classification model based only on its training result, and improves the accuracy of the classification model to some extent. The classification model can classify fraudulent transactions stably, efficiently, with full coverage and high accuracy, and can be used in different business scenarios of international card payment.
Referring to Fig. 8, one or more embodiments of this specification provide a classification model generation apparatus, comprising:
A first obtaining module 802, configured to obtain a plurality of sample data and the white sample label or black sample label corresponding to each sample data;
A first extraction module 804, configured to extract the variables constituting each sample data and the variable values corresponding to the variables;
A first training module 806, configured to train a classification model with the variables, the variable values corresponding to the variables, the white sample label and the black sample label to obtain the classification model, the classification model being used to output a first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label.
Optionally, the variables include a business division and business data,
The first extraction module 804 includes:
A second extraction submodule, configured to extract the business division constituting each sample data and the variable value corresponding to the business division;
A first determination submodule, configured to determine the business data according to the business division and the variable value corresponding to the business division.
Optionally, the business division includes a business attribute variable,
The first determination submodule includes:
A first selection submodule, configured to select any two subjects of the same business attribute under a preset condition as variable subjects;
A second selection submodule, configured to select a feature of the business attribute subject as a variable object;
A business data determination submodule, configured to determine the business data according to the result of comparing the variable subjects on the same variable object.
Optionally, the apparatus further includes:
A second obtaining module, configured to obtain the first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label, together with the variable ranges corresponding to the different combinations of the variables;
A second training module, configured to train a decision tree model with the first probability and the variable ranges to obtain the decision tree model, the decision tree model being used to output a second probability of the degree of association between the variable ranges and the first probability.
Optionally, the apparatus further includes:
A judgment module, configured to judge whether the second probability is accurate and, if not, to adjust the white sample label or black sample label corresponding to the sample data and the variable ranges corresponding to the different combinations of the variables.
Optionally, the apparatus further includes:
A third obtaining module, configured to obtain classification labels, wherein each classification label corresponds to a preset threshold;
A second determination module, configured to determine the classification label corresponding to the different combinations of the variables according to the first probability and the preset thresholds of the classification labels.
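How the third obtaining module and the second determination module might work together can be sketched as follows. The label names, the threshold values and the rule "take the label with the highest threshold that the first probability reaches" are assumptions for illustration; the specification only states that each classification label corresponds to a preset threshold.

```python
def assign_label(first_probability, label_thresholds):
    """Pick the label with the highest preset threshold reached by the probability."""
    ordered = sorted(label_thresholds.items(), key=lambda item: item[1], reverse=True)
    for label, threshold in ordered:
        if first_probability >= threshold:
            return label
    return None  # the probability is below every preset threshold


# Hypothetical classification labels and their preset thresholds.
label_thresholds = {"high_risk": 0.8, "medium_risk": 0.5, "low_risk": 0.0}

print(assign_label(0.92, label_thresholds))
print(assign_label(0.55, label_thresholds))
print(assign_label(0.10, label_thresholds))
```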
In one or more embodiments of this specification, the classification apparatus adds business data on the basis of refined transaction attribute variables. The business data is obtained by selecting two subjects of the same transaction attribute, at different points in a time sequence or under different conditions, as variable subjects, selecting some feature of the transaction attribute subject as the variable object, and taking the result of comparing the features of the two variable subjects on the same variable object as the business data.
When fraudulent transactions are classified, all the information of a transaction at the time the fraud is reported back is taken as reference. General attribute variables (such as the number of mailboxes used by an account) can also depict the dynamic transaction behavior of an account, but they are limited by the technical time window: values outside the statistical time window cannot be counted, so their effect is limited. The business data fills this gap in behavior restoration and is not limited by the time window, which can improve the accuracy of the classification result. In addition, the subject and object of the business data can be changed flexibly to suit different business scenarios.
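The construction of such business data can be sketched as follows, assuming two transactions of the same account serve as the variable subjects and one of their features serves as the variable object. The helper `comparison_feature`, the field names and the values are hypothetical.

```python
def comparison_feature(subject_a, subject_b, variable_object):
    """Compare two variable subjects on one variable object: 1 if equal, 0 if not."""
    value_a = subject_a.get(variable_object)
    value_b = subject_b.get(variable_object)
    if value_a is None or value_b is None:
        return None  # the feature is missing, so no comparison result exists
    return int(value_a == value_b)


# Two hypothetical transactions of the same account at different times.
previous_txn = {"account": "a1", "device_id": "d-100", "ship_country": "US"}
current_txn = {"account": "a1", "device_id": "d-999", "ship_country": "US"}

business_data = {
    f"same_{name}": comparison_feature(previous_txn, current_txn, name)
    for name in ("device_id", "ship_country")
}
print(business_data)
```

Because the comparison takes any two subjects and any feature name, swapping the subject or the object for a different business scenario only changes the arguments, not the function.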
In addition, a complex classification model has many variables. A decision tree model, whose mode of operation and results are relatively straightforward, is therefore selected to perform a second round of training on the training result of the classification model and the different combinations of variables, generating infocodes. Insignificant infocodes are taken out and judged by business experience, the label results of the training sample data in the original classification model are optimized and corrected, and the process is repeated until the training result of the classification model meets expectations. Judging the infocodes optimizes the training sample data and the variables of the classification model at the same time, and improves the effect of the classification model more directly and noticeably.
Fig. 9 is a structural block diagram of a computing device 100 according to an embodiment of this specification. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is connected to the memory 110 via a bus 130, and a database 150 is used to store data.
The computing device 100 further includes an access device 140, which enables the computing device 100 to communicate via one or more networks 160. Examples of these networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.
In an embodiment of this specification, the above components of the computing device 100 and other components not shown in Fig. 9 may also be connected to each other, for example via a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 9 is for illustration only and does not limit the scope of this specification. Those skilled in the art may add or replace other components as needed.
The computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, netbook, etc.), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smartwatch, smart glasses, etc.) or another type of mobile device, or a stationary computing device such as a desktop computer or PC. The computing device 100 may also be a mobile or stationary server.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the classification model generation method described above.
The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the classification model generation method described above belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the classification model generation method.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The technology carriers involved in the payment described in the embodiments of the present application may include, for example, near-field communication (NFC), Wi-Fi, 3G/4G/5G, POS card-swiping technology, QR code scanning technology, barcode scanning technology, Bluetooth, infrared, Short Message Service (SMS), Multimedia Message Service (MMS), and the like.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application certain steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to help illustrate the application. The alternative embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of this specification. These embodiments were chosen and specifically described in order to better explain the principles and practical applications of the present application, so that those skilled in the art can better understand and use it. The present application is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A classification model generation method, characterized by comprising:
Obtaining a plurality of sample data and the white sample label or black sample label corresponding to each sample data;
Extracting the variables constituting each sample data and the variable values corresponding to the variables;
Training a classification model with the variables, the variable values corresponding to the variables, the white sample label and the black sample label to obtain the classification model, the classification model being used to output a first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label.
2. The method according to claim 1, characterized in that the variables include a business division and business data,
Extracting the variables constituting each sample data and the variable values corresponding to the variables comprises:
Extracting the business division constituting each sample data and the variable value corresponding to the business division;
Determining the business data according to the business division and the variable value corresponding to the business division.
3. The method according to claim 2, characterized in that the business division includes a business attribute variable,
Determining the business data according to the business division and the variable value corresponding to the business division comprises:
Selecting any two subjects of the same business attribute under a preset condition as variable subjects;
Selecting a feature of the business attribute subject as a variable object;
Determining the business data according to the result of comparing the variable subjects on the same variable object.
4. The method according to claim 1, characterized by further comprising:
Obtaining the first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label, together with the variable ranges corresponding to the different combinations of the variables;
Training a decision tree model with the first probability and the variable ranges to obtain the decision tree model, the decision tree model being used to output a second probability of the degree of association between the variable ranges and the first probability.
5. The method according to claim 4, characterized by further comprising:
Judging whether the second probability is accurate and, if not, adjusting the white sample label or black sample label corresponding to the sample data and the variable ranges corresponding to the different combinations of the variables.
6. The method according to claim 1, characterized by further comprising:
Obtaining classification labels, wherein each classification label corresponds to a preset threshold;
Determining the classification label corresponding to the different combinations of the variables according to the first probability and the preset thresholds of the classification labels.
7. A classification model generation apparatus, characterized by comprising:
A first obtaining module, configured to obtain a plurality of sample data and the white sample label or black sample label corresponding to each sample data;
A first extraction module, configured to extract the variables constituting each sample data and the variable values corresponding to the variables;
A first training module, configured to train a classification model with the variables, the variable values corresponding to the variables, the white sample label and the black sample label to obtain the classification model, the classification model being used to output a first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label.
8. The apparatus according to claim 7, characterized in that the variables include a business division and business data,
The first extraction module includes:
A second extraction submodule, configured to extract the business division constituting each sample data and the variable value corresponding to the business division;
A first determination submodule, configured to determine the business data according to the business division and the variable value corresponding to the business division.
9. The apparatus according to claim 8, characterized in that the business division includes a business attribute variable,
The first determination submodule includes:
A first selection submodule, configured to select any two subjects of the same business attribute under a preset condition as variable subjects;
A second selection submodule, configured to select a feature of the business attribute subject as a variable object;
A business data determination submodule, configured to determine the business data according to the result of comparing the variable subjects on the same variable object.
10. The apparatus according to claim 7, characterized by further comprising:
A second obtaining module, configured to obtain the first probability of the degree of association between the different combinations of the variables and the white sample label and the black sample label, together with the variable ranges corresponding to the different combinations of the variables;
A second training module, configured to train a decision tree model with the first probability and the variable ranges to obtain the decision tree model, the decision tree model being used to output a second probability of the degree of association between the variable ranges and the first probability.
11. The apparatus according to claim 7, characterized by further comprising:
A judgment module, configured to judge whether the second probability is accurate and, if not, to adjust the white sample label or black sample label corresponding to the sample data and the variable ranges corresponding to the different combinations of the variables.
12. The apparatus according to claim 7, characterized by further comprising:
A third obtaining module, configured to obtain classification labels, wherein each classification label corresponds to a preset threshold;
A second determination module, configured to determine the classification label corresponding to the different combinations of the variables according to the first probability and the preset thresholds of the classification labels.
13. A computing device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the instructions, implements the steps of the method according to any one of claims 1-6.
14. A computer-readable storage medium storing computer instructions, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-6.
CN201811174157.0A 2018-10-09 2018-10-09 A kind of method of generating classification model and device calculate equipment and storage medium Pending CN109544150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811174157.0A CN109544150A (en) 2018-10-09 2018-10-09 A kind of method of generating classification model and device calculate equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811174157.0A CN109544150A (en) 2018-10-09 2018-10-09 A kind of method of generating classification model and device calculate equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109544150A true CN109544150A (en) 2019-03-29

Family

ID=65843552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811174157.0A Pending CN109544150A (en) 2018-10-09 2018-10-09 A kind of method of generating classification model and device calculate equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109544150A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263859A (en) * 2019-06-21 2019-09-20 深圳前海微众银行股份有限公司 Sample classification method, apparatus, equipment and readable storage medium storing program for executing
CN111209377A (en) * 2020-04-23 2020-05-29 腾讯科技(深圳)有限公司 Text processing method, device, equipment and medium based on deep learning
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
WO2021103401A1 (en) * 2019-11-25 2021-06-03 深圳壹账通智能科技有限公司 Data object classification method and apparatus, computer device and storage medium
CN115346084A (en) * 2022-08-15 2022-11-15 腾讯科技(深圳)有限公司 Sample processing method, sample processing apparatus, electronic device, storage medium, and program product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574544A (en) * 2015-12-16 2016-05-11 平安科技(深圳)有限公司 Data processing method and device
CN106156809A (en) * 2015-04-24 2016-11-23 阿里巴巴集团控股有限公司 For updating the method and device of disaggregated model
US20170083920A1 (en) * 2015-09-21 2017-03-23 Fair Isaac Corporation Hybrid method of decision tree and clustering technology
CN107305565A (en) * 2016-04-21 2017-10-31 富士通株式会社 Information processor, information processing method and message processing device
WO2017219548A1 (en) * 2016-06-20 2017-12-28 乐视控股(北京)有限公司 Method and device for predicting user attributes
CN107785058A (en) * 2017-07-24 2018-03-09 平安科技(深圳)有限公司 Anti- fraud recognition methods, storage medium and the server for carrying safety brain
CN108595497A (en) * 2018-03-16 2018-09-28 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108595704A (en) * 2018-05-10 2018-09-28 成都信息工程大学 A kind of the emotion of news and classifying importance method based on soft disaggregated model

Similar Documents

Publication Publication Date Title
CN109544150A (en) A kind of method of generating classification model and device calculate equipment and storage medium
CN107766929B (en) Model analysis method and device
CN108648074A (en) Loan valuation method, apparatus based on support vector machines and equipment
US20210365963A1 (en) Target customer identification method and device, electronic device and medium
CN110147823B (en) Wind control model training method, device and equipment
CN107563757B (en) Data risk identification method and device
CN106296195A (en) A kind of Risk Identification Method and device
CN107578332A (en) A kind of method, apparatus, equipment and storage medium for recommending cash commodity
CN109409677A (en) Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium
CN108256691A (en) Refund Probabilistic Prediction Model construction method and device
CN110349000A (en) Method, apparatus and electronic equipment are determined based on the volume strategy that mentions of tenant group
CN104636912A (en) Identification method and device for withdrawal of credit cards
CN104657369A (en) User attribute information generating method and system
CN107657500A (en) Stock recommends method and server
CN109741177A (en) Appraisal procedure, device and the intelligent terminal of user credit
CN113469730A (en) Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene
CN110415103A (en) The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable disturbance degree index
CN110930038A (en) Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium
KR20220071875A (en) Device and method for underwriting a person who will subscribe to insurance based on artificial neural network
CN110349007A (en) The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index
CN110276677A (en) Refund prediction technique, device, equipment and storage medium based on big data platform
CN111882140A (en) Risk evaluation method, model training method, device, equipment and storage medium
CN111882420A (en) Generation method of response rate, marketing method, model training method and device
CN108898308A (en) Methods of risk assessment, device, server and readable storage medium storing program for executing
Yuping et al. New methods of customer segmentation and individual credit evaluation based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190329