CN108346098A - Method and device for risk control rule mining - Google Patents

Method and device for risk control rule mining

Info

Publication number
CN108346098A
Authority
CN
China
Prior art keywords
variable
sample
type
characteristic type
risk control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810053792.7A
Other languages
Chinese (zh)
Other versions
CN108346098B (en
Inventor
孙清清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810053792.7A priority Critical patent/CN108346098B/en
Publication of CN108346098A publication Critical patent/CN108346098A/en
Application granted granted Critical
Publication of CN108346098B publication Critical patent/CN108346098B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

This application discloses a method and device for risk control rule mining. For each preset feature type, the feature value of each learning sample for that feature type is determined and used as a variable of the feature type. A genetic algorithm then screens the feature types and the variables of each feature type to determine the specified feature types and the specified variables of each specified feature type. Finally, risk control rules are generated from the learning samples, the specified feature types, and the specified variables of the specified feature types, using a first-order rule learning algorithm.

Description

Method and device for risk control rule mining
Technical field
This application relates to the field of information technology, and in particular to a method and device for risk control rule mining.
Background
Money laundering is the act of legalizing illegally obtained income. It mainly refers to disguising or concealing, by various means and through financial institutions, the source and nature of illegally obtained income and the proceeds it generates, so that the income appears legitimate in form. It is an illegal activity.
At present, because money laundering is carried out through financial institutions, financial institutions serve as the front line of anti-money laundering. They usually screen the transaction requests they receive using configured recognition rules. They then either refuse to execute transactions suspected of money laundering, which prevents the money laundering from taking place, or, when a transaction is determined to be suspected of money laundering, preserve the transaction data as evidence for later tracing.
However, existing recognition rules are usually set manually, based on experience or on money laundering data identified in the past. Manually set recognition rules are often inaccurate, so anti-money laundering based on them is inefficient and has low recognition accuracy.
A method for mining risk control rules on demand is therefore needed, in order to mine risk control rules for identifying money laundering.
Summary of the invention
The embodiments of this specification provide a method and device for risk control rule mining, to solve the problem that existing manually set recognition rules are often inaccurate, so that anti-money laundering based on them is inefficient and has low recognition accuracy.
The embodiments of this specification adopt the following technical solutions:
A method for risk control rule mining, including:
For each preset feature type, determining the feature value of each learning sample for that feature type, as a variable of the feature type;
Selecting, by a genetic algorithm, at least some of the feature types as specified feature types, and, for each feature type, selecting at least some of the variables of that feature type as specified variables of the feature type;
Generating risk control rules from the learning samples, the selected specified feature types, and the specified variables of the specified feature types, using a first-order rule learning algorithm.
A device for risk control rule mining, including:
A determining module, which, for each preset feature type, determines the feature value of each learning sample for that feature type, as a variable of the feature type;
A selecting module, which selects, by a genetic algorithm, at least some of the feature types as specified feature types, and, for each feature type, selects at least some of the variables of that feature type as specified variables of the feature type;
A generating module, which generates risk control rules from the learning samples, the selected specified feature types, and the specified variables of the specified feature types, using a first-order rule learning algorithm.
A server, including one or more processors and a memory, where the memory stores a program that is configured to be executed by the one or more processors to perform the following steps:
For each preset feature type, determining the feature value of each learning sample for that feature type, as a variable of the feature type;
Selecting, by a genetic algorithm, at least some of the feature types as specified feature types, and, for each feature type, selecting at least some of the variables of that feature type as specified variables of the feature type;
Generating risk control rules from the learning samples, the selected specified feature types, and the specified variables of the specified feature types, using a first-order rule learning algorithm.
At least one of the above technical solutions adopted by the embodiments of this specification can achieve the following beneficial effects:
With the method and device provided by this specification, the variables of each preset feature type are first determined from the learning samples. A genetic algorithm then screens the feature types and the variables of each feature type to determine the specified feature types and the specified variables of the specified feature types. Finally, risk control rules are generated from the learning samples, the specified feature types, and the specified variables of the specified feature types, using a first-order rule learning algorithm. Because the specified feature types that make up the risk control rules, and their corresponding specified variables, are determined through screening and optimization by a feature selection algorithm, the risk control rules generated from the specified feature types recognize money laundering more effectively. This avoids the drawback of manually set rules in the prior art and improves the efficiency and recognition accuracy of anti-money laundering based on the risk control rules.
Brief description of the drawings
The drawings described here are provided for further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a risk control rule mining process provided by an embodiment of this specification;
Fig. 2 shows the optimization process of the genetic algorithm provided by an embodiment of this specification;
Figs. 3a to 3d are schematic diagrams of the crossover operation provided by this specification;
Fig. 4 is a schematic diagram of generating risk control rules by a first-order rule learning algorithm, provided by an embodiment of this specification;
Fig. 5 is a schematic structural diagram of a risk control rule mining device provided by an embodiment of this specification;
Fig. 6 is a schematic structural diagram of a server provided by an embodiment of this specification.
Detailed description
To make the purpose, technical solutions, and advantages of this specification clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments of this specification and the corresponding drawings. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this specification, without creative effort, fall within the scope of protection of the application.
The technical solutions provided by the embodiments of the application are described in detail below with reference to the drawings.
Fig. 1 shows a risk control rule mining process provided by an embodiment of this specification, which may include the following steps:
S100: For each preset feature type, determine the feature value of each learning sample for that feature type, as a variable of the feature type.
In one or more embodiments of this specification, risk control rule mining can be used to mine rules for controlling various risks, e.g., rules for preventing fraudulent transactions or risk control rules for loan transactions. For ease of description, the following uses the example of a mining process that generates risk control rules for identifying money laundering transactions. The process can then be executed by any institution or department that takes part in anti-money laundering, such as a financial institution, a supervisory department, or a law enforcement agency, e.g., by a bank's server or by a terminal of an economic crime investigation agency.
For ease of description, the rest of this specification uses the example of a bank's server executing the risk control rule mining process.
In addition, since money laundering is mainly carried out through services executed at financial institutions (e.g., transaction services), the risk control rules generated by the mining process in this specification can be rules that identify money laundering transactions from business data, where the business data can be the data a server needs in order to execute a received service request.
To make the generated risk control rules more accurate, the server can first determine the learning samples from historical data. Then, for each preset feature type, it determines the feature value of each learning sample for that feature type, as a variable of the feature type, for use in the subsequent steps. Specifically, the server can take the business data of a number of transactions executed in the past as learning samples, with the business data of each transaction serving as one independent learning sample. The business data of a transaction may include: personal information of both parties to the transaction (e.g., name, age, gender, address, contact details), the transaction amount, the Internet Protocol (IP) addresses of both parties when the transaction was executed, the country each IP address belongs to, the administrative division each IP address belongs to, and so on.
Further, to meet anti-money laundering requirements, a financial institution usually examines every transaction request for money laundering (e.g., by checking against preset risk control rules whether it is a money laundering transaction) and adds the examination result to the business data of the transaction. Transactions identified as money laundering usually also have to be reported to international anti-money laundering organizations, so that the information related to the money launderers in the business data of the transaction is added to a sanctions list. When determining the learning samples from historical data, the server can therefore also use the money laundering examination result in the business data of each transaction: business data whose examination result is money laundering is taken as a positive sample, and business data whose examination result is not money laundering is taken as a negative sample, as shown in Table 1.
Table 1 illustrates the learning samples determined by the server in an embodiment of this specification:
Sample ID | Business data                                              | Examination result
001       | Initiator: user a; Recipient: user f; IP country: US; …… | P
002       | Initiator: user b; Recipient: user e; IP country: RU; …… | P
003       | Initiator: user c; Recipient: user h; IP country: UA; …… | N
004       | Initiator: user a; Recipient: user i; IP country: CN; …… | P
005       | Initiator: user d; Recipient: user j; IP country: UK; …… | N
006       | Initiator: user e; Recipient: user k; IP country: DE; …… | N
Table 1
As Table 1 shows, each learning sample corresponds to the business data of one transaction and can be classified as a positive or a negative sample according to its examination result. An examination result of P indicates that the transaction is a money laundering transaction, and N indicates that it is not. That is, samples labeled P are positive samples and samples labeled N are negative samples. For ease of description, the P and N notation is used in the rest of this description.
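The split into positive and negative samples can be sketched as follows. This is only an illustration: the `LearningSample` class, its field names, and the dictionary keys are hypothetical and do not come from the specification; the six rows mirror Table 1, with the elided business data left out.

```python
from dataclasses import dataclass


@dataclass
class LearningSample:
    sample_id: str
    business_data: dict  # hypothetical layout: one key per kind of business data
    label: str           # "P" = money laundering, "N" = not money laundering


# The six samples of Table 1 (only the fields shown there).
SAMPLES = [
    LearningSample("001", {"initiator": "a", "recipient": "f", "ip_country": "US"}, "P"),
    LearningSample("002", {"initiator": "b", "recipient": "e", "ip_country": "RU"}, "P"),
    LearningSample("003", {"initiator": "c", "recipient": "h", "ip_country": "UA"}, "N"),
    LearningSample("004", {"initiator": "a", "recipient": "i", "ip_country": "CN"}, "P"),
    LearningSample("005", {"initiator": "d", "recipient": "j", "ip_country": "UK"}, "N"),
    LearningSample("006", {"initiator": "e", "recipient": "k", "ip_country": "DE"}, "N"),
]


def split_by_label(samples):
    """Return (positive samples, negative samples) according to the examination result."""
    positives = [s for s in samples if s.label == "P"]
    negatives = [s for s in samples if s.label == "N"]
    return positives, negatives
```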
Next, because the business data in the positive samples contains the features that identify money laundering transactions, the server can, for each preset feature type, determine from the learning samples the feature value of each learning sample for that feature type, as a variable of the feature type. The subsequent steps then determine the specified feature types, and the specified variables of the specified feature types, that make up the risk control rules.
In this specification, the business data of each transaction contains several kinds of data, and different kinds of business data contribute differently to identifying money laundering transactions. Each kind of business data can therefore be treated as a feature type, and the possible feature values of each kind of business data as the variables of that feature type. The subsequent steps then determine the specified feature types used to generate the risk control rules (e.g., the feature types that contribute more to identifying money laundering transactions, and, within each feature type, the variables that contribute more).
For example, suppose money laundering transactions are usually transfers from a user somewhere in the United States to a user somewhere in the United Kingdom. Then the country of the IP address in the business data, together with the values United States and United Kingdom for that country, contributes strongly to identifying whether a transaction is a money laundering transaction. Or suppose money laundering transactions are usually initiated at 4:58 PM bank local time. Then the transaction time in the business data, together with the value 4:58 PM, contributes strongly to identifying whether a transaction is a money laundering transaction.
The server can then, for each preset feature type, determine the feature value of each learning sample for that feature type as a variable of the feature type, and so determine the variables corresponding to each feature type.
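As a minimal sketch of this step, the function below gathers, for each feature type, the set of feature values observed across the learning samples. It assumes each sample's business data is a plain dictionary keyed by feature type, which is an illustrative choice and not something the specification prescribes.

```python
from collections import defaultdict


def collect_variables(samples):
    """Map each feature type to the set of feature values seen in the samples."""
    variables = defaultdict(set)
    for business_data in samples:
        for feature_type, value in business_data.items():
            variables[feature_type].add(value)
    return dict(variables)


# Hypothetical business data for two transactions.
example = [
    {"ip_country": "US", "currency": "USD"},
    {"ip_country": "RU", "currency": "USD"},
]
variables_by_type = collect_variables(example)
```

Here `variables_by_type["ip_country"]` holds both countries, while `variables_by_type["currency"]` holds a single value.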
Specifically, for each financial institution the business data of its transaction services is private data, so the historical data of a financial institution usually contains only the business data of the transactions it executed itself, and the business data of other financial institutions cannot be obtained. If learning samples are determined only from an institution's own historical data, the learning samples may not be rich enough and may not cover money laundering transactions well enough, which lowers the accuracy of the risk control rules generated later.
For example, suppose that a device at a given IP address carries out money laundering transactions through bank a, while the transactions it executes through bank b are normal transactions. When determining learning samples, bank a can determine that the transaction is a positive sample, and the feature of that IP address is a feature of a positive sample. Bank b, however, has no record of that IP address initiating a money laundering transaction, and may record the learning sample as a negative sample.
Therefore, in this specification, when determining the preset feature types, the server treats each kind of business data as a feature type and, in addition, can determine learning samples (specifically, positive samples) from a preconfigured money laundering sanctions list. The matching results between the information in the sanctions list and each kind of business data are also used as feature types, which enriches the preset feature types.
Specifically, the money laundering sanctions list can be information on confirmed money laundering published by international anti-money laundering organizations. It may include the business data of transactions that have been confirmed as money laundering (e.g., the personal information of both parties to the money laundering, IP addresses, transaction times). On the one hand, the server can determine positive samples from the business data of the transactions included in the sanctions list. On the other hand, whether each kind of business data matches the data in the sanctions list can be used as a feature type, with each matching result as a variable.
Specifically, for each learning sample, the server can match each kind of business data of the sample against the information in the money laundering sanctions list, and use the matching result of each kind of business data as the variable of the corresponding feature type.
For example, for each learning sample, the initiator's name in the sample is matched against the names in the sanctions list, the initiator's IP address against the IP addresses in the sanctions list, the initiator's date of birth against the dates of birth in the sanctions list, and so on. The matching result is then used as the variable of the corresponding feature type of the learning sample, e.g., 0 for no match and 1 for a match.
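The matching step can be sketched as below. The field names, the `_on_sanctions_list` suffix, and the exact-equality matching are assumptions made for illustration; a real sanctions check would likely need normalization or fuzzy matching, which the specification does not describe.

```python
def match_against_sanctions(business_data, sanctions):
    """Return one binary variable per sanctions-list field: 1 = match, 0 = no match.

    `sanctions` maps a business-data field name to the set of listed values.
    """
    return {
        f"{field}_on_sanctions_list": int(business_data.get(field) in listed)
        for field, listed in sanctions.items()
    }


# Hypothetical sanctions list and transaction (the IPs are documentation addresses).
SANCTIONS = {
    "initiator_name": {"Alice Example"},
    "initiator_ip": {"203.0.113.7"},
}
transaction = {"initiator_name": "Bob Example", "initiator_ip": "203.0.113.7"}
match_variables = match_against_sanctions(transaction, SANCTIONS)
```

A field that is absent from the business data is treated as not matching, which is also an assumption of this sketch.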
S102: By a genetic algorithm, select at least some of the feature types as specified feature types, and, for each feature type, select at least some of the variables of that feature type as specified variables of the feature type.
In this specification, as described in step S100, different feature types contribute differently to judging whether a transaction is a money laundering transaction. To improve the accuracy of the risk control rules generated later, and to keep unstable feature types from affecting rule generation, the server can use a genetic algorithm to determine the specified feature types that make up the risk control rules and the variables of each specified feature type. An unstable feature type is one that appears with similar probability in the positive samples and the negative samples. For example, if a feature is shared by most positive samples and also by most negative samples, it contributes little to judging whether a transaction is a money laundering transaction and can be filtered out through genetic algorithm optimization. Likewise, the variables of each feature type can be filtered through genetic algorithm optimization.
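To make the notion of an unstable feature concrete, the sketch below compares how often a feature value appears in the positive and the negative samples. It is not the genetic algorithm itself; the `tolerance` threshold and the dictionary layout of the samples are assumptions introduced only for this illustration.

```python
def occurrence_rate(samples, feature_type, value):
    """Fraction of samples whose feature_type takes the given value."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if s.get(feature_type) == value)
    return hits / len(samples)


def is_unstable(positives, negatives, feature_type, value, tolerance=0.1):
    """True when the value is about equally common in positive and negative samples."""
    gap = abs(
        occurrence_rate(positives, feature_type, value)
        - occurrence_rate(negatives, feature_type, value)
    )
    return gap <= tolerance


# Hypothetical samples: the currency is shared by all, the IP country is not.
POS = [{"currency": "USD", "ip_country": "US"}, {"currency": "USD", "ip_country": "US"}]
NEG = [{"currency": "USD", "ip_country": "CN"}, {"currency": "USD", "ip_country": "DE"}]
```

With these samples the currency value separates nothing, while the IP country value appears only in the positive samples.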
Specifically, in this specification, the learning samples as a whole can be regarded as a population, each learning sample as an individual, each feature type in a learning sample as a gene contained in the individual, and the variable of a feature type as the variable of the gene. The server can determine the specified feature types and the specified variables of each feature type through the genetic algorithm optimization process shown in Fig. 2, which includes the following steps:
S1020: Perform feature encoding on each learning sample.
First, to make optimization easier, the server can, for each learning sample, encode the feature types of the sample, so that the specified feature types that make up the risk control rules can later be determined by a feature selection algorithm.
Specifically, for feature type F_i, the server can divide the variable of F_i into three fields: an operator field O_i, a value field V_i, and a scope field E_i. Here F_i denotes the i-th feature type. O_i denotes the operator of the i-th feature type; the operators may include "in", "=", and "not in", meaning contains, equals, and does not contain, respectively. V_i denotes the feature value of the i-th feature type. E_i indicates whether the i-th feature type is present in the learning sample, e.g., 0 for absent and 1 for present.
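One possible in-memory form of this three-field encoding is sketched below. The `Gene` class, the `encode_sample` helper, and the choice of "=" as the default operator are assumptions of this sketch; the specification does not say how the operator of a gene is initially chosen.

```python
from dataclasses import dataclass
from typing import Any

OPERATORS = ("in", "=", "not in")  # contains, equals, does not contain


@dataclass(frozen=True)
class Gene:
    feature_type: str  # F_i
    operator: str      # O_i
    value: Any         # V_i
    present: int       # E_i: 1 if the feature type is present in the sample, else 0

    def __post_init__(self):
        if self.operator not in OPERATORS:
            raise ValueError(f"unknown operator: {self.operator!r}")
        if self.present not in (0, 1):
            raise ValueError("present must be 0 or 1")


def encode_sample(business_data, feature_types, default_operator="="):
    """Encode one learning sample as a list of genes, one per preset feature type."""
    genes = []
    for feature_type in feature_types:
        if feature_type in business_data:
            genes.append(Gene(feature_type, default_operator, business_data[feature_type], 1))
        else:
            genes.append(Gene(feature_type, default_operator, None, 0))
    return genes


encoded = encode_sample({"ip_country": "US"}, ["ip_country", "trade_time"])
```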
Taking learning sample 001 in Table 1 as an example, feature encoding yields the feature structure shown in Table 2:
Table 2
Table 2 shows the encoding result for some of the feature types of learning sample 001, namely the feature codes of F1 to F44; the contents are the variables of the feature code corresponding to each feature type. For each learning sample, a series of feature codes corresponding to that sample can be determined. After feature encoding has been performed for each feature type, the population shown in Table 3 is obtained.
Learning sample 1 | F1 | F2 | …… | Fm | P
Learning sample 2 | F1 | F2 | …… | Fm | P
Learning sample 3 | F1 | F2 | …… | Fm | N
……                | …… | …… | …… | …… | ……
Learning sample X | F1 | F2 | …… | Fm | P
Table 3
Table 3 contains X learning samples in total. Each row is one learning sample, that is, one individual in the genetic algorithm, and P and N are, as before, the examination results of the learning samples.
S1021: Perform competitive selection on the learning samples, and take the selected learning samples as the individuals of the population.
The server can perform competitive selection on the population so that the individuals with higher fitness are kept in the population. Specifically, the server can take each learning sample as a sample to be selected (that is, an individual) and calculate its fitness. Based on the calculated fitness of the samples to be selected, it then screens out a preset number of them as the objects of the subsequent steps. For example, it can sort the samples to be selected by fitness from high to low and select a preset number of them, or select the samples whose fitness is higher than a preset fitness threshold. How samples are selected according to fitness can be configured as needed; this specification does not limit it.
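The two selection strategies mentioned above can be sketched as follows, assuming the fitness of each sample has already been computed and is held in a dictionary keyed by sample ID. The function names and the example fitness values are hypothetical.

```python
def select_top_k(fitness_by_sample, k):
    """Keep the k sample IDs with the highest fitness, best first."""
    ranked = sorted(fitness_by_sample, key=fitness_by_sample.get, reverse=True)
    return ranked[:k]


def select_above_threshold(fitness_by_sample, threshold):
    """Keep the sample IDs whose fitness is strictly higher than the threshold."""
    return [sid for sid, fit in fitness_by_sample.items() if fit > threshold]


# Hypothetical fitness values for four samples to be selected.
FITNESS = {"001": -1.5, "002": -0.2, "003": -4.0, "004": -0.9}
```

Either function returns the IDs of the individuals that remain in the population for the next step.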
In addition, in this specification, the fitness of each sample to be selected can be determined as the sum of the fitness values of the variables of the feature types in that sample, and the fitness of the variable of each feature type can be calculated with a fitness formula. Specifically, the fitness formula can be:
$$\mathrm{fitness}_{ji} = N_{ip}\,\log_2\!\left(\frac{N_{ip}}{N_{ip} + 10\,N_{in}}\right)$$
where $\mathrm{fitness}_{ji}$ denotes the fitness of the variable of the i-th feature type in the j-th sample to be selected, $N_{ip}$ denotes the number of times the variable of the i-th feature type occurs in the positive samples, and $N_{in}$ denotes the number of times the variable of the i-th feature type occurs in the negative samples. The server can then determine the fitness of each sample to be selected according to the formula:
$$\mathrm{fitness}_{j} = \sum_{i} \mathrm{fitness}_{ji}$$
where $\mathrm{fitness}_{j}$ denotes the fitness of the j-th sample to be selected.
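A minimal sketch of the fitness calculation follows. It assumes samples are dictionaries keyed by feature type and counts a variable as occurring in a sample when the sample holds the same value for that feature type. The case where a variable never occurs in a positive sample is not covered by the text; the sketch returns 0 there (the limit of x·log x as x approaches 0), which is an assumption that an implementer may well want to replace.

```python
import math


def variable_fitness(n_pos, n_neg, neg_weight=10):
    """N_ip * log2(N_ip / (N_ip + 10 * N_in)) for one variable of one feature type."""
    if n_pos == 0:
        return 0.0  # assumption: the source does not define this case
    return n_pos * math.log2(n_pos / (n_pos + neg_weight * n_neg))


def sample_fitness(sample, positives, negatives):
    """Sum the variable fitness over every feature type of one sample to be selected."""
    total = 0.0
    for feature_type, value in sample.items():
        n_pos = sum(1 for s in positives if s.get(feature_type) == value)
        n_neg = sum(1 for s in negatives if s.get(feature_type) == value)
        total += variable_fitness(n_pos, n_neg)
    return total
```

For instance, a variable seen in 2 positive and 3 negative samples gets 2 · log2(2 / 32) = -8, while a variable seen only in positive samples gets the maximum value of 0.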
S1022: Adjust the population and determine at least one new population.
The server can then perform at least one of a copy operation, a crossover operation, a mutation operation, and the like on the population screened in the previous step, to obtain a new population. In this specification, G_k denotes the k-th generation population, so G_0 denotes the population obtained from the first fitness screening. For ease of understanding, in the description of the genetic algorithm optimization, this specification uses "individual" for a sample to be selected that was chosen in step S1021, "gene" for a feature type, "population" for the set of selected samples, and "variable of the gene" for the variable of a feature type.
Each operation is described below:
In the copy operation, the server randomly copies individuals in the population to obtain a new population G'_0. The copy probability can be configured as needed, e.g., 50%, in which case each individual has a 50% probability of being copied into the new population G'_0.
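The copy operation can be sketched as an independent random draw per individual. The function name and the optional `rng` parameter are illustrative choices; individuals are deep-copied so that later operations on the new population leave the old one untouched.

```python
import copy
import random


def copy_operation(population, copy_probability=0.5, rng=None):
    """Copy each individual into the new population with the given probability."""
    if rng is None:
        rng = random.Random()
    return [
        copy.deepcopy(individual)
        for individual in population
        if rng.random() < copy_probability
    ]


# Hypothetical population G_0 of three individuals.
G0 = [{"id": "001"}, {"id": "002"}, {"id": "003"}]
G0_copied = copy_operation(G0, copy_probability=0.5, rng=random.Random(42))
```

The size of the new population varies from run to run; passing a seeded `random.Random` makes a run reproducible.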
In the crossover operation, the server pairs up the individuals in population G_0 and decides, according to a crossover probability, whether each pair performs the crossover operation. The crossover operation may be an exchange or a merge. That is, for each pair that is to be crossed, the server determines the same genes (that is, the same feature types) of the two individuals and exchanges or merges the variables of those genes, to obtain new individuals. The new individuals obtained from the crossover operation become the individuals of a new population G''_0. The crossover probability can also be configured as needed, e.g., 90%, in which case each pair of individuals has a 90% probability of performing the crossover operation.
It should be noted that, in this specification, the types of variables may include a first type and a second type. The first type may be the binary type, i.e., a variable with only two possible values. The second type may be one of the enumeration type or the combination type. Specifically, a binary variable may be a matching result; an enumeration variable may be one of multiple possible contents (e.g., the initiator's country of residence, or the country to which the IP address belongs); and a combination variable may be one or more of multiple possible contents. In addition, the variable types may also include discrete variables, specifically discrete data (e.g., date of birth, age, amount). Of course, the server may also divide features into more types as needed, which this specification does not limit.
For the crossover operation, the server may exchange the variables of common genes whose variables are of the binary type in each pair of individuals, or merge the variables of common genes whose variables are of the enumeration or combination type. Which common genes are exchanged or merged can be set as needed and is not limited by this specification.
For example, suppose that for the two individuals 002 and 008 in population G0 shown in Fig. 3a, the server selects these two individuals for crossover and determines new individuals by exchanging common genes. The server may first determine the common genes of the two individuals whose variables are of the binary type; as shown in Fig. 3a, the variables of feature type F3 and feature type F5 are binary variables. It then exchanges, between individuals 002 and 008, their respective value ranges V3 and V5, thereby determining two new individuals as individuals of the new population G''0, as shown in Fig. 3b. Of course, the value ranges of different feature types whose variables are binary in the two individuals may also be exchanged, e.g., as shown in Fig. 3c. However, variables of different feature types are represented differently (e.g., variables such as "US", "RU", and "DE" represent countries, while variables such as "10:10AM" and "9:15PM" represent times). To prevent an exchange between variables with different representations from producing a new individual with malformed genes (e.g., the country of the IP address is 10:10AM and the transaction initiation time is US), the server may also restrict exchanges to variables with the same representation, based on the representation of the variables of each feature type. How this is set is not limited by this specification.
Alternatively, if the server determines the new individual by merging common genes, it may take the genes whose variables are of the combination type in individuals 002 and 008, such as feature type F2 and feature type F4 shown in Fig. 3a, merge the value ranges V2 corresponding to individuals 002 and 008, and merge the value ranges V5 corresponding to individuals 002 and 008, thereby determining a new individual as an individual of the new population G''0, as shown in Fig. 3d.
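The two crossover variants described above (exchanging binary genes and merging combination genes) can be sketched as follows. This is only an illustration under stated assumptions: individuals are dictionaries keyed by feature type, combination genes hold sets of values, and the merged child inherits its other genes from the first parent. The function names `swap_gene` and `merge_gene` are hypothetical.

```python
def swap_gene(a, b, feature):
    """Exchange the value of one shared feature type between two individuals."""
    new_a, new_b = dict(a), dict(b)
    new_a[feature], new_b[feature] = b[feature], a[feature]
    return new_a, new_b

def merge_gene(a, b, feature):
    """Merge (union) the value sets of one shared feature type into a new individual."""
    child = dict(a)  # assumption: remaining genes are taken from the first parent
    child[feature] = set(a[feature]) | set(b[feature])
    return child

# Hypothetical stand-ins for individuals 002 and 008.
ind_002 = {"F2": {"US", "RU"}, "F3": True}
ind_008 = {"F2": {"DE"}, "F3": False}
new_002, new_008 = swap_gene(ind_002, ind_008, "F3")
merged = merge_gene(ind_002, ind_008, "F2")
```

Exchanging produces two new individuals, while merging produces one whose gene covers the values of both parents.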
In the mutation operation, the server selects, according to the mutation probability, the individuals in population G0 that are to mutate, and adjusts the scope of at least one gene (i.e., feature type) in each selected individual, that is, adjusts whether the gene is present in the individual. The resulting new individuals are taken as individuals of the new population G''0. The mutation probability can be configured as needed and is not limited by this specification; e.g., it may be set to 10%. For each gene, when the server adjusts the gene from absent to present in the individual, it may select one variable according to the frequency of occurrence of each variable of that gene, to serve as the variable of the gene in the adjusted individual, e.g., the variable with the highest frequency of occurrence.
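The mutation of a single gene can be sketched as follows, assuming individuals are dictionaries keyed by feature type and that variable frequencies are counted over the current population. The function name `mutate` and this way of counting are illustrative assumptions, not details from the specification.

```python
from collections import Counter

def mutate(individual, feature, population):
    """Toggle whether `feature` is present in the individual."""
    mutant = dict(individual)
    if feature in mutant:
        # Gene present: remove it from the individual.
        del mutant[feature]
    else:
        # Gene absent: add it, using the most frequent variable of that gene.
        counts = Counter(ind[feature] for ind in population if feature in ind)
        if counts:
            mutant[feature] = counts.most_common(1)[0][0]
    return mutant
```

Choosing the most frequent variable is only one of the options the text allows; a frequency-weighted random choice would fit the description equally well.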
S1023: Determine the next-generation population by tournament selection, according to the population and the new populations generated from it.
Then, the server may take the samples in at least one of the newly obtained populations G'0 and G''0 and the population G0 as samples to be selected, and perform tournament selection again to obtain the next-generation population G1. The tournament selection process may be the same as in step S1021 above and is not repeated here.
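For orientation, tournament selection in its common textbook form can be sketched as follows. The details of step S1021 are not restated here, so this is an assumption about the general technique rather than the specification's exact procedure; `fitness`, `n_select`, and the tournament size `k` are hypothetical parameters.

```python
import random

def tournament_select(candidates, fitness, n_select, k=2, rng=None):
    """Run n_select tournaments of size k and keep the fittest contender of each."""
    rng = rng or random.Random()
    selected = []
    for _ in range(n_select):
        contenders = rng.sample(candidates, k)  # k distinct random candidates
        selected.append(max(contenders, key=fitness))
    return selected
```

Here `candidates` would be the pooled individuals of G0 and the newly generated populations, and the returned list would form the next-generation population G1.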
S1024: Judge whether the number of iterations of the population determined in step S1023 has reached a preset value; if so, execute step S1025; if not, execute step S1022.
S1025: Output the determined population as the result.
Finally, the server may repeat the above process through the genetic algorithm until the number of iterations reaches the preset value, which can be configured as needed and is not limited by this specification. For example, if the preset value is 5000, then when G5000 is determined, that population is output as the result. At this point, the genes contained in each individual of the population are the genes retained through tournament selection, and the copy, crossover, and mutation operations have reduced the kinds of variables in each retained gene. The feature types contained in the population are the specified feature types used to generate risk control rules, and the variables of each specified feature type are the specified variables.
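The iteration control of steps S1022 to S1025 can be sketched as a simple loop. The callable `make_next_generation` is a hypothetical placeholder standing in for the genetic operations and the tournament selection; it is an assumption made for illustration.

```python
def evolve(population, make_next_generation, preset_iterations):
    """Iterate until the preset number of generations is reached."""
    generation = 0
    while generation < preset_iterations:              # S1024: iteration check
        population = make_next_generation(population)  # S1022 and S1023
        generation += 1
    return population                                  # S1025: output the result
```

With `preset_iterations=5000`, the returned population corresponds to G5000 in the example above.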
In addition, in this specification, the server may select at least one of the copy, crossover, and mutation operations to execute during each population iteration.
S104: Generate risk control rules using a first-order rule learning algorithm, according to each learning sample, each selected specified feature type, and each specified variable of the specified feature types.
In the embodiments of this specification, after determining the specified feature types through a feature learning algorithm such as the genetic algorithm, the server may generate risk control rules through a first-order rule learning algorithm, according to the labeling of positive example samples and negative example samples among the learning samples.
Specifically, the server may first redetermine each positive example sample and each negative example sample according to the specified feature types; that is, it redetermines the feature types of each learning sample according to the business data corresponding to the learning sample and the money laundering sanctions list.
Then, the server may use the first-order rule learning algorithm (First Order Inductive Learner, FOIL) to generate several risk control rules.
When generating risk control rules with the FOIL algorithm, the server may determine a training set according to the positive example samples and negative example samples, and execute the following steps, as shown in Fig. 4:
S1040: Initialize the risk control rule set, take the positive example samples as the positive example sample set, and take the negative example samples as the negative example sample set;
S1041: Judge whether the positive example sample set is empty; if so, execute step S1048; if not, execute step S1042;
S1042: According to the FOIL algorithm, determine one specified variable from the specified variables as the specified variable contained in a newly created risk control rule;
S1043: Judge whether the matching degree between the newly created risk control rule and the negative example sample set is less than a preset threshold; if so, execute step S1045; if not, execute step S1044;
S1044: According to the FOIL algorithm, determine one specified variable from the specified variables and add it to the newly created risk control rule;
S1045: According to the newly created risk control rule with the specified variable added, update the negative example sample set, and repeat step S1043;
S1046: Add the newly created risk control rule to the risk control rule set;
S1047: According to the risk control rule set, delete from the positive example sample set the positive example samples that match the risk control rules in the set, and repeat S1041;
S1048: Once the positive example sample set is judged to be empty, take the risk control rules in the risk control rule set as the several generated risk control rules.
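The steps above follow the general shape of a sequential covering loop, which can be sketched as below. This is a minimal illustration under several assumptions: samples and rules are dictionaries keyed by feature type, candidate conditions are (feature type, variable) pairs, the gain is the standard FOIL gain, and a rule is added to the rule set (S1046) as soon as the check in S1043 passes. All function and parameter names are hypothetical.

```python
import math

def matches(rule, sample):
    """A sample matches a rule when it satisfies every condition of the rule."""
    return all(sample.get(f) == v for f, v in rule.items())

def foil_gain(rule, cond, pos, neg):
    """Standard FOIL gain of adding condition `cond` to `rule` (assumed form)."""
    new_rule = {**rule, cond[0]: cond[1]}
    p_old = sum(matches(rule, s) for s in pos)
    n_old = sum(matches(rule, s) for s in neg)
    p_new = sum(matches(new_rule, s) for s in pos)
    n_new = sum(matches(new_rule, s) for s in neg)
    if p_new == 0 or p_old == 0:
        return float("-inf")  # the extended rule covers no positive sample
    return p_new * (math.log2(p_new / (p_new + n_new))
                    - math.log2(p_old / (p_old + n_old)))

def learn_rules(positives, negatives, conditions, threshold=0.0):
    rules = []                                       # S1040
    remaining = list(positives)
    while remaining:                                 # S1041
        rule = {}
        while True:
            p = sum(matches(rule, s) for s in remaining)
            n = sum(matches(rule, s) for s in negatives)
            if rule and n / p <= threshold:          # S1043
                break
            scored = [(foil_gain(rule, c, remaining, negatives), c)
                      for c in conditions if c[0] not in rule]
            scored = [(g, c) for g, c in scored if g > float("-inf")]
            if not scored:
                break                                # nothing useful left to add
            _, (feature, value) = max(scored, key=lambda gc: gc[0])
            rule[feature] = value                    # S1042 / S1044
        if not rule:
            break                                    # no rule could be built
        rules.append(rule)                           # S1046
        remaining = [s for s in remaining if not matches(rule, s)]  # S1047
    return rules                                     # S1048

positives = [{"country": "US", "name": "a"}, {"country": "US", "name": "a"}]
negatives = [{"country": "US", "name": "b"}, {"country": "CN", "name": "a"}]
conditions = [("country", "US"), ("country", "CN"), ("name", "a"), ("name", "b")]
learned = learn_rules(positives, negatives, conditions)
```

In this sketch the negative matches are recomputed against the full negative example set on each pass, which plays the role of the update in S1045.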
In step S1043, the server may use the gain formula to determine the gain value of each specified variable, and take the specified variable with the highest gain value as the specified variable added to the risk control rule. Here, P_new denotes the number of positive example samples matching the newly created risk control rule after the specified variable is added to it; N_new denotes the number of negative example samples matching the newly created risk control rule after the specified variable is added to it; P_old denotes the number of positive example samples matching the newly created risk control rule before the specified variable is added to it; and N_old denotes the number of negative example samples matching the newly created risk control rule before the specified variable is added to it.
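For reference, the gain commonly used by the FOIL algorithm can be written with the four quantities defined above as shown below. Whether the specification uses exactly this form is an assumption; it is given only as the standard definition.

```latex
\mathrm{Gain} = P_{new} \times \left( \log_2 \frac{P_{new}}{P_{new} + N_{new}} - \log_2 \frac{P_{old}}{P_{old} + N_{old}} \right)
```

Under this definition, a specified variable scores highly when the extended rule still matches many positive example samples while the share of positive samples among all matched samples increases.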
It should be noted that, in this specification, adding a specified variable to a risk control rule may specifically mean adding the specified variable together with its corresponding specified feature type to the rule. For example, if the specified feature type is the country of the IP address, the specified variables may include "US", "CN", "UK", and so on; when the server selects "US" and adds it to a risk control rule, it specifically adds the condition that the country of the IP address is US. Each risk control rule generated by this specification can thus be regarded as a rule composed of different conditions: when business data matches every condition of any risk control rule, the transaction corresponding to that business data can be determined to be money laundering.
For example, suppose a generated risk control rule is: the country of the IP address initiating the transaction is US and the initiator's name is user a. Then, when user a initiates a transaction from the United States, the transaction can be determined to be money laundering.
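Applying such rules to business data can be sketched as follows, assuming rules and transactions are dictionaries keyed by feature type. The key names `ip_country` and `initiator_name` and the function names are hypothetical.

```python
def rule_matches(rule, transaction):
    """A rule matches only when every one of its conditions holds."""
    return all(transaction.get(key) == value for key, value in rule.items())

def is_flagged(rules, transaction):
    """A transaction is flagged when it matches any rule in the rule set."""
    return any(rule_matches(rule, transaction) for rule in rules)

rule = {"ip_country": "US", "initiator_name": "user a"}
tx_us = {"ip_country": "US", "initiator_name": "user a", "amount": 100}
tx_cn = {"ip_country": "CN", "initiator_name": "user a", "amount": 100}
```

Here `is_flagged([rule], tx_us)` evaluates to `True` and `is_flagged([rule], tx_cn)` to `False`, mirroring the example of user a initiating a transaction from the United States.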
In addition, in this specification, the matching degree between a newly created risk control rule and the negative example sample set is determined by the ratio of the number of negative example samples matching the rule to the number of positive example samples matching the rule. For example, if a newly created risk control rule matches 1000 positive example samples and 1 negative example sample, its matching degree with the negative example sample set is 0.1%.
Of course, in this specification, the preset threshold can be configured as needed. Alternatively, instead of comparing the matching degree between the newly created risk control rule and the negative example sample set with the preset threshold, the server may judge whether the newly created rule matches no negative example sample at all. This specification does not limit this.
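Both variants of the check can be sketched in one small function. The name `passes_negative_check` and the convention that `threshold=None` selects the "no negative match" variant are assumptions made for illustration.

```python
def passes_negative_check(n_pos_matched, n_neg_matched, threshold=None):
    """Return True when the rule is clean enough with respect to negative samples."""
    if threshold is None:
        # Strict variant: the rule must match no negative example sample.
        return n_neg_matched == 0
    # Ratio variant: matched negatives divided by matched positives.
    return n_neg_matched / n_pos_matched < threshold
```

For the example above, 1 matched negative against 1000 matched positives gives a ratio of 0.001, so the rule passes a threshold of 1% but fails the strict variant.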
Based on the risk control rule mining process shown in Fig. 1, the specified feature types that make up the rules, and their corresponding specified variables, are determined through screening and optimization by a feature selection algorithm, so the risk control rules generated from the specified feature types achieve better recognition. This avoids the drawbacks of manually set rules in the prior art and improves the efficiency and recognition accuracy of anti-money-laundering work based on risk control rules.
In addition, in this specification, after generating several risk control rules, the server may also prune the generated rules to further improve their recognition accuracy.
Specifically, the server may determine detection samples that are different from the learning samples; each detection sample may be determined from historical data. Then, the server identifies each detection sample according to the generated risk control rules and determines the recognition results. Finally, according to the existing review results of each detection sample, it determines the accuracy of the recognition results of the generated risk control rules.
Specifically, the server may first determine the accuracy of each risk control rule from its recognition results. Then, for each risk control rule and for the specified variable of each specified feature type contained in it, the server judges whether the accuracy of the rule's recognition results after deleting the specified variable is higher than before deleting it; if so, it deletes the specified variable of that specified feature type; otherwise, it does not. Finally, after pruning all risk control rules, the server recalculates the accuracy of the recognition results of each pruned rule and, according to that accuracy, selects a specified number of risk control rules as the rules used to identify money laundering transactions.
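The pruning procedure can be sketched as follows, assuming each detection sample is a pair of (feature dictionary, reviewed label) and that accuracy is the fraction of samples a rule classifies correctly. The function names, the greedy deletion order, and the choice never to delete a rule's last condition are assumptions for illustration.

```python
def matches(rule, features):
    return all(features.get(k) == v for k, v in rule.items())

def accuracy(rule, samples):
    """Fraction of (features, is_risky) pairs the rule classifies correctly."""
    hits = sum(matches(rule, features) == is_risky for features, is_risky in samples)
    return hits / len(samples)

def prune(rule, samples):
    """Delete a condition whenever doing so raises the rule's accuracy."""
    pruned = dict(rule)
    for feature in list(rule):
        trial = {k: v for k, v in pruned.items() if k != feature}
        # Assumption: keep at least one condition so the rule stays meaningful.
        if trial and accuracy(trial, samples) > accuracy(pruned, samples):
            pruned = trial
    return pruned

def select_top(rules, samples, count):
    """Keep the `count` pruned rules with the highest accuracy."""
    return sorted(rules, key=lambda r: accuracy(r, samples), reverse=True)[:count]

detection = [
    ({"country": "US", "name": "a"}, True),
    ({"country": "US", "name": "b"}, True),
    ({"country": "CN", "name": "a"}, False),
]
pruned_rule = prune({"country": "US", "name": "a"}, detection)
```

In this small example the condition on `name` is deleted because the shorter rule classifies all three detection samples correctly.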
Further, the above embodiments illustrate this specification only by taking the generation of risk control rules for identifying money laundering as an example. Similarly, the risk control rule mining method provided by this specification can also determine risk control rules for other types of business, as described in step S100; this specification does not limit this.
It should be noted that the steps of the method provided by the embodiments of this specification may all be executed by the same device, or the method may be executed by different devices. For example, steps S100 and S102 may be executed by device 1 and step S102 by device 2; or step S100 may be executed by device 1 and steps S102 and S104 by device 2; and so on. Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In certain embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the risk control rule mining method shown in Fig. 1, the embodiments of this specification also provide a risk control rule mining device, as shown in Fig. 5.
Fig. 5 is a schematic structural diagram of a risk control rule mining device provided by an embodiment of this specification. The device includes:
A determining module 200, which, for each preset feature type, determines the feature value of each learning sample for that feature type as a variable of that feature type;
A selecting module 202, which selects at least some of the feature types as specified feature types through a genetic algorithm and, for each feature type, selects at least some of the variables of that feature type as the specified variables of that feature type;
A generation module 204, which generates risk control rules using a first-order rule learning algorithm, according to each learning sample, each selected specified feature type, and each specified variable of the specified feature types.
The selecting module 202 selects learning samples to copy from the learning samples according to the copy probability of the genetic algorithm, obtains copy samples, and selects the specified feature types and specified variables according to the variables of each feature type in the learning samples and the variables of each feature type in the copy samples.
The selecting module 202 determines several pairs of learning samples from the learning samples according to the crossover probability of the genetic algorithm; for each pair, it exchanges the variables of the same feature type in the pair and/or merges the variables of the same feature type in the pair to obtain crossover samples; and it selects the specified feature types and specified variables according to the variables of each feature type in the learning samples and in the crossover samples.
The types of the variables include a first type and a second type. The selecting module 202 exchanges the variables of the same feature type whose variables are of the first type in the pair of learning samples, and merges the variables of the same feature type whose variables are of the second type in the pair of learning samples.
The selecting module 202 selects the learning samples to mutate from the learning samples according to the mutation probability of the genetic algorithm; for each selected learning sample, it adjusts whether the variable of at least one feature type is present in that learning sample to obtain mutation samples; and it selects the specified feature types and specified variables according to the variables of each feature type in the learning samples and in the mutation samples.
For each feature type, when the selecting module 202 adjusts the variable of that feature type from absent to present in the learning sample, it selects one variable according to the frequency of occurrence of each variable of that feature type, to serve as the variable of that feature type in the adjusted learning sample.
The selecting module 202 determines the fitness of each sample to be selected according to the fitness formula of the genetic algorithm, selects a specified number of samples to be selected in descending order of fitness, takes the feature types of the selected samples as the specified feature types, and takes the variables of each feature type in the selected samples as the specified variables;
Here, the samples to be selected include at least one kind of sample among the learning samples, copy samples, crossover samples, and mutation samples; fitness_j denotes the fitness of the j-th sample to be selected; fitness_ji denotes the fitness of the variable of the i-th feature type in the j-th sample to be selected; N_ip denotes the number of times the variable of the i-th feature type appears in the positive example samples; N_in denotes the number of times the variable of the i-th feature type appears in the negative example samples; a positive example sample is a sample to be selected whose risk control result is risky; and a negative example sample is a sample to be selected whose risk control result is not risky.
The device further includes:
A rule pruning module 206, which determines the recognition accuracy of each risk control rule, adjusts the feature types contained in each rule according to its recognition accuracy, redetermines the recognition accuracy of each adjusted rule, and selects at least one risk control rule according to the recognition accuracy of the rules after adjustment and before adjustment.
The rule pruning module 206 determines, from historical data, detection samples that are different from the learning samples, and determines the recognition accuracy of each risk control rule according to the rule's recognition results on the detection samples.
For each risk control rule, the rule pruning module 206 determines, according to the rule's recognition accuracy, which of the feature types composing the rule would raise the rule's recognition accuracy if deleted, and deletes the determined feature types from the feature types composing the rule.
Based on the risk control rule mining method shown in Fig. 1, this specification correspondingly provides a server, as shown in Fig. 6. The server includes one or more processors and a memory; the memory stores a program configured to be executed by the one or more processors to perform the following steps:
For each preset feature type, determine the feature value of each learning sample for that feature type as a variable of that feature type;
Through a genetic algorithm, select at least some of the feature types as specified feature types and, for each feature type, select at least some of the variables of that feature type as the specified variables of that feature type;
Generate risk control rules using a first-order rule learning algorithm, according to each learning sample, each selected specified feature type, and each specified variable of the specified feature types.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, with the development of technology, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and produce a dedicated integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can be readily obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Or, the means for implementing various functions can even be regarded as both software modules implementing the method and structures within the hardware component.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product with a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described in terms of various units divided by function. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, refer to the description of the method embodiments.
The above is merely an embodiment of the present application and is not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of its claims.

Claims (21)

1. A method for risk control rule mining, comprising:
for each preset feature type, determining the feature value of each learning sample corresponding to that feature type, as a variable of that feature type;
selecting, by a genetic algorithm, at least some feature types from the feature types as designated feature types, and, for each feature type, selecting at least some variables from the variables of that feature type as designated variables of that feature type;
generating risk control rules by a first-order rule learning algorithm according to the learning samples, the selected designated feature types, and the designated variables of the designated feature types.
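The first step of claim 1 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each learning sample is a dictionary mapping feature types to feature values; the function name `collect_variables` and the sample data are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def collect_variables(samples, feature_types):
    """Map each preset feature type to the distinct feature values (variables)
    that the learning samples take for it."""
    variables = defaultdict(list)
    for sample in samples:
        for ftype in feature_types:
            value = sample.get(ftype)
            # A sample may lack a feature type; only record values that exist.
            if value is not None and value not in variables[ftype]:
                variables[ftype].append(value)
    return dict(variables)


# Hypothetical learning samples (feature type -> feature value).
samples = [
    {"device": "new", "amount": "high", "city": "A"},
    {"device": "old", "amount": "high"},
]
variables = collect_variables(samples, ["device", "amount", "city"])
```

The resulting mapping is the pool from which the genetic algorithm would later pick designated feature types and designated variables.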
2. The method of claim 1, wherein selecting, by the genetic algorithm, at least some feature types from the feature types as designated feature types, and, for each feature type, selecting at least some variables from the variables of that feature type as designated variables of that feature type, specifically comprises:
selecting, according to a replication probability of the genetic algorithm, learning samples from the learning samples and replicating them to obtain replicated samples;
selecting the designated feature types and designated variables according to the variables of each feature type in the learning samples and the variables of each feature type in the replicated samples.
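The replication step can be sketched as follows. This is an illustrative reading, assuming that each learning sample is copied independently with the replication probability; the claim does not fix how the probability is applied, and `replicate` is a hypothetical name.

```python
import random


def replicate(samples, replication_prob, rng):
    """Copy each learning sample with probability `replication_prob`
    (independent per-sample draws are an assumption of this sketch)."""
    return [dict(s) for s in samples if rng.random() < replication_prob]


rng = random.Random(0)  # seeded only so the sketch is repeatable
samples = [{"device": "new"}, {"device": "old"}, {"city": "A"}]
replicated = replicate(samples, 0.5, rng)
```

The replicated samples join the original learning samples in the candidate pool from which designated feature types and variables are selected.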
3. The method of claim 1, wherein selecting, by the genetic algorithm, at least some feature types from the feature types as designated feature types, and, for each feature type, selecting at least some variables from the variables of that feature type as designated variables of that feature type, specifically comprises:
determining, according to a crossover probability of the genetic algorithm, several pairs of learning samples from the learning samples;
for each pair of learning samples, swapping the variables of the same feature type in the pair, and/or merging the variables of the same feature type in the pair, to obtain crossover samples;
selecting the designated feature types and designated variables according to the variables of each feature type in the learning samples and the variables of each feature type in the crossover samples.
4. The method of claim 3, wherein the types of the variables include a first type and a second type;
swapping the variables of the same feature type in the pair of learning samples specifically comprises:
swapping the variables of the same feature type whose variables in the pair of learning samples are of the first type;
merging the variables of the same feature type in the pair of learning samples specifically comprises:
merging the variables of the same feature type whose variables in the pair of learning samples are of the second type.
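The swap and merge operations of claims 3 and 4 might look like the sketch below. The claims do not say what the first and second variable types are, so the sketch assumes single-valued variables for the first type (swapped) and set-valued variables for the second type (merged by set union). The name `crossover` and the `merge_types` parameter are hypothetical, and choosing pairs by crossover probability is left out.

```python
def crossover(a, b, merge_types):
    """Cross two learning samples on the feature types they share.

    Feature types listed in `merge_types` are treated as second-type
    variables and merged (set union, an assumption of this sketch);
    all other shared feature types are treated as first-type and swapped.
    """
    child_a, child_b = dict(a), dict(b)
    for ftype in a.keys() & b.keys():
        if ftype in merge_types:
            merged = set(a[ftype]) | set(b[ftype])
            child_a[ftype] = merged
            child_b[ftype] = set(merged)  # separate copy for each child
        else:
            child_a[ftype], child_b[ftype] = b[ftype], a[ftype]
    return child_a, child_b


a = {"device": "new", "cities": {"A"}}
b = {"device": "old", "cities": {"B"}}
child_a, child_b = crossover(a, b, merge_types={"cities"})
```

The inputs are left unmodified, so the original learning samples and the crossover samples can both remain in the candidate pool.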
5. The method of claim 1, wherein selecting, by the genetic algorithm, at least some feature types from the feature types as designated feature types, and, for each feature type, selecting at least some variables from the variables of that feature type as designated variables of that feature type, specifically comprises:
selecting, according to a mutation probability of the genetic algorithm, learning samples to be mutated from the learning samples;
for each learning sample selected for mutation, adjusting whether the variable of at least one feature type in the learning sample is present in the learning sample, to obtain mutated samples;
selecting the designated feature types and designated variables according to the variables of each feature type in the learning samples and the variables of each feature type in the mutated samples.
6. The method of claim 5, wherein adjusting whether the variable of at least one feature type in the learning sample is present in the learning sample specifically comprises:
for each feature type, when the variable of that feature type is adjusted from absent from the learning sample to present in the learning sample, selecting one variable according to the frequency of occurrence of each variable of that feature type, as the variable of that feature type in the adjusted learning sample.
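The mutation of claims 5 and 6 can be sketched as toggling the presence of a feature type in a sample. The sketch below toggles exactly one randomly chosen feature type (the claim allows "at least one") and, when a variable is added, draws it with probability proportional to how often each value occurs in the learning samples. The name `mutate` is hypothetical, and selecting which samples to mutate by mutation probability is left out.

```python
import random
from collections import Counter


def mutate(sample, samples, feature_types, rng):
    """Toggle whether one randomly chosen feature type is present in `sample`."""
    mutated = dict(sample)
    ftype = rng.choice(feature_types)
    if ftype in mutated:
        # Present -> absent: drop the variable of this feature type.
        del mutated[ftype]
    else:
        # Absent -> present: pick a value weighted by its frequency of occurrence.
        counts = Counter(s[ftype] for s in samples if ftype in s)
        if counts:
            values = list(counts)
            weights = [counts[v] for v in values]
            mutated[ftype] = rng.choices(values, weights=weights, k=1)[0]
    return mutated


samples = [{"device": "new"}, {"device": "new"}, {"device": "old"}, {"city": "A"}]
added = mutate({"city": "A"}, samples, ["device"], random.Random(1))
removed = mutate({"device": "new"}, samples, ["device"], random.Random(1))
```

Frequency weighting means that a value seen in two of three samples is twice as likely to be chosen as a value seen in one.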
7. The method of any one of claims 2 to 6, wherein selecting the designated feature types and designated variables specifically comprises:
determining the fitness of each candidate sample according to the fitness formula of the genetic algorithm [formula image not reproduced];
selecting a specified number of candidate samples in descending order of fitness;
taking the feature types corresponding to the selected candidate samples as the designated feature types, and taking the variables of each feature type in the selected candidate samples as the designated variables;
wherein the candidate samples include at least one kind of sample among the learning samples, the replicated samples, the crossover samples, and the mutated samples; fitness_j denotes the fitness of the j-th candidate sample; fitness_ji denotes the fitness of the variable of the i-th feature type in the j-th candidate sample; N_ip denotes the number of times the variable of the i-th feature type occurs in the positive samples; N_in denotes the number of times the variable of the i-th feature type occurs in the negative samples; a positive sample is a candidate sample whose risk control result is risky, and a negative sample is a candidate sample whose risk control result is not risky.
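The ranking and selection in claim 7 can be sketched as below. The patent's fitness formula is not available in this text, so `placeholder_fitness` is a stand-in that merely sums N_ip minus N_in over a sample's variables; it is an assumption for illustration and not the patent's formula. All function names are hypothetical.

```python
def occurrence_counts(ftype, value, positives, negatives):
    """Return (N_ip, N_in): how often the variable occurs in the positive
    and in the negative samples."""
    n_ip = sum(1 for s in positives if s.get(ftype) == value)
    n_in = sum(1 for s in negatives if s.get(ftype) == value)
    return n_ip, n_in


def placeholder_fitness(sample, positives, negatives):
    # Placeholder only: NOT the patent's fitness formula.
    total = 0
    for ftype, value in sample.items():
        n_ip, n_in = occurrence_counts(ftype, value, positives, negatives)
        total += n_ip - n_in
    return total


def select_designated(candidates, fitness, k):
    """Keep the k fittest candidates and collect their feature types and variables."""
    chosen = sorted(candidates, key=fitness, reverse=True)[:k]
    designated = {}
    for sample in chosen:
        for ftype, value in sample.items():
            designated.setdefault(ftype, set()).add(value)
    return chosen, designated


positives = [{"device": "new", "amount": "high"}, {"device": "new"}]
negatives = [{"device": "old", "amount": "high"}]
chosen, designated = select_designated(
    positives + negatives,
    lambda s: placeholder_fitness(s, positives, negatives),
    k=2,
)
```

The keys of `designated` play the role of the designated feature types, and the value sets play the role of the designated variables.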
8. The method of claim 1, wherein, after the risk control rules are generated, the method further comprises:
determining the recognition accuracy of each risk control rule;
adjusting the feature types contained in each risk control rule according to the recognition accuracy of that rule;
re-determining the recognition accuracy of each adjusted risk control rule;
selecting at least one risk control rule according to the recognition accuracy of each risk control rule after adjustment and the recognition accuracy of each risk control rule before adjustment.
9. The method of claim 8, wherein determining the recognition accuracy of each risk control rule specifically comprises:
determining, according to historical data, detection samples that are different from the learning samples;
determining the recognition accuracy of each risk control rule according to the recognition results of that rule on the detection samples.
10. The method of claim 9, wherein adjusting the feature types contained in each risk control rule according to the recognition accuracy of that rule specifically comprises:
for each risk control rule, determining, according to the recognition accuracy of the rule and from among the feature types that constitute the rule, a feature type whose deletion improves the recognition accuracy of the rule;
deleting the determined feature type from the feature types that constitute the rule.
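The accuracy check and pruning of claims 8 to 10 can be sketched as follows. The sketch assumes a rule is a conjunction of feature type and value conditions that flags a sample as risky when all conditions match, and that detection samples are held-out pairs of (sample, is_risky). These representations and the names `matches`, `accuracy`, and `prune` are hypothetical.

```python
def matches(rule, sample):
    """A rule fires when every (feature type, value) condition holds."""
    return all(sample.get(f) == v for f, v in rule.items())


def accuracy(rule, detection):
    """Share of labelled detection samples that the rule classifies correctly."""
    correct = sum(1 for sample, risky in detection if matches(rule, sample) == risky)
    return correct / len(detection)


def prune(rule, detection):
    """Greedily delete feature types whose removal raises recognition accuracy."""
    best, best_acc = dict(rule), accuracy(rule, detection)
    for ftype in list(rule):
        if len(best) == 1:
            break  # keep at least one condition in the rule
        trial = {f: v for f, v in best.items() if f != ftype}
        trial_acc = accuracy(trial, detection)
        if trial_acc > best_acc:
            best, best_acc = trial, trial_acc
    return best, best_acc


rule = {"device": "new", "city": "A"}
detection = [
    ({"device": "new", "city": "A"}, True),
    ({"device": "new", "city": "B"}, True),
    ({"device": "old", "city": "A"}, False),
    ({"device": "old", "city": "B"}, False),
]
pruned, pruned_acc = prune(rule, detection)
```

Because a deletion is kept only when it strictly improves accuracy, the returned rule is never worse on the detection samples than the rule before adjustment.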
11. A device for risk control rule mining, comprising:
a determining module that, for each preset feature type, determines the feature value of each learning sample corresponding to that feature type, as a variable of that feature type;
a selecting module that selects, by a genetic algorithm, at least some feature types from the feature types as designated feature types, and, for each feature type, selects at least some variables from the variables of that feature type as designated variables of that feature type;
a generating module that generates risk control rules by a first-order rule learning algorithm according to the learning samples, the selected designated feature types, and the designated variables of the designated feature types.
12. The device of claim 11, wherein the selecting module selects, according to a replication probability of the genetic algorithm, learning samples from the learning samples and replicates them to obtain replicated samples, and selects the designated feature types and designated variables according to the variables of each feature type in the learning samples and the variables of each feature type in the replicated samples.
13. The device of claim 11, wherein the selecting module determines, according to a crossover probability of the genetic algorithm, several pairs of learning samples from the learning samples; for each pair of learning samples, swaps the variables of the same feature type in the pair, and/or merges the variables of the same feature type in the pair, to obtain crossover samples; and selects the designated feature types and designated variables according to the variables of each feature type in the learning samples and the variables of each feature type in the crossover samples.
14. The device of claim 13, wherein the types of the variables include a first type and a second type, and the selecting module swaps the variables of the same feature type whose variables in the pair of learning samples are of the first type, and merges the variables of the same feature type whose variables in the pair of learning samples are of the second type.
15. The device of claim 11, wherein the selecting module selects, according to a mutation probability of the genetic algorithm, learning samples to be mutated from the learning samples; for each learning sample selected for mutation, adjusts whether the variable of at least one feature type in the learning sample is present in the learning sample, to obtain mutated samples; and selects the designated feature types and designated variables according to the variables of each feature type in the learning samples and the variables of each feature type in the mutated samples.
16. The device of claim 15, wherein the selecting module, for each feature type, when the variable of that feature type is adjusted from absent from the learning sample to present in the learning sample, selects one variable according to the frequency of occurrence of each variable of that feature type, as the variable of that feature type in the adjusted learning sample.
17. The device of any one of claims 12 to 16, wherein the selecting module determines the fitness of each candidate sample according to the fitness formula of the genetic algorithm [formula image not reproduced], selects a specified number of candidate samples in descending order of fitness, takes the feature types corresponding to the selected candidate samples as the designated feature types, and takes the variables of each feature type in the selected candidate samples as the designated variables;
wherein the candidate samples include at least one kind of sample among the learning samples, the replicated samples, the crossover samples, and the mutated samples; fitness_j denotes the fitness of the j-th candidate sample; fitness_ji denotes the fitness of the variable of the i-th feature type in the j-th candidate sample; N_ip denotes the number of times the variable of the i-th feature type occurs in the positive samples; N_in denotes the number of times the variable of the i-th feature type occurs in the negative samples; a positive sample is a candidate sample whose risk control result is risky, and a negative sample is a candidate sample whose risk control result is not risky.
18. The device of claim 11, further comprising:
a rule pruning module that determines the recognition accuracy of each risk control rule, adjusts the feature types contained in each risk control rule according to the recognition accuracy of that rule, re-determines the recognition accuracy of each adjusted risk control rule, and selects at least one risk control rule according to the recognition accuracy of each risk control rule after adjustment and the recognition accuracy of each risk control rule before adjustment.
19. The device of claim 18, wherein the rule pruning module determines, according to historical data, detection samples that are different from the learning samples, and determines the recognition accuracy of each risk control rule according to the recognition results of that rule on the detection samples.
20. The device of claim 19, wherein the rule pruning module, for each risk control rule, determines, according to the recognition accuracy of the rule and from among the feature types that constitute the rule, a feature type whose deletion improves the recognition accuracy of the rule, and deletes the determined feature type from the feature types that constitute the rule.
21. A server, comprising one or more processors and a memory, wherein the memory stores a program configured to be executed by the one or more processors to perform the following steps:
for each preset feature type, determining the feature value of each learning sample corresponding to that feature type, as a variable of that feature type;
selecting, by a genetic algorithm, at least some feature types from the feature types as designated feature types, and, for each feature type, selecting at least some variables from the variables of that feature type as designated variables of that feature type;
generating risk control rules by a first-order rule learning algorithm according to the learning samples, the selected designated feature types, and the designated variables of the designated feature types.
CN201810053792.7A 2018-01-19 2018-01-19 Method and device for mining wind control rule Active CN108346098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810053792.7A CN108346098B (en) 2018-01-19 2018-01-19 Method and device for mining wind control rule

Publications (2)

Publication Number Publication Date
CN108346098A true CN108346098A (en) 2018-07-31
CN108346098B CN108346098B (en) 2022-05-31

Family

ID=62960526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810053792.7A Active CN108346098B (en) 2018-01-19 2018-01-19 Method and device for mining wind control rule

Country Status (1)

Country Link
CN (1) CN108346098B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102054039A (en) * 2010-12-30 2011-05-11 长安大学 Fitness scaling method for improving overall search capability of genetic algorithm
CN106445821A (en) * 2016-09-23 2017-02-22 郑州云海信息技术有限公司 Method for automatically generating test case based on genetic algorithm
CN106952162A (en) * 2016-01-07 2017-07-14 平安科技(深圳)有限公司 Money laundering risks rating calculation method and system
CN107391569A (en) * 2017-06-16 2017-11-24 阿里巴巴集团控股有限公司 Identification, model training, Risk Identification Method, device and the equipment of data type
CN107424069A (en) * 2017-08-17 2017-12-01 阿里巴巴集团控股有限公司 A kind of generation method of air control feature, risk monitoring and control method and apparatus
CN107491988A (en) * 2017-08-09 2017-12-19 浙江工商大学 A kind of wisdom retail data method for digging based on genetic algorithm and improvement interest-degree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
漠北墨杯: ""周志华机器学习读后总结第14、15、16章"", 《CSDN》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670835A (en) * 2018-09-25 2019-04-23 深圳壹账通智能科技有限公司 Air control method, apparatus, equipment and readable storage medium storing program for executing based on service node
CN109840838A (en) * 2018-12-26 2019-06-04 天翼电子商务有限公司 Air control rule model system with double engines, control method and server
CN109840838B (en) * 2018-12-26 2021-08-31 天翼数智科技(北京)有限公司 Wind control rule model dual-engine system, control method and server
CN110135701A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 Control automatic generation method, device, electronic equipment and the readable medium of rule
WO2021196843A1 (en) * 2020-03-31 2021-10-07 支付宝(杭州)信息技术有限公司 Derived variable selection method and apparatus for risk identification model
CN111967600A (en) * 2020-08-18 2020-11-20 北京睿知图远科技有限公司 Feature derivation system and method based on genetic algorithm in wind control scene

Also Published As

Publication number Publication date
CN108346098B (en) 2022-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant