Disclosure of Invention
The specification provides a risk control method and a risk control device that can quickly and effectively provide transaction risk control services for countries and regions where an online transaction service has been live for only a short time and has accumulated few cases.
The application discloses a risk control method, comprising the following steps:
respectively acquiring sample data of a source region and a target region;
training a source region transaction model based on sample data of the source region;
generating a target region transaction model according to the structure and parameters of the source region transaction model, expanding each leaf node of each decision tree contained in the target region transaction model into a sub-decision tree according to sample data of the target region, and, for each internal node of each decision tree, pruning the internal node if a first empirical error of the internal node serving as a root node is larger than a second empirical error of the internal node serving as a leaf node, so as to adjust the target region transaction model;
and carrying out risk control on the transaction of the target area according to the adjusted transaction model of the target area.
In a preferred embodiment, expanding each leaf node of each decision tree contained in the target region transaction model into a sub-decision tree according to the sample data of the target region, and, for each internal node of each decision tree, pruning the internal node to adjust the target region transaction model if the first empirical error of the internal node as a root node is greater than the second empirical error of the internal node as a leaf node, includes:
determining a sample data set of the target region corresponding to each leaf node of each decision tree contained in the target region transaction model;
training a sub-decision tree for each leaf node of each decision tree based on the sample data set of the target area corresponding to each leaf node, and expanding the leaf node into the sub-decision tree;
and, for each internal node of each decision tree in the target region transaction model and the sample data set of the target region corresponding to the internal node, calculating a first empirical error of the internal node as a root node and a second empirical error of the internal node as a leaf node, and pruning the internal node into a leaf node if the first empirical error is larger than the second empirical error.
In a preferred embodiment, the transaction is an online payment service.
In a preferred embodiment, the source regional transaction model and the target regional transaction model are random forest models.
In a preferred embodiment, generating the target region transaction model according to the structure and parameters of the source region transaction model includes: copying the structure and parameters of the random forest model of the source region to the random forest model of the target region.
In a preferred embodiment, for each internal node of each decision tree in the target region transaction model and the sample data set of the target region corresponding to that internal node, the first empirical error of the internal node as a root node and the second empirical error of the internal node as a leaf node are calculated in bottom-up order.
In a preferred embodiment, in the step of calculating the first empirical error as a root node and the second empirical error as a leaf node for the internal node, the first and second empirical errors are represented using log loss or cross entropy.
The application also discloses a risk control device, comprising:
the acquisition module is used for respectively acquiring sample data of a source region and a target region;
the training module is used for training a source region transaction model based on the sample data of the source region;
the adjustment module is used for generating a target region transaction model according to the structure and parameters of the source region transaction model, expanding each leaf node of each decision tree contained in the target region transaction model into a sub-decision tree according to sample data of the target region, and, for each internal node of each decision tree, pruning the internal node if the first empirical error of the internal node serving as a root node is larger than the second empirical error of the internal node serving as a leaf node, so as to adjust the target region transaction model;
and the risk control module is used for performing risk control on transactions in the target region according to the adjusted target region transaction model.
In a preferred embodiment, the adjustment module comprises the following sub-modules:
the sample data set sub-module is used for determining the sample data set of the target region corresponding to each leaf node of each decision tree contained in the target region transaction model;
the sub-decision tree sub-module is used for training, for each leaf node of each decision tree, a sub-decision tree based on the sample data set of the target region corresponding to that leaf node, and expanding the leaf node into the sub-decision tree;
and the calculation sub-module is used for calculating, for each internal node of each decision tree in the target region transaction model and the sample data set of the target region corresponding to the internal node, a first empirical error of the internal node as a root node and a second empirical error of the internal node as a leaf node, and pruning the internal node into a leaf node if the first empirical error is larger than the second empirical error.
In a preferred embodiment, the transaction is an online payment service.
In a preferred embodiment, the source regional transaction model and the target regional transaction model are random forest models.
In a preferred embodiment, the adjustment module is further used for copying the structure and parameters of the random forest model of the source region to the random forest model of the target region.
In a preferred embodiment, the calculation sub-module calculates, in bottom-up order, for each internal node of each decision tree in the target region transaction model and the sample data set of the target region corresponding to the internal node, a first empirical error of the internal node as a root node and a second empirical error of the internal node as a leaf node.
In a preferred embodiment, the first and second empirical errors are represented using log loss or cross entropy.
The application also discloses a risk control apparatus, comprising:
a memory for storing computer executable instructions; and
a processor for implementing steps in a method as described hereinbefore when executing said computer executable instructions.
The application also discloses a computer readable storage medium having stored therein computer executable instructions which when executed by a processor implement the steps in the method as described above.
According to the embodiments of the specification, the data resources of countries and regions where the online payment service has been live for a long time and has accumulated many cases can be utilized, and, even where transaction sample data may not be transmitted directly, transaction risk control services can be provided quickly and effectively for countries and regions where the online payment service has been live for only a short time and has accumulated few cases.
Further, compared with a scheme that directly uses the risk control model of the source region, the risk control method of the embodiments of the specification adapts better to the data distribution of the target region. Compared with a scheme that builds the risk control model only from the sample data of the target region, the risk control effect is better because the sample data and empirical knowledge of the source region are utilized. Compared with a sample migration method, no sample data needs to be transferred between the source region and the target region, so the data can be effectively isolated.
This specification describes a number of technical features that are distributed across the various technical solutions; listing every possible combination of these technical features (i.e. every technical solution) would make the specification excessively long. To avoid this problem, the technical features disclosed in the above summary of the invention, the technical features disclosed in the following embodiments and examples, and the technical features disclosed in the drawings may be freely combined with each other to constitute various new technical solutions (which should be regarded as having been described in the present specification), unless such a combination of technical features is technically impossible. For example, suppose one example discloses features A+B+C and another example discloses features A+B+D+E, where features C and D are equivalent technical means that perform the same function and can technically only be used as alternatives, not simultaneously, while feature E can technically be combined with feature C. Then the solution A+B+C+D should not be considered as already described, because it is technically infeasible, whereas the solution A+B+C+E should be considered as already described.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, it will be understood by those skilled in the art that the claimed invention may be practiced without these specific details and with various changes and modifications from the embodiments that follow.
Embodiments of the present specification will be described in further detail below with reference to the accompanying drawings.
First, a specific application scenario of one embodiment of the present application is described.
In this scenario, risk control is required in the target region for a certain type of transaction, for example an online payment service. Because the service has only just gone online in the target region, the number of cases and the amount of sample data there are small. In the source region, the service has been online for a long time, so there are sufficient cases and sample data, and effective risk control services are available. In this embodiment, the source region may be, for example, China, and the target region may be, for example, Indonesia.
As shown in fig. 1, the risk control method of the present embodiment includes the following steps:
step 110: respectively acquiring sample data of a source region and a target region;
step 120: training a source region transaction model based on sample data of the source region;
step 130: generating a target region transaction model according to the structure and parameters of the source region transaction model, expanding each leaf node of each decision tree contained in the target region transaction model into a sub-decision tree according to sample data of the target region, and, for each internal node of each decision tree, pruning the internal node if the first empirical error of the internal node serving as a root node is larger than the second empirical error of the internal node serving as a leaf node, so as to adjust the target region transaction model;
step 140: and performing risk control on the transaction of the target area according to the adjusted transaction model of the target area.
Each step is explained in detail below.
For step 110:
Specifically, sample data of the online payment service in the source region and the target region, that is, historical transaction data, is first obtained, for example data related to user login, transactions, registration, verification results and the like. Each historical transaction is labeled with a case label: a transaction reported by a user as a case is marked as black (i.e., positive), and a transaction that is not reported is marked as non-black (i.e., negative).
For example, taking China as the source region and Indonesia as the target region, historical transaction data of the online payment service in China and in Indonesia are acquired respectively. For each historical transaction, if there is a user report, for example the user reports that a card was stolen by fraudsters and a transaction not made by the cardholder occurred, the historical transaction is marked as black (i.e., positive); if there is no user report, the historical transaction is marked as non-black (i.e., negative).
Furthermore, feature variables are designed based on the case characteristics of the source region and the target region, and suitable feature variables are selected by means such as IV (information value) screening. A feature variable is data obtained by processing the sample data, for example the cumulative number of registrations in one day at the user dimension.
The IV value is a measure of the importance of a feature variable in the model. For example, the user may set a threshold, which may be an empirical value. When the IV value of a feature variable is greater than the threshold, the feature variable is considered important and is determined to be a valid feature variable; conversely, when the IV value is not greater than the threshold, the feature variable is considered less important and is not determined to be a valid feature variable.
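The threshold-based screening described above can be sketched as follows. This is only an illustration: the specification does not give a formula for the IV value, so the sketch assumes the commonly used weight-of-evidence definition over binned counts, and the function names, the smoothing of empty bins, and the example threshold are all hypothetical.

```python
import math

def information_value(bins):
    """Compute IV for one feature.

    bins: list of (n_negative, n_positive) sample counts, one pair per bin.
    Assumes the usual definition IV = sum((p_neg - p_pos) * ln(p_neg / p_pos)).
    """
    total_neg = sum(neg for neg, _ in bins)
    total_pos = sum(pos for _, pos in bins)
    iv = 0.0
    for neg, pos in bins:
        # Assumption: treat an empty bin as half a sample so the logarithm is defined.
        p_neg = max(neg, 0.5) / total_neg
        p_pos = max(pos, 0.5) / total_pos
        iv += (p_neg - p_pos) * math.log(p_neg / p_pos)
    return iv

def select_features(feature_bins, threshold):
    """Keep only the features whose IV is strictly greater than the threshold."""
    return [name for name, bins in feature_bins.items()
            if information_value(bins) > threshold]

# Hypothetical binned counts for two candidate feature variables.
candidates = {
    "uninformative_feature": [(50, 5), (50, 5)],
    "registrations_per_day": [(90, 1), (10, 9)],
}
valid = select_features(candidates, threshold=0.1)
```

A feature whose positive and negative samples are distributed identically across the bins has an IV of zero and is dropped, whichever threshold is chosen.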
For step 120:
In this embodiment, the source region transaction model is a random forest model.
A random forest model is a classifier that contains a plurality of decision trees, where each decision tree is a basic classifier that generally assigns a sample to one of two classes according to its features. A constructed decision tree has a tree structure and can be regarded as a set of if-then rules.
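To make the "set of if-then rules" view concrete, the following minimal sketch writes three tiny decision trees as plain functions and combines them into a forest. The feature names, the thresholds, and the use of majority voting to combine the trees are assumptions made for illustration and are not taken from the specification.

```python
def tree_a(tx):
    # Each tree is a nested set of if-then rules ending in a class (1 = black, 0 = non-black).
    if tx["amount"] <= 100.0:
        return 0
    if tx["logins_per_day"] <= 5:
        return 0
    return 1

def tree_b(tx):
    if tx["new_device"] and tx["amount"] > 50.0:
        return 1
    return 0

def tree_c(tx):
    return 1 if tx["amount"] > 500.0 else 0

def forest_predict(trees, tx):
    """Classify a transaction by majority vote over the trees (assumed aggregation rule)."""
    votes = [tree(tx) for tree in trees]
    return 1 if 2 * sum(votes) > len(votes) else 0

forest = [tree_a, tree_b, tree_c]
risky = forest_predict(forest, {"amount": 800.0, "logins_per_day": 2, "new_device": True})
safe = forest_predict(forest, {"amount": 20.0, "logins_per_day": 9, "new_device": True})
```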
In this embodiment, the source region transaction model is trained based on sample data of the online payment service in China.
For step 130:
The target region transaction model is also a random forest model.
The parameters of the source region transaction model refer to the structure of each decision tree in the random forest model and the classification value of each node of the decision tree.
Generating the target region transaction model according to the structure and parameters of the source region transaction model includes: copying the structure and parameters of the random forest model of the source region to the random forest model of the target region.
By copying the structure and parameters of the random forest model of the source region to the random forest model of the target region, only abstract data of the transactions of the source region, i.e. only the structure and parameters of the random forest model of the source region, may be used without using specific sample data of the source region.
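As a rough illustration of transferring only the model and not the samples, the sketch below represents a forest as nested dictionaries and passes it between regions as a JSON string. The dictionary layout, the feature names, and the helper names `export_model` and `import_model` are hypothetical; the specification does not prescribe any serialization format.

```python
import json

# Hypothetical source-region forest: one tree, stored as split rules and leaf values only.
source_forest = [
    {
        "feature": "amount", "threshold": 100.0,
        "left": {"value": 0.02},
        "right": {
            "feature": "logins_per_day", "threshold": 5.0,
            "left": {"value": 0.10},
            "right": {"value": 0.60},
        },
    },
]

def export_model(forest):
    """Serialize the tree structure and node parameters; no sample data is included."""
    return json.dumps(forest)

def import_model(payload):
    """Rebuild an independent copy of the forest on the target-region side."""
    return json.loads(payload)

target_forest = import_model(export_model(source_forest))
```

The copy on the target side is a separate object, so later expansion and pruning do not modify the source-region model.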
As shown in fig. 2, the step of adjusting the transaction model of the target area according to the sample data of the target area may be implemented in the following specific manner:
Step 1302: determining, from the sample data of the target region, the sample data set of the target region corresponding to each leaf node of each decision tree contained in the target region transaction model.
Specifically, when a sample is classified by the random forest model, it finally falls into one leaf node of each decision tree of the model. Each decision tree in the transaction model of the source region is therefore used to classify all sample data of the target region, and the data falling into each leaf node of the decision tree is collected; these data sets are the sample data sets of the target region corresponding to the respective leaf nodes.
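The routing of target-region samples to leaf nodes can be sketched as below. The `Node` class, the convention that samples with a feature value less than or equal to the threshold go left, and the feature name are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None     # split feature; None for a leaf
    threshold: float = 0.0            # samples with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0                # classification value stored at the node

    def is_leaf(self):
        return self.left is None and self.right is None

def find_leaf(node, sample):
    """Follow the split rules from the root until a leaf is reached."""
    while not node.is_leaf():
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node

def samples_per_leaf(tree, samples):
    """Group samples by the leaf they fall into; keys are id(leaf)."""
    groups = {}
    for sample in samples:
        leaf = find_leaf(tree, sample)
        groups.setdefault(id(leaf), (leaf, []))[1].append(sample)
    return groups

tree = Node("amount", 100.0, Node(value=0.1), Node(value=0.7))
target_samples = [{"amount": 50.0}, {"amount": 150.0}, {"amount": 70.0}]
groups = samples_per_leaf(tree, target_samples)
```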
Step 1304: for each leaf node of each decision tree, training a sub-decision tree based on the sample data set of the target region corresponding to that leaf node, and expanding the leaf node into the sub-decision tree.
The target region transaction model is copied from the source region transaction model, and through this step, sample data of the target region is essentially fused to the source region transaction model.
Specifically, this step may loop over all leaf nodes of each decision tree contained in the target region transaction model and, for each leaf node, train one sub-decision tree based on the sample data set of the target region corresponding to that leaf node, thereby expanding each leaf node into a sub-decision tree.
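A minimal sketch of expanding one leaf is given below. For brevity the sub-decision tree is limited to a single split chosen by Gini impurity; this learner, the `Node` class, and the function names are stand-ins chosen for illustration, and a real implementation would presumably use a full decision tree learner.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0

def positive_rate(labels):
    return sum(labels) / len(labels)

def gini(labels):
    p = positive_rate(labels)
    return 2.0 * p * (1.0 - p)

def train_stump(samples, labels, features):
    """Train a depth-1 sub-tree on the samples of one leaf (illustrative learner)."""
    best = None
    for f in features:
        # Skip the largest value so the right branch is never empty.
        for t in sorted({s[f] for s in samples})[:-1]:
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None or best[0] >= gini(labels):
        return None  # no split makes the leaf purer
    _, f, t, left, right = best
    return Node(f, t, Node(value=positive_rate(left)), Node(value=positive_rate(right)))

def expand_leaf(leaf, samples, labels, features):
    """Turn the leaf into the root of a sub-tree trained on its target-region samples."""
    sub = train_stump(samples, labels, features) if samples else None
    if sub is not None:
        leaf.feature, leaf.threshold = sub.feature, sub.threshold
        leaf.left, leaf.right = sub.left, sub.right
    return leaf

leaf = Node(value=0.3)
leaf_samples = [{"x": 1}, {"x": 2}, {"x": 8}, {"x": 9}]
expand_leaf(leaf, leaf_samples, [0, 0, 1, 1], ["x"])
```

Leaves that receive no target-region samples, or whose samples cannot be split usefully, are left unchanged in this sketch.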
Step 1306: for each internal node of each decision tree contained in the target region transaction model and the sample data set of the target region corresponding to the internal node, calculating a first empirical error of the internal node as a root node and a second empirical error of the internal node as a leaf node.
The internal nodes refer to all internal nodes of the decision tree after the expansion step, that is, all non-leaf nodes of the decision tree, including the root node. The empirical error of the internal node as a root node is also referred to as the subtree error, and the empirical error of the internal node as a leaf node is also referred to as the leaf error.
The nodes of each decision tree are evaluated in bottom-up order, so for each internal node of each decision tree contained in the target region transaction model, the first empirical error of the internal node as a root node and the second empirical error of the internal node as a leaf node are calculated from the bottom up.
In the step of calculating the first empirical error of the internal node as a root node and the second empirical error of the internal node as a leaf node, the empirical errors may be represented using log loss or cross entropy. For example, for a classification problem, the first empirical error is represented using log loss. The specific calculation of the first empirical error of the internal node as a root node and of the second empirical error of the internal node as a leaf node is common general knowledge in the art and is not described in detail here.
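For a binary classification problem, one common way to write the two errors with log loss is shown below. The symbols are introduced here for illustration only and do not appear in the specification.

```latex
% S_v       : target-region samples that reach internal node v, with labels y_i in {0, 1}
% p_i       : probability predicted for sample i by the leaf it reaches in the subtree rooted at v
% \bar{p}_v : single probability predicted for every sample in S_v if v is collapsed into a leaf
\[
E_{\text{subtree}}(v) = -\frac{1}{|S_v|} \sum_{i \in S_v}
  \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]
\]
\[
E_{\text{leaf}}(v) = -\frac{1}{|S_v|} \sum_{i \in S_v}
  \bigl[\, y_i \log \bar{p}_v + (1 - y_i) \log (1 - \bar{p}_v) \,\bigr]
\]
```

Under this notation, the subtree error plays the role of the first empirical error and the leaf error plays the role of the second; node v is pruned when the subtree error exceeds the leaf error.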
Step 1308: comparing the first empirical error with the second empirical error, and pruning the internal node into a leaf node if the first empirical error is greater than the second empirical error.
The internal node is pruned into a leaf node when the first empirical error of the internal node as a root node is greater than the second empirical error of the internal node as a leaf node in order to avoid overfitting the model to the target region data set.
Steps 1302 to 1308 are performed for each decision tree contained in the target region transaction model, so as to adjust each decision tree. All the adjusted decision trees are then combined to obtain the adjusted target region transaction model.
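The bottom-up comparison and pruning of steps 1306 and 1308 can be sketched as follows, using log loss as the empirical error. Several details are assumptions made for this illustration: the `Node` class, the value assigned to a collapsed node (the positive rate of the target-region samples reaching it), and leaving untouched any node that no target-region sample reaches.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.5  # predicted probability of the positive (black) class

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, sample):
    while not node.is_leaf():
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.value

def log_loss(labels, probs, eps=1e-6):
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithm finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def prune(node, samples, labels):
    """Post-order traversal: children are handled before their parent (bottom-up)."""
    if node.is_leaf() or not samples:
        return node
    pairs = list(zip(samples, labels))
    left = [(s, y) for s, y in pairs if s[node.feature] <= node.threshold]
    right = [(s, y) for s, y in pairs if s[node.feature] > node.threshold]
    prune(node.left, [s for s, _ in left], [y for _, y in left])
    prune(node.right, [s for s, _ in right], [y for _, y in right])
    # First empirical error: the node acts as the root of its subtree.
    subtree_error = log_loss(labels, [predict(node, s) for s in samples])
    # Second empirical error: the node acts as a single leaf.
    leaf_value = sum(labels) / len(labels)
    leaf_error = log_loss(labels, [leaf_value] * len(labels))
    if subtree_error > leaf_error:
        node.feature, node.left, node.right = None, None, None
        node.value = leaf_value
    return node

data = [{"x": 1}, {"x": 2}, {"x": 8}, {"x": 9}]
labels = [0, 0, 1, 1]
# A subtree whose leaf values contradict the target-region labels is collapsed.
misfit = prune(Node("x", 5.0, Node(value=0.9), Node(value=0.1)), data, labels)
# A subtree that fits the target-region labels is kept.
fitting = prune(Node("x", 5.0, Node(value=0.1), Node(value=0.9)), data, labels)
```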
For step 140:
For example, the adjusted target region transaction model may be deployed online to score the current transaction in the target region. The model score of the current transaction is compared with a threshold. If the score is lower than the threshold, the current transaction is passed. Otherwise, the user is further verified to determine whether the current transaction is made by the account holder: if so, the current transaction is passed; if not, the verification fails and the current transaction fails.
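The decision flow of this example can be sketched as a small function. The function name, the return values, and the `verify_user` callback are hypothetical; how the score is produced and how the user is verified are outside the scope of the sketch.

```python
def decide(score, threshold, verify_user):
    """Return "pass" or "fail" for one transaction.

    score:       model score of the current transaction
    threshold:   scores below this value are passed directly
    verify_user: callable returning True if the user proves the transaction is their own
    """
    if score < threshold:
        return "pass"
    # Score at or above the threshold: require additional user verification.
    return "pass" if verify_user() else "fail"

low_risk = decide(0.1, 0.5, verify_user=lambda: False)
verified = decide(0.9, 0.5, verify_user=lambda: True)
rejected = decide(0.9, 0.5, verify_user=lambda: False)
```

Passing the verification step as a callback means it is only invoked for transactions whose score reaches the threshold.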
A second embodiment of the present specification relates to a risk control device having the structure shown in fig. 3. The risk control device includes an acquisition module, a training module, an adjustment module and a risk control module.
The acquisition module is used for respectively acquiring the sample data of the source region and the target region.
The training module is used for training a source region transaction model based on the sample data of the source region.
The adjustment module is used for generating a target region transaction model according to the structure and the parameters of the source region transaction model and adjusting the target region transaction model according to the sample data of the target region.
Optionally, in one embodiment, the adjustment module includes the following sub-modules:
a sample data set sub-module, used for determining the sample data set of the target region corresponding to each leaf node of each decision tree contained in the target region transaction model;
a sub-decision tree sub-module, used for training, for each leaf node of each decision tree, a sub-decision tree based on the sample data set of the target region corresponding to that leaf node, and expanding the leaf node into the sub-decision tree;
and a calculation sub-module, used for calculating, in bottom-up order, for each internal node of each decision tree in the target region transaction model and the sample data set of the target region corresponding to the internal node, a first empirical error of the internal node as a root node and a second empirical error of the internal node as a leaf node, and pruning the internal node into a leaf node if the first empirical error is larger than the second empirical error.
Optionally, in one embodiment, the transaction is an online payment service.
Optionally, in one embodiment, the source region transaction model and the target region transaction model are random forest models.
Optionally, in one embodiment, the first and second empirical errors are represented using log loss or cross entropy.
The risk control module is used for performing risk control on transactions in the target region according to the adjusted target region transaction model.
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment can be applied to the present embodiment, and the technical details in the present embodiment can also be applied to the first embodiment.
The technical carriers involved in payment in the embodiments of the present disclosure may include, for example, near field communication (Near Field Communication, NFC), WIFI, 3G/4G/5G, POS machine card swiping technology, two-dimensional code scanning technology, bar code scanning technology, bluetooth, infrared, short message (Short Message Service, SMS), multimedia message (Multimedia Message Service, MMS), and the like.
It should be noted that, as will be understood by those skilled in the art, the functions implemented by the modules shown in the embodiments of the risk control device described above may be understood by referring to the description of the risk control method described above. The functions of the modules shown in the embodiments of the risk control device described above may be implemented by a program (executable instructions) running on a processor, or by specific logic circuits. If the risk control device according to the embodiments of the present specification is implemented in the form of a software functional module and sold or used as a separate product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present specification, in essence or in the part contributing over the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the embodiments of the present specification. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or other media capable of storing program code. Thus, embodiments of the present specification are not limited to any specific combination of hardware and software.
Accordingly, the present specification also provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method embodiments of the present specification. Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable storage media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
In addition, the embodiments of the present specification also provide a risk control apparatus, which includes a memory for storing computer-executable instructions, and a processor; the processor is configured to implement the steps of the method embodiments described above when executing the computer-executable instructions in the memory. The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The aforementioned memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, a solid state disk, or the like. The steps of the method disclosed in the embodiments of the present invention may be directly embodied in a hardware processor for execution, or may be executed by a combination of hardware and software modules in the processor.
It should be noted that in the present patent application, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. In the present patent application, if it is mentioned that an action is performed according to an element, it means that the action is performed at least according to the element, which covers two cases: the action is performed solely according to the element, and the action is performed according to the element and other elements. Expressions such as "multiple", "multiple times" and "multiple kinds" include two, twice and two kinds, as well as more than two, more than twice and more than two kinds.
All references mentioned in this specification are to be considered as being included in the disclosure of this specification in their entirety so as to be applicable as a basis for modification when necessary. Furthermore, it should be understood that the foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present disclosure, is intended to be included within the scope of one or more embodiments of the present disclosure.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.