CN116128068A - Training method and device for anti-money laundering model and electronic equipment - Google Patents

Training method and device for anti-money laundering model and electronic equipment

Info

Publication number
CN116128068A
Authority
CN
China
Prior art keywords
transaction
marked
model
unlabeled
samples
Prior art date
Legal status
Pending
Application number
CN202211475923.3A
Other languages
Chinese (zh)
Inventor
刘正夫
武润鹏
伍思恒
周振华
张孝丹
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202211475923.3A priority Critical patent/CN116128068A/en
Publication of CN116128068A publication Critical patent/CN116128068A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a training method and device for an anti-money laundering model, and electronic equipment. The training method comprises the following steps: acquiring an initial marked transaction sample set; training a machine learning model based on the initial marked transaction sample set according to a pre-selected machine learning algorithm to obtain a prediction model; obtaining an unlabeled transaction sample set, performing prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to the prediction result; obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and performing iterative update training on the prediction model according to the at least one new marked transaction sample; and judging whether the prediction model meets a preset termination condition, if so, obtaining an anti-money laundering model based on the prediction model, otherwise, returning to the step of acquiring the unlabeled transaction sample set.

Description

Training method and device for anti-money laundering model and electronic equipment
This application is a divisional application of the patent application filed on September 5, 2019, with application number 201910839229.7, entitled "Training method and device for anti-money laundering model, and electronic equipment".
Technical Field
The present invention relates to the technical field of machine learning models, and more particularly, to a training method of an anti-money laundering model, a training device of an anti-money laundering model, an electronic device, and a readable storage medium.
Background
Money laundering refers to activities that disguise or conceal the source and nature of the proceeds and gains derived from crimes such as drug crimes, organized crimes of a mafia nature, terrorist crimes, smuggling crimes, corruption and bribery crimes, and crimes that disrupt the order of financial management. Anti-money laundering is a common risk control scenario: it is a systematic engineering effort that identifies possible money laundering activities through various means and strikes at money laundering and the related criminal activities.
With the development of machine learning and data mining technologies, machine learning models are commonly used in the prior art to identify money laundering behaviors, and in recent years they have achieved considerable success in anti-money laundering scenarios. However, this approach requires that samples be handed to experts for marking, and training an anti-money laundering machine learning model requires a large number of samples, so marking the samples consumes a great deal of labor and increases labor costs. In addition, redundant samples introduced into the training set are not conducive to training the anti-money laundering model.
Disclosure of Invention
The invention provides a new technical solution for training an anti-money laundering model.
According to a first aspect of the present invention, there is provided a training method of an anti-money laundering model, comprising:
obtaining an initial set of marked transaction samples, wherein each sample in the set of marked samples has been marked as either a money laundering transaction or a non-money laundering transaction;
according to a pre-selected machine learning algorithm, training a machine learning model based on the initial marked transaction sample set to obtain a prediction model;
obtaining an unlabeled transaction sample set, carrying out prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result;
obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and performing iterative update training on the prediction model according to the at least one new marked transaction sample;
and judging whether the prediction model meets a preset termination condition, if so, obtaining an anti-money laundering model based on the prediction model, otherwise, returning to the step of acquiring the unlabeled transaction sample set.
Optionally, the predicting the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be labeled from the unlabeled transaction sample set according to a prediction result includes:
and screening unlabeled transaction samples with prediction results within a first setting range from the unlabeled transaction sample set to serve as the transaction samples to be labeled.
Optionally, the prediction model comprises a target anti-money laundering model and a screening model consisting of a plurality of models;
the step of predicting the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to the prediction result comprises the following steps:
predicting the unlabeled transaction sample set according to the target anti-money laundering model to obtain a first prediction result of each unlabeled transaction sample;
screening unlabeled transaction samples with a first prediction result within a first setting range from the unlabeled transaction sample set to serve as candidate transaction samples to be labeled;
predicting each candidate transaction sample to be marked according to a plurality of models forming the screening model to obtain a corresponding second prediction result;
And screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
Optionally, the step of screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes:
and carrying out de-duplication treatment on the candidate transaction samples to be marked according to a second prediction result corresponding to each candidate transaction sample to be marked, so as to obtain at least one transaction sample to be marked.
Optionally, the step of performing deduplication processing on the candidate transaction samples to be marked according to the second prediction result corresponding to each candidate sample to be marked, to obtain the transaction samples to be marked includes:
for each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding the second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
And carrying out de-duplication treatment on the candidate transaction samples to be marked according to the marking value vector to obtain the transaction samples to be marked.
Optionally, the termination condition includes any one or more of the following:
the number of the candidate transaction samples to be marked is smaller than or equal to a preset first number threshold;
the proportion of the candidate transaction samples to be marked screened out in the unlabeled transaction sample set is smaller than or equal to a preset first proportion threshold;
the number of unlabeled transaction samples is less than or equal to a preset second number threshold;
and the proportion of the unlabeled transaction samples in the unlabeled transaction sample set is smaller than or equal to a preset second proportion threshold value.
Optionally, the algorithms used by the plurality of models are different, or the hyperparameters are different.
Optionally, the method of obtaining the anti-money laundering model based on the prediction model includes any one of the following:
taking the target anti-money laundering model as the anti-money laundering model;
selecting any one model of the screening model as the anti-money laundering model;
combining at least two models of the screening model to obtain the anti-money laundering model;
and combining the target anti-money laundering model with at least one model of the screening model to obtain the anti-money laundering model.
Optionally, the prediction model is composed of a plurality of models;
the step of predicting the unlabeled transaction sample set according to the prediction model and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result comprises the following steps:
predicting the unlabeled transaction sample set according to one model or at least two models in the prediction models to obtain a first prediction result of each unlabeled transaction sample;
screening candidate transaction samples to be marked meeting preset conditions from the unlabeled transaction sample set according to the first prediction result;
predicting each candidate transaction sample to be marked according to a plurality of models forming the prediction model to obtain a corresponding second prediction result;
and screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
Optionally, the step of predicting the unlabeled transaction sample set according to one model or at least two models of the prediction models, to obtain a first prediction result of each unlabeled transaction sample includes:
Selecting one model from a plurality of models forming the prediction model, and predicting each unlabeled transaction sample according to the selected model to obtain a corresponding first prediction result;
or,
selecting at least two models from a plurality of models forming the prediction model, respectively predicting each unlabeled transaction sample according to the selected at least two models to obtain a prediction result, and averaging the prediction results corresponding to the at least two models to obtain a first prediction result.
Optionally, the screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes:
for each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding the second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
And carrying out de-duplication treatment on the candidate transaction samples to be marked according to the marking value vector to obtain at least one transaction sample to be marked.
Optionally, the step of obtaining the anti-money laundering model based on the prediction model includes:
taking any one of the models constituting the prediction model as the anti-money laundering model; or,
combining at least two of the models constituting the prediction model to obtain the anti-money laundering model.
Optionally, the machine learning algorithm is a random forest algorithm.
Optionally, the training method further includes:
obtaining a target transaction sample to be predicted;
and predicting the target transaction sample according to the anti-money laundering model to obtain a prediction result of the target transaction sample.
Optionally, the training method further includes:
and displaying the prediction result of the target sample.
According to a second aspect of the present invention, there is provided a training device for an anti-money laundering model, comprising:
an initial sample acquisition module for acquiring an initial set of marked transaction samples, wherein each sample in the set of marked samples has been marked as either a money laundering transaction or a non-money laundering transaction;
the model initial training module is used for carrying out machine learning model training based on the initial marked transaction sample set according to a pre-selected machine learning algorithm to obtain a prediction model;
a to-be-marked sample screening module for obtaining an unlabeled transaction sample set, performing prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result;
the model iteration training module is used for obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and carrying out iteration updating training on the prediction model according to the at least one new marked transaction sample;
and a termination condition judging module for judging whether the prediction model meets the preset termination condition, if so, obtaining an anti-money laundering model based on the prediction model, and if not, returning to the step of acquiring the unlabeled transaction sample set.
Optionally, the predicting the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be labeled from the unlabeled transaction sample set according to a prediction result includes:
and screening unlabeled transaction samples with prediction results within a first setting range from the unlabeled transaction sample set to serve as the transaction samples to be labeled.
Optionally, the prediction model comprises a target anti-money laundering model and a screening model consisting of a plurality of models;
the predicting the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to the prediction result comprises:
predicting the unlabeled transaction sample set according to the target anti-money laundering model to obtain a first prediction result of each unlabeled transaction sample;
screening unlabeled transaction samples with a first prediction result within a first setting range from the unlabeled transaction sample set to serve as candidate transaction samples to be labeled;
predicting each candidate transaction sample to be marked according to a plurality of models forming the screening model to obtain a corresponding second prediction result;
and screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
Optionally, the screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes:
And carrying out de-duplication treatment on the candidate transaction samples to be marked according to a second prediction result corresponding to each candidate transaction sample to be marked, so as to obtain at least one transaction sample to be marked.
Optionally, the performing deduplication processing on the candidate transaction samples to be marked according to the second prediction result corresponding to each candidate sample to be marked, to obtain the transaction samples to be marked includes:
for each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding the second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
and carrying out de-duplication treatment on the candidate transaction samples to be marked according to the marking value vector to obtain the transaction samples to be marked.
Optionally, the termination condition includes any one or more of the following:
the number of the candidate transaction samples to be marked is smaller than or equal to a preset first number threshold;
The proportion of the candidate transaction samples to be marked screened out in the unlabeled transaction sample set is smaller than or equal to a preset first proportion threshold;
the number of unlabeled transaction samples is less than or equal to a preset second number threshold;
and the proportion of the unlabeled transaction samples in the unlabeled transaction sample set is smaller than or equal to a preset second proportion threshold value.
Optionally, the algorithms used by the plurality of models are different, or the hyperparameters are different.
Optionally, the obtaining the anti-money laundering model based on the prediction model includes any one of the following:
taking the target anti-money laundering model as the anti-money laundering model;
selecting any one model of the screening model as the anti-money laundering model;
combining at least two models of the screening model to obtain the anti-money laundering model;
and combining the target anti-money laundering model with at least one model of the screening model to obtain the anti-money laundering model.
Optionally, the prediction model is composed of a plurality of models;
the predicting the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result includes:
Predicting the unlabeled transaction sample set according to one model or at least two models in the prediction models to obtain a first prediction result of each unlabeled transaction sample;
screening candidate transaction samples to be marked meeting preset conditions from the unlabeled transaction sample set according to the first prediction result;
predicting each candidate transaction sample to be marked according to a plurality of models forming the prediction model to obtain a corresponding second prediction result;
and screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
Optionally, the predicting the unlabeled transaction sample set according to one model or at least two models in the prediction models, to obtain a first prediction result of each unlabeled transaction sample includes:
selecting one model from a plurality of models forming the prediction model, and predicting each unlabeled transaction sample according to the selected model to obtain a corresponding first prediction result;
or,
selecting at least two models from a plurality of models forming the prediction model, respectively predicting each unlabeled transaction sample according to the selected at least two models to obtain a prediction result, and averaging the prediction results corresponding to the at least two models to obtain a first prediction result.
Optionally, the screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes:
for each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding the second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
and carrying out de-duplication treatment on the candidate transaction samples to be marked according to the marking value vector to obtain at least one transaction sample to be marked.
Optionally, the obtaining the anti-money laundering model based on the prediction model includes:
taking any one of the models constituting the prediction model as the anti-money laundering model; or,
combining at least two of the models constituting the prediction model to obtain the anti-money laundering model.
Optionally, the machine learning algorithm is a random forest algorithm.
Optionally, the training device further includes:
A module for obtaining a target transaction sample to be predicted;
and a module for predicting the target transaction sample according to the anti-money laundering model to obtain a prediction result of the target transaction sample.
Optionally, the training device further includes:
and means for displaying a prediction result of the target sample.
According to a third aspect of the present invention, there is provided an electronic device comprising:
a training device according to the second aspect of the invention; or,
a processor and a memory for storing instructions for controlling the processor to perform the training method according to the first aspect of the invention.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method according to the first aspect of the present invention.
According to the embodiment of the invention, a machine learning model is first trained, based on a pre-selected machine learning algorithm, on a small initial marked transaction sample set to obtain a prediction model; then, according to the prediction results of the prediction model on unlabeled transaction samples, the transaction samples to be marked that require expert marking are screened out, and the prediction model is iteratively trained according to the new marked transaction samples obtained after the experts mark those transaction samples to be marked, so as to obtain an anti-money laundering model. In this way, the number of transaction samples that experts need to mark can be reduced, and the prediction model can more easily learn valuable information. Moreover, deduplicating the candidate transaction samples to be marked can reduce the amount of computation in the iterative training of the prediction model.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a block diagram showing an example of a hardware configuration of an electronic device that can be used to implement an embodiment of the present invention;
FIG. 2 shows a flowchart of a training method of the anti-money laundering model according to an embodiment of the invention;
FIG. 3 shows a flowchart of the steps of screening transaction samples to be marked according to an embodiment of the present invention;
FIG. 4 is a flowchart showing one example of a training method for an anti-money laundering model according to an embodiment of the present invention;
FIG. 5 shows a functional block diagram of a training device for an anti-money laundering model according to an embodiment of the invention;
fig. 6 shows a block diagram of an electronic device of an embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Various embodiments and examples according to embodiments of the present invention are described below with reference to the accompanying drawings.
< hardware configuration >
Fig. 1 is a block diagram showing a hardware configuration of an electronic device 1000 in which an embodiment of the present invention can be implemented.
The electronic device 1000 may be a laptop, desktop, cell phone, tablet, etc. As shown in fig. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit CPU, a microprocessor MCU, or the like. The memory 1200 includes, for example, ROM (read only memory), RAM (random access memory), nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 can be capable of wired or wireless communication, and specifically can include Wifi communication, bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 1500 is, for example, a liquid crystal display, a touch display, or the like. The input device 1600 may include, for example, a touch screen, keyboard, somatosensory input, and the like. A user may input/output voice information through the speaker 1700 and microphone 1800.
The electronic device shown in fig. 1 is merely illustrative and is in no way meant to limit the invention, its application or uses. The memory 1200 of the electronic device 1000 is configured to store instructions for controlling the processor 1100 to operate so as to perform any one of the training methods for the anti-money laundering model provided in the embodiments of the present invention. It will be appreciated by those skilled in the art that although a plurality of devices are shown for the electronic device 1000 in fig. 1, the present invention may involve only some of them; for example, the electronic device 1000 may involve only the processor 1100 and the memory 1200. The skilled person can design instructions according to the disclosed solution. How the instructions control the processor to operate is well known in the art and will not be described in detail here.
< method example >
In this embodiment, a training method for an anti-money laundering model is provided. The training method may be implemented by an electronic device, for example the electronic device 1000 shown in fig. 1.
As shown in fig. 2, the training method of the anti-money laundering model of the present embodiment may include the following steps S2100 to S2500:
In step S2100, an initial set of marked transaction samples is obtained.
Wherein each sample in the marked set of samples has been marked as either a money laundering transaction or a non-money laundering transaction.
For example, a sample in the marked transaction sample set that is a money laundering transaction may be marked 1 and a sample that is not a money laundering transaction may be marked 0.
In this embodiment, each sample in the marked transaction sample set may be expert marked, or may be marked by the electronic device according to a predetermined rule.
Step S2200, training a machine learning model based on the initial marked transaction sample set according to a pre-selected machine learning algorithm to obtain a prediction model.
In one embodiment, the pre-selected machine learning algorithm may be, for example, any one or more of a random forest algorithm, a GBDT (Gradient Boosting Decision Tree) algorithm, an XGBoost (eXtreme Gradient Boosting) algorithm, a logistic regression algorithm, a neural network algorithm, and the like.
Based on the marked transaction sample set obtained in step S2100 and the pre-selected machine learning algorithm, machine learning model training is performed to obtain at least one predictive model.
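As a concrete illustration of step S2200, the following sketch (not part of the patent text; all names are assumptions) trains an initial prediction model on the initial marked transaction sample set with one of the algorithms named above, a random forest:

```python
# Minimal sketch, assuming scikit-learn is available; variable names are illustrative only.
from sklearn.ensemble import RandomForestClassifier

def train_initial_prediction_model(X_marked, y_marked):
    """X_marked: features of the initial marked transaction samples.
    y_marked: labels, 1 = money laundering transaction, 0 = non-money laundering transaction."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_marked, y_marked)
    return model
```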
Step S2300, obtaining an unlabeled transaction sample set, carrying out prediction processing on the unlabeled transaction sample set according to a prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result.
The unlabeled transaction sample set includes a plurality of unlabeled transaction samples, i.e., each unlabeled transaction sample is not pre-labeled as either a money laundering transaction or a non-money laundering transaction.
In a first embodiment, the prediction model may be a single model; the prediction result of each unlabeled transaction sample is then obtained by performing prediction processing on the unlabeled transaction sample set according to the prediction model. The prediction result may be a score between 0 and 1 indicating the probability, as predicted by the prediction model, that the corresponding transaction sample is a money laundering transaction.
In this embodiment, an unlabeled transaction sample whose prediction result is within the first setting range may be selected from unlabeled transaction samples as the transaction sample to be labeled.
The closer the prediction result is to 1, the more likely the prediction model considers the corresponding transaction sample to be a money laundering transaction; the closer it is to 0, the more likely the prediction model considers the sample not to be a money laundering transaction. In other words, the greater the absolute value of the difference between the prediction result and 0.5, the more easily the prediction model can determine the type of the corresponding transaction sample (i.e., money laundering or non-money laundering), and the less valuable that sample is to annotate. Since only transaction samples that the prediction model cannot confidently predict are worth labeling, unlabeled transaction samples whose prediction result falls within the first setting range may be selected as the transaction samples to be marked.
The first setting range may be preset according to an application scenario or specific requirements. For example, the first setting range may be [0.4,0.6], and then, an unlabeled transaction sample with a prediction result greater than or equal to 0.4 and less than or equal to 0.6 may be selected as the transaction sample to be labeled.
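A minimal sketch of this screening step, assuming a scikit-learn style model and the example first setting range [0.4, 0.6]; the helper name is an assumption for illustration:

```python
import numpy as np

def screen_samples_to_mark(prediction_model, X_unlabeled, low=0.4, high=0.6):
    # probability that each unlabeled transaction sample is a money laundering transaction
    scores = prediction_model.predict_proba(X_unlabeled)[:, 1]
    mask = (scores >= low) & (scores <= high)     # first setting range
    return np.where(mask)[0]                      # indices of transaction samples to be marked
```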
In a second embodiment, the predictive model includes a target anti-money laundering model and a screening model consisting of a plurality of models.
Further, the plurality of models may employ different machine learning algorithms and/or different hyperparameter settings, so that performing machine learning model training based on the same initial marked transaction sample set yields a plurality of different models.
Then, performing prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be labeled from the unlabeled transaction sample set according to the prediction result may further include steps S2310 to S2340 as shown in fig. 3:
step S2310, predicting the unlabeled transaction sample set according to the target anti-money laundering model to obtain a first prediction result of each unlabeled transaction sample.
The first prediction result may be a score between 0 and 1 for indicating a probability of predicting whether the corresponding transaction sample is a money laundering transaction based on the target anti-money laundering model.
Step S2320, screening unlabeled transaction samples whose first prediction result is within the first setting range from the unlabeled transaction sample set as candidate transaction samples to be marked.
The closer the first prediction result is to 1, the more likely the target anti-money laundering model considers the corresponding transaction sample to be a money laundering transaction; the closer it is to 0, the more likely the target anti-money laundering model considers the sample not to be a money laundering transaction. In other words, the greater the absolute value of the difference between the first prediction result and 0.5, the more easily the target anti-money laundering model can determine the type of the corresponding transaction sample (i.e., money laundering or non-money laundering), and the less valuable that sample is to annotate. Since only transaction samples that the target anti-money laundering model cannot confidently predict are worth labeling, unlabeled transaction samples whose first prediction result falls within the first setting range may be selected as candidate transaction samples to be marked.
The first setting range may be preset according to an application scenario or specific requirements. For example, the first setting range may be [0.4,0.6], and then, an unlabeled transaction sample with a prediction result greater than or equal to 0.4 and less than or equal to 0.6 may be selected as a candidate transaction sample to be labeled.
Step S2330, for each candidate transaction sample to be marked, respectively predicting according to a plurality of models forming the screening model to obtain a corresponding second prediction result.
Specifically, each candidate transaction sample to be marked may be predicted by each of the plurality of models constituting the screening model, so as to obtain, for each candidate transaction sample to be marked, a second prediction result corresponding to each model in the screening model.
For example, suppose the prediction model includes a screening model composed of n models, and m candidate transaction samples to be marked are obtained through step S2320; then the second prediction result of the i-th (i ∈ [1, m]) candidate transaction sample to be marked under the j-th (j ∈ [1, n]) model of the screening model may be denoted P_{i,j}.
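A minimal sketch (assumed names, scikit-learn style models) of computing the matrix of second prediction results P_{i,j} for the m candidates under the n models of the screening model:

```python
import numpy as np

def second_prediction_matrix(screening_models, X_candidates):
    # Row i = i-th candidate transaction sample to be marked, column j = j-th model; entry = P_{i,j}
    return np.column_stack([m.predict_proba(X_candidates)[:, 1] for m in screening_models])
```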
Step S2340, screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
In one embodiment, the duplicate removal processing may be performed on the candidate transaction samples to be marked according to the second prediction result of each candidate transaction sample to be marked, so as to obtain at least one transaction sample to be marked.
In one embodiment, the manner of performing the deduplication processing on the candidate transaction samples to be marked may include steps S2341 to S2343 as follows:
Step S2341, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value for each candidate transaction sample to be marked; and setting a marking value corresponding to a second prediction result exceeding a second setting range as a second setting value.
In this embodiment, whether a candidate transaction sample to be marked is predicted to be a money laundering transaction may be determined by judging whether its second prediction result is within the second setting range.
For example, it may be that the corresponding model predicts that the corresponding candidate transaction sample to be marked is not a money laundering transaction if the second prediction result is within the second set range; and under the condition that the second prediction result exceeds the second setting range, the corresponding model predicts that the corresponding candidate transaction sample to be marked is a money laundering transaction.
The second setting range may be preset according to an application scenario or specific requirements. For example, the second setting range may be [0,0.5], and then, the flag value of the second prediction result that is equal to or greater than 0 and equal to or less than 0.5 (i.e., within the second setting range) may be set as the first setting value, and the flag value of the second prediction result that is greater than 0.5 (i.e., exceeds the second setting range) may be set as the second setting value.
The first setting value and the second setting value may be specific values set in advance according to application scenarios or specific requirements, and the specific values of the first setting value and the second setting value are different. For example, the first set point may be 0 and the second set point may be 1.
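The following sketch (assumed names) turns the second prediction results into flag values, using the example second setting range [0, 0.5] with first set value 0 and second set value 1:

```python
import numpy as np

def to_flag_values(second_predictions, range_high=0.5, first_value=0, second_value=1):
    """second_predictions: array of shape (num_candidates, num_models) of scores in [0, 1]."""
    preds = np.asarray(second_predictions)
    # within the second setting range [0, range_high] -> first set value, otherwise -> second set value
    return np.where(preds <= range_high, first_value, second_value)
```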
In one example, the number of models constituting the screening model is 3, and the second prediction results obtained by predicting 4 candidate transaction samples to be marked may be as shown in Table 1 below:
TABLE 1
[Second prediction results of the 4 candidate transaction samples to be marked under each of the 3 models of the screening model; the specific values appear only in the original drawing and are not reproduced here.]
Setting the flag value corresponding to each second prediction result within the second setting range in Table 1 to the first set value 0, and setting the flag value corresponding to each second prediction result exceeding the second setting range to the second set value 1, the resulting flag values of the 4 candidate transaction samples to be marked under each model may be as shown in Table 2 below:
TABLE 2
Candidate transaction sample to be marked    Model 1    Model 2    Model 3
1                                            1          0          1
2                                            1          1          1
3                                            1          0          1
4                                            1          0          1
Step S2342, for each candidate transaction sample to be marked, obtaining a marking value vector according to the marking value corresponding to the second prediction result of the corresponding candidate transaction sample to be marked based on the preset sequence.
The preset sequence may be a sequence of a plurality of models constituting the screening model, and the sequence may be preset according to an application scenario or specific requirements.
For example, suppose the prediction model includes a screening model consisting of n models, and m candidate transaction samples to be marked are obtained through step S2320. The second prediction result of the i-th (i ∈ [1, m]) candidate transaction sample to be marked under the j-th (j ∈ [1, n]) model of the screening model is P_{i,j}, and the flag value corresponding to the second prediction result P_{i,j} may be denoted B_{i,j}. If the n models are arranged in the preset sequence, the resulting flag value vector of the i-th candidate transaction sample to be marked may be expressed as (B_{i,1}, B_{i,2}, …, B_{i,j}, …, B_{i,n-1}, B_{i,n}).
In the example shown in Table 2, the flag value vector of the 1st candidate transaction sample to be marked may be represented as (1, 0, 1), that of the 2nd as (1, 1, 1), that of the 3rd as (1, 0, 1), and that of the 4th as (1, 0, 1).
In step S2343, the candidate transaction samples to be marked are subjected to de-duplication processing according to the marking value vector, so as to obtain the transaction samples to be marked.
In one embodiment, the deduplication processing of the candidate transaction samples to be marked according to the flag value vectors may be carried out as follows: for each distinct flag value vector, one sample is selected from the at least one candidate transaction sample to be marked that carries this vector, and used as a transaction sample to be marked.
In the example shown in Table 2, the flag value vectors of the 1st, 3rd and 4th candidate transaction samples to be marked are all (1, 0, 1), so one transaction sample is selected from the 1st, 3rd and 4th candidates as a transaction sample to be marked. In addition, since only the 2nd candidate transaction sample to be marked has the flag value vector (1, 1, 1), the 2nd candidate may be directly used as a transaction sample to be marked.
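A minimal sketch (assumed names) of this deduplication by flag value vectors, keeping one candidate per distinct vector, applied to the Table 2 example:

```python
def deduplicate_by_flag_vectors(flag_vectors):
    """flag_vectors: one flag value vector (list/tuple) per candidate transaction sample to be marked.
    Returns indices of the retained transaction samples to be marked."""
    seen = {}
    for idx, vector in enumerate(flag_vectors):
        key = tuple(vector)
        if key not in seen:            # keep the first candidate carrying this vector
            seen[key] = idx
    return sorted(seen.values())

# Table 2 example: candidates 1, 3 and 4 share (1, 0, 1); candidate 2 has (1, 1, 1)
print(deduplicate_by_flag_vectors([(1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 0, 1)]))  # -> [0, 1]
```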
In a third embodiment, the prediction model is composed of a plurality of models, and then, performing prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be labeled from the unlabeled transaction sample set according to the prediction result may further include steps S2350 to S2380 as follows:
step S2350, predicting the unlabeled transaction sample set according to one or at least two of the prediction models to obtain a first prediction result of each unlabeled transaction sample.
In one embodiment, predicting the set of unlabeled transaction samples based on one or at least two of the prediction models, the obtaining a first prediction result for each unlabeled transaction sample may include: and selecting one model from a plurality of models forming a prediction model, and predicting each unlabeled transaction sample according to the selected model to obtain a corresponding first prediction result.
The first prediction result may be a score between 0 and 1 for indicating a probability of predicting whether the corresponding unlabeled transaction sample is a money laundering transaction based on the selected model.
In another embodiment, predicting the set of unlabeled transaction samples according to one or at least two of the prediction models, the obtaining a first prediction result for each unlabeled transaction sample may include: selecting at least two models from a plurality of models constituting a prediction model; and respectively predicting each unlabeled transaction sample according to the selected at least two models to obtain a predicted result, and averaging the predicted results respectively corresponding to the at least two models to obtain a first predicted result.
The prediction result of each selected model may be a score between 0 and 1, which is used to represent the probability that the corresponding model predicts whether the corresponding transaction sample is a money laundering transaction. The first prediction is an average of the predictions for each selected model.
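A minimal sketch (assumed names) of this second option, averaging the scores of at least two selected models to form the first prediction result:

```python
import numpy as np

def averaged_first_prediction(selected_models, X_unlabeled):
    # one score per model per unlabeled transaction sample, averaged over the selected models
    scores = [m.predict_proba(X_unlabeled)[:, 1] for m in selected_models]
    return np.mean(scores, axis=0)
```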
Step S2360, screening candidate transaction samples to be marked meeting the preset conditions from the unlabeled transaction sample set according to the first prediction result.
The closer the first prediction result is to 1, the more likely the selected one or at least two models consider the corresponding transaction sample to be a money laundering transaction; the closer it is to 0, the more likely they consider the sample not to be a money laundering transaction. In other words, the greater the absolute value of the difference between the first prediction result and 0.5, the more easily the selected model or models can determine the type of the corresponding transaction sample (i.e., money laundering or non-money laundering), and the less valuable that sample is to annotate. Since only transaction samples that the selected model or models cannot confidently predict are worth labeling, unlabeled transaction samples whose first prediction result falls within the first setting range may be selected as candidate transaction samples to be marked.
The first setting range may be preset according to an application scenario or specific requirements. For example, the first setting range may be [0.4,0.6], and then, an unlabeled transaction sample with a prediction result greater than or equal to 0.4 and less than or equal to 0.6 may be selected as a candidate transaction sample to be labeled.
Step S2370, for each candidate transaction sample to be marked, respectively predicting according to a plurality of models constituting the prediction model to obtain a corresponding second prediction result.
Specifically, each candidate transaction sample to be marked may be predicted by each of the plurality of models constituting the prediction model, so as to obtain, for each candidate transaction sample to be marked, a second prediction result corresponding to each model in the prediction model.
For example, suppose the prediction model is composed of k models, and m candidate transaction samples to be marked are obtained in step S2360; then the second prediction result of the i-th (i ∈ [1, m]) candidate transaction sample to be marked under the j-th (j ∈ [1, k]) model of the prediction model may be denoted P'_{i,j}.
Step S2380, screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
In one embodiment, the screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes steps S2381 to S2383 as follows:
Step S2381, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value for each candidate transaction sample to be marked; and setting a marking value corresponding to a second prediction result exceeding a second setting range as a second setting value.
This step S2381 may refer to the description of the previous step S2341, and will not be repeated here.
Step S2382, for each candidate transaction sample to be marked, obtaining a marking value vector according to the marking value corresponding to the second prediction result of the corresponding candidate transaction sample to be marked based on the preset sequence.
The step S2382 may refer to the description of the step S2342, and will not be repeated here.
In step S2383, the candidate transaction samples to be marked are subjected to de-duplication processing according to the marking value vector, so as to obtain at least one transaction sample to be marked.
The step S2383 may refer to the description of the step S2343, and will not be described herein.
After obtaining at least one transaction sample to be marked according to the first, second, or third embodiments described above, step S2400 is continued.
Step S2400, obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and performing iterative update training on the prediction model according to the at least one new marked transaction sample.
In one embodiment, the transaction samples to be marked obtained in step S2300 may be provided to an expert performing the marking action, and each transaction sample to be marked is marked by the expert to obtain a corresponding new marked transaction sample.
The electronic device for executing the embodiment of the invention can acquire a new marked transaction sample after the transaction sample to be marked is marked, and perform iterative updating training on the prediction model according to the new marked transaction sample.
On the basis of the first embodiment, the prediction model is a single model, and that model may be iteratively updated and trained according to the new marked transaction samples.
On the basis of the second embodiment, the prediction model comprises a target anti-money laundering model and a screening model consisting of a plurality of models, and both the target anti-money laundering model and the plurality of models of the screening model may be iteratively updated and trained according to the new marked transaction samples.
On the basis of the third embodiment, the prediction model is composed of a plurality of models, and then the plurality of models composing the prediction model can be iteratively updated and trained according to the new marked transaction samples.
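A minimal sketch (assumed names) of the iterative update training of step S2400: the marked set is enlarged with the newly expert-marked transaction samples and every model making up the prediction model is retrained:

```python
import numpy as np

def update_prediction_models(models, X_marked, y_marked, X_new, y_new):
    X_all = np.vstack([X_marked, X_new])
    y_all = np.concatenate([y_marked, y_new])
    for model in models:              # retrain each model constituting the prediction model
        model.fit(X_all, y_all)
    return models, X_all, y_all
```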
In step S2500, it is judged whether the prediction model meets the preset termination condition; if so, an anti-money laundering model is obtained based on the prediction model, otherwise, the process returns to the step of acquiring the unlabeled transaction sample set.
On the basis of the first embodiment described above, the termination condition may include any one or more of the following:
the number of transaction samples to be marked is less than or equal to a preset first number threshold;
the proportion of the screened transaction samples to be marked in the unlabeled transaction sample set is smaller than or equal to a preset first proportion threshold;
the number of unlabeled transaction samples is less than or equal to a preset second number threshold;
the proportion of unlabeled transaction samples in the unlabeled transaction sample set is less than or equal to a preset second proportion threshold.
In an embodiment in which the termination condition includes that the number of transaction samples to be marked is less than or equal to a preset first number threshold, the first number threshold may be set in advance according to an application scenario or specific requirements, for example, the first number threshold may be 100, and then, when the number of transaction samples to be marked obtained in step S2300 is less than or equal to 100, it is determined that the prediction model satisfies the termination condition.
In an embodiment in which the termination condition includes that the proportion, in the unlabeled transaction sample set, of the screened transaction samples to be marked is less than or equal to a preset first proportion threshold, the number of transaction samples to be marked screened in step S2300 may be determined as the to-be-marked number, the number of unlabeled transaction samples in the unlabeled transaction sample set obtained in step S2300 is determined as the unlabeled number, and the ratio of the to-be-marked number to the unlabeled number is calculated, that is, the proportion of the screened transaction samples to be marked in the unlabeled transaction sample set. The first proportion threshold may be preset according to an application scenario or specific requirements; for example, the first proportion threshold may be 10%, in which case the prediction model is determined to satisfy the termination condition when the proportion of transaction samples to be marked screened out of the unlabeled transaction sample set obtained in step S2300 is less than or equal to 10%.
In an embodiment in which the termination condition includes that the number of unlabeled transaction samples is less than or equal to a preset second number threshold, the second number threshold may be preset according to an application scenario or specific requirements, for example, the second number threshold may be 100, and then, when the number of unlabeled transaction samples in the unlabeled transaction sample set obtained in step S2300 is less than or equal to 100, it is determined that the prediction model satisfies the termination condition.
In an embodiment in which the termination condition includes that the proportion of unlabeled transaction samples in the unlabeled transaction sample set is less than or equal to a preset second proportion threshold, the number of transaction samples to be marked screened in step S2300 may be determined as the to-be-marked number, the number of unlabeled transaction samples in the unlabeled transaction sample set obtained in step S2300 is determined as the unlabeled number, the difference between the unlabeled number and the to-be-marked number is calculated, and the ratio of this difference to the unlabeled number is calculated, that is, the proportion of the unlabeled transaction samples. The second proportion threshold may be preset according to the application scenario or specific requirements; for example, the second proportion threshold may be 10%, in which case the prediction model is determined to satisfy the termination condition when the proportion of unlabeled transaction samples obtained in step S2300 is less than or equal to 10%.
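The termination check of the first embodiment can be sketched as follows (assumed names; the example thresholds 100 and 10% are the ones mentioned above):

```python
def meets_termination_condition(num_to_mark, num_unlabeled,
                                first_count=100, first_ratio=0.10,
                                second_count=100, second_ratio=0.10):
    ratio_to_mark = num_to_mark / num_unlabeled if num_unlabeled else 0.0
    ratio_still_unlabeled = (num_unlabeled - num_to_mark) / num_unlabeled if num_unlabeled else 0.0
    return (num_to_mark <= first_count            # too few samples left worth marking
            or ratio_to_mark <= first_ratio
            or num_unlabeled <= second_count      # too few unlabeled samples left
            or ratio_still_unlabeled <= second_ratio)
```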
On the basis of the second or third embodiment described above, the termination condition may include any one or more of the following:
the number of candidate transaction samples to be marked is smaller than or equal to a preset first number threshold;
the proportion of the screened-out candidate transaction samples to be marked in the unlabeled transaction sample set is less than or equal to a preset first proportion threshold;
The number of unlabeled transaction samples is less than or equal to a preset second number threshold;
the proportion of unlabeled transaction samples in the unlabeled transaction sample set is less than or equal to a preset second proportion threshold.
In an embodiment in which the termination condition includes that the number of candidate transaction samples to be marked is less than or equal to a preset first number threshold, the first number threshold may be set in advance according to an application scenario or specific requirements, for example, the first number threshold may be 100, and then, when the number of candidate transaction samples to be marked obtained through step S2320 or step S2360 is less than or equal to 100, it is determined that the prediction model satisfies the termination condition.
In an embodiment in which the termination condition includes that the proportion of the screened-out candidate transaction samples to be marked in the unlabeled transaction sample set is less than or equal to a preset first proportion threshold, the number of candidate transaction samples to be marked screened out in step S2320 or step S2360 may be taken as the candidate to-be-marked number, the number of unlabeled transaction samples in the unlabeled transaction sample set obtained in step S2320 or step S2360 may be taken as the unlabeled number, and the ratio of the candidate to-be-marked number to the unlabeled number may be calculated, that is, the proportion of the screened-out candidate transaction samples to be marked in the unlabeled transaction sample set. The first proportion threshold may be preset according to the application scenario or specific requirements; for example, if the first proportion threshold is 10%, it may be determined that the prediction model satisfies the termination condition when this proportion, computed for the set obtained in step S2320 or step S2360, is less than or equal to 10%.
In an embodiment in which the termination condition includes that the number of unlabeled transaction samples is less than or equal to a preset second number threshold, the second number threshold may be preset according to an application scenario or specific requirements, for example, the second number threshold may be 100, and then, when the number of unlabeled transaction samples in the unlabeled transaction sample set acquired through step S2320 or step S2360 is less than or equal to 100, it is determined that the prediction model satisfies the termination condition.
In an embodiment in which the termination condition includes that the proportion of unlabeled transaction samples in the unlabeled transaction sample set is less than or equal to a preset second proportion threshold, the number of candidate transaction samples to be marked screened out in step S2320 or step S2360 may be taken as the candidate to-be-marked number and the number of unlabeled transaction samples in the unlabeled transaction sample set obtained in step S2320 or step S2360 as the unlabeled number; the difference between the unlabeled number and the candidate to-be-marked number is calculated, and the ratio of this difference to the unlabeled number is the proportion of unlabeled transaction samples. The second proportion threshold may be preset according to the application scenario or specific requirements; for example, if the second proportion threshold is 10%, it may be determined that the prediction model satisfies the termination condition when the proportion obtained in step S2320 or step S2360 is less than or equal to 10%.
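Purely as an illustration, the four termination conditions above could be checked with a small helper such as the following Python sketch; the threshold values (100 and 10%) echo the examples given in this description, and the function and parameter names are hypothetical.

```python
def meets_termination_condition(num_to_mark, num_unlabeled,
                                first_count_threshold=100,     # example first number threshold
                                first_ratio_threshold=0.10,    # example first proportion threshold
                                second_count_threshold=100,    # example second number threshold
                                second_ratio_threshold=0.10):  # example second proportion threshold
    """Return True if any one of the four example termination conditions holds."""
    if num_unlabeled == 0:
        return True  # nothing left to screen
    marked_ratio = num_to_mark / num_unlabeled                       # proportion of screened-out samples
    unlabeled_ratio = (num_unlabeled - num_to_mark) / num_unlabeled  # proportion of remaining unlabeled samples
    return (num_to_mark <= first_count_threshold
            or marked_ratio <= first_ratio_threshold
            or num_unlabeled <= second_count_threshold
            or unlabeled_ratio <= second_ratio_threshold)
```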
On the basis of the first embodiment, the prediction model is a single model, and when the prediction model meets the preset termination condition, obtaining the anti-money laundering model based on the prediction model may include: using the prediction model as the anti-money laundering model.
On the basis of the second embodiment, the prediction model includes a target anti-money laundering model and a screening model composed of a plurality of models, and when the prediction model meets the preset termination condition, obtaining the anti-money laundering model based on the prediction model may include any one of the following:
using the target anti-money laundering model as the anti-money laundering model;
selecting any one screening model as the anti-money laundering model;
combining at least two screening models to obtain the anti-money laundering model;
combining the target anti-money laundering model and at least one screening model to obtain the anti-money laundering model.
In one embodiment, combining at least two models to obtain the anti-money laundering model may include: combining the at least two models by weighted summation, weighted averaging or the like.
On the basis of the third embodiment, the prediction model is composed of a plurality of models, and when the prediction model meets the preset termination condition, obtaining the anti-money laundering model based on the prediction model may include:
using any one of the models constituting the prediction model as the anti-money laundering model; or,
combining at least two of the models constituting the prediction model to obtain the anti-money laundering model.
In one embodiment, combining at least two models to obtain the anti-money laundering model may include: combining the at least two models by weighted summation, weighted averaging or the like.
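As an illustration, a weighted combination of several trained models could look like the sketch below; it assumes scikit-learn style classifiers exposing predict_proba, and the helper name and default equal weights are assumptions made here rather than prescriptions of this description.

```python
import numpy as np

def combine_models(models, weights=None):
    """Combine trained models by weighted averaging of their predicted
    money-laundering probabilities (weighted summation works the same way,
    with weights that need not sum to 1)."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # plain average by default

    def predict_proba_combined(features):
        scores = [w * m.predict_proba(features)[:, 1] for m, w in zip(models, weights)]
        return np.sum(scores, axis=0)

    return predict_proba_combined
```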
When the prediction model does not meet the preset termination condition, the method returns to the step of obtaining the unlabeled transaction sample set.
In one embodiment, each time the method returns to the step of obtaining an unlabeled transaction sample set, the newly obtained unlabeled transaction sample set is a subset of the unlabeled transaction sample set obtained in a previous execution of that step. "Previous" here refers to any one of the earlier executions.
According to the embodiment of the invention, machine learning training is first performed, based on a preselected machine learning algorithm, on a small initial marked transaction sample set to obtain a prediction model; then, according to the prediction results of the prediction model on unlabeled transaction samples, the transaction samples to be marked that require expert labeling are screened out, and the prediction model is iteratively trained on the new marked transaction samples obtained after the experts label those samples, until the anti-money laundering model is obtained. In this way, the number of transaction samples that experts need to label is reduced, and the prediction model can more easily learn valuable information. Moreover, de-duplicating the candidate transaction samples to be marked reduces the amount of computation in the iterative training of the prediction model.
After the anti-money laundering model is obtained, the method may further comprise:
obtaining a target transaction sample to be predicted;
and predicting the target transaction sample according to the anti-money laundering model to obtain a prediction result of the target transaction sample.
In one embodiment, the method may further comprise: displaying the prediction result of the target transaction sample so that a supervisor can apply corresponding control processing to the target transaction sample according to the prediction result. For example, whether the target transaction sample is a money laundering transaction may be determined according to the prediction result, and if it is, a penalty may be imposed on the executor of the target transaction sample, or the corresponding transaction may be cancelled.
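A minimal sketch of this prediction step is given below; the stand-in model, the feature dimensions and the 0.5 decision threshold are all assumptions chosen for illustration and are not prescribed by this description.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for the trained anti-money laundering model (any classifier with
# predict_proba would do); it is fit on random data only to keep the example
# self-contained.
rng = np.random.default_rng(0)
aml_model = LogisticRegression().fit(rng.random((200, 8)), rng.integers(0, 2, 200))

target_sample = rng.random((1, 8))                     # feature vector of the target transaction
score = aml_model.predict_proba(target_sample)[0, 1]   # predicted money-laundering probability
print("suspicious" if score > 0.5 else "normal", round(float(score), 3))
```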
< example 1>
The training method of the anti-money laundering model provided in this embodiment will be further described below with reference to fig. 4. As shown in fig. 4, the method includes:
step S4001, an initial set of marked transaction samples is obtained.
Step S4002, performing machine learning model training based on the initial marked transaction sample set according to a pre-selected machine learning algorithm, to obtain a target anti-money laundering model and a screening model composed of a plurality of models.
Step S4003, an unlabeled transaction sample set is obtained.
Step S4004, predicting the unlabeled transaction sample set according to the target anti-money laundering model to obtain a first prediction result of each unlabeled transaction sample.
Step S4005, unlabeled transaction samples whose first prediction result falls within the first setting range are selected from the unlabeled transaction sample set as candidate transaction samples to be marked.
Step S4006, for each candidate transaction sample to be marked, respectively predicting according to a plurality of models constituting the screening model, to obtain a corresponding second prediction result.
Step S4007, for each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result within a second setting range as a first setting value; and setting a marking value corresponding to a second prediction result exceeding a second setting range as a second setting value.
Step S4008, for each candidate transaction sample to be marked, obtaining a marking value vector according to the marking value corresponding to the second prediction result of the corresponding candidate transaction sample to be marked based on the preset sequence.
In step S4009, the candidate transaction samples to be marked are subjected to de-duplication processing according to the marking value vector, so as to obtain the transaction samples to be marked.
Step S4010, obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and performing iterative update training on the target anti-money laundering model and the screening model according to the at least one new marked transaction sample.
Step S4011, judging whether the prediction model satisfies a preset termination condition.
When the termination condition is satisfied, step S4012 is executed; when the termination condition is not satisfied, the process returns to step S4003.
Step S4012, the target anti-money laundering model is used as the anti-money laundering model.
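To make steps S4004 to S4009 concrete, the sketch below shows one possible way to screen and de-duplicate candidate samples with a mark value vector; the setting ranges (0.4 to 0.6), the setting values (1 and 0) and all function names are assumptions chosen for illustration, and the models are assumed to be scikit-learn style classifiers.

```python
import numpy as np

def mark_value_vector(sample, screening_models, second_range=(0.4, 0.6)):
    """Build the mark value vector of one candidate sample: the first setting
    value (1) when a model's second prediction falls inside the second setting
    range, the second setting value (0) otherwise, in a fixed model order."""
    return tuple(
        1 if second_range[0] <= m.predict_proba(sample.reshape(1, -1))[0, 1] <= second_range[1] else 0
        for m in screening_models
    )

def screen_samples_to_mark(unlabeled, target_model, screening_models, first_range=(0.4, 0.6)):
    """Steps S4004-S4009: select uncertain samples with the target model, then
    drop candidates whose mark value vectors duplicate an earlier one."""
    first_scores = target_model.predict_proba(unlabeled)[:, 1]                # S4004
    candidate_idx = np.where((first_scores >= first_range[0]) &
                             (first_scores <= first_range[1]))[0]             # S4005

    seen, to_mark = set(), []
    for i in candidate_idx:
        vec = mark_value_vector(unlabeled[i], screening_models)               # S4006-S4008
        if vec not in seen:                                                   # S4009: de-duplication
            seen.add(vec)
            to_mark.append(i)
    return to_mark  # indices of the transaction samples to send for expert labeling
```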
< device example >
In this embodiment, a training device 5000 for an anti-money laundering model is provided, as shown in fig. 5, and includes an initial sample acquiring module 5100, a model initial training module 5200, a to-be-marked sample screening module 5300, a model iterative training module 5400 and a termination condition judging module 5500.
The initial sample acquisition module 5100 is configured to acquire an initial set of marked transaction samples, wherein each sample in the set of marked samples has been marked as either a money laundering transaction or a non-money laundering transaction.
The model initial training module 5200 is configured to perform machine learning model training based on an initial marked transaction sample set according to a pre-selected machine learning algorithm to obtain a prediction model.
The sample to be marked screening module 5300 is configured to obtain an unlabeled transaction sample set, predict the unlabeled transaction sample set according to a prediction model, and screen at least one sample to be marked from the unlabeled transaction sample set according to a prediction result.
The model iterative training module 5400 is configured to obtain at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and perform iterative update training on the prediction model according to the at least one new marked transaction sample.
The termination condition judging module 5500 is configured to judge whether the prediction model meets a preset termination condition; if yes, obtain the anti-money laundering model based on the prediction model, otherwise control the to-be-marked sample screening module 5300 to obtain an unlabeled transaction sample set, perform prediction processing on the unlabeled transaction sample set according to the prediction model, and screen at least one transaction sample to be marked from the unlabeled transaction sample set according to the prediction result.
In one embodiment, predicting the unlabeled transaction sample set according to the prediction model, and screening the at least one transaction sample to be labeled from the unlabeled transaction sample set according to the prediction result includes:
and screening unlabeled transaction samples with predicted results within a first setting range from the unlabeled transaction sample set to serve as transaction samples to be marked.
In one embodiment, the prediction model includes a target anti-money laundering model and a screening model consisting of a plurality of models;
The method comprises the steps of carrying out prediction processing on an unlabeled transaction sample set according to a prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result, wherein the steps comprise:
predicting the unlabeled transaction sample set according to the target anti-money laundering model to obtain a first prediction result of each unlabeled transaction sample;
screening unlabeled transaction samples with the first prediction result within a first setting range from an unlabeled transaction sample set, and taking the unlabeled transaction samples as candidate transaction samples to be labeled;
predicting each candidate transaction sample to be marked according to a plurality of models forming a screening model to obtain a corresponding second prediction result;
and screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
In one embodiment, the screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes:
and carrying out de-duplication processing on the candidate transaction samples to be marked according to the second prediction result corresponding to each candidate transaction sample to be marked, so as to obtain at least one transaction sample to be marked.
In one embodiment, according to a second prediction result corresponding to each candidate sample to be marked, performing deduplication processing on the candidate transaction samples to be marked to obtain the transaction samples to be marked includes:
for each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding a second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
and carrying out de-duplication processing on the candidate transaction samples to be marked according to the marking value vector to obtain the transaction samples to be marked.
In one embodiment, the termination condition includes any one or more of the following:
the number of candidate transaction samples to be marked is smaller than or equal to a preset first number threshold;
the proportion of the candidate transaction samples to be marked screened out in the unlabeled transaction sample set is smaller than or equal to a preset first proportion threshold;
the number of unlabeled transaction samples is less than or equal to a preset second number threshold;
The proportion of unlabeled transaction samples in the unlabeled transaction sample set is less than or equal to a preset second proportion threshold.
In one embodiment, the algorithms employed by the plurality of models are different, or their hyperparameters are different.
In one embodiment, obtaining the anti-money laundering model based on the prediction model includes any one of the following:
using the target anti-money laundering model as the anti-money laundering model;
selecting any one screening model as the anti-money laundering model;
combining at least two screening models to obtain the anti-money laundering model;
combining the target anti-money laundering model and at least one screening model to obtain the anti-money laundering model.
In one embodiment, the predictive model is comprised of a plurality of models;
the method comprises the steps of carrying out prediction processing on an unlabeled transaction sample set according to a prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result, wherein the steps comprise:
predicting the unlabeled transaction sample set according to one model or at least two models in the prediction models to obtain a first prediction result of each unlabeled transaction sample;
screening candidate transaction samples to be marked which meet preset conditions from a non-marked transaction sample set according to a first prediction result;
Predicting each candidate transaction sample to be marked according to a plurality of models forming a prediction model to obtain a corresponding second prediction result;
and screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
In one embodiment, predicting the set of unlabeled transaction samples according to one or at least two of the prediction models, the obtaining a first prediction result for each unlabeled transaction sample includes:
selecting one model from a plurality of models forming a prediction model, and predicting each unlabeled transaction sample according to the selected model to obtain a corresponding first prediction result;
or,
selecting at least two models from a plurality of models forming a prediction model, respectively predicting each unlabeled transaction sample according to the selected at least two models to obtain a prediction result, and averaging the prediction results respectively corresponding to the at least two models to obtain a first prediction result.
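For illustration only, the second option above (averaging the outputs of at least two selected models) could be expressed as in the sketch below; the function name and the predict_proba interface are assumptions made here.

```python
import numpy as np

def first_prediction_by_averaging(unlabeled_samples, selected_models):
    """Average the predicted money-laundering probabilities of at least two
    models constituting the prediction model to obtain the first prediction
    result of each unlabeled transaction sample."""
    scores = [m.predict_proba(unlabeled_samples)[:, 1] for m in selected_models]
    return np.mean(scores, axis=0)
```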
In one embodiment, the screening at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result includes:
For each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding a second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
and carrying out de-duplication treatment on the candidate transaction samples to be marked according to the marking value vector to obtain at least one transaction sample to be marked.
In one embodiment, obtaining the anti-money laundering model based on the prediction model includes:
using any one of the models constituting the prediction model as the anti-money laundering model; or,
combining at least two of the models constituting the prediction model to obtain the anti-money laundering model.
In one embodiment, the machine learning algorithm is a random forest algorithm.
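Purely as an illustration of this choice, initial training with a random forest could be sketched as follows; the number of trees and the helper name are assumptions, not values prescribed by this description.

```python
from sklearn.ensemble import RandomForestClassifier

def train_initial_model(initial_features, initial_labels):
    """Train the initial prediction model on the initial marked transaction
    sample set (labels: 1 = money laundering transaction, 0 = non-money
    laundering transaction) with a random forest."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(initial_features, initial_labels)
    return model
```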
In one embodiment, the training device 5000 further comprises:
a module for obtaining a target transaction sample to be predicted;
and the module is used for predicting the target transaction sample according to the money laundering model to obtain a prediction result of the target transaction sample.
In one embodiment, the training device 5000 further comprises:
and a module for displaying the predicted result of the target sample.
Those skilled in the art will appreciate that the training device 5000 of the anti-money laundering model may be implemented in various ways. For example, the training device 5000 may be implemented by configuring a processor with instructions: the instructions may be stored in ROM and, when the device is started, read from ROM into a programmable device to implement the training device 5000. For example, the training device 5000 may be solidified into a dedicated device (e.g., an ASIC). The training device 5000 may be divided into mutually independent units, or these units may be combined together. The training device 5000 may be implemented by any one of the above implementations, or by a combination of two or more of them.
In this embodiment, the training device 5000 of the anti-money laundering model may take various forms. For example, it may be any functional module running in a software product or application that provides transaction services, or a peripheral component, plug-in or patch of that software product or application, or the software product or application itself.
< electronic device >
In the present embodiment, an electronic apparatus 6000 is also provided. The electronic device 6000 may be the electronic device 1000 shown in fig. 1.
In one aspect, the electronic device 6000 may include the training device 5000 of the anti-money laundering model described above, for implementing the training method of the anti-money laundering model according to any embodiment of the present invention.
In another aspect, as shown in fig. 6, the electronic device 6000 may further include a processor 6100 and a memory 6200, the memory 6200 being configured to store executable instructions; the processor 6100 is configured to control the electronic device 6000, according to the instructions, to perform the training method of the anti-money laundering model according to any embodiment of the invention.
In this embodiment, the electronic device 6000 may be a mobile phone, a tablet computer, a palm computer, a desktop computer, a notebook computer, a workstation, a game console, and the like.
< computer-readable storage Medium >
In this embodiment, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of the anti-money laundering model according to any of the embodiments of the present invention.
The present invention may be an apparatus, method and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or in-groove structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. A training method for an anti-money laundering model, comprising:
obtaining an initial set of marked transaction samples, wherein each sample in the set of marked samples has been marked as either a money laundering transaction or a non-money laundering transaction;
according to a pre-selected machine learning algorithm, training a machine learning model based on the initial marked transaction sample set to obtain a prediction model;
obtaining an unlabeled transaction sample set, carrying out prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to a prediction result;
Obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and performing iterative update training on the prediction model according to the at least one new marked transaction sample;
and judging whether the prediction model meets a preset termination condition, if so, obtaining an anti-money laundering model based on the prediction model, otherwise, returning to the step of acquiring the unlabeled transaction sample set.
2. The training method of claim 1, wherein predicting the unlabeled exemplar set of transactions according to the prediction model, and screening at least one exemplar of transactions to be labeled from the unlabeled exemplar set of transactions according to a prediction result comprises:
and screening unlabeled transaction samples with prediction results within a first setting range from the unlabeled transaction sample set to serve as the transaction samples to be labeled.
3. The training method of claim 1, wherein the prediction model comprises a target anti-money laundering model and a screening model consisting of a plurality of models;
the step of predicting the unlabeled transaction sample set according to the prediction model, and screening at least one transaction sample to be marked from the unlabeled transaction sample set according to the prediction result comprises the following steps:
Predicting the unlabeled transaction sample set according to the target anti-money laundering model to obtain a first prediction result of each unlabeled transaction sample;
screening unlabeled transaction samples with a first prediction result within a first setting range from the unlabeled transaction sample set to serve as candidate transaction samples to be labeled;
predicting each candidate transaction sample to be marked according to a plurality of models forming the screening model to obtain a corresponding second prediction result;
and screening the at least one transaction sample to be marked from the candidate transaction samples to be marked according to the corresponding second prediction result.
4. The training method of claim 3, wherein the step of screening the at least one transaction sample to be marked from the candidate transaction samples to be marked based on the corresponding second prediction results comprises:
and carrying out de-duplication treatment on the candidate transaction samples to be marked according to a second prediction result corresponding to each candidate transaction sample to be marked, so as to obtain at least one transaction sample to be marked.
5. The training method of claim 4, wherein the step of performing deduplication processing on the candidate transaction samples to be marked according to the second prediction result corresponding to each candidate sample to be marked, to obtain the transaction samples to be marked comprises:
For each candidate transaction sample to be marked, setting a marking value corresponding to a second prediction result in a second setting range as a first setting value; setting a marking value corresponding to a second prediction result exceeding the second setting range as a second setting value;
for each candidate transaction sample to be marked, obtaining a marking value vector according to marking values corresponding to second prediction results of the corresponding candidate transaction samples to be marked based on a preset sequence;
and carrying out de-duplication treatment on the candidate transaction samples to be marked according to the marking value vector to obtain the transaction samples to be marked.
6. Training method according to claim 1, characterized in that the termination conditions comprise any one or more of the following:
the number of the candidate transaction samples to be marked is smaller than or equal to a preset first number threshold;
the proportion of the candidate transaction samples to be marked screened out in the unlabeled transaction sample set is smaller than or equal to a preset first proportion threshold;
the number of unlabeled transaction samples is less than or equal to a preset second number threshold;
and the proportion of the unlabeled transaction samples in the unlabeled transaction sample set is smaller than or equal to a preset second proportion threshold value.
7. The training method of claim 3, wherein the algorithms or hyperparameters used by the plurality of models are different.
8. A training device for an anti-money laundering model, comprising:
an initial sample acquisition module for acquiring an initial set of marked transaction samples, wherein each sample in the set of marked samples has been marked as either a money laundering transaction or a non-money laundering transaction;
the model initial training module is used for carrying out machine learning model training based on the initial marked transaction sample set according to a pre-selected machine learning algorithm to obtain a prediction model;
the sample screening module to be marked is used for obtaining an unlabeled transaction sample set, carrying out prediction processing on the unlabeled transaction sample set according to the prediction model, and screening at least one sample to be marked from the unlabeled transaction sample set according to a prediction result;
the model iteration training module is used for obtaining at least one new marked transaction sample after the at least one transaction sample to be marked is marked, and carrying out iteration updating training on the prediction model according to the at least one new marked transaction sample;
and the termination condition judging module is used for judging whether the prediction model meets a preset termination condition, if so, obtaining an anti-money laundering model based on the prediction model, and if not, returning to the step of acquiring the unlabeled transaction sample set.
9. An electronic device, comprising:
the training device of claim 8; or,
a processor and a memory for storing instructions for controlling the processor to perform the training method according to any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of any of claims 1 to 7.
CN202211475923.3A 2019-09-05 2019-09-05 Training method and device for money backwashing model and electronic equipment Pending CN116128068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211475923.3A CN116128068A (en) 2019-09-05 2019-09-05 Training method and device for money backwashing model and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211475923.3A CN116128068A (en) 2019-09-05 2019-09-05 Training method and device for money backwashing model and electronic equipment
CN201910839229.7A CN110689135B (en) 2019-09-05 2019-09-05 Anti-money laundering model training method and device and electronic equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910839229.7A Division CN110689135B (en) 2019-09-05 2019-09-05 Anti-money laundering model training method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116128068A true CN116128068A (en) 2023-05-16

Family

ID=69107787

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910839229.7A Active CN110689135B (en) 2019-09-05 2019-09-05 Anti-money laundering model training method and device and electronic equipment
CN202211475923.3A Pending CN116128068A (en) 2019-09-05 2019-09-05 Training method and device for money backwashing model and electronic equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910839229.7A Active CN110689135B (en) 2019-09-05 2019-09-05 Anti-money laundering model training method and device and electronic equipment

Country Status (1)

Country Link
CN (2) CN110689135B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429282B (en) * 2020-03-27 2023-08-25 中国工商银行股份有限公司 Transaction money back-flushing method and device based on money back-flushing model migration
CN112116411A (en) * 2020-08-10 2020-12-22 第四范式(北京)技术有限公司 Training method, device and system for commodity recommendation sequencing model
CN112116478A (en) * 2020-09-28 2020-12-22 中国建设银行股份有限公司 Method and device for processing suspicious bank anti-money-laundering report
CN113256300B (en) * 2021-05-27 2023-04-07 支付宝(杭州)信息技术有限公司 Transaction processing method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903441B (en) * 2014-04-04 2015-07-01 山东省计算中心 Road traffic state distinguishing method based on semi-supervised learning
CN107507613B (en) * 2017-07-26 2021-03-16 合肥美的智能科技有限公司 Scene-oriented Chinese instruction identification method, device, equipment and storage medium
CN108053087A (en) * 2017-10-20 2018-05-18 深圳前海微众银行股份有限公司 Anti money washing monitoring method, equipment and computer readable storage medium
CN109359793B (en) * 2018-08-03 2020-11-17 创新先进技术有限公司 Prediction model training method and device for new scene
CN109460795A (en) * 2018-12-17 2019-03-12 北京三快在线科技有限公司 Classifier training method, apparatus, electronic equipment and computer-readable medium
CN109960800B (en) * 2019-03-13 2023-06-27 安徽省泰岳祥升软件有限公司 Weak supervision text classification method and device based on active learning
CN109919684A (en) * 2019-03-18 2019-06-21 上海盛付通电子支付服务有限公司 For generating method, electronic equipment and the computer readable storage medium of information prediction model
CN109960808B (en) * 2019-03-26 2023-02-07 广东工业大学 Text recognition method, device and equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN110689135B (en) 2022-10-11
CN110689135A (en) 2020-01-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination