CN115330401A - Illegal merchant identification model construction method and device and illegal merchant identification method - Google Patents

Info

Publication number
CN115330401A
Authority
CN
China
Prior art keywords
merchant
sample data
edge
classifier
specified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210079681.XA
Other languages
Chinese (zh)
Inventor
潘骏
牛媛媛
王颖卓
邹勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202210079681.XA
Publication of CN115330401A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products

Landscapes

  • Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to a method for constructing an illegal merchant identification model. The method comprises: an initial sample obtaining step of obtaining first-type merchant sample data as initial samples to form a training set; a model training step of training a model on the training set to obtain a classifier; an edge sampling step of classifying second-type merchant sample data with the classifier and acquiring merchant sample data of a specified edge through edge sampling; a sample adding step of performing specified processing on the merchant sample data of the specified edge to obtain first-type merchant sample data and adding it to the training set; a condition judging step of judging whether the classifier meets a specified condition and, if so, proceeding to the model output step, and otherwise repeating the model training step, the edge sampling step and the sample adding step until the classifier meets the specified condition; and a model output step of outputting the current classifier as the illegal merchant identification model.

Description

Illegal merchant identification model construction method and device and illegal merchant identification method
Technical Field
The invention relates to a data processing technology, in particular to a method for constructing an illegal merchant identification model and a method for identifying an illegal merchant.
Background
Illegal merchants are becoming increasingly difficult to monitor and detect, and monitoring approaches that apply machine learning to illegal merchant identification have been adopted to reduce screening time. However, initial exploration and collection of positive and negative samples shows that the number of merchants currently confirmed as illegal is too low and that the remaining merchants are unlabeled samples. Traditional machine learning methods need large-scale labeled data to obtain a high-quality model, but obtaining a large amount of labeled data is time-consuming and labor-intensive, and with the limited resources available in the current business scenario the task is almost impossible to complete.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a method for constructing an illegal merchant identification model capable of intelligently recommending high-quality samples.
Further, the invention also aims to provide an illegal merchant identification method and an illegal merchant identification system that improve the coverage rate and accuracy of illegal merchant identification.
The method of the invention for constructing an illegal merchant identification model is characterized by comprising the following steps:
an initial sample obtaining step of obtaining first-type merchant sample data as initial samples and forming a training set;
a model training step of performing model training and tuning based on the training set to obtain a classifier;
an edge sampling step of classifying second-type merchant sample data by using the classifier obtained in the model training step and acquiring merchant sample data of a specified edge through edge sampling;
a sample adding step of performing specified processing on the merchant sample data of the specified edge acquired in the edge sampling step to obtain first-type merchant sample data and adding the first-type merchant sample data to the training set;
a condition judging step of judging whether the classifier meets a specified condition and, if so, proceeding to the following model output step, and otherwise repeating the model training step, the edge sampling step and the sample adding step until the classifier meets the specified condition; and
a model output step of outputting the current classifier as the illegal merchant identification model.
Optionally, the first-type merchant sample data is merchant sample data labeled as black samples or white samples, and the second-type merchant sample data is merchant sample data not labeled as black or white samples.
Optionally, in the condition judging step, judging whether the classifier meets a specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier meets a first specified condition.
Optionally, judging whether the sample data of the specified edge obtained by edge sampling with the classifier meets the first specified condition includes: judging whether the amount of sample data of the specified edge obtained by edge sampling with the classifier is smaller than a first threshold.
Optionally, in the condition judging step, judging whether the classifier meets a specified condition includes: judging whether the merchant sample data in the training set of the classifier meets a second specified condition.
Optionally, judging whether the merchant sample data in the training set of the classifier meets the second specified condition includes: judging whether the amount of merchant sample data in the training set of the classifier is larger than a second threshold.
Optionally, acquiring merchant sample data of a specified edge through edge sampling includes:
performing edge sampling by using the classifier obtained in the model training step, and taking merchant sample data whose prediction probability falls within a specified threshold range as the merchant sample data of the specified edge.
Optionally, acquiring merchant sample data of a specified edge through edge sampling includes:
performing edge sampling by using the classifier obtained in the model training step, and taking merchant sample data whose confidence falls within a specified threshold range as the merchant sample data of the specified edge.
Optionally, merchant sample data with a prediction probability of 0.4 to 0.6 is taken as the merchant sample data of the specified edge.
Optionally, performing specified processing on the merchant sample data of the specified edge acquired in the edge sampling step to obtain first-type merchant sample data includes: labeling the merchant sample data of the specified edge acquired in the edge sampling step as black samples or white samples.
Optionally, the classifier employs xgboost.
The illegal merchant identification method in one aspect of the present invention is characterized by including:
acquiring characteristic data of a merchant to be identified;
inputting the characteristic data of the merchant to be identified into an illegal merchant identification model constructed by the illegal merchant identification model construction method of any one of claims 1-11; and
identifying, by using the illegal merchant identification model, whether the merchant to be identified is an illegal merchant.
The illegal merchant identification model construction device in one aspect of the present invention is characterized by including:
an initial sample acquisition module, used for acquiring first-type merchant sample data as initial samples and forming a training set;
a model training module, used for performing model training and tuning based on the training set to obtain a classifier;
an edge sampling module, used for classifying second-type merchant sample data by using the classifier obtained by the model training module and acquiring merchant sample data of a specified edge through edge sampling;
a sample adding module, used for performing specified processing on the merchant sample data of the specified edge acquired by the edge sampling module to obtain first-type merchant sample data and adding the first-type merchant sample data to the training set;
a condition judging module, used for judging whether the classifier meets a specified condition and, if so, triggering the action of the following model output module, and otherwise repeating the actions of the model training module, the edge sampling module and the sample adding module until the classifier meets the specified condition; and
a model output module, used for outputting the current classifier as the illegal merchant identification model.
Optionally, the first-type merchant sample data is merchant sample data labeled as black samples or white samples, and the second-type merchant sample data is merchant sample data not labeled as black or white samples.
Optionally, in the condition judging module, judging whether the classifier meets a specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier meets a first specified condition.
Optionally, in the condition judging module, judging whether the sample data of the specified edge obtained by edge sampling with the classifier meets the first specified condition includes: judging whether the amount of sample data of the specified edge obtained by edge sampling with the classifier is smaller than a first threshold.
Optionally, in the condition judging module, judging whether the classifier meets a specified condition includes: judging whether the merchant sample data in the training set of the classifier meets a second specified condition.
Optionally, in the condition judging module, judging whether the merchant sample data in the training set of the classifier meets the second specified condition includes: judging whether the amount of merchant sample data in the training set of the classifier is larger than a second threshold.
Optionally, in the edge sampling module, acquiring merchant sample data of a specified edge through edge sampling includes: performing edge sampling by using the classifier obtained by the model training module, and taking merchant sample data whose prediction probability falls within a specified threshold range as the merchant sample data of the specified edge.
Optionally, in the edge sampling module, acquiring merchant sample data of a specified edge through edge sampling includes:
performing edge sampling by using the classifier obtained by the model training module, and taking merchant sample data whose confidence meets a specified threshold as the merchant sample data of the specified edge.
Optionally, in the edge sampling module, merchant sample data with a prediction probability of 0.4 to 0.6 is taken as the merchant sample data of the specified edge.
Optionally, in the sample adding module, performing specified processing on the merchant sample data of the specified edge acquired by the edge sampling module to obtain first-type merchant sample data includes: labeling the merchant sample data of the specified edge acquired by the edge sampling module as black samples or white samples.
Optionally, the classifier employs xgboost.
The computer readable medium in one aspect of the invention stores a computer program, and is characterized in that the computer program, when executed by a processor, implements the illegal merchant identification model construction method described above.
The computer device in one aspect of the present invention includes a storage module, a processor, and a computer program stored on the storage module and executable on the processor, and is characterized in that the processor implements the method for constructing the illegal merchant identification model when executing the computer program.
Drawings
Fig. 1 is a schematic diagram showing a main flow of the illegal merchant identification model construction method of the present invention.
Fig. 2 is a schematic diagram showing a flow of the illegal merchant identification model construction method according to an embodiment of the present invention.
Fig. 3 is a block diagram showing the structure of the illegal merchant identification model construction device according to the present invention.
Detailed Description
The following description is of some of the several embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
For the purposes of brevity and explanation, the principles of the present invention are described herein with reference primarily to exemplary embodiments thereof. However, those skilled in the art will readily recognize that the same principles are equally applicable to all types of illegal merchant identification model building methods, illegal merchant identification methods, and illegal merchant identification systems and that these same principles may be implemented therein, as well as any such variations, without departing from the true spirit and scope of the present patent application.
Moreover, in the following description, reference is made to the accompanying drawings that illustrate certain exemplary embodiments. Electrical, mechanical, logical, and structural changes may be made to these embodiments without departing from the spirit and scope of the invention. In addition, while a feature of the invention may have been disclosed with respect to only one of several implementations/embodiments, such feature may be combined with one or more other features of the other implementations/embodiments as may be desired and/or advantageous for any given or identified function. The following description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims and their equivalents.
Terms such as "comprising" and "comprises" mean that, in addition to having elements (modules) and steps that are directly and explicitly stated in the description and claims, the solution of the invention does not exclude the presence of other elements (modules) and steps that are not directly or explicitly stated.
First, a method for constructing the illegal merchant identification model according to the present invention will be described.
By studying traditional illegal merchant identification methods, the inventors found that such methods generally consider only certain rules, for example whether a merchant has a large number of user transactions that deviate from the users' consumption habits. Manually designed rules typically cover too few scenarios, so coverage is narrow, many illegal merchants cannot be identified effectively, and merchants can easily evade the rules. Such rules also depend heavily on the designer's working experience, so the illegal merchants a model can identify are limited by the designer's knowledge, and when merchants adopt new illegal methods the identification capability of the model drops sharply. On the other hand, traditional machine learning methods need large-scale labeled data to train a high-quality model.
In view of these problems in the conventional technology, identifying illegal merchants by machine learning first requires solving the problem that few labeled samples are currently available. Starting from this point, the illegal merchant identification model construction method of the invention intelligently recommends, in a scenario with only a small number of labeled samples, the unlabeled samples that would most improve the model, so as to help the machine learning modeling process obtain a higher-quality labeled sample data set and to construct a merchant identification model with high identification accuracy.
Fig. 1 is a schematic diagram showing a main flow of the illegal merchant identification model construction method of the present invention.
As shown in fig. 1, the method for constructing the illegal merchant identification model of the present invention includes:
initial sample acquisition step S100: acquiring first-type merchant sample data as initial samples and forming a training set;
model training step S200: performing model training and tuning based on the training set to obtain a classifier;
edge sampling step S300: classifying second-type merchant sample data by using the classifier obtained in the model training step and acquiring merchant sample data of a specified edge through edge sampling;
sample adding step S400: performing specified processing on the merchant sample data of the specified edge acquired in the edge sampling step to obtain first-type merchant sample data and adding the first-type merchant sample data to the training set;
condition judging step S500: judging whether the classifier meets a specified condition and, if so, proceeding to the following model output step, and otherwise repeating the model training step, the edge sampling step and the sample adding step until the classifier meets the specified condition; and
model output step S600: outputting the current classifier as the illegal merchant identification model.
The first-type merchant sample data is merchant sample data labeled as black samples or white samples, and the second-type merchant sample data is merchant sample data not labeled as black or white samples. In the sample adding step S400, performing the specified processing on the merchant sample data of the specified edge acquired in the edge sampling step to obtain first-type merchant sample data includes: labeling the merchant sample data of the specified edge acquired in the edge sampling step S300 as black samples or white samples.
In the edge sampling step S300, acquiring merchant sample data of a specified edge through edge sampling includes: performing edge sampling by using the classifier obtained in the model training step, and taking merchant sample data whose prediction probability falls within a specified threshold range as the merchant sample data of the specified edge.
In the condition judging step S500, judging whether the classifier meets a specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier meets a first specified condition, or judging whether the merchant sample data in the training set of the classifier meets a second specified condition.
Here, as an example, xgboost is employed as the classifier in the present invention.
As described above, a small number of existing illegal merchants are used as black samples and a portion of normal merchants are used as white samples (the remaining merchants carry no black or white label), and the black and white labels serve as the training set labels. Historical transaction characteristics, cash register characteristics and the like of merchants, bank cards, institutions and the like are used as modeling features to train an initial base classifier. The trained initial classifier is then used to predict the unlabeled merchants, and edge sampling collects the merchant samples whose prediction probability falls within a specified threshold range (for example, 0.4 to 0.6), that is, low-confidence samples that are difficult to distinguish. These merchant samples are given to business experts for reconfirmation, verification and labeling (as black samples or white samples), and the labeled samples are added to the training set. Model training and tuning are performed on the updated training set, uncertain edge sampling continues, hard-to-distinguish sample data is again extracted and reconfirmed, and the process is repeated several times until the specified condition is met, for example until enough labeled samples have been obtained or few enough low-confidence samples remain.
The method for constructing the illegal merchant identification model of the invention addresses the problem that little labeled sample data is currently available: a sampling strategy for obtaining high-quality samples automatically screens out a small number of unlabeled samples and provides them to business experts for labeling. The samples screened in this way are those expected to improve the model most, that is, high-quality samples. Therefore, compared with a model that achieves a good detection effect only with large-scale labeled data, the invention can start from fewer labeled merchant samples, intelligently recommend samples to be labeled from a large number of unlabeled merchant samples, have business experts label them, and obtain a relatively good illegal merchant identification effect while reducing labeling cost. Moreover, as the number of high-quality labeled samples grows, the learning effect of the model improves, which ultimately helps ensure the timeliness and accuracy of the labeling work and improves efficiency.
Next, an embodiment of the illegal merchant identification model construction method according to the present invention will be described.
Fig. 2 is a schematic diagram showing a flow of the illegal merchant identification model construction method according to an embodiment of the present invention.
In the following description, for ease of understanding, all sample data is divided into a set A and a set B. Set A is the data set of labeled black and white samples, and set B is the data set of unlabeled gray samples.
As shown in fig. 2, the method for constructing the illegal merchant identification model according to an embodiment of the present invention includes:
step S1: taking a batch of labeled samples (including black samples and white samples) as initial samples, the initial samples forming the current set A;
step S2: performing model training and tuning according to the basic characteristics of the initial samples to obtain an initial classifier;
step S3: classifying a part of the unlabeled gray samples in set B by using the trained classifier;
step S4: obtaining sample data that is difficult to classify through edge sampling;
step S5: taking the collected sample data as intelligently recommended unlabeled samples, and labeling them;
step S6: adding the labeled sample data (including black samples and white samples) to set A, recalculating the basic characteristics of the samples, and performing model training and tuning again to obtain a classifier;
step S7: judging whether the number of samples or the edge data obtained by the classifier meets a preset specified condition, repeating steps S3-S6 if not, and continuing to step S8 if so; and
step S8: outputting the finally trained classifier as the illegal merchant identification model.
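Steps S1-S8 can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function and parameter names (build_model, train_fn, label_fn, max_labeled, min_edge, max_rounds) are hypothetical, the business expert is replaced by a callable, the safety cap on rounds is an addition, and the classifier is a toy one-dimensional logistic stand-in rather than xgboost.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Sample = Tuple[Features, int]               # (features, 1 = black, 0 = white)
Classifier = Callable[[Features], float]    # returns P(sample is black)


def build_model(
    set_a: List[Sample],                    # labeled samples (set A)
    set_b: List[Features],                  # unlabeled gray samples (set B)
    train_fn: Callable[[List[Sample]], Classifier],
    label_fn: Callable[[Features], int],    # stand-in for the business expert
    low: float = 0.4,
    high: float = 0.6,
    max_labeled: int = 1000,                # assumed "enough labeled samples" threshold
    min_edge: int = 1,                      # assumed "few enough edge samples" threshold
    max_rounds: int = 10,                   # assumed safety cap, not in the source
) -> Classifier:
    labeled = list(set_a)                                        # S1
    pool = list(set_b)
    clf = train_fn(labeled)                                      # S2
    for _ in range(max_rounds):
        scores = [clf(x) for x in pool]                          # S3
        edge = {i for i, p in enumerate(scores) if low <= p <= high}  # S4
        labeled += [(pool[i], label_fn(pool[i])) for i in sorted(edge)]  # S5
        pool = [x for i, x in enumerate(pool) if i not in edge]
        clf = train_fn(labeled)                                  # S6
        if len(labeled) > max_labeled or len(edge) < min_edge:   # S7
            break
    return clf                                                   # S8


def toy_train(samples: List[Sample]) -> Classifier:
    """Toy stand-in: logistic curve centred midway between the class means."""
    black = [x[0] for x, y in samples if y == 1]
    white = [x[0] for x, y in samples if y == 0]
    mid = (sum(black) / len(black) + sum(white) / len(white)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x[0] - mid)))


set_a = [((0.0,), 0), ((1.0,), 0), ((9.0,), 1), ((10.0,), 1)]
set_b = [(4.8,), (5.3,), (2.0,), (8.0,)]
expert = lambda x: int(x[0] > 6.0)          # hypothetical ground truth
model = build_model(set_a, set_b, toy_train, expert)
```

In this toy run the initial classifier scores the merchant at 5.3 slightly above 0.5; after the two edge samples are labeled and the classifier is retrained, the same merchant is scored below 0.5. A real classifier would be plugged in through train_fn.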
As an example, the sample data that is relatively difficult to classify, obtained in step S4, may be selected as the sample data whose prediction probability lies within a predetermined range, for example 0.4-0.6, 0.35-0.65, 0.3-0.7 or 0.55-0.75. That is, a relatively intermediate range is selected within the 0-1 span of prediction probabilities, because a prediction probability near the middle indicates that the sample could be either a white sample or a black sample and is therefore relatively difficult to classify.
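The range-based selection can be illustrated with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, the bounds are treated as inclusive (the text does not say), and the confidence of a binary prediction is taken to be the probability of the more likely class.

```python
from typing import List, Sequence


def select_edge_samples(
    probabilities: Sequence[float],
    low: float = 0.4,
    high: float = 0.6,
) -> List[int]:
    """Return indices of samples whose predicted probability lies in [low, high]."""
    return [i for i, p in enumerate(probabilities) if low <= p <= high]


def confidence(p: float) -> float:
    """Assumed definition: probability assigned to the more likely class."""
    return max(p, 1.0 - p)


probs = [0.05, 0.41, 0.58, 0.93, 0.50, 0.72]
edge = select_edge_samples(probs)
```

For a symmetric band such as 0.4-0.6, selecting by prediction probability is the same as selecting samples whose confidence, as defined above, is at most 0.6. An off-centre band such as 0.55-0.75 has no such single confidence cut-off.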
By focusing on the prediction probability, the inventors identify the sample data that is difficult to distinguish and select it for labeling as training data for the next round of classifier training, which improves the accuracy of the classifier while greatly reducing the amount of sample data that must be labeled.
In step S7, judging whether the number of samples or the edge data obtained by the classifier meets the preset specified condition means, for example, judging whether the number of labeled samples is sufficiently large (for example, larger than a predetermined threshold) or whether the predicted edge data is sufficiently small (for example, the number of edge data items is smaller than a predetermined threshold).
Fig. 3 is a schematic flow chart showing the illegal merchant identification by using the illegal merchant identification model constructed by the invention.
As shown in fig. 3, the process of identifying the illegal merchant by using the illegal merchant identification model constructed by the present invention includes:
step S21: acquiring characteristic data of a merchant to be identified;
step S22: inputting the characteristic data of the merchant to be identified into the illegal merchant identification model trained by the process of Fig. 2; and
step S23: identifying, through the illegal merchant identification model, whether the merchant to be identified is an illegal merchant.
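Steps S21-S23 amount to scoring each merchant's characteristic data with the trained model and applying a decision cut-off. A minimal sketch follows; the 0.5 cut-off, the function name and the merchant IDs are illustrative assumptions not taken from the source, and the model is any callable returning a probability (a toy stand-in is used here).

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def identify_illegal_merchants(
    model: Callable[[Features], float],     # trained classifier, returns P(illegal)
    merchants: Dict[str, Features],         # S21: merchant id -> characteristic data
    cutoff: float = 0.5,                    # assumed decision cut-off
) -> List[str]:
    """S22-S23: score every merchant and return the ids flagged as illegal."""
    return sorted(mid for mid, feats in merchants.items() if model(feats) >= cutoff)


# Toy stand-in model: the first feature is already a probability-like score.
toy_model = lambda feats: min(1.0, max(0.0, feats[0]))
to_check = {"M001": (0.91,), "M002": (0.12,), "M003": (0.50,)}
flagged = identify_illegal_merchants(toy_model, to_check)
```

The cut-off trades coverage against accuracy, so in practice it would be chosen on held-out labeled data rather than fixed at 0.5.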
Here, a specific example will be described as to the technical effect of the present invention.
In constructing the illegal merchant identification model with this method, only 6562 confirmed illegal merchants and 298233 normal merchants were used as initial samples (the initial set A), and the remaining 9159822 merchants were unlabeled (the initial set B). Unlabeled samples were intelligently recommended through the process shown in Fig. 2 over three iterations (that is, steps S3-S6 were repeated), and a total of 52693 unlabeled samples were selected and confirmed: 5338 were confirmed as illegal merchants and the rest as normal merchants. These samples were added to the training set (set A) and the model was retrained, raising the coverage rate of the model from 30% to more than 92% and the accuracy from 65% to more than 83%.
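The figures in this example imply a few derived quantities that help put the labeling effort in perspective. The short calculation below uses only the counts stated above; the variable names are illustrative.

```python
initial_black = 6_562          # confirmed illegal merchants in the initial set A
initial_white = 298_233        # normal merchants in the initial set A
unlabeled_pool = 9_159_822     # merchants in the initial set B
recommended = 52_693           # samples recommended over three iterations
confirmed_black = 5_338        # recommended samples confirmed as illegal

confirmed_white = recommended - confirmed_black
final_black = initial_black + confirmed_black
final_white = initial_white + confirmed_white
share_of_pool_labeled = recommended / unlabeled_pool
black_rate_in_recommended = confirmed_black / recommended

print(confirmed_white, final_black, final_white)
print(f"{share_of_pool_labeled:.2%} of set B labeled, "
      f"{black_rate_in_recommended:.1%} of recommended samples were illegal")
```

In other words, 47355 recommended samples were confirmed as normal, the final training set holds 11900 black and 345588 white samples, and the experts reviewed well under 1% of set B.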
As described above, the invention uses machine learning to intelligently screen out a suitable candidate set for labeling. Compared with simply waiting for all samples to be labeled manually, this greatly reduces labeling cost and yields a high-quality data set, on which a more accurate illegal merchant identification model is trained so that illegal merchants can be identified more accurately and comprehensively.
The invention also provides a device for constructing the illegal merchant identification model, and fig. 3 is a structural block diagram showing the device for constructing the illegal merchant identification model.
As shown in fig. 3, the illegal merchant identification model building apparatus according to the present invention includes:
an initial sample obtaining module 100, configured to obtain first-type merchant sample data as initial samples and form a training set;
a model training module 200, configured to perform model training and tuning based on the training set to obtain a classifier;
an edge sampling module 300, configured to classify second-type merchant sample data by using the classifier obtained by the model training module and acquire merchant sample data of a specified edge through edge sampling;
a sample adding module 400, configured to perform specified processing on the merchant sample data of the specified edge acquired by the edge sampling module to obtain first-type merchant sample data, and add the first-type merchant sample data to the training set;
a condition judging module 500, configured to judge whether the classifier meets a specified condition and, if so, trigger the action of the following model output module, and otherwise repeat the actions of the model training module, the edge sampling module and the sample adding module until the classifier meets the specified condition; and
a model output module 600, configured to output the current classifier as the illegal merchant identification model.
The first type of merchant sample data is merchant sample data marked with a black sample and a white sample, and the second type of merchant sample data is merchant sample data not marked with the black sample and the white sample.
In the condition judging module 500, judging whether the classifier satisfies the specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier satisfies a first specified condition.
In the condition judging module 500, judging whether the sample data of the specified edge obtained by edge sampling with the classifier satisfies the first specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier is smaller than a first threshold.
In the condition judging module 500, judging whether the classifier satisfies the specified condition includes: judging whether the merchant sample data in the training set of the classifier satisfies a second specified condition.
In the condition judging module 500, judging whether the merchant sample data in the training set of the classifier satisfies the second specified condition includes: judging whether the merchant sample data in the training set of the classifier is larger than a second threshold.
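The two conditions above can be sketched as a single check. This is only an illustrative sketch: it assumes that "smaller than a first threshold" and "larger than a second threshold" refer to sample counts, and that satisfying either condition is enough to stop. The function and parameter names are hypothetical and do not come from the patent.

```python
def classifier_satisfies_condition(num_edge_samples, training_set_size,
                                   first_threshold, second_threshold):
    """Return True when either assumed stopping condition holds."""
    # First specified condition (assumed to be a count): edge sampling
    # now yields fewer samples than the first threshold.
    few_edge_samples = num_edge_samples < first_threshold
    # Second specified condition (assumed to be a count): the training
    # set has grown beyond the second threshold.
    large_training_set = training_set_size > second_threshold
    # Combining the two with "or" is an assumption of this sketch.
    return few_edge_samples or large_training_set
```

An implementation that uses only one of the two conditions would simply drop the other term.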
In the edge sampling module 300, obtaining merchant sample data of the specified edge through edge sampling includes: performing edge sampling with the classifier obtained by the model training module, and taking merchant sample data whose prediction probability falls within a specified threshold range as the merchant sample data of the specified edge.
In the edge sampling module 300, obtaining merchant sample data of the specified edge through edge sampling includes: performing edge sampling with the classifier obtained by the model training module, and taking merchant sample data whose confidence meets a specified threshold as the merchant sample data of the specified edge.
In the edge sampling module 300, merchant sample data with a prediction probability of 0.4-0.6 is taken as the merchant sample data of the specified edge.
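A minimal sketch of the two selection rules follows. It assumes that the classifier's predicted probabilities are already available as a list of floats, that the 0.4-0.6 range is inclusive at both ends, and that "confidence" means max(p, 1 - p). All three are assumptions of this sketch, as are the function names.

```python
def edge_sample(probabilities, low=0.4, high=0.6):
    """Indices of samples whose predicted probability lies in [low, high]."""
    return [i for i, p in enumerate(probabilities) if low <= p <= high]


def edge_sample_by_confidence(probabilities, max_confidence=0.6):
    """Indices of samples whose confidence max(p, 1 - p) is at most
    max_confidence (a hypothetical definition of confidence)."""
    return [i for i, p in enumerate(probabilities)
            if max(p, 1.0 - p) <= max_confidence]
```

Samples near 0.5 are the ones the classifier is least sure about, which is why they are the candidates recommended for labeling.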
In the sample adding module 400, performing specified processing on the merchant sample data of the specified edge obtained by the edge sampling module to obtain a first type of merchant sample data includes: labeling the merchant sample data of the specified edge obtained by the edge sampling module as black samples or white samples.
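The labeling and adding action might look like the sketch below. The labeling itself is represented by a caller-supplied function `label_fn`, standing in for a human annotator. Encoding black as 1 and white as 0, and removing the labeled samples from the unlabeled pool, are assumptions of this sketch rather than details stated in the text.

```python
def add_labeled_edge_samples(training_set, unlabeled_pool, edge_indices, label_fn):
    """Label the selected pool entries and move them into the training set.

    training_set   -- list of (features, label) pairs; 1 = black, 0 = white
    unlabeled_pool -- list of feature records without labels
    edge_indices   -- positions in unlabeled_pool chosen by edge sampling
    label_fn       -- hypothetical stand-in for the manual labeling step
    """
    chosen = set(edge_indices)
    for i in sorted(chosen):
        features = unlabeled_pool[i]
        training_set.append((features, label_fn(features)))
    # Keep only the pool entries that were not selected (an assumption).
    remaining = [x for i, x in enumerate(unlabeled_pool) if i not in chosen]
    return training_set, remaining
```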
The invention also provides a computer-readable medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the illegal merchant identification model construction method described above.
The invention also provides a computer device comprising a storage module, a processor, and a computer program that is stored on the storage module and can run on the processor, characterized in that the processor implements the illegal merchant identification model construction method described above when executing the computer program.
As described above, the illegal merchant identification model construction method of the present invention intelligently recommends samples to be labeled from a large number of unlabeled merchant samples. The samples most likely to improve the machine learning classification model are screened out of the unlabeled samples as an uncertain-sample candidate set. Because this recommended set accounts for only a small portion of all unlabeled merchants, the labor cost and time cost of sample labeling are greatly reduced. The recommended unlabeled samples are then labeled and become determined samples, and the process is repeated until enough high-quality labeled samples have accumulated or edge sampling yields sufficiently few low-confidence samples. Finally, a machine learning model is constructed from the labeled samples, for example by performing rolling learning and feature updating with an xgboost classifier. After convergence, the model classifies whether merchants are in violation, so that the coverage and accuracy of illegal merchant identification are greatly improved.
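The overall loop can be sketched end to end as follows. To stay self-contained, the sketch replaces the xgboost classifier with a toy one-dimensional logistic score, so it illustrates only the control flow, not the actual model. The names `train`, `build_model`, and `oracle`, the default thresholds, and the `max_rounds` safety cap are all hypothetical.

```python
import math


def train(training_set):
    """Toy stand-in for the classifier (NOT xgboost): a logistic score
    centred on the midpoint of the two class means. Requires at least
    one black (1) and one white (0) sample in the training set."""
    black = [x for x, y in training_set if y == 1]
    white = [x for x, y in training_set if y == 0]
    mid = (sum(black) / len(black) + sum(white) / len(white)) / 2.0
    return lambda x: 1.0 / (1.0 + math.exp(-(x - mid)))


def build_model(labeled, pool, oracle, first_threshold=1,
                second_threshold=50, max_rounds=20):
    training_set = list(labeled)              # initial sample acquisition
    pool = list(pool)
    for _ in range(max_rounds):               # safety cap (assumption)
        predict = train(training_set)         # model training
        # Edge sampling: predicted probability in the 0.4-0.6 band.
        edge = [x for x in pool if 0.4 <= predict(x) <= 0.6]
        # Sample adding: the oracle plays the role of the human labeler.
        training_set += [(x, oracle(x)) for x in edge]
        pool = [x for x in pool if x not in edge]
        # Condition judgment (both conditions read as counts, an assumption).
        if len(edge) < first_threshold or len(training_set) > second_threshold:
            break
    return predict, training_set              # model output
```

With two labeled seeds at -3.0 and 3.0 and a pool containing a few values near zero, the first round labels the near-zero values and the second round finds no edge samples, so the loop stops.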
The above examples mainly illustrate the illegal merchant identification model construction method, the illegal merchant identification method, and the illegal merchant identification system of the present invention. Although only a few embodiments of the present invention have been described in detail, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (25)

1. A method for constructing an illegal merchant identification model, characterized by comprising the following steps:
an initial sample obtaining step of obtaining first type merchant sample data as an initial sample and forming a training set;
a model training step of performing model training and tuning based on the training set to obtain a classifier;
an edge sampling step of classifying second type merchant sample data by using the classifier obtained in the model training step and obtaining merchant sample data of a specified edge through edge sampling;
a sample adding step of performing specified processing on the merchant sample data of the specified edge obtained in the edge sampling step to obtain first type merchant sample data and adding the first type merchant sample data to the training set;
a condition judging step of judging whether the classifier satisfies a specified condition, and if so, continuing to the following model output step, otherwise repeating the model training step, the edge sampling step, and the sample adding step until the classifier satisfies the specified condition; and
a model output step of outputting the current classifier as the illegal merchant identification model.
2. The illegal merchant identification model building method of claim 1,
the first type of merchant sample data is merchant sample data that has been labeled as black samples or white samples, and the second type of merchant sample data is merchant sample data that has not been labeled as black samples or white samples.
3. The illegal merchant identification model building method of claim 1,
in the condition judging step, judging whether the classifier satisfies the specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier satisfies a first specified condition.
4. The illegal merchant identification model building method of claim 3,
judging whether the sample data of the specified edge obtained by edge sampling with the classifier satisfies the first specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier is smaller than a first threshold.
5. The illegal merchant identification model building method of claim 1,
in the condition judging step, judging whether the classifier satisfies the specified condition includes: judging whether the merchant sample data in the training set of the classifier satisfies a second specified condition.
6. The illegal merchant identification model building method of claim 5,
judging whether the merchant sample data in the training set of the classifier satisfies the second specified condition includes: judging whether the merchant sample data in the training set of the classifier is larger than a second threshold.
7. The illegal merchant identification model building method of claim 1,
obtaining the merchant sample data of the specified edge through edge sampling includes:
performing edge sampling with the classifier obtained in the model training step, and taking merchant sample data whose prediction probability falls within a specified threshold range as the merchant sample data of the specified edge.
8. The illegal merchant identification model building method of claim 1,
obtaining the merchant sample data of the specified edge through edge sampling includes:
performing edge sampling with the classifier obtained in the model training step, and taking merchant sample data whose confidence falls within a specified threshold range as the merchant sample data of the specified edge.
9. The illegal merchant identification model building method of claim 7,
merchant sample data with a prediction probability of 0.4-0.6 is taken as the merchant sample data of the specified edge.
10. The illegal merchant identification model building method of claim 2,
performing specified processing on the merchant sample data of the specified edge obtained in the edge sampling step to obtain first type merchant sample data includes: labeling the merchant sample data of the specified edge obtained in the edge sampling step as black samples or white samples.
11. The illegal merchant identification model building method of claim 1,
the classifier employs xgboost.
12. An illegal merchant identification method, characterized by comprising the following steps:
acquiring feature data of a merchant to be identified;
inputting the feature data of the merchant to be identified into an illegal merchant identification model constructed by the illegal merchant identification model construction method of any one of claims 1-11; and
identifying, by using the illegal merchant identification model, whether the merchant to be identified is an illegal merchant.
13. An illegal merchant identification model construction device, characterized by comprising:
an initial sample obtaining module, configured to obtain first type merchant sample data as an initial sample and form a training set;
a model training module, configured to perform model training and tuning based on the training set to obtain a classifier;
an edge sampling module, configured to classify second type merchant sample data by using the classifier obtained by the model training module and to obtain merchant sample data of a specified edge through edge sampling;
a sample adding module, configured to perform specified processing on the merchant sample data of the specified edge obtained by the edge sampling module to obtain first type merchant sample data, and to add the first type merchant sample data to the training set;
a condition judging module, configured to judge whether the classifier satisfies a specified condition and, if so, to execute the action of the following model output module, and otherwise to repeat the actions executed by the model training module, the edge sampling module, and the sample adding module until the classifier satisfies the specified condition; and
a model output module, configured to output the current classifier as the illegal merchant identification model.
14. The illegal merchant identification model building apparatus of claim 13,
the first type of merchant sample data is merchant sample data that has been labeled as black samples or white samples, and the second type of merchant sample data is merchant sample data that has not been labeled as black samples or white samples.
15. The illegal merchant identification model building apparatus of claim 13,
in the condition judging module, judging whether the classifier satisfies the specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier satisfies a first specified condition.
16. The illegal merchant identification model building apparatus of claim 15,
in the condition judging module, judging whether the sample data of the specified edge obtained by edge sampling with the classifier satisfies the first specified condition includes: judging whether the sample data of the specified edge obtained by edge sampling with the classifier is smaller than a first threshold.
17. The illegal merchant identification model building apparatus of claim 13,
in the condition judging module, judging whether the classifier satisfies the specified condition includes: judging whether the merchant sample data in the training set of the classifier satisfies a second specified condition.
18. The illegal merchant identification model building apparatus of claim 17,
in the condition judging module, judging whether the merchant sample data in the training set of the classifier satisfies the second specified condition includes: judging whether the merchant sample data in the training set of the classifier is larger than a second threshold.
19. The illegal merchant identification model building apparatus of claim 13,
in the edge sampling module, obtaining the merchant sample data of the specified edge through edge sampling includes: performing edge sampling with the classifier obtained by the model training module, and taking merchant sample data whose prediction probability falls within a specified threshold range as the merchant sample data of the specified edge.
20. The illegal merchant identification model building apparatus of claim 13,
in the edge sampling module, obtaining the merchant sample data of the specified edge through edge sampling includes:
performing edge sampling with the classifier obtained by the model training module, and taking merchant sample data whose confidence meets a specified threshold as the merchant sample data of the specified edge.
21. The illegal merchant identification model building apparatus of claim 19,
in the edge sampling module, merchant sample data with a prediction probability of 0.4-0.6 is taken as the merchant sample data of the specified edge.
22. The illegal merchant identification model building apparatus of claim 14,
in the sample adding module, performing specified processing on the merchant sample data of the specified edge obtained by the edge sampling module to obtain first type merchant sample data includes: labeling the merchant sample data of the specified edge obtained by the edge sampling module as black samples or white samples.
23. The illegal merchant identification model building apparatus of claim 13,
the classifier employs xgboost.
24. A computer-readable medium having a computer program stored thereon, characterized in that
the computer program, when executed by a processor, implements the illegal merchant identification model construction method according to any one of claims 1 to 11.
25. A computer device, comprising a storage module, a processor, and a computer program that is stored on the storage module and can run on the processor, characterized in that the processor implements the illegal merchant identification model construction method according to any one of claims 1 to 11 when executing the computer program.
CN202210079681.XA 2022-01-24 2022-01-24 Illegal merchant identification model construction method and device and illegal merchant identification method Pending CN115330401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210079681.XA CN115330401A (en) 2022-01-24 2022-01-24 Illegal merchant identification model construction method and device and illegal merchant identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210079681.XA CN115330401A (en) 2022-01-24 2022-01-24 Illegal merchant identification model construction method and device and illegal merchant identification method

Publications (1)

Publication Number Publication Date
CN115330401A true CN115330401A (en) 2022-11-11

Family

ID=83915765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210079681.XA Pending CN115330401A (en) 2022-01-24 2022-01-24 Illegal merchant identification model construction method and device and illegal merchant identification method

Country Status (1)

Country Link
CN (1) CN115330401A (en)

Similar Documents

Publication Publication Date Title
CN111553387B (en) Personnel target detection method based on Yolov3
JP5176763B2 (en) Low quality character identification method and apparatus
CN111882446A (en) Abnormal account detection method based on graph convolution network
CN113591866B (en) Special operation certificate detection method and system based on DB and CRNN
CN113688851B (en) Data labeling method and device and fine granularity identification method and device
CN105426441B (en) A kind of automatic preprocess method of time series
CN112766218B (en) Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network
CN111241987B (en) Multi-target model visual tracking method based on cost-sensitive three-branch decision
CN110555125A (en) Vehicle retrieval method based on local features
CN111144462A (en) Unknown individual identification method and device for radar signals
CN113283467B (en) Weak supervision picture classification method based on average loss and category-by-category selection
CN111242131B (en) Method, storage medium and device for identifying images in intelligent paper reading
CN103093239B (en) A kind of merged point to neighborhood information build drawing method
CN103310088A (en) Automatic detecting method of abnormal illumination power consumption
CN110349119B (en) Pavement disease detection method and device based on edge detection neural network
CN111797772A (en) Automatic invoice image classification method, system and device
CN115330401A (en) Illegal merchant identification model construction method and device and illegal merchant identification method
CN115984639A (en) Intelligent detection method for fatigue state of part
CN113192108A (en) Human-in-loop training method for visual tracking model and related device
CN106326882A (en) Fingerprint identification system and fingerprint identification method based on image quality assessment technology
CN116843368B (en) Marketing data processing method based on ARMA model
CN116738551B (en) Intelligent processing method for acquired data of BIM model
Bhanumathi et al. Underwater Fish Species Classification Using Alexnet
CN114239753B (en) Migratable image identification method and device
CN117078233B (en) Maintenance decision method based on road network maintenance comprehensive evaluation index

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination