CN114880663A - Black product cheating identification method and system based on anomaly detection - Google Patents
- Publication number
- CN114880663A (application CN202210343185.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- characteristic
- constructing
- unit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/45—Structures or tools for the administration of authentication
- G06F21/46—Structures or tools for the administration of authentication by designing passwords or checking the strength of passwords
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and system, based on anomaly detection, for identifying cheating by the black industry (organized fraud rings), in the technical field of risk control and security. The method comprises the following steps: S1, acquiring user data; S2, defining the output dimension as the inviter dimension; S3, analyzing, constructing and mining the features common to black-industry users; S4, after the features of step S3 are constructed, training a model on the obtained features; S5, establishing a risk strategy: normalizing the anomaly score to the range 0-100, defining the inviter's risk level according to the data distribution and observed performance, and configuring an appropriate risk-handling strategy. The method dynamically identifies abnormal users from anomaly indicators of user behavior and business activity, quantifies each user's degree of abnormality through weights, and achieves dynamic control by setting a threshold on the degree of abnormality or on the proportion of abnormal users.
Description
Technical Field
The invention relates to the technical field of risk control and security, and in particular to a black-industry cheating identification method and system based on anomaly detection.
Background
With the development of internet technology and the continuing transformation of financial technology, the traditional banking industry has gradually shifted from relying on offline branches to acquire and serve customers toward a combined online and offline business model, and internet banks actively expand their online business to reduce customer acquisition and maintenance costs. Because this model lacks the verification performed by offline sales staff, it breeds black-industry actors and malicious bonus hunters ("wool" users) who exploit business loopholes or the rules of marketing campaigns for profit. This disrupts the healthy operation and growth of user-acquisition (pull-new) campaigns, seriously harms the company's interests, and creates difficulties and bottlenecks for subsequent user conversion.
Internet banks conduct most of their business online: account opening is largely a data-driven interaction with the user, and referral-based social fission is the main customer-acquisition channel, which greatly increases the risk of exploitation by the black industry.
Existing methods for identifying black-industry users are mainly based on association-rule matching or on supervised training with labeled data. Association rules must be defined manually, and their thresholds cannot adapt to the data distribution, so their continued effectiveness cannot be guaranteed. Supervised learning, in turn, requires fraud labels that are hard to define, especially for a new business that has not yet accumulated bad samples early in a campaign; manual labeling is difficult, time-consuming and hard to make complete, so this approach rarely achieves a significant effect in the short term and is not feasible.
Anti-fraud identification in the financial field currently focuses mainly on credit business, where malicious or potentially overdue users are predicted with a user scorecard model. In credit business, user labels are easy to define from historical bad-debt data, and the validity of variables and the model's effect can be evaluated on training samples. In anti-fraud scenarios for operational (marketing) finance, however, there is no clear definition of a fraudulent user.
Anti-fraud user identification in marketing scenarios therefore has to solve several problems. First, the model must automatically identify bad samples, i.e. fraudulent users, in a scenario without training samples. Second, the model output must be interpretable in business terms, i.e. it must provide business evidence confirming the user's fraud. Finally, intercepting the fraudulent users of a marketing campaign must strike a good balance between control strength and business expansion.
Disclosure of Invention
The invention aims to provide a black-industry cheating identification method and system based on anomaly detection, so as to solve the problems described in the Background.
To achieve this purpose, the invention provides the following technical solution:
A black-industry cheating identification method based on anomaly detection comprises the following steps:
S1, acquiring user data;
S2, defining the output dimension as the inviter dimension;
S3, analyzing, constructing and mining the features common to black-industry users;
S4, after the features of step S3 are constructed, training a model on the obtained features;
S5, establishing a risk strategy: normalizing the anomaly score to the range 0-100, defining the inviter's risk level according to the data distribution and observed performance, and configuring an appropriate risk-handling strategy.
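Step S5 leaves open how the score is mapped to 0-100 and how risk levels are cut. The sketch below is one possible reading, under stated assumptions: min-max rescaling stands in for the unspecified normalization, and risk bands are cut by share of inviters so the thresholds follow the score distribution. The function names and the `high_share`/`medium_share` values are hypothetical placeholders, not values from the source.

```python
import math


def to_0_100(scores):
    """Min-max rescale raw anomaly scores s(x, psi) into the 0-100 range (assumed method)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # no spread: treat every inviter alike
    return [100.0 * (s - lo) / (hi - lo) for s in scores]


def risk_levels(points, high_share=0.01, medium_share=0.05):
    """Label the top high_share of inviters "high" and the next medium_share "medium".

    The shares are hypothetical placeholders; in practice they would be tuned
    from the score distribution and the observed performance of each band.
    """
    order = sorted(range(len(points)), key=lambda i: points[i], reverse=True)
    n_high = math.ceil(high_share * len(points))
    n_medium = math.ceil(medium_share * len(points))
    levels = ["low"] * len(points)
    for rank, i in enumerate(order):
        if rank < n_high:
            levels[i] = "high"
        elif rank < n_high + n_medium:
            levels[i] = "medium"
    return levels


points = to_0_100([0.2, 0.4, 0.9, 0.3])
print(risk_levels(points, high_share=0.25, medium_share=0.25))  # prints ['low', 'medium', 'high', 'low']
```

Cutting by share rather than by a fixed score keeps the proportion of flagged inviters stable as scores drift, which corresponds to the "abnormal user proportion" form of control described in the abstract.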
In a further scheme of the invention, step S1 comprises the following steps:
S11, acquiring the user's account-opening and personal information;
S12, acquiring the user's login information;
S13, acquiring the user's inviter information;
S14, acquiring the user's behavior data;
and S15, acquiring the user's transfer records.
In a still further scheme of the invention, step S3 comprises the following steps:
S31, constructing device-aggregation features: counting the invitees who logged in on the same device as the inviter, and the invitees who logged in on the same device as one another;
S32, constructing personal-information aggregation features: the proportion of invitees whose encrypted login password and payment password match those of the inviter, and the proportion of invitees who filled in the same address when opening their accounts;
S33, constructing transaction-aggregation features: the proportion of invitees who transferred activity-related funds to the same account;
S34, constructing a feature for invitees who browse only the pages required to claim the activity reward;
S35, constructing a feature for the proportion of users who did not trigger any input box;
S36, constructing a feature for the proportion of users whose APP usage duration is less than one minute.
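The behavioral features of S34 to S36 reduce to three proportions over an inviter's invitees. The sketch below assumes a hypothetical per-invitee summary (`InviteeActivity`) derived from the behavior tracking table; which pages count as "required" is campaign-specific and passed in by the caller, and the strict under-60-seconds boundary is an assumed reading of "less than one minute".

```python
from dataclasses import dataclass, field


@dataclass
class InviteeActivity:
    """Hypothetical per-invitee summary of the behavior tracking data table."""
    pages_viewed: set = field(default_factory=set)
    input_box_events: int = 0
    app_seconds: float = 0.0


def behavior_ratios(activities, required_pages, short_session_seconds=60):
    """Return the (only-required-pages, no-input-box, short-session) proportions."""
    n = len(activities)
    if n == 0:
        raise ValueError("inviter has no invitee activity records")
    # S34: invitees who browsed nothing beyond the reward-claim pages
    only_required = sum(
        1 for a in activities if a.pages_viewed and a.pages_viewed <= required_pages
    )
    # S35: invitees who never triggered an input box
    no_input = sum(1 for a in activities if a.input_box_events == 0)
    # S36: invitees whose total APP usage stayed under one minute
    short = sum(1 for a in activities if a.app_seconds < short_session_seconds)
    return only_required / n, no_input / n, short / n


acts = [
    InviteeActivity({"login", "reward"}, 0, 30.0),
    InviteeActivity({"login", "home", "reward"}, 2, 300.0),
]
print(behavior_ratios(acts, required_pages={"login", "reward"}))  # prints (0.5, 0.5, 0.5)
```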
In a still further scheme of the invention, step S4 comprises the following steps:
S41, drawing a random subsample from the training set to construct an isolation tree, the subsample being placed at the root node;
S42, randomly selecting a feature and generating a cut point in the current node's data, the cut point being drawn at random between the minimum and maximum values of that feature in the current node;
S43, dividing subtrees: data whose value in the selected dimension is smaller than the cut point are placed in the left subtree, and data whose value is greater than or equal to the cut point are placed in the right subtree;
S44, repeating steps S42 and S43 in each child node until the child node contains only one data point or the tree reaches a predefined height;
and S45, passing the training samples through each isolation tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample with the first formula.
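Steps S41 to S45 follow the standard Isolation Forest procedure. A minimal, dependency-free sketch is given below under stated assumptions: rows are lists of numeric features, the height limit is ceil(log2 ψ) as in the original Isolation Forest paper, and only features that still vary inside a node are eligible for splitting (the source does not say how constant features are handled). The function names are hypothetical; a production system would more likely use an existing implementation such as scikit-learn's `IsolationForest`.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_itree(rows, height, max_height, rng):
    """S42-S44: split rows recursively until one row is left or max_height is hit."""
    if len(rows) <= 1 or height >= max_height:
        return {"size": len(rows)}  # external (leaf) node
    # Assumption: only features that still vary in this node can be chosen.
    candidates = [q for q in range(len(rows[0]))
                  if min(r[q] for r in rows) < max(r[q] for r in rows)]
    if not candidates:
        return {"size": len(rows)}  # all rows identical: cannot isolate further
    q = rng.choice(candidates)  # S42: random feature q
    p = rng.uniform(min(r[q] for r in rows), max(r[q] for r in rows))  # cut point p
    left = [r for r in rows if r[q] < p]    # S43: values < p go left
    right = [r for r in rows if r[q] >= p]  # values >= p go right
    return {"q": q, "p": p,
            "left": build_itree(left, height + 1, max_height, rng),
            "right": build_itree(right, height + 1, max_height, rng)}


def path_length(x, node, depth=0):
    """h(x): edges from the root to x's leaf, plus c(size) for unsplit leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, psi=256, seed=0):
    """S41 + S45: build n_trees trees on random subsamples and score every row."""
    if len(data) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_height = math.ceil(math.log2(psi))
    forest = [build_itree(rng.sample(data, psi), 0, max_height, rng)
              for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in forest) / n_trees  # E(h(x))
        scores.append(2.0 ** (-mean_h / c(psi)))                   # s(x, psi)
    return scores
```

Called on 50 tightly clustered rows plus one distant row, the distant row should receive the highest score, because a random cut separates it from the cluster after very few splits.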
In a still further scheme of the invention, the data sources required for constructing the common features of black-industry users in step S3 comprise: a user table, a user login log table, a user invitation relation table, a user behavior tracking (buried-point) data table, and a user transfer log table;
the common features of step S3 are processed as follows. The n inviters m1 in the statistical period are taken as sample points, and the following counts over the invitees m2 of each m1 are built from the data tables: x1, the number of m2 who logged in on the same device as m1; x2, the number of m2 invited by the same m1 who logged in on the same device as one another; x3 and x4, the numbers of m2 whose encrypted login password and encrypted payment password, respectively, match those of m1; x5 and x6, the numbers of m2 invited by the same m1 whose encrypted login password and encrypted payment password, respectively, match one another; x7, the number of m2 with the same address; x8, the number of m2 who transferred activity-related funds to the same account; x9, the number of m2 whose tracking data covers only the login page; x10, the number of m2 who did not trigger any input box; x11, the number of m2 whose APP usage duration does not exceed one minute; and x12 and x13, the numbers of m2 using rooted devices and emulators, respectively. Missing values are filled, and each of x1 to x13 is divided by a, the number of m2 invited by m1 in the statistical period.
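The final processing step, filling missing values and dividing x1 to x13 by a, can be sketched as follows. Filling with 0 is an assumption (the text does not state the fill value), and the dictionary layout and the function name `normalize_inviter_features` are hypothetical.

```python
import math

FEATURE_NAMES = [f"x{i}" for i in range(1, 14)]  # x1 ... x13


def normalize_inviter_features(raw_counts, invitee_count):
    """Turn one inviter's raw counts into proportions of its invitees.

    raw_counts: dict mapping "x1".."x13" to counts; absent, None or NaN values
    are treated as missing and filled with 0 (an assumed fill rule).
    invitee_count: a, the number of invitees m2 of this inviter m1 in the period.
    """
    if invitee_count <= 0:
        raise ValueError("an inviter with no invitees has no proportions to compute")
    features = {}
    for name in FEATURE_NAMES:
        value = raw_counts.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = 0  # missing-value fill
        features[name] = value / invitee_count
    return features


row = normalize_inviter_features({"x1": 3, "x2": None, "x9": float("nan")}, invitee_count=4)
print(row["x1"], row["x2"], row["x9"])  # prints 0.75 0.0 0.0
```

Dividing by a puts inviters with large and small invitee groups on the same scale, so an inviter is not flagged merely for having invited many people.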
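In a still further scheme of the invention, the first formula in step S45 is:
$$s(x,\psi) = 2^{-E(h(x))/c(\psi)}, \qquad c(\psi) = 2H(\psi-1) - \frac{2(\psi-1)}{\psi}$$
where x is the data point, ψ is the number of sample points, H(i) is the harmonic number (approximately ln(i) + 0.5772156649, Euler's constant), E(h(x)) is the average height (path length) of the data point x across the trees of the forest, and s(x, ψ) is the anomaly score of x.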
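As a worked illustration (not taken from the source), take a subsample size of ψ = 256, the value used in the original Isolation Forest paper, and approximate H(i) by ln i + γ:

```latex
\begin{aligned}
c(256) &= 2H(255) - \frac{2 \cdot 255}{256}
        \approx 2(\ln 255 + 0.5772) - 1.9922 \approx 10.24,\\
E(h(x)) = 5 &\;\Rightarrow\; s = 2^{-5/10.24} \approx 0.71,\\
E(h(x)) = 10.24 &\;\Rightarrow\; s = 2^{-1} = 0.5,\\
E(h(x)) = 20 &\;\Rightarrow\; s = 2^{-20/10.24} \approx 0.26.
\end{aligned}
```

Short average paths push s toward 1 (the point is easy to isolate and likely anomalous), paths close to c(ψ) give s ≈ 0.5, and long paths push s toward 0.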
A black-industry cheating recognition system based on anomaly detection comprises:
a user data acquisition module for acquiring basic information about users;
an output dimension definition module for constructing data features from the inviter dimension in a marketing scenario;
a feature construction module for constructing the common features of black-industry users;
a model training module for training the model on the obtained features once they have been constructed by the feature construction module;
and a risk strategy formulation module for normalizing the anomaly score to the range 0-100, defining the risk level of the inviter m1 according to the data distribution and observed performance, and configuring an appropriate risk-handling strategy.
In a further scheme of the invention, the user data acquisition module comprises:
an account opening and personal information acquisition unit for acquiring the user's basic account-opening information;
a login information acquisition unit for acquiring the user's login information;
an inviter information acquisition unit for acquiring information about the user's inviter;
a behavior data acquisition unit for acquiring the user's behavior data;
and a transfer record unit for acquiring the user's transfer records.
In a still further scheme of the invention, the feature construction module comprises:
a device aggregation feature unit for constructing device-aggregation features, i.e. counting the invitees who logged in on the same device as the inviter and the invitees who logged in on the same device as one another;
a personal information aggregation feature unit for constructing personal-information aggregation features, i.e. the proportion of invitees whose encrypted login password and payment password match those of the inviter, and the proportion of invitees who filled in the same address when opening their accounts;
a transaction aggregation feature unit for constructing transaction-aggregation features, i.e. the proportion of invitees who transferred activity-related funds to the same account;
and a related feature unit for constructing the features of browsing only the pages required to claim the activity reward, the proportion of users who did not trigger any input box, and the proportion of users whose APP usage duration is less than one minute.
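The device aggregation feature unit's counts (x1 and x2 in the detailed processing) amount to a group-by over the login log. The sketch below is hypothetical: the function name `device_aggregation` and the `(user_id, device_id)` input format are assumptions, not the patent's schema.

```python
from collections import defaultdict


def device_aggregation(inviter, invitees, login_log):
    """Sketch of the device-aggregation counts (x1, x2) for one inviter m1.

    login_log is an assumed simplification of the user login log table:
    an iterable of (user_id, device_id) pairs.
    """
    devices = defaultdict(set)  # user_id -> devices that user logged in on
    for user_id, device_id in login_log:
        devices[user_id].add(device_id)

    inviter_devices = devices.get(inviter, set())
    # x1: invitees who logged in on at least one of the inviter's devices
    x1 = sum(1 for m2 in invitees if devices.get(m2, set()) & inviter_devices)

    # x2: invitees sharing at least one device with another invitee of the same inviter
    invitees_per_device = defaultdict(set)
    for m2 in invitees:
        for device_id in devices.get(m2, set()):
            invitees_per_device[device_id].add(m2)
    x2 = sum(
        1
        for m2 in invitees
        if any(len(invitees_per_device[d]) > 1 for d in devices.get(m2, set()))
    )
    return x1, x2


login_log = [("A", "d1"), ("B", "d1"), ("C", "d2"), ("D", "d2"), ("E", "d3")]
print(device_aggregation("A", ["B", "C", "D", "E"], login_log))  # prints (1, 2)
```

Each invitee is counted at most once even if it shares several devices, so x1 and x2 never exceed a and become proportions in [0, 1] after the later division by a.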
In a still further scheme of the invention, the model training module comprises:
an isolation tree construction unit for drawing a random subsample from the training set to construct an isolation tree, the subsample being placed at the root node;
a cut point determining unit for randomly selecting a feature and generating a cut point in the current node's data, the cut point being drawn at random between the minimum and maximum values of that feature in the current node;
a subtree dividing unit for placing data whose value in the selected dimension is smaller than the cut point in the left subtree, and data whose value is greater than or equal to the cut point in the right subtree;
a training data acquisition unit for stopping once the cut point determining unit and the subtree dividing unit have been applied repeatedly until each child node contains only one data point or the tree reaches a preset height;
and a sample anomaly calculation unit for passing the training samples through each isolation tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample with the first formula.
Compared with the prior art, the invention has the following beneficial effects: abnormal users are identified dynamically from anomaly indicators of user behavior and business activity, each user's degree of abnormality is quantified through weights, and dynamic control is achieved by setting a threshold on the degree of abnormality or on the proportion of abnormal users.
1. Feature thresholds are determined automatically by machine learning and similar methods, which removes the limitations and instability of manually chosen thresholds and lets the thresholds follow changes in the data distribution;
2. risk features from multiple dimensions are combined, which reduces the limitations and one-sidedness of traditional single-rule identification;
3. malicious high-risk customers can be identified more accurately, reducing the damage fraudulent users do to the company's interests and saving costs.
Drawings
Fig. 1 is a schematic diagram of the black-industry cheating identification method based on anomaly detection.
Fig. 2 is a detailed schematic diagram of the black-industry cheating identification method based on anomaly detection.
Fig. 3 is a schematic diagram of the black-industry cheating recognition system based on anomaly detection.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Figs. 1 to 3, in an embodiment of the present invention, a black-industry cheating identification method based on anomaly detection comprises the following steps: 1. acquiring user data: the data to be acquired include the user's account-opening and personal information, login information, inviter information, behavior data, transfer data and so on, from which the user's basic information is obtained;
2. defining the output dimension as the inviter dimension: in marketing scenarios such as social fission and referral (pull-new) campaigns, the main nodes are inviters and invitees, in a one-to-many relationship. An inviter has a much stronger profit incentive to cheat than an invitee and is therefore more likely to be a black-industry cheater, and inviter risk is better identified by checking whether the features of the users the inviter has invited are aggregated, or what proportion of them are abnormal. This patent therefore constructs data features from the inviter dimension;
3. analyzing, constructing and mining the common features of black-industry users: in a black-industry operation, one person or a small group usually controls a large number of accounts, so the associated users tend to exhibit aggregation. Constructing the features requires the corresponding data tables, which in this embodiment are (1) the account-opening table and user table, (2) the user login log table, (3) the user invitation relation table, (4) the user behavior tracking (buried-point) data table, and (5) the user transfer log table;
for example, aggregation features can be constructed from shared user information and login devices. Device aggregation: the number of invitees who logged in on the same device as the inviter, and the number of invitees who logged in on the same device as one another. Personal-information aggregation: the proportion of invitees whose encrypted login password and payment password match those of the inviter, and the proportion of invitees who filled in the same address when opening their accounts. Transaction aggregation: the proportion of invitees who transferred activity-related funds to the same account;
for example, because fraudulent users join the activity with a much narrower purpose than normal users, their behavior data are relatively monotonous and their browsing time is shorter, so the following features can be constructed: browsing only the pages required to claim the activity reward, the proportion of users who did not trigger any input box, and the proportion of users whose APP usage duration is less than 1 minute;
finally, risk features from the abnormal-means dimension, such as counterfeit device IDs, are considered, including the proportions of users on rooted devices and emulators. The relevant features are computed in this embodiment as follows: the n inviters m1 in the statistical period are taken as sample points; from the data tables, x1 is the number of invitees m2 who logged in on the same device as m1, and x2 is the number of m2 invited by the same m1 who logged in on the same device as one another; x3 and x4 are the numbers of m2 whose encrypted login password and payment password, respectively, match those of m1; x5 and x6 are the numbers of m2 invited by the same m1 whose encrypted login password and payment password, respectively, match one another; x7 is the number of m2 with the same address; x8 is the number of m2 who transferred activity-related funds to the same account; x9 is the number of m2 whose tracking data covers only the login page; x10 is the number of m2 who did not trigger any input box; x11 is the number of m2 whose APP usage duration does not exceed one minute; and x12 and x13 are the numbers of m2 using rooted devices and emulators, respectively. Missing values are filled, and each of x1 to x13 is divided by a, the number of m2 invited by m1 in the period;
4. after the features of step 3 are constructed, a model is trained on them; because the training is unsupervised, the model is built mainly on the Isolation Forest method;
5. formulating a risk strategy: the anomaly score is normalized to the range 0-100, the risk level of each inviter is defined according to the data distribution and observed performance, and an appropriate risk-handling strategy is configured.
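Step 2 above scores inviters rather than individual users, so invitee-level signals have to be rolled up through the one-to-many invitation relation. A minimal sketch, assuming the invitation relation table can be read as `(inviter_id, invitee_id)` pairs and that some invitee-level abnormality flag already exists; the function name and the flag are hypothetical.

```python
from collections import defaultdict


def inviter_abnormal_ratio(invitations, is_abnormal):
    """Aggregate invitee-level flags to the inviter (m1) dimension.

    invitations: iterable of (inviter_id, invitee_id) rows, an assumed layout
    of the user invitation relation table (one inviter, many invitees).
    is_abnormal: callable returning True for an invitee judged abnormal.
    Returns {inviter_id: share of that inviter's invitees that are abnormal}.
    """
    invitees_of = defaultdict(set)  # a set ignores duplicate relation rows
    for inviter, invitee in invitations:
        invitees_of[inviter].add(invitee)
    return {
        inviter: sum(1 for m2 in invitees if is_abnormal(m2)) / len(invitees)
        for inviter, invitees in invitees_of.items()
    }


rows = [("A", "u1"), ("A", "u2"), ("B", "u3")]
print(inviter_abnormal_ratio(rows, lambda m2: m2 in {"u1"}))  # prints {'A': 0.5, 'B': 0.0}
```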
As a further embodiment of the present application, referring to Figs. 1 and 2, the model training of step 4 proceeds as follows. (a) A subsample is drawn from the training set to build an isolation tree: ψ sample points are drawn at random from the data to form a subset, which is placed at the root node of the isolation tree. (b) A feature q is selected at random from all features, and a cut point p is generated at random between the minimum and maximum values of q in the current node. (c) The samples are split: the current data space is divided into two subspaces, with data whose value of q is smaller than p placed in the left subtree and data whose value of q is greater than or equal to p placed in the right subtree. (d) Steps (b) and (c) are repeated in the child nodes until every node contains only one sample point or the isolation tree reaches the specified height; once that holds, the count of completed isolation trees is incremented by 1, and otherwise splitting continues with step (c). (e) The training samples are passed through each isolation tree in the Isolation Forest, their path lengths are recorded, and the anomaly score of each sample is calculated as follows:
$$s(x,\psi) = 2^{-E(h(x))/c(\psi)}, \qquad c(\psi) = 2H(\psi-1) - \frac{2(\psi-1)}{\psi}$$
where x is the data point, ψ is the number of sample points, H(i) is the harmonic number (approximately ln(i) + 0.5772156649, Euler's constant), E(h(x)) is the average height (path length) of the data point x across the trees of the forest, and s(x, ψ) is the anomaly score of x.
Referring to Fig. 3, in an embodiment of the present invention, a black-industry cheating identification system based on anomaly detection comprises a user data acquisition module, an output dimension definition module, a feature construction module, a model training module and a risk strategy formulation module. The user data acquisition module first obtains the users' basic information; in this embodiment it comprises an account opening and personal information acquisition unit, a login information acquisition unit, an inviter information acquisition unit, a behavior data acquisition unit, a transfer record unit and other basic information acquisition units, which acquire, respectively, the user's basic account-opening information, login information, inviter information, behavior data and transfer records. The output dimension definition module constructs data features from the inviter dimension in a marketing scenario. The feature construction module constructs the common features of black-industry users and comprises a device aggregation feature unit, a personal information aggregation feature unit, a transaction aggregation feature unit and a related feature unit. The device aggregation feature unit first constructs the device-aggregation features, counting the invitees who logged in on the same device as the inviter and the invitees who logged in on the same device as one another; the personal information aggregation feature unit then constructs the personal-information aggregation features, i.e. the proportion of invitees whose encrypted login password and payment password match those of the inviter and the proportion who filled in the same account-opening address; the transaction aggregation feature unit then constructs the transaction-aggregation feature, i.e. the proportion of invitees who transferred activity-related funds to the same account; and finally the related feature unit constructs the features of browsing only the pages required to claim the activity reward, the proportion of users who did not trigger any input box, and the proportion of users whose APP usage duration is less than one minute. The model training module trains the model on the features once they have been constructed by the feature construction module. The risk strategy formulation module normalizes the anomaly score to the range 0-100, defines the risk level of the inviter m1 according to the data distribution and observed performance, and configures an appropriate risk-handling strategy.
As a further embodiment of the present application, referring to the drawings, the model training module comprises: an isolation tree construction unit for drawing a random subsample from the training set to construct an isolation tree, the subsample being placed at the root node; a cut point determining unit for randomly selecting a feature and generating a cut point in the current node's data, the cut point being drawn at random between the minimum and maximum values of that feature in the current node; a subtree dividing unit for placing data whose value in the selected dimension is smaller than the cut point in the left subtree, and data whose value is greater than or equal to the cut point in the right subtree; a training data acquisition unit for stopping once the cut point determining unit and the subtree dividing unit have been applied repeatedly until each child node contains only one data point or the tree reaches a preset height; and a sample anomaly calculation unit for passing the training samples through each isolation tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample with the first formula.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted for clarity only. Those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.
Claims (10)
1. A black-product cheating identification method based on anomaly detection, characterized by comprising the following steps:
S1, acquiring user data;
S2, defining the output dimension as the inviter dimension;
S3, analyzing and mining the common characteristics of black-product users and constructing features from them;
S4, after the features in step S3 are constructed, performing model training with the obtained features; and
S5, establishing a risk strategy: standardizing the anomaly score into the range of 0 to 100, defining the risk level of each inviter according to data distribution and data performance, and configuring a suitable risk handling strategy.
2. The black-product cheating identification method based on anomaly detection according to claim 1, wherein step S1 comprises the following steps:
S11, acquiring users' account-opening and personal information;
S12, acquiring user login information;
S13, acquiring information on users' inviters;
S14, acquiring user behavior data; and
S15, acquiring user transfer records.
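The five data sources of step S1, and the re-keying on the inviter dimension of step S2, can be sketched as plain records. The record and field names below are hypothetical stand-ins for the user table, login log, invitation relation table, behavior buried-point table and transfer log named in claim 5; passwords are assumed to be available only as encrypted (hashed) values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UserInfo:            # S11: account-opening and personal information (user table)
    user_id: str
    address: str
    login_pwd_hash: str    # only encrypted/hashed values are compared, never plaintext
    pay_pwd_hash: str


@dataclass(frozen=True)
class LoginRecord:         # S12: user login log
    user_id: str
    device_id: str


@dataclass(frozen=True)
class Invitation:          # S13: invitation relation, inviter m1 -> invitee m2
    inviter_id: str
    invitee_id: str


@dataclass(frozen=True)
class BehaviorEvent:       # S14: buried-point (tracking) event
    user_id: str
    event_type: str
    page: str


@dataclass(frozen=True)
class TransferRecord:      # S15: user transfer log
    user_id: str
    to_account: str
    amount: float


def group_by_inviter(invitations: List[Invitation]) -> Dict[str, List[str]]:
    """S2: re-key on the inviter dimension, m1 -> distinct invitees m2 in first-seen order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for inv in invitations:
        if inv.invitee_id not in groups[inv.inviter_id]:  # skip duplicate relation rows
            groups[inv.inviter_id].append(inv.invitee_id)
    return dict(groups)
```

The length of each list is the invitee count a by which claim 5 later divides the features.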
3. The black-product cheating identification method based on anomaly detection according to claim 1, wherein step S3 comprises the following steps:
S31, constructing device aggregation features by counting the number of invitees who logged in on the same device as the inviter and the number of invitees who share a device with one another;
S32, constructing personal information aggregation features by counting, after encryption, the number of invitees whose login passwords or payment passwords coincide, and the number of invitees who filled in the same address at account opening;
S33, constructing transaction aggregation features by counting the number of invitees who transferred activity-related funds to the same account;
S34, constructing a feature for users who browse only the pages required to claim the activity reward;
S35, constructing a feature for the proportion of users who never trigger any input box; and
S36, constructing a feature for the proportion of users whose app usage duration is less than one minute.
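Steps S31 to S33 reduce to counting, per inviter, how many invitees share an identifier with the inviter or with each other. The sketch below is a minimal illustration, not the patented implementation: each hypothetical `Table` maps a user id to the set of values observed for that user (devices, encrypted passwords, addresses, transfer destination accounts), and the x1 to x8 labels follow one reading of claim 5, in which x1, x3 and x4 compare invitees with the inviter and x2 and x5 to x8 compare invitees of the same inviter with one another.

```python
from collections import Counter
from typing import Dict, List, Set

Table = Dict[str, Set[str]]  # user id -> values seen for that user (devices, hashes, ...)


def count_matching(ref: Set[str], table: Table, users: List[str]) -> int:
    """Number of `users` sharing at least one value with the reference set `ref`."""
    return sum(1 for u in users if table.get(u, set()) & ref)


def count_sharing(table: Table, users: List[str]) -> int:
    """Number of `users` sharing at least one value with another user in `users`."""
    freq = Counter(v for u in users for v in table.get(u, set()))
    return sum(1 for u in users if any(freq[v] > 1 for v in table.get(u, set())))


def aggregation_features(inviter: str, invitees: List[str], devices: Table,
                         login_pwd: Table, pay_pwd: Table, address: Table,
                         transfer_to: Table) -> Dict[str, int]:
    """Raw counts x1..x8 for one inviter m1 over its distinct invitees m2."""

    def own(table: Table) -> Set[str]:
        return table.get(inviter, set())

    return {
        "x1": count_matching(own(devices), devices, invitees),      # same device as m1
        "x2": count_sharing(devices, invitees),                     # shared device among m2
        "x3": count_matching(own(login_pwd), login_pwd, invitees),  # login password = m1's
        "x4": count_matching(own(pay_pwd), pay_pwd, invitees),      # payment password = m1's
        "x5": count_sharing(login_pwd, invitees),                   # shared login password
        "x6": count_sharing(pay_pwd, invitees),                     # shared payment password
        "x7": count_sharing(address, invitees),                     # shared account-opening address
        "x8": count_sharing(transfer_to, invitees),                 # funds sent to a shared account
    }
```

Using sets per user lets the same two helpers handle single-valued fields (an address) and multi-valued ones (several devices or destination accounts).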
4. The black-product cheating identification method based on anomaly detection according to claim 1, wherein step S4 comprises the following steps:
S41, sampling from the training set to construct an isolation tree, taking the randomly drawn sub-sample as the root node;
S42, randomly specifying a feature and generating a cut point in the current node data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node;
S43, dividing subtrees: placing data whose value in the specified dimension of the current node data space is smaller than the cut point in the left subtree, and data whose value is greater than or equal to the cut point in the right subtree;
S44, repeating steps S42 and S43 in the child nodes until each child node holds only one data point or the predefined tree height is reached; and
S45, passing the training samples through each isolation tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample through the first formula.
5. The black-product cheating identification method based on anomaly detection according to claim 1, wherein the data sources required in step S3 for constructing the common characteristics of black-product users comprise: a user table, a user login log table, a user invitation relation table, a user behavior buried-point (tracking) data table, and a user transfer log table;
the processing of the common characteristics in step S3 comprises: taking the n inviters m1 in the statistical period as sample points; using the data tables, taking as characteristic x1 the number of invitees m2 who logged in on the same device as m1, and as characteristic x2 the number of invitees m2 invited by the same m1 who logged in on the same device; calculating from the data tables, as characteristics x3 and x4, the number of m2 whose encrypted login password and payment password, respectively, are identical to those of m1; taking as characteristics x5 and x6 the number of m2 invited by the same m1 whose encrypted login passwords and payment passwords, respectively, are identical, and as characteristic x7 the number of such m2 whose addresses are identical; calculating from the data tables, as characteristic x8, the number of invitees who transferred activity-related funds to the same account; calculating from the data tables, as characteristic x9, the number of m2 whose buried-point data cover only the login page, as characteristic x10 the number of m2 who never triggered any input box, as characteristic x11 the number of m2 whose app usage duration does not exceed one minute, and as characteristics x12 and x13 the number of m2 using rooted devices and emulator device IDs, respectively; and filling in missing values, then dividing each of x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12 and x13 by the number a of m2 invited by m1 in the statistical period.
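The last part of claim 5 (filling missing values, then dividing by the invitee count a) turns the raw counts into per-inviter ratios. A minimal sketch, assuming a fill value of 0 (the claim does not state one) and using the hypothetical names `FEATURES` and `normalize_features`:

```python
import math
from typing import Dict, List, Optional

FEATURES = [f"x{i}" for i in range(1, 14)]  # x1..x13 as enumerated in claim 5


def normalize_features(raw: Dict[str, Optional[float]], a: int, fill: float = 0.0) -> List[float]:
    """Fill missing counts with `fill` (assumed 0), then divide each by the invitee count a."""
    if a <= 0:
        raise ValueError("the inviter must have invited at least one user (a > 0)")
    vector: List[float] = []
    for name in FEATURES:
        value = raw.get(name)
        if value is None or math.isnan(value):  # absent key, None, or NaN -> missing
            value = fill
        vector.append(value / a)
    return vector
```

Dividing by a makes inviters with very different invitation volumes comparable: three shared-device invitees out of four is far more suspicious than three out of four hundred.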
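6. The black-product cheating identification method based on anomaly detection according to claim 4, wherein the first formula in step S45 is:
$$c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}$$
where x is the data point, ψ is the number of sample points, H(i) is the harmonic number (which can be estimated as ln(i) + 0.5772156649, Euler's constant), c(ψ) is the average path length used to normalize h(x), E(h(x)) is the average height (path length) of data point x over the trees of the forest, and s(x, ψ) is the anomaly score.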
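As a worked reading of this formula, assuming the standard Isolation Forest score s(x, ψ) = 2^(−E(h(x))/c(ψ)) and the approximation H(i) ≈ ln i + γ with γ ≈ 0.5772, the commonly used subsample size ψ = 256 (a conventional choice, not stated in this document) gives:

```latex
\begin{aligned}
c(256) &= 2\bigl(\ln 255 + \gamma\bigr) - \frac{2 \cdot 255}{256}
        \approx 2(5.5413 + 0.5772) - 1.9922 \approx 10.24,\\
E(h(x)) = c(\psi) &\;\Rightarrow\; s(x,\psi) = 2^{-1} = 0.5,\\
E(h(x)) = 2 &\;\Rightarrow\; s(x,256) = 2^{-2/10.24} \approx 0.87,\\
E(h(x)) \to 0 &\;\Rightarrow\; s(x,\psi) \to 1.
\end{aligned}
```

Inviters isolated after only a few cuts therefore score close to 1 and are candidates for the high-risk band, while scores well below 0.5 indicate ordinary users.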
7. A black-product cheating identification system based on anomaly detection, comprising:
a user data acquisition module, used for acquiring basic information of users;
an output dimension definition module, used for constructing data features from the inviter dimension in a marketing scenario;
a characteristic construction module, used for constructing the common characteristics of black-product users;
a model training module, used for training the model with the obtained features after the characteristic construction module has constructed them; and
a risk strategy making module, used for standardizing the anomaly score into the range of 0 to 100, defining the risk level of the inviter M1 according to data distribution and data performance, and configuring a suitable risk handling strategy.
8. The black-product cheating identification system based on anomaly detection according to claim 7, wherein the user data acquisition module comprises:
an account opening and personal information acquisition unit, used for acquiring users' basic account-opening information;
a login information acquisition unit, used for acquiring users' login information;
an inviter information acquisition unit, used for acquiring information on users' inviters;
a behavior data acquisition unit, used for acquiring users' behavior data; and
a transfer record unit, used for acquiring users' transfer records.
9. The black-product cheating identification system based on anomaly detection according to claim 7, wherein the characteristic construction module comprises:
a device aggregation characteristic unit, used for constructing device aggregation features by counting the number of invitees who logged in on the same device as the inviter and the number of invitees who share a device with one another;
a personal information aggregation characteristic unit, used for constructing personal information aggregation features, namely counting, after encryption, the number of invitees whose login passwords or payment passwords coincide, and the number of invitees who filled in the same address at account opening;
a transaction aggregation characteristic unit, used for constructing transaction aggregation features, namely counting the number of invitees who transferred activity-related funds to the same account; and
a related characteristic unit, used for constructing a feature for users who browse only the pages required to claim the activity reward, a feature for the proportion of users who never trigger any input box, and a feature for the proportion of users whose app usage duration is less than one minute.
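The related characteristic unit works on buried-point (tracking) events rather than on shared identifiers. The following is a minimal sketch under stated assumptions: the event dictionaries with `type`, `page` and `duration_s` keys, the page names in `REQUIRED_PAGES`, and the treatment of users with no recorded events are all hypothetical. The function returns counts (x9 to x11 in claim 5) that become the proportions named here once divided by the invitee count a; it applies the strict "less than one minute" test of S36, whereas claim 5 says "does not exceed", which would correspond to `<=`.

```python
from typing import Dict, List, Set

# Hypothetical page names: the pages a user must visit to claim the activity reward.
REQUIRED_PAGES: Set[str] = {"login", "activity_reward"}


def behavior_features(events: Dict[str, List[dict]], invitees: List[str],
                      short_use_s: float = 60.0) -> Dict[str, int]:
    """Counts x9..x11 for one inviter; divide by the invitee count a for proportions."""
    only_required = no_input = short_use = 0
    for user in invitees:
        evs = events.get(user, [])
        pages = {e["page"] for e in evs if e.get("type") == "page_view"}
        if pages and pages <= REQUIRED_PAGES:                      # browsed nothing but reward pages
            only_required += 1
        if not any(e.get("type") == "input_focus" for e in evs):  # never touched an input box
            no_input += 1
        used_s = sum(e.get("duration_s", 0.0) for e in evs if e.get("type") == "session")
        if used_s < short_use_s:                                   # total app time under one minute
            short_use += 1
    return {"x9": only_required, "x10": no_input, "x11": short_use}
```

An invitee with no recorded events counts toward x10 and x11 but not x9 here; whether silent accounts should count at all is a policy choice the patent does not settle.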
10. The black-product cheating identification system based on anomaly detection according to claim 7, wherein the model training module comprises:
an isolation tree construction unit, used for sampling from the training set to construct an isolation tree, taking the randomly drawn sub-sample as the root node;
a cut point determining unit, used for randomly specifying a feature and generating a cut point in the current node data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node;
a subtree dividing unit, used for placing data whose value in the specified dimension of the current node data space is smaller than the cut point in the left subtree, and data whose value is greater than or equal to the cut point in the right subtree;
a training data acquisition unit, used for repeating the splitting by the cut point determining unit and the subtree dividing unit until each child node holds only one data point or the preset tree height is reached, and then stopping; and
a sample anomaly calculation unit, used for passing the training samples through each isolation tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample through the first formula.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210343185.0A CN114880663A (en) | 2022-04-02 | 2022-04-02 | Black product cheating identification method and system based on anomaly detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210343185.0A CN114880663A (en) | 2022-04-02 | 2022-04-02 | Black product cheating identification method and system based on anomaly detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114880663A (en) | 2022-08-09 |
Family
ID=82669445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210343185.0A Pending CN114880663A (en) | 2022-04-02 | 2022-04-02 | Black product cheating identification method and system based on anomaly detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114880663A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116128534A (en) * | 2023-04-13 | 2023-05-16 | 上海二三四五网络科技有限公司 | User fission cheating identification method and device based on comprehensive similarity |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI712981B (en) | Risk identification model training method, device and server | |
Ma et al. | A new aspect on P2P online lending default prediction using meta-level phone usage data in China | |
US11436430B2 (en) | Feature information extraction method, apparatus, server cluster, and storage medium | |
CN110892442A (en) | System, method and apparatus for adaptive scoring to detect misuse or abuse of business cards | |
CN112053221A (en) | Knowledge graph-based internet financial group fraud detection method | |
CN108399509A (en) | Determine the method and device of the risk probability of service request event | |
CN103678659A (en) | E-commerce website cheat user identification method and system based on random forest algorithm | |
CN110148000A (en) | A kind of security management and control system and method applied to payment platform | |
CN101236638A (en) | Web based bank card risk monitoring method and system | |
CN102946331A (en) | Detecting method and device for zombie users of social networks | |
CN107807941A (en) | Information processing method and device | |
CN110598982B (en) | Active wind control method and system based on intelligent interaction | |
Liu et al. | A graph learning based approach for identity inference in dapp platform blockchain | |
Mawutor | Impact of E-Banking on the Profitability of Banks in Ghana | |
CN110119980A (en) | A kind of anti-fraud method, apparatus, system and recording medium for credit | |
CN113902037A (en) | Abnormal bank account identification method, system, electronic device and storage medium | |
CN114880663A (en) | Black product cheating identification method and system based on anomaly detection | |
CN111831715A (en) | Intelligent access and certificate storage system and method based on artificial intelligence big data | |
CN111582757B (en) | Method, device, equipment and computer readable storage medium for analyzing fraud risk | |
US20100042446A1 (en) | Systems and methods for providing core property review | |
EP3879418A1 (en) | Identity verification method and device | |
CN115564591A (en) | Financing product determination method and related equipment | |
CN115034685A (en) | Customer value evaluation method, customer value evaluation device and computer-readable storage medium | |
KR102358156B1 (en) | Method for providing customized loan brokerage services | |
Nimesh et al. | A survey on opinion mining and sentiment analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |