CN114880663A - Black product cheating identification method and system based on anomaly detection - Google Patents

Black product cheating identification method and system based on anomaly detection

Info

Publication number
CN114880663A
Authority
CN
China
Prior art keywords
data
user
characteristic
constructing
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210343185.0A
Other languages
Chinese (zh)
Inventor
李灵
黄平
况小荣
李晓森
邓海辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weisi E Commerce Shenzhen Co ltd
Original Assignee
Weisi E Commerce Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weisi E Commerce Shenzhen Co ltd
Priority to CN202210343185.0A
Publication of CN114880663A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/45 Structures or tools for the administration of authentication
    • G06F21/46 Structures or tools for the administration of authentication by designing passwords or checking the strength of passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a black product cheating identification method and system based on anomaly detection, relating to the technical field of risk control security. The method comprises the following steps: S1, acquiring user data; S2, defining the output dimension as the inviter dimension; S3, analyzing, constructing and mining the common features of black product users; S4, after the features in step S3 are constructed, performing model training with the obtained features; S5, formulating a risk strategy: standardizing the anomaly score to the range 0-100, defining the risk level of the inviter according to data distribution and data performance, and configuring an appropriate risk handling strategy. The method and system can dynamically identify abnormal users through abnormal indicators of user behavior and business activity, can define a user's degree of abnormality through weights, and realize dynamic management and control by defining an abnormality threshold or a proportion of abnormal users.

Description

Black product cheating identification method and system based on anomaly detection
Technical Field
The invention relates to the technical field of risk control security, in particular to a black product cheating identification method and system based on anomaly detection.
Background
With the development of internet technology and continuing change in financial technology, the traditional banking industry is gradually shifting from relying on offline branches for customer acquisition to a combined online and offline business model, and internet banks are actively expanding online business to reduce the cost of acquiring and maintaining customers. In this model, because there is no verification by offline sales staff, black product (underground-economy) users and malicious bonus hunters, so-called "wool-pulling" users, have emerged who exploit business loopholes or the rules of marketing campaigns for profit. This undermines the healthy operation and growth of user-acquisition campaigns, seriously harms company interests, and creates difficulties and bottlenecks for subsequent user conversion.
Internet banks typically conduct business mainly online: account-opening scenarios interact with users mainly on the basis of data, and social fission (referral-based growth) serves as the main source of new customers, which greatly increases the risk of exploitation by black products.
Existing methods for identifying black products are mainly based on association-rule matching or on supervised training with labels. For association rules, the rules must first be defined manually, and the rule thresholds cannot change dynamically with the data distribution, so the continued effectiveness of the rules cannot be guaranteed. For supervised learning, fraud labels are hard to define, especially for a new business with no accumulated bad samples in the early stage of a campaign; manual labeling is difficult, time-consuming and hard to make complete, so a significant effect is difficult to obtain in the short term and the approach is not feasible.
At present, anti-fraud identification in the financial field focuses mainly on credit business, where malicious or potentially overdue users are predicted through a user scorecard model. For credit business, user labels are easy to define from historical bad-debt data, and the validity of variables and the effect of the model can be evaluated well on training samples. For anti-fraud scenarios in operational finance, however, the definition of a fraudulent user is not clear.
Therefore, anti-fraud user identification in marketing scenarios must solve several problems. First, the model must automatically identify bad samples, namely fraudulent users, in scenarios without training samples. Second, the output of the model must have a degree of business interpretability, that is, business evidence must be available to confirm a user's fraud. Finally, the interception of fraudulent users in a marketing campaign must strike a good balance between the strength of control and business expansion.
Disclosure of Invention
The invention aims to provide a black product cheating identification method and system based on anomaly detection, so as to solve the problems described in the background above.
To achieve this purpose, the invention provides the following technical scheme:
A black product cheating identification method based on anomaly detection comprises the following steps:
S1, acquiring user data;
S2, defining the output dimension as the inviter dimension;
S3, analyzing, constructing and mining the common features of black product users;
S4, after the features in step S3 are constructed, performing model training with the obtained features;
S5, formulating a risk strategy: standardizing the anomaly score to the range 0-100, defining the risk level of the inviter according to data distribution and data performance, and configuring an appropriate risk handling strategy.
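Step S5 can be illustrated with a minimal Python sketch. It assumes min-max scaling as the standardization and two fixed cut-offs (80 and 50) for the risk levels; the text does not specify either, so the function names, the scaling choice and the cut-offs are all hypothetical. In practice the cut-offs would be chosen from the observed data distribution.

```python
from typing import Dict, List


def scale_to_0_100(scores: List[float]) -> List[float]:
    """Min-max scale raw anomaly scores into the 0-100 interval (assumed scaling)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores identical: there is no spread to scale
        return [0.0 for _ in scores]
    return [100.0 * (s - lo) / (hi - lo) for s in scores]


def risk_level(score: float, high_cut: float = 80.0, medium_cut: float = 50.0) -> str:
    """Map a 0-100 score to a risk level; both cut-offs are hypothetical."""
    if score >= high_cut:
        return "high"
    if score >= medium_cut:
        return "medium"
    return "low"


def assign_risk(raw_scores: Dict[str, float]) -> Dict[str, str]:
    """Return a risk level per inviter id from raw anomaly scores."""
    ids = list(raw_scores)
    scaled = scale_to_0_100([raw_scores[i] for i in ids])
    return {i: risk_level(s) for i, s in zip(ids, scaled)}
```

A handling strategy (for example manual review for "medium" and reward freezing for "high") could then be keyed on the returned level; those actions are illustrative only.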
As a further scheme of the invention, step S1 includes the following steps:
S11, acquiring user account-opening and personal information;
S12, acquiring user login information;
S13, acquiring user inviter information;
S14, acquiring user behavior data;
S15, acquiring user transfer records.
As a still further scheme of the invention, step S3 includes the following steps:
S31, constructing device aggregation features: calculating the proportion of invitees who logged in on the same device as the inviter, and the proportion of invitees who logged in on the same device as one another;
S32, constructing personal information aggregation features: the proportion of invitees whose encrypted login password or payment password is consistent with the inviter's, the proportion of invitees whose encrypted passwords are consistent with one another, and the proportion of invitees who filled in the same address information when opening an account;
S33, constructing transaction aggregation features: the proportion of invitees who transferred campaign-related funds out to the same account;
S34, constructing a feature for the proportion of users who browsed only the pages required for the campaign reward;
S35, constructing a feature for the proportion of users who did not trigger any input box;
S36, constructing a feature for the proportion of users whose APP usage duration is less than one minute.
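The device aggregation feature of step S31 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the invitation relation table and the login log table are represented as lists of `(inviter_id, invitee_id)` and `(user_id, device_id)` pairs, and the function name `device_aggregation` is hypothetical. The real table schemas are not given in the text.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def device_aggregation(
    invitations: List[Tuple[str, str]],  # (inviter_id, invitee_id) pairs
    logins: List[Tuple[str, str]],       # (user_id, device_id) pairs
) -> Dict[str, Tuple[float, float]]:
    """Per inviter, return two proportions of invitees:
    those sharing a device with the inviter, and
    those sharing a device with another invitee of the same inviter."""
    devices: Dict[str, Set[str]] = defaultdict(set)
    for user, device in logins:
        devices[user].add(device)

    invitees: Dict[str, List[str]] = defaultdict(list)
    for inviter, invitee in invitations:
        invitees[inviter].append(invitee)

    result: Dict[str, Tuple[float, float]] = {}
    for inviter, members in invitees.items():
        # Invitees that logged in on a device also used by the inviter.
        shared_with_inviter = sum(
            1 for m in members if devices[m] & devices[inviter]
        )
        # Group invitees by device, then keep devices used by more than one invitee.
        device_users: Dict[str, Set[str]] = defaultdict(set)
        for m in members:
            for d in devices[m]:
                device_users[d].add(m)
        shared_among = {
            m for users in device_users.values() if len(users) > 1 for m in users
        }
        total = len(members)
        result[inviter] = (shared_with_inviter / total, len(shared_among) / total)
    return result
```

The same grouping pattern (group invitees by a shared key, then count groups with more than one member) would apply to the password, address and transfer-account features of S32 and S33, with the key replaced accordingly.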
As a still further scheme of the invention, step S4 includes the following steps:
S41, sampling from the training set to construct an isolation tree, with the randomly drawn subsample placed in the root node;
S42, randomly selecting a feature and generating a cut point within the current node's data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node;
S43, dividing subtrees: data whose value on the selected dimension is smaller than the cut point is placed in the left subtree, and data whose value is greater than or equal to the cut point is placed in the right subtree;
S44, repeating steps S42 and S43 in the child nodes until a child node contains only one data point or the tree reaches a predefined height;
S45, passing the training samples through each isolation tree in the Isolation Forest, recording the path length, and calculating the anomaly score of each sample through a first formula.
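Steps S41 to S44 and the path-length recording of S45 can be sketched in plain Python as below. The names `Node`, `build_tree`, `build_forest` and `path_length` are hypothetical. Two choices are assumptions rather than part of the text: the height limit is set to ceil(log2(ψ)), a common convention for Isolation Forest, and a node whose selected feature has a single value is simply turned into a leaf.

```python
import math
import random
from typing import List, Optional, Sequence

Row = Sequence[float]


class Node:
    """A node of an isolation tree; leaves have feature=None."""

    def __init__(self, size: int, feature: Optional[int] = None,
                 split: Optional[float] = None,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.size = size        # number of samples that reached this node
        self.feature = feature  # index q of the split feature
        self.split = split      # cut point p
        self.left = left
        self.right = right


def build_tree(data: List[Row], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    # S44: stop when at most one sample remains or the height limit is reached.
    if len(data) <= 1 or depth >= max_depth:
        return Node(len(data))
    q = rng.randrange(len(data[0]))          # S42: pick a random feature q
    lo = min(row[q] for row in data)
    hi = max(row[q] for row in data)
    if lo == hi:                             # simplification: cannot split on q
        return Node(len(data))
    p = rng.uniform(lo, hi)                  # S42: random cut point p in [lo, hi]
    left = [row for row in data if row[q] < p]    # S43: smaller than p goes left
    right = [row for row in data if row[q] >= p]  # S43: otherwise goes right
    return Node(len(data), q, p,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def build_forest(data: List[Row], n_trees: int, psi: int,
                 rng: random.Random) -> List[Node]:
    """S41: each tree is grown from a random subsample of psi points."""
    max_depth = math.ceil(math.log2(psi))    # assumed height limit
    return [build_tree(rng.sample(data, psi), 0, max_depth, rng)
            for _ in range(n_trees)]


def path_length(x: Row, tree: Node) -> int:
    """S45: number of edges traversed by x from the root to a leaf."""
    depth, node = 0, tree
    while node.feature is not None:
        node = node.left if x[node.feature] < node.split else node.right
        depth += 1
    return depth
```

Points that are easy to separate from the rest reach a leaf after few cuts, so a short average path length across the forest indicates an anomalous sample. This sketch omits the usual correction for leaves that still hold several samples.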
As a still further scheme of the invention, the data sources required in step S3 for constructing the common features of black product users include: a user table, a user login log table, a user invitation relation table, a user behavior tracking (buried point) data table and a user transfer log table.
The common features in step S3 are processed as follows. The n inviters m1 in the statistical period are taken as sample points, and the following counts are computed from the data tables:
x1: the number of invitees m2 who logged in on the same device as m1;
x2: the number of m2 invited by the same m1 who logged in on the same device as one another;
x3 and x4: the numbers of m2 whose encrypted login password and payment password, respectively, are consistent with those of m1;
x5 and x6: the numbers of m2 invited by the same m1 whose encrypted login passwords and payment passwords, respectively, are consistent with one another;
x7: the number of m2 with consistent addresses;
x8: the number of invitees who transferred campaign-related amounts out to the same account;
x9: the number of m2 whose browsing data covers only the login page;
x10: the number of m2 who did not trigger any input box;
x11: the number of m2 whose APP usage duration does not exceed one minute;
x12 and x13: the numbers of m2 using a rooted device and a simulator device id, respectively.
Missing values are filled in, and x1 through x13 are each divided by a, the number of m2 invited by m1 in the statistical period.
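The final normalization, filling missing values and dividing each count by the number a of invitees, can be sketched as follows. The fill value of 0 is an assumption, since the text does not say how missing values are filled, and the names `FEATURE_NAMES` and `to_ratio_features` are hypothetical.

```python
from typing import Dict, List, Optional

FEATURE_NAMES = [f"x{i}" for i in range(1, 14)]  # x1 .. x13


def to_ratio_features(
    counts: Dict[str, Optional[float]],
    invitee_count: int,
    fill_value: float = 0.0,  # assumed fill for missing counts
) -> List[float]:
    """Turn raw per-inviter counts into proportions of the a invitees."""
    if invitee_count <= 0:
        # An inviter without invitees has no meaningful proportions.
        return [fill_value] * len(FEATURE_NAMES)
    row = []
    for name in FEATURE_NAMES:
        value = counts.get(name)
        if value is None:  # absent key or explicit None counts as missing
            value = fill_value
        row.append(value / invitee_count)
    return row
```

Dividing by a makes inviters with very different numbers of invitees comparable, because each feature becomes a proportion between 0 and 1 rather than a raw count.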
As a still further scheme of the invention, the first formula in step S45 is:
$$s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}$$
$$c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{n}$$
where x is the data point, ψ is the number of sample points, H(i) is the harmonic number, and E(h(x)) is the average path length (height) of the data point x across the trees of the forest.
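The score computation of step S45 can be sketched in Python as below. This sketch follows the standard Isolation Forest formulation, in which the score is 2 raised to the power of minus E(h(x)) divided by c(ψ), and in which the second term of c(ψ) is divided by ψ; using ψ as that denominator is an assumption made here. The function names are hypothetical.

```python
def harmonic(i: int) -> float:
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i."""
    return sum(1.0 / k for k in range(1, i + 1))


def c(psi: int) -> float:
    """Normalizing term c(psi): average path length of an unsuccessful
    search in a binary search tree built from psi points."""
    if psi < 2:
        raise ValueError("psi must be at least 2")
    return 2.0 * harmonic(psi - 1) - 2.0 * (psi - 1) / psi


def anomaly_score(avg_path_length: float, psi: int) -> float:
    """Score in (0, 1]; values near 1 indicate anomalies,
    values well below 0.5 indicate normal points."""
    return 2.0 ** (-avg_path_length / c(psi))
```

When the average path length equals c(ψ) the score is exactly 0.5, and the score approaches 1 as the average path length approaches 0, which matches the intuition that easily isolated inviters are the suspicious ones.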
A black product cheating identification system based on anomaly detection comprises:
a user data acquisition module, used for acquiring basic information of a user;
an output dimension definition module, used for constructing data features from the inviter dimension in a marketing scenario;
a feature construction module, used for constructing the common features of black product users;
a model training module, used for performing model training with the obtained features after the feature construction module has constructed them;
and a risk strategy formulation module, used for standardizing the anomaly score to the range 0-100, defining the risk level of the inviter M1 according to data distribution and data performance, and configuring an appropriate risk handling strategy.
As a further scheme of the invention, the user data acquisition module comprises:
an account opening and personal information acquisition unit, used for acquiring the user's basic account-opening information;
a login information acquisition unit, used for acquiring the user's login information;
an inviter information acquisition unit, used for acquiring information about the user's inviter;
a behavior data acquisition unit, used for acquiring the user's behavior data;
and a transfer record unit, used for acquiring the user's transfer records.
As a still further scheme of the invention, the feature construction module comprises:
a device aggregation feature unit, used for constructing device aggregation features, calculating the proportion of invitees who logged in on the same device as the inviter and the proportion of invitees who logged in on the same device as one another;
a personal information aggregation feature unit, used for constructing personal information aggregation features, namely the proportion of invitees whose encrypted login password or payment password is consistent with the inviter's, the proportion of invitees whose encrypted passwords are consistent with one another, and the proportion of invitees who filled in the same address information when opening an account;
a transaction aggregation feature unit, used for constructing transaction aggregation features, namely the proportion of invitees who transferred campaign-related funds out to the same account;
and a related feature unit, used for constructing a feature for the proportion of users who browsed only the pages required for the campaign reward, a feature for the proportion of users who did not trigger any input box, and a feature for the proportion of users whose APP usage duration is less than one minute.
As a still further scheme of the invention, the model training module comprises:
an isolation tree construction unit, used for sampling from the training set to construct an isolation tree, with the randomly drawn subsample placed in the root node;
a cut point determining unit, used for randomly selecting a feature and generating a cut point within the current node's data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node;
a subtree dividing unit, used for placing data whose value on the selected dimension is smaller than the cut point in the left subtree and data whose value is greater than or equal to the cut point in the right subtree;
a training data acquisition unit, used for repeating the operations of the cut point determining unit and the subtree dividing unit until a child node contains only one data point or the preset tree height is reached;
and a sample anomaly calculation unit, used for passing the training samples through each isolation tree in the Isolation Forest, recording the path length, and calculating the anomaly score of each sample through a first formula.
Compared with the prior art, the invention has the following beneficial effects: abnormal users can be dynamically identified through abnormal indicators of user behavior and business activity, a user's degree of abnormality can be defined through weights, and dynamic management and control are realized by defining an abnormality threshold or a proportion of abnormal users.
1. Feature thresholds are determined intelligently by methods such as machine learning, which removes the limitations and instability of manually chosen thresholds and lets the thresholds change with the data distribution;
2. Risk features of multiple dimensions are integrated, which reduces the limitations and one-sidedness of the model compared with traditional single-rule identification;
3. Malicious high-risk customers can be identified more accurately, which reduces the damage fraudulent users do to company interests and saves costs for the company.
Drawings
Fig. 1 is a schematic diagram of a black product cheating identification method based on anomaly detection.
Fig. 2 is a specific schematic diagram of a black product cheating identification method based on anomaly detection.
Fig. 3 is a schematic diagram of a black product cheating recognition system based on anomaly detection.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1 to 3, in an embodiment of the present invention, a black product cheating identification method based on anomaly detection includes the following steps: 1. acquiring user data, where the data to be acquired includes user account-opening and personal information, user login information, user inviter information, user behavior data, user transfer data and the like, so that the basic information of a user is obtained;
2. defining the output dimension as the inviter dimension. In marketing scenarios such as fission-based user acquisition, the main nodes are the inviter and the invitee, and the relationship between inviter and invitees is one-to-many. The inviter's incentive to cheat is far higher than the invitee's, so the inviter is more likely to be a black product cheating user, and inviter risk can be better identified from whether the features of the users invited by that inviter are aggregated, or from the proportion of abnormal users among them. This patent therefore constructs data features from the inviter dimension;
3. analyzing, constructing and mining the common features of black product users. In black product operations, one person or a small number of persons usually control a large number of accounts, so the associated users usually show aggregation in related features. Constructing the features requires corresponding data tables; in this embodiment the required tables are: (1) the account opening and user table; (2) the user login log table; (3) the user invitation relation table; (4) the user behavior tracking (buried point) data table; (5) the user transfer log table.
For example, aggregation features can be constructed from the aggregation of user information and login devices. Device aggregation: the proportion of invitees who logged in on the same device as the inviter, and the proportion of invitees who logged in on the same device as one another. Personal information aggregation: the proportion of invitees whose encrypted login password or payment password is consistent with the inviter's, and the proportion of invitees who filled in the same address information when opening an account. Transaction aggregation: the proportion of invitees who transferred campaign-related funds out to the same account;
For example, because a fraudulent user participates in the campaign with a strong purpose, the behavior data is relatively uniform and the browsing duration is shorter than for a normal user. The following features can therefore be constructed: the proportion of users who browsed only the pages required for the campaign reward, the proportion of users who did not trigger any input box, and the proportion of users whose APP usage duration is less than 1 minute;
Finally, risk features for abnormal means such as forged device ids are considered, including the proportions of users using rooted devices and simulators. In this embodiment the relevant features are calculated as follows. The n inviters m1 in the statistical period are taken as sample points. From the data tables, the number of invitees m2 who logged in on the same device as m1 is feature x1, and the number of m2 invited by the same m1 who logged in on the same device as one another is feature x2. The numbers of m2 whose encrypted login password and payment password, respectively, are consistent with those of m1 are features x3 and x4; the numbers of m2 invited by the same m1 whose encrypted login passwords and payment passwords, respectively, are consistent with one another are features x5 and x6; the number of m2 with consistent addresses is feature x7. The number of invitees who transferred campaign-related amounts out to the same account is feature x8. The number of m2 whose browsing data covers only the login page is feature x9, the number of m2 who did not trigger any input box is feature x10, the number of m2 whose APP usage duration does not exceed one minute is feature x11, and the numbers of m2 using a rooted device and a simulator device id are features x12 and x13. Missing values are filled in, and x1 through x13 are each divided by a, the number of m2 invited by m1 in the statistical period;
4. after the features in step 3 are constructed, performing model training with the obtained features. Because an unsupervised training method is adopted, the model is built mainly on the Isolation Forest method;
5. formulating a risk strategy: standardizing the anomaly score to the range 0-100, defining the risk level of the inviter according to data distribution and data performance, and configuring an appropriate risk handling strategy.
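The behavioral features described in step 3 (browsing only reward-related pages, triggering no input box, and using the APP for less than one minute) can be sketched per inviter as follows. The record type `InviteeBehavior`, its field names and the set of reward-related page names are hypothetical stand-ins for the behavior tracking data table, whose schema the text does not describe.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class InviteeBehavior:
    """Hypothetical per-invitee summary of the behavior tracking table."""
    pages_viewed: Set[str]
    input_box_events: int
    app_seconds: float


def behavior_ratios(invitees: List[InviteeBehavior],
                    reward_pages: Set[str]) -> Dict[str, float]:
    """Proportions of one inviter's invitees showing each suspicious pattern."""
    total = len(invitees)
    if total == 0:
        return {"reward_pages_only": 0.0, "no_input": 0.0, "short_session": 0.0}
    # Viewed at least one page, and every viewed page is reward-related.
    only_reward = sum(
        1 for u in invitees if u.pages_viewed and u.pages_viewed <= reward_pages
    )
    # Never triggered any input box.
    no_input = sum(1 for u in invitees if u.input_box_events == 0)
    # Total APP usage under one minute.
    short = sum(1 for u in invitees if u.app_seconds < 60)
    return {
        "reward_pages_only": only_reward / total,
        "no_input": no_input / total,
        "short_session": short / total,
    }
```

These three proportions play the role of the normalized features x9, x10 and x11 in the feature vector passed to the Isolation Forest model.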
As a further embodiment of the present application, please refer to fig. 1 and fig. 2. The model training method in step 4 is as follows. (1) Sampling from the training set to construct an isolation tree, with the randomly drawn subsample placed in the root node; that is, ψ sample points are randomly drawn from the full data set to form a subset, which is placed in the root node of the isolation tree. (2) Randomly selecting a feature and generating a cut point p within the current node's data, where p is randomly generated between the maximum and minimum values of that feature in the current node; that is, a feature q is randomly selected from all features, and a cut point p is randomly generated within the range of values of q in the current node. (3) Dividing subtrees; that is, the samples are cut so that the current data space is divided into two subspaces: data whose value of feature q is smaller than p is placed in the left subtree, and data whose value of q is greater than or equal to p is placed in the right subtree. (4) Repeating steps (2) and (3) in the child nodes until a child node contains only one data point or the predefined tree height is reached; that is, it is judged whether every node holds only one sample point or the isolation tree has reached the specified height. If so, the count of completed isolation trees is incremented by 1; otherwise steps (2) and (3) continue. (5) Passing the training samples through each isolation tree in the Isolation Forest, recording the path length and calculating the anomaly score of each sample; that is, anomaly detection is performed on each sample, and the anomaly score is calculated as follows:
$$s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}$$
$$c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{n}$$
where x is the data point, ψ is the number of sample points, H(i) is the harmonic number, and E(h(x)) is the average path length (height) of the data point x across the trees of the forest.
Referring to fig. 3, in an embodiment of the present invention, a black product cheating identification system based on anomaly detection includes a user data acquisition module, an output dimension definition module, a feature construction module, a model training module and a risk strategy formulation module. The user data acquisition module first obtains the basic information of users. In this embodiment it includes an account opening and personal information acquisition unit, a login information acquisition unit, an inviter information acquisition unit, a behavior data acquisition unit and a transfer record unit, which respectively acquire the user's basic account-opening information, login information, inviter information, behavior data and transfer records. The output dimension definition module constructs data features from the inviter dimension in marketing scenarios. The feature construction module constructs the common features of black product users and includes a device aggregation feature unit, a personal information aggregation feature unit, a transaction aggregation feature unit and a related feature unit. The device aggregation feature unit first constructs the device aggregation features, calculating the proportion of invitees who logged in on the same device as the inviter and the proportion of invitees who logged in on the same device as one another. The personal information aggregation feature unit then constructs the personal information aggregation features, namely the proportion of invitees whose encrypted login password or payment password is consistent with the inviter's, the proportion of invitees whose encrypted passwords are consistent with one another, and the proportion of invitees who filled in the same address information when opening an account. The transaction aggregation feature unit then constructs the transaction aggregation feature, namely the proportion of invitees who transferred campaign-related funds out to the same account. Finally, the related feature unit constructs a feature for the proportion of users who browsed only the pages required for the campaign reward, a feature for the proportion of users who did not trigger any input box, and a feature for the proportion of users whose APP usage duration is less than one minute. The model training module performs model training with the obtained features after the feature construction module has constructed them. The risk strategy formulation module standardizes the anomaly score to the range 0-100, defines the risk level of the inviter M1 according to data distribution and data performance, and configures an appropriate risk handling strategy.
As a further embodiment of the present application, please refer to the drawings. The model training module comprises: an isolation tree construction unit, used for sampling from the training set to construct an isolation tree, with the randomly drawn subsample placed in the root node; a cut point determining unit, used for randomly selecting a feature and generating a cut point within the current node's data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node; a subtree dividing unit, used for placing data whose value on the selected dimension is smaller than the cut point in the left subtree and data whose value is greater than or equal to the cut point in the right subtree; a training data acquisition unit, used for repeating the operations of the cut point determining unit and the subtree dividing unit until a child node contains only one data point or the preset tree height is reached; and a sample anomaly calculation unit, used for passing the training samples through each isolation tree in the Isolation Forest, recording the path length, and calculating the anomaly score of each sample through a first formula.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; the description is written this way for clarity only. Those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (10)

1. A black product cheating identification method based on anomaly detection, characterized by comprising the following steps:
S1, acquiring user data;
S2, defining the output dimension as the inviter dimension;
S3, analyzing and mining the commonalities of black product users and constructing the corresponding features;
S4, after the features in step S3 are constructed, performing model training on the combined features;
S5, establishing a risk strategy: normalizing the anomaly score to the 0-100 interval, defining the risk level of the inviter according to data distribution and data performance, and configuring an appropriate risk handling strategy.
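Step S5 can be illustrated with a minimal Python sketch. It assumes min-max scaling as the normalization and uses made-up cut-offs and action names (`RISK_BOUNDS`, `RISK_LEVELS`, `RISK_ACTIONS`); the claim itself fixes neither the scaling method nor the thresholds, which it says are chosen from the data distribution.

```python
from bisect import bisect_right

# Hypothetical cut-offs and actions; the claim leaves these to be chosen
# from the observed score distribution.
RISK_BOUNDS = [60.0, 80.0, 95.0]
RISK_LEVELS = ["low", "medium", "high", "critical"]
RISK_ACTIONS = {
    "low": "no action",
    "medium": "manual review",
    "high": "withhold reward",
    "critical": "freeze account",
}


def normalize_scores(raw_scores):
    """Min-max scale raw anomaly scores to the 0-100 interval (an assumption)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All inviters look alike; no one stands out.
        return [0.0 for _ in raw_scores]
    return [100.0 * (s - lo) / (hi - lo) for s in raw_scores]


def risk_level(score):
    """Map a 0-100 score to a risk level using the hypothetical bounds."""
    return RISK_LEVELS[bisect_right(RISK_BOUNDS, score)]


scores = normalize_scores([1.0, 2.0, 5.0])
levels = [risk_level(s) for s in scores]
actions = [RISK_ACTIONS[level] for level in levels]
```

In practice the bounds would more plausibly be set from quantiles of the normalized scores than hard-coded as here.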
2. The black product cheating identification method based on anomaly detection according to claim 1, wherein step S1 comprises the following steps:
S11, acquiring user account opening and personal information;
S12, acquiring user login information;
S13, acquiring user inviter information;
S14, acquiring user behavior data;
and S15, acquiring user transfer records.
3. The black product cheating identification method based on anomaly detection according to claim 1, wherein step S3 comprises the following steps:
S31, constructing device aggregation features by calculating the proportion of invitees who logged in on the same device as the inviter and the proportion of invitees who logged in on the same device as one another;
S32, constructing personal information aggregation features, namely the proportion of invitees with matching encrypted login passwords and payment passwords and the proportion of invitees who filled in the same address information at account opening;
S33, constructing the transaction aggregation feature, namely the proportion of invitees who transferred activity-related amounts to the same account;
S34, constructing the feature of browsing only the pages required to claim the activity reward;
S35, constructing the feature of the proportion of users who never triggered any input box;
S36, constructing the feature of the proportion of users whose APP usage time is less than one minute.
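As an illustration of step S31, the following Python sketch computes the two device aggregation proportions for each inviter. The input shapes (lists of `(inviter, invitee)` and `(user, device_id)` pairs) and the function name `device_aggregation` are assumptions made for this example, and each invitee is assumed to appear once per inviter.

```python
from collections import Counter, defaultdict


def device_aggregation(invitations, logins):
    """Return {inviter: (share_with_inviter, share_among_invitees)}.

    invitations: iterable of (inviter, invitee) pairs (hypothetical shape).
    logins: iterable of (user, device_id) pairs (hypothetical shape).
    """
    devices = defaultdict(set)
    for user, device in logins:
        devices[user].add(device)

    invitees = defaultdict(list)
    for inviter, invitee in invitations:
        invitees[inviter].append(invitee)

    features = {}
    for inviter, group in invitees.items():
        # Invitees who logged in on at least one device the inviter also used.
        with_inviter = sum(1 for u in group if devices[u] & devices[inviter])
        # Invitees who share at least one device with another invitee.
        device_counts = Counter(d for u in group for d in devices[u])
        among = sum(
            1 for u in group if any(device_counts[d] > 1 for d in devices[u])
        )
        a = len(group)  # number of invitees of this inviter
        features[inviter] = (with_inviter / a, among / a)
    return features


invitations = [("m1", "a"), ("m1", "b"), ("m1", "c"), ("m1", "d")]
logins = [("m1", "D0"), ("a", "D0"), ("b", "D1"), ("c", "D1"), ("d", "D2")]
result = device_aggregation(invitations, logins)
```

Here one of four invitees shares a device with the inviter and two of four share a device with each other, so `result["m1"]` is `(0.25, 0.5)`.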
4. The black product cheating identification method based on anomaly detection according to claim 1, wherein step S4 comprises the following steps:
S41, sampling from the training set to construct an isolated tree, taking the randomly drawn sub-sample as the root node;
S42, randomly selecting a feature and generating a cut point in the current node's data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node;
S43, dividing the subtrees: placing data whose value in the selected dimension is smaller than the cut point in the left subtree, and data whose value is greater than or equal to the cut point in the right subtree;
S44, repeating step S42 and step S43 in the child nodes until a child node contains only one data point or the predefined tree height is reached;
and S45, passing the training samples through each isolated tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample by a first formula.
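Steps S41 to S44 and the path-length part of S45 can be sketched in plain Python as follows. This is an illustrative sketch, not the applicant's implementation: the dictionary-based tree, the function names, and the early stop when the chosen feature has a single value are assumptions, and the leaf-size adjustment used in the published Isolation Forest algorithm is omitted for brevity.

```python
import random


def build_isolation_tree(rows, max_height, rng, height=0):
    """Recursively split rows (equal-length numeric tuples), as in S42-S44."""
    if len(rows) <= 1 or height >= max_height:
        return {"size": len(rows)}  # leaf node
    feature = rng.randrange(len(rows[0]))  # S42: pick a feature at random
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        # Assumption: stop when the chosen feature cannot separate the rows.
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)  # S42: random cut point between min and max
    left = [r for r in rows if r[feature] < cut]    # S43: left subtree
    right = [r for r in rows if r[feature] >= cut]  # S43: right subtree
    return {
        "feature": feature,
        "cut": cut,
        "left": build_isolation_tree(left, max_height, rng, height + 1),
        "right": build_isolation_tree(right, max_height, rng, height + 1),
    }


def path_length(tree, row, depth=0):
    """Number of edges from the root to the leaf that row falls into."""
    if "feature" not in tree:
        return depth
    side = "left" if row[tree["feature"]] < tree["cut"] else "right"
    return path_length(tree[side], row, depth + 1)


def mean_path_length(data, row, n_trees=100, sample_size=4, max_height=8, seed=0):
    """Average path length of row over trees grown on random sub-samples (S41)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(build_isolation_tree(sample, max_height, rng), row)
    return total / n_trees


data = [(0.1,), (0.2,), (0.3,), (10.0,)]
outlier_depth = mean_path_length(data, (10.0,))
inlier_depth = mean_path_length(data, (0.2,))
```

The point far from the others is isolated after fewer cuts on average, which is the property the anomaly score relies on.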
5. The black product cheating identification method based on anomaly detection according to claim 1, wherein the data sources required in step S3 to construct the commonality features of black product users comprise: a user table, a user login log table, a user invitation relation table, a user behavior event-tracking (buried-point) data table and a user transfer log table;
the processing of the commonality features in step S3 comprises: taking the n inviters m1 in the statistical period as sample points; computing from the data tables the number of invitees m2 who logged in on the same device as m1 as feature x1, and the number of m2 invited by the same m1 who logged in on the same device as one another as feature x2; computing from the data tables the numbers of m2 whose login password and payment password match those of m1 as features x3 and x4, the numbers of m2 invited by the same m1 whose encrypted login passwords and payment passwords match as features x5 and x6, and the number of m2 with matching addresses as feature x7; computing from the data tables the number of invitees who transferred activity-related amounts to the same account as feature x8; computing from the data tables the number of m2 whose event-tracking data covers only the login page as feature x9, the number of m2 who never triggered any input box as feature x10, the number of m2 whose APP usage time does not exceed one minute as feature x11, and the numbers of m2 using rooted devices and emulator device IDs as features x12 and x13; and filling missing values, then dividing each of x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12 and x13 by a, the number of m2 invited by m1 in the statistical period.
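The final step of claim 5, filling missing values and dividing the counts by a, might look like the sketch below. The function name `to_ratio_features` and the choice of 0 as the fill value are assumptions; the claim does not say which value is used for filling.

```python
def to_ratio_features(counts, invited, fill_value=0.0):
    """Turn per-inviter counts x1..x13 into proportions.

    counts: sequence of counts for one inviter, with None marking a missing value.
    invited: a, the number of invitees of this inviter in the statistical period.
    fill_value: value substituted for missing counts (0 is an assumption).
    """
    if invited <= 0:
        raise ValueError("an inviter sample point needs at least one invitee")
    return [(fill_value if c is None else c) / invited for c in counts]


# One inviter with a = 4 invitees; the second count is missing.
ratios = to_ratio_features([2, None, 4], invited=4)
```

Dividing by a puts inviters with very different numbers of invitees on the same 0 to 1 scale before model training.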
6. The black product cheating identification method based on anomaly detection according to claim 4, wherein the first formula in step S45 is:
[Formula image FDA0003580103840000031]
c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/n, where x is the data point, ψ is the number of sample points, H(i) is the harmonic number, and E(h(x)) is the average height of data point x in the forest.
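The normalizing term c(ψ) can be evaluated numerically as in the sketch below. Two assumptions are made and should be read as such: the denominator n is taken to equal ψ, as in the published Isolation Forest algorithm, and the score is taken to be 2^(−E(h(x))/c(ψ)), which matches the symbols defined in the claim but is not confirmed by the text, since the formula image is not reproduced here.

```python
def harmonic(i):
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i, with H(0) = 0."""
    return sum(1.0 / k for k in range(1, i + 1))


def c(psi):
    """c(psi) = 2*H(psi - 1) - 2*(psi - 1)/psi, assuming n equals psi."""
    if psi < 2:
        raise ValueError("psi must be at least 2")
    return 2.0 * harmonic(psi - 1) - 2.0 * (psi - 1) / psi


def anomaly_score(mean_path_length, psi):
    """Assumed score form: 2 ** (-E(h(x)) / c(psi)); closer to 1 is more anomalous."""
    return 2.0 ** (-mean_path_length / c(psi))


psi = 256
typical = anomaly_score(c(psi), psi)  # average path equals c(psi)
suspicious = anomaly_score(2.0, psi)  # isolated after very few cuts
```

Under these assumptions a sample whose average path length equals c(ψ) scores 0.5, and shorter paths push the score toward 1.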
7. A black product cheating identification system based on anomaly detection, comprising:
a user data acquisition module for acquiring basic information of a user;
an output dimension definition module for constructing data features from the inviter dimension in a marketing scenario;
a feature construction module for constructing the commonality features of black product users;
a model training module for performing model training on the combined features after the feature construction module has constructed them;
and a risk strategy making module for normalizing the anomaly score to the 0-100 interval, defining the risk level of the inviter M1 according to data distribution and data performance, and configuring an appropriate risk handling strategy.
8. The black product cheating identification system based on anomaly detection according to claim 7, wherein the user data acquisition module comprises:
an account opening and personal information acquisition unit for acquiring the user's basic account opening information;
a login information acquisition unit for acquiring the user's login information;
an inviter information acquisition unit for acquiring information on the user's inviter;
a behavior data acquisition unit for acquiring the user's behavior data;
and a transfer record unit for acquiring the user's transfer records.
9. The black product cheating identification system based on anomaly detection according to claim 7, wherein the feature construction module comprises:
an equipment aggregation characteristic unit for constructing device aggregation features by calculating the proportion of invitees who logged in on the same device as the inviter and the proportion of invitees who logged in on the same device as one another;
a personal information aggregation characteristic unit for constructing personal information aggregation features, namely the proportion of invitees with matching encrypted login passwords and payment passwords and the proportion of invitees who filled in the same address information at account opening;
a transaction aggregation characteristic unit for constructing the transaction aggregation feature, namely the proportion of invitees who transferred activity-related amounts to the same account;
and a related characteristic unit for constructing the feature of browsing only the pages required to claim the activity reward, the feature of the proportion of users who never triggered any input box, and the feature of the proportion of users whose APP usage time is less than one minute.
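The three behavior features built by the related characteristic unit can be sketched as follows. The `Session` record, its field names, and the 60-second threshold expressed in seconds are hypothetical choices for this example; in the claimed system the underlying data would come from the user behavior event-tracking table.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Aggregated behavior of one invitee (hypothetical record layout)."""
    pages: set           # names of the pages the invitee visited
    input_events: int    # number of input-box interactions
    seconds: float       # total APP usage time in seconds


def behaviour_ratios(sessions, required_pages):
    """Return the three proportions for one inviter's invitees."""
    a = len(sessions)
    if a == 0:
        raise ValueError("no invitees for this inviter")
    # Visited nothing beyond the pages needed to claim the reward.
    only_required = sum(1 for s in sessions if s.pages <= required_pages)
    # Never typed into or focused any input box.
    no_input = sum(1 for s in sessions if s.input_events == 0)
    # Used the APP for less than one minute in total.
    short_use = sum(1 for s in sessions if s.seconds < 60)
    return only_required / a, no_input / a, short_use / a


sessions = [
    Session({"login", "reward"}, 0, 30.0),
    Session({"login", "reward", "shop"}, 3, 400.0),
    Session({"reward"}, 0, 59.0),
    Session({"login"}, 1, 61.0),
]
ratios_b = behaviour_ratios(sessions, required_pages={"login", "reward"})
```

For this sample, three of four invitees stayed on the required pages and two of four both avoided input boxes and left within a minute, giving `(0.75, 0.5, 0.5)`.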
10. The black product cheating identification system based on anomaly detection according to claim 7, wherein the model training module comprises:
an isolated tree construction unit for sampling from the training set to construct an isolated tree, taking the randomly drawn sub-sample as the root node;
a cut point determining unit for randomly selecting a feature and generating a cut point in the current node's data, the cut point being randomly generated between the maximum and minimum values of that feature in the current node;
a subtree dividing unit for placing data whose value in the selected dimension is smaller than the cut point in the left subtree and data whose value is greater than or equal to the cut point in the right subtree;
a training data acquisition unit for repeating the operations of the cut point determining unit and the subtree dividing unit in the child nodes until a child node contains only one data point or the preset tree height is reached;
and a sample anomaly calculation unit for passing the training samples through each isolated tree in the Isolation Forest, recording the path lengths, and calculating the anomaly score of each sample by a first formula.
CN202210343185.0A 2022-04-02 2022-04-02 Black product cheating identification method and system based on anomaly detection Pending CN114880663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210343185.0A CN114880663A (en) 2022-04-02 2022-04-02 Black product cheating identification method and system based on anomaly detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210343185.0A CN114880663A (en) 2022-04-02 2022-04-02 Black product cheating identification method and system based on anomaly detection

Publications (1)

Publication Number Publication Date
CN114880663A true CN114880663A (en) 2022-08-09

Family

ID=82669445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210343185.0A Pending CN114880663A (en) 2022-04-02 2022-04-02 Black product cheating identification method and system based on anomaly detection

Country Status (1)

Country Link
CN (1) CN114880663A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116128534A (en) * 2023-04-13 2023-05-16 上海二三四五网络科技有限公司 User fission cheating identification method and device based on comprehensive similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination