CN112734433A - Abnormal user detection method and device, electronic equipment and storage medium - Google Patents

Publication number
CN112734433A
Authority
CN
China
Prior art keywords
user
result
feature data
feature
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011455045.XA
Other languages
Chinese (zh)
Inventor
曾伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huantai Digital Technology Co ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd
Priority to CN202011455045.XA
Publication of CN112734433A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and an apparatus for detecting an abnormal user, an electronic device, and a storage medium, and relates to the technical field of electronic devices. The method includes: acquiring user data of each user to be detected; performing feature processing on the user data based on a preset feature condition to obtain a first quantity of feature data; traversing the first quantity of feature data and performing cluster analysis on it to obtain a plurality of result clusters; performing anomaly analysis on the feature data contained in each of the result clusters to obtain an anomaly analysis result for each result cluster; and determining, as abnormal users, the users to be detected that correspond to the feature data contained in any result cluster whose anomaly analysis result satisfies a preset abnormal condition. By combining cluster analysis with anomaly detection, the application can improve the identification of abnormal users while still allowing rapid iteration.

Description

Abnormal user detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of electronic device technologies, and in particular, to a method and an apparatus for detecting an abnormal user, an electronic device, and a storage medium.
Background
Anti-fraud schemes continue to evolve as fraud becomes more severe. However, current anti-fraud schemes generally lag behind fraud events, and fraud techniques are themselves continually updated, so these schemes detect fraud events poorly and therefore prevent them poorly.
Disclosure of Invention
In view of the above problems, the present application provides a method, an apparatus, an electronic device, and a storage medium for detecting an abnormal user, so as to solve the above problems.
In a first aspect, an embodiment of the present application provides a method for detecting an abnormal user, where the method includes: acquiring user data of each user to be detected; performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data; traversing the first quantity of feature data, and performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters; performing anomaly analysis on the characteristic data contained in each result cluster in the plurality of result clusters to obtain an anomaly analysis result of each result cluster; and determining the user to be detected corresponding to the characteristic data contained in the result cluster of which the abnormal analysis result meets the preset abnormal condition as the abnormal user.
In a second aspect, an embodiment of the present application provides an apparatus for detecting an abnormal user, where the apparatus includes: the user data acquisition module is used for acquiring the user data of each user to be detected; the characteristic data acquisition module is used for carrying out characteristic processing on the user data based on preset characteristic conditions to acquire a first amount of characteristic data; a result cluster obtaining module, configured to traverse the first amount of feature data, and perform cluster analysis on the first amount of feature data to obtain a plurality of result clusters; an anomaly analysis result obtaining module, configured to perform anomaly analysis on feature data included in each of the multiple result clusters to obtain an anomaly analysis result of each result cluster; and the abnormal user detection module is used for determining the user to be detected corresponding to the characteristic data contained in the result cluster of which the abnormal analysis result meets the preset abnormal condition as the abnormal user.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, the memory being coupled to the processor, the memory storing instructions, and the processor performing the above method when the instructions are executed by the processor.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a program code is stored, and the program code can be called by a processor to execute the above method.
The method, the device, the electronic device and the storage medium for detecting the abnormal users, provided by the embodiment of the application, are used for obtaining user data of each user to be detected, performing feature processing on the user data based on preset feature conditions to obtain a first quantity of feature data, traversing the first quantity of feature data, performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters, performing abnormal analysis on the feature data contained in each result cluster in the plurality of result clusters to obtain an abnormal analysis result of each result cluster, and determining the user to be detected corresponding to the feature data contained in the result cluster of which the abnormal analysis result meets the preset abnormal conditions as the abnormal user.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart illustrating a method for detecting an abnormal user according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for detecting an abnormal user according to another embodiment of the present application;
FIG. 3 is a flowchart illustrating step S220 of the abnormal user detection method illustrated in FIG. 2 of the present application;
FIG. 4 is a flowchart illustrating step S230 of the abnormal user detection method illustrated in FIG. 2 of the present application;
FIG. 5 is a flowchart illustrating a method for detecting an abnormal user according to still another embodiment of the present application;
FIG. 6 is a flowchart illustrating a method for detecting an abnormal user according to another embodiment of the present application;
FIG. 7 is a flowchart illustrating step S440 of the abnormal user detection method illustrated in FIG. 6 of the present application;
FIG. 8 is a flowchart illustrating a method for detecting an abnormal user according to still another embodiment of the present application;
FIG. 9 is a flowchart illustrating a method for detecting an abnormal user according to yet another embodiment of the present application;
FIG. 10 is a block diagram illustrating an apparatus for detecting an abnormal user according to an embodiment of the present application;
FIG. 11 is a block diagram of an electronic device for executing a method for detecting an abnormal user according to an embodiment of the present application;
FIG. 12 illustrates a storage unit for storing or carrying program code for implementing the abnormal user detection method according to the embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
At present, fraud is becoming increasingly severe. According to statistics, phishing in China in 2017 caused victims losses of more than 14,000 yuan, yet this has not deterred fraudsters; on the contrary, the cost of preventing and detecting abuse, fraud, and money laundering has kept rising in recent years.
With the escalating contest between fraud and anti-fraud technologies, current fraud shows the following characteristics. Specialization of fraud tools: professional tools such as VoIP, the WiFi Pineapple, and one-click "new device" apps are popular and are continually upgraded. Imitation of normal user behavior: the correlation between a fraudster's GPS location and IP address is gradually weakening, regularity in mobile phone number segments is weakening, and residences and work units show no obvious concentration; meanwhile, fraudsters no longer apply frequently and repeatedly within the same day, and the interval between applications ranges from about one month to several months.
Current anti-fraud schemes have the following disadvantages:
1) Anti-fraud schemes lag behind fraud techniques. The mainstream approaches to handling fraud are rules and supervised learning. However, both are based mainly on historically observed attack patterns, and fraudsters can change their patterns quickly, so rules and models generally lag behind changes in fraud techniques. In addition, fraud labels take months to accumulate and verify, and model tuning also consumes a lot of time, so the traditional approach struggles to cope with continually changing fraud attacks.
2) Fraud events and accounts are no longer concentrated. As fraud evolves, the aggregated group-fraud features that used to be apparent are gradually disappearing. Fraudulent accounts are becoming more covert and harder to detect; for example, it is increasingly rare for fraudsters to expose themselves through frequent applications. Analysis on a larger and more comprehensive scale is needed to discover potential relationships between accounts.
3) Digital information is insufficiently mined. Most vendors currently mine device-related information such as IP addresses, e-mail addresses, and individual devices. But a fraudster who applies at low frequency from a special mobile phone whose device information is hard to obtain is difficult to find.
4) Using a clustering algorithm or an anomaly detection algorithm alone works poorly. On large-scale samples, a clustering algorithm alone has difficulty separating abnormal users effectively, while anomaly detection alone is easily affected by noise points and has difficulty identifying real fraudsters.
In view of the above problems, the inventor, through long-term research, proposes the method, apparatus, electronic device, and storage medium for detecting an abnormal user provided in the embodiments of the present application. By combining cluster analysis with anomaly detection, they can improve the identification of abnormal users while still allowing rapid iteration. The specific method for detecting an abnormal user is described in detail in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for detecting an abnormal user according to an embodiment of the present application. The method combines cluster analysis with anomaly detection, so that the identification of abnormal users can be improved while still allowing rapid iteration. In a specific embodiment, the method is applied to the apparatus 200 for detecting an abnormal user shown in fig. 10 and to the electronic device 100 (fig. 11) equipped with the apparatus 200. The specific flow of this embodiment is described below taking an electronic device as an example; it is understood that the electronic device may include a smart phone, a tablet computer, a wearable electronic device, and the like, which is not limited herein. As described in detail below with respect to the flow shown in fig. 1, the method for detecting an abnormal user may specifically include the following steps:
Step S110: acquiring user data of each user to be detected.
In this embodiment, user data of each user to be detected may be acquired. The users to be detected may include all users, may include users initially suspected of possible fraud, or may include users whose initially estimated likelihood of fraud exceeds a specified ratio, and the like, which is not limited herein.
In some embodiments, the user data of each user to be detected may be acquired from a preset database, or may be acquired in real time, which is not limited herein.
In some implementations, the user data can include at least attribute data and behavior data. The attribute data may include address, gender, age, work unit, etc., and the behavior data may include time of applying for loan, amount of applying for loan, account registration time interval, etc., which are not limited herein.
Step S120: performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data.
In some embodiments, the electronic device may preset and store a feature condition as a preset feature condition, where the preset feature condition is used as a basis for feature extraction of user data. Therefore, in this embodiment, after the user data of each user is acquired, the user data may be subjected to feature processing based on a preset feature condition, so as to obtain a first number of feature data.
In some embodiments, the preset feature condition may include a preset feature extraction condition and a preset feature screening condition, and the first amount of feature data may be obtained by performing feature extraction on the user data based on the preset feature extraction condition and performing feature screening on the user data based on the preset feature screening condition.
In some embodiments, the first amount of feature data may include at least attribute feature data and behavior feature data.
Step S130: traversing the first quantity of feature data, and performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters.
In this embodiment, after the first amount of feature data is obtained, the first amount of feature data may be traversed, and the first amount of feature data may be subjected to cluster analysis based on a traversal result for the first amount of feature data to obtain a plurality of result clusters. Wherein each result cluster of the plurality of result clusters comprises a plurality of feature data.
In some embodiments, the electronic device may preset and store a preset clustering algorithm, and may traverse the first amount of feature data after obtaining the first amount of feature data, and perform cluster analysis on the first amount of feature data based on a traversal result for the first amount of feature data through the preset clustering algorithm to obtain a plurality of result clusters.
In some embodiments, the electronic device may preset and store a plurality of preset clustering algorithms, and after obtaining the first number of feature data, may traverse the first number of feature data to obtain a traversal result for the first number of feature data, determine one preset clustering algorithm from the plurality of preset clustering algorithms as a target clustering algorithm according to the traversal result, and perform cluster analysis on the first number of feature data through the target clustering algorithm to obtain a plurality of result clusters.
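As a minimal sketch of this selection step, the Python below traverses the feature data once to collect simple statistics and then picks a target clustering algorithm by name. The statistics collected, the thresholds, and the candidate algorithm names are illustrative assumptions; the application does not specify which traversal result or which algorithms are used.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def summarize(features: List[Vector]) -> Dict[str, int]:
    """Traverse the feature data once and record simple statistics."""
    n_samples = 0
    n_dims = 0
    for row in features:
        n_samples += 1
        n_dims = max(n_dims, len(row))
    return {"n_samples": n_samples, "n_dims": n_dims}


def choose_algorithm(summary: Dict[str, int],
                     large_sample_threshold: int = 100_000,
                     high_dim_threshold: int = 50) -> str:
    """Pick one preset algorithm name from the traversal result.

    The rule below is hypothetical and only illustrates the idea of
    selecting a target algorithm from several preset ones.
    """
    if summary["n_samples"] >= large_sample_threshold:
        return "mini_batch_kmeans"
    if summary["n_dims"] > high_dim_threshold:
        return "hierarchical"
    return "dbscan"
```

For example, `choose_algorithm(summarize(rows))` returns the name of the algorithm that a dispatcher could then look up in a table of preset clustering implementations.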
Step S140: performing anomaly analysis on the feature data contained in each result cluster in the plurality of result clusters to obtain an anomaly analysis result of each result cluster.
In this embodiment, after the plurality of result clusters are obtained, anomaly analysis may be performed on the feature data contained in each of the plurality of result clusters, so as to obtain an anomaly analysis result of each result cluster.
In some embodiments, after the plurality of result clusters are obtained, the anomaly value of each result cluster and the proportion of abnormal users corresponding to that anomaly value may be analyzed to obtain the anomaly analysis result of each result cluster.
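One possible reading of this analysis can be sketched as follows. Each feature vector is scored by how far it lies from the centre of the whole population (a z-score of its distance); a cluster's anomaly value is the mean score of its members, and its abnormal ratio is the share of members whose score exceeds a per-point threshold. The scoring rule, the function names, and the threshold are assumptions made for illustration and are not taken from the application.

```python
from math import dist
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Vector = Sequence[float]


def analyze_clusters(clusters: Dict[int, List[Vector]],
                     point_threshold: float = 1.0) -> Dict[int, Dict[str, float]]:
    """Return an anomaly value and an abnormal ratio for every cluster."""
    all_points = [p for pts in clusters.values() for p in pts]
    center = [mean(col) for col in zip(*all_points)]
    distances = [dist(p, center) for p in all_points]
    mu = mean(distances)
    sigma = pstdev(distances) or 1.0  # avoid division by zero

    results = {}
    for cluster_id, pts in clusters.items():
        # z-score of each member's distance from the population centre
        scores = [(dist(p, center) - mu) / sigma for p in pts]
        flagged = sum(1 for s in scores if s > point_threshold)
        results[cluster_id] = {
            "anomaly_value": mean(scores),
            "abnormal_ratio": flagged / len(pts),
        }
    return results
```

A small, distant cluster receives a high anomaly value and a high abnormal ratio, while a large cluster near the population centre receives low values for both.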
Step S150: determining the user to be detected corresponding to the feature data contained in the result cluster whose anomaly analysis result satisfies the preset abnormal condition as the abnormal user.
In some embodiments, the electronic device may preset and store a preset abnormal condition, which serves as the basis for judging an anomaly analysis result. Therefore, in this embodiment, after the anomaly analysis result is obtained, it may be compared with the preset abnormal condition to determine whether it satisfies that condition. When the anomaly analysis result of a result cluster satisfies the preset abnormal condition, the users to be detected corresponding to the feature data contained in that result cluster may be determined to be abnormal users. When it does not, those users may be determined to be non-abnormal users, or they may be further judged in other ways.
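The comparison against a preset abnormal condition might look like the sketch below. The `ClusterAnalysis` record, the two thresholds, and the rule that both must be met are hypothetical; the application leaves the concrete form of the condition open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterAnalysis:
    """Anomaly analysis result of one result cluster (hypothetical layout)."""
    anomaly_value: float
    abnormal_ratio: float
    user_ids: List[str]


def meets_abnormal_condition(result: ClusterAnalysis,
                             min_anomaly_value: float = 1.0,
                             min_abnormal_ratio: float = 0.5) -> bool:
    # Assumed preset condition: both thresholds must be reached.
    return (result.anomaly_value >= min_anomaly_value
            and result.abnormal_ratio >= min_abnormal_ratio)


def detect_abnormal_users(results: Dict[int, ClusterAnalysis]) -> List[str]:
    """Collect the users of every cluster that meets the preset condition."""
    abnormal: List[str] = []
    for cluster_id in sorted(results):
        if meets_abnormal_condition(results[cluster_id]):
            abnormal.extend(results[cluster_id].user_ids)
    return abnormal
```

Users in clusters that fail the condition are simply not returned here; a fuller implementation could route them to a secondary check, as the text allows.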
In some embodiments, when the users to be detected corresponding to the feature data contained in a result cluster are determined to be abnormal users, those users may be regarded as a fraud group. The result cluster may then be analyzed and verified to confirm whether the users are a real fraud group, so as to prevent misjudgment caused by problems such as data acquisition errors. If the verification indicates that they are a real fraud group, the basis and result of the judgment may be incorporated into the rules and models. As one way of performing the verification, more features of the users in the result cluster may be extracted for further observation and analysis; for example, the call records of those users may be extracted and analyzed, which is not limited herein.
The method for detecting abnormal users, provided by one embodiment of the application, includes the steps of obtaining user data of each user to be detected, performing feature processing on the user data based on preset feature conditions to obtain a first amount of feature data, traversing the first amount of feature data, performing cluster analysis on the first amount of feature data to obtain a plurality of result clusters, performing abnormal analysis on feature data contained in each result cluster of the plurality of result clusters to obtain an abnormal analysis result of each result cluster, and determining the user to be detected corresponding to the feature data contained in the result cluster of which the abnormal analysis result meets the preset abnormal conditions as the abnormal user.
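The overall flow of steps S130 to S150 can be sketched end to end in a few lines. The example below uses a tiny deterministic k-means as a stand-in for the preset clustering algorithm and a distance-based z-score as a stand-in for the anomaly analysis; both choices, along with the function names and the `min_score` threshold, are assumptions made only to keep the sketch self-contained.

```python
from math import dist
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Tiny k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def detect(user_features: Dict[str, Vector], k: int = 2,
           min_score: float = 1.0) -> List[str]:
    """Cluster the users, score each cluster, and return flagged users."""
    users = list(user_features)
    points = [user_features[u] for u in users]
    labels = kmeans(points, k)

    # Anomaly analysis: z-score of each point's distance from the centre.
    center = [mean(col) for col in zip(*points)]
    distances = [dist(p, center) for p in points]
    mu, sigma = mean(distances), pstdev(distances) or 1.0

    abnormal: List[str] = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        score = mean((distances[i] - mu) / sigma for i in members)
        if score >= min_score:  # assumed preset abnormal condition
            abnormal.extend(users[i] for i in members)
    return abnormal
```

With four users near the origin and two users far away, the two distant users form their own cluster and are the only ones returned.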
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a method for detecting an abnormal user according to another embodiment of the present application. As will be described in detail with respect to the flow shown in fig. 2, the method for detecting an abnormal user may specifically include the following steps:
Step S210: acquiring user data of each user to be detected.
For detailed description of step S210, please refer to step S110, which is not described herein again.
Step S220: performing feature extraction on the user data based on preset feature extraction conditions to obtain a second amount of feature data.
In some embodiments, the preset feature condition includes a preset feature extraction condition, and the electronic device may preset and store the feature extraction condition as the preset feature extraction condition, where the preset feature extraction condition is used as a basis for feature extraction of the user data. Therefore, in this embodiment, after the user data of each user to be detected is obtained, feature extraction may be performed on the user data based on a preset feature extraction condition to obtain a second number of feature data.
Referring to fig. 3, fig. 3 is a flowchart illustrating step S220 of the abnormal user detection method illustrated in fig. 2 of the present application. As will be explained in detail with respect to the flow shown in fig. 3, the method may specifically include the following steps:
Step S221: extracting attribute features from the user data to obtain a third amount of attribute feature data.
In some embodiments, after the user data of each user to be detected is obtained, attribute feature extraction may be performed on the user data to obtain a third amount of attribute feature data. As one mode, the attribute feature data may include the address, gender, age, work unit, mobile phone model, email suffix, and the like of the user to be detected, which is not limited herein.
Step S222: performing behavior feature extraction on the user data to obtain a fourth amount of behavior feature data.
In some embodiments, after the user data of each user to be detected is obtained, behavior feature extraction may be performed on the user data to obtain a fourth amount of behavior feature data. The size relationship between the third amount and the fourth amount is not limited herein; that is, the third amount may be greater than, smaller than, or equal to the fourth amount. As one way, the behavior feature data may include the time at which the user to be detected applied for a loan at each major bank or institution, the amount applied for, the account registration time interval, the average amount borrowed over a recent period, and the like, which is not limited herein.
Step S223: obtaining the second amount of feature data based on the third amount of attribute feature data and the fourth amount of behavior feature data.
In some embodiments, after obtaining the third amount of attribute feature data and the fourth amount of behavior feature data, the second amount of feature data may be obtained based on the third amount of attribute feature data and the fourth amount of behavior feature data. As one way, the third number of attribute feature data and the fourth number of behavior feature data may be added to obtain the second number of feature data.
Step S230: performing feature screening on the second quantity of feature data based on preset feature screening conditions to obtain the first quantity of feature data.
In some embodiments, the preset feature condition includes a preset feature filtering condition, and the electronic device may preset and store the feature filtering condition as the preset feature filtering condition, where the preset feature filtering condition is used as a basis for performing feature filtering on the user data. Therefore, in this embodiment, after the second amount of feature data is acquired, the second amount of feature data may be filtered based on the preset feature filtering condition to obtain the first amount of feature data. Typically, the second number is greater than the first number.
In some embodiments, the second amount of feature data may be screened based on feature missing values, single values, skewness, and the like to obtain the first amount of feature data. In some embodiments, the second amount of feature data may further be screened using the Boruta method, the variance inflation factor (VIF) method, backward selection, an L1 penalty term, business logic, and the like to obtain the first amount of feature data. When a logistic regression algorithm is used, schemes such as WOE encoding or normalization are often applied, which are not described in detail here.
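The first group of screening criteria (missing values, single values, skewness) can be sketched as below. The column layout, the function names, and the thresholds `max_missing` and `max_abs_skew` are illustrative assumptions; the application does not give concrete cut-offs.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_rate(values: Column) -> float:
    return sum(v is None for v in values) / len(values)


def is_single_valued(values: Column) -> bool:
    return len({v for v in values if v is not None}) <= 1


def skewness(values: Column) -> float:
    """Population skewness of the non-missing values."""
    xs = [v for v in values if v is not None]
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return 0.0
    return mean(((x - mu) / sigma) ** 3 for x in xs)


def screen_features(columns: Dict[str, Column],
                    max_missing: float = 0.5,
                    max_abs_skew: float = 3.0) -> List[str]:
    """Keep features that pass the missing-rate, single-value and skew checks."""
    kept = []
    for name, values in columns.items():
        if missing_rate(values) > max_missing:
            continue  # too many missing values
        if is_single_valued(values):
            continue  # carries no information
        if abs(skewness(values)) > max_abs_skew:
            continue  # extremely skewed distribution
        kept.append(name)
    return kept
```

In practice a heavily skewed feature might be transformed rather than dropped; dropping is used here only to keep the sketch short.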
Referring to fig. 4, fig. 4 is a flowchart illustrating a step S230 of the method for detecting an abnormal user shown in fig. 2 of the present application. As will be explained in detail with respect to the flow shown in fig. 4, the method may specifically include the following steps:
Step S231: filtering out, from the third amount of attribute feature data, the attribute feature data whose information value is smaller than the first information value, to obtain a fifth amount of attribute feature data.
In some embodiments, the third amount of attribute feature data and the fourth amount of behavior feature data may be filtered by information value (IV). Specifically, the third amount of attribute feature data may be filtered using the first information value, and the fourth amount of behavior feature data may be filtered using the second information value. Attribute feature data is generally smaller in quantity and has lower information values, so the first information value may be set relatively loosely; behavior feature data is generally larger in quantity and has higher information values, so the second information value may be set relatively strictly. The first information value may therefore be set smaller than the second information value, for example 0.001 for the first information value and 0.01 for the second.
In some embodiments, after obtaining the third amount of attribute feature data, an information value corresponding to each attribute feature data in the third amount of attribute feature data may be obtained, the information value corresponding to each attribute feature data may be compared with the first information value, and attribute feature data having an information value smaller than the first information value are filtered from the third amount of attribute feature data according to the comparison result, so as to obtain a fifth amount of attribute feature data, where the fifth amount is smaller than or equal to the third amount.
Step S232: and filtering the behavior characteristic data with the information value smaller than the second information value in the fourth amount of behavior characteristic data to obtain a sixth amount of behavior characteristic data, wherein the first information value is smaller than the second information value.
In some embodiments, after obtaining the fourth amount of behavior feature data, an information value corresponding to each behavior feature data in the fourth amount of behavior feature data may be obtained, the information value corresponding to each behavior feature data may be compared with the second information value, and attribute feature data having an information value smaller than the second information value is filtered from the fourth amount of behavior feature data according to the comparison result, so as to obtain a sixth amount of behavior feature data, where the sixth amount is smaller than or equal to the fourth amount.
As one example, assume that the third amount is 50,000 attribute feature data, the fourth amount is 150,000 behavior feature data, the first information value is 0.001, and the second information value is 0.01. Attribute feature data with IV > 0.001 may be selected from the attribute feature data, for example 3,000 out of the 50,000 attribute feature data, and behavior feature data with IV > 0.01 may be selected from the behavior feature data, for example 6,000 out of the 150,000 behavior feature data.
Step S233: obtaining the first quantity of feature data based on the fifth quantity of attribute feature data and the sixth quantity of behavior feature data.
In some embodiments, after obtaining the fifth amount of attribute feature data and the sixth amount of behavior feature data, the first amount of feature data may be obtained based on the fifth amount of attribute feature data and the sixth amount of behavior feature data. As one approach, a fifth amount of attribute feature data and a sixth amount of behavior feature data may be summed to obtain a first amount of feature data.
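The two-threshold screening of steps S231 to S233 can be sketched as follows. This is only an illustrative sketch: the feature names and IV values are invented, and the helper `filter_by_iv` is a hypothetical name rather than anything defined in this application. Only the thresholds 0.001 and 0.01 come from the example above.

```python
# Illustrative sketch only: feature names and IV values below are made up.
ATTRIBUTE_IV_THRESHOLD = 0.001  # first information value (looser)
BEHAVIOR_IV_THRESHOLD = 0.01    # second information value (stricter)


def filter_by_iv(iv_by_feature, threshold):
    """Drop features whose information value is smaller than the threshold."""
    return [name for name, iv in iv_by_feature.items() if iv >= threshold]


# Hypothetical information values for a few attribute and behavior features.
attribute_iv = {"account_name_len": 0.0005, "phone_model": 0.004, "region": 0.02}
behavior_iv = {"login_count": 0.008, "apply_hour": 0.03, "click_rate": 0.15}

kept_attributes = filter_by_iv(attribute_iv, ATTRIBUTE_IV_THRESHOLD)  # fifth amount
kept_behaviors = filter_by_iv(behavior_iv, BEHAVIOR_IV_THRESHOLD)     # sixth amount

# The first amount of feature data is the combination of both filtered sets.
selected_features = kept_attributes + kept_behaviors
print(selected_features)
```

Note that `login_count` (IV 0.008) would survive the looser attribute threshold but is dropped by the stricter behavior threshold, which is the effect of setting the first information value below the second.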
Step S240: traversing the first quantity of feature data, and performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters.
Step S250: performing anomaly analysis on the feature data contained in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster.
Step S260: determining the user to be detected corresponding to the feature data contained in a result cluster whose anomaly analysis result meets the preset abnormal condition as an abnormal user.
For the detailed description of steps S240 to S260, please refer to steps S130 to S150, which are not described herein again.
In the method for detecting an abnormal user provided by this embodiment of the application, user data of each user to be detected is obtained; feature extraction is performed on the user data based on a preset feature extraction condition to obtain a second number of feature data; feature screening is performed on the second number of feature data based on a preset feature screening condition to obtain a first number of feature data; the first number of feature data is traversed and subjected to cluster analysis to obtain a plurality of result clusters; anomaly analysis is performed on the feature data included in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster; and the user to be detected corresponding to the feature data included in a result cluster whose anomaly analysis result satisfies the preset abnormal condition is determined as an abnormal user. Compared with the method for detecting an abnormal user shown in fig. 1, in this embodiment the user data is subjected to feature processing based on the preset feature extraction condition and the preset feature screening condition to obtain the first number of feature data, which increases the proportion of useful data in subsequent processing and reduces the time required for traversal.
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a method for detecting an abnormal user according to still another embodiment of the present application. As will be described in detail with respect to the flow shown in fig. 5, the method for detecting an abnormal user may specifically include the following steps:
Step S310: acquiring user data of each user to be detected.
Step S320: performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data.
For the detailed description of steps S310 to S320, please refer to steps S110 to S120, which are not described herein again.
Step S330: performing feature combination on the first quantity of feature data based on a preset multi-level tag to obtain a plurality of combined features, wherein the preset multi-level tag is obtained based on a preset abnormal user type.
The number of possible feature combinations is enormous. For example, selecting between 5 and 50 feature data from 500 feature data to combine yields about 2.602113395257136e+69 possible combinations. Such a computation can hardly be completed, and even if it could be, it would waste resources and time. Therefore, in this embodiment, multi-level tags are preset so that the large number of combinations is reduced to groups associated with the anomaly or fraud of each preset abnormal user type, which effectively reduces the time required to traverse the feature data.
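The size of this search space can be checked with a few lines of Python. The sketch below assumes that the figure refers to choosing any subset of 5 to 50 features out of 500, with order ignored; the function name `count_combinations` is illustrative only.

```python
import math


def count_combinations(total_features, min_size, max_size):
    """Number of ways to choose between min_size and max_size features (order ignored)."""
    return sum(math.comb(total_features, k) for k in range(min_size, max_size + 1))


total = count_combinations(500, 5, 50)
print(f"{total:.3e}")  # on the order of 1e69
```

Even at a billion combinations evaluated per second, enumerating a space of this size is out of reach, which is why the combinations are restricted by the preset tags instead of being searched exhaustively.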
In this embodiment, the electronic device may set a preset multi-level tag (as shown in table 1) based on a preset abnormal user type, and perform feature combination on the first amount of feature data based on the preset multi-level tag to obtain a plurality of combination features. Wherein the preset abnormal user type may comprise a fraud type.
TABLE 1
Figure BDA0002828387700000111
Step S340: obtaining the type of the abnormal user to be detected.
In this embodiment, when detecting abnormal users, the electronic device may also target specific abnormal types. Therefore, the type of abnormal user to be detected may be obtained, so that detection can be directed at that type.
Step S350: determining a target combined feature from the plurality of combined features based on the type of the abnormal user to be detected.
In this embodiment, after the type of the abnormal user to be detected is obtained, the target combination feature may be determined from the plurality of combination features based on that type. In some embodiments, take the type of the abnormal user to be detected to be a fraud type (users who register fraudulent accounts in batches). Such users typically have similar account names, similar mobile phone models, and similar mobile phone numbers, and are active at very close times. For this fraud type, only the most critical features need to be extracted, namely the account name, mobile phone model, mobile phone number, and user active time (such as the credit application time). The similarity is then calculated, and groups with a high fraud ratio, which are the key objects of attention, are identified through clustering, community discovery, or the like. This avoids scanning the full set of feature data randomly and without purpose, and thus greatly improves the efficiency of fraud scanning. As one approach, the target combination feature corresponding to the fraud type may include: attribute feature data_account information anomaly + behavior feature data_application behavior anomaly.
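The similarity calculation on the key features can be illustrated with a small sketch. The application does not specify a similarity measure, so the use of `difflib.SequenceMatcher` from the Python standard library, the record fields `account` and `active_ts`, and both thresholds are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Similarity of two account names in [0, 1] (assumed measure, not from the source)."""
    return SequenceMatcher(None, a, b).ratio()


def suspicious_pairs(users, min_similarity=0.8, max_gap_seconds=300):
    """Return pairs of accounts with similar names that were active at close times."""
    pairs = []
    for u, v in combinations(users, 2):
        close_in_time = abs(u["active_ts"] - v["active_ts"]) <= max_gap_seconds
        if close_in_time and name_similarity(u["account"], v["account"]) >= min_similarity:
            pairs.append((u["account"], v["account"]))
    return pairs


# Hypothetical users: two near-identical account names plus one unrelated account.
users = [
    {"account": "user_0001", "active_ts": 1000},
    {"account": "user_0002", "active_ts": 1060},
    {"account": "zhang.wei", "active_ts": 1030},
]
pairs = suspicious_pairs(users)
print(pairs)
```

Pairs found this way could serve as edges for community discovery or as a distance input to clustering. A full pairwise comparison is quadratic in the number of users, so it is only practical once the candidate set has already been narrowed by the target combination feature.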
Step S360: traversing the feature data in the target combination feature, and performing cluster analysis on the feature data in the target combination feature to obtain the plurality of result clusters.
In this embodiment, after the target combination feature is determined, the feature data in the target combination feature may be traversed, and cluster analysis may be performed on that feature data based on the traversal result to obtain a plurality of result clusters. In this way, different types of fraud are identified in a targeted manner while the search space is reduced, which shortens the search time.
Step S370: performing anomaly analysis on the feature data contained in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster.
Step S380: determining the user to be detected corresponding to the feature data contained in a result cluster whose anomaly analysis result meets the preset abnormal condition as an abnormal user.
For the detailed description of steps S370 to S380, refer to steps S140 to S150, which are not described herein again.
In the method for detecting an abnormal user provided by this embodiment of the present application, user data of each user to be detected is obtained; feature processing is performed on the user data based on a preset feature condition to obtain a first number of feature data; feature combination is performed on the first number of feature data based on a preset multi-level tag to obtain a plurality of combination features, where the preset multi-level tag is obtained based on a preset abnormal user type; the type of the abnormal user to be detected is obtained; a target combination feature is determined from the plurality of combination features based on the type of the abnormal user to be detected; the feature data in the target combination feature is traversed and subjected to cluster analysis to obtain a plurality of result clusters; anomaly analysis is performed on the feature data included in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster; and the user to be detected corresponding to the feature data included in a result cluster whose anomaly analysis result satisfies the preset abnormal condition is determined as an abnormal user. Compared with the method for detecting an abnormal user shown in fig. 1, this embodiment also creates the preset multi-level tags based on the preset abnormal user type and performs feature combination on the first number of feature data according to the preset multi-level tags, thereby effectively reducing the time required for traversal.
Referring to fig. 6, fig. 6 is a schematic flowchart illustrating a method for detecting an abnormal user according to another embodiment of the present application. As will be described in detail with respect to the flow shown in fig. 6, the method for detecting an abnormal user may specifically include the following steps:
Step S410: acquiring user data of each user to be detected.
Step S420: performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data.
Step S430: traversing the first quantity of feature data.
For detailed description of steps S410 to S430, please refer to steps S110 to S130, which are not described herein again.
Step S440: determining a target clustering algorithm from a plurality of clustering algorithms based on the magnitude relation between the first quantity and the preset quantity.
In this embodiment, since different clustering algorithms suit different scenarios, a clustering algorithm needs to be selected. In particular, most clustering algorithms become impractical once the amount of feature data reaches a certain size, such as more than 100,000 pieces of data, because their running time becomes prohibitively long. For example, experiments show that only KMeans and HDBSCAN remain usable when the data volume of the feature data exceeds a certain level. Different clustering algorithms also differ in stability; for example, KMeans is strongly affected by random initialization. Therefore, in this embodiment, a clustering algorithm may be selected so as to balance operating efficiency and performance.
In some embodiments, the electronic device presets and stores a preset number, which serves as the basis for judging the first number. Therefore, in this embodiment, after obtaining the first number of feature data, the first number may be compared with the preset number to obtain a magnitude relationship between the two, and the target clustering algorithm may be determined from the plurality of clustering algorithms based on that relationship. As one example, the plurality of clustering algorithms may include the MiniBatchKMeans clustering algorithm, the HDBSCAN clustering algorithm, and the KMeans clustering algorithm. The KMeans clustering algorithm refers to k-means clustering, a vector quantization method that originated in signal processing and aims to divide n observations into k clusters, where each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid). The MiniBatchKMeans clustering algorithm refers to the mini-batch k-means algorithm; compared with k-means, its computation does not use all data samples, and instead a subset of samples is drawn to stand in for the full data when running the algorithm. The HDBSCAN clustering algorithm is a density-based spatial clustering method for applications with noise; its core is finding high-density regions, so it works well on data whose clusters have similar density.
Referring to fig. 7, fig. 7 is a flowchart illustrating step S440 of the abnormal user detection method illustrated in fig. 6 of the present application. In this embodiment, the plurality of clustering algorithms include a MiniBatchKMeans clustering algorithm, an HDBSCAN clustering algorithm, and a KMeans clustering algorithm. The flow shown in fig. 7 is described in detail below, and the method may specifically include the following steps:
Step S441: when the first number is larger than a first preset number, determining the MiniBatchKMeans clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm.
In some embodiments, the electronic device may preset and store a first preset number and a second preset number, where the first preset number is greater than the second preset number, and both are used as a basis for judging the first number. Therefore, in this embodiment, after obtaining the first number of feature data, the first number may be compared with the first preset number and the second preset number respectively to obtain a magnitude relationship between the first number and the first preset number and a magnitude relationship between the first number and the second preset number.
In this embodiment, when the comparison result indicates that the first number is greater than the first preset number, the MiniBatchKMeans clustering algorithm may be determined from the plurality of clustering algorithms as the target clustering algorithm. For example, if the first preset number is 100,000, then when the comparison result indicates that the first number is greater than 100,000, the MiniBatchKMeans clustering algorithm may be determined from the plurality of clustering algorithms as the target clustering algorithm.
Step S442: when the first number is smaller than a second preset number, determining the HDBSCAN clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm.
In this embodiment, when the comparison result indicates that the first number is smaller than the second preset number, the HDBSCAN clustering algorithm may be determined from the plurality of clustering algorithms as the target clustering algorithm. For example, if the second preset number is 50,000, then when the comparison result indicates that the first number is less than 50,000, the HDBSCAN clustering algorithm may be determined from the plurality of clustering algorithms as the target clustering algorithm.
Step S443: when the first number is not less than the second preset number and not more than the first preset number, determining the KMeans clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm.
In this embodiment, when the comparison result indicates that the first number is not less than the second preset number and not greater than the first preset number, the KMeans clustering algorithm may be determined from the plurality of clustering algorithms as the target clustering algorithm. For example, if the second preset number is 50,000 and the first preset number is 100,000, then when the comparison result indicates that the first number is not less than 50,000 and not more than 100,000, the KMeans clustering algorithm may be determined from the plurality of clustering algorithms as the target clustering algorithm.
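The selection rule of steps S441 to S443 amounts to a three-way threshold comparison, sketched below. The function name `select_clustering_algorithm` is illustrative, the thresholds are the example values from the text (100,000 and 50,000), and the function returns only the algorithm name rather than running any clustering library.

```python
FIRST_PRESET_NUMBER = 100_000   # example value from the text
SECOND_PRESET_NUMBER = 50_000   # example value from the text


def select_clustering_algorithm(first_number,
                                first_preset=FIRST_PRESET_NUMBER,
                                second_preset=SECOND_PRESET_NUMBER):
    """Pick the target clustering algorithm from the amount of feature data."""
    if first_number > first_preset:       # step S441
        return "MiniBatchKMeans"
    if first_number < second_preset:      # step S442
        return "HDBSCAN"
    return "KMeans"                       # step S443: between the two, inclusive


for n in (30_000, 50_000, 100_000, 150_000):
    print(n, select_clustering_algorithm(n))
```

Both boundary values fall into the KMeans branch, matching the "not less than" and "not more than" wording of step S443.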
Step S450: performing cluster analysis on the first quantity of feature data based on the target clustering algorithm to obtain a plurality of result clusters.
In this embodiment, after obtaining the first amount of feature data and determining the target clustering algorithm, the first amount of feature data may be subjected to clustering analysis based on the target clustering algorithm to obtain a plurality of result clusters.
Step S460: performing anomaly analysis on the feature data contained in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster.
Step S470: determining the user to be detected corresponding to the feature data contained in a result cluster whose anomaly analysis result meets the preset abnormal condition as an abnormal user.
For the detailed description of steps S460 to S470, refer to steps S140 to S150, which are not described herein again.
The method for detecting an abnormal user according to another embodiment of the present application includes obtaining user data of each user to be detected, performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data, traversing the first amount of feature data, determining a target clustering algorithm from a plurality of clustering algorithms based on a size relationship between the first amount and the preset amount, performing cluster analysis on the first amount of feature data based on the target clustering algorithm to obtain a plurality of result clusters, performing abnormal analysis on feature data included in each of the plurality of result clusters to obtain an abnormal analysis result of each result cluster, and determining a user to be detected corresponding to feature data included in a result cluster whose abnormal analysis result satisfies the preset abnormal condition as an abnormal user. Compared with the detection method for the abnormal users shown in fig. 1, the embodiment also determines the target clustering algorithm based on the magnitude relationship between the first quantity and the preset quantity, so that the accuracy of the determined target clustering algorithm is improved, and the clustering effect is improved.
Referring to fig. 8, fig. 8 is a schematic flowchart illustrating a method for detecting an abnormal user according to yet another embodiment of the present application. As will be described in detail with respect to the flow shown in fig. 8, the method for detecting an abnormal user may specifically include the following steps:
Step S510: acquiring user data of each user to be detected.
Step S520: performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data.
For the detailed description of steps S510 to S520, refer to steps S110 to S120, which are not described herein again.
Step S530: traversing the first quantity of feature data, and performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters to be determined.
As one approach, because the clustering algorithms require the number of clusters to be set manually, the plurality of result clusters they produce may include result clusters of both good and poor clustering quality. Therefore, in this embodiment, after the first amount of feature data is obtained, the first amount of feature data may be traversed, and cluster analysis may be performed on it based on the traversal result, so as to obtain a plurality of result clusters to be determined.
Step S540: processing the plurality of result clusters to be determined based on the Silhouette Coefficient and Calinski Harabasz to obtain the plurality of result clusters.
In this embodiment, after obtaining the plurality of result clusters to be determined, they may be processed based on the Silhouette Coefficient and Calinski Harabasz to obtain the plurality of result clusters, so that the finally obtained result clusters are of better quality.
In some embodiments, the final result clusters may be obtained based on

$$\mathrm{score} = \omega_{sil} \cdot \mathrm{score}_{sil} + \omega_{ch} \cdot \mathrm{score}_{ch}$$

where $\mathrm{score}_{sil}$ represents the Silhouette Coefficient score and $\mathrm{score}_{ch}$ represents the Calinski Harabasz score. $\omega_{sil}$ and $\omega_{ch}$ are hyper-parameters that represent the weights of the Silhouette Coefficient score and the Calinski Harabasz score respectively; they can be set according to experience or obtained by selecting some important features and samples and performing pre-training. The Silhouette Coefficient score evaluates the quality of a clustering by measuring its degree of cohesion and degree of separation. The Calinski Harabasz score is another index for evaluating the clustering effect, also called the variance ratio criterion, and is defined as the ratio of the between-group dispersion to the within-group dispersion.
The two scores take the standard forms

$$\mathrm{score}_{sil} = \frac{1}{n} \sum_{i=1}^{n} \frac{b_i - a_i}{\max(a_i, b_i)}, \qquad \mathrm{score}_{ch} = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \cdot \frac{n - k}{k - 1}$$

where $n$ is the total number of feature data, $b_i$ is the minimum, over the other clusters, of the average distance from the $i$-th sample to the samples within that cluster, $a_i$ is the average distance from sample $i$ to the other sample points in the same cluster, $k$ is the number of clusters, $B_k$ is the between-class covariance matrix, $W_k$ is the covariance matrix of the data within the classes, and $\mathrm{tr}$ is the trace of a matrix.
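To make the weighted score concrete, the following standard-library sketch computes both indices for a toy data set and combines them. It assumes Euclidean distance and Python 3.8 or later (for `math.dist`); the function names, the sample points, and the equal weights of 0.5 are illustrative assumptions, not values given in this application.

```python
import math


def group_by_label(points, labels):
    """Collect the points of each cluster into a dict keyed by label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def silhouette_score(points, labels):
    """Mean of (b_i - a_i) / max(a_i, b_i); singleton clusters contribute 0."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # a_i: average distance to the other points of the same cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b_i: smallest average distance to the points of any other cluster.
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for other_label, other in clusters.items() if other_label != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def calinski_harabasz_score(points, labels):
    """tr(B_k) / tr(W_k) * (n - k) / (k - 1)."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = mean_point(points)
    between = sum(len(m) * math.dist(mean_point(m), overall) ** 2
                  for m in clusters.values())
    within = sum(math.dist(p, mean_point(m)) ** 2
                 for m in clusters.values() for p in m)
    return between / within * (n - k) / (k - 1)


def combined_score(points, labels, w_sil=0.5, w_ch=0.5):
    """Weighted sum of the two indices; the weights here are assumed values."""
    return (w_sil * silhouette_score(points, labels)
            + w_ch * calinski_harabasz_score(points, labels))


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = combined_score(points, [0, 0, 1, 1])  # groups the nearby points together
bad = combined_score(points, [0, 1, 0, 1])   # splits the nearby points apart
print(good > bad)
```

Because the Silhouette Coefficient lies in [-1, 1] while the Calinski Harabasz score is unbounded, the weights also have to compensate for the difference in scale; one option, not described in the source, is to normalize the two scores across candidate clusterings before combining them.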
Step S550: performing anomaly analysis on the feature data contained in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster.
Step S560: determining the user to be detected corresponding to the feature data contained in a result cluster whose anomaly analysis result meets the preset abnormal condition as an abnormal user.
For the detailed description of steps S550 to S560, refer to steps S140 to S150, which are not described herein again.
In the method for detecting an abnormal user provided by this embodiment of the application, user data of each user to be detected is obtained; feature processing is performed on the user data based on a preset feature condition to obtain a first amount of feature data; the first amount of feature data is traversed and subjected to cluster analysis to obtain a plurality of result clusters to be determined; the plurality of result clusters to be determined are processed based on the Silhouette Coefficient and Calinski Harabasz to obtain a plurality of result clusters; anomaly analysis is performed on the feature data included in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster; and the user to be detected corresponding to the feature data included in a result cluster whose anomaly analysis result satisfies the preset abnormal condition is determined as an abnormal user. Compared with the method for detecting an abnormal user shown in fig. 1, this embodiment also obtains better result clusters based on the Silhouette Coefficient and Calinski Harabasz, thereby avoiding the instability caused by evaluating with a single index.
Referring to fig. 9, fig. 9 is a schematic flowchart illustrating a method for detecting an abnormal user according to yet another embodiment of the present application. In this embodiment, the plurality of result clusters include a target result cluster, and as will be described in detail with reference to the flow shown in fig. 9, the method for detecting an abnormal user may specifically include the following steps:
Step S610: acquiring user data of each user to be detected.
Step S620: performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data.
Step S630: traversing the first quantity of feature data, and performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters.
Step S640: performing anomaly analysis on the feature data contained in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster.
For the detailed description of steps S610 to S640, refer to steps S110 to S140, which are not described herein again.
Step S650: when the anomaly analysis result of the target result cluster indicates that the proportion of abnormal users in the target result cluster is greater than a preset proportion and the abnormal value of the target result cluster is greater than a preset abnormal value, determining that the target result cluster meets the preset abnormal condition.
In this embodiment, for the feature data in the target result cluster, the proportion of abnormal users among the users to be detected corresponding to the feature data included in the target result cluster may be calculated and compared with a preset proportion that is set and stored in advance, so as to determine whether the proportion of abnormal users is greater than the preset proportion. In one embodiment, the preset proportion is at least one-half.
In this embodiment, for the target result cluster, an abnormal value of the target result cluster may be calculated and compared with a preset abnormal value that is set and stored in advance, so as to determine whether the abnormal value of the target result cluster is greater than the preset abnormal value. In some embodiments, the abnormal value of the target result cluster may be calculated by an Isolation Forest (IsolationForest).
In some embodiments, when the proportion of the corresponding abnormal users in the target result cluster is greater than a preset proportion and the abnormal value of the target result cluster is greater than a preset abnormal value, it may be determined that the target result cluster satisfies a preset abnormal condition.
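The two-part condition can be written as a small predicate, sketched below. The function name, the per-user flags, and the preset abnormal value of 0.6 are hypothetical; the abnormal value of the cluster is taken as an input here, on the assumption that it has already been computed elsewhere (for example by an Isolation Forest).

```python
def cluster_meets_abnormal_condition(member_is_abnormal, cluster_abnormal_value,
                                     preset_proportion=0.5, preset_abnormal_value=0.6):
    """Return True only when both the proportion test and the abnormal-value test pass.

    member_is_abnormal: one boolean per user in the cluster (hypothetical input).
    cluster_abnormal_value: precomputed abnormal value of the cluster (hypothetical input).
    """
    if not member_is_abnormal:
        return False
    proportion = sum(member_is_abnormal) / len(member_is_abnormal)
    return proportion > preset_proportion and cluster_abnormal_value > preset_abnormal_value


# Three of four members flagged (0.75 > 0.5) and abnormal value 0.8 > 0.6: condition met.
print(cluster_meets_abnormal_condition([True, True, True, False], 0.8))
# Same proportion but a low abnormal value: condition not met.
print(cluster_meets_abnormal_condition([True, True, True, False], 0.3))
```

Requiring both tests means that a cluster with many flagged members but an unremarkable abnormal value, or the reverse, is not treated as meeting the preset abnormal condition.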
Step S660: determining the user to be detected corresponding to the feature data contained in the target result cluster as an abnormal user.
In this embodiment, when it is determined that the target result cluster satisfies the preset abnormal condition, the user to be detected corresponding to the feature data included in the target result cluster may be determined as an abnormal user.
In the method for detecting an abnormal user provided by this embodiment of the present application, user data of each user to be detected is obtained; feature processing is performed on the user data based on a preset feature condition to obtain a first quantity of feature data; the first quantity of feature data is traversed and subjected to cluster analysis to obtain a plurality of result clusters; anomaly analysis is performed on the feature data included in each of the plurality of result clusters to obtain an anomaly analysis result of each result cluster; when the anomaly analysis result of the target result cluster indicates that the proportion of abnormal users in the target result cluster is greater than a preset proportion and the abnormal value of the target result cluster is greater than a preset abnormal value, it is determined that the target result cluster satisfies the preset abnormal condition; and the user to be detected corresponding to the feature data included in the target result cluster is determined as an abnormal user. Compared with the method for detecting an abnormal user shown in fig. 1, in this embodiment the user to be detected is determined as an abnormal user only when both the proportion of abnormal users in the target result cluster exceeds the preset proportion and the abnormal value of the target result cluster exceeds the preset abnormal value, which improves the accuracy of determining abnormal users.
Referring to fig. 10, fig. 10 is a block diagram illustrating a detection apparatus for an abnormal user according to an embodiment of the present application. The apparatus 200 for detecting an abnormal user is applied to the electronic device, and will be explained with reference to the block diagram shown in fig. 10, where the apparatus 200 for detecting an abnormal user includes: a user data obtaining module 210, a feature data obtaining module 220, a result cluster obtaining module 230, an abnormal analysis result obtaining module 240, and an abnormal user detecting module 250, wherein:
The user data obtaining module 210 is configured to obtain user data of each user to be detected.
A feature data obtaining module 220, configured to perform feature processing on the user data based on a preset feature condition, so as to obtain a first amount of feature data.
Further, the feature data obtaining module 220 includes a first feature data obtaining sub-module and a second feature data obtaining sub-module, and the first feature data obtaining sub-module includes a first attribute feature data obtaining unit, a first behavior feature data obtaining unit, and a first feature data obtaining unit, wherein:
The first feature data obtaining sub-module is configured to perform feature extraction on the user data based on a preset feature extraction condition to obtain a second amount of feature data.
The first attribute feature data obtaining unit is configured to perform attribute feature extraction on the user data to obtain a third amount of attribute feature data.
The first behavior feature data obtaining unit is configured to perform behavior feature extraction on the user data to obtain a fourth amount of behavior feature data.
The first feature data obtaining unit is configured to obtain the second number of feature data based on the third number of attribute feature data and the fourth number of behavior feature data.
The second feature data obtaining sub-module is configured to perform feature screening on the second quantity of feature data based on a preset feature screening condition to obtain the first quantity of feature data.
Further, the second feature data obtaining sub-module includes: a second attribute feature data obtaining unit, a second behavior feature data obtaining unit, and a second feature data obtaining unit, wherein:
The second attribute feature data obtaining unit is configured to filter, from the third amount of attribute feature data, attribute feature data whose information value is smaller than the first information value, and obtain a fifth amount of attribute feature data.
The second behavior feature data obtaining unit is configured to filter, from the fourth amount of behavior feature data, behavior feature data whose information value is smaller than a second information value, and obtain a sixth amount of behavior feature data, where the first information value is smaller than the second information value.
The second feature data obtaining unit is configured to obtain the first number of feature data based on the fifth number of attribute feature data and the sixth number of behavior feature data.
A result cluster obtaining module 230, configured to traverse the first amount of feature data, and perform cluster analysis on the first amount of feature data to obtain a plurality of result clusters.
Further, the result cluster obtaining module 230 includes: a combined feature obtaining submodule, an abnormal user type obtaining submodule, a target combined feature determining submodule, and a first result cluster obtaining submodule, wherein:
A combined feature obtaining submodule, configured to perform feature combination on the first amount of feature data based on a preset multi-level label to obtain a plurality of combined features, where the preset multi-level label is obtained based on preset abnormal user types.
An abnormal user type obtaining submodule, configured to obtain the type of abnormal user to be detected.
A target combined feature determining submodule, configured to determine a target combined feature from the plurality of combined features based on the type of abnormal user to be detected.
A first result cluster obtaining submodule, configured to traverse the feature data in the target combined feature, and perform cluster analysis on the feature data in the target combined feature to obtain the plurality of result clusters.
Further, the result cluster obtaining module 230 includes: a target clustering algorithm determining submodule and a second result cluster obtaining submodule, wherein:
A target clustering algorithm determining submodule, configured to determine a target clustering algorithm from a plurality of clustering algorithms based on the magnitude relationship between the first number and a preset number.
Further, the plurality of clustering algorithms include a MiniBatchKMeans clustering algorithm, an HDBSCAN clustering algorithm, and a KMeans clustering algorithm, and the target clustering algorithm determining submodule includes: a first target clustering algorithm determining unit, a second target clustering algorithm determining unit, and a third target clustering algorithm determining unit, wherein:
A first target clustering algorithm determining unit, configured to determine the MiniBatchKMeans clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm when the first number is greater than a first preset number.
A second target clustering algorithm determining unit, configured to determine the HDBSCAN clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm when the first number is smaller than a second preset number.
A third target clustering algorithm determining unit, configured to determine the KMeans clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm when the first number is not less than the second preset number and not greater than the first preset number.
A second result cluster obtaining submodule, configured to perform cluster analysis on the first amount of feature data based on the target clustering algorithm to obtain the plurality of result clusters.
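The size-based selection rule above can be sketched in Python as follows. This is only an illustration: the function name `choose_clustering_algorithm` and the preset numbers 100,000 and 1,000 are hypothetical, and the sketch assumes that the second preset number does not exceed the first, which the third condition implies.

```python
def choose_clustering_algorithm(
    first_number: int,
    first_preset_number: int,
    second_preset_number: int,
) -> str:
    """Return the name of the target clustering algorithm for a given data size."""
    # Assumption: the second preset number is not larger than the first,
    # otherwise the middle range described in the text would be empty.
    if second_preset_number > first_preset_number:
        raise ValueError("second preset number must not exceed the first")

    if first_number > first_preset_number:
        return "MiniBatchKMeans"  # larger than the first preset number
    if first_number < second_preset_number:
        return "HDBSCAN"  # smaller than the second preset number
    return "KMeans"  # between the two preset numbers, inclusive


# Hypothetical preset numbers, chosen only for illustration.
FIRST_PRESET_NUMBER = 100_000
SECOND_PRESET_NUMBER = 1_000

for n in (500, 1_000, 50_000, 100_000, 250_000):
    print(n, choose_clustering_algorithm(n, FIRST_PRESET_NUMBER, SECOND_PRESET_NUMBER))
```

The returned name could then be mapped to an actual clustering implementation; that mapping is left out because the application does not prescribe a particular library.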
Further, the result cluster obtaining module 230 includes: a to-be-determined result cluster obtaining submodule and a third result cluster obtaining submodule, wherein:
A to-be-determined result cluster obtaining submodule, configured to perform cluster analysis on the first amount of feature data to obtain a plurality of result clusters to be determined.
A third result cluster obtaining submodule, configured to process the plurality of result clusters to be determined based on the Silhouette Coefficient and the Calinski-Harabasz index to obtain the plurality of result clusters.
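For readers unfamiliar with the two indices, the following pure-Python sketch computes the Silhouette Coefficient and the Calinski-Harabasz index for a labeled set of points. How the application uses the scores to process the result clusters to be determined is not specified in this section; comparing two candidate labelings, as done here, is only an assumption. The helper names and sample points are hypothetical, and the sketch assumes at least two clusters, fewer clusters than points, and non-zero within-cluster dispersion.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def _group(points: List[Point], labels: List[int]) -> Dict[int, List[Point]]:
    """Collect the points belonging to each cluster label."""
    clusters: Dict[int, List[Point]] = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def _centroid(members: List[Point]) -> List[float]:
    return [sum(axis) / len(members) for axis in zip(*members)]


def silhouette_coefficient(points: List[Point], labels: List[int]) -> float:
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def calinski_harabasz_index(points: List[Point], labels: List[int]) -> float:
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    overall = _centroid(points)
    between = within = 0.0
    for members in clusters.values():
        center = _centroid(members)
        between += len(members) * math.dist(center, overall) ** 2
        within += sum(math.dist(p, center) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Two hypothetical candidate clusterings of the same six points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

sil_good = silhouette_coefficient(points, good_labels)
sil_bad = silhouette_coefficient(points, bad_labels)
ch_good = calinski_harabasz_index(points, good_labels)
ch_bad = calinski_harabasz_index(points, bad_labels)
print(sil_good > sil_bad, ch_good > ch_bad)
```

Both indices reward compact, well-separated clusters, so a candidate that scores higher on both is a natural one to keep.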
An anomaly analysis result obtaining module 240, configured to perform anomaly analysis on the feature data included in each result cluster of the multiple result clusters, so as to obtain an anomaly analysis result of each result cluster.
An abnormal user detection module 250, configured to determine, as an abnormal user, the user to be detected corresponding to the feature data contained in a result cluster whose anomaly analysis result satisfies a preset abnormal condition.
Further, the plurality of result clusters include a target result cluster, and the abnormal user detection module 250 includes: an abnormal user detection submodule and an abnormal user determining submodule, wherein:
An abnormal user detection submodule, configured to determine that the target result cluster satisfies the preset abnormal condition when the anomaly analysis result of the target result cluster indicates that the proportion of abnormal users in the target result cluster is greater than a preset proportion and that the abnormal value of the target result cluster is greater than a preset abnormal value.
An abnormal user determining submodule, configured to determine the user to be detected corresponding to the feature data contained in the target result cluster as an abnormal user.
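The preset abnormal condition applied by these submodules can be sketched in Python as below. The sketch assumes that the anomaly analysis has already produced, for each result cluster, a proportion of abnormal users and an abnormal value; the `ClusterAnalysis` class, its field names, and the preset values 0.5 and 0.7 are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterAnalysis:
    """Anomaly analysis result of one result cluster (hypothetical structure)."""
    cluster_id: int
    user_ids: List[str]
    abnormal_proportion: float  # share of abnormal users in the cluster
    abnormal_value: float       # anomaly score of the cluster as a whole


def detect_abnormal_users(
    results: List[ClusterAnalysis],
    preset_proportion: float,
    preset_abnormal_value: float,
) -> List[str]:
    """Return the users of every cluster that satisfies both conditions."""
    flagged: List[str] = []
    for result in results:
        # Both comparisons must hold, as stated for the target result cluster.
        if (
            result.abnormal_proportion > preset_proportion
            and result.abnormal_value > preset_abnormal_value
        ):
            flagged.extend(result.user_ids)
    return flagged


results = [
    ClusterAnalysis(0, ["u1", "u2"], abnormal_proportion=0.9, abnormal_value=0.8),
    ClusterAnalysis(1, ["u3"], abnormal_proportion=0.9, abnormal_value=0.2),
    ClusterAnalysis(2, ["u4", "u5"], abnormal_proportion=0.1, abnormal_value=0.9),
]
abnormal = detect_abnormal_users(results, preset_proportion=0.5, preset_abnormal_value=0.7)
print(abnormal)
```

Only cluster 0 exceeds both preset values in this example, so only its users are reported; a cluster that exceeds just one of them is not treated as satisfying the condition.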
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to FIG. 11, a block diagram of an electronic device 100 according to an embodiment of the present application is shown. The electronic device 100 may be a smart phone, a tablet computer, an e-book reader, or another electronic device capable of running an application. The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more applications, where the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more applications are configured to perform the method described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 connects the various parts of the electronic device 100 through various interfaces and lines, and performs the functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware in at least one of the following forms: a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 110 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 100 during use (such as a phone book, audio and video data, and chat log data).
Referring to FIG. 12, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 300 stores program code that can be called by a processor to execute the method described in the foregoing method embodiments.
The computer-readable storage medium 300 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 300 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 300 has storage space for program code 310 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 310 may, for example, be compressed in a suitable form.
In summary, the abnormal user detection method and device, the electronic device, and the storage medium provided in the embodiments of the present application operate as follows: user data of each user to be detected is obtained; feature processing is performed on the user data based on a preset feature condition to obtain a first amount of feature data; the first amount of feature data is traversed and subjected to cluster analysis to obtain a plurality of result clusters; anomaly analysis is performed on the feature data contained in each of the plurality of result clusters to obtain an anomaly analysis result for each result cluster; and the user to be detected corresponding to the feature data contained in a result cluster whose anomaly analysis result satisfies the preset abnormal condition is determined to be an abnormal user.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (12)

1. A method for detecting an abnormal user, the method comprising:
acquiring user data of each user to be detected;
performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data;
traversing the first quantity of feature data, and performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters;
performing anomaly analysis on the characteristic data contained in each result cluster in the plurality of result clusters to obtain an anomaly analysis result of each result cluster;
and determining the user to be detected corresponding to the characteristic data contained in the result cluster of which the abnormal analysis result meets the preset abnormal condition as the abnormal user.
2. The method according to claim 1, wherein the performing feature processing on the user data based on a preset feature condition to obtain a first amount of feature data comprises:
performing feature extraction on the user data based on preset feature extraction conditions to obtain a second quantity of feature data;
and performing feature screening on the second quantity of feature data based on preset feature screening conditions to obtain the first quantity of feature data.
3. The method according to claim 2, wherein the performing feature extraction on the user data based on a preset feature extraction condition to obtain a second amount of feature data comprises:
extracting attribute features of the user data to obtain a third amount of attribute feature data;
performing behavior feature extraction on the user data to obtain a fourth amount of behavior feature data;
obtaining the second amount of feature data based on the third amount of attribute feature data and the fourth amount of behavior feature data.
4. The method according to claim 3, wherein the performing feature filtering on the second quantity of feature data based on a preset feature filtering condition to obtain the first quantity of feature data comprises:
filtering out, from the third amount of attribute feature data, attribute feature data whose information value is smaller than a first information value, to obtain a fifth amount of attribute feature data;
filtering out, from the fourth amount of behavior feature data, behavior feature data whose information value is smaller than a second information value, to obtain a sixth amount of behavior feature data, wherein the first information value is smaller than the second information value;
obtaining the first quantity of feature data based on the fifth quantity of attribute feature data and the sixth quantity of behavior feature data.
5. The method of claim 1, wherein traversing the first amount of feature data and performing cluster analysis on the first amount of feature data to obtain a plurality of result clusters comprises:
performing feature combination on the first quantity of feature data based on a preset multistage label to obtain a plurality of combined features, wherein the preset multistage label is obtained based on a preset abnormal user type;
acquiring the type of an abnormal user to be detected;
determining a target combination feature from the plurality of combination features based on the type of the abnormal user to be detected;
and traversing the feature data in the target combination features, and performing cluster analysis on the feature data in the target combination features to obtain the plurality of result clusters.
6. The method according to any one of claims 1-5, wherein performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters comprises:
determining a target clustering algorithm from a plurality of clustering algorithms based on the magnitude relation between the first quantity and a preset quantity;
and performing clustering analysis on the first quantity of characteristic data based on the target clustering algorithm to obtain a plurality of result clusters.
7. The method of claim 6, wherein the plurality of clustering algorithms comprises a MiniBatchKMeans clustering algorithm, a HDBSCAN clustering algorithm, and a KMeans clustering algorithm, and wherein determining the target clustering algorithm from the plurality of clustering algorithms based on a magnitude relationship between the first number and a preset number comprises:
when the first number is larger than a first preset number, determining the MiniBatchKMeans clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm;
when the first number is smaller than a second preset number, determining the HDBSCAN clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm; or
when the first number is not less than the second preset number and not more than the first preset number, determining the KMeans clustering algorithm from the plurality of clustering algorithms as the target clustering algorithm.
8. The method according to any one of claims 1-5, wherein performing cluster analysis on the first quantity of feature data to obtain a plurality of result clusters comprises:
performing cluster analysis on the first quantity of characteristic data to obtain a plurality of result clusters to be determined;
and processing the plurality of result clusters to be determined based on the Silhouette Coefficient and the Calinski-Harabasz index to obtain the plurality of result clusters.
9. The method according to any one of claims 1 to 5, wherein the plurality of result clusters include a target result cluster, and the determining that the user to be detected corresponding to the feature data included in the result cluster in which the abnormal analysis result satisfies the preset abnormal condition is an abnormal user includes:
when the abnormal analysis result of the target result cluster represents that the proportion of the corresponding abnormal users in the target result cluster is greater than a preset proportion and the abnormal value of the target result cluster is greater than a preset abnormal value, determining that the target result cluster meets the preset abnormal condition;
and determining the user to be detected corresponding to the characteristic data contained in the target result cluster as an abnormal user.
10. An apparatus for detecting an abnormal user, the apparatus comprising:
the user data acquisition module is used for acquiring the user data of each user to be detected;
the characteristic data acquisition module is used for carrying out characteristic processing on the user data based on preset characteristic conditions to acquire a first amount of characteristic data;
a result cluster obtaining module, configured to traverse the first amount of feature data, and perform cluster analysis on the first amount of feature data to obtain a plurality of result clusters;
an anomaly analysis result obtaining module, configured to perform anomaly analysis on feature data included in each of the multiple result clusters to obtain an anomaly analysis result of each result cluster;
and the abnormal user detection module is used for determining the user to be detected corresponding to the characteristic data contained in the result cluster of which the abnormal analysis result meets the preset abnormal condition as the abnormal user.
11. An electronic device comprising a memory and a processor, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the processor to perform the method of any one of claims 1-9.
12. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 9.
CN202011455045.XA 2020-12-10 2020-12-10 Abnormal user detection method and device, electronic equipment and storage medium Pending CN112734433A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011455045.XA CN112734433A (en) 2020-12-10 2020-12-10 Abnormal user detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011455045.XA CN112734433A (en) 2020-12-10 2020-12-10 Abnormal user detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112734433A true CN112734433A (en) 2021-04-30

Family

ID=75599896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011455045.XA Pending CN112734433A (en) 2020-12-10 2020-12-10 Abnormal user detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112734433A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115967542A (en) * 2022-11-29 2023-04-14 腾讯科技(深圳)有限公司 Human factor-based intrusion detection method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180114016A1 (en) * 2016-10-24 2018-04-26 Samsung Sds Co., Ltd. Method and apparatus for detecting anomaly based on behavior-analysis
CN109447461A (en) * 2018-10-26 2019-03-08 北京三快在线科技有限公司 User credit appraisal procedure and device, electronic equipment, storage medium
CN111612038A (en) * 2020-04-24 2020-09-01 平安直通咨询有限公司上海分公司 Abnormal user detection method and device, storage medium and electronic equipment
CN111612037A (en) * 2020-04-24 2020-09-01 平安直通咨询有限公司上海分公司 Abnormal user detection method, device, medium and electronic equipment
CN111737461A (en) * 2020-06-03 2020-10-02 新华网股份有限公司 Text processing method and device, electronic equipment and computer readable storage medium
CN111783875A (en) * 2020-06-29 2020-10-16 中国平安财产保险股份有限公司 Abnormal user detection method, device, equipment and medium based on cluster analysis
CN111915418A (en) * 2020-05-25 2020-11-10 百维金科(上海)信息科技有限公司 Internet financial fraud online detection method and device

Similar Documents

Publication Publication Date Title
CN105590055B (en) Method and device for identifying user credible behaviors in network interaction system
CN110728526B (en) Address recognition method, device and computer readable medium
CN111090807B (en) Knowledge graph-based user identification method and device
CN110909222B (en) User portrait establishing method and device based on clustering, medium and electronic equipment
CN108985048B (en) Simulator identification method and related device
CN111445304B (en) Information recommendation method, device, computer equipment and storage medium
WO2020257991A1 (en) User identification method and related product
CN113762377B (en) Network traffic identification method, device, equipment and storage medium
CN111242744B (en) Individual behavior modeling and fraud detection method for low-frequency transaction
CN114186626A (en) Abnormity detection method and device, electronic equipment and computer readable medium
CN110827924A (en) Clustering method and device for gene expression data, computer equipment and storage medium
CN111798047A (en) Wind control prediction method and device, electronic equipment and storage medium
CN115238815A (en) Abnormal transaction data acquisition method, device, equipment, medium and program product
CN112612887A (en) Log processing method, device, equipment and storage medium
CN110611655B (en) Blacklist screening method and related product
CN115632874A (en) Method, device, equipment and storage medium for detecting threat of entity object
CN111353109A (en) Malicious domain name identification method and system
CN112734433A (en) Abnormal user detection method and device, electronic equipment and storage medium
US11658987B2 (en) Dynamic fraudulent user blacklist to detect fraudulent user activity with near real-time capabilities
CN113763057B (en) User identity portrait data processing method and device
CN109992960B (en) Counterfeit parameter detection method and device, electronic equipment and storage medium
EP3783543A1 (en) Learning system, learning method, and program
CN114513341B (en) Malicious traffic detection method, malicious traffic detection device, terminal and computer readable storage medium
CN110717817A (en) Pre-loan approval method and device, electronic equipment and computer-readable storage medium
CN107665443B (en) Obtain the method and device of target user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210721

Address after: 518052 2501, Office Building T2, Qianhai China Resources Financial Center, 55 Guiwan 4th Road, Nanshan Street, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Huantai Digital Technology Co.,Ltd.

Address before: 518057 Fuan Science and Technology Building, Block B, No. 13, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province, 207-2

Applicant before: Shenzhen Huantai Technology Co.,Ltd.

Applicant before: OPPO Guangdong Mobile Communications Co.,Ltd.
