CN108108743B - Abnormal user identification method and device for identifying abnormal user - Google Patents

Publication number
CN108108743B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611051585.5A
Other languages
Chinese (zh)
Other versions
CN108108743A (en)
Inventor
陈善
田天
康伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an abnormal user identification method and a device for identifying abnormal users. One embodiment of the method comprises: acquiring feature data of a plurality of users, and determining abnormal users among the plurality of users in an unsupervised learning manner based on the feature data; selecting, in a supervised learning manner and based on the feature data of the determined abnormal users, key feature parameters for constructing a classification model from the plurality of feature parameters, and generating key feature data containing the key feature parameters; and constructing a classification model using the key feature data. Abnormal users are thus identified by unsupervised learning, and the classification model is constructed by supervised learning from key features selected from the feature data of the abnormal users. As a result, the classification model identifies abnormal users using only the key features that are most important for that identification, which avoids interference from less important features, improves identification accuracy, and reduces the overhead of the identification process.

Description

Abnormal user identification method and device for identifying abnormal user
Technical Field
The present application relates to the field of computers, and in particular, to the field of big data, and more particularly, to an abnormal user identification method and an abnormal user identification device.
Background
In big data analysis, abnormal users often need to be identified and their data removed in order to improve the accuracy of the analysis. At present, an identification rule is generally configured, and whether a user is an abnormal user is determined by checking whether the features of the user match the identification rule.
However, this approach has two drawbacks. On one hand, because the volume of user data is massive, matching the feature information of each user against the identification rule one by one makes the identification process expensive. On the other hand, because the importance of each user feature for identifying abnormal users cannot be determined, a large number of features with low importance take part in the calculation, which interferes with the identification process, reduces accuracy, and further increases the cost of the identification process.
Disclosure of Invention
The application provides an abnormal user identification method and an abnormal user identification device, which are used for solving the technical problems existing in the background technology part.
In a first aspect, the present application provides an abnormal user identification method, including: acquiring feature data of a plurality of users, and determining abnormal users among the plurality of users in an unsupervised learning manner based on the feature data, wherein the feature data comprises a plurality of feature parameters indicating features of a user; selecting, in a supervised learning manner and based on the feature data of the determined abnormal users, key feature parameters for constructing a classification model from the plurality of feature parameters, and generating key feature data containing the key feature parameters; and constructing a classification model using the key feature data, so as to identify whether a user is an abnormal user using the classification model.
In a second aspect, the present application provides an apparatus for identifying an abnormal user, the apparatus comprising: the identification unit is configured to acquire feature data of a plurality of users and determine abnormal users in the plurality of users in an unsupervised learning mode based on the feature data, wherein the feature data comprises: a plurality of feature parameters indicating features of a user; the selecting unit is configured to select key feature parameters for constructing a classification model from the plurality of feature parameters in a supervised learning mode based on the determined feature data of the abnormal user, and generate key feature data containing the key feature parameters; and the construction unit is configured to construct a classification model by using the key feature data so as to identify whether the user is an abnormal user by using the classification model.
According to the abnormal user identification method and the device for identifying abnormal users provided by the present application, feature data of a plurality of users is acquired, and abnormal users among the plurality of users are determined in an unsupervised learning manner based on the feature data, wherein the feature data comprises a plurality of feature parameters indicating features of a user; key feature parameters for constructing a classification model are selected from the plurality of feature parameters in a supervised learning manner based on the feature data of the determined abnormal users, and key feature data containing the key feature parameters is generated; and a classification model is constructed using the key feature data. Abnormal users are thus identified by unsupervised learning, and the classification model is constructed by supervised learning from key features selected from the feature data of the abnormal users. As a result, the classification model identifies abnormal users using only the key features that are most important for that identification, which avoids interference from less important features, improves identification accuracy, and reduces the overhead of the identification process.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates an exemplary system architecture to which the abnormal user identification method or the apparatus for identifying an abnormal user of the present application may be applied;
FIG. 2 illustrates a flow diagram of one embodiment of an abnormal user identification method according to the present application;
FIG. 3 illustrates a flow diagram of another embodiment of an abnormal user identification method according to the present application;
FIG. 4 illustrates a schematic structural diagram of one embodiment of an apparatus for identifying anomalous users in accordance with the present application;
FIG. 5 shows a schematic structural diagram of a computer system suitable for implementing the apparatus for identifying an abnormal user according to the embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 that can be applied to an embodiment of the abnormal user identification method or the apparatus for identifying an abnormal user of the present application.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, 103, a network 104, and a server 105. The network 104 serves as the medium that provides transmission links between the terminals 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless transmission links, or fiber optic cables.
The user may use the terminals 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminals 101, 102, 103 may be installed with various communication applications, such as a search application, a group purchase application, an instant communication application, and the like.
The terminals 101, 102, 103 may be various electronic devices having display screens and supporting network communication, including but not limited to smart phones, tablet computers, electronic book readers, MP3 players (Moving Picture Experts Group Audio Layer III, MPEG compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, MPEG compression standard Audio Layer 4), laptop computers, desktop computers, and the like.
The terminals 101, 102, 103 may collect feature parameters indicating features of the user, such as an account number, a user name, a telephone number, and a URL of the user, and transmit feature data including a plurality of feature parameters to the server 105. The server 105 may use the feature data to build a classification model for identifying whether the user is an anomalous user.
Referring to fig. 2, a flow 200 of one embodiment of an abnormal user identification method according to the present application is shown. The abnormal user identification method provided by the embodiment of the present application may be executed by the server 105 in fig. 1, and accordingly, a device for identifying an abnormal user may be disposed in the server 105. The method comprises the following steps:
Step 201, determining abnormal users among the plurality of users in an unsupervised learning manner.
In this embodiment, the user's features may be described from multiple dimensions using multiple feature parameters. The characteristic data of the user includes: a plurality of feature parameters indicative of features of the user, wherein each feature may correspond to at least one feature parameter. After the feature data of each of the multiple users is obtained, an unsupervised learning mode can be adopted to determine abnormal users in the multiple users according to the feature data of the multiple users.
In this embodiment, attributes of the user may be selected in advance as feature parameters, based on the information carried by each attribute. For example, the selected feature parameters may include an account number, a user name, a phone number, and a URL, which together describe the features of the user from multiple dimensions. Accordingly, the feature data of the user may include feature parameters such as the account number, user name, telephone number, and URL of the user.
In some optional implementations of the present embodiment, determining an abnormal user of the plurality of users in an unsupervised learning manner includes: clustering the characteristic data of a plurality of users by adopting a clustering algorithm to obtain a plurality of clusters; and when the cluster contains the feature data matched with the preset abnormal feature data, determining the users corresponding to all the feature data in the cluster as abnormal users.
In this embodiment, the unsupervised learning manner may be a clustering algorithm, such as a density-based clustering algorithm. The characteristic data of a plurality of users can be clustered by adopting a clustering algorithm, and the characteristic data of the users with high association degree are aggregated to obtain a plurality of clusters. Each cluster may contain characteristic data of a plurality of users with high relevance.
In this embodiment, each cluster may be checked separately to determine whether it contains feature data matching the abnormal feature data. The abnormal feature data may be composed in advance from feature parameters whose values are abnormal. When a cluster contains feature data matching the preset abnormal feature data, the feature data of all users in that cluster can be determined to be feature data of abnormal users, because the feature information of the users within one cluster is highly related. Correspondingly, the users to whom that feature data belongs are determined to be abnormal users.
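The clustering step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the feature parameters have already been encoded as numeric vectors, uses a small hand-written DBSCAN-style routine as one possible density-based clustering algorithm, and treats `eps`, `min_pts`, the sample points, and the `matches_abnormal` predicate as hypothetical choices. Ignoring noise points (users that belong to no cluster) is also an assumption, since the text does not say how they are handled.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Minimal density-based clustering; returns one cluster label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:       # core point: keep expanding the cluster
                queue.extend(nb)
    return labels

def abnormal_users(user_ids, points, labels, matches_abnormal):
    """Flag every user in a cluster that contains at least one matching record."""
    flagged = {lab for p, lab in zip(points, labels)
               if lab != NOISE and matches_abnormal(p)}
    return {uid for uid, lab in zip(user_ids, labels) if lab in flagged}

# Hypothetical numeric encodings of user feature data (two dimensions per user).
user_ids = ["u1", "u2", "u3", "u4", "u5", "u6", "u7"]
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2),
          (8.0, 8.0), (8.1, 7.9), (7.9, 8.2),
          (4.0, 15.0)]

labels = dbscan(points, eps=0.5, min_pts=2)
# Hypothetical preset abnormal feature data: first dimension above 8.05.
flagged = abnormal_users(user_ids, points, labels, lambda p: p[0] > 8.05)
print(labels, sorted(flagged))
```

Only `u5` matches the preset pattern directly, but `u4` and `u6` are flagged as well because they share its cluster, which is the propagation the paragraph describes.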
Step 202, selecting key feature parameters based on the feature data of the abnormal user in a supervised learning mode, and generating the key feature data.
In this embodiment, after the abnormal user in the multiple users is determined in the unsupervised learning manner in step 201, the key feature parameters for constructing the classification model for identifying whether the user is the abnormal user may be selected from the multiple feature parameters in the supervised learning manner according to the feature data of the determined abnormal user, that is, the feature parameters important for identifying the abnormal user are selected.
Taking as an example feature data that includes feature parameters such as the account number, user name, phone number, and URL of the user, the feature data of the abnormal users determined in step 201 may be analyzed, and the feature parameters that are important for identifying abnormal users may be selected from among the account number, user name, phone number, URL, and so on.
Step 203, constructing a classification model using the key feature data.
In this embodiment, after the key feature data including the key feature parameters are generated in step 202, a classification model may be constructed by using the key feature data, for example, the key feature data is used as a training sample to train the classification model, and the trained classification model is used to identify whether the user is an abnormal user.
In this embodiment, the classification model constructed by using the key feature data only identifies the abnormal user by using the key feature parameters determined in step 202, which have a higher importance degree for identifying the abnormal user, so that the interference of the features with a lower importance degree to the identification process is avoided, the identification accuracy is improved, and meanwhile, the cost of the identification process is reduced.
Referring to fig. 3, a flow 300 of another embodiment of an abnormal user identification method according to the present application is shown. The abnormal user identification method provided by the embodiment of the present application may be executed by the server 105 in fig. 1, and the method includes the following steps:
Step 301, determining abnormal users among the plurality of users in an unsupervised learning manner based on the feature data of the plurality of users.
In this embodiment, the user's features may be described from multiple dimensions using multiple feature parameters. The characteristic data of the user includes: a plurality of feature parameters indicative of features of the user, wherein each feature corresponds to a feature parameter. For example, the feature data of the user includes feature parameters such as an account number, a user name, a telephone number, and a URL. After the feature data of each of the multiple users is obtained, an unsupervised learning mode can be adopted to determine abnormal users in the multiple users according to the feature data of the multiple users.
Step 302, selecting key feature parameters based on the feature data of the abnormal users using a decision tree or a naive Bayes algorithm, and generating key feature data.
In this embodiment, in order to construct a classification model for identifying an abnormal user, feature data of the abnormal user determined in step 301 may be analyzed in a supervised learning manner, and key feature parameters for constructing the classification model, that is, more important parameters in identifying the abnormal user, may be selected from the feature parameters.
In this embodiment, the supervised learning manner may use a decision tree. Before the key feature parameters for constructing the classification model are selected with the decision tree, the decision tree may be constructed from the feature data of the abnormal users determined in step 301. The decision tree is trained with the feature data of a plurality of abnormal users as training samples, so that it learns how important each feature parameter in that feature data is for identifying abnormal users. The decision tree constructed in this way includes a plurality of nodes, each node is a feature parameter, and the closer a node is to the root node of the decision tree, the more important its feature parameter is for identifying abnormal users. The feature parameters corresponding to nodes in the decision tree whose depth is greater than the depth threshold, that is, the more important feature parameters, may be selected as the key feature parameters for constructing the classification model.
Taking as an example feature data that includes feature parameters such as an account number, a user name, a telephone number, and a URL, the decision tree constructed from the feature data of the abnormal users contains nodes corresponding to these feature parameters, and the depth of each node in the tree differs according to how important the corresponding feature parameter is for identifying abnormal users.
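A small sketch of depth-based feature selection follows, under several stated assumptions. The tree is a plain ID3-style tree split by information gain, which is one common construction and not necessarily the one intended here. Because a tree cannot learn a split from a single class, the sketch assumes each training row carries a label from step 301 (1 for abnormal, 0 otherwise) and that each feature parameter has been encoded as 0 or 1 (1 meaning its value looks suspicious). The text says both that nodes closer to the root are more important and that nodes with depth "greater than the depth threshold" are selected; the sketch assumes the usual convention that the root has depth 0 and therefore keeps features at or above a cut-off level (`max_depth`, a hypothetical parameter). All data and names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    """Pick the feature with the highest information gain."""
    base = entropy(labels)

    def gain(f):
        remainder = 0.0
        for v in {r[f] for r in rows}:
            subset = [l for r, l in zip(rows, labels) if r[f] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    return max(features, key=gain)

def build_tree(rows, labels, features):
    """Leaf = majority label (int); internal node = (feature, {value: subtree})."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    rest = [g for g in features if g != f]
    children = {}
    for v in {r[f] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        children[v] = build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rest)
    return (f, children)

def feature_depths(tree, depth=0, out=None):
    """Shallowest depth at which each feature is used as a split (root = 0)."""
    out = {} if out is None else out
    if isinstance(tree, tuple):
        f, children = tree
        out[f] = min(out.get(f, depth), depth)
        for sub in children.values():
            feature_depths(sub, depth + 1, out)
    return out

FEATURES = ["account", "user_name", "phone", "url"]
# Hypothetical 0/1 encodings, one tuple per user, in FEATURES order.
raw = [(0, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1),
       (0, 0, 1, 1), (1, 0, 0, 0), (0, 1, 1, 1), (1, 1, 0, 1)]
rows = [dict(zip(FEATURES, r)) for r in raw]
labels = [0, 0, 0, 1, 1, 0, 1, 0]   # assumed labels from step 301

tree = build_tree(rows, labels, FEATURES)
depths = feature_depths(tree)
max_depth = 1                        # hypothetical threshold
key_features = sorted(f for f, d in depths.items() if d <= max_depth)
print(depths, key_features)
```

In this toy data the tree splits on `phone` at the root and on `url` one level below, and never uses `account` or `user_name`, so only the first two survive as key feature parameters.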
In this embodiment, after the key feature parameters for constructing the classification model are selected through the decision tree, the feature data of the abnormal users determined in step 301 may be filtered to keep only the feature data that the decision tree itself classifies as belonging to an abnormal user. That is, the decision tree is used to classify the feature data of the abnormal users identified in step 301 once more. When the decision tree's classification result for an abnormal user's feature data is "abnormal user", the key feature parameters in that feature data may be combined to obtain key feature data, and a classification model is constructed using the key feature data.
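The filter-and-project step just described might look like the following sketch. The `classify` callable stands in for the trained decision tree, and the records, the rule inside `classify`, and the chosen key feature names are all hypothetical.

```python
def build_key_feature_data(records, classify, key_features):
    """Keep records still classified as abnormal, reduced to the key features."""
    return [
        {name: rec[name] for name in key_features}
        for rec in records
        if classify(rec) == "abnormal"
    ]

# Hypothetical feature data of users flagged as abnormal in step 301.
records = [
    {"account": "a1", "user_name": "bob", "phone": "000", "url": "http://spam.example"},
    {"account": "a2", "user_name": "amy", "phone": "555", "url": "http://ok.example"},
]

def classify(rec):
    # Stand-in for the trained decision tree (assumed rule, for illustration only).
    return "abnormal" if rec["url"].endswith("spam.example") else "normal"

key_feature_data = build_key_feature_data(records, classify, ["phone", "url"])
print(key_feature_data)
```

The second record is dropped because the re-classification disagrees with the clustering result, and the first is reduced to its two key feature parameters.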
In this embodiment, the supervised learning manner may also use a naive Bayes algorithm. Based on the feature data of the abnormal users determined in step 301, the naive Bayes algorithm may be used to calculate the abnormal probability corresponding to each feature parameter, where the abnormal probability corresponding to a feature parameter is the probability that the user is an abnormal user when the value of that feature parameter is abnormal. The abnormal probability represents how important the feature parameter is for identifying abnormal users: the higher the abnormal probability, the more important the feature parameter is for identifying an abnormal user. After the abnormal probability corresponding to each feature parameter has been calculated through the naive Bayes algorithm, the feature parameters whose abnormal probability is greater than a probability threshold may be used as the key feature parameters for constructing the classification model.
In this embodiment, after the key feature parameters for constructing the classification model are selected by the naive Bayes algorithm, the feature data of the abnormal users determined in step 301 may be filtered to keep only the feature data that the naive Bayes algorithm itself classifies as belonging to an abnormal user. That is, the naive Bayes algorithm is used to classify the feature data of the abnormal users identified in step 301 once more. When the naive Bayes classification result for an abnormal user's feature data is "abnormal user", the key feature parameters in that feature data may be combined to obtain key feature data, and a classification model is constructed using the key feature data.
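One way to read the per-feature abnormal probability is as P(abnormal user | feature value abnormal), computed with Bayes' rule from simple counts. The sketch below takes that reading. It assumes each feature parameter has been encoded as 0 or 1 (1 meaning the value is abnormal) and that each row carries a label from step 301; the data, the 0.55 probability threshold, and all names are hypothetical, and no smoothing is applied.

```python
def anomaly_probability(rows, labels, feature):
    """Estimate P(abnormal user | feature value abnormal) with Bayes' rule."""
    n = len(rows)
    n_abnormal = sum(labels)
    n_flagged = sum(1 for r in rows if r[feature])
    if n_abnormal == 0 or n_flagged == 0:
        return 0.0
    prior = n_abnormal / n                                   # P(abnormal)
    likelihood = sum(1 for r, l in zip(rows, labels)
                     if l and r[feature]) / n_abnormal       # P(flag | abnormal)
    evidence = n_flagged / n                                 # P(flag)
    return likelihood * prior / evidence

FEATURES = ["account", "user_name", "phone", "url"]
# Hypothetical 0/1 encodings, one tuple per user, in FEATURES order.
raw = [(0, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1),
       (0, 0, 1, 1), (1, 0, 0, 0), (0, 1, 1, 1), (1, 1, 0, 1)]
rows = [dict(zip(FEATURES, r)) for r in raw]
labels = [0, 0, 0, 1, 1, 0, 1, 0]   # assumed labels from step 301

probs = {f: anomaly_probability(rows, labels, f) for f in FEATURES}
threshold = 0.55                     # hypothetical probability threshold
key_features = [f for f in FEATURES if probs[f] > threshold]
print(probs, key_features)
```

With these counts the probabilities are 0.25 for `account`, 0.5 for `user_name`, 0.75 for `phone`, and 0.6 for `url`, so `phone` and `url` pass the threshold.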
Step 303, constructing a classification model by using the key feature data.
In this embodiment, the classification model may be a decision tree model. A decision tree model may be created, and the key feature data containing the key feature parameters generated in step 302 is used as a training sample to train the decision tree model. And then, whether the user is an abnormal user can be identified by utilizing the trained decision tree model.
In this embodiment, the trained decision tree model identifies the abnormal user only by using the key feature parameters determined in step 302, which have a higher importance degree for identifying the abnormal user, so as to avoid the interference of the features with a lower importance degree on the identification process, improve the identification accuracy, and reduce the overhead of the identification process.
Referring to fig. 4, a schematic structural diagram of an embodiment of an apparatus for identifying an abnormal user according to the present application is shown. This device embodiment corresponds to the method embodiment shown in fig. 2.
As shown in fig. 4, the apparatus 400 for identifying an abnormal user of the present embodiment includes: the identification unit 401, the selection unit 402 and the construction unit 403. The identifying unit 401 is configured to obtain feature data of a plurality of users, and determine an abnormal user in the plurality of users in an unsupervised learning manner based on the feature data, where the feature data includes: a plurality of feature parameters indicating features of a user; the selecting unit 402 is configured to select a key feature parameter for constructing a classification model from the plurality of feature parameters in a supervised learning manner based on the determined feature data of the abnormal user, and generate key feature data including the key feature parameter; the constructing unit 403 is configured to construct a classification model using the key feature data, so as to identify whether the user is an abnormal user using the classification model.
In some optional implementations of this embodiment, the identifying unit 401 includes: an abnormal user identification subunit (not shown) configured to cluster the feature data of the plurality of users by using a clustering algorithm to obtain a plurality of clusters; and when the cluster contains feature data matched with the preset abnormal feature data, determining the users corresponding to all the feature data in the cluster as abnormal users.
In some optional implementations of this embodiment, the selecting unit 402 includes: a decision tree selection subunit (not shown) configured to use the determined feature data of the abnormal user as a training sample to construct a decision tree, wherein one node in the decision tree corresponds to one feature parameter; taking the characteristic parameters corresponding to the nodes with the depth larger than the depth threshold value in the decision tree as key characteristic parameters for constructing a classification model; selecting the characteristic data of the abnormal users meeting the following conditions: classifying the feature data of the abnormal user by the decision tree to obtain a classification result which is the abnormal user; and combining the key characteristic parameters in the selected characteristic data of the abnormal user to obtain the key characteristic data.
In some optional implementations of this embodiment, the selecting unit 402 includes: a bayesian selecting subunit (not shown) configured to use a naive bayes algorithm to calculate an abnormal probability corresponding to each feature parameter according to the determined feature data of the abnormal user, where the abnormal probability indicates a probability that the user is an abnormal user when the value of the feature parameter is abnormal; taking the corresponding characteristic parameters with the abnormal probability larger than the probability threshold as key characteristic parameters for constructing a classification model; selecting the characteristic data of the abnormal users meeting the following conditions: classifying the feature data of the abnormal user by adopting a naive Bayes algorithm to obtain a classification result as the abnormal user; and combining the key characteristic parameters in the selected characteristic data of the abnormal user to obtain the key characteristic data.
In some optional implementations of this embodiment, the constructing unit 403 includes: a model building subunit (not shown) configured to create a decision tree model; and training the decision tree model by taking the key characteristic data as a training sample so as to identify whether the user is an abnormal user or not by using the trained decision tree model.
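The three-unit structure of the apparatus can be pictured as a simple composition of callables. This is only an architectural sketch: the class name, the method name, and the toy stand-ins for units 401, 402, and 403 are hypothetical, and real units would wrap a clustering algorithm, a feature selector, and a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]

@dataclass
class AbnormalUserDevice:
    identify: Callable[[Sequence[Record]], List[Record]]                # unit 401
    select: Callable[[Sequence[Record]], List[Record]]                  # unit 402
    construct: Callable[[Sequence[Record]], Callable[[Record], bool]]   # unit 403

    def build_classifier(self, feature_data: Sequence[Record]):
        abnormal = self.identify(feature_data)   # unsupervised identification
        key_data = self.select(abnormal)         # supervised key-feature selection
        return self.construct(key_data)          # classification model

# Toy stand-ins, for illustration only.
def identify(data):
    return [r for r in data if r["url"] == "http://spam.example"]

def select(abnormal):
    return [{"url": r["url"]} for r in abnormal]

def construct(key_data):
    seen = {r["url"] for r in key_data}
    return lambda rec: rec["url"] in seen

device = AbnormalUserDevice(identify, select, construct)
data = [
    {"account": "a1", "url": "http://spam.example"},
    {"account": "a2", "url": "http://ok.example"},
]
is_abnormal = device.build_classifier(data)
print(is_abnormal({"account": "a3", "url": "http://spam.example"}))
```

Keeping the units as interchangeable callables mirrors the optional subunits in the text, where the selecting unit may be backed by either a decision tree or a naive Bayes algorithm.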
Fig. 5 shows a schematic structural diagram of a computer system suitable for implementing the apparatus for identifying an abnormal user according to the embodiment of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read from it is installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus in the above-described embodiments, or a non-volatile computer storage medium that exists separately and is not incorporated into the terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: obtain feature data of a plurality of users, and determine abnormal users among the plurality of users in an unsupervised learning manner based on the feature data, wherein the feature data comprises a plurality of feature parameters indicating features of a user; select, based on the feature data of the determined abnormal users, key feature parameters for constructing a classification model from the plurality of feature parameters in a supervised learning manner, and generate key feature data containing the key feature parameters; and construct a classification model using the key feature data, so as to identify whether a user is an abnormal user using the classification model.
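The unsupervised step described above (cluster the feature data, then mark every user in a cluster that contains preset abnormal feature data) can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not name a clustering algorithm or define "matched", so the sketch assumes a simple distance-threshold (single-linkage) clustering and exact equality with a preset vector. The names `cluster_by_distance`, `find_abnormal_users`, `radius`, and the sample data are hypothetical.

```python
from math import dist

def cluster_by_distance(points, radius):
    """Assumed clustering rule: two points share a cluster when a chain of
    points, each within `radius` of the next, connects them."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Collect neighbours first, then remove them from the unvisited set.
            near = [j for j in unvisited
                    if dist(points[current], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

def find_abnormal_users(feature_data, preset_abnormal, radius):
    """feature_data maps a user id to a feature vector (tuple of numbers).
    A cluster is abnormal if any member equals a preset abnormal vector
    (assumed matching rule); all users in that cluster are then abnormal."""
    user_ids = list(feature_data)
    points = [feature_data[u] for u in user_ids]
    abnormal = set()
    for cluster in cluster_by_distance(points, radius):
        if any(points[i] in preset_abnormal for i in cluster):
            abnormal.update(user_ids[i] for i in cluster)
    return abnormal

# Hypothetical feature vectors: (requests per minute, average dwell time).
feature_data = {
    "u1": (1.0, 2.0), "u2": (1.2, 2.1), "u3": (0.9, 1.8),
    "u4": (50.0, 0.1), "u5": (52.0, 0.2), "u6": (51.0, 0.1),
}
preset_abnormal = {(50.0, 0.1)}
abnormal_users = find_abnormal_users(feature_data, preset_abnormal, radius=3.0)
print(sorted(abnormal_users))
```

In this sketch only one known abnormal vector is supplied, yet all three users in its cluster are labeled abnormal, which is the labeling effect the unsupervised step relies on.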
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (8)

1. An abnormal user identification method, comprising:
acquiring feature data of a plurality of users, and determining abnormal users among the plurality of users in an unsupervised learning manner based on the feature data, wherein the feature data comprises a plurality of feature parameters indicating features of a user;
wherein determining the abnormal users among the plurality of users in the unsupervised learning manner comprises: clustering the feature data of the plurality of users using a clustering algorithm to obtain a plurality of clusters; and, when a cluster contains feature data matching preset abnormal feature data, determining the users corresponding to all feature data in that cluster as abnormal users;
selecting, based on the feature data of the determined abnormal users, key feature parameters for constructing a classification model from the plurality of feature parameters in a supervised learning manner, and generating key feature data containing the key feature parameters, wherein the key feature parameters are feature parameters used for identifying abnormal users; and
constructing a classification model using the key feature data, so as to identify whether a user is an abnormal user using the classification model.
2. The method of claim 1, wherein selecting, based on the feature data of the determined abnormal users, the key feature parameters for constructing the classification model from the plurality of feature parameters in the supervised learning manner, and generating the key feature data containing the key feature parameters comprises:
constructing a decision tree using the feature data of the determined abnormal users as training samples, wherein one node in the decision tree corresponds to one feature parameter;
taking the feature parameters corresponding to nodes whose depth in the decision tree is greater than a depth threshold as the key feature parameters for constructing the classification model;
selecting the feature data of abnormal users that satisfies the following condition: when the decision tree classifies the feature data of the abnormal user, the classification result is an abnormal user; and
combining the key feature parameters in the selected feature data of the abnormal users to obtain the key feature data.
3. The method of claim 1, wherein selecting, based on the feature data of the determined abnormal users, the key feature parameters for constructing the classification model from the plurality of feature parameters in the supervised learning manner, and generating the key feature data containing the key feature parameters comprises:
calculating, using a naive Bayes algorithm and according to the feature data of the determined abnormal users, an abnormal probability corresponding to each feature parameter, wherein the abnormal probability indicates the probability that a user is an abnormal user when the value of the feature parameter is abnormal;
taking the feature parameters whose abnormal probability is greater than a probability threshold as the key feature parameters for constructing the classification model;
selecting the feature data of abnormal users that satisfies the following condition: when the feature data of the abnormal user is classified using the naive Bayes algorithm, the classification result is an abnormal user; and
combining the key feature parameters in the selected feature data of the abnormal users to obtain the key feature data.
4. The method of claim 2 or 3, wherein the classification model is a decision tree model; and
constructing the classification model using the key feature data, so as to identify whether a user is an abnormal user using the classification model, comprises:
creating a decision tree model; and
training the decision tree model using the key feature data as training samples, so as to identify whether a user is an abnormal user using the trained decision tree model.
5. An apparatus for identifying abnormal users, comprising:
an identification unit configured to acquire feature data of a plurality of users and determine abnormal users among the plurality of users in an unsupervised learning manner based on the feature data, wherein the feature data comprises a plurality of feature parameters indicating features of a user;
wherein the identification unit includes an abnormal user identification subunit configured to: cluster the feature data of the plurality of users using a clustering algorithm to obtain a plurality of clusters; and, when a cluster contains feature data matching preset abnormal feature data, determine the users corresponding to all feature data in that cluster as abnormal users;
a selecting unit configured to select, based on the feature data of the determined abnormal users, key feature parameters for constructing a classification model from the plurality of feature parameters in a supervised learning manner, and generate key feature data containing the key feature parameters, wherein the key feature parameters are feature parameters used for identifying abnormal users; and
a construction unit configured to construct a classification model using the key feature data, so as to identify whether a user is an abnormal user using the classification model.
6. The apparatus of claim 5, wherein the selecting unit comprises:
a decision tree selecting subunit configured to: construct a decision tree using the feature data of the determined abnormal users as training samples, wherein one node in the decision tree corresponds to one feature parameter; take the feature parameters corresponding to nodes whose depth in the decision tree is greater than a depth threshold as the key feature parameters for constructing the classification model; select the feature data of abnormal users that satisfies the following condition: when the decision tree classifies the feature data of the abnormal user, the classification result is an abnormal user; and combine the key feature parameters in the selected feature data of the abnormal users to obtain the key feature data.
7. The apparatus of claim 6, wherein the selecting unit comprises:
a Bayes selecting subunit configured to: calculate, using a naive Bayes algorithm and according to the feature data of the determined abnormal users, an abnormal probability corresponding to each feature parameter, wherein the abnormal probability indicates the probability that a user is an abnormal user when the value of the feature parameter is abnormal; take the feature parameters whose abnormal probability is greater than a probability threshold as the key feature parameters for constructing the classification model; select the feature data of abnormal users that satisfies the following condition: when the feature data of the abnormal user is classified using the naive Bayes algorithm, the classification result is an abnormal user; and combine the key feature parameters in the selected feature data of the abnormal users to obtain the key feature data.
8. The apparatus of claim 7, wherein the construction unit comprises:
a model building subunit configured to: create a decision tree model; and train the decision tree model using the key feature data as training samples, so as to identify whether a user is an abnormal user using the trained decision tree model.
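The probability-threshold selection recited in claims 3 and 7 can be illustrated with a small sketch. It is only one possible reading: it assumes each feature value has already been flagged as abnormal or not (a boolean), that users not marked abnormal by the clustering step serve as the normal class, and it applies Bayes' rule per feature to estimate P(abnormal user | feature value abnormal). The follow-up filtering of records by a naive Bayes classification result is omitted for brevity. The function name, feature names, threshold, and data are all hypothetical.

```python
def abnormal_probabilities(records, labels):
    """records: list of dicts mapping feature name -> bool (True means the
    feature value is flagged abnormal). labels: list of bool (True means the
    user was determined to be abnormal). Returns, per feature, an estimate of
    P(abnormal user | feature flagged) computed with Bayes' rule."""
    total = len(records)
    n_abnormal = sum(labels)
    prior = n_abnormal / total                      # P(abnormal user)
    probs = {}
    for name in records[0]:
        flagged = sum(1 for r in records if r[name])
        flagged_and_abnormal = sum(
            1 for r, y in zip(records, labels) if y and r[name])
        if flagged == 0 or n_abnormal == 0:
            probs[name] = 0.0                       # no evidence for this feature
            continue
        likelihood = flagged_and_abnormal / n_abnormal  # P(flag | abnormal user)
        evidence = flagged / total                      # P(flag)
        probs[name] = likelihood * prior / evidence
    return probs

# Hypothetical flagged feature data for six users; the first three are the
# users determined to be abnormal by the unsupervised step.
records = [
    {"many_logins": True,  "new_device": True,  "night_use": True},
    {"many_logins": True,  "new_device": True,  "night_use": False},
    {"many_logins": True,  "new_device": False, "night_use": True},
    {"many_logins": False, "new_device": False, "night_use": True},
    {"many_logins": False, "new_device": True,  "night_use": True},
    {"many_logins": False, "new_device": False, "night_use": False},
]
labels = [True, True, True, False, False, False]

PROBABILITY_THRESHOLD = 0.6  # assumed value; the claims do not fix one
probs = abnormal_probabilities(records, labels)
key_features = [name for name, p in probs.items() if p > PROBABILITY_THRESHOLD]

# Key feature data: abnormal users' records restricted to the key features.
key_feature_data = [
    {name: r[name] for name in key_features}
    for r, y in zip(records, labels) if y
]
print(key_features)
```

The resulting `key_feature_data` is what a downstream classifier, such as the decision tree model of claims 4 and 8, would be trained on in this reading.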
CN201611051585.5A 2016-11-24 2016-11-24 Abnormal user identification method and device for identifying abnormal user Active CN108108743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611051585.5A CN108108743B (en) 2016-11-24 2016-11-24 Abnormal user identification method and device for identifying abnormal user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611051585.5A CN108108743B (en) 2016-11-24 2016-11-24 Abnormal user identification method and device for identifying abnormal user

Publications (2)

Publication Number Publication Date
CN108108743A CN108108743A (en) 2018-06-01
CN108108743B true CN108108743B (en) 2022-06-24

Family

ID=62204087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611051585.5A Active CN108108743B (en) 2016-11-24 2016-11-24 Abnormal user identification method and device for identifying abnormal user

Country Status (1)

Country Link
CN (1) CN108108743B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269012A (en) * 2018-01-12 2018-07-10 中国平安人寿保险股份有限公司 Construction method, device, storage medium and the terminal of risk score model
CN109166624A (en) * 2018-09-21 2019-01-08 广州杰赛科技股份有限公司 A kind of behavior analysis method, device, server, system and storage medium
CN109583470A (en) * 2018-10-17 2019-04-05 阿里巴巴集团控股有限公司 A kind of explanation feature of abnormality detection determines method and apparatus
CN110008980B (en) * 2019-01-02 2024-01-19 创新先进技术有限公司 Identification model generation method, identification device, identification equipment and storage medium
CN109936561B (en) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 User request detection method and device, computer equipment and storage medium
CN109902486A (en) * 2019-01-24 2019-06-18 平安科技(深圳)有限公司 Electronic device, abnormal user processing strategie Intelligent Decision-making Method and storage medium
CN109918279B (en) * 2019-01-24 2022-09-27 平安科技(深圳)有限公司 Electronic device, method for identifying abnormal operation of user based on log data and storage medium
CN110570244A (en) * 2019-09-04 2019-12-13 深圳创新奇智科技有限公司 hot-selling commodity construction method and system based on abnormal user identification
CN113822309B (en) * 2020-09-25 2024-04-16 京东科技控股股份有限公司 User classification method, apparatus and non-volatile computer readable storage medium
CN112308566A (en) * 2020-09-27 2021-02-02 中智关爱通(上海)科技股份有限公司 Enterprise fraud identification method
CN113129054B (en) * 2021-03-30 2024-05-31 广州博冠信息科技有限公司 User identification method and device
CN113743963A (en) * 2021-09-28 2021-12-03 北京奇艺世纪科技有限公司 Abnormal recognition model training method, abnormal object recognition device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458042A (en) * 2013-09-10 2013-12-18 上海交通大学 Microblog advertisement user detection method
CN104657744A (en) * 2015-01-29 2015-05-27 中国科学院信息工程研究所 Multi-classifier training method and classifying method based on non-deterministic active learning
CN105376248A (en) * 2015-11-30 2016-03-02 睿峰网云(北京)科技股份有限公司 Method and device for identifying abnormal flow

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7089592B2 (en) * 2001-03-15 2006-08-08 Brighterion, Inc. Systems and methods for dynamic detection and prevention of electronic fraud
CN103793484B (en) * 2014-01-17 2017-03-15 五八同城信息技术有限公司 The fraud identifying system based on machine learning in classification information website
CN105873113B (en) * 2015-01-21 2019-05-28 中国移动通信集团福建有限公司 Wireless quality positioning problems method and system
CN105005594B (en) * 2015-06-29 2018-07-13 嘉兴慧康智能科技有限公司 Abnormal microblog users recognition methods

Also Published As

Publication number Publication date
CN108108743A (en) 2018-06-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant