CN109977992B - Electronic device, method for identifying batch registration behaviors and storage medium - Google Patents
- Publication number
- CN109977992B (CN201910067104.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- processed
- characteristic
- information
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to artificial intelligence technology and discloses an electronic device, a method for identifying batch registration behaviors, and a computer-readable storage medium. The method comprises: obtaining feature information from each account to be processed, the feature information including feature information marked as key fields; generating a feature vector for each account to be processed; performing cluster analysis on all the feature vectors to obtain a plurality of feature matrices; judging whether any feature matrix meets a first preset condition and, if so, taking each such feature matrix as a to-be-processed matrix; and querying the to-be-processed matrices that meet a second preset condition, marking every matrix found as an abnormal matrix, and identifying the accounts to be processed that correspond to the feature vectors in the abnormal matrices as batch-registered accounts. Compared with the prior art, the method can identify multiple kinds of batch registration behavior with high identification accuracy.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an electronic device, a method for identifying batch registration behaviors, and a computer-readable storage medium.
Background
With the development of internet technology, internet applications have become pervasive in production and daily life. Before conducting a transaction or receiving a service over the internet, a user usually has to register an account on the platform that provides it. On any one platform, one or a few accounts are enough to meet the needs of a single user. Alongside such normal registration, however, every service platform also sees profit-driven batch registration, and platforms treat it as a priority target for countermeasures.
The existing method for identifying batch registration behavior is as follows: from users' registration information, count the number of accounts registered from the same Internet Protocol address (IP address), and when the number of accounts registered from an IP address exceeds a preset threshold, determine the accounts registered from that IP address to be batch-registered accounts. The drawback is that only batch registration from a single IP address can be identified, so the identification accuracy is low.
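For comparison with the method disclosed below, the prior-art baseline can be sketched in a few lines. This is a minimal illustration only: the function name `flag_by_ip`, the `(account, ip)` record layout, and the sample data are assumptions made for this sketch and do not come from the patent.

```python
from collections import Counter

def flag_by_ip(registrations, threshold):
    """Return accounts whose registration IP was used by more than `threshold` accounts.

    `registrations` is a list of (account_id, ip_address) pairs (hypothetical layout).
    """
    counts = Counter(ip for _, ip in registrations)
    return [account for account, ip in registrations if counts[ip] > threshold]

# Hypothetical sample data: three accounts share one IP address.
registrations = [
    ("a1", "10.11.12.13"),
    ("a2", "10.11.12.13"),
    ("a3", "10.11.12.13"),
    ("a4", "10.11.12.14"),
]
flagged = flag_by_ip(registrations, threshold=2)
```

A batch that rotates through several IP addresses never exceeds the per-address threshold, which is the weakness the background section describes.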
How to improve the identification accuracy of batch registration behavior has therefore become an urgent problem.
Disclosure of Invention
The main object of the invention is to provide an electronic device, a method for identifying batch registration behaviors, and a computer-readable storage medium, with the aim of improving the identification accuracy of batch registration behaviors.
To achieve this object, the invention provides an electronic device comprising a memory and a processor, wherein the memory stores an identification program for batch registration behaviors which, when executed by the processor, implements the following steps:
an acquisition step: acquiring a first preset number of pieces of feature information from each account to be processed, wherein the first preset number of pieces of feature information include a second preset number of pieces of feature information marked as key fields;
a generation step: converting the first preset number of pieces of feature information of each account to be processed into corresponding feature values, and generating a feature vector for each account to be processed from the feature values corresponding to its feature information;
a clustering step: performing cluster analysis on all the feature vectors to obtain a plurality of feature matrices, wherein each feature matrix consists of a plurality of feature vectors;
an extraction step: obtaining, from each feature matrix, the feature values corresponding to each key field, and taking all the feature values corresponding to one key field in the same feature matrix as the feature value group of that key field;
a judging step: judging, according to the feature value group of each key field in each feature matrix, whether any feature matrix meets a first preset condition, and if so, taking each feature matrix that meets the first preset condition as a to-be-processed matrix;
an identification step: querying the to-be-processed matrices that meet a second preset condition, and when such matrices are found, marking all of them as abnormal matrices and identifying the accounts to be processed that correspond to the feature vectors in the abnormal matrices as batch-registered accounts.
Preferably, the judging step includes:
calculating the dispersion of the feature value group of each key field in each feature matrix;
judging whether any feature matrix meets the first preset condition, and if so, taking each feature matrix that meets the first preset condition as a to-be-processed matrix, wherein the first preset condition is that the dispersions of the feature value groups of all key fields in a feature matrix are smaller than a first preset threshold.
Preferably, when the processor executes the identification program for batch registration behaviors, the following step is further implemented before the identification step:
determining feature value distribution data for each key field according to the feature values corresponding to all key fields in all the to-be-processed matrices;
the identification step includes:
determining, according to the feature value distribution data of each key field, a distribution probability value for the feature value group of each key field in each to-be-processed matrix;
and querying the feature value groups whose distribution probability value is smaller than a third preset threshold, and when such feature value groups are found, marking the to-be-processed matrices to which they belong as abnormal matrices and identifying the accounts to be processed that correspond to the feature vectors in each abnormal matrix as batch-registered accounts.
Preferably, the generating step comprises:
determining the preprocessing rule corresponding to each piece of feature information according to a predetermined mapping between feature information and preprocessing rules;
preprocessing each piece of feature information according to a preprocessing rule corresponding to each piece of feature information to obtain a feature value corresponding to each piece of feature information;
and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
Preferably, the key field includes one or more of a mobile phone number, a network address, and device identification information;
when a piece of feature information is any one of a mobile phone number, a network address, and device identification information, the preprocessing rule corresponding to that feature information comprises:
taking the feature information as the feature information to be processed, and extracting at least one feature field from the feature information to be processed of each account to be processed;
adding the feature fields of the feature information to be processed from all accounts to be processed to a feature field set of that feature information, and counting the occurrence frequency of each feature field in the feature field set;
and determining the feature value of the feature information to be processed in each account to be processed according to the occurrence frequency of that account's feature field.
In addition, in order to achieve the above object, the present invention further provides a method for identifying batch registration behaviors, which includes the steps of:
an acquisition step: acquiring a first preset number of pieces of feature information from each account to be processed, wherein the first preset number of pieces of feature information include a second preset number of pieces of feature information marked as key fields;
a generation step: converting the first preset number of pieces of feature information of each account to be processed into corresponding feature values, and generating a feature vector for each account to be processed from the feature values corresponding to its feature information;
a clustering step: performing cluster analysis on all the feature vectors to obtain a plurality of feature matrices, wherein each feature matrix consists of a plurality of feature vectors;
an extraction step: obtaining, from each feature matrix, the feature values corresponding to each key field, and taking all the feature values corresponding to one key field in the same feature matrix as the feature value group of that key field;
a judging step: judging, according to the feature value group of each key field in each feature matrix, whether any feature matrix meets a first preset condition, and if so, taking each feature matrix that meets the first preset condition as a to-be-processed matrix;
an identification step: querying the to-be-processed matrices that meet a second preset condition, and when such matrices are found, marking all of them as abnormal matrices and identifying the accounts to be processed that correspond to the feature vectors in the abnormal matrices as batch-registered accounts.
Preferably, the judging step includes:
calculating the dispersion of the feature value group of each key field in each feature matrix;
judging whether any feature matrix meets the first preset condition, and if so, taking each feature matrix that meets the first preset condition as a to-be-processed matrix, wherein the first preset condition is that the dispersions of the feature value groups of all key fields in a feature matrix are smaller than a first preset threshold.
Preferably, before the identifying step, the method further comprises:
determining feature value distribution data for each key field according to the feature values corresponding to all key fields in all the to-be-processed matrices;
the identification step includes:
determining, according to the feature value distribution data of each key field, a distribution probability value for the feature value group of each key field in each to-be-processed matrix;
and querying the feature value groups whose distribution probability value is smaller than a third preset threshold, and when such feature value groups are found, marking the to-be-processed matrices to which they belong as abnormal matrices and identifying the accounts to be processed that correspond to the feature vectors in each abnormal matrix as batch-registered accounts.
Preferably, the generating step comprises:
determining the preprocessing rule corresponding to each piece of feature information according to a predetermined mapping between feature information and preprocessing rules;
preprocessing each piece of feature information according to a preprocessing rule corresponding to each piece of feature information to obtain a feature value corresponding to each piece of feature information;
and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium, wherein the computer-readable storage medium stores an identification program of batch registration behavior, and the identification program of batch registration behavior is executable by at least one processor to cause the at least one processor to execute the steps of the identification method of batch registration behavior according to any one of the above.
The invention acquires a first preset number of pieces of feature information from each account to be processed, the first preset number of pieces of feature information including a second preset number of pieces of feature information marked as key fields; converts the feature information of each account to be processed into corresponding feature values and generates a feature vector for each account from those feature values; performs cluster analysis on all the feature vectors to obtain a plurality of feature matrices, each consisting of a plurality of feature vectors; obtains, from each feature matrix, the feature values corresponding to each key field and takes all the feature values corresponding to one key field in the same feature matrix as the feature value group of that key field; judges, according to the feature value group of each key field in each feature matrix, whether any feature matrix meets a first preset condition and, if so, takes each such feature matrix as a to-be-processed matrix; and queries the to-be-processed matrices that meet a second preset condition, marks every matrix found as an abnormal matrix, and identifies the accounts to be processed that correspond to the feature vectors in the abnormal matrices as batch-registered accounts.
Compared with the prior art, the invention analyzes multiple kinds of feature information, including key fields that are highly correlated with batch registration behavior, and identifies batch-registered accounts only after several stages of analysis. It can therefore identify multiple kinds of batch registration behavior and achieves high identification accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from the structures shown without creative effort.
FIG. 1 is a schematic diagram of the operating environment of a first embodiment of the identification program for batch registration behaviors of the present invention;
FIG. 2 is a program module diagram of the first embodiment of the identification program for batch registration behaviors of the present invention;
FIG. 3 is a flowchart of a first embodiment of the method for identifying batch registration behaviors of the present invention.
The realization of the objects, the functional features, and the advantages of the present invention are further described with reference to the accompanying drawings.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
The invention provides an identification program for batch registration behaviors.
FIG. 1 is a schematic diagram of the operating environment of a first embodiment of the identification program 10 for batch registration behaviors of the present invention.
In this embodiment, the identification program 10 for batch registration behaviors is installed and runs in the electronic device 1. The electronic device 1 may be a desktop computer, a notebook computer, a palmtop computer, a server, or other computing equipment. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. FIG. 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk or a memory of the electronic device 1. The memory 11 may also be an external storage device of the electronic apparatus 1 in other embodiments, such as a plug-in hard disk provided on the electronic apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic apparatus 1. The memory 11 is used for storing application software installed in the electronic device 1 and various types of data, such as program codes of the recognition program 10 for batch registration behavior. The memory 11 may also be used to temporarily store data that has been output or is to be output.
The processor 12 may in some embodiments be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run program code stored in the memory 11 or to process data, for example to execute the identification program 10 for batch registration behaviors.
The display 13 may in some embodiments be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like. The display 13 is used to display information processed in the electronic device 1 and to display a visual user interface. The components 11-13 of the electronic device 1 communicate with each other via a system bus.
FIG. 2 is a program module diagram of the first embodiment of the identification program 10 for batch registration behaviors of the present invention. In this embodiment, the identification program 10 may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to carry out the present invention. For example, in FIG. 2 the identification program 10 is divided into an acquisition module 101, a generation module 102, a clustering module 103, an extraction module 104, a judgment module 105, and an identification module 106. A module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing the execution of the identification program 10 in the electronic device 1, wherein:
The acquisition module 101 is configured to acquire a first preset number of pieces of feature information from each account to be processed, where the first preset number of pieces of feature information include a second preset number of pieces of feature information marked as key fields.
The acquisition module 101 acquires a first preset number of pieces of feature information from each account to be processed. The feature information includes one or more of a mobile phone number, a network address (e.g., an IP address), and device identification information; in some application scenarios it further includes one or more of geographic location information, education level information, and the amount of missing information. Among the first preset number of pieces of feature information of each account to be processed, a second preset number are marked as key fields; for example, feature information such as the mobile phone number, network address, and device identification information can be marked as key fields. The first preset number is greater than or equal to the second preset number.
In this embodiment, the acquisition module 101 is further configured to set the feature information and the key fields, as follows:
Account item information of multiple categories is obtained from each account to be processed, the degree of correlation between each category of account item information and batch registration behavior is determined, and the categories are sorted by degree of correlation. In descending order of correlation, the first preset number of categories of account item information are selected as the feature information, and, again in descending order of correlation, the second preset number of pieces of feature information are selected from the feature information as the key fields.
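The selection just described amounts to a top-N ranking applied twice. The sketch below illustrates it under stated assumptions: the text does not say how the degree of correlation is computed, so the scores in `relevance` are invented placeholders, and the function name `select_features` and the category names are hypothetical.

```python
def select_features(relevance, first_n, second_n):
    """Rank account item categories by correlation with batch registration.

    Returns (feature_information, key_fields): the top `first_n` categories,
    and the top `second_n` of those. Requires first_n >= second_n.
    """
    if second_n > first_n:
        raise ValueError("second preset number must not exceed the first")
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    features = ranked[:first_n]
    key_fields = features[:second_n]
    return features, key_fields

# Placeholder correlation scores (assumed values, for illustration only).
relevance = {
    "phone_number": 0.9,
    "ip_address": 0.8,
    "device_id": 0.7,
    "location": 0.4,
    "education": 0.2,
    "nickname": 0.1,
}
features, key_fields = select_features(relevance, first_n=5, second_n=3)
```

Because the key fields are taken from the head of the same ranking, they are always a subset of the feature information, which matches the requirement that the first preset number be at least the second.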
The generating module 102 is configured to convert the feature information of the first preset number of the accounts to be processed into corresponding feature values, and generate a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
In this embodiment, the generation module 102 is further configured to perform the following operations.
First, the preprocessing rule corresponding to each piece of feature information is determined according to a predetermined mapping between feature information and preprocessing rules.
Then, each piece of feature information is preprocessed according to its preprocessing rule to obtain the corresponding feature value.
Finally, a feature vector is generated for each account to be processed from the feature values corresponding to its feature information.
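One way to picture these three operations is a lookup table from feature name to rule function. The sketch below is an assumption-laden illustration, not the patented implementation: the rule functions, the `RULES` table, the field names, and the choice to use a raw count as the feature value are all hypothetical.

```python
from collections import Counter

def frequency_rule(values):
    """Feature value = how often the same raw value occurs across all accounts."""
    counts = Counter(values)
    return [counts[v] for v in values]

def numeric_rule(values):
    """Feature value = the raw value itself, as a float."""
    return [float(v) for v in values]

# Hypothetical mapping between feature information and preprocessing rules.
RULES = {"ip_address": frequency_rule, "missing_fields": numeric_rule}

def build_feature_vectors(accounts, rules):
    """Apply each feature's rule column-wise, then assemble one vector per account."""
    columns = {
        name: rule([account[name] for account in accounts])
        for name, rule in rules.items()
    }
    return [[columns[name][i] for name in rules] for i in range(len(accounts))]

accounts = [
    {"ip_address": "10.11.12.13", "missing_fields": 2},
    {"ip_address": "10.11.12.13", "missing_fields": 0},
    {"ip_address": "10.11.12.14", "missing_fields": 1},
]
vectors = build_feature_vectors(accounts, RULES)
```

Each rule sees the whole column rather than a single value because frequency-style rules need the values of all accounts to be processed.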
The preprocessing rules may be set according to the specific application scenario; the following examples may be used as a reference.
In one embodiment, when a piece of feature information is any one of a mobile phone number, a network address, and device identification information, the preprocessing rule corresponding to that feature information is as follows. First, the feature information is taken as the feature information to be processed, and at least one feature field is extracted from it for each account to be processed. For example, the first seven digits, 1234567, are taken from the mobile phone number 12345678912 as its feature field; the first two groups of digits, 10.11, or the first three groups, 10.11.12, are taken from the IP address 10.11.12.13 as its feature field; and the device number is extracted from the device identification information as its feature field. Next, the feature fields of the feature information to be processed from all accounts to be processed are added to a feature field set, and the occurrence frequency of each feature field in that set is counted. Finally, the feature value of the feature information to be processed in each account is determined from the occurrence frequency of that account's feature field.
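The prefix extraction and frequency counting above can be sketched as follows. The helper names are hypothetical, and using the raw occurrence count directly as the feature value is an assumption; the text only says the feature value is determined "according to" the frequency.

```python
from collections import Counter

def phone_prefix(phone):
    """First seven digits of a mobile phone number."""
    return phone[:7]

def ip_prefix(ip, groups=2):
    """First `groups` dot-separated groups of an IPv4 address."""
    return ".".join(ip.split(".")[:groups])

def frequency_values(raw_values, extract):
    """Extract a feature field per account, then use its count in the set as the value."""
    fields = [extract(value) for value in raw_values]
    counts = Counter(fields)
    return [counts[field] for field in fields]

# Hypothetical sample data.
phones = ["12345678912", "12345670000", "19876543210"]
ips = ["10.11.12.13", "10.11.99.1", "192.168.0.1"]
phone_values = frequency_values(phones, phone_prefix)
ip_values = frequency_values(ips, ip_prefix)
```

Accounts that share a number segment or subnet end up with the same, larger feature value, which is what lets the later clustering step pull them together.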
As another example, when a piece of feature information is either geographic location information or education level information, the preprocessing rule corresponding to that feature information is as follows: the feature information to be processed of each account is converted into a code by one-hot encoding, and the code is used as its feature value. For example, when the education level information has three possible field values, namely high school, bachelor's degree, and master's degree, a three-bit code is used as the feature value, each bit representing one education level; when the field value of an account's education level information is high school, the bit representing high school is set to 1 and the other two bits are set to 0.
Alternatively, the field values of the feature information to be processed from all accounts to be processed are added to a field value set, the occurrence frequency of each field value in that set is counted, and the feature value of the feature information to be processed in each account is determined from the occurrence frequency of that account's field value.
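Both options for categorical feature information can be illustrated briefly. The category list follows the three-level education example in the text; the function names and the sample accounts are assumptions, as is the use of the raw count as the frequency-based value.

```python
from collections import Counter

EDUCATION_LEVELS = ["high school", "bachelor", "master"]

def one_hot(value, categories):
    """One bit per category; the bit for `value` is 1 and all others are 0."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]

def field_value_frequencies(values):
    """Alternative rule: the feature value is the field value's occurrence count."""
    counts = Counter(values)
    return [counts[value] for value in values]

code = one_hot("high school", EDUCATION_LEVELS)

# Hypothetical education values for four accounts.
education = ["bachelor", "high school", "bachelor", "master"]
freq_values = field_value_frequencies(education)
```

One-hot encoding widens the feature vector by one element per category, whereas the frequency alternative keeps a single element per piece of feature information.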
The clustering module 103 is configured to perform cluster analysis on all the feature vectors to obtain a plurality of feature matrices, where each feature matrix consists of a plurality of feature vectors.
The clustering module 103 inputs all the feature vectors into a pre-established clustering model (e.g., a clustering model based on the expectation-maximization algorithm). The clustering model performs cluster analysis on the feature vectors using, for example, the K-means algorithm (a hard clustering algorithm) or a Gaussian mixture model (GMM) to obtain a plurality of feature vector groups, and outputs each group in the form of a feature matrix; for example, the feature vectors in one group are used as row vectors or column vectors to form the corresponding feature matrix.
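As a rough illustration of the hard-clustering case, the sketch below runs a plain Lloyd-style K-means in pure Python and groups the feature vectors into one matrix per cluster, with feature vectors as rows. It is not the patent's clustering model: seeding the centroids with the first `k` vectors, the fixed iteration count, and the sample vectors are simplifying assumptions.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means; returns one cluster label per vector."""
    # Assumption: the first k vectors seed the centroids (deterministic, not k-means++).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def to_feature_matrices(vectors, labels, k):
    """One feature matrix per cluster, with feature vectors as rows."""
    return [[v for v, label in zip(vectors, labels) if label == c] for c in range(k)]

# Hypothetical two-dimensional feature vectors (e.g. phone and IP frequencies).
vectors = [[50, 48], [1, 2], [51, 49], [2, 1], [49, 50], [1, 1]]
labels = kmeans(vectors, k=2)
matrices = to_feature_matrices(vectors, labels, k=2)
```

In practice a library implementation of K-means or a Gaussian mixture model would replace the hand-written loop; only the grouping of vectors into matrices matters for the following steps.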
The extraction module 104 is configured to obtain, from each feature matrix, the feature values corresponding to each key field, and to take all the feature values corresponding to one key field in the same feature matrix as the feature value group of that key field.
For example, if a feature matrix is formed by taking the feature vectors of a feature vector group as column vectors, each row of the matrix holds all the feature values of one piece of feature information, so the row corresponding to each key field can be taken directly as the feature value group of that key field.
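The extraction step reduces to slicing the matrix along the feature axis. The sketch below assumes the opposite layout, one feature vector per row, so a key field's feature value group is a column; with the transposed layout the same group is simply a row. The function name, feature names, and values are hypothetical.

```python
def feature_value_groups(matrix, feature_names, key_fields):
    """Return {key_field: [feature value of that field in every row of `matrix`]}.

    `matrix` holds one feature vector per row, ordered as in `feature_names`.
    """
    index = {name: i for i, name in enumerate(feature_names)}
    return {key: [row[index[key]] for row in matrix] for key in key_fields}

feature_names = ["phone", "ip", "device", "location"]
key_fields = ["phone", "ip", "device"]
matrix = [
    [50, 48, 30, 3],
    [51, 49, 30, 7],
]
groups = feature_value_groups(matrix, feature_names, key_fields)
```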
The judgment module 105 is configured to judge, according to the feature value group of each key field in each feature matrix, whether any feature matrix meets a first preset condition, and if so, to take each feature matrix that meets the first preset condition as a to-be-processed matrix.
The judgment module 105 is further configured to perform the following operations.
First, the dispersion of the feature value group of each key field in each feature matrix is calculated. The dispersion of a feature value group is the degree of difference or spread among the feature values in the group; for example, an index such as the standard deviation, the variance, or the mean deviation can be calculated from the feature values in the group and used as its dispersion.
Then, it is judged whether any feature matrix meets the first preset condition. If so, each feature matrix that meets the first preset condition is taken as a to-be-processed matrix; if not, a result indicating that no batch-registered accounts were identified is output. The first preset condition is that the dispersions of the feature value groups of all key fields in a feature matrix are smaller than a first preset threshold.
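The first preset condition can be sketched with the population standard deviation as the dispersion index, one of the options named above. The function names, the threshold value, and the sample groups are assumptions made for this sketch.

```python
from statistics import pstdev

def meets_first_condition(groups, threshold):
    """True if every key field's feature value group has dispersion below `threshold`."""
    return all(pstdev(values) < threshold for values in groups.values())

def pending_matrices(groups_per_matrix, threshold):
    """Indices of feature matrices that qualify as to-be-processed matrices."""
    return [
        i for i, groups in enumerate(groups_per_matrix)
        if meets_first_condition(groups, threshold)
    ]

# Hypothetical feature value groups for two feature matrices.
groups_per_matrix = [
    {"phone": [50, 51, 49], "ip": [48, 49, 50]},  # tightly concentrated
    {"phone": [1, 40, 90], "ip": [2, 1, 1]},      # phone values widely spread
]
pending = pending_matrices(groups_per_matrix, threshold=2.0)
```

An empty `pending` list corresponds to the branch in which the program reports that no batch-registered accounts were identified.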
The identification module 106 is configured to query the to-be-processed matrices that meet a second preset condition, and when such matrices are found, to mark all of them as abnormal matrices and to identify the accounts to be processed that correspond to the feature vectors in the abnormal matrices as batch-registered accounts.
Further, in this embodiment, the program further includes:
a determining module (not shown in the figure), configured to determine feature value distribution data for each key field according to the feature values corresponding to all key fields in all the to-be-processed matrices.
For example, the determining module forms a full matrix using the feature vectors of all accounts to be processed as row vectors or column vectors, extracts from the full matrix all the feature values corresponding to each key field, and compiles statistics on them to obtain the feature value distribution data (e.g., a cumulative distribution curve or a cumulative distribution table) of each key field.
Further, in this embodiment, the identifying module 106 is further configured to:
determining, according to the feature value distribution data corresponding to each key field, a distribution probability value corresponding to the feature value group of each key field in each to-be-processed matrix. For example, let M be the largest feature value and N the smallest feature value in the feature value group of a key field in a to-be-processed matrix, so that the value interval corresponding to the feature value group is [N, M]. The distribution probability value corresponding to the interval [N, M] is then determined from the feature value distribution data of the key field, for example by querying the cumulative distribution probability values corresponding to N and to M and subtracting the former from the latter.
And querying all feature value groups whose distribution probability value is smaller than a third preset threshold and, when such feature value groups are found, marking the to-be-processed matrices to which they belong as abnormal matrices and identifying the to-be-processed accounts corresponding to the feature vectors in each abnormal matrix as batch registration accounts.
The method comprises the steps of obtaining a first preset amount of feature information from each account to be processed, wherein the first preset amount of feature information comprises a second preset amount of feature information marked as key fields; respectively converting the first preset amount of feature information of each account to be processed into corresponding feature values, and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed; performing cluster analysis on all the characteristic vectors to obtain a plurality of characteristic matrixes, wherein each characteristic matrix consists of a plurality of characteristic vectors; respectively acquiring the characteristic value corresponding to each key field from each characteristic matrix, and taking all the characteristic values corresponding to one key field in the same characteristic matrix as a characteristic value group of the key field; judging whether a feature matrix meeting a first preset condition exists according to the feature value group of each key field in each feature matrix, and if so, taking the feature matrix meeting the first preset condition as a matrix to be processed; and querying all to-be-processed matrixes meeting a second preset condition, marking all queried to-be-processed matrixes as abnormal matrixes when the to-be-processed matrixes meet the second preset condition, and identifying to-be-processed accounts corresponding to all characteristic vectors in all the abnormal matrixes as batch registered accounts respectively. 
Compared with the prior art, the method adopts various characteristic information including the key fields with high correlation degree with the batch registration behaviors as the analysis targets, and finally identifies the batch registration accounts after the analysis by various analysis means, so that the method can identify various batch registration behaviors and has high identification accuracy.
In addition, the invention provides a method for identifying batch registration behaviors.
As shown in fig. 3, fig. 3 is a schematic flow chart of a method for identifying batch registration behavior according to a first embodiment of the present invention.
In this embodiment, the method includes:
Step S10, obtaining a first preset number of pieces of feature information from each account to be processed, wherein the first preset number of pieces of feature information include a second preset number of pieces of feature information marked as key fields.
A first preset number of pieces of feature information are obtained from each account to be processed. The feature information includes one or more of a mobile phone number, a network address (such as an IP address), and device identification information; in some application scenarios, it further includes one or more of geographical location information, educational background information, and the amount of missing information. Among the first preset number of pieces of feature information of each account to be processed, a second preset number are marked as key fields; for example, the mobile phone number, the network address, and the device identification information may be marked as key fields. The first preset number is greater than or equal to the second preset number.
In this embodiment, the method for setting the feature information and the key field includes:
account item information of multiple types is obtained from the accounts to be processed, the degree of correlation between each type of account item information and batch registration behaviors is determined, and the types of account item information are sorted by degree of correlation. In descending order of correlation, a first preset number of types of account item information are selected as the feature information, and, again in descending order of correlation, a second preset number of pieces of feature information are selected from the feature information as the key fields.
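The ranking and selection just described can be sketched as follows. This is only an illustration: the function name `select_features`, the field names, and the correlation scores are hypothetical, and the description does not say how the degree of correlation itself is computed.

```python
from typing import Dict, List, Tuple


def select_features(
    correlation: Dict[str, float], first_n: int, second_n: int
) -> Tuple[List[str], List[str]]:
    """Pick feature information and key fields by descending correlation."""
    if not 0 < second_n <= first_n <= len(correlation):
        raise ValueError("need 0 < second_n <= first_n <= number of item types")
    # Sort account item types from the highest to the lowest correlation degree.
    ranked = sorted(correlation, key=correlation.get, reverse=True)
    features = ranked[:first_n]       # first preset number of item types
    key_fields = features[:second_n]  # second preset number, taken from the top
    return features, key_fields


# Hypothetical correlation degrees (assumed inputs, not taken from the patent).
scores = {
    "mobile_number": 0.92,
    "ip_address": 0.88,
    "device_id": 0.81,
    "location": 0.47,
    "education": 0.21,
    "nickname": 0.05,
}
features, key_fields = select_features(scores, first_n=5, second_n=3)
```

Because the key fields are a prefix of the ranked feature list, the requirement that the first preset number be greater than or equal to the second is enforced by the argument check.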
Step S20, respectively converting the feature information of the first preset quantity of each account to be processed into corresponding feature values, and generating the feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
In this embodiment, step S20 includes:
firstly, according to a mapping relation between predetermined characteristic information and a preprocessing rule, a preprocessing rule corresponding to each characteristic information is determined.
And then, preprocessing each piece of feature information according to a preprocessing rule corresponding to each piece of feature information to obtain a feature value corresponding to each piece of feature information.
And finally, generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
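One possible way to organise the mapping between feature information and preprocessing rules is a lookup table of functions, sketched below. The two rules shown (`text_length`, `missing_flag`) and the field names are placeholders invented for the illustration; each rule is assumed to receive the raw values of one piece of feature information across all accounts and to return one feature value per account.

```python
from typing import Callable, Dict, List

# A rule maps the raw values of one field (all accounts) to feature values.
Rule = Callable[[List[str]], List[float]]


def text_length(values: List[str]) -> List[float]:
    """Placeholder rule: use the length of the raw text as the feature value."""
    return [float(len(v)) for v in values]


def missing_flag(values: List[str]) -> List[float]:
    """Placeholder rule: 1.0 when the field is empty, otherwise 0.0."""
    return [0.0 if v else 1.0 for v in values]


# Hypothetical mapping relation between feature information and rules.
RULES: Dict[str, Rule] = {"nickname": text_length, "email": missing_flag}


def build_feature_vectors(
    accounts: List[Dict[str, str]], feature_names: List[str], rules: Dict[str, Rule]
) -> List[List[float]]:
    """Apply each field's rule, then assemble one feature vector per account."""
    columns = []
    for name in feature_names:
        raw = [account.get(name, "") for account in accounts]
        columns.append(rules[name](raw))
    # Transpose the per-field columns into per-account vectors.
    return [list(row) for row in zip(*columns)]


accounts = [
    {"nickname": "abc", "email": "a@example.com"},
    {"nickname": "hello", "email": ""},
]
vectors = build_feature_vectors(accounts, ["nickname", "email"], RULES)
```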
The preprocessing rule may be set according to a specific application scenario, for example, the following examples may be referred to set the preprocessing rule:
for example, when a piece of feature information is any one of a mobile phone number, a network address, and device identification information, the preprocessing rule corresponding to that feature information includes the following. At least one feature field is extracted from the to-be-processed feature information of each account to be processed: for example, the first seven digits 1234567 are taken from the mobile phone number 12345678912 as the feature field of the mobile phone number; the first two groups of digits 10.11, or the first three groups 10.11.12, are taken from the IP address 10.11.12.13 as the feature field of the IP address; and the device number is extracted from the device identification information as the feature field of the device identification information. All feature fields of the to-be-processed feature information in all accounts to be processed are then added to a feature field set of the to-be-processed feature information, and the occurrence frequency of each feature field in that set is counted. Finally, the feature value of the to-be-processed feature information in each account to be processed is determined according to the occurrence frequency of the corresponding feature field.
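A minimal sketch of this frequency-based rule is given below, using the phone number and IP address prefixes from the example. It assumes that the feature value is the relative frequency of the feature field (count divided by the number of accounts); the description only says the value is determined "according to" the occurrence frequency, so a raw count or another transform would be equally consistent with it. All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List


def phone_prefix(number: str) -> str:
    """Feature field of a mobile phone number: its first seven digits."""
    return number[:7]


def ip_prefix(address: str, groups: int = 2) -> str:
    """Feature field of an IP address: its first `groups` dot-separated groups."""
    return ".".join(address.split(".")[:groups])


def frequency_encode(values: List[str], extract: Callable[[str], str]) -> List[float]:
    """Map each raw value to the relative frequency of its feature field."""
    fields = [extract(v) for v in values]
    counts = Counter(fields)  # occurrence count of each feature field
    total = len(fields)
    return [counts[f] / total for f in fields]


phones = ["12345678912", "12345670000", "12345671111", "19876543210"]
phone_values = frequency_encode(phones, phone_prefix)

ips = ["10.11.12.13", "10.11.200.1", "192.168.0.1", "10.11.12.99"]
ip_values = frequency_encode(ips, ip_prefix)
```

Accounts that share a phone number segment or a network segment receive the same, comparatively high, feature value, which is what later allows clustering and dispersion checks to group them.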
For example, when a piece of feature information is either geographical location information or educational background information, the preprocessing rule corresponding to that feature information includes: converting the to-be-processed feature information of each account to be processed into a code by one-hot encoding, and taking the resulting code as the feature value of the to-be-processed feature information. For example, if the educational background information has three field values, namely high school, bachelor's degree, and master's degree, a three-bit code is used as the feature value corresponding to the educational background information, with each bit representing one education level; when the field value of the educational background information of an account to be processed is high school, the bit representing high school is set to 1 and the other two bits are set to 0.
Alternatively, the field values of the to-be-processed feature information in all accounts to be processed are added to a field value set of the to-be-processed feature information, and the occurrence frequency of each field value in that set is counted. The feature value of the to-be-processed feature information in each account to be processed is then determined according to the occurrence frequency of its field value.
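Both alternatives for categorical feature information can be sketched in a few lines. The category labels and function names below are illustrative assumptions, and the frequency variant again assumes a relative frequency rather than a raw count.

```python
from collections import Counter
from typing import List

# Illustrative labels for a three-valued categorical field.
EDUCATION_LEVELS = ["high_school", "bachelor", "master"]


def one_hot(value: str, categories: List[str]) -> List[int]:
    """Return a code with one bit per category; only the matching bit is 1."""
    if value not in categories:
        raise ValueError(f"unknown field value: {value!r}")
    return [1 if value == category else 0 for category in categories]


def value_frequency(values: List[str]) -> List[float]:
    """Alternative rule: relative frequency of each account's field value."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


code = one_hot("high_school", EDUCATION_LEVELS)
frequencies = value_frequency(["bachelor", "bachelor", "master", "bachelor"])
```

One-hot encoding widens the feature vector by one component per category, whereas the frequency variant keeps a single component per field; which is preferable depends on the clustering model used afterwards.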
Step S30, performing cluster analysis on all the feature vectors to obtain a plurality of feature matrices, wherein each feature matrix is composed of a plurality of feature vectors.
All the feature vectors are input into a pre-established clustering model (for example, a clustering model based on the expectation-maximization algorithm). The clustering model performs cluster analysis on the feature vectors by means of the K-means algorithm (a hard clustering algorithm), a Gaussian mixture model (GMM), or the like to obtain a plurality of feature vector groups, and each feature vector group is output in the form of a feature matrix; for example, the feature vectors in one feature vector group are used as row vectors or column vectors to form the corresponding feature matrix.
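As an illustration of the hard-clustering variant, the sketch below implements plain K-means (Lloyd's algorithm) with the standard library only and groups the feature vectors into one matrix per cluster, with feature vectors as rows. It is a simplified stand-in, not the clustering model of the embodiment: the initial centroids are simply the first k vectors, the number of clusters is supplied by the caller, and the Gaussian mixture alternative is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def kmeans(vectors: List[Vector], k: int, iterations: int = 50) -> List[int]:
    """Plain Lloyd's K-means; returns one cluster label per vector."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic start (a simplification): the first k vectors are centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def to_feature_matrices(
    vectors: List[Vector], labels: List[int]
) -> Dict[int, List[List[float]]]:
    """Group vectors by label; each group is a matrix with vectors as rows."""
    matrices: Dict[int, List[List[float]]] = {}
    for v, lab in zip(vectors, labels):
        matrices.setdefault(lab, []).append(list(v))
    return matrices


vectors = [[0.9, 0.9], [0.1, 0.2], [0.88, 0.91], [0.12, 0.18], [0.91, 0.89]]
labels = kmeans(vectors, k=2)
matrices = to_feature_matrices(vectors, labels)
```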
Step S40, obtaining the feature values corresponding to each key field from each feature matrix, and taking all the feature values corresponding to one key field in the same feature matrix as a feature value group of that key field.
For example, if a feature matrix is formed with the feature vectors of a feature vector group as row vectors, each column of the feature matrix holds all the feature values corresponding to one piece of feature information, so the column corresponding to each key field can be read directly as the feature value group of that key field.
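A short sketch of this extraction follows. It assumes a particular layout, namely that each row of the matrix is the feature vector of one account, so that the values of one key field form a column; the function name and field names are hypothetical.

```python
from typing import Dict, List


def feature_value_groups(
    matrix: List[List[float]], feature_names: List[str], key_fields: List[str]
) -> Dict[str, List[float]]:
    """Collect, for every key field, its values across all rows of one matrix."""
    groups: Dict[str, List[float]] = {}
    for field in key_fields:
        col = feature_names.index(field)  # position of the field in each vector
        groups[field] = [row[col] for row in matrix]
    return groups


feature_names = ["mobile_number", "ip_address", "location"]
key_fields = ["mobile_number", "ip_address"]
matrix = [
    [0.75, 0.5, 0.1],  # account 1
    [0.75, 0.5, 0.3],  # account 2
]
groups = feature_value_groups(matrix, feature_names, key_fields)
```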
Step S50, judging, according to the feature value group of each key field in each feature matrix, whether a feature matrix meeting a first preset condition exists, and if so, taking the feature matrix meeting the first preset condition as a to-be-processed matrix.
Step S50 includes:
first, calculating the dispersion corresponding to the feature value group of each key field in each feature matrix. The dispersion of a feature value group is the degree of difference or spread among the feature values in that group; for example, the standard deviation, the variance, or the mean deviation of the feature values in the group can be calculated and used as the dispersion of the group.
Then, judging whether a feature matrix meeting the first preset condition exists. If so, the feature matrix meeting the first preset condition is taken as a to-be-processed matrix; if not, a result indicating that no batch registration account has been identified is output. The first preset condition is that the dispersions corresponding to the feature value groups of all key fields in a feature matrix are each smaller than a first preset threshold.
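The first preset condition can be sketched as below, taking the population standard deviation as the dispersion (one of the indices the description mentions). The threshold value, the matrix identifiers, and the sample feature value groups are assumptions made for the example.

```python
import statistics
from typing import Dict, List

Groups = Dict[str, List[float]]  # key field -> feature value group


def meets_first_condition(groups: Groups, threshold: float) -> bool:
    """True when every key field's dispersion is below the threshold."""
    return all(statistics.pstdev(values) < threshold for values in groups.values())


def select_pending(matrices: Dict[int, Groups], threshold: float) -> List[int]:
    """Return the ids of the feature matrices that become to-be-processed."""
    return [mid for mid, groups in matrices.items()
            if meets_first_condition(groups, threshold)]


groups_by_matrix = {
    0: {"mobile_number": [0.75, 0.75, 0.74], "ip_address": [0.5, 0.5, 0.5]},
    1: {"mobile_number": [0.1, 0.9, 0.4], "ip_address": [0.2, 0.2, 0.2]},
}
pending = select_pending(groups_by_matrix, threshold=0.05)
if not pending:
    print("no batch registration account identified")
```

Matrix 1 is rejected even though its IP address group is perfectly uniform, because the condition requires every key field, not just one, to fall below the threshold.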
Step S60, querying all to-be-processed matrices meeting a second preset condition and, when such matrices are found, marking each of them as an abnormal matrix and identifying the accounts to be processed corresponding to the feature vectors in each abnormal matrix as batch registration accounts.
Further, in this embodiment, before step S60, the method further includes:
determining the feature value distribution data corresponding to each key field according to the feature values corresponding to all key fields in all the to-be-processed matrices.
For example, the feature vectors of all accounts to be processed are used as row vectors or column vectors to form a full-scale matrix, all feature values corresponding to each key field are extracted from the full-scale matrix, and all feature values corresponding to each key field are counted to obtain feature value distribution data (e.g., a cumulative distribution curve, a cumulative distribution table, etc.) corresponding to each key field.
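A cumulative distribution table of this kind can be approximated by an empirical cumulative distribution function per key field, as in the sketch below. It assumes the full-scale matrix stores one account per row; the function names and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, List


def empirical_cdf(values: List[float]) -> Callable[[float], float]:
    """Return F(x) = share of observed values that are <= x."""
    if not values:
        raise ValueError("at least one feature value is required")
    ordered = sorted(values)

    def cdf(x: float) -> float:
        return bisect_right(ordered, x) / len(ordered)

    return cdf


def distribution_data(
    full_matrix: List[List[float]], feature_names: List[str], key_fields: List[str]
) -> Dict[str, Callable[[float], float]]:
    """Build one empirical CDF per key field from the full-scale matrix."""
    data = {}
    for field in key_fields:
        col = feature_names.index(field)
        data[field] = empirical_cdf([row[col] for row in full_matrix])
    return data


full_matrix = [[0.1, 0.5], [0.2, 0.5], [0.2, 0.25], [0.4, 0.25], [0.9, 0.75]]
cdfs = distribution_data(
    full_matrix, ["mobile_number", "ip_address"], ["mobile_number", "ip_address"]
)
```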
Further, in this embodiment, the step S60 includes:
determining, according to the feature value distribution data corresponding to each key field, a distribution probability value corresponding to the feature value group of each key field in each to-be-processed matrix. For example, let M be the largest feature value and N the smallest feature value in the feature value group of a key field in a to-be-processed matrix, so that the value interval corresponding to the feature value group is [N, M]. The distribution probability value corresponding to the interval [N, M] is then determined from the feature value distribution data of the key field, for example by querying the cumulative distribution probability values corresponding to N and to M and subtracting the former from the latter.
And querying all feature value groups whose distribution probability value is smaller than a third preset threshold and, when such feature value groups are found, marking the to-be-processed matrices to which they belong as abnormal matrices and identifying the accounts to be processed corresponding to the feature vectors in each abnormal matrix as batch registration accounts.
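The interval probability and the final marking can be sketched as follows. The code follows the worked example literally, computing F(M) - F(N) from an empirical cumulative distribution; the threshold, the matrix identifiers, and the sample data are assumptions. Note that with a discrete empirical distribution this difference leaves out the probability mass located exactly at N, so an implementation might instead evaluate the distribution just below N.

```python
from bisect import bisect_right
from typing import Dict, List

Groups = Dict[str, List[float]]  # key field -> feature value group


def interval_probability(all_values: List[float], group: List[float]) -> float:
    """F(M) - F(N) for the group's value interval [N, M]."""
    ordered = sorted(all_values)

    def cdf(x: float) -> float:
        return bisect_right(ordered, x) / len(ordered)

    n, m = min(group), max(group)
    return cdf(m) - cdf(n)


def find_abnormal(
    pending: Dict[int, Groups], population: Dict[str, List[float]], threshold: float
) -> List[int]:
    """Mark a matrix as abnormal if any key-field group is improbably narrow."""
    abnormal = []
    for matrix_id, groups in pending.items():
        if any(
            interval_probability(population[field], values) < threshold
            for field, values in groups.items()
        ):
            abnormal.append(matrix_id)
    return abnormal


# All observed feature values of the key field, over every account.
population = {"ip_address": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]}
pending = {0: {"ip_address": [0.5, 0.6]}, 1: {"ip_address": [0.1, 0.9]}}
abnormal = find_abnormal(pending, population, threshold=0.2)
```

The accounts whose feature vectors belong to the matrices listed in `abnormal` would then be reported as batch registration accounts.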
The method comprises the steps of obtaining a first preset amount of feature information from each account to be processed, wherein the first preset amount of feature information comprises a second preset amount of feature information marked as key fields; respectively converting the feature information of the first preset quantity of each account to be processed into corresponding feature values, and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed; performing cluster analysis on all the characteristic vectors to obtain a plurality of characteristic matrixes, wherein each characteristic matrix consists of a plurality of characteristic vectors; respectively acquiring the characteristic value corresponding to each key field from each characteristic matrix, and taking all the characteristic values corresponding to one key field in the same characteristic matrix as a characteristic value group of the key field; judging whether a feature matrix meeting a first preset condition exists according to the feature value group of each key field in each feature matrix, and if so, taking the feature matrix meeting the first preset condition as a matrix to be processed; and querying all to-be-processed matrixes meeting a second preset condition, marking all queried to-be-processed matrixes as abnormal matrixes when the to-be-processed matrixes meet the second preset condition, and identifying to-be-processed accounts corresponding to all characteristic vectors in all the abnormal matrixes as batch registered accounts respectively. 
Compared with the prior art, the method and the device adopt various characteristic information including the key fields with high relevance degree with the batch registration behaviors as the analysis targets, and finally identify the batch registration accounts after analysis through various analysis means, so that the method and the device can identify various batch registration behaviors and have high identification accuracy.
Further, the present invention also provides a computer-readable storage medium, where the computer-readable storage medium stores an identification program of a batch registration behavior, and the identification program of the batch registration behavior is executable by at least one processor, so that the at least one processor executes the identification method of the batch registration behavior in any of the above embodiments.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (8)
1. An electronic device, comprising a memory and a processor, wherein the memory stores an identification program for batch registration behaviors, and the identification program, when executed by the processor, implements the following steps:
an acquisition step: extracting multi-class account item information from each account to be processed, determining the correlation degree between the multi-class account item information and batch registration behaviors, sorting the multi-class account item information from large to small according to the correlation degree, screening a first preset number of pieces of account item information to be set as feature information according to a sorting result, and screening a second preset number of pieces of account item information to be set as feature information of a key field from the feature information according to the sorting result;
a generation step: respectively converting the feature information of the first preset quantity of each account to be processed into corresponding feature values, and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed;
clustering: performing cluster analysis on all the feature vectors to obtain a plurality of feature matrices, wherein each feature matrix consists of a plurality of feature vectors;
an extraction step: respectively acquiring the feature value corresponding to each key field from each feature matrix, and taking all the feature values corresponding to one key field in the same feature matrix as a feature value group of the key field;
a judging step: judging whether a characteristic matrix meeting a first preset condition exists according to the characteristic value group of each key field in each characteristic matrix, and if so, taking the characteristic matrix meeting the first preset condition as a matrix to be processed;
an identification step: inquiring all to-be-processed matrixes meeting a second preset condition, marking all the inquired to-be-processed matrixes as abnormal matrixes when the to-be-processed matrixes meet the second preset condition, and identifying to-be-processed accounts corresponding to all the characteristic vectors in all the abnormal matrixes as batch registered accounts respectively;
the generating step includes:
determining a preprocessing rule corresponding to each feature information according to a mapping relation between the predetermined feature information and the preprocessing rule;
preprocessing each piece of feature information according to a preprocessing rule corresponding to each piece of feature information to obtain a feature value corresponding to each piece of feature information;
and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
2. The electronic device of claim 1, wherein the judging step comprises:
respectively calculating the dispersion corresponding to the feature value group of each key field in each feature matrix;
judging whether a feature matrix meeting a first preset condition exists, and if so, taking the feature matrix meeting the first preset condition as a matrix to be processed, wherein the first preset condition is that the dispersion corresponding to feature value groups of all key fields in one feature matrix is smaller than a first preset threshold value.
3. The electronic device of claim 1 or 2, wherein the processor, when executing the identification program for batch registration behaviors, further implements the following step before the identification step:
determining feature value distribution data corresponding to each key field according to the feature values corresponding to all key fields in all the to-be-processed matrices;
the identification step includes:
determining a distribution probability value corresponding to the characteristic value group of each key field in each matrix to be processed according to the characteristic value distribution data corresponding to each key field;
and querying all the characteristic value groups with the distribution probability value smaller than a third preset threshold, marking the to-be-processed matrixes to which all the queried characteristic value groups belong as abnormal matrixes when the characteristic value groups are queried, and identifying the to-be-processed accounts corresponding to all the characteristic vectors in all the abnormal matrixes as batch registered accounts respectively.
4. The electronic device of claim 1, wherein the key field comprises one or more of a mobile phone number, a network address, and device identification information;
when one piece of feature information is any one of a mobile phone number, a network address and equipment identification information, the preprocessing rule corresponding to the feature information comprises:
taking the characteristic information as characteristic information to be processed, and extracting at least one characteristic field from the characteristic information to be processed of each account to be processed respectively;
adding all the characteristic fields of the characteristic information to be processed in all the accounts to be processed into a characteristic field set of the characteristic information to be processed, and counting the occurrence frequency of each characteristic field in the characteristic field set of the characteristic information to be processed;
and determining the characteristic value of the characteristic information to be processed according to the occurrence frequency of each characteristic field of the characteristic information to be processed in each account to be processed.
5. A method for identifying batch registration behaviors, the method comprising the steps of:
an acquisition step: extracting multi-class account item information from each account to be processed, determining the correlation degree between the multi-class account item information and batch registration behaviors, sorting the multi-class account item information from large to small according to the correlation degree, screening a first preset number of pieces of account item information to be set as feature information according to a sorting result, and screening a second preset number of pieces of account item information to be set as feature information of a key field from the feature information according to the sorting result;
a generation step: respectively converting the first preset amount of feature information of each account to be processed into corresponding feature values, and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed;
clustering: performing cluster analysis on all the feature vectors to obtain a plurality of feature matrices, wherein each feature matrix consists of a plurality of feature vectors;
an extraction step: respectively acquiring the feature value corresponding to each key field from each feature matrix, and taking all the feature values corresponding to one key field in the same feature matrix as a feature value group of the key field;
a judging step: judging whether a characteristic matrix meeting a first preset condition exists according to the characteristic value group of each key field in each characteristic matrix, and if so, taking the characteristic matrix meeting the first preset condition as a matrix to be processed;
an identification step: inquiring all to-be-processed matrixes meeting a second preset condition, marking all the inquired to-be-processed matrixes as abnormal matrixes when the to-be-processed matrixes meet the second preset condition, and identifying to-be-processed accounts corresponding to all the characteristic vectors in all the abnormal matrixes as batch registered accounts respectively;
the generating step includes:
determining a preprocessing rule corresponding to each feature information according to a mapping relation between the predetermined feature information and the preprocessing rule;
preprocessing each piece of feature information according to a preprocessing rule corresponding to each piece of feature information to obtain a feature value corresponding to each piece of feature information;
and generating a feature vector of each account to be processed according to the feature value corresponding to each feature information in each account to be processed.
6. The method for identifying batch registration behaviors of claim 5, wherein the judging step comprises:
respectively calculating the dispersion corresponding to the feature value group of each key field in each feature matrix;
judging whether a feature matrix meeting a first preset condition exists, and if so, taking the feature matrix meeting the first preset condition as a matrix to be processed, wherein the first preset condition is that the dispersion corresponding to feature value groups of all key fields in one feature matrix is smaller than a first preset threshold value.
7. The method for identifying batch registration behaviors of claim 5 or 6, wherein before the identification step, the method further comprises:
determining feature value distribution data corresponding to each key field according to the feature values corresponding to all key fields in all the to-be-processed matrices;
the identification step includes:
determining a distribution probability value corresponding to the characteristic value group of each key field in each matrix to be processed according to the characteristic value distribution data corresponding to each key field;
and querying all feature value groups with distribution probability values smaller than a third preset threshold value, marking the to-be-processed matrix to which all the queried feature value groups belong as an abnormal matrix when the feature value groups are queried, and respectively identifying the to-be-processed accounts corresponding to the feature vectors in each abnormal matrix as batch registered accounts.
8. A computer-readable storage medium, wherein the computer-readable storage medium stores an identification program for batch registration behaviors, which is executable by at least one processor to cause the at least one processor to perform the steps of the method for identifying batch registration behaviors according to any one of claims 5-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910067104.7A CN109977992B (en) | 2019-01-24 | 2019-01-24 | Electronic device, method for identifying batch registration behaviors and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977992A (en) | 2019-07-05 |
CN109977992B (en) | 2023-01-17 |
Family
ID=67076625
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910067104.7A CN109977992B (en) | Active | 2019-01-24 | 2019-01-24 | Electronic device, method for identifying batch registration behaviors and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977992B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110324352B (en) * | 2019-07-11 | 2021-10-15 | 武汉斗鱼网络科技有限公司 | Method and device for identifying batch registered account groups |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105634855A (en) * | 2014-11-06 | 2016-06-01 | 阿里巴巴集团控股有限公司 | Method and device for recognizing network address abnormity |
CN105791255A (en) * | 2014-12-23 | 2016-07-20 | 阿里巴巴集团控股有限公司 | Method and system for identifying computer risks based on account clustering |
CN105808988A (en) * | 2014-12-31 | 2016-07-27 | 阿里巴巴集团控股有限公司 | Method and device for identifying exceptional account |
Non-Patent Citations (1)
Title |
---|
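Fake user detection based on hierarchical clustering; Fang Yong et al.; Journal of Tsinghua University (Science and Technology); 20170630; Vol. 57, No. 6; pp. 620-624 * |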
Also Published As
Publication number | Publication date |
---|---|
CN109977992A (en) | 2019-07-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108768654B (en) | Identity verification method based on voiceprint recognition, server and storage medium | |
CN109933502B (en) | Electronic device, user operation record processing method and storage medium | |
CN112418798A (en) | Information auditing method and device, electronic equipment and storage medium | |
CN114398557B (en) | Information recommendation method and device based on double images, electronic equipment and storage medium | |
CN114491047A (en) | Multi-label text classification method and device, electronic equipment and storage medium | |
CN113868528A (en) | Information recommendation method and device, electronic equipment and readable storage medium | |
CN113961764A (en) | Method, device, equipment and storage medium for identifying fraud telephone | |
CN112560465A (en) | Method and device for monitoring batch abnormal events, electronic equipment and storage medium | |
CN114880368A (en) | Data query method and device, electronic equipment and readable storage medium | |
CN109977992B (en) | Electronic device, method for identifying batch registration behaviors and storage medium | |
CN111950623B (en) | Data stability monitoring method, device, computer equipment and medium | |
CN113505273A (en) | Data sorting method, device, equipment and medium based on repeated data screening | |
CN112579781A (en) | Text classification method and device, electronic equipment and medium | |
CN115146653B (en) | Dialogue scenario construction method, device, equipment and storage medium | |
CN113869455B (en) | Unsupervised clustering method and device, electronic equipment and medium | |
CN115203364A (en) | Software fault feedback processing method, device, equipment and readable storage medium | |
CN115168509A (en) | Processing method and device of wind control data, storage medium and computer equipment | |
CN111553133B (en) | Report generation method and device, electronic equipment and storage medium | |
CN114329164A (en) | Method, apparatus, device, medium and product for processing data | |
CN114006986A (en) | Outbound call compliance early warning method, device, equipment and storage medium | |
CN113486646A (en) | Product report issuing method and device, electronic equipment and readable storage medium | |
CN113590856A (en) | Label query method and device, electronic equipment and readable storage medium | |
CN113536788A (en) | Information processing method, device, storage medium and equipment | |
CN111738005A (en) | Named entity alignment method and device, electronic equipment and readable storage medium | |
CN115225489B (en) | Dynamic control method for queue service flow threshold, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |