CN113326178A - Abnormal account number propagation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113326178A
Authority
CN
China
Prior art keywords
users
community
abnormal
similarity
operation behavior
Legal status
Pending
Application number
CN202110693767.7A
Other languages
Chinese (zh)
Inventor
补彬
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110693767.7A
Publication of CN113326178A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention provide an abnormal account propagation method and device, electronic equipment and a storage medium, relating to the technical field of anomaly detection. The method comprises: searching a plurality of registered users for users having a sharing relationship, and acquiring the similarity between those users; dividing the users into communities according to their sharing relationships and the similarity between users having a sharing relationship, such that the similarity between users within a community is greater than the similarity between those users and users outside the community; calculating, from the already-determined abnormal accounts, the proportion of abnormal accounts in each community, and determining each community whose proportion exceeds a preset threshold as an abnormal propagation community; and propagating all user accounts in each abnormal propagation community as abnormal accounts. The scheme recalls more abnormal accounts while maintaining accuracy, reduces the loss of recall, and improves recall capability.

Description

Abnormal account number propagation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of anomaly detection, and in particular to an abnormal account propagation method and device, electronic equipment and a storage medium.
Background
In the field of Internet security, detecting the abnormal accounts of illegitimate users is particularly important. To disturb normal users as little as possible, an abnormal account detection model is generally required to have high accuracy. High accuracy, however, inevitably reduces the model's recall capability: fewer abnormal accounts are recalled, and some "fish slip through the net".
Disclosure of Invention
The invention provides an abnormal account propagation method and device, electronic equipment and a storage medium. These address a problem in the prior art: the high accuracy required of an abnormal account detection model leads to a low abnormal account recall rate.
In a first aspect of the present invention, a method for propagating abnormal accounts is provided, including:
searching a plurality of registered users for users having a sharing relationship, and acquiring the similarity between the users having the sharing relationship;
dividing the plurality of users into different communities according to the sharing relationships among them and the similarity between users having a sharing relationship, wherein the similarity between users within a community is greater than the similarity between those users and users outside the community;
calculating the proportion of abnormal accounts in each community according to the already-determined abnormal accounts, and determining each community whose proportion of abnormal accounts is greater than a preset threshold as an abnormal propagation community; and
propagating all user accounts in the abnormal propagation community as abnormal accounts.
Optionally, the step of dividing the plurality of users into different communities according to the sharing relationships among them and the similarity between users having a sharing relationship includes:
constructing a weighted graph model according to the sharing relationships among the users and the similarity between users having a sharing relationship; and
dividing the plurality of users into different communities according to the weighted graph model and a preset community detection algorithm.
Optionally, the step of constructing the weighted graph model according to the sharing relationships among the users and the similarity between users having a sharing relationship includes:
taking each user as a node in the weighted graph model;
connecting an edge between the corresponding nodes in the weighted graph model according to the sharing relationships among the users; and
determining the weight of each edge between corresponding nodes according to the similarity between the two users having the sharing relationship.
Optionally, the step of searching the registered users for users having a sharing relationship and acquiring the similarity between them includes:
extracting the operation behavior parameters of each user from the operation behavior logs of the registered users; and
searching for users having a sharing relationship according to each user's operation behavior parameters, and acquiring the similarity between the users having a sharing relationship according to those parameters.
Optionally, the step of acquiring the similarity between users having a sharing relationship according to each user's operation behavior parameters includes:
acquiring the weight of each operation behavior parameter for classifying users;
constructing a feature vector for each user according to the weights of the operation behavior parameters; and
acquiring, from the users' feature vectors, the similarity distance between the feature vectors of users having a sharing relationship.
Optionally, the step of acquiring the weight of each operation behavior parameter for classifying users includes:
determining the weight of each operation behavior parameter for user classification according to the frequency with which the parameter appears in the operation behavior logs of different users.
Optionally, before the operation behavior parameters of each user are extracted from the operation behavior logs of the registered users, the method further includes:
filtering illegal data out of the operation behavior logs, and converting data of the same category in the logs into a single uniform format.
Optionally, the operation behavior parameters include one or more of: device parameters, network parameters, the execution time of an operation behavior, and the execution action used by the user to trigger the operation behavior.
In a second aspect of the present invention, an abnormal account propagation apparatus is further provided, including:
a searching and obtaining module, configured to search a plurality of registered users for users having a sharing relationship and to obtain the similarity between those users;
a community dividing module, configured to divide the users into different communities according to the sharing relationships among them and the similarity between users having a sharing relationship, wherein the similarity between users within a community is greater than the similarity between those users and users outside the community;
an abnormality determining module, configured to calculate the proportion of abnormal accounts in each community according to the already-determined abnormal accounts, and to determine each community whose proportion of abnormal accounts is greater than a preset threshold as an abnormal propagation community; and
an abnormality propagation module, configured to propagate all user accounts in the abnormal propagation community as abnormal accounts.
In a third aspect of the present invention, an electronic device is further provided, including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program; and
the processor is configured to implement the steps of the abnormal account propagation method when executing the program stored in the memory.
In a fourth aspect of the present invention, a computer-readable storage medium is further provided. It stores a computer program that, when executed by a processor, implements any of the abnormal account propagation methods described above.
Compared with the prior art, the invention has the following advantages:
In the embodiments of the invention, users having a sharing relationship are first searched for among a plurality of registered users, and the similarity between them is obtained. The users are then divided into communities according to their sharing relationships and similarities, such that users within a community are more similar to one another than to users outside it. Abnormal accounts can therefore be propagated on a per-community basis. Next, the proportion of abnormal accounts in each community is calculated from the already-determined abnormal accounts, and each community whose proportion exceeds a preset threshold is determined as an abnormal propagation community. Finally, all user accounts in each abnormal propagation community are propagated as abnormal accounts, which improves the recall capability of the abnormal account detection model.
Because the community division is based on the sharing relationships and similarities among users, it fully accounts for how closely the users are associated. Users in the same community are highly similar and users in different communities are not, so similar users in the same community can be treated in the same way. When the proportion of abnormal accounts in a community is found to exceed the preset threshold, the community is treated as an abnormal propagation community and abnormal accounts are propagated within it. More abnormal accounts are thus recalled while accuracy is maintained, and the recall loss caused by the model's high accuracy is reduced.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly described below.
Fig. 1 is a schematic flow chart of an abnormal account propagation method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of sub-steps of the abnormal account propagation method according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of further sub-steps of the abnormal account propagation method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a weighted graph model in the abnormal account propagation method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of community division in the abnormal account propagation method according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of an abnormal account propagation apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a schematic flow chart of an abnormal account propagation method provided in an embodiment of the present invention. Referring to Fig. 1, the method includes:
Step 101: searching a plurality of registered users for users having a sharing relationship, and acquiring the similarity between the users having the sharing relationship.
Here, users having a sharing relationship are first searched for among the registered users. A sharing relationship indicates that the users are associated, and the similarity between such users is then acquired to analyze how closely they are associated.
A sharing relationship means that the users share materials, behavior characteristics, or the like. Materials are environmental parameters directly related to a single operation behavior of a user, such as network parameters and device parameters. Network parameters may include, but are not limited to, the network IP (Internet Protocol) address and the ua (user agent). The ua is a special string header that enables the server to identify the client's operating system and version, CPU type, browser and version, browser rendering engine, browser language, browser plug-ins, and so on. Device parameters may include, but are not limited to, the mobile phone number, the mobile phone model, the carrier, and the home location of the phone number. Behavior characteristics refer to the execution time, execution action, and the like of an operation behavior.
Step 102: dividing the plurality of users into different communities according to the sharing relationships among them and the similarity between users having a sharing relationship, wherein the similarity between users within a community is greater than the similarity between those users and users outside the community.
Here, the sharing relationships among the users and the similarity between users having a sharing relationship reveal how closely the users are associated. The users are therefore divided into communities so that abnormal accounts can be propagated on a per-community basis. Because the division fully accounts for how closely users are associated, users in the same community are highly similar and users in different communities are not, so similar users in the same community can be treated in the same way.
Step 103: calculating the proportion of abnormal accounts in each community according to the already-determined abnormal accounts, and determining each community whose proportion of abnormal accounts is greater than a preset threshold as an abnormal propagation community.
Here, the known abnormal accounts are used to calculate the concentration of abnormal accounts in each community, that is, the proportion of abnormal accounts. Communities with a high concentration, namely those whose proportion exceeds the preset threshold, are selected and determined as abnormal propagation communities in which abnormal accounts are propagated. More abnormal accounts are thereby recalled, improving the recall capability of anomaly detection.
The preset threshold may be set to any value as required and is not limited herein.
The determined abnormal accounts are those identified by an abnormal account detection model. The embodiment compensates for the recall that the detection model loses because of its high accuracy, and propagating abnormal accounts improves the model's recall capability.
The embodiment places no restriction on the structure of the abnormal account detection model or on how it identifies abnormal accounts; any prior-art model capable of abnormal account detection may be used.
Step 104: propagating all user accounts in the abnormal propagation community as abnormal accounts.
Here, every user account in an abnormal propagation community is marked as abnormal, so that more abnormal accounts are recalled while accuracy is maintained, improving the recall capability of the abnormal account detection model.
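The community-level rule of steps 103 and 104 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the community division is already available as a `{user: community_id}` mapping and the already-determined abnormal accounts as a collection of user ids; the function names are hypothetical.

```python
from collections import defaultdict


def find_propagation_communities(community_of, known_abnormal, threshold):
    """Step 103: return (flagged community ids, {community: member set})."""
    known_abnormal = set(known_abnormal)
    members = defaultdict(set)
    for user, community in community_of.items():
        members[community].add(user)
    flagged = set()
    for community, users in members.items():
        ratio = len(users & known_abnormal) / len(users)
        if ratio > threshold:  # "greater than a preset threshold"
            flagged.add(community)
    return flagged, members


def propagate(community_of, known_abnormal, threshold):
    """Step 104: mark every account in a flagged community as abnormal."""
    flagged, members = find_propagation_communities(community_of, known_abnormal, threshold)
    result = set(known_abnormal)
    for community in flagged:
        result |= members[community]
    return result
```

With `community_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1}`, known abnormal accounts `{"u1", "u2", "u4"}` and a threshold of 0.6, community 0 (ratio 2/3) is flagged and `u3` is propagated as abnormal. Community 1 (ratio 1/2) is left unchanged.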
In this abnormal account propagation method, communities are divided based on the sharing relationships and similarities among users, fully accounting for how closely the users are associated. Users in the same community are highly similar, users in different communities are not, and similar users in the same community can therefore be treated in the same way. When most user accounts in a community are identified as abnormal, that is, when the proportion of abnormal accounts in the community exceeds the preset threshold, the community is treated as an abnormal propagation community and abnormal accounts are propagated within it. More abnormal accounts are thus recalled while accuracy is maintained, the recall loss caused by the model's high accuracy is reduced, and the recall capability of the abnormal account detection model is improved.
Preferably, as shown in Fig. 2, step 101 includes:
Step 1011: extracting the operation behavior parameters of each user from the operation behavior logs of the registered users.
Here, the operation behavior logs record the users' operation behavior data, so each user's operation behavior parameters can be extracted from them and the associations between users analyzed on that basis.
The operation behavior parameters may include one or more of device parameters, network parameters, the execution time of an operation behavior, and the execution action used by the user to trigger the operation behavior. As mentioned above, network parameters may include the network IP and ua, and device parameters may include, but are not limited to, the mobile phone number, mobile phone model, carrier and home location of the phone number.
Step 1012: searching for users having a sharing relationship according to each user's operation behavior parameters, and acquiring the similarity between those users according to the same parameters.
Here, users having a sharing relationship, that is, users who share materials or behavior characteristics, are searched for according to each user's operation behavior parameters. Materials are environmental parameters directly related to the user's operation behaviors, such as device and network parameters; behavior characteristics are parameters such as the execution time and execution action of an operation behavior. The similarity between users having a sharing relationship is then acquired from each user's operation behavior parameters to analyze how closely the users are associated.
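One plausible way to search for users with a sharing relationship is to build an inverted index from each parameter value to the users that have it. The following sketch assumes that each user's operation behavior parameters have already been reduced to a `{dimension: value}` dict with one value per dimension, as in the Fig. 4 example. Real logs would typically hold several values per dimension. The function name is illustrative.

```python
from collections import defaultdict
from itertools import combinations


def find_sharing_pairs(user_params):
    """Map each pair of users to the set of parameter dimensions they share.

    user_params: {user: {dimension: value}}, e.g. {"Xiaokai": {"ip": "ip1"}}.
    Returns {(user_a, user_b): {dimension, ...}} with user_a < user_b.
    """
    index = defaultdict(set)  # (dimension, value) -> users having that value
    for user, params in user_params.items():
        for dimension, value in params.items():
            index[(dimension, value)].add(user)
    shared = defaultdict(set)
    for (dimension, _value), users in index.items():
        # Every pair of users under the same value shares that material.
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)].add(dimension)
    return dict(shared)
```

Every pair of users under a common value is emitted. A value held by a very large number of users (for example, an IP behind carrier-grade NAT) therefore produces quadratically many pairs, and in practice such values may need to be capped or down-weighted.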
Preferably, as shown in Fig. 3, in step 1012, acquiring the similarity between users having a sharing relationship according to each user's operation behavior parameters includes:
Step 10121: acquiring the weight of each operation behavior parameter for user classification.
Here, the weight of each operation behavior parameter for classifying users is acquired first, so that the similarity between users can be judged from these weights.
Preferably, in step 10121 the weight of each operation behavior parameter for user classification may be determined according to the frequency with which the parameter appears in the operation behavior logs of different users.
Here, if an operation behavior parameter appears frequently in a given user's operation behavior log but rarely in the logs of other users, it can be considered to distinguish users well and to be suitable for classification, so it is assigned a higher weight; otherwise it is assigned a lower weight. In this way the weight of each operation behavior parameter for user classification can be determined accurately from how often it appears in the logs of different users.
Specifically, step 10121 may use the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to assign a weight to each operation behavior parameter, but is not limited thereto.
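The following is a minimal TF-IDF sketch that treats each user's operation behavior log as a document and each logged parameter value as a term. The token format (`"ip=ip1"`), the unsmoothed `log(N / df)` inverse document frequency and the function name are all assumptions; the source does not say which TF-IDF variant is used.

```python
import math
from collections import Counter


def tfidf_weights(user_logs):
    """Weight each parameter value per user with TF-IDF.

    user_logs: {user: [token, ...]}, one token per logged parameter value,
    e.g. "ip=ip1" (hypothetical token format). Returns {user: {token: weight}}.
    """
    n_users = len(user_logs)
    # Document frequency: in how many users' logs each token appears.
    doc_freq = Counter()
    for tokens in user_logs.values():
        doc_freq.update(set(tokens))
    weights = {}
    for user, tokens in user_logs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[user] = {
            # TF: share of this user's log; IDF: rarer across users -> larger.
            token: (count / total) * math.log(n_users / doc_freq[token])
            for token, count in counts.items()
        }
    return weights
```

Under this unsmoothed variant, a value that appears in every user's log gets weight 0. This matches the intent that a value seen everywhere has little power to distinguish users.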
Step 10122: constructing a feature vector for each user according to the weights of the operation behavior parameters.
Here, once the weights of the operation behavior parameters are obtained, a feature vector is constructed for each user so that the similarity distance between users can be determined from the feature vectors.
Specifically, in step 10122 the weights of all operation behavior parameters a user has may be assembled into that user's feature vector. For example, suppose there are four operation behavior parameter dimensions, in this order: IP, ua, phone number (phone) and device. As shown in Fig. 4, Xiaokai's operation behavior parameters are ip1, ua1, phone1 and device1. Their weights calculated with the TF-IDF algorithm are w_ip1, w_ua1, w_phone1 and w_device1, so Xiaokai's feature vector is [w_ip1, w_ua1, w_phone1, w_device1]. Similarly, Xiaoxue's feature vector is [w_ip1, w_ua1, w_phone2, w_device2]. Its first two components equal Xiaokai's because the two users share the same IP and ua.
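Following the Fig. 4 example, a feature vector can be built by looking up the weight of the user's value in each dimension, in a fixed dimension order. This sketch assumes one weight per parameter value, so a shared value such as ip1 contributes the same component to both users, as in the example. The dimension names and the weight table used in the checks are made-up illustrations.

```python
# Dimension order from the Fig. 4 example (names are illustrative).
DIMENSIONS = ("ip", "ua", "phone", "device")


def feature_vector(params, value_weight, dims=DIMENSIONS):
    """Build one user's feature vector.

    params: {dimension: value} for one user.
    value_weight: {(dimension, value): weight}, e.g. from a TF-IDF step.
    A dimension the user lacks, or a value without a weight, contributes 0.0.
    """
    return [value_weight.get((d, params.get(d)), 0.0) for d in dims]
```

With hypothetical weights, Xiaokai's and Xiaoxue's vectors agree in the ip and ua components and differ in phone and device, which is what the example describes.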
Step 10123: acquiring the similarity distance between the feature vectors of users having a sharing relationship according to each user's feature vector.
After the users' feature vectors are obtained, the similarity distance between the feature vectors of each pair of users having a sharing relationship is computed and used as the similarity between those users. This allows the closeness of association between users to be analyzed accurately, so that the users can be divided into communities.
In step 10123, the similarity distance between feature vectors may be calculated with a Gaussian kernel function, cosine distance, Euclidean distance, or the like, but is not limited thereto.
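The three measures named above can each be written in a few lines of plain Python. Cosine similarity and the Gaussian kernel grow as vectors get closer. Euclidean distance shrinks instead, so the `1 / (1 + d)` conversion used here to turn it into an edge weight is an assumption, as is the default kernel width `sigma=1.0`.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_similarity(x, y, sigma=1.0):
    # Gaussian (RBF) kernel: 1.0 for identical vectors, decaying toward 0.
    return math.exp(-euclidean_distance(x, y) ** 2 / (2 * sigma ** 2))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar


def euclidean_similarity(x, y):
    # Hypothetical mapping of a distance into (0, 1]: closer means larger.
    return 1.0 / (1.0 + euclidean_distance(x, y))
```

Any of these can serve as the edge weight between two users with a sharing relationship. The choice (and `sigma` for the kernel) would need to be tuned on real data.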
The raw logs of user operation behavior may contain illegal or abnormal values and data in inconsistent formats. For example, a date may appear as "20180901", as "2018/9/1", with a weekday appended, as an empty value, or even as a negative number, any of which can distort how the data are judged. The data therefore need to be preprocessed before analysis. To this end, preferably, before step 1011 the method further includes:
filtering illegal data out of the operation behavior logs, and converting data of the same category in the logs into a single uniform format.
Filtering out abnormal or meaningless values and unifying the data formats ensures that the data can be analyzed and judged correctly and avoids errors.
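The following is a small preprocessing sketch for the date example above. It normalizes several date layouts to `YYYY-MM-DD` and drops records whose date is empty, negative or unparseable. The record layout (a dict with a `"date"` key) and the list of accepted formats are hypothetical; a real log would need its own rules for each field.

```python
from datetime import datetime

# Hypothetical set of accepted date layouts; a real log would define its own.
DATE_FORMATS = ("%Y%m%d", "%Y/%m/%d", "%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw):
    """Return the date as 'YYYY-MM-DD', or None if the value is illegal."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text or text.startswith("-"):  # empty or negative values are illegal
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next layout
    return None  # no layout matched


def clean_log(records):
    """Drop records with an illegal date and rewrite the rest to one format."""
    cleaned = []
    for record in records:
        date = normalize_date(record.get("date"))
        if date is not None:
            cleaned.append({**record, "date": date})
    return cleaned
```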
Preferably, as shown in Fig. 2, step 102 includes:
Step 1021: constructing a weighted graph model according to the sharing relationships among the users and the similarity between users having a sharing relationship.
Here, a weighted graph model is first constructed from the users' sharing relationships and the similarities between users having a sharing relationship. The community division is then performed on this model.
The weighted graph model describes the connections between users and how closely they are associated. It contains one node per user; an edge between two nodes represents a connection between the corresponding users, and the weight of the edge represents how closely they are associated.
Preferably, step 1021 includes:
taking each user as a node in the weighted graph model; connecting an edge between the corresponding nodes in the weighted graph model according to the sharing relationships among the users; and determining the weight of each edge according to the similarity between the two users it connects.
In other words, the users form the nodes of the weighted graph model. If two users have a sharing relationship, an edge connects their nodes, and the weight of that edge is determined by the similarity between the two users.
For example, as shown in Fig. 4, the users Xiaokai, Xiaoxue, Xiaomeng, Xiaowei, Xiaodong, Xiaoyuan, Xiaotian and Xiaohao are each a node in the weighted graph model. Their sharing relationships are as follows: Xiaokai and Xiaoxue share an IP address and a ua; Xiaokai and Xiaomeng share an IP address; Xiaokai and Xiaowei share a device; Xiaowei and Xiaodong share an IP address and a phone number; Xiaowei and Xiaoyuan share an IP address, a ua and a phone number; Xiaowei and Xiaotian share a mobile phone and a ua; and Xiaoxue and Xiaohao share a ua. An edge is therefore connected between each of these pairs (Xiaokai-Xiaoxue, Xiaokai-Xiaomeng, Xiaokai-Xiaowei, Xiaowei-Xiaodong, Xiaowei-Xiaoyuan, Xiaowei-Xiaotian and Xiaoxue-Xiaohao). The weight of each edge is determined by the similarity between its two users.
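The Fig. 4 graph can be represented as a weighted adjacency dict. The source does not give the similarity values, so the example below uses the number of shared parameter dimensions as a placeholder edge weight (a hypothetical choice). In the described method, the weight would instead be the similarity between the two users' feature vectors. The function and variable names are illustrative.

```python
def build_weighted_graph(users, sharing_pairs, similarity):
    """Return an undirected weighted graph as {node: {neighbor: weight}}.

    users: every user becomes a node, even one with no sharing relationship.
    sharing_pairs: iterable of (u, v) pairs that share some material.
    similarity: callable (u, v) -> float used as the edge weight.
    """
    graph = {user: {} for user in users}
    for u, v in sharing_pairs:
        weight = similarity(u, v)
        graph[u][v] = weight  # store both directions: the graph is undirected
        graph[v][u] = weight
    return graph


# Fig. 4 example: pairs with a sharing relationship and what they share.
SHARED = {
    ("Xiaokai", "Xiaoxue"): {"ip", "ua"},
    ("Xiaokai", "Xiaomeng"): {"ip"},
    ("Xiaokai", "Xiaowei"): {"device"},
    ("Xiaowei", "Xiaodong"): {"ip", "phone"},
    ("Xiaowei", "Xiaoyuan"): {"ip", "ua", "phone"},
    ("Xiaowei", "Xiaotian"): {"device", "ua"},
    ("Xiaoxue", "Xiaohao"): {"ua"},
}
USERS = ["Xiaokai", "Xiaoxue", "Xiaomeng", "Xiaowei",
         "Xiaodong", "Xiaoyuan", "Xiaotian", "Xiaohao"]

# Placeholder weight (hypothetical): the number of shared dimensions.
graph = build_weighted_graph(USERS, SHARED, lambda u, v: len(SHARED[(u, v)]))
```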
Step 1022: divide the plurality of users into different communities according to the weighted graph model and a preset community detection algorithm.
Based on the weighted graph model, a preset community detection algorithm can be applied to divide the users into different communities, such that users in the same community are highly similar while users in different communities have low similarity.
The preset community detection algorithm may be a modularity-based algorithm, such as the Louvain community detection algorithm for large-scale networks, but is not limited thereto.
Modularity is a measure for evaluating the quality of a community partition of a network. Intuitively, it is the difference between the total weight of the edges that fall within communities and the total weight such edges would be expected to have if the edges were placed at random. The modularity gain is the change in modularity when an isolated node is moved into a community C: first, the modularity of the single node and the modularity of community C are calculated; then the modularity of the merged community is calculated; the gain is the modularity of the merged community minus the two former values.
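For reference, the standard weighted modularity (Newman) and the Louvain modularity gain (Blondel et al.) can be written as follows. The notation is supplied here and is not defined in this document: A_{ij} is the weight of the edge between nodes i and j (0 if none), k_i = Σ_j A_{ij} is the weighted degree of i, m = ½ Σ_{ij} A_{ij} is the total edge weight, c_i is the community of i, and δ(c_i, c_j) is 1 when i and j are in the same community and 0 otherwise. For a community C, Σ_in is the sum of A_{uv} over ordered pairs u, v in C (twice the edge weight inside C), Σ_tot is the sum of k_u over u in C, and k_{i,in} is the total weight of the edges from the isolated node i to nodes in C.

```latex
Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)

\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right]
         - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right]
         = \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}\, k_i}{2m^2}
```

The first bracket is the modularity contribution of the merged community; the second is the contribution of C plus that of the isolated node, which is the subtraction described in the text.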
When a modularity-based algorithm is used, step 1022 preferably includes:
iteratively calculating, according to the connections among the nodes in the weighted graph model and the weight of each edge, the modularity gain obtained when each node is moved into an adjacent community, where each node in the weighted graph model initially forms a separate community;
during the iterative calculation, if the maximum modularity gain obtained by moving a node into an adjacent community is less than zero, keeping the node in its original community; otherwise, moving the node into the adjacent community that yields the maximum modularity gain and updating the weighted graph model;
and stopping the iterative calculation when the modularity gains obtained by moving every node in the weighted graph model into its adjacent communities are all less than zero; otherwise, repeating the iterative calculation.
The modularity-based community detection algorithm thus comprises two stages. In the first stage, each node is initially treated as an independent community, so a network with N nodes starts with N communities. Each node i is then tentatively removed from its own community and placed into the community of a neighboring node j, and the resulting modularity gain is calculated; node i is assigned to the community that maximizes the gain, provided that the gain is greater than zero. Traversing all nodes once completes one pass of the first stage. In the second stage, the first-stage pass is repeated until the community assignment of every node no longer changes, at which point the community division is complete.
For example, as shown in Fig. 5, the modularity-based community detection algorithm divides the users into six communities (each region enclosed by a dashed box in the figure is one community).
After the community division is complete, the concentration of abnormal accounts (that is, the proportion of abnormal accounts) in each community can be calculated from the abnormal accounts already identified. Communities with a high concentration are selected as abnormal propagation communities, and the remaining, not-yet-identified accounts in those communities are propagated as abnormal accounts. This recalls more abnormal accounts and thereby improves the recall of anomaly detection.
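The node-moving loop described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes the graph is a symmetric dict of dicts (`graph[u][v]` is the edge weight) without self-loops, and it uses the simplified Louvain gain k_{i,in}/m − Σ_tot·k_i/(2m²), where k_i is the node's weighted degree, k_{i,in} its edge weight into the candidate community, Σ_tot the total degree of that community, and m the total edge weight. Like the reference Louvain local-moving rule, it keeps a node in its original community unless another community offers a strictly larger positive gain. The full Louvain algorithm additionally aggregates each community into a super-node and repeats on the coarsened graph; that step is omitted. The name `local_moving` and the `max_passes` guard are hypothetical.

```python
from collections import defaultdict
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # symmetric adjacency: graph[u][v] == graph[v][u]


def local_moving(graph: Graph, max_passes: int = 100) -> Dict[str, int]:
    """Move each node to the neighbouring community with the largest positive
    modularity gain until no node changes community (no self-loops assumed)."""
    degree = {u: sum(nbrs.values()) for u, nbrs in graph.items()}  # k_i
    m = sum(degree.values()) / 2.0  # total edge weight
    community = {u: idx for idx, u in enumerate(graph)}  # one community per node
    sigma_tot = {community[u]: degree[u] for u in graph}  # Σ_tot per community
    if m == 0:
        return community

    for _ in range(max_passes):
        moved = False
        for node in graph:
            old = community[node]
            sigma_tot[old] -= degree[node]  # take node out: it is now isolated
            # k_{i,in}: total edge weight from node to each neighbouring community
            links: Dict[int, float] = defaultdict(float)
            for nbr, w in graph[node].items():
                links[community[nbr]] += w

            def gain(comm: int) -> float:
                # ΔQ = k_{i,in}/m - Σ_tot * k_i / (2 m^2)
                return links.get(comm, 0.0) / m - sigma_tot[comm] * degree[node] / (2 * m * m)

            # Default: stay in the original community unless a move gains more.
            best, best_gain = old, max(gain(old), 0.0)
            for comm in links:
                g = gain(comm)
                if g > best_gain:
                    best, best_gain = comm, g
            sigma_tot[best] += degree[node]
            community[node] = best
            moved = moved or best != old
        if not moved:  # no node changed community: the partition is stable
            break
    return community


# Two triangles joined by one weak edge (hypothetical weights).
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0), ("c", "d", 0.1)]
adjacency: Graph = defaultdict(dict)
for u, v, w in edges:
    adjacency[u][v] = w
    adjacency[v][u] = w
communities = local_moving(dict(adjacency))
```

Each accepted move strictly increases modularity, so the loop terminates; `max_passes` is only a safety bound.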
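A minimal sketch of the concentration check and propagation, assuming community labels like those produced by the community detection step and a set of account IDs already flagged by the detection model. The function name `propagate_abnormal` and the threshold value 0.6 in the usage are hypothetical; the document only speaks of a preset threshold.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def propagate_abnormal(
    community: Dict[str, int],
    known_abnormal: Set[str],
    threshold: float,
) -> Tuple[Set[int], Set[str]]:
    """Return (abnormal propagation communities, newly flagged accounts)."""
    members: Dict[int, Set[str]] = defaultdict(set)
    for account, comm in community.items():
        members[comm].add(account)

    # Concentration = share of already-identified abnormal accounts in the community.
    propagation = {
        comm for comm, accounts in members.items()
        if len(accounts & known_abnormal) / len(accounts) > threshold
    }
    # Every account in a propagation community is treated as abnormal;
    # report only those the detection model had not flagged yet.
    flagged = set().union(*(members[c] for c in propagation))
    return propagation, flagged - known_abnormal


community = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1}
comms, newly_flagged = propagate_abnormal(community, {"u1", "u2", "u4"}, threshold=0.6)
```

Here community 0 has a concentration of 2/3 and is propagated, so `u3` is newly flagged, while community 1 (1/2) stays below the threshold.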
The following illustrates a specific implementation flow of the method according to the embodiment of the present invention.
In this abnormal account propagation method, the users' operation behavior logs are first preprocessed, and the operation behavior parameters of each user are extracted from those logs. Users with a sharing relationship are then found from each user's operation behavior parameters: if two users share operation behavior parameters such as materials (for example, IP addresses, devices, or phone numbers) or behavior features, the two users have a sharing relationship. Each operation behavior parameter is assigned a weight using the TF-IDF algorithm, and the parameter weights of each user form that user's feature vector. A Gaussian kernel function is then used to compute the similarity distance between the users' feature vectors, which gives the similarity between users with a sharing relationship. Finally, a weighted graph model is constructed from the sharing relationships and similarities: the users are the nodes, an edge connects the nodes of any two users with a sharing relationship, and the weight of that edge is the similarity between the two users.
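The feature-weighting and similarity steps can be sketched as below. The document fixes neither a TF-IDF variant nor a kernel bandwidth, so this assumes tf = occurrences of a parameter in the user's log divided by the log length, idf = ln(N / df) where df is the number of users whose logs contain the parameter (matching the idea of weighting by how often a parameter appears across different users' logs), and a Gaussian kernel exp(−‖x − y‖² / (2σ²)) with a hypothetical σ = 1.0. The function names and sample logs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse feature vector: parameter -> weight


def tfidf_vectors(logs: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Weight each operation behavior parameter per user with TF-IDF
    (tf = count / log length, idf = ln(N / df))."""
    n_users = len(logs)
    # df: number of users whose logs contain the parameter at least once
    df = Counter(p for params in logs.values() for p in set(params))
    vectors: Dict[str, Vector] = {}
    for user, params in logs.items():
        counts = Counter(params)
        total = len(params)
        vectors[user] = {
            p: (c / total) * math.log(n_users / df[p]) for p, c in counts.items()
        }
    return vectors


def gaussian_similarity(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * sigma^2)), in (0, 1]."""
    keys = x.keys() | y.keys()
    sq_dist = sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys)
    return math.exp(-sq_dist / (2 * sigma ** 2))


# Hypothetical per-user parameter logs ("type:value" strings).
logs = {
    "u1": ["ip:1", "ua:1", "ua:1"],
    "u2": ["ip:1", "ua:1"],
    "u3": ["ip:2", "ua:2"],
}
vectors = tfidf_vectors(logs)
sim_12 = gaussian_similarity(vectors["u1"], vectors["u2"])
sim_13 = gaussian_similarity(vectors["u1"], vectors["u3"])
```

A parameter present in every user's log gets idf 0 and adds nothing to the distance. In the method only pairs that already share a parameter would be scored; `u1` and `u3` are compared here purely for contrast.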
The weighted graph model constructed as described above captures both the connections between users and how closely they are associated, and a community detection algorithm (such as the Louvain algorithm) is used to divide the users into different communities. The abnormal-account concentration of each community is calculated from the abnormal accounts identified by the abnormal account detection model, abnormal propagation communities are selected, and all user accounts in those communities are propagated as abnormal accounts. This improves the recall rate of the original abnormal account detection model.
The abnormal account propagation method fully considers the connections among users, constructs the graph model directly from the users' connections and attribute features, and propagates and expands the identification results of the abnormal account detection model. This significantly improves the model's recall while maintaining precision, and reduces the recall loss that results from tuning the model for high precision.
Referring to Fig. 6, an embodiment of the present invention further provides an abnormal account propagation apparatus 600, comprising:
a search obtaining module 601, configured to search for users having a sharing relationship among a plurality of registered users, and obtain the similarity between the users having the sharing relationship;
a community dividing module 602, configured to divide the plurality of users into different communities according to the sharing relationships among the users and the similarity between users having a sharing relationship, where the similarity between users within a community is greater than the similarity between those users and users outside the community;
an anomaly determination module 603, configured to calculate, according to the determined abnormal accounts, the proportion of abnormal accounts in each community, and determine a community in which that proportion is greater than a preset threshold as an abnormal propagation community;
an abnormal propagation module 604, configured to propagate all user accounts in the abnormal propagation community as abnormal accounts.
The abnormal account propagation device 600 divides communities based on the sharing relationships and similarity between users, fully accounting for how closely users are associated: users in the same community are highly similar, while users in different communities are not, so similar users within a community can be treated alike. When most user accounts in a community have been identified as abnormal, that is, when the proportion of abnormal accounts in the community exceeds the preset threshold, the community is treated as an abnormal propagation community and the abnormal label is propagated to its accounts. This recalls more abnormal accounts while maintaining precision, reduces the recall loss caused by tuning the model for high precision, and improves the recall of the abnormal account detection model.
Preferably, the community dividing module 602 includes:
a graph model building submodule, configured to build a weighted graph model according to the sharing relationships among the users and the similarity between users having a sharing relationship;
and a community division submodule, configured to divide the plurality of users into different communities according to the weighted graph model and a preset community detection algorithm.
Preferably, the graph model building submodule is specifically configured to: take each user as a node in the weighted graph model; connect an edge between the corresponding nodes according to the sharing relationships among the users; and determine the weight of each edge according to the similarity between the users having the sharing relationship.
Preferably, the search obtaining module 601 includes:
an extraction submodule, configured to extract the operation behavior parameters of each user from the operation behavior logs of the registered users;
and a search obtaining submodule, configured to search for users having a sharing relationship according to the operation behavior parameters of each user, and obtain the similarity between those users according to the same parameters.
Preferably, the search obtaining submodule includes:
a weight obtaining unit, configured to obtain the weight of each operation behavior parameter for classifying users;
a feature vector construction unit, configured to construct a feature vector for each user according to the weight of each operation behavior parameter;
and a similarity distance acquisition unit, configured to obtain, from the feature vector of each user, the similarity distance between the feature vectors of users having a sharing relationship.
Preferably, the weight obtaining unit is specifically configured to determine the weight of each operation behavior parameter for user classification according to the frequency with which the parameter appears in the operation behavior logs of different users.
Preferably, the apparatus further comprises:
and a preprocessing module, configured to filter illegal data out of the operation behavior logs and convert data of the same category in the logs into a uniform format.
Preferably, the operation behavior parameters include one or more of: device parameters, network parameters, the execution time of the operation behavior, and the execution action used by the user to trigger the operation behavior.
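A minimal preprocessing sketch, assuming log records are dicts with hypothetical `user`, `ip`, `ua` and `time` fields. Here a record counts as illegal if it lacks a user ID, carries a malformed IP address, or has an unparseable timestamp. Normalization maps IPs to their canonical text form, user agents to trimmed lowercase, and timestamps (Unix seconds, or the two hypothetical string formats in `TIME_FORMATS`, assumed to be UTC) to ISO 8601. These rules are illustrative; the document does not define which data is illegal or what the target formats are.

```python
import ipaddress
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional

TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S")  # hypothetical input formats


def parse_time(raw: str) -> Optional[datetime]:
    """Parse Unix seconds or one of TIME_FORMATS as UTC; None if unparseable."""
    raw = raw.strip()
    if raw.isdigit():
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None


def preprocess(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop illegal records and bring same-category fields to one format."""
    cleaned = []
    for rec in records:
        user = (rec.get("user") or "").strip()
        try:
            ip = str(ipaddress.ip_address((rec.get("ip") or "").strip()))
        except ValueError:
            continue  # malformed IP address
        ts = parse_time(rec.get("time") or "")
        if not user or ts is None:
            continue  # missing user ID or unparseable timestamp
        cleaned.append({
            "user": user,
            "ip": ip,
            "ua": (rec.get("ua") or "").strip().lower(),
            "time": ts.isoformat(),
        })
    return cleaned


cleaned = preprocess([
    {"user": "u1", "ip": "10.0.0.1", "ua": " Mozilla/5.0 ", "time": "2021/06/22 08:00:00"},
    {"user": "u2", "ip": "not-an-ip", "ua": "x", "time": "1624348800"},
    {"user": "", "ip": "10.0.0.2", "ua": "x", "time": "1624348800"},
    {"user": "u3", "ip": "10.0.0.3", "ua": "x", "time": "1624348800"},
])
```

Both surviving records end up with the same ISO 8601 timestamp even though one was logged as a date string and the other as Unix seconds.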
Because the device embodiment is basically similar to the method embodiment, for relevant details refer to the corresponding parts of the description of the method embodiment.
An embodiment of the present invention further provides an electronic device, which may be a server. As shown in Fig. 7, the electronic device comprises a processor 701, a communication interface 702, a memory 703, and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 communicate with one another via the communication bus 704.
A memory 703 for storing a computer program.
The processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
searching for users having a sharing relationship among a plurality of registered users, and obtaining the similarity between the users having the sharing relationship;
dividing the plurality of users into different communities according to the sharing relationships among the users and the similarity between users having a sharing relationship, where the similarity between users within a community is greater than the similarity between those users and users outside the community;
calculating, according to the determined abnormal accounts, the proportion of abnormal accounts in each community, and determining an abnormal propagation community according to those proportions;
and propagating all user accounts in the abnormal propagation community as abnormal accounts.
The communication bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Another embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the abnormal account propagation method described in the above embodiments.
Another embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the abnormal account propagation method described in the above embodiments.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the invention are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is basically similar to the method embodiment and is therefore described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention falls within the protection scope of the present invention.

Claims (11)

1. An abnormal account propagation method, characterized by comprising the following steps:
searching for users having a sharing relationship among a plurality of registered users, and obtaining the similarity between the users having the sharing relationship;
dividing the plurality of users into different communities according to the sharing relationships among the plurality of users and the similarity between the users having a sharing relationship, wherein the similarity between users within a community is greater than the similarity between those users and users outside the community;
calculating, according to the determined abnormal accounts, the proportion of abnormal accounts in each community, and determining the communities in which the proportion of abnormal accounts is greater than a preset threshold as abnormal propagation communities;
and propagating all user accounts in the abnormal propagation communities as abnormal accounts.
2. The abnormal account propagation method according to claim 1, wherein the step of dividing the plurality of users into different communities according to the sharing relationship among the plurality of users and the similarity among the users having the sharing relationship includes:
constructing a weighted graph model according to the sharing relationship among the users and the similarity among the users with the sharing relationship;
and dividing the plurality of users into different communities according to the weighted graph model and a preset community detection algorithm.
3. The abnormal account propagation method according to claim 2, wherein the step of constructing the weighted graph model according to the sharing relationships among the plurality of users and the similarity between the users having a sharing relationship comprises:
taking each user as a node in the weighted graph model;
connecting an edge between the corresponding nodes in the weighted graph model according to the sharing relationships among the users;
and determining the weight of each edge between corresponding nodes in the weighted graph model according to the similarity between the users having the sharing relationship.
4. The abnormal account propagation method according to claim 1, wherein the step of searching for users having a sharing relationship among the plurality of registered users and obtaining the similarity between the users having the sharing relationship comprises:
extracting the operation behavior parameters of each user from the operation behavior logs of the registered users;
and searching for the users having a sharing relationship according to the operation behavior parameters of each user, and obtaining the similarity between the users having the sharing relationship according to the operation behavior parameters of each user.
5. The abnormal account propagation method according to claim 4, wherein the step of obtaining the similarity between the users having a sharing relationship according to the operation behavior parameters of each user comprises:
obtaining the weight of each operation behavior parameter for classifying users;
constructing a feature vector for each user according to the weight of each operation behavior parameter;
and obtaining, from the feature vector of each user, the similarity distance between the feature vectors of the users having the sharing relationship.
6. The abnormal account propagation method according to claim 5, wherein the step of obtaining the weight of each operation behavior parameter for classifying users comprises:
determining the weight of each operation behavior parameter for user classification according to the frequency with which the parameter appears in the operation behavior logs of different users.
7. The abnormal account propagation method according to any one of claims 4 to 6, wherein before the operation behavior parameters of each user account are extracted from the operation behavior logs of the registered user accounts, the method further comprises:
filtering illegal data out of the operation behavior logs, and converting data of the same category in the operation behavior logs into a uniform format.
8. The abnormal account propagation method according to any one of claims 4 to 6, wherein the operation behavior parameters include one or more of: device parameters, network parameters, the execution time of the operation behavior, and the execution action used by the user to trigger the operation behavior.
9. An abnormal account propagation device, comprising:
a search obtaining module, configured to search for users having a sharing relationship among a plurality of registered users and obtain the similarity between the users having the sharing relationship;
a community dividing module, configured to divide the plurality of users into different communities according to the sharing relationships among the users and the similarity between the users having a sharing relationship, wherein the similarity between users within a community is greater than the similarity between those users and users outside the community;
an abnormality determining module, configured to calculate, according to the determined abnormal accounts, the proportion of abnormal accounts in each community, and determine an abnormal propagation community according to those proportions;
and an abnormal propagation module, configured to propagate all user accounts in the abnormal propagation community as abnormal accounts.
10. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the abnormal account propagation method according to any one of claims 1 to 8 when executing the program stored in the memory.
11. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the abnormal account propagation method according to any one of claims 1 to 8.
CN202110693767.7A 2021-06-22 2021-06-22 Abnormal account number propagation method and device, electronic equipment and storage medium Pending CN113326178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110693767.7A CN113326178A (en) 2021-06-22 2021-06-22 Abnormal account number propagation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110693767.7A CN113326178A (en) 2021-06-22 2021-06-22 Abnormal account number propagation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113326178A true CN113326178A (en) 2021-08-31

Family

ID=77424373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110693767.7A Pending CN113326178A (en) 2021-06-22 2021-06-22 Abnormal account number propagation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113326178A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117074248A (en) * 2023-04-18 2023-11-17 国网宁夏电力有限公司中卫供电公司 SF after digital transformation 6 Method and system for monitoring gas density

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140165195A1 (en) * 2012-12-10 2014-06-12 Palo Alto Research Center Incorporated Method and system for thwarting insider attacks through informational network analysis
CN106709800A (en) * 2016-12-06 2017-05-24 中国银联股份有限公司 Community partitioning method and device based on characteristic matching network
CN108681936A (en) * 2018-04-26 2018-10-19 浙江邦盛科技有限公司 A kind of fraud clique recognition methods propagated based on modularity and balance label
CN110070364A (en) * 2019-03-27 2019-07-30 北京三快在线科技有限公司 Method and apparatus, storage medium based on the fraud of graph model detection clique
US20200136923A1 (en) * 2018-10-28 2020-04-30 Netz Forecasts Ltd. Systems and methods for prediction of anomalies
CN111339436A (en) * 2020-02-11 2020-06-26 腾讯科技(深圳)有限公司 Data identification method, device, equipment and readable storage medium
CN111932386A (en) * 2020-09-09 2020-11-13 腾讯科技(深圳)有限公司 User account determining method and device, information pushing method and device, and electronic equipment
US10841321B1 (en) * 2017-03-28 2020-11-17 Veritas Technologies Llc Systems and methods for detecting suspicious users on networks

Similar Documents

Publication Publication Date Title
CN110839016B (en) Abnormal flow monitoring method, device, equipment and storage medium
CN111355697B (en) Detection method, device, equipment and storage medium for botnet domain name family
CN106899440B (en) Network intrusion detection method and system for cloud computing
CN109818961B (en) Network intrusion detection method, device and equipment
CN108390788B (en) User identification method and device and electronic equipment
CN107920055B (en) IP risk evaluation method and IP risk evaluation system
CN111949803A (en) Method, device and equipment for detecting network abnormal user based on knowledge graph
CN111368289B (en) Malicious software detection method and device
CN109190014B (en) Regular expression generation method and device and electronic equipment
CN112511561A (en) Network attack path determination method, equipment, storage medium and device
CN112839014B (en) Method, system, equipment and medium for establishing abnormal visitor identification model
CN115484112B (en) Payment big data safety protection method, system and cloud platform
CN111224941A (en) Threat type identification method and device
CN111835681A (en) Large-scale abnormal flow host detection method and device
Megantara et al. Feature importance ranking for increasing performance of intrusion detection system
CN113326178A (en) Abnormal account number propagation method and device, electronic equipment and storage medium
CN117176482B (en) Big data network safety protection method and system
CN113746780B (en) Abnormal host detection method, device, medium and equipment based on host image
CN117254983A (en) Method, device, equipment and storage medium for detecting fraud-related websites
de Araujo et al. Impact of feature selection methods on the classification of DDoS attacks using XGBoost
CN116599743A (en) 4A abnormal detour detection method and device, electronic equipment and storage medium
CN114760113B (en) Abnormality alarm detection method and device, electronic equipment and storage medium
CN110868382A (en) Decision tree-based network threat assessment method, device and storage medium
CN115883231A (en) Wind control rule updating method and device, electronic equipment and readable storage medium
CN117391214A (en) Model training method and device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination