CN116383520A - Method, device, electronic equipment and medium for identifying key abnormal users - Google Patents

Info

Publication number
CN116383520A
Authority
CN
China
Prior art keywords
user network
abnormal user
abnormal
network
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310566737.9A
Other languages
Chinese (zh)
Inventor
李沅坷
金驰
程佩哲
许啸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310566737.9A
Publication of CN116383520A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Primary Health Care (AREA)
  • Accounting & Taxation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

Provided are a method, a device, electronic equipment and a medium for identifying key abnormal users, which can be applied to the technical fields of big data, artificial intelligence and information security. The method comprises the following steps: acquiring a first abnormal user network; analyzing the first abnormal user network to obtain structural features of the first abnormal user network; removing some nodes and edges from the first abnormal user network to obtain a second abnormal user network, wherein the structural features of the second abnormal user network are consistent with those of the first abnormal user network; and analyzing the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users. By reducing the scale of the original abnormal user network, nodes with lower influence are filtered out while the overall structural features of the network are preserved, so that the time complexity of the influence maximization algorithm is reduced and computing resources are saved without affecting the accuracy of key abnormal user identification.

Description

Method, device, electronic equipment and medium for identifying key abnormal users
Technical Field
The invention relates to the technical field of big data, artificial intelligence and information security, in particular to a method, a device, electronic equipment and a medium for identifying key abnormal users.
Background
An abnormal user group network may be regarded as a social network, and such networks have different characteristics depending on their topology. Influence diffusion in a social network depends mainly on the network model that is built, and users are anchored in the network by the relationship network among user nodes. A social network connects its users more closely, and the influence of a user node can spread rapidly within a small, dense user group, enabling deep interaction among users.
At present, on the one hand, analysis of the influence of abnormal users mainly starts from the abnormal users themselves and the association relationships among them, and seldom considers the structural features of the network formed by the abnormal user group as a whole. As a result, much hidden information among abnormal groups is difficult to mine, which hinders precise protection against abnormal activity; that is, the accuracy of identifying key abnormal users is low. On the other hand, in real life an abnormal user network is usually huge, with many nodes and edges. Analyzing the influence of node users in such a network with an influence maximization algorithm has high time complexity and a very large computational cost.
Disclosure of Invention
In view of the foregoing, according to a first aspect of the present invention, an embodiment of the present invention provides a method of identifying key abnormal users, the method including: obtaining a first abnormal user network, wherein the first abnormal user network comprises m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3; analyzing the first abnormal user network to obtain structural features of the first abnormal user network; removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network, wherein the second abnormal user network comprises m2 nodes and n2 edges, the structural features of the second abnormal user network are consistent with those of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are both smaller than m1, and n2 and n3 are both smaller than n1; and analyzing the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users, wherein the set of key abnormal users comprises k abnormal users, the k abnormal users are the k nodes with the highest influence when the nodes of the second abnormal user network are ranked by influence in descending order, k is a positive integer greater than or equal to 1, and k is smaller than m2.
According to some exemplary embodiments, the analyzing the first abnormal user network to obtain the structural feature of the first abnormal user network specifically includes: analyzing the first abnormal user network to obtain an overall structure characteristic value and a local structure characteristic value of the first abnormal user network; determining the type of the first abnormal user network according to the overall structure characteristic value and the local structure characteristic value of the first abnormal user network; and determining the structural characteristics of the first abnormal user network according to the type, the overall structural characteristic value and the local structural characteristic value of the first abnormal user network.
According to some exemplary embodiments, the overall structure feature values of the first abnormal user network include the degree of each node of the first abnormal user network, the betweenness of each edge, the clustering coefficient of the first abnormal user network, and the average path length; and/or the local structure feature values of the first abnormal user network include a degree correlation and a betweenness correlation.
According to some exemplary embodiments, the removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network specifically includes: a network simplifying sub-step of determining a degree threshold and a betweenness threshold according to the degree distribution and the betweenness distribution of the first abnormal user network; and removing nodes whose degree is smaller than the degree threshold from the first abnormal user network, and removing edges whose betweenness is smaller than the betweenness threshold from the first abnormal user network, to obtain an intermediate abnormal user network.
According to some exemplary embodiments, the removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network further specifically includes: analyzing the intermediate abnormal user network to obtain structural characteristics of the intermediate abnormal user network; comparing the structural features of the intermediate abnormal user network with the structural features of the first abnormal user network; if the structural characteristics of the intermediate abnormal user network are inconsistent with the structural characteristics of the first abnormal user network, adjusting the degree threshold and the betweenness threshold, and repeatedly executing the network simplifying sub-step until the structural characteristics of the intermediate abnormal user network are consistent with the structural characteristics of the first abnormal user network; and taking an intermediate abnormal user network consistent with the structural characteristics of the first abnormal user network as the second abnormal user network.
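The simplify-check-adjust loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `prune`, `simplify`, `is_consistent`, `shrink` and `max_rounds` are hypothetical, edge betweenness is assumed to be precomputed and keyed by the same edge tuples, and a fixed multiplicative relaxation stands in for the threshold adjustment step.

```python
from collections import Counter

def prune(edges, betweenness, degree_threshold, betweenness_threshold):
    """One pass of the network simplifying sub-step on an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Nodes whose degree is below the degree threshold are removed.
    kept_nodes = {n for n, d in degree.items() if d >= degree_threshold}
    # An edge survives only if both endpoints survive and its
    # betweenness reaches the betweenness threshold.
    return [
        (u, v) for u, v in edges
        if u in kept_nodes and v in kept_nodes
        and betweenness[(u, v)] >= betweenness_threshold
    ]

def simplify(edges, betweenness, degree_threshold, betweenness_threshold,
             is_consistent, shrink=0.5, max_rounds=20):
    """Repeat pruning, relaxing both thresholds until the structure check passes."""
    for _ in range(max_rounds):
        candidate = prune(edges, betweenness,
                          degree_threshold, betweenness_threshold)
        # is_consistent(original, candidate) is a caller-supplied (hypothetical)
        # predicate comparing the structural features of the two networks.
        if is_consistent(edges, candidate):
            return candidate
        # Assumption: a simple stand-in for the threshold adjustment step.
        degree_threshold *= shrink
        betweenness_threshold *= shrink
    return list(edges)  # fall back to the unsimplified network
```

Passing the consistency check in as a callback keeps the pruning logic independent of which structural features are compared.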
According to some exemplary embodiments, the analyzing the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users specifically includes: constructing a directed acyclic graph for each node v in the second abnormal user network to obtain m2 directed acyclic graphs, wherein each node v may appear in a plurality of the directed acyclic graphs; calculating, for each of the m2 directed acyclic graphs, the influence of each node within that directed acyclic graph; and, for each node v in the second abnormal user network, acquiring the influences of the node v in the directed acyclic graphs in which it appears, and summing these influences to obtain the total influence of the node v in the second abnormal user network.
According to some exemplary embodiments, the analyzing the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users further specifically includes: sorting the total influences of all nodes in the second abnormal user network in descending order; and selecting the k nodes with the highest total influence to obtain k key abnormal users, which form the set of key abnormal users, wherein the value of k is preset.
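The superposition and ranking steps can be illustrated with a short sketch. It assumes the per-graph influence values have already been computed and are supplied as one dictionary per directed acyclic graph; the function name `top_k_key_users` and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict

def top_k_key_users(per_dag_influence, k):
    """Sum each node's influence over all DAGs it appears in, then take the top k."""
    total = defaultdict(float)
    for dag in per_dag_influence:
        for node, value in dag.items():
            total[node] += value
    # Descending by total influence; ties broken by node label (an assumption).
    ranked = sorted(total.items(), key=lambda item: (-item[1], str(item[0])))
    return [node for node, _ in ranked[:k]]
```

For example, with influences `{"a": 0.5, "b": 0.2}`, `{"a": 0.1, "c": 0.9}` and `{"b": 0.3}`, node `c` ranks first and node `a` second.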
According to some exemplary embodiments, the constructing a directed acyclic graph for each node v in the second abnormal user network specifically includes: calculating influence of other nodes except the node v in the second abnormal user network on the node v; screening q nodes from the other nodes, wherein the influence of each node in the q nodes on the node v is greater than a preset influence threshold, q is a positive integer greater than or equal to 1, and q is less than m2; and constructing a directed acyclic graph of the node v according to the node v, the q nodes and the corresponding edges.
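The text does not specify how the influence of other nodes on node v is calculated. As one possible reading, the sketch below assumes each directed edge carries a propagation probability and takes the influence of u on v to be the probability of the most probable path from u to v. The name `build_local_dag`, the `in_edges` layout and this influence measure are all assumptions.

```python
import heapq

def build_local_dag(in_edges, v, threshold):
    """Select nodes whose influence on v exceeds threshold and link them toward v.

    in_edges[x] maps each predecessor u of x to the probability p(u -> x).
    """
    best = {v: 1.0}      # best known path probability from each node to v
    next_hop = {}        # successor of each node on its best path to v
    heap = [(-1.0, v)]   # max-heap via negated probabilities
    while heap:
        neg_p, x = heapq.heappop(heap)
        if -neg_p < best[x]:
            continue  # stale heap entry
        for u, p_ux in in_edges.get(x, {}).items():
            candidate = -neg_p * p_ux
            # Longer paths only lower the probability, so pruning here is safe.
            if candidate > threshold and candidate > best.get(u, 0.0):
                best[u] = candidate
                next_hop[u] = x
                heapq.heappush(heap, (-candidate, u))
    influence = {u: p for u, p in best.items() if u != v}
    # Best-path edges form a tree rooted at v, which is acyclic by construction.
    dag_edges = [(u, next_hop[u]) for u in influence]
    return influence, dag_edges
```

The returned `influence` dictionary holds the q selected nodes, and `dag_edges` holds the corresponding edges pointing toward v.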
According to some exemplary embodiments, the method further comprises: analyzing the second abnormal user network to obtain an overall structure characteristic value and a local structure characteristic value of the second abnormal user network; determining the type of the second abnormal user network according to the overall structure characteristic value and the local structure characteristic value of the second abnormal user network; and determining the structural characteristics of the second abnormal user network according to the type, the overall structure characteristic value and the local structure characteristic value of the second abnormal user network.
According to some exemplary embodiments, the first abnormal user network is of the scale-free type, the structural features of the first abnormal user network include that the degree distribution of its nodes conforms to a power law distribution, and the structural features of the second abnormal user network include that the degree distribution of its nodes conforms to a power law distribution.
According to some exemplary embodiments, the structural features of the second abnormal user network consistent with the structural features of the first abnormal user network include: the ratio of the power exponent in the power law distribution of the degree of the node of the second abnormal user network to the power exponent in the power law distribution of the degree of the node of the first abnormal user network is between 0.8 and 1.2.
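The exponent-ratio criterion can be checked with a small sketch. The text does not say how the power exponent is estimated; the code below assumes the widely used approximate maximum-likelihood estimator for discrete power-law data, and treats the 0.8 to 1.2 bounds as inclusive. The function names and the `d_min` parameter are hypothetical.

```python
import math

def power_law_exponent(degrees, d_min=1):
    """Approximate maximum-likelihood exponent for discrete data with d >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    if not tail:
        raise ValueError("no degrees at or above d_min")
    log_sum = sum(math.log(d / (d_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum

def exponents_consistent(first_degrees, second_degrees, low=0.8, high=1.2):
    """True if the ratio of the second network's exponent to the first's is in [low, high]."""
    ratio = power_law_exponent(second_degrees) / power_law_exponent(first_degrees)
    return low <= ratio <= high
```

A function like `exponents_consistent` could serve as the consistency check applied to each intermediate network during simplification.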
According to some exemplary embodiments, the adjusting the degree threshold and the betweenness threshold comprises: adjusting the degree threshold and the betweenness threshold according to a Bayesian optimization parameter tuning method.
According to a second aspect of the present invention, there is also provided an apparatus for identifying key abnormal users, the apparatus comprising: a first abnormal user network acquisition module, configured to acquire a first abnormal user network, wherein the first abnormal user network comprises m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3; a first abnormal user network analysis module, configured to analyze the first abnormal user network to obtain the structural features of the first abnormal user network; a second abnormal user network obtaining module, configured to remove m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network, wherein the second abnormal user network comprises m2 nodes and n2 edges, the structural features of the second abnormal user network are consistent with those of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are both smaller than m1, and n2 and n3 are both smaller than n1; and a key abnormal user set obtaining module, configured to analyze the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users, wherein the set of key abnormal users comprises k abnormal users, the k abnormal users are the k nodes with the highest influence when the nodes of the second abnormal user network are ranked by influence in descending order, k is a positive integer greater than or equal to 1, and k is smaller than m2.
According to a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to perform a method as described above.
According to a fifth aspect of the present invention there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages or benefits: by reducing the scale of the original abnormal user network, nodes with lower influence are filtered out while the overall structural features of the network are preserved, so that the time complexity of the influence maximization algorithm is reduced and computing resources are saved without affecting the accuracy of key abnormal user identification.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
Fig. 1 schematically illustrates an application scenario diagram of a method, an apparatus, an electronic device, and a medium for identifying key abnormal users according to an embodiment of the present invention.
FIG. 2 schematically illustrates a flow chart of a method of identifying key anomalous users in accordance with an embodiment of the invention.
Fig. 3 schematically shows a schematic diagram of a first anomalous user network according to an embodiment of the invention.
Fig. 4 schematically shows a flow chart of acquiring a first anomalous user network structure feature according to an embodiment of the invention.
Fig. 5 schematically shows a flow chart of network simplification steps according to an embodiment of the invention.
Fig. 6 schematically shows a schematic diagram of a second anomalous user network according to an embodiment of the invention.
Fig. 7A schematically illustrates a graph of time consumption against network scale (number of nodes) for the proposed influence maximization algorithm and other influence maximization algorithms according to an embodiment of the present invention; Fig. 7B schematically illustrates a graph of influence propagation range against network scale (number of nodes) for the proposed influence maximization algorithm and other influence maximization algorithms according to an embodiment of the present invention.
Fig. 8 schematically shows a flow chart of analyzing a second abnormal user network with the proposed influence maximization algorithm according to an embodiment of the invention.
Fig. 9 schematically shows a block diagram of an apparatus for identifying key abnormal users according to an embodiment of the present invention.
Fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement a method of identifying key abnormal users according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the technical scheme of the invention, the acquisition, storage, application and other handling of users' personal information comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
First, technical terms described herein are explained and illustrated as follows.
Complex network: those skilled in the art will appreciate that many complex systems found in nature can be described by networks. A typical network consists of a number of nodes, which represent different individuals in the real system, and edges, which represent relationships between individuals. Two nodes are usually connected by an edge when a particular relationship exists between them and are not connected otherwise, and two nodes connected by an edge are considered adjacent in the network. For example, the nervous system can be seen as a network of numerous nerve cells interconnected by nerve fibers; a computer network may be considered a network of autonomous computers interconnected by a communication medium such as fiber optic cable, twisted pair, coaxial cable, and the like. Power networks, social networks, traffic networks, dispatch networks and the like are similar.
Scale-free network: a special network structure in which the distribution of node degrees follows a power law distribution. That is, a few nodes in the network have very high degree, while most nodes have low degree. A significant feature of such a network is that it has no typical scale describing how nodes are linked, and it is therefore referred to as a scale-free network. Many real-world networks exhibit scale-free characteristics, such as the internet, social networks, biological networks, etc.
Random network: in a graph of N nodes, a network formed by connecting any two nodes at random with probability P; that is, whether an edge exists between two nodes is uncertain and is determined by the probability P. It is characterized by a lack of clustering but a small average path length.
Small-world network (WS network): a new network constructed by, with a very small probability P, removing an original edge of a regular network and reconnecting it to a randomly chosen new endpoint. It is characterized by a small average path length and a large clustering coefficient.
Abnormal user network: refers to a complex network formed by the users distributed along an abnormal industry chain. In the abnormal user network, nodes represent abnormal users, and edges represent association relationships among abnormal users.
Degree of a node: also referred to as "degree" for short; the degree of a node is the total number of edges connecting the node to other nodes in the network.
Betweenness: in a broad sense, betweenness is divided into node betweenness and edge betweenness. Node betweenness refers to the ratio of the number of shortest paths passing through the node to the total number of shortest paths in the network. Edge betweenness refers to the ratio of the number of shortest paths passing through the edge to the total number of shortest paths in the network. Herein, unless otherwise specified, "betweenness" refers to edge betweenness.
Clustering coefficient: an index for measuring the degree to which nodes in a network cluster together. It indicates how densely the neighbors of a node are interconnected, that is, how many triangles exist around the node. The higher the clustering coefficient, the more the nodes in the network tend to form tightly connected groups.
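The random network definition above translates almost directly into code. The sketch below is illustrative only; the function name `random_network` and the `seed` parameter are assumptions introduced for reproducibility.

```python
import random
from itertools import combinations

def random_network(n, p, seed=None):
    """Connect each unordered pair of n nodes independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
```

With `p = 0` no edges are produced, and with `p = 1` every one of the n(n-1)/2 possible pairs is connected.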
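Edge betweenness, taken literally as the ratio of shortest paths through an edge to all shortest paths in the network, can be computed for a small unweighted, undirected network as sketched below. The function names are hypothetical, and this direct pairwise counting is meant for illustration rather than for large networks.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(adj, source):
    """BFS distances and numbers of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def edge_betweenness(edges):
    """Share of all shortest paths (over unordered node pairs) that use each edge."""
    if not edges:
        return {}
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {s: shortest_path_counts(adj, s) for s in adj}
    through = {edge: 0 for edge in edges}
    total = 0
    for s, t in combinations(list(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        total += sigma_s[t]
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                # The edge lies on a shortest s-t path when the distances add up.
                if a in dist_s and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    through[(u, v)] += sigma_s[a] * sigma_t[b]
    return {edge: count / total for edge, count in through.items()}
```

On a three-node path a, b, c there are three shortest paths in total, and each edge lies on two of them.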
Average path length of network: defined as the average of the distances between any two nodes.
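As an illustration of the clustering coefficient and average path length, the sketch below computes both for an unweighted, undirected edge list. The function names are hypothetical; nodes with fewer than two neighbors are assigned a clustering coefficient of 0, and the average path length is taken over reachable pairs only, both of which are assumptions.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_clustering(edges):
    """Mean over nodes of (links among neighbors) / (possible links among neighbors)."""
    adj = build_adjacency(edges)
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(edges):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    adj = build_adjacency(edges)
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

For a triangle a, b, c with one extra node d attached to c, the average clustering coefficient is 7/12 and the average path length is 4/3.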
Degree correlation: describes the connection tendency between nodes of large degree and nodes of small degree in a network. If nodes of large degree tend to connect to nodes of large degree, the network is positively degree-correlated, or assortative; conversely, if nodes of large degree tend to connect to nodes of small degree, the network is negatively degree-correlated, or disassortative.
Betweenness correlation: similar to the degree correlation, it describes the connection tendency between edges of large betweenness and edges of small betweenness in a network. If edges of large betweenness tend to connect to edges of large betweenness, the network is positively betweenness-correlated, or assortative; conversely, if edges of large betweenness tend to connect to edges of small betweenness, the network is negatively betweenness-correlated, or disassortative.
With the rapid development of external abnormal industries, the number of abnormal practitioners in abnormal markets has increased dramatically, causing great economic loss to some enterprises and the leakage of a large amount of personal information. At present, analysis of the influence of abnormal users mainly starts from the abnormal users themselves and the association relationships among them, and seldom considers the structural features of the network formed by the abnormal user group as a whole, so that much hidden information among abnormal groups is difficult to mine, which hinders precise protection against abnormal activity.
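One common way to quantify the degree correlation is the Pearson correlation between the degrees at the two ends of each edge. The text does not prescribe a formula, so the sketch below is an assumption, as is the function name `degree_correlation`.

```python
from collections import Counter

def degree_correlation(edges):
    """Pearson correlation of endpoint degrees over all edges (both orientations)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # Count every edge in both directions so the measure is symmetric.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    # When all degrees are equal the coefficient is undefined; 0.0 is returned here.
    return cov / var if var else 0.0
```

A positive value indicates that high-degree nodes tend to connect to each other, and a negative value indicates that they tend to connect to low-degree nodes. A star network, where the hub connects only to leaves, gives -1.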
Based on this, an embodiment of the present invention provides a method of identifying key abnormal users, the method comprising: obtaining a first abnormal user network, wherein the first abnormal user network comprises m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3; analyzing the first abnormal user network to obtain structural features of the first abnormal user network; removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network, wherein the second abnormal user network comprises m2 nodes and n2 edges, the structural features of the second abnormal user network are consistent with those of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are both smaller than m1, and n2 and n3 are both smaller than n1; and analyzing the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users, wherein the set of key abnormal users comprises k abnormal users, the k abnormal users are the k nodes with the highest influence when the nodes of the second abnormal user network are ranked by influence in descending order, k is a positive integer greater than or equal to 1, and k is smaller than m2. In the method according to the embodiment of the invention, by reducing the scale of the original abnormal user network, nodes with lower influence are filtered out while the overall structural features of the network are preserved, so that the time complexity of the influence maximization algorithm is reduced and computing resources are saved without affecting the accuracy of key abnormal user identification.
It should be noted that the method and the device for identifying the key abnormal user determined by the invention can be used in the technical fields of big data, artificial intelligence and information security.
Fig. 1 schematically illustrates an application scenario diagram of a method, an apparatus, a device, a medium for identifying key abnormal users according to an embodiment of the present invention.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the method for identifying key abnormal users provided in the embodiment of the present invention may generally be performed by the server 105. Accordingly, the apparatus for identifying key abnormal users provided in the embodiments of the present invention may generally be disposed in the server 105. The method of identifying key abnormal users provided by the embodiments of the present invention may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for identifying key abnormal users provided in the embodiments of the present invention may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Based on the scenario described in fig. 1, the method for identifying key abnormal users provided by the embodiment of the invention is described in detail below with reference to figs. 2 to 8.
FIG. 2 schematically illustrates a flow chart of a method of identifying key anomalous users in accordance with an embodiment of the invention.
As shown in fig. 2, the method 200 for identifying a key abnormal user in this embodiment may include operations S210 to S240.
In operation S210, a first abnormal user network is obtained, where the first abnormal user network includes m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3.
In an embodiment of the present invention, the acquiring the first abnormal user network specifically includes: collecting related data of abnormal users from an original data set; performing data processing on the related data; and constructing a network model by taking the abnormal users as nodes and certain association relations among the abnormal users as edges.
In an embodiment of the present invention, related data of abnormal users may be acquired at multiple levels to obtain the original data set, for example: personal information, including user ID, avatar, personal profile, etc.; and geographic location information, including location information of abnormal users recorded by geographic location tags, check-ins, and the like. Meanwhile, the association relationships between abnormal users may be acquired, for example: friend relationships, namely friendships established between abnormal users and other users in a social network; social activity connections, namely connections established with other users when an abnormal user comments on or forwards social posts, or when the abnormal user's posts are commented on or forwarded; and group relationships, namely relationships formed between an abnormal user and other users of the social network groups in which they jointly participate.
In the embodiment of the invention, after the related data of the abnormal user is collected from the original data set, the related data can be further preprocessed, such as data format arrangement, cleaning, screening and the like, so as to meet the requirement of model construction.
In an embodiment of the present invention, in order to obtain the network model of the first abnormal user network, a Python-based network toolkit may be used. For example, an empty undirected graph can be created by using the NetworkX toolkit, and m1 nodes and n1 edges are added to the network model according to the user IDs in the original data set and the association relationships between them, so that a complete network model is constructed.
It should be noted that NetworkX is a toolkit for graph theory and complex network modeling developed in the Python language. It provides functions for creating, manipulating, and analyzing complex networks, and allows a user to work with graph algorithms and a large number of data structures through Python programming. NetworkX supports different types of graph structures, such as undirected graphs, directed graphs, and multigraphs, as well as additional information such as weights and attributes, which enables NetworkX to meet the modeling requirements of various complex network scenarios. NetworkX further includes many commonly used graph algorithms and network analysis functions, which can be used to analyze network structures, design new network algorithms, draw networks, and the like.
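The construction described above can be sketched as follows. This is a minimal illustration only: the user IDs, the relationship list, and the helper name `build_user_network` are hypothetical and do not come from the embodiment.

```python
import networkx as nx

# Hypothetical sample data: user IDs and pairwise association relationships.
user_ids = ["u1", "u2", "u3", "u4"]
relations = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]

def build_user_network(user_ids, relations):
    """Create an empty undirected graph, then add one node per user and one edge per relationship."""
    graph = nx.Graph()                # empty undirected graph
    graph.add_nodes_from(user_ids)    # m1 nodes
    graph.add_edges_from(relations)   # n1 edges
    return graph

first_network = build_user_network(user_ids, relations)
print(first_network.number_of_nodes(), first_network.number_of_edges())
```

Adding the nodes before the edges ensures that users without any association relationship still appear in the network as isolated nodes.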
Fig. 3 schematically shows a schematic diagram of a first anomalous user network according to an embodiment of the invention.
In the embodiment of the present invention, as shown in fig. 3, the first abnormal user network has nodes represented by circles and edges represented by line segments, that is, circles represent the abnormal users, and line segments represent the association relationships between the abnormal users. Wherein in said first abnormal user network the degree of the nodes exhibits a power law distribution, which means that only a few nodes (e.g. node 1) have a higher degree, whereas most nodes have a lower degree (e.g. node 2 and node 3).
In operation S220, the first abnormal user network is analyzed to obtain structural features of the first abnormal user network.
Fig. 4 schematically shows a flow chart of acquiring a first anomalous user network structure feature according to an embodiment of the invention.
In the embodiment of the present invention, the analyzing the first abnormal user network to obtain the structural feature of the first abnormal user network may include operations S310 to S330.
In operation S310, the first abnormal user network is analyzed to obtain an overall structural feature value and a local structural feature value of the first abnormal user network.
In the embodiment of the invention, the overall structural feature values of the first abnormal user network include the degree of each node, the betweenness of each edge, the clustering coefficient, and the average path length of the first abnormal user network; the local structural feature values of the first abnormal user network include the degree correlation and the betweenness correlation.
According to the above, the network model built from the abnormal user data may be calculated and analyzed as follows using the built-in functions of NetworkX: calculating the degree of each node in the first abnormal user network, and analyzing the connection density of the nodes in the network and the possible central nodes according to the degree distribution; calculating the betweenness centrality of each node in the first abnormal user network, which reflects the importance of the node in the network; calculating the clustering coefficient of the first abnormal user network to analyze the degree of aggregation among its nodes; and calculating the average path length of the first abnormal user network to analyze the contact distance between abnormal users. Further, the following calculations and analyses may be performed: calculating and analyzing the degree correlation to judge whether edges tend to form between similar nodes or between dissimilar nodes; and calculating and analyzing the betweenness correlation to analyze potential correlation features.
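A possible way to compute these feature values with NetworkX is sketched below. The function name `structural_feature_values` and the star-shaped sample network are hypothetical. The degree correlation is computed here as the Pearson correlation of the degrees at the two ends of each edge, which is an assumed definition since the text does not fix one; the betweenness correlation is omitted for the same reason.

```python
import networkx as nx

def structural_feature_values(graph):
    """Return overall and local feature values for a connected undirected graph."""
    degrees = dict(graph.degree())
    # Degree correlation (assumed definition): Pearson correlation of the degrees
    # at the two ends of each edge, counting every edge in both directions.
    xs, ys = [], []
    for u, v in graph.edges():
        xs += [degrees[u], degrees[v]]
        ys += [degrees[v], degrees[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return {
        "degree": degrees,
        "node_betweenness": nx.betweenness_centrality(graph),
        "edge_betweenness": nx.edge_betweenness_centrality(graph),
        "clustering_coefficient": nx.average_clustering(graph),
        "average_path_length": nx.average_shortest_path_length(graph),
        "degree_correlation": cov / var if var else float("nan"),
    }

# Hypothetical star-shaped network: hub 0 linked to users 1..4.
features = structural_feature_values(nx.star_graph(4))
```

In the star example the hub links only to low-degree nodes, so the degree correlation is strongly negative, which is the pattern expected when edges tend to form between dissimilar nodes.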
In operation S320, a type of the first abnormal user network is determined according to the global structural feature value and the local structural feature value of the first abnormal user network.
In an embodiment of the present invention, the type of the first abnormal user network may include a scale-free network, a random network, a small-world network, etc. It should be noted that the types listed herein are only exemplary and are not intended to limit the types of the abnormal user networks in the embodiments of the present invention; that is, the abnormal user networks in the embodiments of the present invention may also include other types of networks.
In the embodiment of the invention, the network type can be more accurately determined based on the overall structure characteristic value and the local structure characteristic value of the network, and meanwhile, the smaller calculated amount is ensured, thereby being beneficial to realizing the balance of saving calculation resources and accurately calculating.
In operation S330, the structural feature of the first abnormal user network is determined according to the type, the overall structural feature value, and the local structural feature value of the first abnormal user network.
It should be noted that in real life many external factors affect the structural properties of a real network and thus the shape of its degree distribution. Therefore, to verify the properties of a network, the network structure is generally simulated with a power law distribution or an exponential distribution, and whether the network satisfies the properties of a given model is judged from the simulation results, thereby obtaining the overall structural features of the model. The objective of analyzing the overall structural features of the network model is to obtain a macroscopic understanding of the network structure. However, the generation and interaction of the association relationships between users are the focus of the research, so further analysis of the network must involve the properties of edges and nodes. Therefore, it is necessary to analyze the local structural features of the network. Meanwhile, determining the structural features of the network according to the network type, the overall features, and the local features makes the determined structural features more accurate. By combining the overall structure analysis and the local structure analysis to determine the structural features of the first abnormal user network, the network model established from the abnormal users can be analyzed in depth, revealing the key nodes in the network and the patterns of contact among the users.
Referring back to fig. 2, m3 nodes and n3 edges are removed from the first abnormal user network in operation S230 to obtain a second abnormal user network, wherein the second abnormal user network includes m2 nodes and n2 edges, structural features of the second abnormal user network are consistent with structural features of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are both less than m1, and n2 and n3 are both less than n1.
In the embodiment of the present invention, the above operation filters the user node with smaller influence in the first abnormal user network, and does not influence the structural feature of the first abnormal user network, so as to reduce the calculation amount, so that the operation may be called a network simplification step, where the network simplification step specifically includes operations S410 to S460.
Fig. 5 schematically shows a flow chart of network simplification steps according to an embodiment of the invention.
In operation S410, a degree threshold and a betweenness threshold are determined according to the degree distribution and the betweenness distribution of the first abnormal user network.
In operation S420, nodes having a degree less than the degree threshold are removed from the first abnormal user network, and edges having a betweenness less than the betweenness threshold are removed from the first abnormal user network, to obtain an intermediate abnormal user network.
The process of operation S420 is also referred to as a network simplification sub-step. It filters out nodes whose degree is smaller than the preset degree threshold and edges whose betweenness is smaller than the preset betweenness threshold, and is itself computationally inexpensive. Through this simplification step, the first abnormal user network is simplified, so that the computational complexity of the subsequent analysis of the abnormal user network by the influence maximization algorithm is reduced, and the total amount of computation is reduced.
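The simplification sub-step can be sketched with NetworkX as follows. The function name `simplify_network`, the sample network, and the threshold values are hypothetical. The sketch evaluates both criteria on the original network, which is an assumption: the text does not say whether betweenness is recomputed after nodes are removed.

```python
import networkx as nx

def simplify_network(graph, degree_threshold, betweenness_threshold):
    """Return a copy with low-degree nodes and low-betweenness edges removed."""
    reduced = graph.copy()
    # Assumption: degree and edge betweenness are both measured on the
    # original network, before anything is removed.
    edge_betweenness = nx.edge_betweenness_centrality(graph)
    low_nodes = [n for n, d in graph.degree() if d < degree_threshold]
    low_edges = [e for e, b in edge_betweenness.items() if b < betweenness_threshold]
    reduced.remove_nodes_from(low_nodes)   # also drops the edges of those nodes
    reduced.remove_edges_from(low_edges)   # edges already gone are ignored
    return reduced

# Hypothetical network: a triangle (1, 2, 3) with two pendant users (4, 5) on hub 1.
first_network = nx.Graph([(1, 2), (1, 3), (2, 3), (1, 4), (1, 5)])
intermediate_network = simplify_network(
    first_network, degree_threshold=2, betweenness_threshold=0.2
)
print(sorted(intermediate_network.nodes()), sorted(intermediate_network.edges()))
```

Working on a copy leaves the first abnormal user network intact, which matters because its structural features are needed again for the comparison in operation S440.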
In operation S430, the intermediate abnormal user network is analyzed to obtain structural features of the intermediate abnormal user network.
In operation S440, the structural features of the intermediate abnormal user network are compared with the structural features of the first abnormal user network.
In operation S450, if the structural feature of the intermediate abnormal user network is inconsistent with the structural feature of the first abnormal user network, the degree threshold and the betweenness threshold are adjusted, and operation S420 is repeatedly performed until the structural feature of the intermediate abnormal user network is consistent with the structural feature of the first abnormal user network.
In an embodiment of the invention, adjusting the degree threshold and the betweenness threshold comprises: adjusting the degree threshold and the betweenness threshold according to a Bayesian optimization parameter tuning method.
It should be noted that Bayesian optimization is a method for tuning parameters in machine learning. Its main idea is that, given an objective function to be optimized (a black-box function for which only the inputs and outputs need to be specified, while the internal structure and mathematical properties need not be known), the posterior distribution of the objective function (modeled as a Gaussian process) is updated by continually adding sample points until the posterior distribution substantially fits the true distribution.
In operation S460, an intermediate abnormal user network consistent with the structural characteristics of the first abnormal user network is taken as the second abnormal user network.
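The loop formed by operations S420 to S460 can be sketched as below. Note that this sketch replaces the Bayesian optimization described above with a plain sweep over a hand-written list of candidate thresholds, purely to keep the example short. All names (`simplify_until_consistent`, `toy_simplify`, `toy_is_consistent`) and the toy data are hypothetical, and the toy consistency criterion is a placeholder rather than the structural comparison of the embodiment.

```python
def simplify_until_consistent(network, candidates, simplify, is_consistent):
    """Try (degree_threshold, betweenness_threshold) pairs in the given order.

    Returns the first simplified network judged consistent with the original,
    together with the thresholds used, or (None, None) if no candidate works.
    """
    for degree_threshold, betweenness_threshold in candidates:
        intermediate = simplify(network, degree_threshold, betweenness_threshold)  # S420
        if is_consistent(network, intermediate):                                   # S430 to S440
            return intermediate, (degree_threshold, betweenness_threshold)         # S460
    return None, None

# Toy stand-ins (hypothetical): a "network" is just a node -> degree mapping.
toy_network = {"a": 5, "b": 3, "c": 2, "d": 1, "e": 1}

def toy_simplify(network, degree_threshold, _betweenness_threshold):
    return {n: d for n, d in network.items() if d >= degree_threshold}

def toy_is_consistent(original, reduced):
    # Placeholder criterion: the hub survives and at least three nodes remain.
    return max(original, key=original.get) in reduced and len(reduced) >= 3

second_network, thresholds = simplify_until_consistent(
    toy_network, [(4, 0.0), (3, 0.0), (2, 0.0)], toy_simplify, toy_is_consistent
)
print(second_network, thresholds)
```

Listing the candidates from most to least aggressive means the first accepted network is also the smallest one among the candidates tried.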
In an embodiment of the present invention, in order to determine whether the structural features of the second abnormal user network are consistent with the structural features of the first abnormal user network, the method for identifying key abnormal users further includes obtaining the structural features of the second abnormal user network, which specifically includes: analyzing the second abnormal user network to obtain an overall structural feature value and a local structural feature value of the second abnormal user network; determining the type of the second abnormal user network according to the overall structural feature value and the local structural feature value of the second abnormal user network; and determining the structural features of the second abnormal user network according to the type, the overall structural feature value, and the local structural feature value of the second abnormal user network. The specific steps for obtaining the structural features of the second abnormal user network are similar to those for the first abnormal user network and will not be described herein. That is, in operation S430, obtaining the structural features of the intermediate abnormal user network may include: analyzing the intermediate abnormal user network to obtain an overall structural feature value and a local structural feature value of the intermediate abnormal user network; determining the type of the intermediate abnormal user network according to the overall structural feature value and the local structural feature value of the intermediate abnormal user network; and determining the structural features of the intermediate abnormal user network according to the type, the overall structural feature value, and the local structural feature value of the intermediate abnormal user network.
The specific steps for obtaining the intermediate abnormal user network structure feature are similar to those for obtaining the first abnormal user network structure feature, and are not described herein.
In an embodiment of the present invention, the type of the first abnormal user network is a scale-free network, and the structural features of the first abnormal user network include that the distribution of the degrees of its nodes follows a power law distribution; the structural features of the second abnormal user network include that the distribution of the degrees of its nodes follows a power law distribution.
In an embodiment of the present invention, the structural features of the second abnormal user network being consistent with the structural features of the first abnormal user network includes: the power exponent in the power law distribution of the degrees of the nodes of the second abnormal user network is substantially identical to the power exponent in the power law distribution of the degrees of the nodes of the first abnormal user network; specifically, the ratio of the two power exponents is between 0.8 and 1.2.
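The consistency criterion can be illustrated with a short sketch. The text does not say how the power exponent is estimated; the sketch assumes the common approximate maximum-likelihood estimator for discrete data, alpha ≈ 1 + n / Σ ln(k_i / (k_min − 0.5)). The function names, the choice k_min = 1, and the two degree lists are hypothetical.

```python
import math

def power_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood power-law exponent for degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)

def exponents_consistent(first_degrees, second_degrees, low=0.8, high=1.2):
    """Return (is_consistent, ratio), where ratio = second exponent / first exponent."""
    ratio = power_exponent(second_degrees) / power_exponent(first_degrees)
    return low <= ratio <= high, ratio

# Hypothetical degree sequences before and after simplification.
first_degrees = [1] * 60 + [2] * 20 + [3] * 10 + [5] * 5 + [10] * 3 + [20] * 2
second_degrees = [1] * 40 + [2] * 18 + [3] * 10 + [5] * 5 + [10] * 3 + [20] * 2

consistent, ratio = exponents_consistent(first_degrees, second_degrees)
print(consistent, round(ratio, 3))
```

With real data, the degree lists would come from the two networks, for example `[d for _, d in graph.degree()]` in NetworkX.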
Fig. 6 schematically shows a schematic diagram of a second anomalous user network according to an embodiment of the invention.
In an embodiment of the present invention, as shown in fig. 6, the second abnormal user network is obtained by deleting the original node 2 and node 3 (indicated by dotted lines in fig. 6) from the first abnormal user network, such that the power exponent in the power law distribution of the degrees of the nodes of the second abnormal user network is substantially identical to that of the first abnormal user network; specifically, the ratio of the two power exponents is about 0.9.
According to the embodiment of the invention, after the first abnormal user network is simplified, operations S430 to S460 ensure that the structural features of the network remain unchanged while the network is simplified, so that the accuracy of subsequent analysis is ensured.
In operation S240, the second abnormal user network is analyzed by using an influence maximization algorithm to obtain a set of key abnormal users, where the set of key abnormal users includes k abnormal users, the k abnormal users are the nodes ranked in the top k by influence, in descending order, in the second abnormal user network, k is a positive integer greater than or equal to 1, and k is less than m2.
In an embodiment of the present invention, the second abnormal user network may be analyzed using an influence maximization algorithm.
It should be noted that an influence maximization algorithm is an algorithm that selects some seed nodes in a social network so that the information propagation initiated from these nodes can cover as many other nodes as possible. In an embodiment of the invention, the influence maximization algorithm LDAG is preferably employed. The LDAG algorithm is an influence maximization algorithm based on the linear threshold model, which computes the propagation of influence in a network by constructing a local directed acyclic graph for each network node.
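For readers unfamiliar with the linear threshold model on which LDAG is based, the sketch below shows the propagation rule: an inactive node becomes active once the summed weights of its active in-neighbors reach its threshold. In the standard model the thresholds are drawn at random; they are fixed here so that the example is reproducible. The function name, the weights, and the thresholds are hypothetical, and the sketch illustrates the propagation model only, not the LDAG algorithm itself.

```python
def linear_threshold_spread(in_weights, thresholds, seeds):
    """Return the set of nodes that are active once propagation from `seeds` stops.

    in_weights[v] maps each in-neighbor u of v to the weight of the edge u -> v.
    """
    active = set(seeds)
    changed = True
    while changed:                      # repeat until no new node activates
        changed = False
        for node, neighbors in in_weights.items():
            if node in active:
                continue
            incoming = sum(w for u, w in neighbors.items() if u in active)
            if incoming >= thresholds[node]:
                active.add(node)
                changed = True
    return active

# Hypothetical weighted network and fixed thresholds.
in_weights = {"a": {}, "b": {"a": 0.6}, "c": {"a": 0.3, "b": 0.3}, "d": {"c": 0.4}}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}

reached = linear_threshold_spread(in_weights, thresholds, seeds={"a"})
print(sorted(reached))
```

The size of the returned set is the propagation range of the seed set, which is the quantity an influence maximization algorithm tries to make as large as possible.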
FIG. 7A schematically illustrates a graph of time consumption versus network scale/number of nodes for the influence maximization algorithm proposed according to an embodiment of the present invention and for other influence maximization algorithms; fig. 7B schematically illustrates a graph of influence propagation range versus network scale/number of nodes for the influence maximization algorithm proposed according to an embodiment of the present invention and for other influence maximization algorithms.
As can be seen from fig. 7A and 7B, at the same network scale/number of nodes, the influence propagation range of the influence maximization algorithm proposed by the embodiment of the present invention is smaller than that of the greedy algorithm, but its time consumption is shorter than that of the greedy algorithm.
As can be seen from fig. 7A and 7B, at the same network scale/number of nodes, the influence maximization algorithm proposed by the embodiment of the present invention consumes more time than the degree discount algorithm, but its influence propagation range is larger than that of the degree discount algorithm.
Based on the comparison results, and considering both time and propagation range, the influence maximization algorithm proposed by the embodiment of the invention reduces the running time while keeping the influence propagation range stable. Therefore, the influence maximization algorithm proposed by the embodiment of the invention offers a better overall balance of efficiency and accuracy than the other influence maximization algorithms.
Fig. 8 schematically shows a flow chart of analyzing the second abnormal user network by the influence maximization algorithm proposed according to an embodiment of the invention.
In the embodiment of the present invention, analyzing the second abnormal user network by the influence maximization algorithm proposed by the embodiment of the present invention to obtain a set of key abnormal users may include operations S510 to S550.
In operation S510, a directed acyclic graph is constructed for each node v in the second abnormal user network to obtain m2 directed acyclic graphs, wherein each node v is distributed among a plurality of directed acyclic graphs.
In the embodiment of the invention, if all nodes in the whole network that influence node v were taken into account when constructing the directed acyclic graph for node v, the excessive number of nodes would reduce the computational efficiency of the algorithm. Therefore, when constructing the local directed acyclic graph of node v, the nodes with small influence on node v need to be pruned, so as to improve computational efficiency and accuracy.
Based on the foregoing, in an embodiment of the present invention, the constructing a directed acyclic graph for each node v in the second abnormal user network may further include: calculating influence of other nodes except the node v in the second abnormal user network on the node v; screening q nodes from the other nodes, wherein the influence of each node in the q nodes on the node v is greater than a preset influence threshold, q is a positive integer greater than or equal to 1, and q is less than m2; and constructing a directed acyclic graph of the node v according to the node v, the q nodes and the corresponding edges.
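The screening step can be sketched as follows. The text does not specify how the influence of another node on node v is calculated; as a simplified stand-in, the sketch approximates it by the largest product of edge weights along any path leading to v, with weights assumed to lie in (0, 1]. This is not presented as the LDAG construction itself, and the function name `screen_influencers` and the sample weights are hypothetical.

```python
import heapq

def screen_influencers(in_weights, target, influence_threshold):
    """Return {u: approximate influence of u on target} for nodes above the threshold.

    in_weights[v] maps each in-neighbor u of v to the weight of the edge u -> v.
    Assumption: influence is the largest product of edge weights on a path to target.
    """
    best = {target: 1.0}
    heap = [(-1.0, target)]              # max-heap via negated influence
    while heap:
        negated, node = heapq.heappop(heap)
        influence = -negated
        if influence < best.get(node, 0.0):
            continue                     # stale queue entry
        for pred, weight in in_weights.get(node, {}).items():
            candidate = influence * weight
            # Products only shrink along a path, so pruning here is safe.
            if candidate > influence_threshold and candidate > best.get(pred, 0.0):
                best[pred] = candidate
                heapq.heappush(heap, (-candidate, pred))
    best.pop(target)
    return best

# Hypothetical weighted network around node "v".
in_weights = {"v": {"a": 0.5, "b": 0.2}, "a": {"c": 0.5}, "b": {"c": 0.9}, "c": {"d": 0.1}}
selected = screen_influencers(in_weights, "v", influence_threshold=0.1)
print(selected)
```

The q selected nodes, node v, and the edges among them would then form the local directed acyclic graph of node v.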
In operation S520, the influence of each node in each of the m2 directed acyclic graphs is calculated within the range of that directed acyclic graph.
In operation S530, for each node v in the second abnormal user network, the influence values of the node v in the plurality of directed acyclic graphs in which it is distributed are obtained and superimposed, so as to obtain the total influence of the node v in the second abnormal user network.
In operation S540, the total influence values of the nodes in the second abnormal user network are sorted in descending order.
In operation S550, the k nodes ranked in the top k by total influence are selected, and the resulting k key abnormal users form the set of key abnormal users, where the value of k is preset.
Through operations S540-S550, key abnormal users can be obtained by using a simple sorting method, and the method is beneficial to saving computing resources.
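Operations S530 to S550 amount to a sum followed by a sort, as the sketch below shows. The function name `top_k_key_users` and the per-graph influence values are hypothetical; in practice they would come from operation S520.

```python
from collections import defaultdict

def top_k_key_users(per_dag_influence, k):
    """Sum each node's influence over all local DAGs and return the top k nodes."""
    total = defaultdict(float)
    for dag_influence in per_dag_influence:        # one mapping per local DAG
        for node, influence in dag_influence.items():
            total[node] += influence               # S530: superposition
    ranked = sorted(total.items(), key=lambda item: item[1], reverse=True)  # S540
    return [node for node, _ in ranked[:k]]        # S550

# Hypothetical influence values of nodes inside three local DAGs.
per_dag_influence = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.5, "c": 0.25},
    {"b": 0.25, "c": 1.0},
]
key_users = top_k_key_users(per_dag_influence, k=2)
print(key_users)
```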
The method for identifying key abnormal users can mine hidden key abnormal user nodes in the network, accurately protect against abnormal users with large hidden influence, and effectively improve the efficiency of security protection. In particular, the advantages of the embodiments of the present invention are as follows: the network structural features of the abnormal user group are analyzed from the overall and structural perspectives, and the influence of abnormal user nodes is analyzed through their features in the network, so as to mine complex hidden relationships among abnormal users and identify key abnormal users more accurately; and the amount of computation of the influence maximization algorithm is reduced, since simplifying the network scale filters out nodes with lower influence without affecting the overall structural features of the network, which greatly reduces the complexity of the influence maximization algorithm and gives the method considerable practical value.
Fig. 9 schematically shows a block diagram of an apparatus for identifying key abnormal users according to an embodiment of the present invention.
As shown in fig. 9, the apparatus 700 for identifying a key abnormal user according to the embodiment includes a first abnormal user network acquisition module 710, a first abnormal user network analysis module 720, a second abnormal user network acquisition module 730, and a key abnormal user set acquisition module 740.
The first abnormal user network obtaining module 710 may be configured to obtain a first abnormal user network, where the first abnormal user network includes m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3. In an embodiment, the first abnormal user network obtaining module 710 may be configured to perform the operation S210 described above, which is not described herein.
The first abnormal user network analysis module 720 may be configured to analyze the first abnormal user network to obtain structural features of the first abnormal user network. In an embodiment, the first abnormal user network analysis module 720 may be configured to perform the operation S220 described above, which is not described herein.
The second abnormal user network obtaining module 730 may be configured to remove m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network, where the second abnormal user network includes m2 nodes and n2 edges, structural features of the second abnormal user network are consistent with structural features of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are both less than m1, and n2 and n3 are both less than n1. In an embodiment, the second abnormal user network acquisition module 730 may be configured to perform the operation S230 described above, which is not described herein.
The key abnormal user set obtaining module 740 may be configured to analyze the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users, where the set of key abnormal users includes k abnormal users, the k abnormal users are the nodes ranked in the top k by influence, in descending order, in the second abnormal user network, k is a positive integer greater than or equal to 1, and k is less than m2. In an embodiment, the key abnormal user set obtaining module 740 may be used to perform the operation S240 described above, which is not described herein.
According to an embodiment of the present invention, the first abnormal user network analysis module 720 includes a first structural feature analysis unit, a first network type determination unit, and a first network structural feature determination unit.
The first structural feature analysis unit is used for analyzing the first abnormal user network to obtain the overall structural feature value and the local structural feature value of the first abnormal user network. In an embodiment, the first structural feature analysis unit may be used to perform the operation S310 described above, which is not described herein.
The first network type determining unit is used for determining the type of the first abnormal user network according to the overall structural feature value and the local structural feature value of the first abnormal user network. In an embodiment, the first network type determining unit may be configured to perform the operation S320 described above, which is not described herein.
The first network structure feature determining unit is used for determining the structure feature of the first abnormal user network according to the type, the overall structure feature value and the local structure feature value of the first abnormal user network. In an embodiment, the first network structural feature determining unit may be configured to perform the operation S330 described above, which is not described herein.
According to an embodiment of the present invention, the first abnormal user network analysis module 720 further includes: the second structural feature analysis unit is used for analyzing the second abnormal user network to obtain an overall structural feature value and a local structural feature value of the second abnormal user network; a second network type determining unit, configured to determine a type of the second abnormal user network according to the overall structure feature value and the local structure feature value of the second abnormal user network; and the second network structure characteristic determining unit is used for determining the structure characteristics of the second abnormal user network according to the type, the overall structure characteristic value and the local structure characteristic value of the second abnormal user network.
According to an embodiment of the present invention, the second abnormal user network acquisition module 730 includes a network simplification subunit, an intermediate abnormal user network structure feature analysis unit, a comparison unit, an adjustment unit, and a second abnormal user network determination unit.
The network simplification subunit is configured to determine a degree threshold and a betweenness threshold according to the degree distribution and the betweenness distribution of the first abnormal user network, remove nodes with degrees smaller than the degree threshold from the first abnormal user network, and remove edges with betweenness smaller than the betweenness threshold from the first abnormal user network to obtain an intermediate abnormal user network. In an embodiment, the network simplification subunit may be used to perform operations S410 to S420 described above, which are not described herein.
The intermediate abnormal user network structure characteristic analysis unit is used for analyzing the intermediate abnormal user network to obtain the structure characteristics of the intermediate abnormal user network. In an embodiment, the intermediate abnormal user network structure feature analysis unit may be configured to perform the operation S430 described above, which is not described herein.
The comparison unit is used for comparing the structural characteristics of the intermediate abnormal user network with the structural characteristics of the first abnormal user network. In an embodiment, the comparing unit may be configured to perform the operation S440 described above, which is not described herein.
The adjusting unit is used for adjusting the degree threshold and the betweenness threshold when the structural features of the intermediate abnormal user network are inconsistent with the structural features of the first abnormal user network, after which the network simplification subunit is executed again, until the structural features of the intermediate abnormal user network are consistent with the structural features of the first abnormal user network. In an embodiment, the adjusting unit may be configured to perform the operation S450 described above, which is not described herein.
And the second abnormal user network determining unit is used for taking the intermediate abnormal user network consistent with the structural characteristics of the first abnormal user network as the second abnormal user network. In an embodiment, the second abnormal user network determining unit may be configured to perform the operation S460 described above, which is not described herein.
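The cooperation of these units (operations S410 to S460) can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the precomputed `edge_betweenness` mapping, and the caller-supplied `feature`, `is_consistent`, and `adjust` callables are assumptions introduced for the example, and the cap `max_rounds` is likewise hypothetical.

```python
from collections import defaultdict


def simplify(edges, edge_betweenness, degree_threshold, betweenness_threshold):
    """One pass of the network simplification sub-step.

    edges: list of (u, v) undirected edges of the first abnormal user network.
    edge_betweenness: dict mapping each edge tuple to a precomputed
        betweenness value (assumed to be supplied by the caller).
    Returns the edge list of the intermediate abnormal user network.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Keep only nodes whose degree reaches the degree threshold.
    kept_nodes = {n for n, d in degree.items() if d >= degree_threshold}
    # Keep only edges between kept nodes whose betweenness reaches the threshold.
    return [
        (u, v)
        for u, v in edges
        if u in kept_nodes
        and v in kept_nodes
        and edge_betweenness[(u, v)] >= betweenness_threshold
    ]


def simplify_until_consistent(edges, edge_betweenness, degree_threshold,
                              betweenness_threshold, feature, is_consistent,
                              adjust, max_rounds=20):
    """Repeat simplification, adjusting thresholds until features are consistent.

    feature: hypothetical callable computing a structural feature of an edge list.
    is_consistent: hypothetical callable comparing a candidate feature with the
        feature of the first network.
    adjust: hypothetical callable returning new (degree, betweenness) thresholds.
    Returns (second_network_edges, thresholds), or (None, thresholds) if no
    consistent network was found within max_rounds.
    """
    reference = feature(edges)
    for _ in range(max_rounds):
        candidate = simplify(edges, edge_betweenness,
                             degree_threshold, betweenness_threshold)
        if is_consistent(feature(candidate), reference):
            return candidate, (degree_threshold, betweenness_threshold)
        degree_threshold, betweenness_threshold = adjust(
            degree_threshold, betweenness_threshold)
    return None, (degree_threshold, betweenness_threshold)
```

In this sketch every round is applied to the first abnormal user network with the adjusted thresholds, rather than pruning the previous intermediate network further; that reading of "repeatedly executed" is an assumption.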
According to an embodiment of the present invention, the key abnormal user set obtaining module 740 includes a directed acyclic graph construction module, a first influence calculation unit, a total influence obtaining unit, a sorting unit, and a key abnormal user determination unit.
The directed acyclic graph construction module is configured to construct a directed acyclic graph for each node v in the second abnormal user network to obtain m2 directed acyclic graphs, wherein each node v is distributed among a plurality of directed acyclic graphs. In an embodiment, the directed acyclic graph construction module may be used to perform operation S510 described above, which is not described herein.
The first influence calculation unit is used for calculating influence of each node in each of the m2 directed acyclic graphs within the range of the directed acyclic graph. In an embodiment, the first influence calculating unit may be configured to perform the operation S520 described above, which is not described herein.
The total influence obtaining unit is configured to obtain, for each node v in the second abnormal user network, the influence values of the node v in each of the directed acyclic graphs in which it is distributed, and to superimpose these influence values so as to obtain the total influence of the node v in the second abnormal user network. In an embodiment, the total influence obtaining unit may be configured to perform the operation S530 described above, which is not described herein.
The sorting unit is configured to sort the total influence of all nodes in the second abnormal user network in descending order. In an embodiment, the sorting unit may be configured to perform the operation S540 described above, which is not described herein.
The key abnormal user determination unit is configured to select the k nodes whose total influence ranks in the top k, thereby obtaining k key abnormal users that form the set of key abnormal users, wherein the value of k is preset. In an embodiment, the key abnormal user determination unit may be configured to perform the operation S550 described above, which is not described herein.
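The superposition, sorting, and top-k selection (operations S530 to S550) can be illustrated with a short sketch. It assumes that the per-graph influence values are already available as one dictionary per directed acyclic graph; the function names and the tie-breaking rule are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def total_influence(per_dag_influence):
    """Sum the influence of each node over all directed acyclic graphs.

    per_dag_influence: list of dicts, one per directed acyclic graph, each
        mapping a node to its influence within that graph (assumed input).
    """
    totals = defaultdict(float)
    for influence in per_dag_influence:
        for node, value in influence.items():
            totals[node] += value
    return dict(totals)


def top_k(totals, k):
    """Return the k nodes with the largest total influence, in descending order.

    Ties are broken by the string form of the node identifier, which is an
    arbitrary choice made only so that the result is deterministic.
    """
    ranked = sorted(totals.items(), key=lambda item: (-item[1], str(item[0])))
    return [node for node, _ in ranked[:k]]
```

For example, calling `top_k(total_influence(per_dag), k)` with a preset k yields the candidate set of key abnormal users.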
According to an embodiment of the present invention, the directed acyclic graph construction module further includes: a second influence calculation unit, configured to calculate the influence on the node v of the nodes other than the node v in the second abnormal user network; a screening unit, configured to screen q nodes from the other nodes, where the influence of each of the q nodes on the node v is greater than a preset influence threshold, q is a positive integer greater than or equal to 1, and q is less than m2; and a directed acyclic graph acquisition module, configured to construct the directed acyclic graph of the node v according to the node v, the q nodes, and the corresponding edges.
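One way to picture the screening and construction step is the sketch below. The text here does not state how the influence of another node on v is computed, so the sketch makes a loud assumption: each directed edge carries a propagation probability, and the influence of u on v is the largest product of probabilities along any path from u to v. The "corresponding edges" are taken to be the edges on those best paths, which form a tree directed toward v and are therefore acyclic. All names are hypothetical.

```python
import heapq
from itertools import count


def build_dag_for_node(in_edges, v, influence_threshold):
    """Screen nodes whose influence on v exceeds the threshold and build a DAG.

    in_edges: dict mapping a node y to a list of (x, p) pairs, meaning there is
        an edge x -> y with propagation probability p in (0, 1] (assumed input).
    Returns (screened_nodes, dag_edges, influence), where dag_edges is a list
    of (x, y) edges pointing toward v.
    """
    influence = {v: 1.0}   # v trivially reaches itself
    next_hop = {}          # best outgoing edge of each screened node
    tie = count()          # tie-breaker so the heap never compares nodes
    heap = [(-1.0, next(tie), v)]
    while heap:
        neg, _, y = heapq.heappop(heap)
        if -neg < influence[y]:
            continue       # stale entry, a better path was found later
        for x, p in in_edges.get(y, []):
            candidate = -neg * p
            # Products only shrink along a path, so pruning here is safe.
            if candidate > influence_threshold and candidate > influence.get(x, 0.0):
                influence[x] = candidate
                next_hop[x] = y
                heapq.heappush(heap, (-candidate, next(tie), x))
    screened = [n for n in influence if n != v]
    dag_edges = list(next_hop.items())
    return screened, dag_edges, influence
```

The search is a Dijkstra-style traversal on the reversed graph that maximizes a product instead of minimizing a sum; other influence definitions would need a different traversal.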
Fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement a method of identifying key abnormal users according to an embodiment of the present invention.
As shown in fig. 10, the electronic device 800 according to the embodiment of the present invention includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may comprise a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the invention.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 800 may also include an input/output (I/O) interface 808, the input/output (I/O) interface 808 also being connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 808: an input section 808 including a keyboard, a mouse, and the like; an output portion 808 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 808 including a hard disk or the like; and a communication section 808 including a network interface card such as a LAN card, a modem, or the like. The communication section 808 performs communication processing via a network such as the internet. The drive 810 is also connected to the I/O interface 808 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the method shown in the flowcharts. When the computer program product is run on a computer system, the program code causes the computer system to carry out the methods provided by embodiments of the present invention.
The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of a signal, downloaded and installed via the communication portion 808, and/or installed from the removable medium 811. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 808, and/or installed from the removable medium 811. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 801. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
According to embodiments of the present invention, program code for carrying out computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages, and in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present invention are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (16)

1. A method of identifying key abnormal users, the method comprising:
obtaining a first abnormal user network, wherein the first abnormal user network comprises m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3;
analyzing the first abnormal user network to obtain structural features of the first abnormal user network;
removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network, wherein the second abnormal user network comprises m2 nodes and n2 edges, the structural characteristics of the second abnormal user network are consistent with those of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are smaller than m1, and n2 and n3 are smaller than n1; and
analyzing the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users, wherein the set of key abnormal users comprises k abnormal users, the k abnormal users are the nodes whose influence ranks in the top k, in descending order, in the second abnormal user network, k is a positive integer greater than or equal to 1, and k is smaller than m2.
2. The method according to claim 1, wherein said analyzing said first abnormal user network to obtain structural features of said first abnormal user network, in particular comprises:
analyzing the first abnormal user network to obtain an overall structure characteristic value and a local structure characteristic value of the first abnormal user network;
determining the type of the first abnormal user network according to the overall structure characteristic value and the local structure characteristic value of the first abnormal user network; and
and determining the structural characteristics of the first abnormal user network according to the type, the overall structural characteristic value and the local structural characteristic value of the first abnormal user network.
3. The method of claim 2, wherein the overall structural feature value of the first abnormal user network comprises a degree of each node of the first abnormal user network, a betweenness of each edge, a clustering coefficient of the first abnormal user network, and an average path length; and/or
the local structural feature values of the first abnormal user network comprise a degree correlation and a betweenness correlation.
4. The method according to claim 3, wherein said removing m3 nodes and n3 edges from said first abnormal user network to obtain a second abnormal user network specifically comprises:
a network simplification sub-step: determining a degree threshold and a betweenness threshold according to the degree distribution and the betweenness distribution of the first abnormal user network; and removing nodes with degrees smaller than the degree threshold from the first abnormal user network, and removing edges with betweenness smaller than the betweenness threshold from the first abnormal user network, to obtain an intermediate abnormal user network.
5. The method according to claim 4, wherein said removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network further specifically comprises:
analyzing the intermediate abnormal user network to obtain structural characteristics of the intermediate abnormal user network;
comparing the structural features of the intermediate abnormal user network with the structural features of the first abnormal user network;
if the structural features of the intermediate abnormal user network are inconsistent with the structural features of the first abnormal user network, adjusting the degree threshold and the betweenness threshold, and repeatedly executing the network simplification sub-step until the structural features of the intermediate abnormal user network are consistent with the structural features of the first abnormal user network; and
taking the intermediate abnormal user network whose structural features are consistent with those of the first abnormal user network as the second abnormal user network.
6. The method according to any of claims 1-5, wherein said analyzing said second abnormal user network with an influence maximization algorithm to obtain a set of key abnormal users specifically comprises:
constructing a directed acyclic graph for each node v in the second abnormal user network to obtain m2 directed acyclic graphs, wherein each node v is distributed among a plurality of directed acyclic graphs;
calculating, for each of the m2 directed acyclic graphs, the influence of each node of that directed acyclic graph within the scope of that directed acyclic graph; and
for each node v in the second abnormal user network, obtaining the influence values of the node v in the plurality of directed acyclic graphs in which it is distributed, and superimposing these influence values to obtain the total influence of the node v in the second abnormal user network.
7. The method according to claim 6, wherein said analyzing said second abnormal user network with an influence maximization algorithm to obtain a set of key abnormal users further comprises:
sorting the total influence of all nodes in the second abnormal user network in descending order; and
selecting the k nodes whose total influence ranks in the top k, thereby obtaining k key abnormal users that form the set of key abnormal users, wherein the value of k is preset.
8. The method according to claim 6, characterized in that said constructing a directed acyclic graph for each node v in said second anomalous user network, in particular comprises:
calculating influence of other nodes except the node v in the second abnormal user network on the node v;
screening q nodes from the other nodes, wherein the influence of each node in the q nodes on the node v is greater than a preset influence threshold, q is a positive integer greater than or equal to 1, and q is less than m2;
and constructing a directed acyclic graph of the node v according to the node v, the q nodes and the corresponding edges.
9. The method according to any one of claims 2-5, further comprising: analyzing the second abnormal user network to obtain an overall structure characteristic value and a local structure characteristic value of the second abnormal user network;
determining the type of the second abnormal user network according to the overall structure characteristic value and the local structure characteristic value of the second abnormal user network; and
and determining the structural characteristics of the second abnormal user network according to the type, the overall structural characteristic value and the local structural characteristic value of the second abnormal user network.
10. The method of claim 1, wherein the type of the first abnormal user network is a scale-free network, and wherein the structural features of the first abnormal user network include that a distribution of degrees of nodes of the first abnormal user network conforms to a power law distribution; the structural features of the second abnormal user network include that a distribution of degrees of nodes of the second abnormal user network conforms to a power law distribution.
11. The method of claim 10, wherein the structural features of the second abnormal user network being consistent with the structural features of the first abnormal user network comprises:
the ratio of the power exponent in the power law distribution of the degree of the node of the second abnormal user network to the power exponent in the power law distribution of the degree of the node of the first abnormal user network is between 0.8 and 1.2.
12. The method of claim 5, wherein the adjusting the degree threshold and the betweenness threshold comprises: adjusting the degree threshold and the betweenness threshold according to a Bayesian optimization parameter tuning method.
13. An apparatus for identifying key abnormal users, said apparatus comprising:
a first abnormal user network acquisition module, configured to: obtain a first abnormal user network, wherein the first abnormal user network comprises m1 nodes and n1 edges, each node represents an abnormal user, each edge represents an association relationship between abnormal users, and m1 and n1 are positive integers greater than or equal to 3;
a first abnormal user network analysis module for: analyzing the first abnormal user network to obtain structural features of the first abnormal user network;
a second abnormal user network acquisition module, configured to: removing m3 nodes and n3 edges from the first abnormal user network to obtain a second abnormal user network, wherein the second abnormal user network comprises m2 nodes and n2 edges, the structural characteristics of the second abnormal user network are consistent with those of the first abnormal user network, m2 and n2 are positive integers greater than or equal to 2, m3 and n3 are positive integers greater than or equal to 1, m2 and m3 are smaller than m1, and n2 and n3 are smaller than n1; and
a key abnormal user set obtaining module, configured to: analyze the second abnormal user network by using an influence maximization algorithm to obtain a set of key abnormal users, wherein the set of key abnormal users comprises k abnormal users, the k abnormal users are the nodes whose influence ranks in the top k, in descending order, in the second abnormal user network, k is a positive integer greater than or equal to 1, and k is smaller than m2.
14. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-12.
15. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-12.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
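
The consistency criterion recited in claims 10 and 11 (ratio of power-law exponents between 0.8 and 1.2) can be illustrated with a small sketch. The claims do not say how the exponent is estimated; the sketch assumes the standard continuous-form maximum-likelihood estimator, alpha = 1 + n / sum(ln(d / d_min)), used here only as a rough approximation for integer degrees. The function names and the `d_min` parameter are hypothetical.

```python
import math


def power_law_exponent(degrees, d_min=1):
    """Estimate the power-law exponent of a degree sequence.

    Uses the continuous-form maximum-likelihood estimator on the degrees that
    are at least d_min. This estimator is an assumption of the sketch, not a
    method stated in the claims.
    """
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree larger than d_min")
    return 1.0 + len(tail) / log_sum


def exponents_consistent(alpha_second, alpha_first, low=0.8, high=1.2):
    """Check that the exponent ratio (second / first) lies within [low, high]."""
    return low <= alpha_second / alpha_first <= high
```

A typical use would be to estimate the exponent for the degree sequences of the first and second abnormal user networks and pass both values to `exponents_consistent`.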
CN202310566737.9A 2023-05-19 2023-05-19 Method, device, electronic equipment and medium for identifying key abnormal users Pending CN116383520A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310566737.9A CN116383520A (en) 2023-05-19 2023-05-19 Method, device, electronic equipment and medium for identifying key abnormal users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310566737.9A CN116383520A (en) 2023-05-19 2023-05-19 Method, device, electronic equipment and medium for identifying key abnormal users

Publications (1)

Publication Number Publication Date
CN116383520A true CN116383520A (en) 2023-07-04

Family

ID=86963611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310566737.9A Pending CN116383520A (en) 2023-05-19 2023-05-19 Method, device, electronic equipment and medium for identifying key abnormal users

Country Status (1)

Country Link
CN (1) CN116383520A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination