CN111475680A - Method, device, equipment and storage medium for detecting abnormal high-density subgraph - Google Patents

Method, device, equipment and storage medium for detecting abnormal high-density subgraph

Info

Publication number
CN111475680A
Authority
CN
China
Prior art keywords
abnormal
density subgraph
data
density
subgraph
Legal status
Pending
Application number
CN202010226309.8A
Other languages
Chinese (zh)
Inventor
赵世泉
Current Assignee
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010226309.8A
Priority to PCT/CN2020/103200 (WO2021189730A1)
Publication of CN111475680A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/265 Personal security, identity or safety

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the field of big data and discloses a method, a device, equipment and a storage medium for detecting an abnormal high-density subgraph, which can improve the accuracy of detecting whether a high-density subgraph is abnormal. The method comprises: performing real-time graph segmentation on an acquired complex relationship network to be analyzed through a preset algorithm to obtain a high-density subgraph; sampling the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data; acquiring static feature data from a historical complex relationship network and computing statistics on it through a preset statistical model to obtain a confidence interval; dividing the dynamic feature change data into non-abnormal features (inside the confidence interval) and abnormal features (outside it), and taking both as target derived features; and performing anomaly detection on the high-density subgraph using an anomaly detection model combined with the target derived features to obtain a target abnormal high-density subgraph.

Description

Method, device, equipment and storage medium for detecting abnormal high-density subgraph
Technical Field
The invention relates to the field of risk management and control, in particular to a method, a device, equipment and a storage medium for detecting an abnormal high-density subgraph.
Background
Complex relationship networks play an important role in risk control and anti-fraud, particularly in identifying malicious groups, preventing fraud-risk groups, and detecting group control. At present, analysis methods based on high-density subgraphs are static: the entire content of a high-density subgraph is analyzed at a single moment to obtain various predefined indicators, and the subgraph is then classified by these properties to identify fraud groups. However, as the capabilities of the black-market fraud industry grow, analyzing a high-density subgraph only from a static perspective makes it difficult to identify fraud groups (i.e., abnormal high-density subgraphs) reliably, which reduces the accuracy of detecting whether a high-density subgraph is abnormal.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for detecting an abnormal high-density subgraph, with the aim of improving the accuracy of detecting whether a high-density subgraph is abnormal.
A first aspect of an embodiment of the present invention provides a method for detecting an abnormal high-density subgraph, including:
acquiring a complex relationship network to be analyzed, and performing real-time graph segmentation on the complex relationship network through a preset algorithm to obtain a high-density subgraph, wherein the high-density subgraph indicates communities and the association relations among the communities;
sampling the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data, wherein the dynamic feature change data indicates network topology feature data of the high-density subgraph that changes dynamically over time;
acquiring static feature data from a historical complex relationship network, and computing statistics on the static feature data through a preset statistical model to obtain a confidence interval, wherein the historical complex relationship network is a complex relationship network generated or stored before the complex relationship network to be analyzed, and the confidence interval indicates the average range of variation of the static feature data between time periods;
dividing the dynamic feature change data into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and taking the non-abnormal features and the abnormal features as target derived features;
and performing anomaly detection on the high-density subgraph using an anomaly detection model combined with the target derived features to obtain a target abnormal high-density subgraph.
Optionally, in a first implementation of the first aspect of the embodiments of the present invention, acquiring the static feature data from the historical complex relationship network and computing statistics on it through the preset statistical model to obtain the confidence interval includes:
acquiring a historical complex relationship network, and selecting and extracting network topology features of the historical complex relationship network to obtain static feature data;
taking the static feature data as nodes, acquiring the association relations among the static feature data in the historical complex relationship network, using the association relations as the partitioning condition, and generating a static high-density subgraph from the nodes and the partitioning condition;
acquiring time-series data in the static high-density subgraph, and sampling the time-series data at a second preset time interval to obtain static feature change data;
computing statistics on the static feature change data at a third preset time interval to obtain statistical data for each time interval, wherein the statistical data for each time interval comprise the number of static high-density subgraphs and the mean and variance of the static feature change data within that third preset time interval;
and applying a preset formula to the statistical data of each time interval to obtain a first confidence threshold and a second confidence threshold, and generating the confidence interval from the first and second confidence thresholds.
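The "preset formula" is not specified. As one possible reading, the minimal sketch below assumes that each third-preset-interval window is summarised by its observation count, mean and variance, that the windows are pooled with the law of total variance, and that the two confidence thresholds are the normal-approximation band mean ± z·sigma. The names `WindowStats`, `window_stats` and `confidence_interval`, and the default `z = 1.96`, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WindowStats:
    """Statistics for one third-preset-time-interval window (hypothetical structure)."""
    count: int       # number of observations; assumed one change value per static subgraph
    mean: float      # mean of the static feature change data in the window
    variance: float  # population variance of the static feature change data in the window


def window_stats(changes: Sequence[float]) -> WindowStats:
    """Summarise the static feature change values observed in one window."""
    n = len(changes)
    mu = sum(changes) / n
    var = sum((x - mu) ** 2 for x in changes) / n
    return WindowStats(n, mu, var)


def confidence_interval(windows: List[WindowStats], z: float = 1.96) -> Tuple[float, float]:
    """Return (first_threshold, second_threshold) from per-window statistics.

    Assumption: the unspecified preset formula is a normal-approximation band
    mu +/- z * sigma built from the pooled mean and pooled variance.
    """
    total = sum(w.count for w in windows)
    pooled_mean = sum(w.count * w.mean for w in windows) / total
    # Law of total variance: within-window variance plus the spread of the window means.
    pooled_var = sum(
        w.count * (w.variance + (w.mean - pooled_mean) ** 2) for w in windows
    ) / total
    sigma = math.sqrt(pooled_var)
    return pooled_mean - z * sigma, pooled_mean + z * sigma
```

For example, pooling the windows [0, 1, 2] and [1, 1, 1] with z = 2 gives a pooled mean of 1 and a pooled variance of 1/3, i.e. thresholds of roughly -0.155 and 2.155. Pooling from per-window summaries means raw change values do not have to be retained once each window has been summarised.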
Optionally, in a second implementation of the first aspect of the embodiments of the present invention, dividing the dynamic feature change data into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and taking the non-abnormal features and the abnormal features as target derived features, includes:
performing time-continuity analysis on the dynamic feature change data to obtain first feature data and second feature data that are continuous in time, wherein time continuity means that the end time point of the first feature data coincides with, or directly adjoins, the start time point of the second feature data;
calculating a feature difference value between the first feature data and the second feature data;
determining whether the feature difference value lies outside the confidence interval;
if the feature difference value does not lie outside the confidence interval, setting the feature difference value to 0 and taking the corresponding first feature data and second feature data as non-abnormal features;
if the feature difference value lies outside the confidence interval, setting the feature difference value to 1 and taking the corresponding first feature data and second feature data as abnormal features;
and taking the non-abnormal features and the abnormal features as target derived features.
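The labelling steps above can be sketched as follows. This is a minimal illustration, assuming each sample of one topology feature is a hypothetical `(start_time, end_time, value)` tuple, that "time-continuous" means the second sample starts where the first ends (within an optional `tolerance`), and that the feature difference value is the second value minus the first. The function name and tuple layout are illustrative, not taken from the patent.

```python
from typing import List, Sequence, Tuple

# A hypothetical sample of one topology feature: (start_time, end_time, value).
Sample = Tuple[float, float, float]


def derive_features(
    samples: Sequence[Sample],
    interval: Tuple[float, float],
    tolerance: float = 0.0,
) -> List[Tuple[Sample, Sample, int]]:
    """Label each time-continuous pair of samples 0 (non-abnormal) or 1 (abnormal)."""
    lower, upper = interval
    ordered = sorted(samples, key=lambda s: s[0])  # order samples by start time
    derived = []
    for first, second in zip(ordered, ordered[1:]):
        # Time continuity: the second sample starts where the first one ends
        # (or within `tolerance` after it); otherwise the pair is skipped.
        gap = second[0] - first[1]
        if not 0.0 <= gap <= tolerance:
            continue
        difference = second[2] - first[2]  # feature difference value
        flag = 0 if lower <= difference <= upper else 1
        derived.append((first, second, flag))
    return derived
```

With samples (0, 1, 10), (1, 2, 11), (2, 3, 30), (5, 6, 31) and the interval (-5, 5), the first pair is labelled 0, the second pair 1, and the last pair is skipped because there is a gap between t = 3 and t = 5.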
Optionally, in a third implementation of the first aspect of the embodiments of the present invention, performing anomaly detection on the high-density subgraph using the anomaly detection model combined with the target derived features to obtain the target abnormal high-density subgraph includes:
establishing and marking, through the anomaly detection model, the correspondence between the target derived features and the high-density subgraph to obtain a marked high-density subgraph;
performing anomaly detection on the marked high-density subgraph with an isolation forest algorithm to obtain an initial abnormal high-density subgraph;
and performing anomaly detection on the initial abnormal high-density subgraph with a clustering-based subspace anomaly detection algorithm to obtain the target abnormal high-density subgraph.
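The patent does not describe the model internals. As an illustration of the isolation forest step only, the minimal from-scratch sketch below follows the standard algorithm of Liu, Ting and Zhou (random axis-parallel splits, anomaly score 2^(-E[h(x)]/c(psi))). It assumes each marked high-density subgraph has already been encoded as a numeric vector of its target derived features; all function names and parameters are hypothetical, and the second, clustering-based subspace stage is not shown. For production use, scikit-learn's `sklearn.ensemble.IsolationForest` provides a maintained implementation.

```python
import math
import random
from typing import List, Sequence

EULER_GAMMA = 0.5772156649


def _average_path(n: int) -> float:
    """c(n): average path length of an unsuccessful search in a BST of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def _build_tree(points: List[Sequence[float]], depth: int, limit: int, rng: random.Random):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= limit or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return ("leaf", len(points))
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return ("node", feature, split,
            _build_tree(left, depth + 1, limit, rng),
            _build_tree(right, depth + 1, limit, rng))


def _path_length(x: Sequence[float], tree, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for points left unresolved there."""
    if tree[0] == "leaf":
        return depth + _average_path(tree[1])
    _, feature, split, left, right = tree
    return _path_length(x, left if x[feature] < split else right, depth + 1)


def isolation_scores(data: List[Sequence[float]], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score between 0 and 1 per subgraph; scores well above 0.5 suggest outliers."""
    if len(data) < 2:
        raise ValueError("need at least two subgraphs")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    limit = math.ceil(math.log2(psi))  # height limit used by the original algorithm
    trees = [_build_tree(rng.sample(data, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = _average_path(psi)
    return [2.0 ** (-sum(_path_length(x, t) for t in trees) / n_trees / norm) for x in data]
```

Subgraphs whose score exceeds a chosen cut-off (or the top-k scores) would form the initial abnormal set passed to the subspace stage; the cut-off itself is a tuning choice the patent leaves open.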
Optionally, in a fourth implementation of the first aspect of the embodiments of the present invention, acquiring the complex relationship network to be analyzed and performing real-time graph segmentation on it through the preset algorithm to obtain the high-density subgraph includes:
acquiring the complex relationship network to be analyzed, initializing each node of the complex relationship network as a separate first community, and calculating a first modularity value of the first communities;
moving each node into the second community in which an adjacent node of that node is located, and calculating a second modularity value of the second community;
calculating, for each node, the difference between the first modularity value and the second modularity value;
determining whether the difference is positive; if it is not, continuing the community division processing on each node until the difference is positive, thereby obtaining the divided communities, wherein the community division processing means initializing each node as a separate first community and moving each node into the second community in which an adjacent node is located;
and acquiring and analyzing the weights of the connecting edges in the divided communities, and taking the graph formed by the divided communities whose average connecting-edge weight is greater than a preset threshold as the high-density subgraph.
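These steps resemble the local-moving phase of the Louvain method. As an illustration only, the sketch below assumes that reading: it uses Newman modularity, recomputes Q from scratch for every tentative move (simple, but only practical for small graphs), and interprets "average connecting-edge weight" as the mean weight of the edges inside each community. All names are hypothetical; libraries such as NetworkX (`networkx.community.louvain_communities`) provide full Louvain implementations.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (node_u, node_v, weight), undirected


def modularity(edges: List[Edge], community: Dict[Hashable, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] of a weighted undirected graph."""
    m = sum(w for _, _, w in edges)
    internal: Dict[int, float] = defaultdict(float)  # L_c: weight of edges inside community c
    degree: Dict[int, float] = defaultdict(float)    # d_c: summed weighted degree of c
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges: List[Edge], max_passes: int = 100) -> Dict[Hashable, int]:
    """Louvain-style phase: move nodes into a neighbour's community while Q increases."""
    neighbours: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v, _ in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = list(dict.fromkeys(n for u, v, _ in edges for n in (u, v)))
    community = {n: i for i, n in enumerate(nodes)}  # every node starts in its own community
    for _ in range(max_passes):
        moved = False
        for node in nodes:
            current = community[node]
            best_c, best_q = current, modularity(edges, community)
            for c in sorted({community[n] for n in neighbours[node]} - {current}):
                community[node] = c  # tentatively join the neighbour's community
                q = modularity(edges, community)
                if q > best_q + 1e-12:  # keep the move only for a positive gain
                    best_c, best_q = c, q
            community[node] = best_c
            moved = moved or best_c != current
        if not moved:
            break
    return community


def dense_communities(edges: List[Edge], community: Dict[Hashable, int],
                      threshold: float) -> List[Set[Hashable]]:
    """Keep communities whose average internal edge weight exceeds `threshold`."""
    weights: Dict[int, List[float]] = defaultdict(list)
    members: Dict[int, Set[Hashable]] = defaultdict(set)
    for node, c in community.items():
        members[c].add(node)
    for u, v, w in edges:
        if community[u] == community[v]:
            weights[community[u]].append(w)
    return [members[c] for c, ws in weights.items() if sum(ws) / len(ws) > threshold]
```

On two unit-weight triangles joined by a single edge, `local_moving` places each triangle in its own community (Q = 5/14), and `dense_communities` with a threshold of 0.5 returns both triangles.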
Optionally, in a fifth implementation of the first aspect of the embodiments of the present invention, sampling the network topology features of the high-density subgraph at the first preset time interval to obtain the dynamic feature change data includes:
extracting the network topology features of the high-density subgraph in real time to obtain dynamic feature data;
capturing the dynamic feature data at the first preset time interval to obtain candidate dynamic feature change data;
and performing performance analysis and reliability analysis on the candidate dynamic feature change data to obtain the dynamic feature change data.
Optionally, in a sixth implementation of the first aspect of the embodiments of the present invention, after anomaly detection is performed on the high-density subgraph using the anomaly detection model combined with the target derived features to obtain the target abnormal high-density subgraph, the method for detecting an abnormal high-density subgraph further includes:
performing anomaly-degree classification, anomaly-development prediction, and same-type anomaly analysis on the target abnormal high-density subgraph to obtain a final target abnormal high-density subgraph.
A second aspect of the embodiments of the present invention provides a device for detecting an abnormal high-density subgraph, which has the function of implementing the method for detecting an abnormal high-density subgraph provided in the first aspect. The function can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, and the modules may be software and/or hardware.
The device for detecting an abnormal high-density subgraph comprises:
a segmentation processing module, configured to acquire a complex relationship network to be analyzed and perform real-time graph segmentation on it through a preset algorithm to obtain a high-density subgraph, wherein the high-density subgraph indicates communities and the association relations among the communities;
a sampling processing module, configured to sample the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data, wherein the dynamic feature change data indicates network topology feature data of the high-density subgraph that changes dynamically over time;
a statistical calculation module, configured to acquire static feature data from a historical complex relationship network and compute statistics on the static feature data through a preset statistical model to obtain a confidence interval, wherein the historical complex relationship network is a complex relationship network generated or stored before the complex relationship network to be analyzed, and the confidence interval indicates the average range of variation of the static feature data between time periods;
a judgment analysis module, configured to divide the dynamic feature change data into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and take the non-abnormal features and the abnormal features as target derived features;
and an anomaly detection module, configured to perform anomaly detection on the high-density subgraph using an anomaly detection model combined with the target derived features to obtain a target abnormal high-density subgraph.
Optionally, in a first implementation of the second aspect of the embodiments of the present invention, the statistical calculation module is specifically configured to:
acquire a historical complex relationship network, and select and extract network topology features of the historical complex relationship network to obtain static feature data;
take the static feature data as nodes, acquire the association relations among the static feature data in the historical complex relationship network, use the association relations as the partitioning condition, and generate a static high-density subgraph from the nodes and the partitioning condition;
acquire time-series data in the static high-density subgraph, and sample the time-series data at a second preset time interval to obtain static feature change data;
compute statistics on the static feature change data at a third preset time interval to obtain statistical data for each time interval, wherein the statistical data for each time interval comprise the number of static high-density subgraphs and the mean and variance of the static feature change data within that third preset time interval;
and apply a preset formula to the statistical data of each time interval to obtain a first confidence threshold and a second confidence threshold, and generate the confidence interval from the first and second confidence thresholds.
Optionally, in a second implementation of the second aspect of the embodiments of the present invention, the judgment analysis module is specifically configured to:
perform time-continuity analysis on the dynamic feature change data to obtain first feature data and second feature data that are continuous in time, wherein time continuity means that the end time point of the first feature data coincides with, or directly adjoins, the start time point of the second feature data;
calculate a feature difference value between the first feature data and the second feature data;
determine whether the feature difference value lies outside the confidence interval;
if the feature difference value does not lie outside the confidence interval, set the feature difference value to 0 and take the corresponding first feature data and second feature data as non-abnormal features;
if the feature difference value lies outside the confidence interval, set the feature difference value to 1 and take the corresponding first feature data and second feature data as abnormal features;
and take the non-abnormal features and the abnormal features as target derived features.
Optionally, in a third implementation of the second aspect of the embodiments of the present invention, the anomaly detection module is specifically configured to:
establish and mark, through the anomaly detection model, the correspondence between the target derived features and the high-density subgraph to obtain a marked high-density subgraph;
perform anomaly detection on the marked high-density subgraph with an isolation forest algorithm to obtain an initial abnormal high-density subgraph;
and perform anomaly detection on the initial abnormal high-density subgraph with a clustering-based subspace anomaly detection algorithm to obtain the target abnormal high-density subgraph.
Optionally, in a fourth implementation of the second aspect of the embodiments of the present invention, the segmentation processing module is specifically configured to:
acquire the complex relationship network to be analyzed, initialize each node of the complex relationship network as a separate first community, and calculate a first modularity value of the first communities;
move each node into the second community in which an adjacent node of that node is located, and calculate a second modularity value of the second community;
calculate, for each node, the difference between the first modularity value and the second modularity value;
determine whether the difference is positive; if it is not, continue the community division processing on each node until the difference is positive, thereby obtaining the divided communities, wherein the community division processing means initializing each node as a separate first community and moving each node into the second community in which an adjacent node is located;
and acquire and analyze the weights of the connecting edges in the divided communities, and take the graph formed by the divided communities whose average connecting-edge weight is greater than a preset threshold as the high-density subgraph.
Optionally, in a fifth implementation of the second aspect of the embodiments of the present invention, the sampling processing module is specifically configured to:
extract the network topology features of the high-density subgraph in real time to obtain dynamic feature data;
capture the dynamic feature data at the first preset time interval to obtain candidate dynamic feature change data;
and perform performance analysis and reliability analysis on the candidate dynamic feature change data to obtain the dynamic feature change data.
Optionally, in a sixth implementation of the second aspect of the embodiments of the present invention, the device for detecting an abnormal high-density subgraph further includes:
a processing module, configured to perform anomaly-degree classification, anomaly-development prediction, and same-type anomaly analysis on the target abnormal high-density subgraph to obtain a final target abnormal high-density subgraph.
A third aspect of the embodiments of the present invention provides equipment for detecting an abnormal high-density subgraph, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the method for detecting an abnormal high-density subgraph according to any one of the above embodiments.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to execute the method for detecting an abnormal high-density subgraph according to any one of the above embodiments.
Compared with the prior art, in the technical solution provided by the embodiments of the invention, a complex relationship network to be analyzed is acquired and real-time graph segmentation is performed on it through a preset algorithm to obtain a high-density subgraph; the network topology features of the high-density subgraph are sampled at a first preset time interval to obtain dynamic feature change data; static feature data are acquired from a historical complex relationship network, and statistics are computed on them through a preset statistical model to obtain a confidence interval; the dynamic feature change data are divided into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and both are taken as target derived features; and anomaly detection is performed on the high-density subgraph using an anomaly detection model combined with the target derived features to obtain a target abnormal high-density subgraph. By combining the static indicators of the high-density subgraph with the dynamic indicators of its evolution, the embodiments assess the risk posed by the high-density subgraph and improve the accuracy of detecting whether it is abnormal.
Drawings
FIG. 1 is a diagram of an embodiment of the method for detecting an abnormal high-density subgraph according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of the method for detecting an abnormal high-density subgraph according to an embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of the device for detecting an abnormal high-density subgraph according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of the device for detecting an abnormal high-density subgraph according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of the equipment for detecting an abnormal high-density subgraph according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a method, a device, equipment and a storage medium for detecting an abnormal high-density subgraph, which assess the risk posed by a high-density subgraph by combining its static indicators with the dynamic indicators of its evolution, and improve the accuracy of detecting whether the high-density subgraph is abnormal.
To enable those skilled in the art to better understand the solution of the invention, the embodiments of the invention are described below with reference to the accompanying drawings.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus, and the division of modules into blocks presented herein is merely a logical division that may be implemented in a practical application in a different manner, such that multiple blocks may be combined or integrated into another system, or some features may be omitted, or may not be implemented.
Referring to FIG. 1, a flowchart of the method for detecting an abnormal high-density subgraph according to an embodiment of the present invention is described below. The method is executed by a computer device, which may be a server or a terminal; the present invention does not limit the type of execution subject. The method specifically includes:
101. Acquire a complex relationship network to be analyzed, and perform real-time graph segmentation on it through a preset algorithm to obtain a high-density subgraph, wherein the high-density subgraph indicates communities and the association relations among the communities.
The complex relationship network consists of business content and the relationships between items of business content, for example: how people on a campus use a platform, the extent to which they use it, and what relationships exist between companies that use a platform. Because the complex relationship network changes continuously as the business and time evolve, when the server receives an instruction sent by a terminal or user, it performs real-time graph segmentation and community partitioning on the complex relationship network at the current time through a preset algorithm, obtaining a high-density subgraph whose relationships are more closely associated; a data acquisition instruction is triggered at the same time the high-density subgraph is generated.
Specifically, step 101 may include: acquiring the complex relationship network to be analyzed, initializing each node as a separate first community, and calculating a first modularity value of the first community; moving each node into the second community in which an adjacent node is located, and calculating a second modularity value of the second community; calculating the difference between the first and second modularity values for each node; determining whether the difference is positive and, if it is not, continuing the community division processing on each node until the difference is positive, thereby obtaining the divided communities, wherein the community division processing means initializing each node as a separate first community and moving each node into the second community in which an adjacent node is located; and acquiring and analyzing the weights of the connecting edges in the divided communities, and taking the graph formed by the divided communities whose connecting-edge weights are greater than a preset threshold as the high-density subgraph.
For example, when the server receives the instruction sent by a terminal or user terminal, it reads the complex relationship network stored in the database. Take two adjacent nodes A and B in the network. Node A and node B are first placed in separate communities, A in community A1 and B in community B1, and the first modularity values of A1 and B1 are calculated. Node A is then moved into node B's community to obtain community A2, node B is moved into node A's community to obtain community B2, and the second modularity values of A2 and B2 are calculated. The difference between the first and second modularity values measures how much the community structure strengthens from A1 to A2 (or from B1 to B2). A large connecting edge weight indicates complex and strongly associated relationships, so the graph formed by the divided communities whose connecting edge weights exceed the preset threshold is used as the high-density subgraph, which improves the quality of the generated high-density subgraph.
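These steps closely resemble the local-moving phase of the Louvain method. The Python sketch below assumes that is what the preset algorithm refers to; it is an illustration, not the patent's implementation. It assumes an undirected weighted graph without self-loops stored as an adjacency dict, recomputes modularity from scratch for clarity instead of using the incremental gain formula, and applies the standard Louvain rule: move a node into a neighbour's community only when modularity strictly increases, and stop when a full pass makes no move. The names `modularity`, `local_moving` and `dense_community_links` are hypothetical.

```python
from collections import defaultdict


def modularity(adj, comm):
    """Newman modularity of a partition of an undirected weighted graph.

    adj: {node: {neighbour: weight}}, symmetric, no self-loops.
    comm: {node: community_id}.
    """
    two_m = sum(w for u in adj for w in adj[u].values())  # 2m: each edge seen twice
    internal = defaultdict(float)  # intra-community weight, each edge counted twice
    total = defaultdict(float)     # summed node degree per community
    for u in adj:
        for v, w in adj[u].items():
            total[comm[u]] += w
            if comm[u] == comm[v]:
                internal[comm[u]] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj, max_passes=10):
    """Start with one community per node, then move each node into a
    neighbour's community whenever that strictly raises modularity."""
    comm = {u: u for u in adj}
    for _ in range(max_passes):
        moved = False
        for u in adj:
            current = comm[u]
            best_c, best_q = current, modularity(adj, comm)
            for v in adj[u]:
                candidate = comm[v]
                if candidate == current:
                    continue
                comm[u] = candidate          # tentatively move u
                q = modularity(adj, comm)
                comm[u] = current            # undo the tentative move
                if q > best_q + 1e-12:       # modularity gain must be positive
                    best_c, best_q = candidate, q
            if best_c != current:
                comm[u] = best_c
                moved = True
        if not moved:                        # no positive gain left anywhere
            break
    return comm


def dense_community_links(adj, comm, threshold):
    """Total edge weight between each pair of communities, kept if above threshold."""
    between = defaultdict(float)
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = comm[u], comm[v]
            if cu < cv:                      # count each inter-community edge once
                between[(cu, cv)] += w
    return {pair: w for pair, w in between.items() if w > threshold}


# Usage: two triangles joined by a single bridge edge c-d.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0), ("c", "d", 1.0)]
adj = defaultdict(dict)
for u, v, w in edges:
    adj[u][v] = w
    adj[v][u] = w

comm = local_moving(adj)
links = dense_community_links(adj, comm, threshold=0.5)
```

On this graph the two triangles end up as separate communities, and the single bridge gives them an inter-community weight of 1.0, which passes a threshold of 0.5 but not a threshold of 1.0.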
102. Sample the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data, which represents the network topology feature data of the high-density subgraph as it changes dynamically over time.
The dynamic feature change data consists of network topology features, such as the number of vertices, the degree, the average degree and the average correlation coefficient, that change dynamically over time. The data acquisition instruction starts a data acquisition tool, which captures the features of the high-density subgraph at regular intervals equal to the first preset time interval to obtain dynamic feature change data over consecutive equal time slices. The data within each time slice can be averaged, or combined by a weighted average, to obtain dynamic feature change data that represents the overall change within that slice.
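As an illustration of this sampling and aggregation, the Python sketch below assumes each snapshot is a timestamped undirected edge list. It computes only two of the features named above (vertex count and average degree), groups snapshots into equal time slices, and averages each slice, optionally with weights. The snapshot format and the function names are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def topology_features(edges):
    """Vertex count and average degree of one snapshot given as an undirected edge list."""
    vertices = {node for edge in edges for node in edge}
    n = len(vertices)
    return {"vertices": n, "avg_degree": 2 * len(edges) / n if n else 0.0}


def aggregate_slice(samples, weights=None):
    """Mean (or weighted mean) of each feature over the samples in one time slice."""
    keys = samples[0].keys()
    if weights is None:
        return {k: fmean(s[k] for s in samples) for k in keys}
    total = sum(weights)
    return {k: sum(w * s[k] for w, s in zip(weights, samples)) / total for k in keys}


def sample_slices(snapshots, slice_len):
    """Group (timestamp, edge_list) snapshots, taken every first preset interval,
    into consecutive equal time slices and aggregate each slice."""
    slices = defaultdict(list)
    for ts, edges in snapshots:
        slices[int(ts // slice_len)].append(topology_features(edges))
    return {idx: aggregate_slice(feats) for idx, feats in sorted(slices.items())}


# Usage: snapshots every 10 time units, aggregated into slices of 20 units.
snapshots = [
    (0, [("a", "b")]),
    (10, [("a", "b"), ("b", "c")]),
    (20, [("a", "b"), ("b", "c"), ("c", "a")]),
    (30, [("a", "b"), ("b", "c"), ("c", "a")]),
]
dynamic = sample_slices(snapshots, slice_len=20)
```

A plain mean treats every snapshot in a slice equally; passing `weights` to `aggregate_slice` lets later snapshots, for example, count more.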
Specifically, step 102 may include: extracting the network topology features of the high-density subgraph in real time to obtain dynamic feature data; capturing the dynamic feature data at the first preset time interval to obtain candidate dynamic feature change data; and performing performance analysis and reliability analysis on the candidate dynamic feature change data to obtain the dynamic feature change data.
The server assigns a weight to the network topology feature of each dimension in the high-density subgraph, sorts the features by weight in descending order, and selects features in that order to obtain the specified network topology features, which are then extracted by eigenvalue decomposition to obtain the dynamic feature data. The dynamic feature data is captured at the first preset time interval with the collection tool Fluentd, chosen for its flexible plug-in system, low resource requirements, and memory- and file-based buffering that prevents data loss between nodes; this yields the candidate dynamic feature change data. Performance analysis and reliability analysis are then performed on the candidate data to obtain dynamic feature change data whose performance and reliability are assured.
103. Obtain static feature data from a historical complex relationship network, and compute statistics on the static feature data with a preset statistical model to obtain a confidence interval. The historical complex relationship network is a complex relationship network generated or stored before the current one, and the confidence interval represents the average range of variation of the static feature data between time periods.
The static feature change data consists of network topology features, such as the number of vertices, the degree, the average degree and the average correlation coefficient, at specific times in the historical complex relationship network. The static feature data in the historical network is aggregated at equal time intervals (the interval length depends on the business scenario; it is generally 1 hour, and can be set in minutes where finer granularity is required). The overall values of the static feature change data (number of vertices, degree, average degree, average correlation coefficient and so on) are computed for each time slice, and the average range of variation of each feature between time slices (that is, the confidence interval) is calculated and used as the reference for judging the dynamic feature change data.
Specifically, step 103 may include: acquiring the historical complex relationship network, and performing feature selection and extraction on it to obtain static feature data; taking the static feature data as nodes, obtaining the association relationships among the static feature data in the historical network, using those relationships as the partitioning condition, and generating static high-density subgraphs from the nodes and the partitioning condition; acquiring time-series data of the static high-density subgraphs and sampling it at a second preset time interval to obtain static feature change data; aggregating the static feature change data over each third preset time interval to obtain statistics for each interval, namely the number of static high-density subgraphs and the mean and variance of the static feature change data within the third preset time interval; and applying a preset formula to these statistics to obtain a first confidence threshold and a second confidence threshold, from which the confidence interval is formed.
The server assigns a weight to the static feature of each dimension in the historical complex relationship network, sorts the static features by weight in descending order, selects features in that order to obtain the specified static features, and extracts them by eigenvalue decomposition to obtain the static feature data. Taking the static feature data as nodes and the association relationships among them in the historical network as the partitioning condition, the historical network is partitioned into static high-density subgraphs. For example, suppose the static feature data are A (5 vertices, average degree 25, average correlation coefficient 4.5), B (5 vertices, average degree 30, average correlation coefficient 5) and C (6 vertices, average degree 35, average correlation coefficient 5.5), and the regions of the historical network corresponding to A, B and C are far apart. A and B are highly similar and strongly associated, A and C have low similarity and weak association, and B and C are highly similar and strongly associated. The regions corresponding to A, B and C are therefore placed in the same partition, with the regions of A and B joined and the regions of B and C joined; that is, on the static high-density subgraph, the network topologies corresponding to A and B are adjacent, and those corresponding to B and C are adjacent. Each static high-density subgraph is then traced back from the moment it was produced, and its static feature values are calculated at equal time slices Δt to obtain the corresponding static feature change data, such as:
$N_{t_0}$ denotes the number of nodes in the high-density subgraph at time $t_0$, and the feature values of the subgraph at each time are:

$t_0:\ N_{t_0}, \dots$
$t_1:\ N_{t_1}, \dots$
……
$t_n:\ N_{t_n}, \dots$

For each static feature, the change between consecutive times is then calculated:

$t_0 \sim t_1:\ N_{t_1} - N_{t_0}, \dots$
$t_1 \sim t_2:\ N_{t_2} - N_{t_1}, \dots$
Across all static high-density subgraphs, the mean change of each static feature within each time slice Δt (that is, the mean within the third preset time interval) and its confidence interval are then calculated. The statistics for each time interval are evaluated with the preset formula

$$\left[\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$$

to obtain the first confidence threshold (the lower bound) and the second confidence threshold (the upper bound, which is larger than the first), giving the confidence interval [first confidence threshold, second confidence threshold]. Here $\bar{X}$ is the mean of the static feature change data in the third preset time interval, $\sigma$ is its standard deviation (the square root of its variance), $n$ is the number of historical high-density subgraphs, and $z_{\alpha/2}$ is the value obtained by looking up the preset confidence level in a confidence interval table (for example, 1.96 at 95%).
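A minimal Python sketch of this interval calculation, assuming the standard normal-approximation interval mean ± z·σ/√n: it takes one series of a single feature per historical subgraph (all series aligned on the same Δt slices), forms the consecutive differences, and returns one interval per slice transition. Computing z with `statistics.NormalDist` stands in for the table lookup, and using the sample standard deviation is an assumption; the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def feature_changes(series):
    """Change of one feature between consecutive slices: x(t_{i+1}) - x(t_i)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def slice_confidence_intervals(histories, level=0.95):
    """One (lower, upper) interval per slice transition for a single feature.

    histories: one feature series per historical high-density subgraph, all
    sampled on the same time grid. Uses mean +/- z * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    n = len(histories)                           # number of historical subgraphs
    per_subgraph = [feature_changes(s) for s in histories]
    intervals = []
    for changes in zip(*per_subgraph):           # all subgraphs' changes in one slice
        mean, sigma = fmean(changes), stdev(changes)
        half_width = z * sigma / math.sqrt(n)
        intervals.append((mean - half_width, mean + half_width))
    return intervals


# Usage: node counts of four historical subgraphs at t0, t1 and t2.
node_counts = [[5, 6, 8], [5, 5, 6], [6, 7, 7], [4, 6, 7]]
intervals = slice_confidence_intervals(node_counts)
```

Here both transitions have changes with mean 1 and standard deviation √(2/3), so each interval is roughly (0.20, 1.80).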
104. Divide the dynamic feature change data into non-abnormal features and abnormal features according to whether it falls inside or outside the confidence interval, and take the non-abnormal and abnormal features as target derived features.
Using a preset statistical analysis tool and statistical charts, the server can visually display whether the dynamic feature change data falls inside the confidence interval. Dynamic feature change data inside the confidence interval is classified as non-abnormal features, data outside it is classified as abnormal features, and the ID of the corresponding high-density subgraph is marked; together, the non-abnormal and abnormal features form the target derived features.
Specifically, step 104 may include: performing time-continuity analysis on the dynamic feature change data to obtain first feature data and second feature data that are consecutive in time, meaning that the end time point of the first feature data is the same as, or adjoins, the start time point of the second feature data; calculating the feature difference value between the first feature data and the second feature data; determining whether the feature difference value lies outside the confidence interval; if it does not, setting the feature difference value to 0 and taking the corresponding first and second feature data as non-abnormal features; if it does, setting the feature difference value to 1 and taking the corresponding first and second feature data as abnormal features; and taking the non-abnormal and abnormal features as the target derived features.
For each generated high-density subgraph, the server calculates the specified feature change data (the first feature data and the second feature data) at equal time slices Δt, and analyzes the difference between them (the feature difference value) with statistical charts, such as a line chart or histogram, to determine whether the feature difference value at the current moment falls within the confidence interval. The first and second feature data whose feature difference value falls outside the confidence interval are taken as abnormal features, and those whose difference falls within it are taken as non-abnormal features, giving the target derived features. In this way, all the dynamic feature change data (the target derived features) and the abnormal features of every high-density subgraph are obtained in one pass. For example, the feature difference values for $t_0 \sim t_1$ are $N_{t_1} - N_{t_0}, \dots$, and the corresponding derived features for $t_0 \sim t_1$ are $(0, 0, 1, \dots)$.
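The 0/1 derivation can be sketched in Python as follows, assuming each time slice yields one feature vector per subgraph and one (lower, upper) confidence interval per feature; `derive_features` is a hypothetical name.

```python
def derive_features(first, second, intervals):
    """0/1 derived features for two time-consecutive feature vectors.

    first, second: feature values at the ends of consecutive slices.
    intervals: one (lower, upper) confidence interval per feature.
    A difference inside its interval gives 0 (non-abnormal), outside gives 1.
    """
    derived = []
    for a, b, (lower, upper) in zip(first, second, intervals):
        diff = b - a
        derived.append(0 if lower <= diff <= upper else 1)
    return derived


# Usage: three features, e.g. node count, average degree and a third topology feature.
first = (5, 10.0, 2.0)
second = (6, 11.0, 4.0)
intervals = [(0.0, 2.0), (0.0, 2.0), (-0.5, 0.5)]
derived = derive_features(first, second, intervals)
```

Only the third feature changes by more than its interval allows, so the derived vector is `[0, 0, 1]`, matching the shape of the example above.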
105. Perform anomaly detection on the high-density subgraph with the anomaly detection model and the target derived features to obtain the target abnormal high-density subgraph.
The server constructs the anomaly detection model as a combined model that integrates several performance models. Sample data carrying derived features is first screened by expert rules to obtain initial sample data. A risk value is predicted for each initial sample, and the samples whose risk value exceeds a preset value are kept as candidate sample data. The candidate sample data is then analyzed with a Gaussian (normal) distribution-based anomaly detection algorithm, an unsupervised learning method, to identify the target abnormal high-density subgraphs corresponding to anomalies in the target derived features. This completes training and yields the final target anomaly detection model, which is combined with the target derived features to perform anomaly detection on the high-density subgraphs. Detecting anomalies in the dynamic evolution of high-density subgraphs copes well with a large influx of black-market actors or fraud in a short time: even while the static features of a high-density subgraph as a whole have not yet deteriorated, the evolution trend of each feature allows its deterioration to be checked in time.
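The Gaussian step alone can be sketched in Python as below. It assumes the candidate samples are numeric feature vectors, models each feature as an independent normal distribution fitted to reference data, and flags a sample when its density falls below a threshold epsilon (compared in log space to avoid underflow). The expert-rule screening and risk-prediction stages are not shown; the names, the `min_sigma` floor (which avoids division by zero for constant features) and the default epsilon are assumptions.

```python
import math
from statistics import fmean, pstdev


def fit_gaussian(samples, min_sigma=1e-3):
    """Per-feature mean and standard deviation; min_sigma guards constant features."""
    return [(fmean(col), max(pstdev(col), min_sigma)) for col in zip(*samples)]


def log_density(x, params):
    """Log of the product of independent normal densities."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (value - mu) ** 2 / (2 * sigma ** 2)
        for value, (mu, sigma) in zip(x, params)
    )


def gaussian_anomalies(reference, candidates, epsilon=1e-3):
    """Flag candidates whose density under the fitted model is below epsilon."""
    params = fit_gaussian(reference)
    log_eps = math.log(epsilon)
    return [log_density(x, params) < log_eps for x in candidates]


# Usage: two-feature reference data, one ordinary and one extreme candidate.
reference = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9]]
flags = gaussian_anomalies(reference, [[1.0, 2.0], [3.0, 0.0]])
```

The second candidate lies about fourteen standard deviations from the mean on each feature, so only it is flagged.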
Specifically, step 105 may include: establishing and marking, through the anomaly detection model, the correspondence between the target derived features and the high-density subgraphs to obtain marked high-density subgraphs; performing anomaly detection on the marked high-density subgraphs with the isolation forest algorithm to obtain initial abnormal high-density subgraphs; and performing anomaly detection on the initial abnormal high-density subgraphs with a clustering-based subspace anomaly detection algorithm to obtain the target abnormal high-density subgraph.
Through the anomaly detection model, the server creates and marks the correspondence between the target derived features and their high-density subgraphs to obtain marked high-density subgraphs, so that the subgraphs can be checked for anomalies and displayed intuitively when the target derived features are analyzed. Anomaly detection is then performed on the marked subgraphs with the isolation forest algorithm to obtain initial abnormal high-density subgraphs. For example, suppose there are five high-density subgraphs A, B, C, D and E at the current moment, whose derived features over the last time interval are A (0,0,0,0,1), B (0,0,0,0,0), C (0,0,0,0,1), D (0,0,0,0,0) and E (0,1,1,0,1). Analysis with the isolation forest algorithm of the anomaly detection model identifies E as the abnormal high-density subgraph at the current moment. Because derived features may be high-dimensional, which reduces the accuracy of the isolation forest algorithm, the initial abnormal high-density subgraphs it produces are further checked with the clustering-based subspace anomaly detection algorithm; this improves detection accuracy and thus the quality and accuracy of the target abnormal high-density subgraph.
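To illustrate the isolation forest step on this example, the following is a small pure-Python isolation forest following Liu et al.'s original formulation (random feature, random split value, anomaly score 2^(-E[h(x)]/c(psi))). It is a sketch, not the model used in the patent, and it omits the subsequent clustering-based subspace step. Only features that vary within a node are eligible for splitting; the tree count, sample size and seed are assumptions.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + 0.5772156649    # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, rng, depth, max_depth):
    """Recursively split on a random feature at a random value until isolation."""
    if len(points) <= 1 or depth >= max_depth:
        return ("leaf", len(points))
    splittable = [j for j in range(len(points[0]))
                  if min(p[j] for p in points) < max(p[j] for p in points)]
    if not splittable:                           # remaining points are identical
        return ("leaf", len(points))
    j = rng.choice(splittable)
    column = [p[j] for p in points]
    q = rng.uniform(min(column), max(column))
    left = [p for p in points if p[j] < q]
    right = [p for p in points if p[j] >= q]
    return ("split", j, q,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus the c(n) correction for that leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, j, q, left, right = tree
    return path_length(x, left if x[j] < q else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score 2 ** (-mean_path / c(psi)); values near 1 are more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(data, psi), rng, 0, max_depth) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm) for x in data]


# Usage with the derived features of subgraphs A to E from the example.
derived = {
    "A": (0, 0, 0, 0, 1), "B": (0, 0, 0, 0, 0), "C": (0, 0, 0, 0, 1),
    "D": (0, 0, 0, 0, 0), "E": (0, 1, 1, 0, 1),
}
scores = dict(zip(derived, isolation_scores(list(derived.values()))))
most_abnormal = max(scores, key=scores.get)
```

On these five vectors, E is isolated by a single split on its second or third feature whenever the root picks one of them, so its average path is the shortest and it receives the highest score.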
In this embodiment of the invention, the risk of a high-density subgraph is assessed by combining its static indicators with the dynamic indicators of its evolution, which improves the accuracy of detecting whether the high-density subgraph is abnormal.
Referring to fig. 2, another embodiment of the method for detecting an abnormal high-density subgraph according to the embodiments of the invention includes:
201. Acquire the complex relationship network to be analyzed, and perform real-time graph segmentation on it with a preset algorithm to obtain a high-density subgraph, where the high-density subgraph represents communities and the association relationships among them.
202. Sample the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data, which represents the network topology feature data of the high-density subgraph as it changes dynamically over time.
203. Obtain static feature data from a historical complex relationship network, and compute statistics on the static feature data with a preset statistical model to obtain a confidence interval. The historical complex relationship network is a complex relationship network generated or stored before the current one, and the confidence interval represents the average range of variation of the static feature data between time periods.
204. Divide the dynamic feature change data into non-abnormal features and abnormal features according to whether it falls inside or outside the confidence interval, and take the non-abnormal and abnormal features as target derived features.
205. Perform anomaly detection on the high-density subgraph with the anomaly detection model and the target derived features to obtain the target abnormal high-density subgraph.
In this embodiment, steps 201 to 205 are the same as steps 101 to 105 and are not described again here.
206. Perform anomaly-degree classification, anomaly-development prediction and same-type anomaly analysis on the target abnormal high-density subgraph to obtain the final target abnormal high-density subgraph.
The server classifies the degree of anomaly of the target abnormal high-density subgraph with a k-nearest neighbor algorithm to obtain classification information for different degrees of anomaly; predicts the development of the anomaly with a time-series prediction algorithm to obtain anomaly information about the predicted change in a future time period; and analyzes same-type anomalies with a clustering algorithm to obtain clustering information on anomalies of the same type as the target abnormal high-density subgraph. The classification information, anomaly information and clustering information are scored with preset weights, and the target abnormal high-density subgraphs are sorted by score in descending order to obtain the final target abnormal high-density subgraphs. This comprehensive evaluation improves the accuracy and quality of the target abnormal high-density subgraphs obtained.
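Two parts of this step lend themselves to a short Python sketch: k-nearest-neighbour classification of the anomaly degree, and the weighted score used for ranking. The sketch assumes each subgraph is described by a numeric feature vector for classification and that its three results have already been converted to scores in [0, 1]; the severity labels, the weights and the function names are hypothetical, and the time-series prediction and clustering stages are not shown.

```python
import math
from collections import Counter


def knn_severity(x, labelled, k=3):
    """Majority severity label among the k nearest labelled examples (Euclidean)."""
    nearest = sorted(labelled, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def rank_subgraphs(results, weights=(0.5, 0.3, 0.2)):
    """Weighted score per subgraph from (classification, prediction, clustering)
    scores in [0, 1]; returns ids sorted by score, highest first, and the scores."""
    scores = {sid: sum(w * s for w, s in zip(weights, parts))
              for sid, parts in results.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Usage
labelled = [((0, 0), "low"), ((0, 1), "low"), ((5, 5), "high"),
            ((6, 5), "high"), ((5, 6), "high")]
severity = knn_severity((5.5, 5.5), labelled)
order, scores = rank_subgraphs({
    "E": (0.9, 0.8, 0.7), "X": (0.4, 0.9, 0.1), "Y": (0.2, 0.1, 0.3),
})
```

With these weights E scores 0.83, X 0.49 and Y 0.19, so the ranking is E, X, Y; changing the weights shifts how much each of the three analyses influences the final order.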
In this embodiment, combining the static indicators of the high-density subgraph with the dynamic indicators of its evolution to assess its risk improves the accuracy of detecting whether the subgraph is abnormal, and performing anomaly-degree classification, anomaly-development prediction and same-type anomaly analysis on the target abnormal high-density subgraph improves the accuracy and quality with which it is obtained.
The method for detecting an abnormal high-density subgraph in the embodiments of the invention has been described above. Referring to fig. 3, an embodiment of the apparatus for detecting an abnormal high-density subgraph in the embodiments of the invention includes:
the segmentation processing module 301, configured to acquire the complex relationship network to be analyzed and perform real-time graph segmentation on it with a preset algorithm to obtain a high-density subgraph, where the high-density subgraph represents communities and the association relationships among them;
the sampling processing module 302, configured to sample the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data, which represents the network topology feature data of the high-density subgraph as it changes over time;
the statistical calculation module 303, configured to obtain static feature data from a historical complex relationship network and compute statistics on it with a preset statistical model to obtain a confidence interval, where the historical complex relationship network is a complex relationship network generated or stored before the current one, and the confidence interval represents the average range of variation of the static feature data between time periods;
the judgment analysis module 304, configured to divide the dynamic feature change data into non-abnormal features and abnormal features according to whether it falls inside or outside the confidence interval, and take the non-abnormal and abnormal features as target derived features;
and the anomaly detection module 305, configured to perform anomaly detection on the high-density subgraph with an anomaly detection model and the target derived features to obtain a target abnormal high-density subgraph.
The functions of the modules in the apparatus for detecting an abnormal high-density subgraph correspond to the steps of the method embodiment for detecting an abnormal high-density subgraph, and their functions and implementation are not described again here.
In this embodiment of the invention, the risk of a high-density subgraph is assessed by combining its static indicators with the dynamic indicators of its evolution, which improves the accuracy of detecting whether the high-density subgraph is abnormal.
Referring to fig. 4, another embodiment of the apparatus for detecting an abnormal high-density subgraph according to the embodiments of the invention includes:
the segmentation processing module 301, configured to acquire the complex relationship network to be analyzed and perform real-time graph segmentation on it with a preset algorithm to obtain a high-density subgraph, where the high-density subgraph represents communities and the association relationships among them;
the sampling processing module 302, configured to sample the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic feature change data, which represents the network topology feature data of the high-density subgraph as it changes over time;
the statistical calculation module 303, configured to obtain static feature data from a historical complex relationship network and compute statistics on it with a preset statistical model to obtain a confidence interval, where the historical complex relationship network is a complex relationship network generated or stored before the current one, and the confidence interval represents the average range of variation of the static feature data between time periods;
the judgment analysis module 304, configured to divide the dynamic feature change data into non-abnormal features and abnormal features according to whether it falls inside or outside the confidence interval, and take the non-abnormal and abnormal features as target derived features;
the anomaly detection module 305, configured to perform anomaly detection on the high-density subgraph with an anomaly detection model and the target derived features to obtain a target abnormal high-density subgraph;
and the processing module 306, configured to perform anomaly-degree classification, anomaly-development prediction and same-type anomaly analysis on the target abnormal high-density subgraph to obtain the final target abnormal high-density subgraph.
Optionally, the segmentation processing module 301 is specifically configured to: acquire the complex relationship network to be analyzed, initialize each node of the network as its own first community, and calculate the first modularity value of the first community;
move each node into the second community in which its adjacent node is located, and calculate the second modularity value of the second community;
calculate, for each node, the difference between the first modularity value and the second modularity value;
determine whether the difference is positive and, if it is not, continue the community division processing on each node until the difference is positive, to obtain the divided communities, where the community division processing means initializing each node as a separate first community and moving each node into the second community of an adjacent node;
and obtain the weights of the edges connecting the divided communities, and take the graph formed by the divided communities whose connecting edge weights exceed a preset threshold as the high-density subgraph.
Optionally, the sampling processing module 302 is specifically configured to: extract the network topology features of the high-density subgraph in real time to obtain dynamic feature data;
capture the dynamic feature data at the first preset time interval to obtain candidate dynamic feature change data;
and perform performance analysis and reliability analysis on the candidate dynamic feature change data to obtain the dynamic feature change data.
Optionally, the statistical calculation module 303 is specifically configured to: acquire the historical complex relationship network, and select and extract its network topology features to obtain static feature data;
take the static feature data as nodes, obtain the association relationships among the static feature data in the historical network, use those relationships as the partitioning condition, and generate static high-density subgraphs from the nodes and the partitioning condition;
acquire time-series data of the static high-density subgraphs, and sample it at a second preset time interval to obtain static feature change data;
aggregate the static feature change data over each third preset time interval to obtain statistics for each interval, comprising the number of static high-density subgraphs and the mean and variance of the static feature change data within the third preset time interval;
and apply a preset formula to the statistics for each interval to obtain a first confidence threshold and a second confidence threshold, and form the confidence interval from them.
Optionally, the judgment analysis module 304 is specifically configured to: performing time continuity analysis on the dynamic characteristic change data to obtain first characteristic data and second characteristic data which are continuous in time, wherein the time continuity is used for indicating that the tail end time point of the first characteristic data is the same as or connected with the start end time point of the second characteristic data;
calculating a feature difference value between the first feature data and the second feature data;
determining whether the feature difference value falls outside the confidence interval;
if the feature difference value does not fall outside the confidence interval, setting the feature difference value to 0, and taking the first feature data and the second feature data corresponding to the feature difference value as non-abnormal features;
if the feature difference value falls outside the confidence interval, setting the feature difference value to 1, and taking the first feature data and the second feature data corresponding to the feature difference value as abnormal features;
and taking the non-abnormal features and the abnormal features as target derived features.
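The time continuity analysis and the 0/1 marking described above can be sketched as follows. Several details are assumptions made for illustration: the sample layout `(start, end, value)`, the tolerance `max_gap` used to model "connected" time points, and the direction of the difference (later value minus earlier value). The function name `derive_features` is hypothetical. The text replaces the difference with the 0/1 value; the sketch keeps both so that the flag can be traced back to the raw difference.

```python
def derive_features(samples, interval, max_gap=0):
    """Flag consecutive, time-continuous samples as non-abnormal (0) or abnormal (1).

    samples: list of (start, end, value) tuples sorted by start time; this layout
    and max_gap are assumptions for illustration.
    interval: (lower, upper) confidence thresholds.
    """
    lower, upper = interval
    derived = []
    for (s1, e1, v1), (s2, e2, v2) in zip(samples, samples[1:]):
        # Time continuity: the second window starts where the first ends,
        # or at most max_gap time units later; other pairs are skipped.
        if not 0 <= s2 - e1 <= max_gap:
            continue
        # Assumed direction of the difference: later value minus earlier value.
        diff = v2 - v1
        flag = 0 if lower <= diff <= upper else 1
        derived.append({"first": (s1, e1), "second": (s2, e2), "diff": diff, "flag": flag})
    return derived


samples = [(0, 1, 10.0), (1, 2, 10.5), (2, 3, 20.0), (5, 6, 20.0)]
for item in derive_features(samples, (-2.0, 2.0)):
    print(item["first"], item["second"], item["flag"])
```

In this usage, the pair (2, 3) and (5, 6) is skipped because of the gap between the two windows. The jump from 10.5 to 20.0 falls outside the interval and is flagged 1.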
Optionally, the anomaly detection module 305 is specifically configured for: establishing and labeling the correspondence between the target derived features and the high-density subgraph through an anomaly detection model to obtain a labeled high-density subgraph;
performing anomaly detection on the labeled high-density subgraph using an isolation forest algorithm to obtain an initial abnormal high-density subgraph;
and performing anomaly detection on the initial abnormal high-density subgraph using a clustering-based subspace anomaly detection algorithm to obtain a target abnormal high-density subgraph.
The functions of each module in the apparatus for detecting an abnormally high-density subgraph correspond to the steps in the embodiment of the method for detecting an abnormally high-density subgraph, so their functions and implementation processes are not described in detail here.
In the embodiments of the invention, the risk posed by a high-density subgraph is analyzed by combining its static indicators with the dynamic indicators observed during its evolution, which improves the accuracy of detecting whether the high-density subgraph is abnormal. In addition, performing abnormality-degree classification, abnormality-development prediction, and same-type abnormality analysis on the target abnormal high-density subgraph improves the accuracy and quality of the target abnormal high-density subgraphs that are obtained.
Figs. 3 and 4 above describe the apparatus for detecting an abnormally high-density subgraph in the embodiments of the present invention in detail from the perspective of modular functional entities. The following describes the device for detecting an abnormally high-density subgraph in the embodiments of the present invention in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a device for detecting an abnormally high-density subgraph. The device 500 may vary considerably depending on its configuration or performance. It may include one or more processors (CPUs) 501, a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) that store an application 507 or data 506. The memory 509 and the storage medium 508 may provide transient or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a series of instruction operations for the device for detecting an abnormally high-density subgraph. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the device 500, the series of instruction operations in the storage medium 508.
The device 500 for detecting an abnormally high-density subgraph may further include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD. Those skilled in the art will understand that the device structure shown in Fig. 5 does not limit the device for detecting an abnormally high-density subgraph. The device may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.
The following describes the components of the apparatus for detecting an abnormally high density subgraph in detail with reference to fig. 5:
The processor 501 is the control center of the device for detecting an abnormally high-density subgraph and performs processing according to the method for detecting an abnormally high-density subgraph. The processor 501 connects all parts of the device through various interfaces and lines. It performs the device's functions and processes data by running or executing the software programs and/or modules stored in the memory 509 and by calling the data stored in the memory 509, thereby improving the accuracy of detecting whether a high-density subgraph is abnormal. The storage medium 508 and the memory 509 are both carriers for storing data. In this embodiment, the storage medium 508 may be an internal memory with a small capacity but high speed, and the memory 509 may be an external memory with a large capacity but low speed.
The memory 509 may be used to store software programs and modules. The processor 501 performs the various functional applications and data processing of the device 500 for detecting an abnormally high-density subgraph by running the software programs and modules stored in the memory 509. The memory 509 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function, for example, acquiring a complex relationship network to be analyzed and performing real-time graph partitioning on it through a preset algorithm to obtain a high-density subgraph. The data storage area may store data created through use of the device 500, for example, the dynamic feature change data obtained by sampling the network topology features of the high-density subgraph at the first preset time interval. In addition, the memory 509 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The program for the method for detecting an abnormally high-density subgraph provided in the embodiments of the invention is stored in the memory together with the received data stream, and the processor 501 calls the program from the memory 509 when it is needed.
When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, or twisted pair) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., an optical disc), or a semiconductor medium (e.g., a solid state disk (SSD)).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for detecting an abnormally high density subgraph, comprising:
acquiring a complex relationship network to be analyzed, and performing real-time graph segmentation processing on the complex relationship network through a preset algorithm to obtain a high-density subgraph, wherein the high-density subgraph indicates communities and the association relationships among the communities;
sampling the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic characteristic change data, wherein the dynamic characteristic change data indicates the network topology feature data of the high-density subgraph as it changes dynamically over time;
acquiring static characteristic data in a historical complex relationship network, and performing statistics and calculation on the static characteristic data through a preset statistical model to obtain a confidence interval, wherein the historical complex relationship network is a complex relationship network generated or stored before the complex relationship network to be analyzed, and the confidence interval indicates the average range of variation of the static characteristic data between time periods;
dividing the dynamic characteristic change data into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and taking the non-abnormal features and the abnormal features as target derived features;
and performing anomaly detection on the high-density subgraph through an anomaly detection model in combination with the target derived features to obtain a target abnormal high-density subgraph.
2. The method for detecting an abnormal high-density subgraph according to claim 1, wherein the step of obtaining static feature data in the historical complex relationship network, and carrying out statistics and calculation on the static feature data through a preset statistical model to obtain a confidence interval comprises the following steps:
acquiring a historical complex relationship network, and selecting and extracting network topological structure characteristics of the historical complex relationship network to obtain static characteristic data;
taking the static characteristic data as nodes, acquiring the association relationships among the static characteristic data in the historical complex relationship network, taking the association relationships as the division condition, and generating a static high-density subgraph according to the nodes and the division condition;
acquiring time sequence data in the static high-density subgraph, and sampling the time sequence data according to a second preset time interval to obtain static characteristic change data;
counting the static characteristic change data according to a third preset time interval to obtain statistical data corresponding to each time interval, wherein the statistical data corresponding to each time interval comprise the number of the static high-density subgraphs and the mean and variance of the static characteristic change data in the third preset time interval;
and calculating the statistical data corresponding to each time interval by a preset formula to obtain a first confidence threshold and a second confidence threshold, and generating a confidence interval according to the first confidence threshold and the second confidence threshold.
3. The method for detecting an abnormal high-density subgraph according to claim 2, wherein the step of dividing the dynamic feature change data into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and taking the non-abnormal features and the abnormal features as target derived features, comprises the following steps:
performing time continuity analysis on the dynamic characteristic change data to obtain first feature data and second feature data that are continuous in time, wherein time continuity means that the end time point of the first feature data is the same as, or adjacent to, the start time point of the second feature data;
calculating a feature difference value between the first feature data and the second feature data;
determining whether the feature difference value falls outside the confidence interval;
if the feature difference value does not fall outside the confidence interval, setting the feature difference value to 0, and taking the first feature data and the second feature data corresponding to the feature difference value as non-abnormal features;
if the feature difference value falls outside the confidence interval, setting the feature difference value to 1, and taking the first feature data and the second feature data corresponding to the feature difference value as abnormal features;
and taking the non-abnormal features and the abnormal features as target derived features.
4. The method for detecting an abnormally high-density subgraph according to claim 3, wherein the step of performing anomaly detection on the high-density subgraph through an anomaly detection model in combination with the target derived features to obtain a target abnormal high-density subgraph comprises the following steps:
establishing and labeling the correspondence between the target derived features and the high-density subgraph through an anomaly detection model to obtain a labeled high-density subgraph;
performing anomaly detection on the labeled high-density subgraph using an isolation forest algorithm to obtain an initial abnormal high-density subgraph;
and performing anomaly detection on the initial abnormal high-density subgraph using a clustering-based subspace anomaly detection algorithm to obtain a target abnormal high-density subgraph.
5. The method according to claim 1, wherein the obtaining a complex relationship network to be analyzed and performing real-time graph partitioning processing on the complex relationship network through a preset algorithm to obtain a high-density subgraph comprises:
acquiring a complex relationship network to be analyzed, initializing each node of the complex relationship network as a separate first community, and calculating a first modularity value of the first communities;
moving each node into the second community in which an adjacent node of that node is located, and calculating a second modularity value of the second community;
calculating, for each node, the difference between the first modularity value and the second modularity value;
determining whether the difference is positive; if it is not, continuing the community division processing on each node until the difference is positive, to obtain divided communities, wherein the community division processing means initializing each node as a separate first community and moving each node into the second community in which an adjacent node of that node is located;
and acquiring and analyzing the weights of the connecting edges among the communities in the divided communities, and taking the graph formed by the divided communities whose average connecting-edge weight is greater than a preset threshold as the high-density subgraph.
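The preset algorithm is not named. The steps of claim 5 resemble the local-moving phase of the Louvain method, in which each node starts in its own community and moves to a neighbor's community when that raises modularity. The Python sketch below assumes that reading and recomputes Newman modularity from scratch for clarity rather than using incremental gains. It also assumes that the average-weight filter applies to edges inside each divided community. The functions `modularity`, `local_moving`, and `dense_communities` are hypothetical names.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    Q = sum over communities c of L_c / m - (D_c / (2m))^2, where L_c is the total
    weight of edges inside c, D_c the total weighted degree of c, and m the total
    edge weight. edges: list of (u, v, w); community: dict node -> community id.
    """
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges):
    """Louvain-style local moving (assumed stand-in for the preset algorithm)."""
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    community = {n: n for n in nodes}  # each node starts in its own first community
    improved = True
    while improved:
        improved = False
        for n in nodes:
            home = community[n]
            best_c, best_q = home, modularity(edges, community)
            for nb in sorted(neighbors[n]):
                community[n] = community[nb]  # tentatively join the neighbor's community
                q = modularity(edges, community)
                if q > best_q + 1e-12:  # keep only strictly positive modularity gains
                    best_c, best_q = community[nb], q
                community[n] = home
            if best_c != home:
                community[n] = best_c
                improved = True
    return community


def dense_communities(edges, community, threshold):
    """Communities whose mean internal edge weight exceeds threshold (assumed reading)."""
    total, count = defaultdict(float), defaultdict(int)
    for u, v, w in edges:
        if community[u] == community[v]:
            total[community[u]] += w
            count[community[u]] += 1
    return {c for c in total if total[c] / count[c] > threshold}


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
parts = local_moving(edges)
print(parts, modularity(edges, parts), dense_communities(edges, parts, 0.5))
```

Each accepted move strictly increases modularity, so the loop terminates. Recomputing Q for every candidate move costs O(E) per check, which is acceptable for a sketch but not for large relationship networks.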
6. The method according to claim 5, wherein the sampling the network topology features of the high-density subgraph according to the first preset time interval to obtain dynamic feature change data includes:
extracting the network topological structure characteristics of the high-density subgraph in real time to obtain dynamic characteristic data;
capturing the dynamic characteristic data according to a first preset time interval to obtain candidate dynamic characteristic change data;
and performing performance analysis and reliability analysis on the candidate dynamic characteristic change data to obtain dynamic characteristic change data.
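The periodic sampling of claim 6 can be sketched as below. Several choices are assumptions made for illustration: the input format (timestamped undirected edge events), the use of cumulative snapshots (all events up to each sampling time), and the two topology features (density and average degree). The names `TopologySample` and `sample_topology` are hypothetical. The performance and reliability analysis of the candidate data is not specified and is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologySample:
    time: int
    nodes: int
    edges: int
    density: float
    avg_degree: float


def sample_topology(edge_events, subgraph_nodes, start, end, step):
    """Sample simple topology features of one high-density subgraph every `step` units.

    edge_events: iterable of (timestamp, u, v) for an undirected graph. Each sample
    is a cumulative snapshot of all events with timestamp <= t (an assumption).
    """
    events = list(edge_events)
    samples = []
    for k in range((end - start) // step + 1):
        t = start + k * step
        # Distinct undirected edges whose endpoints both lie in the subgraph.
        edges = {
            frozenset((u, v))
            for ts, u, v in events
            if ts <= t and u != v and u in subgraph_nodes and v in subgraph_nodes
        }
        active = set().union(*edges) if edges else set()
        n, e = len(active), len(edges)
        density = 2 * e / (n * (n - 1)) if n > 1 else 0.0
        avg_degree = 2 * e / n if n else 0.0
        samples.append(TopologySample(t, n, e, density, avg_degree))
    return samples


events = [(0, "a", "b"), (1, "b", "c"), (2, "a", "c"), (3, "c", "d")]
for s in sample_topology(events, {"a", "b", "c"}, 0, 2, 1):
    print(s)
```

The resulting sequence of samples is the kind of time-ordered data that the time continuity analysis of claim 3 would then compare pairwise.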
7. The method for detecting an abnormally high-density subgraph according to any one of claims 1 to 6, wherein after anomaly detection is performed on the high-density subgraph through the anomaly detection model in combination with the target derived features, the method further comprises the following step:
performing abnormality-degree classification, abnormality-development prediction, and same-type abnormality analysis on the target abnormal high-density subgraph to obtain a final target abnormal high-density subgraph.
8. An apparatus for detecting abnormally high density subgraphs, the apparatus comprising:
a segmentation processing module, which is used for acquiring a complex relationship network to be analyzed and performing real-time graph segmentation processing on the complex relationship network through a preset algorithm to obtain a high-density subgraph, wherein the high-density subgraph indicates communities and the association relationships among the communities;
the sampling processing module is used for sampling the network topology features of the high-density subgraph at a first preset time interval to obtain dynamic characteristic change data, wherein the dynamic characteristic change data indicates the network topology feature data of the high-density subgraph as it changes dynamically over time;
the statistical calculation module is used for acquiring static characteristic data in a historical complex relationship network, and performing statistics and calculation on the static characteristic data through a preset statistical model to obtain a confidence interval, wherein the historical complex relationship network is a complex relationship network generated or stored before the complex relationship network to be analyzed, and the confidence interval indicates the average range of variation of the static characteristic data between time periods;
the judgment analysis module is used for dividing the dynamic characteristic change data into non-abnormal features and abnormal features according to whether they fall inside or outside the confidence interval, and taking the non-abnormal features and the abnormal features as target derived features;
and the anomaly detection module is used for performing anomaly detection on the high-density subgraph through an anomaly detection model in combination with the target derived features to obtain a target abnormal high-density subgraph.
9. A device for detecting abnormally high-density subgraphs, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for detecting abnormally high-density subgraphs according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of detecting an abnormally high density subgraph according to any one of claims 1 to 7.
CN202010226309.8A 2020-03-27 2020-03-27 Method, device, equipment and storage medium for detecting abnormal high-density subgraph Pending CN111475680A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010226309.8A CN111475680A (en) 2020-03-27 2020-03-27 Method, device, equipment and storage medium for detecting abnormal high-density subgraph
PCT/CN2020/103200 WO2021189730A1 (en) 2020-03-27 2020-07-21 Method, apparatus and device for detecting abnormal dense subgraph, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010226309.8A CN111475680A (en) 2020-03-27 2020-03-27 Method, device, equipment and storage medium for detecting abnormal high-density subgraph

Publications (1)

Publication Number Publication Date
CN111475680A 2020-07-31

Family

ID=71750252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010226309.8A Pending CN111475680A (en) 2020-03-27 2020-03-27 Method, device, equipment and storage medium for detecting abnormal high-density subgraph

Country Status (2)

Country Link
CN (1) CN111475680A (en)
WO (1) WO2021189730A1 (en)

Also Published As

Publication number Publication date
WO2021189730A1 (en) 2021-09-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination