CN112612832B - Node analysis method, device, equipment and storage medium - Google Patents

Publication number
CN112612832B
Authority
CN
China
Prior art keywords
node
graph database
information
super
edges
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011499271.8A
Other languages
Chinese (zh)
Other versions
CN112612832A (en)
Inventor
李艳红
冯宇波
张俊杰
毛勇岗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN202011499271.8A
Publication of CN112612832A
Application granted
Publication of CN112612832B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Abstract

The invention discloses a node analysis method, device, equipment and storage medium. The node analysis method comprises: obtaining node information and edge information of a graph database; determining the edges associated with each node based on the node information and the edge information; counting, for each node, the number of edges associated with that node; and determining that a node is a super node when the number of edges associated with it is greater than a preset threshold. In this embodiment, the number of edges owned by each node is determined from the association between nodes and edges, and super nodes are then identified from those edge counts. The graph database therefore does not need to be traversed node by node, which improves the efficiency of finding super nodes and reduces the load that the search places on the normal service of the graph database.

Description

Node analysis method, device, equipment and storage medium
Technical Field
Embodiments of the invention relate to the technical field of databases, and in particular to a node analysis method, device, equipment and storage medium.
Background
Graph databases are a type of NoSQL database that uses graph theory to store relationship information between entities. The most common example is the set of interpersonal relationships in a social network. JanusGraph, for example, is a scalable graph database that can store graphs containing hundreds of billions of nodes and edges on a multi-machine cluster, and it uses modular interfaces for data persistence, indexing, and client access. Apache HBase is a typical back-end storage system that it supports.
Graph databases generally contain some super nodes, that is, nodes whose edge count is high relative to the edge counts across the whole network. Add, delete, and modify operations that involve super nodes are inefficient in a graph database, and a super node encountered during a query can make the result set explode so that graph traversal cannot continue. Identifying the super nodes in a graph database is therefore necessary.
When the data volume is very large, finding super nodes by conventional graph traversal is inefficient and places heavy load on the normal service of the graph database, and it may even fail to compute all of the super nodes.
Disclosure of Invention
Embodiments of the invention provide a node analysis method, device, equipment and storage medium that improve the efficiency of finding super nodes and reduce the load that the search places on the normal service of a graph database.
In a first aspect, an embodiment of the present invention provides a node analysis method, including:
acquiring node information and edge information of a graph database;
determining the edges associated with a node based on the node information and the edge information;
for each node, counting the number of edges associated with the node;
and when the number of edges associated with the node is greater than a preset threshold, determining that the node is a super node.
Further, acquiring node information and edge information of the graph database includes:
loading a metadata file of the graph database;
reading and parsing the original records of the graph database;
and acquiring the node information and edge information of the graph database from the original records.
Further, counting the number of edges associated with the node includes:
counting a total number of edges associated with the node;
correspondingly, when the number of edges associated with the node is greater than a preset threshold, determining that the node is a super node includes: when the total number of edges associated with the node is greater than a first preset threshold value, determining that the node is a super node.
Further, counting the number of edges associated with the node includes: obtaining the edge types associated with the node; and counting the number of edges corresponding to each edge type;
correspondingly, when the number of edges associated with the node is greater than a preset threshold, determining that the node is a super node includes:
when the number of edges corresponding to any one edge type exceeds the second preset threshold corresponding to that edge type, determining that the node corresponding to the node identifier is a super node.
Further, before obtaining the node identifier and the edge identifier of the graph database, the method further includes:
loading configuration information of the graph database;
initializing the graph database management interface using the configuration information;
and connecting to the graph database through the graph database management interface.
The configuration information comprises: the host name, port number, and table name of the back-end storage system of the graph database, and the preset threshold.
Further, after determining that the node is a super node, the method further includes:
sorting all super nodes in descending order of edge count;
and writing the node information, edge information, and edge count of all the sorted super nodes to a preset file.
In a second aspect, an embodiment of the present invention further provides a node analysis apparatus, including:
an information acquisition module, used for acquiring node information and edge information of a graph database;
an associated edge determination module, used for determining the edges associated with a node based on the node information and the edge information;
an edge number counting module, used for counting, for each node, the number of edges associated with the node;
and a super node determining module, used for determining that the node is a super node when the number of edges associated with the node is greater than a preset threshold.
In a third aspect, an embodiment of the present invention further provides a node analysis device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the node analysis method provided in the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which one or more computer programs are stored, which when executed by a processor implement the node analysis method as provided in the first aspect above.
In the node analysis method, apparatus, device and storage medium provided in the foregoing embodiments, the node analysis method includes: obtaining node information and edge information of a graph database; determining the edges associated with each node based on the node information and the edge information; counting, for each node, the number of edges associated with that node; and determining that a node is a super node when the number of edges associated with it is greater than a preset threshold. In this embodiment, the number of edges owned by each node is determined from the association between nodes and edges, and super nodes are then identified from those edge counts. The graph database therefore does not need to be traversed node by node, which improves the efficiency of finding super nodes and reduces the load that the search places on the normal service of the graph database.
Drawings
Fig. 1 is a flowchart of a node analysis method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a graph database super node analysis method based on HBase storage according to an embodiment of the present invention;
Fig. 3 is a flowchart of a graph database metadata processing procedure based on HBase storage according to an embodiment of the present invention;
Fig. 4 is a flowchart of a graph database super node analysis processing procedure based on HBase storage according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a graph database super node analysis system based on HBase storage according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a node analysis apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a hardware structure of a device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1 is a flowchart of a node analysis method according to an embodiment of the present invention. The method is suitable for finding super nodes in a graph database and may be performed by a node analysis apparatus, which may be implemented in hardware and/or software. The node analysis apparatus may consist of one physical entity or of two or more physical entities, and is generally integrated in a computer device.
It should be noted that the node analysis method provided in this embodiment may be used in a computer device and may be regarded as being executed by a node analysis apparatus integrated in that computer device. The computer device may include a processor, a memory, an input device, and an output device; examples include a notebook computer, a desktop computer, a tablet computer, and an intelligent terminal.
Specifically, as shown in fig. 1, the node analysis method provided in the embodiment of the present invention specifically includes the following operations:
S11, obtaining node information and edge information of the graph database.
The node information may include a node ID, a node label, and node attributes. The edge information may include an edge label, edge attributes, the ID of the source node that the edge connects, the ID of the destination node that the edge connects, and the like.
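The two kinds of information listed above can be pictured as simple records. The following is a minimal sketch only: the class names `NodeInfo` and `EdgeInfo`, the field names, and the sample values are illustrative assumptions and do not come from the patent or from any graph database API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class NodeInfo:
    """Node information: ID, label, and attributes (hypothetical field names)."""
    node_id: int
    label: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class EdgeInfo:
    """Edge information: label, attributes, and the IDs of the two nodes it connects."""
    label: str
    source_id: int
    target_id: int
    attributes: Dict[str, Any] = field(default_factory=dict)


# Illustrative data: two nodes joined by one "follows" edge.
alice = NodeInfo(node_id=1, label="person", attributes={"name": "alice"})
bob = NodeInfo(node_id=2, label="person", attributes={"name": "bob"})
follows = EdgeInfo(label="follows", source_id=alice.node_id, target_id=bob.node_id)
```

Because every edge record carries both endpoint IDs, the edges associated with a node can be found from the records alone, without walking the graph.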
The graph database is a type of NoSQL database that stores relationship information between entities using graph theory. The graph database in this embodiment is preferably JanusGraph, a scalable graph database that can store graphs containing hundreds of billions of nodes and edges on a multi-machine cluster and that uses modular interfaces for data persistence, indexing, and client access.
In one embodiment, obtaining node information and edge information of the graph database includes: loading the metadata file of the graph database, reading and parsing the original records of the graph database, and acquiring the node information and edge information of the graph database from the original records.
In one embodiment, before obtaining the node identifier and the edge identifier of the graph database, the method further comprises: loading configuration information of the graph database; initializing the graph database management interface using the configuration information; and connecting to the graph database through the graph database management interface.
The configuration information comprises: the host name, port number, and table name of the back-end storage system of the graph database, and the preset threshold. The back-end storage system of the graph database is preferably HBase.
In this embodiment, before loading the configuration information of the graph database, the method further includes: setting the graph database connection parameters and the preset threshold.
Further, the graph database configuration information is loaded; parameters such as the host name, port, and table name of the back-end storage system are acquired; the graph database management interface is initialized; and the graph database is connected. A distributed job is then created, and the graph database data is read by parallel tasks.
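The configuration loading described above could look roughly like the sketch below. It is an illustration under stated assumptions: the key names (`host`, `port`, `table`, `total_threshold`), the `AnalysisConfig` class, and the sample values are hypothetical, and no real JanusGraph or HBase client call is made.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class AnalysisConfig:
    """Connection parameters and threshold (hypothetical field names)."""
    host: str
    port: int
    table: str
    total_threshold: int


def load_config(raw: Mapping[str, str]) -> AnalysisConfig:
    """Check that every required key is present and well formed, then build the config."""
    required = ("host", "port", "table", "total_threshold")
    missing = [key for key in required if not raw.get(key)]
    if missing:
        raise ValueError(f"missing configuration keys: {missing}")
    port = int(raw["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    threshold = int(raw["total_threshold"])
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return AnalysisConfig(raw["host"], port, raw["table"], threshold)


# Illustrative values only.
config = load_config(
    {"host": "storage.example", "port": "2181", "table": "graph", "total_threshold": "10000"}
)
```

Rejecting a bad host, port, or threshold before the distributed job is created avoids starting a cluster job that cannot succeed.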
The distributed job may be a MapReduce job or a Spark job. It reads the original data records of the back-end storage system using parallel tasks, converts their format into graph database elements, and then performs the super node calculation.
Further, the metadata of the graph database is stored in a file, and this metadata file, which comprises a node metadata file and an edge metadata file, is loaded in the Map task. Based on the metadata file, all original records of the graph database are read and parsed one by one, and the node information and edge information of the graph database are acquired from the original records.
Specifically, the metadata is output to a local file in JSON format, and the capability to read the metadata back from that local file is provided. This supplies metadata services to the graph database data record parsing module and avoids the performance impact of repeatedly querying the graph database.
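The local JSON metadata file described above behaves like a simple cache. The sketch below illustrates the idea under assumptions: `load_or_fetch_metadata`, `fake_fetch`, the file name, and the metadata layout are hypothetical, and `fake_fetch` stands in for a real query through the graph database management interface.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def load_or_fetch_metadata(path: str, fetch: Callable[[], Dict]) -> Dict:
    """Return metadata from the local JSON file, querying the database only if the file is missing."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    metadata = fetch()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, ensure_ascii=False, indent=2)
    return metadata


fetch_calls = []


def fake_fetch() -> Dict:
    """Hypothetical stand-in for a metadata query against the graph database."""
    fetch_calls.append(1)
    return {"node_labels": {"1": "person"}, "edge_labels": {"10": "follows"}}


cache_path = os.path.join(tempfile.mkdtemp(), "graph_metadata.json")
first = load_or_fetch_metadata(cache_path, fake_fetch)   # queries once and writes the file
second = load_or_fetch_metadata(cache_path, fake_fetch)  # served from the local file
```

In this sketch the database is queried a single time; every later reader, such as a parallel parsing task, works from the file.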
S12, determining the edges associated with each node based on the node information and the edge information.
S13, counting, for each node, the number of edges associated with the node.
S14, when the number of edges associated with a node is greater than a preset threshold, determining that the node is a super node.
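Steps S12 to S14 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes that each edge is given as a (source ID, destination ID) pair and that an edge is associated with both of its endpoints; the function name `find_super_nodes` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def find_super_nodes(edges: Iterable[Tuple[int, int]], threshold: int) -> Set[int]:
    """Return the IDs of nodes whose associated edge count exceeds the threshold."""
    edge_count = Counter()
    for source_id, target_id in edges:
        # S12: associate the edge with both endpoints (an assumption of this sketch).
        edge_count[source_id] += 1
        if target_id != source_id:  # count a self-loop only once
            edge_count[target_id] += 1
    # S13 and S14: compare each node's count with the preset threshold.
    return {node_id for node_id, count in edge_count.items() if count > threshold}


edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
super_nodes = find_super_nodes(edges, threshold=2)
```

Here node 1 has three associated edges and is the only node above the threshold of 2. The counts come from a single pass over the edge records rather than from a graph traversal.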
In an embodiment, after determining that the node is a super node, the method further includes: sorting all super nodes in descending order of edge count; and writing the node information, edge information, and edge count of all the sorted super nodes to a preset file.
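The sorting and output step above might be sketched as follows. The record layout (`node_id`, `edge_count`), the function name `write_super_nodes`, and the choice of one JSON object per line are assumptions made for illustration; the patent only says that the sorted results go to a preset file.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_super_nodes(super_nodes: List[Dict], path: str) -> List[Dict]:
    """Sort super nodes by edge count, largest first, and write one JSON record per line."""
    ranked = sorted(super_nodes, key=lambda rec: rec["edge_count"], reverse=True)
    with open(path, "w", encoding="utf-8") as fh:
        for rec in ranked:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return ranked


# Illustrative records only.
found = [
    {"node_id": 7, "edge_count": 120},
    {"node_id": 3, "edge_count": 950},
    {"node_id": 9, "edge_count": 400},
]
out_path = os.path.join(tempfile.mkdtemp(), "super_nodes.jsonl")
ranked = write_super_nodes(found, out_path)
```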
In this embodiment, the discovered super nodes are merged in the Reduce task, and the result is output to a specified file for subsequent work such as data cleaning.
In one embodiment, counting the number of edges associated with a node includes: counting a total number of edges associated with the node; correspondingly, when the number of edges associated with the node is greater than a preset threshold, determining that the node is a super node includes: when the total number of edges associated with the node is greater than a first preset threshold value, determining that the node is a super node.
In this embodiment, the preset threshold includes a first preset threshold, which is the threshold on the total number of edges associated with a node. The first preset threshold may be set according to the actual situation and is not limited in this embodiment.
In one embodiment, counting the number of edges associated with a node includes: obtaining an edge type associated with the node; counting the number of edges corresponding to the edge types respectively; correspondingly, when the number of edges associated with the node is greater than a preset threshold, determining that the node is a super node includes: and when the number of edges corresponding to any one edge type exceeds a second preset threshold corresponding to the edge type, determining that the node corresponding to the node identifier is a super node.
In this embodiment, the second preset threshold may be a single threshold or a set of multiple thresholds. Preferably, the second preset threshold is a set of thresholds, that is, different edge types may correspond to different second preset thresholds. For example, when the edge type is student age, the second preset threshold is 50; when the edge type is student achievement, the second preset threshold is 60.
All the edges connected to each parsed graph node are classified and counted in memory by edge label type, and a final statistic is calculated according to the preset super node rule. If the statistic exceeds the threshold, the node is judged to be a super node, and its related information, such as the node ID, attribute values, and edge statistics, is output.
Specifically, for each node, the edge types associated with the node are identified and the number of edges under each edge type is counted. For each edge type, the number of edges under that type is compared with the second preset threshold corresponding to that type. If the number of edges under any edge type is greater than the corresponding second preset threshold, the node corresponding to the node identifier is determined to be a super node.
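The per-type comparison described above can be sketched as below. The function name `is_super_node`, the edge labels, and the optional `default_threshold` for edge types that have no configured threshold are assumptions added for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def is_super_node(
    edge_labels: Iterable[str],
    thresholds: Dict[str, int],
    default_threshold: Optional[int] = None,
) -> bool:
    """Return True if the edge count of any edge type exceeds that type's threshold."""
    per_type = Counter(edge_labels)  # classify and count edges by label
    for label, count in per_type.items():
        limit = thresholds.get(label, default_threshold)
        if limit is not None and count > limit:
            return True
    return False


# Illustrative node with three "follows" edges and two "likes" edges.
labels = ["follows", "follows", "follows", "likes", "likes"]
strict = is_super_node(labels, {"follows": 2, "likes": 5})
lenient = is_super_node(labels, {"follows": 5, "likes": 5})
```

With the first set of thresholds the node qualifies because its three "follows" edges exceed the limit of 2; with the second set no edge type exceeds its limit.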
The node analysis method provided by the embodiment of the invention comprises: obtaining node information and edge information of a graph database; determining the edges associated with each node based on the node information and the edge information; counting, for each node, the number of edges associated with that node; and determining that a node is a super node when the number of edges associated with it is greater than a preset threshold. In this embodiment, the number of edges owned by each node is determined from the association between nodes and edges, and super nodes are then identified from those edge counts. The graph database therefore does not need to be traversed node by node, which improves the efficiency of finding super nodes and reduces the load that the search places on the normal service of the graph database.
In one embodiment, Fig. 2 is a flowchart of a graph database super node analysis method based on HBase storage according to an embodiment of the present invention. The method is applicable to performing distributed computation over the original records stored by a graph database and outputting the super nodes. The method specifically comprises the following steps:
Step 110, setting the graph database connection parameters and the super node preset threshold.
Step 120, connecting to the graph database, acquiring metadata such as node and edge definitions through the graph database management interface, and storing the metadata in a file.
Step 130, creating a distributed job and reading the graph database data with parallel tasks.
Step 140, loading the node and edge metadata files of the graph database in the Map task, parsing the graph data records one by one, acquiring the IDs and attribute values of nodes and edges, classifying and counting the edges according to the super node threshold settings, judging a node to be a super node if its count exceeds the threshold, and outputting information such as the node ID, attribute values, and edge statistics.
Step 150, merging the discovered super nodes in the Reduce task and outputting the result to a specified file for subsequent work such as data cleaning.
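The division of work between the Map task (step 140) and the Reduce task (step 150) can be imitated in plain Python. This sketch is an illustration only: it runs sequentially instead of on a MapReduce or Spark cluster, it assumes that each record already pairs one node ID with the labels of all edges attached to that node, and the names `map_task` and `reduce_task` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, List[str]]  # (node ID, labels of all edges attached to the node)


def map_task(records: List[Record], threshold: int) -> List[Dict]:
    """Emit one result per node in this partition whose edge count exceeds the threshold."""
    found = []
    for node_id, edge_labels in records:
        if len(edge_labels) > threshold:
            found.append({"node_id": node_id, "edge_count": len(edge_labels)})
    return found


def reduce_task(map_outputs: List[List[Dict]]) -> List[Dict]:
    """Merge the results of all map tasks and order them by edge count, largest first."""
    merged = [rec for output in map_outputs for rec in output]
    return sorted(merged, key=lambda rec: rec["edge_count"], reverse=True)


# Two illustrative partitions, as if read by two parallel tasks.
partitions = [
    [(1, ["follows"] * 5), (2, ["follows"])],
    [(3, ["likes"] * 8), (4, [])],
]
result = reduce_task([map_task(part, threshold=3) for part in partitions])
```

Each map task only needs its own partition and the threshold, which is why the records can be processed in parallel; only the much smaller list of discovered super nodes reaches the merge step.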
The invention provides a method and a system for analyzing super nodes of a graph database that can find super nodes efficiently, so that subsequent optimization can avoid the performance impact of super nodes on the graph database.
In an embodiment, Fig. 3 is a flowchart of a graph database metadata processing procedure based on HBase storage according to an embodiment of the present invention, which includes the following steps:
Step 201, loading the graph database configuration information and acquiring parameters such as the host name, port, and table name of the back-end storage system.
Step 202, initializing the graph database management interface and connecting to the graph database.
Step 203, obtaining node definition metadata through the graph database management interface.
Step 204, obtaining edge definition metadata through the graph database management interface.
Step 205, closing the graph database connection and outputting the graph database metadata to a file in JSON format.
In an embodiment, Fig. 4 is a flowchart of a graph database super node analysis processing procedure based on HBase storage according to an embodiment of the present invention, which includes the following steps:
Step 301, initializing the super node calculation task and acquiring the super node related thresholds.
Step 302, loading the graph database metadata file.
Step 303, reading the original records of the graph database data.
Step 304, parsing the graph database data records, acquiring the IDs and attribute values of each node and all of its edges, and classifying and counting the edges according to the super node related thresholds.
Step 305, if the edge count exceeds the threshold, determining that the node is a super node and outputting the node ID, attribute values, and edge statistics.
Step 306, merging all the discovered super nodes and outputting them to an external file in descending order of edge statistics.
In an embodiment, Fig. 5 is a schematic structural diagram of a graph database super node analysis system based on HBase storage according to an embodiment of the present invention. The system comprises a configuration file loading module, a graph database metadata processing module, a distributed job execution module, a graph database data record parsing module, a graph database super node calculation module, and a task management module.
The configuration file loading module 410 reads parameter values from a configuration file and verifies their validity. The content of the configuration file includes the host name, port number, and table name of the back-end storage system of the graph database, the super node related thresholds, and the like.
The graph database metadata processing module 420 connects to the graph database through the graph database management interface, obtains the metadata for node and edge definitions, and outputs the metadata to a local file in JSON format. It also provides the capability to read the metadata back from the local file, which supplies metadata services to the graph database data record parsing module and avoids the performance impact of repeatedly querying the graph database.
The distributed job execution module 430 creates a MapReduce or Spark job according to the characteristics of the back-end storage system of the graph database, configures the job parameters, submits the job to a cluster for execution, and monitors the job execution state.
The graph database data record parsing module 440 parses the original records read from the back-end storage system of the graph database and, in combination with the graph database metadata, converts the original record format into identifiable graph database elements, such as node IDs, node labels and attributes, and, for every edge connected to the node, the source node ID, target node ID, edge label, and attributes.
The graph database super node calculation module 450 classifies and counts, in memory and by edge label type, all the edges connected to each parsed graph node, and calculates a final statistic according to the preset super node rule. If the statistic exceeds the threshold, the node is judged to be a super node, and super node related information such as the node ID, attribute values, and edge statistics is output.
The task management module 460 orchestrates the work units of the other modules, allocates resources appropriately, and ensures that all work units complete the super node computation and analysis tasks on the graph database data in sequence.
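The conversion performed by the data record parsing module, which combines a raw record with the metadata to produce named graph elements, can be sketched as follows. The raw record layout used here (a dictionary with numeric label IDs) is purely hypothetical and is not the actual storage format of JanusGraph on HBase; `parse_record` and all keys are assumed names.

```python
from typing import Any, Dict


def parse_record(raw: Dict[str, Any], metadata: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Resolve numeric label IDs through the metadata and return a node with all of its edges."""
    node_labels = metadata["node_labels"]
    edge_labels = metadata["edge_labels"]
    edges = [
        {
            "label": edge_labels[str(edge["label_id"])],
            "source_id": edge["src"],
            "target_id": edge["dst"],
            "attributes": edge.get("props", {}),
        }
        for edge in raw.get("edges", [])
    ]
    return {
        "node_id": raw["id"],
        "label": node_labels[str(raw["label_id"])],
        "attributes": raw.get("props", {}),
        "edges": edges,
    }


# Hypothetical metadata and raw record.
metadata = {"node_labels": {"1": "person"}, "edge_labels": {"10": "follows"}}
raw = {
    "id": 42,
    "label_id": 1,
    "props": {"name": "alice"},
    "edges": [{"label_id": 10, "src": 42, "dst": 7}],
}
parsed = parse_record(raw, metadata)
```

The parsed result carries the edge labels by name, which is the form the super node calculation module needs in order to classify and count edges by label type.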
Fig. 6 is a schematic structural diagram of a node analysis apparatus according to an embodiment of the present invention. The apparatus is suitable for finding super nodes in a graph database, can execute the node analysis method, and may be implemented in hardware and/or software. The node analysis apparatus may consist of one physical entity or of two or more physical entities, and is generally integrated in a computer device.
Specifically, as shown in fig. 6, the node analysis apparatus provided in the embodiment of the present invention specifically includes an information obtaining module 61, an associated edge determining module 62, an edge number counting module 63, and a super node determining module 64.
The information acquisition module 61 is configured to acquire node information and edge information of a graph database;
the associated edge determination module 62 is configured to determine the edges associated with a node based on the node information and the edge information;
the edge number counting module 63 is configured to count, for each node, the number of edges associated with the node;
the super node determining module 64 is configured to determine that the node is a super node when the number of edges associated with the node is greater than a preset threshold.
The node analysis apparatus provided in the embodiment of the present invention obtains node information and edge information of a graph database, determines the edges associated with each node based on the node information and the edge information, counts, for each node, the number of edges associated with that node, and determines that a node is a super node when the number of edges associated with it is greater than a preset threshold. In this embodiment, the number of edges owned by each node is determined from the association between nodes and edges, and super nodes are then identified from those edge counts. The graph database therefore does not need to be traversed node by node, which improves the efficiency of finding super nodes and reduces the load that the search places on the normal service of the graph database.
Further, the information obtaining module 61 is specifically configured to: load the metadata file of the graph database; read and parse the original records of the graph database; and acquire the node information and edge information of the graph database from the original records.
Further, the edge number counting module 63 is specifically configured to count the total number of edges associated with the node;
correspondingly, the super node determining module 64 is specifically configured to determine that the node is a super node when the total number of edges associated with the node is greater than a first preset threshold.
Further, the edge number counting module 63 is specifically configured to obtain edge types associated with the nodes, and count the number of edges corresponding to the edge types respectively;
correspondingly, the super node determining module 64 is specifically configured to determine that the node corresponding to the node identifier is a super node when the number of edges corresponding to any one of the edge types exceeds a second preset threshold corresponding to the edge type.
Further, before obtaining the node identifier and the edge identifier of the graph database, the method further includes:
loading configuration information of the graph database;
initializing the graph database management interface using the configuration information;
connecting to the graph database through the graph database management interface.
The configuration information comprises: the host name, port number, and table name of the back-end storage system of the graph database, and the preset threshold.
Further, after determining that the node is a super node, the method further includes:
sorting all super nodes in descending order of edge count;
and writing the node information, edge information, and edge count of all the sorted super nodes to a preset file.
The node analysis device provided by the embodiment of the invention can execute the node analysis method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Fig. 7 is a schematic diagram of a hardware structure of a device according to an embodiment of the present invention. As shown in Fig. 7, the device includes a processor 701, a memory 702, an input device 703, and an output device 704. The device may have one or more processors 701; one processor 701 is taken as an example in Fig. 7. The processor 701, the memory 702, the input device 703, and the output device 704 of the device may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 7.
The memory 702, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the node analysis method in the embodiment of the present invention (for example, the modules of the node analysis apparatus shown in Fig. 6: the information obtaining module 61, the associated edge determining module 62, the edge number counting module 63, and the super node determining module 64). The processor 701 runs the software programs, instructions, and modules stored in the memory 702 to execute the various functional applications and data processing of the device, that is, to implement the node analysis method described above.
The memory 702 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to use of the device, and the like. Further, the memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 702 may further include memory located remotely from the processor 701, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And, when the one or more programs included in the above-mentioned apparatus are executed by the one or more processors 701, the programs perform the following operations:
acquiring node information and edge information of a graph database;
determining edges associated with nodes based on the node information and the edge information;
counting, for each node, the number of edges associated with the node;
and when the number of the edges associated with the node is greater than a preset threshold value, determining that the node is a super node.
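The four operations above can be sketched in a few lines. This is a minimal in-memory illustration, assuming that the node information is a collection of node identifiers and that the edge information is a collection of `(edge_id, source_id, target_id)` tuples; this data layout and the function name `find_super_nodes` are assumptions, and the embodiment itself reads the data from the graph database rather than from Python lists.

```python
from collections import Counter


def find_super_nodes(node_ids, edges, threshold):
    """Return {node_id: edge_count} for nodes whose edge count exceeds `threshold`.

    `edges` is an iterable of (edge_id, source_id, target_id) tuples.
    """
    known = set(node_ids)
    counts = Counter()
    for _edge_id, src, dst in edges:
        # An edge is associated with each of its endpoints; using a set
        # means a self-loop is counted once for its node.
        for endpoint in {src, dst}:
            if endpoint in known:
                counts[endpoint] += 1
    # "Greater than" is strict: a node exactly at the threshold is not flagged.
    return {node: n for node, n in counts.items() if n > threshold}
```

For example, with nodes `a`, `b`, `c`, edges `a-b`, `a-c` and a self-loop on `a`, and a threshold of 2, only `a` (three associated edges) is reported.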
The input device 703 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the apparatus. The output device 704 may include a display device such as a display screen.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processing apparatus, implements a node analysis method provided in the embodiment of the present invention, where the method includes:
acquiring node information and edge information of a graph database;
determining edges associated with nodes based on the node information and the edge information;
counting, for each node, the number of edges associated with the node;
and when the number of the edges associated with the node is larger than a preset threshold value, determining that the node is a super node.
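The claims further recite reading the original records in a Map task and comparing the number of edges of each edge type against a threshold for that type. The sketch below imitates that arrangement with plain Python functions instead of a real MapReduce framework. The record format `source,edge_type,target` and all function names are hypothetical; the actual record layout is defined by the metadata file of the graph database, which is not reproduced here.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit ((node_id, edge_type), 1) for both endpoints of one raw record."""
    src, edge_type, dst = record.split(",")
    yield (src, edge_type), 1
    yield (dst, edge_type), 1  # note: a self-loop is emitted twice in this sketch


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per (node_id, edge_type) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def super_nodes_by_type(records, type_thresholds):
    """Return {node_id: {edge_type: count}} for every per-type threshold exceeded."""
    pairs = (pair for record in records for pair in map_record(record))
    flagged = {}
    for (node_id, edge_type), count in reduce_counts(pairs).items():
        limit = type_thresholds.get(edge_type)
        # Edge types without a configured threshold are never flagged.
        if limit is not None and count > limit:
            flagged.setdefault(node_id, {})[edge_type] = count
    return flagged
```

Keeping one threshold per edge type allows a node to be flagged for a single unusually dense relationship even when its total number of edges is moderate.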
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the node analysis method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the node analysis apparatus, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A node analysis method, comprising:
acquiring node information and edge information of a graph database;
determining edges associated with nodes based on the node information and the edge information;
counting, for each node, the number of edges associated with the node;
when the number of edges associated with the node is greater than a preset threshold value, determining that the node is a super node;
wherein the acquiring of the node information and the edge information of the graph database includes:
loading a metadata file of the graph database in a Map task;
reading and parsing the original records of the graph database one by one based on the metadata file;
acquiring the node information and the edge information of the graph database from the original records;
wherein, the counting the number of edges associated with the node comprises:
obtaining an edge type associated with the node;
counting the number of edges corresponding to the edge types respectively;
wherein, when the number of edges associated with a node is greater than a preset threshold, determining that the node is a super node includes:
when the number of edges corresponding to any one edge type exceeds a second preset threshold corresponding to the edge type, determining that a node corresponding to the node identifier is a super node; wherein the second preset threshold is a set of multiple thresholds.
2. The method of claim 1, wherein counting the number of edges associated with a node comprises:
counting a total number of edges associated with the node;
correspondingly, when the number of edges associated with the node is greater than a preset threshold, determining that the node is a super node includes:
when the total number of edges associated with the node is greater than a first preset threshold value, determining that the node is a super node.
3. The method according to claim 1, wherein prior to obtaining the node identifiers and the edge identifiers of the graph database, further comprising:
loading configuration information of the graph database;
initializing the graph database management interface with the configuration information;
connecting to the graph database through the graph database management interface.
4. The method of claim 3, wherein the configuration information comprises: the host name, the port number and the table name of the back-end storage system of the graph database, and the preset threshold value.
5. The method of claim 1, wherein after determining that the node is a super node, further comprising:
sorting all super nodes in descending order of the number of edges;
and outputting the node information, the edge information and the number of edges corresponding to all the sorted super nodes to a preset file.
6. A node analysis apparatus, comprising:
the information acquisition module is used for acquiring node information and edge information of the graph database;
an associated edge determination module to determine an edge associated with a node based on the node information and the edge information;
the edge number counting module is used for counting the number of edges associated with the nodes aiming at each node;
a super node determining module, configured to determine that the node is a super node when the number of edges associated with the node is greater than a preset threshold;
the information acquisition module is specifically configured to:
loading a metadata file of the graph database in a Map task;
reading and parsing the original records of the graph database one by one based on the metadata file;
acquiring the node information and the edge information of the graph database from the original records;
wherein the edge number counting module is specifically configured to:
obtaining an edge type associated with the node;
counting the number of edges corresponding to the edge types respectively;
the super node determining module is specifically configured to:
when the number of edges corresponding to any one edge type exceeds a second preset threshold value corresponding to the edge type, determining that a node corresponding to the node identifier is a super node; wherein the second preset threshold is a set of multiple thresholds.
7. A node analysis apparatus, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs being executable by the one or more processors to cause the one or more processors to implement the node analysis method of any one of claims 1-5.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the node analysis method according to any one of claims 1 to 5.
CN202011499271.8A 2020-12-17 2020-12-17 Node analysis method, device, equipment and storage medium Active CN112612832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011499271.8A CN112612832B (en) 2020-12-17 2020-12-17 Node analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011499271.8A CN112612832B (en) 2020-12-17 2020-12-17 Node analysis method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112612832A CN112612832A (en) 2021-04-06
CN112612832B true CN112612832B (en) 2023-02-10

Family

ID=75240250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011499271.8A Active CN112612832B (en) 2020-12-17 2020-12-17 Node analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112612832B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988827B (en) * 2021-04-20 2021-08-17 杭州欧若数网科技有限公司 Method, system, equipment and storage medium for counting point edges by using graph database
CN114338190A (en) * 2021-12-30 2022-04-12 奇安信科技集团股份有限公司 Entity behavior correlation analysis method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10061841B2 (en) * 2015-10-21 2018-08-28 International Business Machines Corporation Fast path traversal in a relational database-based graph structure
CN110727740B (en) * 2018-07-17 2023-03-14 百度在线网络技术(北京)有限公司 Correlation analysis method and device, computer equipment and readable medium
CN109840286A (en) * 2019-01-31 2019-06-04 中国农业银行股份有限公司 It is a kind of identification mass data building relational graph in super node method and device
CN110674359B (en) * 2019-09-03 2022-07-05 中国建设银行股份有限公司 Method and system for displaying relation map in multiple scenes
CN111932174A (en) * 2020-07-28 2020-11-13 中华人民共和国深圳海关 Freight monitoring abnormal information acquisition method, device, server and storage medium
CN111813963B (en) * 2020-09-10 2020-12-22 平安国际智慧城市科技股份有限公司 Knowledge graph construction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112612832A (en) 2021-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant