CN110727740A - Association analysis method and device, computer equipment and readable medium - Google Patents

Association analysis method and device, computer equipment and readable medium

Info

Publication number
CN110727740A
Authority
CN
China
Prior art keywords
logical
edge
initial point
association
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810784909.9A
Other languages
Chinese (zh)
Other versions
CN110727740B (en
Inventor
杨双全
张阳
杨晓亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810784909.9A priority Critical patent/CN110727740B/en
Publication of CN110727740A publication Critical patent/CN110727740A/en
Application granted granted Critical
Publication of CN110727740B publication Critical patent/CN110727740B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Abstract

The invention provides a correlation analysis method and device, computer equipment and a readable medium. The method comprises the following steps: according to the input attribute of the target entity, searching an initial point corresponding to the target entity in a pre-established distributed map storage structure; according to the input association condition, sequentially searching a plurality of association logic edges associated with an initial point and association nodes corresponding to the association logic edges in a distributed map storage structure by taking the initial point as a center; and constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge and the associated node corresponding to each associated logical edge. Compared with the prior art of manually performing the association analysis processing, the technical scheme of the invention can automatically perform the association analysis processing, thereby avoiding the time and labor consumption of manual analysis and effectively improving the accuracy and the processing efficiency of the association analysis.

Description

Association analysis method and device, computer equipment and readable medium
[ technical field ]
The present invention relates to the field of computer application technologies, and in particular, to a correlation analysis method and apparatus, a computer device, and a readable medium.
[ background of the invention ]
With the rapid development of various acquisition technologies and the continuous upgrading and large-scale application of acquisition equipment, massive data (including equipment acquisition data, internet data and various offline social data) is growing explosively.
Against this background of explosive data growth, it has become a difficult problem to perform large-scale association analysis on multi-source heterogeneous data, sort out the key associations related to a certain entity, and find event clues. Association analysis in the prior art is mainly applied to scenarios with a small data scale: event associations between entities are analyzed manually, an association network graph is constructed manually, and association clue information is sorted out by hand. The existing manual analysis technology can only perform association analysis with a depth of 2.
Based on the above, the existing association analysis technology performed manually can only be applied to scenarios with a small data scale. Against the background of explosive data growth, it is difficult to find accurate associated events in massive data manually, and manual analysis is time-consuming and labor-intensive, resulting in very low association analysis efficiency.
[ summary of the invention ]
The invention provides a correlation analysis method and device, computer equipment and a readable medium, which are used for improving the efficiency of correlation analysis.
The invention provides a correlation analysis method, which comprises the following steps:
according to the input attribute of a target entity, searching an initial point corresponding to the target entity in a pre-established distributed map storage structure;
according to input association conditions, sequentially retrieving a plurality of association logic edges associated with the initial point and association nodes corresponding to the association logic edges by taking the initial point as a center in the distributed graph storage structure;
and constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge and the associated node corresponding to each associated logical edge.
The present invention provides a correlation analysis apparatus, the apparatus comprising:
the retrieval module is used for retrieving an initial point corresponding to a target entity from a pre-established distributed map storage structure according to the input attribute of the target entity;
the retrieval module is further configured to sequentially retrieve, according to an input association condition, a plurality of association logic edges associated with the initial point and association nodes corresponding to the association logic edges, with the initial point as a center, in the distributed graph storage structure;
and the construction module is used for constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge and the associated node corresponding to each associated logical edge.
The present invention also provides a computer apparatus, the apparatus comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the association analysis method as described above.
The invention also provides a computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the association analysis method as described above.
According to the association analysis method and device, the computer equipment and the readable medium, the initial point corresponding to the target entity is searched in the pre-established distributed map storage structure according to the input attribute of the target entity; according to the input association condition, sequentially searching a plurality of association logic edges associated with an initial point and association nodes corresponding to the association logic edges in a distributed map storage structure by taking the initial point as a center; and constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge and the associated node corresponding to each associated logical edge. According to the technical scheme, accurate retrieval can be performed on the distributed map storage structure based on the input attributes and the association conditions of the target entities, and the clue paths and the corresponding sub-graphs of the events associated with the target entities are constructed. Compared with the prior art of manually performing the correlation analysis processing, the technical scheme of the invention can automatically perform the correlation analysis processing, avoid the time and labor consumption of manual analysis, and effectively improve the accuracy and the processing efficiency of the correlation analysis; the clue path and the corresponding sub-graph of the event associated with the target entity obtained by the invention can clearly and intuitively present the association analysis result, so that a user can clearly obtain the association analysis result, and the presentation effect of the association analysis result is very good.
[ description of the drawings ]
FIG. 1 is a flow chart of an embodiment of a correlation analysis method of the present invention.
Fig. 2 is a diagram of a pruning example of a distributed graph storage structure according to an embodiment of the present invention.
Fig. 3 is a configuration diagram of a first embodiment of the association analysis apparatus according to the present invention.
Fig. 4 is a configuration diagram of a second embodiment of the correlation analysis device according to the present invention.
FIG. 5 is a block diagram of an embodiment of a computer device of the present invention.
Fig. 6 is an exemplary diagram of a computer device provided by the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of an embodiment of a correlation analysis method of the present invention. As shown in fig. 1, the association analysis method of this embodiment may specifically include the following steps:
100. according to the input attribute of the target entity, searching an initial point corresponding to the target entity in a pre-established distributed map storage structure;
The execution subject of the association analysis method of this embodiment may be an association analysis apparatus. Taking the target entity as the initial point, the apparatus retrieves a plurality of associated nodes from the periphery of the target entity based on the association condition under analysis, and constructs a clue path of the event associated with the target entity and the corresponding subgraph based on the initial point and the retrieved nodes.
The pre-established distributed graph storage structure in this embodiment may include a plurality of nodes, and each node corresponds to one entity. The entity of the embodiment may specifically correspond to a person or an object. Each entity has its own attribute, and during association analysis, an initial point corresponding to a target entity can be retrieved from the established distributed graph storage structure according to the input attribute of the target entity.
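The attribute-based lookup of the initial point can be pictured with a minimal Python sketch. It assumes a small in-memory dictionary standing in for the distributed graph storage structure; the function name `find_initial_points`, the node ids, and the attribute names are hypothetical and are not taken from the patent.

```python
def find_initial_points(nodes, **query):
    """Return the ids of all nodes whose attributes match every queried value."""
    return [
        node_id
        for node_id, attributes in nodes.items()
        if all(attributes.get(key) == value for key, value in query.items())
    ]


# Hypothetical node table: node id -> attributes mined for that entity.
nodes = {
    "n1": {"type": "person", "name": "user A", "mailbox": "a@example.com"},
    "n2": {"type": "person", "name": "user B", "mailbox": "b@example.com"},
    "n3": {"type": "object", "license_plate": "Jing XXXX"},
}

initial_points = find_initial_points(nodes, type="person", name="user A")
```

A real distributed store would presumably answer such a query through an attribute index rather than a full scan; the scan is used here only to keep the sketch short.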
In a practical application scenario, there may be some logical connection between entities. For example, if a person corresponding to one entity uses an object corresponding to another entity in some scenario, there may be a logical connection between the node corresponding to the person and the node corresponding to the object in the distributed graph storage structure. For another example, a person corresponding to a first entity may have given a piece of information to a person corresponding to a second entity in some scenario; in the distributed graph storage structure, there may then be a logical connection between the node corresponding to the person of the first entity and the node corresponding to the person of the second entity. A plurality of logical connections may exist between the pair of nodes corresponding to two entities, so that one physical edge between the two entities may carry a plurality of logical connections. For example, before step 100 of this embodiment, establishing the distributed graph storage structure may specifically include the following steps:
(a1) collecting source data;
(b1) mining a plurality of entities and attributes of the entities from source data;
(c1) respectively taking a plurality of entities as nodes, storing the nodes in a distributed graph storage structure, and recording the attribute of each node;
(d1) mining the logical association relationship between each node pair according to the source data, establishing logical edges between the node pairs based on the logical association relationship between the node pairs, and storing the logical attributes of the corresponding logical edges;
(e1) in the distributed atlas storage structure, physical edges are set for node pairs with logical association relation, so that at least one logical edge can be carried on one unique physical edge between the same node pair.
In this embodiment, the collected source data may be multi-source heterogeneous data: the massive collected data comes from different sources, and its format is not standardized and may include various formats. In this embodiment, the multi-source heterogeneous data can be uniformly sorted so that the data becomes homogeneous. A plurality of entities and the attributes of each entity may then be mined from the source data. The entity in this embodiment may be a person or an actually existing object. The attributes of an entity in this embodiment may be all the characteristic data of the entity mined from the source data. Then, the plurality of entities are respectively used as nodes and stored in the distributed graph storage structure, and the attributes of the corresponding entity are recorded in the node corresponding to each entity. Further, the logical association relationships between the node pairs corresponding to different entities may also be mined from the source data. Based on the logical association relationship between a node pair, a logical edge may be established between the node pair in the distributed graph storage structure, and the logical attribute of the corresponding logical edge may be stored. The logical attribute of a logical edge in this embodiment may be obtained from the logical association relationship between the nodes. Finally, in the distributed graph storage structure, a physical edge is set between each node pair having a logical association relationship, so that at least one logical edge can be carried on the single physical edge between the same node pair.
In the distributed graph storage structure established based on the above manner, more than one logical edge may exist between two nodes, and each logical edge corresponds to one logical association relationship, so that if one logical edge is connected for each logical association relationship between two nodes, when there are many logical association relationships between two nodes, the logical edge connection between two nodes is very disordered. Therefore, in this embodiment, in the distributed graph storage structure, if a logical association relationship exists between two nodes, only one physical edge may be connected, and meanwhile, multiple logical edges may be carried on the physical edge, and each logical edge may be represented by one logical attribute. That is, a plurality of logical attributes may be set on the same physical edge, and each logical attribute corresponds to a logical association relationship between two nodes.
For example, the collected source data may be data related to a plurality of events, such as: user A drove a Benz car with the license plate number Jing XXXX through intersection Y at a certain time; user A called user B at a certain time; the owner of the Benz car Jing XXXX is user C; the Benz car Jing XXXX was borrowed by user A from user C; and so on. After the source data is homogenized, four entities can be mined from it: user A, user B, user C, and the Benz car. Furthermore, the attributes of user A, user B and user C can be mined respectively, such as contact address, mailbox, identity card number and the like. The attributes of the Benz car may include the license plate number, the corresponding owner information, and the like. Correspondingly, the entities user A, user B, user C and the Benz car can each be used as a node in the distributed graph storage structure, and the attributes of each entity are recorded in its node. Further, there is a logical association between the nodes of user A and user B, namely a call at a certain time; therefore, the physical edge between the nodes of user A and user B has the logical attribute "call at a certain time". There is a driving relationship between the nodes of user A and the Benz car, together with passing intersection Y at a certain time, so the physical edge between the nodes of user A and the Benz car has the logical attribute "driving relationship, passed intersection Y at a certain time". By analogy, the logical attributes between any two nodes having a logical association relationship can be mined, and each mined logical attribute is recorded on the physical edge between the corresponding two nodes.
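The storage layout described above, in which a single physical edge per node pair carries one logical attribute per logical edge, might be sketched as follows. This is a single-process illustration under stated assumptions, not the patent's distributed implementation: the class `GraphStore`, its method names, and the use of an unordered node pair as the physical-edge key are all hypothetical, and edge direction is ignored for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entity (a person or an object) with its mined attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


class GraphStore:
    """In-memory stand-in for the distributed graph storage structure."""

    def __init__(self):
        self.nodes = {}
        # One physical edge per unordered node pair; its value is the list
        # of logical attributes, one entry per logical edge it carries.
        self.physical_edges = defaultdict(list)

    def add_node(self, name, **attributes):
        self.nodes[name] = Node(name, attributes)

    def add_logical_edge(self, u, v, logical_attribute):
        # frozenset gives the same key for (u, v) and (v, u).
        self.physical_edges[frozenset((u, v))].append(logical_attribute)

    def logical_edges(self, u, v):
        return self.physical_edges.get(frozenset((u, v)), [])


store = GraphStore()
for name in ("user A", "user B", "user C"):
    store.add_node(name)
store.add_node("car", license_plate="Jing XXXX")
store.add_logical_edge("user A", "user B", "call at a certain time")
store.add_logical_edge("user A", "car", "driving, passed intersection Y")
store.add_logical_edge("user A", "car", "borrowed from user C")
store.add_logical_edge("user C", "car", "owner")
```

Here the pair (user A, car) has two logical edges but still only one physical edge, which is the point of the layout: the drawn graph stays uncluttered while every logical association is preserved.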
According to the description of the above embodiment, in the process of creating the distributed graph storage structure of this embodiment, the data is divided by data type into multi-dimensional attribute node data (multi-dimensional attribute Node) and multi-dimensional attribute edge data (multi-dimensional attribute Edge). The multi-dimensional attribute node data corresponds to attribute data, in multiple dimensions, of the node of an entity and is stored on the node. The multi-dimensional attribute edge data corresponds to attribute data of the edges between nodes; because two nodes can be connected by a plurality of logical edges with a plurality of corresponding logical attributes, the two nodes can have attribute edge data of a plurality of dimensions. Further optionally, in this embodiment, after the distributed graph storage structure is created, the following step may also be performed: setting a weight for each logical edge according to a preset weight setting rule and the logical attribute of each logical edge.
When the distributed graph storage structure is created, the importance of each created edge is not distinguished: every edge is treated the same and has the same weight. In practical applications, the logical attributes of the logical edges differ, and different logical attributes contribute differently to the user's analysis of event associations. Therefore, in order to set different weights for different logical edges, in this embodiment a weight may be set for each logical edge according to a preset weight setting rule and the logical attribute of that logical edge. The weight setting rule of this embodiment may be defined according to the field in which the distributed graph is used. For example, in one field, association analysis pays more attention to call contact between nodes, so a logical edge whose logical attribute corresponds to call contact can be given a higher weight; in another field, association analysis pays more attention to transaction contact between nodes, so a logical edge whose logical attribute corresponds to transaction contact can be given a higher weight. The weight setting rule of each field may contain a plurality of weight setting policies, for example: first-class attribute relationships receive the highest weight, second-class attribute relationships receive the next-highest weight, a further class of attribute relationships receives the lowest weight, and so on. In this way, weights can be set for the logical edges between nodes in the distributed graph storage structure in any field.
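A field-specific weight setting rule of this kind could look like the sketch below. The rule table, the attribute classes ("call", "transaction", "message"), the numeric weights, and the default weight are all made up for illustration; the patent does not specify any of them.

```python
# Hypothetical weight-setting rule for one field: each class of logical
# attribute maps to a weight; unknown classes fall back to a low default.
WEIGHT_RULE = {"call": 1.0, "transaction": 0.6, "message": 0.3}
DEFAULT_WEIGHT = 0.1


def assign_weights(logical_edges, rule=WEIGHT_RULE, default=DEFAULT_WEIGHT):
    """Return copies of the logical edges with a 'weight' chosen by attribute class."""
    return [dict(edge, weight=rule.get(edge["kind"], default)) for edge in logical_edges]


edges = [
    {"source": "user A", "target": "user B", "kind": "call"},
    {"source": "user A", "target": "user C", "kind": "transaction"},
    {"source": "user B", "target": "user C", "kind": "meeting"},
]
weighted = assign_weights(edges)
```

Keeping the rule in a plain table means a different field only needs a different table, for example one that ranks "transaction" above "call".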
Further optionally, in this embodiment, after the distributed graph storage structure is created, the following steps may also be included:
(a1) calculating the association degree of a logical edge between each node pair with a logical association relation in the distributed graph storage structure;
(b1) and pruning the logic edges with the association degrees smaller than the association degree threshold value in the distributed map storage structure according to the association degree of each logic edge and a preset association degree threshold value so as to revise the distributed map storage structure again.
Specifically, in this embodiment, the association degree of the two nodes in a node pair having a logical association relationship may be analyzed according to the attributes of the two nodes and the logical attributes of the logical edge. If the logical attribute of the logical edge matches the attributes of the two nodes well, the association degree of the logical edge between the two nodes is higher; conversely, if the match is poor, the association degree is lower. For example, in the distributed graph storage structure, the association degree between two nodes corresponding to two people who are constantly in contact by telephone should exceed that between two nodes corresponding to two people who only occasionally send each other greeting messages. For another example, in a police scenario, user A, who is a business owner, often drives a Benz car when going to work. From the attributes of user A, such as being a business owner with a certain net worth, combined with the attributes of the Benz car, such as the vehicle value, and with reference to the logical relationship that user A drives the Benz car, it can be determined that the association degree between user A and the Benz car is high. By contrast, employee B, who often rides an electric bicycle to work, may occasionally ride in a Benz car; in a similar manner, based on the attributes of user B, the attributes of the Benz car, and the logical relationship, the association degree between employee B and the Benz car is low. Based on the above manner, the association degree of the logical edge between each node pair having a logical association relationship in the distributed graph storage structure can be calculated.
In practical applications, the association degree between some node pairs is very low and contributes very little to the association analysis; such node pairs can be regarded as noise nodes or noise subgraphs in the distributed graph storage structure. In this case, it can be determined whether the association degree of each logical edge is smaller than the association degree threshold. If it is, the logical edge can be deleted, so that the part of the subgraph connected through that logical edge is pruned, and the logical attribute corresponding to the logical edge is deleted at the same time, thereby revising the distributed graph storage structure.
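The threshold-based pruning can be sketched as below. The sketch makes two assumptions that go beyond the text: association degrees are given as precomputed numbers, and "pruning the connected part of the subgraph" is interpreted as keeping only what remains reachable from a chosen core node. The function name `prune` and the sample degrees are hypothetical.

```python
from collections import defaultdict, deque


def prune(edges, threshold, core):
    """edges: iterable of (u, v, association_degree).

    Drops edges whose degree is below the threshold, then keeps only the
    nodes and edges still reachable from the core node.
    """
    strong = [(u, v, d) for u, v, d in edges if d >= threshold]
    adjacency = defaultdict(set)
    for u, v, _ in strong:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Breadth-first search from the core node over the surviving edges.
    reachable = {core}
    queue = deque([core])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in reachable:
                reachable.add(neighbour)
                queue.append(neighbour)
    kept = [(u, v, d) for u, v, d in strong if u in reachable and v in reachable]
    return kept, reachable


# Made-up association degrees; node 4 plays the role of the core node.
edges = [(4, 1, 0.9), (4, 2, 0.8), (2, 3, 0.7), (4, 5, 0.1), (5, 8, 0.6)]
kept_edges, kept_nodes = prune(edges, threshold=0.5, core=4)
```

In this toy input the weak edge between nodes 4 and 5 is removed, so nodes 5 and 8 are cut off from the core even though the edge between them is itself above the threshold.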
Fig. 2 is a diagram of a pruning example of a distributed graph storage structure according to an embodiment of the present invention. As shown in Fig. 2, before pruning, the graph of this embodiment includes 9 nodes, from node 1 and node 2 through node 9. In this embodiment, the size of a node represents its association degree with the core node. Node 4 is the core node, so it is the largest in the whole subgraph, and the size of each other node indicates how close its relationship with the core node is. The association degree of node 5 and node 8 is lower than the association degree threshold, and correspondingly, as shown in Fig. 2, the corresponding subgraph is pruned.
101. According to the input association condition, sequentially searching a plurality of association logic edges associated with an initial point and association nodes corresponding to the association logic edges in a distributed map storage structure by taking the initial point as a center;
In this embodiment, association analysis requires not only determining the initial point to be analyzed but also specifying an association condition, so that effective association analysis can be performed. The association condition of this embodiment is an input condition for requesting association analysis, and may include one condition or a combination of multiple conditions. For example, the association condition may be: all people and objects that were met or contacted by instant messaging during month X of year X. The plurality of associated logical edges of this embodiment may include an associated logical edge directly associated with the initial point, and may further include an associated logical edge indirectly associated with the initial point through another associated logical edge.
The association condition of the present embodiment may include the following two cases:
In the first case, the association condition includes only the logical attributes of the associated edge. The corresponding search is then an ordinary, unweighted search. In this case, step 101 may specifically include the following steps:
(a2) in the distributed map storage structure, an initial point is used as a central node, a related logic edge related to the initial point is searched from the periphery of the initial point according to the logic attribute of the related edge, and a node corresponding to the related logic edge is obtained and used as a related node;
for example, the association index between each logical edge and the initial point may be calculated according to the logical attributes of the associated edge, the logical attributes of each logical edge around the initial point, and a preset expert rule; and acquiring the logic edge with the largest correlation index from a plurality of logic edges around the initial point as the correlation logic edge correlated with the initial point.
For example, the association index may be expressed using the following formula:
$$\text{association index} = a \times \text{association degree of the edge} + b \times \text{expert rule score} + c \times \text{attribute association degree of the point}$$
where a, b and c are parameter factors with a + b + c = 1; that is, the values of a, b and c are all between 0 and 1, and each represents the importance of the corresponding feature.
The association degree of the edge is the association degree between the logical attribute of the associated edge in the association condition and the logical attribute of the current logical edge. The preset expert rules may contain a plurality of scoring rules set by experts, and the resulting scores are normalized to a value of 1-100 to serve as the expert rule score. For example, an expert may set a rule that a person who stays in during the day and goes out at night has a 90% likelihood of being a criminal; whether a logical edge hits the rule is determined from the rule and the logical attribute of the logical edge, and if it does, the logical edge is scored 90 according to the rule. In addition, the attribute association degree of the point is the association degree between the initial point and the node to be determined through the logical edge; in this way the relationship is propagated to a certain degree and the association degree attenuates, which matches real-world intuition. The attribute association degree of the point can be calculated as the degree of correlation between the two points according to their attributes. In practical applications, only the first two terms may be considered when calculating the association index, in which case c takes the value 0.
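As a worked illustration of the weighted sum, the following sketch computes the association index from its three components. It assumes all three components are already expressed on the same scale as the expert rule score; the function name `association_index`, the default factors, and the sample inputs are hypothetical.

```python
def association_index(edge_degree, expert_score, point_degree=0.0, a=0.5, b=0.5, c=0.0):
    """Weighted sum a*edge_degree + b*expert_score + c*point_degree, with a + b + c = 1."""
    if min(a, b, c) < 0 or abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("a, b and c must be non-negative and sum to 1")
    return a * edge_degree + b * expert_score + c * point_degree


# Only the first two terms are used here (c = 0), as the text allows.
index = association_index(edge_degree=80, expert_score=90)
```

With a = b = 0.5 and c = 0, an edge association degree of 80 and an expert rule score of 90 give an index of 85.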
After the correlation indexes of the initial points and the logic edges around the initial points are calculated according to the mode, acquiring the logic edge with the maximum correlation index from a plurality of logic edges around the initial points as the correlation logic edge correlated with the initial points; and further, the node corresponding to the associated logical edge can be obtained as the associated node.
(b2) In the distributed graph storage structure, the determined associated node is used as a central node, the associated logical edge associated with the associated node and the associated node corresponding to the retrieved associated logical edge are retrieved from the periphery of the associated node according to the logical attribute of the associated edge, and the associated node corresponding to the retrieved associated logical edge is not repeated with the determined associated node;
according to the similar processing mode of the step (a2), taking the determined associated node as a central node, calculating the associated index of each associated logical edge around the associated node and the central node, and acquiring the logical edge with the maximum associated index as the associated logical edge associated with the central node; and further, the node corresponding to the associated logical edge can be obtained as the associated node. The relevant logical edges around the central node include the relevant logical edge with the central node as a starting point, and may also include the relevant logical edge with the central node as an end point. In addition, in this embodiment, each time the newly determined associated node is retrieved, the newly determined associated node cannot be duplicated by the already confirmed associated node.
In this embodiment, a candidate set of edges and a candidate set of points may also be established. After each retrieval is completed, the associated logical edge retrieved this time is added to the candidate set of edges, and the associated node retrieved this time is added to the candidate set of points. The associated logical edges and the associated nodes are stored in the candidate set of edges and the candidate set of points in the order in which they were added.
(c2) And re-executing the previous step until the retrieval termination condition is reached, and stopping the retrieval to sequentially obtain a plurality of associated logical edges associated with the initial point and associated nodes corresponding to the associated logical edges.
The termination condition of the present embodiment may be the following three cases:
(1) the depth of the completed retrieval reaches a preset depth threshold;
In this embodiment, the initial point is used as the center node of the search, and the retrieval depth when the first associated logical edge is retrieved is 1. The node other than the initial point on the first associated logical edge is then used as the center node, and the search continues to obtain a second associated logical edge, at which point the retrieval depth is 2. By analogy, deeper retrieval can be performed step by step. In this embodiment, a preset depth threshold, such as 5 degrees, 10 degrees, or another value, may be set, so that retrieval is terminated when the retrieval depth reaches the preset depth threshold; at this time, the number of retrieved associated logical edges and the number of corresponding associated nodes are both equal to the preset depth threshold.
(2) In the current retrieval, the maximum value of the association indexes corresponding to all logical edges around the central node is smaller than a preset association index threshold;
The termination condition of this embodiment may also be controlled by a preset association index threshold. As retrieval goes deeper and moves farther from the initial point, the association indexes of the logical edges around the current central node theoretically keep decreasing. Likewise, if the central node selected by the user is relatively isolated, or its association indexes with other nodes are already below the preset association index threshold, the degree of association between the central node and other nodes is too low for further retrieval to be worthwhile, so the retrieval terminates.
(3) In the distributed graph storage structure, there is no unselected associated node around the most recently retrieved associated node.
For example, after each associated node is retrieved, it may be compared with the associated nodes in the point candidate set to ensure that each retrieved associated node has not been retrieved before. If, during retrieval, no unselected associated node is found around the most recently retrieved associated node, retrieval can be exited at that point.
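The step-by-step retrieval and its three termination conditions can be illustrated with a minimal Python sketch. It is not the implementation of this embodiment: the in-memory dictionary standing in for the distributed graph storage structure, the function name `retrieve_chain`, and the scoring callback `index_fn` (which stands in for the preset expert rule) are all assumptions made for illustration. As a simplification, condition (2) is checked against the best edge leading to an unselected node.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the distributed graph storage structure:
# each node maps to a list of (neighbor, logical_attribute) pairs.
Graph = Dict[str, List[Tuple[str, str]]]


def retrieve_chain(
    graph: Graph,
    initial_point: str,
    index_fn: Callable[[str], float],
    depth_threshold: int,
    index_threshold: float,
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Greedily follow the logical edge with the largest association index."""
    edge_candidates: List[Tuple[str, str, str]] = []  # (from, attribute, to)
    point_candidates: List[str] = []
    visited = {initial_point}
    center = initial_point
    # Condition (1): stop once the retrieval depth reaches the depth threshold.
    while len(edge_candidates) < depth_threshold:
        unselected = [
            (nbr, attr) for nbr, attr in graph.get(center, []) if nbr not in visited
        ]
        # Condition (3): no unselected associated node around the central node.
        if not unselected:
            break
        nbr, attr = max(unselected, key=lambda pair: index_fn(pair[1]))
        # Condition (2): the largest association index is below the threshold.
        if index_fn(attr) < index_threshold:
            break
        edge_candidates.append((center, attr, nbr))
        point_candidates.append(nbr)
        visited.add(nbr)
        center = nbr
    return edge_candidates, point_candidates


# Illustrative data only; the node names and scores are invented.
graph: Graph = {
    "suspect": [("car", "owns"), ("phone", "uses")],
    "car": [("suspect", "owns"), ("garage", "parked_at")],
    "phone": [("suspect", "uses")],
    "garage": [("car", "parked_at")],
}
scores = {"owns": 0.9, "uses": 0.6, "parked_at": 0.5}
edges, points = retrieve_chain(graph, "suspect", lambda a: scores[a], 5, 0.4)
print(points)
```

In this example the retrieval stops because of condition (3): after "garage" is reached, its only neighbor has already been selected.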
In the second case, the association condition includes a weight threshold of the associated edge in addition to the logical attribute of the associated edge. The corresponding search is now a weighted association search.
At this time, the corresponding step 101 may be specifically implemented as follows:
(a3) in the distributed graph storage structure, an initial point is used as a central node, and according to the logic attribute of the associated edge and the weight of the associated edge, the associated logical edge which is associated with the initial point and has the weight larger than the weight threshold of the associated edge is searched from the periphery of the initial point, and a node corresponding to the associated logical edge is obtained and used as an associated node.
(b3) In the distributed graph storage structure, the determined associated nodes are used as central nodes, and associated logical edges which are associated with the associated nodes and have weights larger than the weight threshold of the associated edges and associated nodes corresponding to the currently retrieved associated logical edges are retrieved from the periphery of the associated nodes according to the logical attributes of the associated edges and the weight threshold of the associated edges;
(c3) repeating the previous step until the retrieval termination condition is reached and then stopping the retrieval, so as to sequentially obtain a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges.
Specifically, when step (a3) is implemented, the logical edges around the central node may first be screened using the weight threshold of the associated edge so that only logical edges whose weights are greater than the threshold remain; the associated logical edge associated with the initial point is then retrieved in the manner of step (a2) of the foregoing embodiment, and the node corresponding to that associated logical edge is obtained as the associated node. Similarly, when step (b3) is implemented, the logical edges around the central node may first be screened using the weight threshold of the associated edge, the associated logical edge associated with the associated node is then retrieved in the manner of step (b2) of the foregoing embodiment, and the node corresponding to that associated logical edge is obtained as the associated node. Then, according to step (c3), step (b3) is repeated until the retrieval termination condition is reached and the retrieval stops. Step (c3) can be implemented with reference to step (c2) and is not described again here.
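The screening that precedes steps (a3) and (b3) can be sketched as follows. The `LogicalEdge` record and the function name `screen_by_weight` are hypothetical; only the rule itself, keeping logical edges whose weight is strictly greater than the weight threshold, comes from the description above.

```python
from typing import List, NamedTuple


class LogicalEdge(NamedTuple):
    """Hypothetical record for one logical edge around a central node."""
    source: str
    target: str
    attribute: str
    weight: float


def screen_by_weight(
    edges: List[LogicalEdge], weight_threshold: float
) -> List[LogicalEdge]:
    """Keep only logical edges whose weight is greater than the threshold."""
    return [edge for edge in edges if edge.weight > weight_threshold]


# Illustrative data only.
around_center = [
    LogicalEdge("A", "B", "colleague", 0.8),
    LogicalEdge("A", "C", "neighbor", 0.3),
    LogicalEdge("A", "D", "relative", 0.5),
]
kept = screen_by_weight(around_center, 0.5)
print([edge.target for edge in kept])
```

Only the screened edges would then be passed to the association index calculation of step (a2) or (b2); an edge whose weight merely equals the threshold is dropped.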
The above retrieval of this embodiment may be combined with the BFS algorithm, the DFS algorithm, the Dijkstra algorithm, and the like for different scenarios, so that breadth search and depth search can be performed quickly for the initial point and the association condition.
102. Constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge, and the associated node corresponding to each associated logical edge.
According to the processing of the above embodiment, after the initial point, each associated logical edge, and the associated node corresponding to each associated logical edge are obtained, the clue path and the corresponding subgraph of the event associated with the target entity can be constructed from this information. Specifically, with the initial point as the starting point, the associated nodes are connected one after another through their corresponding associated logical edges, in order of the associated logical edges from nearest to farthest from the initial point, to construct the clue path and the corresponding subgraph of the event associated with the target entity.
Alternatively, if an edge candidate set and a point candidate set were established during retrieval, the initial point can be taken directly as the starting point: the first-added associated logical edge is obtained from the edge candidate set and the first-added associated node is obtained from the point candidate set, according to the order of addition, and the initial point is connected to that associated node through that associated logical edge. Next, the next associated logical edge is obtained from the edge candidate set and the next associated node from the point candidate set, again in order of addition, and the next associated node is connected to the current associated node through the next associated logical edge. This continues until the last-added associated logical edge in the edge candidate set and the last-added associated node in the point candidate set have been obtained and the last associated node has been connected to its preceding associated node by the last associated logical edge, at which point construction is complete. All the associated logical edges between the node of the target entity and the last associated node constitute the clue path of the associated event, and the initial point together with the associated nodes connected by the associated logical edges constitutes the subgraph of the associated event.
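The candidate-set construction described above can be sketched in a few lines of Python. The function name `build_clue_path` and the representation of the two candidate sets as ordered lists are assumptions for illustration; the sketch simply pairs the i-th edge with the i-th node in order of addition.

```python
from typing import List, Tuple


def build_clue_path(
    initial_point: str,
    edge_candidates: List[str],
    point_candidates: List[str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Connect nodes in order of addition, starting from the initial point."""
    if len(edge_candidates) != len(point_candidates):
        raise ValueError("each associated edge needs exactly one associated node")
    clue_path: List[Tuple[str, str, str]] = []  # (previous node, edge, next node)
    subgraph_nodes = [initial_point]
    current = initial_point
    for edge, node in zip(edge_candidates, point_candidates):
        clue_path.append((current, edge, node))
        subgraph_nodes.append(node)
        current = node  # the node just connected becomes the next starting node
    return clue_path, subgraph_nodes


# Illustrative data only.
path, nodes = build_clue_path("A", ["e1", "e2"], ["B", "C"])
print(path)
print(nodes)
```

Ordered lists are used here because the description relies on the order of addition; any container that preserves insertion order would serve equally well.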
In addition, in practical applications, in the distributed graph storage structure, if there is a logical edge between node 1 and node 2, a logical edge between node 2 and node 3, and a logical edge between node 3 and node 4, virtual logical edges may be constructed between node 1 and node 3 and between node 1 and node 4 to indicate that node 3 or node 4 can ultimately be retrieved from node 1. By establishing virtual logical edges, the target node can be found quickly among all nodes of the distributed graph storage structure through flattened retrieval, which saves retrieval time and improves retrieval efficiency. However, when the event clue path is constructed from the retrieval results, the virtual logical edges used for flattened retrieval are not used.
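A minimal sketch of the virtual logical edge idea, under the assumption that a chain of nodes connected by real logical edges is already known: a virtual edge is added from the first node to every node two or more hops away. The function name `virtual_logical_edges` is hypothetical.

```python
from typing import List, Tuple


def virtual_logical_edges(chain: List[str]) -> List[Tuple[str, str]]:
    """Return virtual edges from the first node to each node two or more hops away."""
    return [(chain[0], node) for node in chain[2:]]


# Node 1 to node 4 are linked by real logical edges 1-2, 2-3 and 3-4.
shortcuts = virtual_logical_edges(["node1", "node2", "node3", "node4"])
print(shortcuts)
```

The adjacent pair (node 1, node 2) is skipped because a real logical edge already connects it.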
The association analysis method of this embodiment can be applied to big data scenarios in which data chains are long and manual analysis is impractical. For example, in a policing scenario, the technical solution of this embodiment can be used to retrieve all persons and objects related to a target suspect, so that an accurate event clue path is derived and a corresponding event association subgraph is obtained, which is intuitive and clear.
When the association analysis method of this embodiment is used, the attribute of the target entity and the association condition serve as the input conditions of the association analysis; depth search and breadth search are then performed in the established distributed graph storage structure according to these input conditions, and the clue path and the corresponding subgraph of the event associated with the target entity are constructed.
In the association analysis method of this embodiment, an initial point corresponding to the target entity is retrieved from a pre-established distributed graph storage structure according to the input attribute of the target entity; according to the input association condition, a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges are sequentially retrieved in the distributed graph storage structure with the initial point as the center; and a clue path and a corresponding subgraph of the event associated with the target entity are constructed according to the initial point, each associated logical edge, and the associated node corresponding to each associated logical edge. With the technical solution of this embodiment, accurate retrieval can be performed on the distributed graph storage structure based on the input attribute of the target entity and the association condition, and the clue path and the corresponding subgraph of the event associated with the target entity can be constructed. Compared with the prior art, in which association analysis is performed manually, the technical solution of this embodiment performs association analysis automatically, avoids the time and labor consumed by manual analysis, and effectively improves the accuracy and efficiency of association analysis. In addition, the clue path and the corresponding subgraph obtained in this embodiment present the association analysis result clearly and intuitively, so that the user can readily understand it.
Fig. 3 is a configuration diagram of a first embodiment of the association analysis apparatus according to the present invention. As shown in fig. 3, the association analysis apparatus of this embodiment may specifically include:
the retrieval module 10 is configured to retrieve an initial point corresponding to a target entity from a pre-established distributed graph storage structure according to an input attribute of the target entity;
the retrieval module 10 is further configured to sequentially retrieve, according to the input association condition, a plurality of association logic edges associated with the initial point and association nodes corresponding to the association logic edges, with the initial point as a center, in the distributed graph storage structure;
the construction module 11 is configured to construct a clue path and a corresponding subgraph of an event associated with the target entity according to the initial point, each associated logical edge, and the associated node corresponding to each associated logical edge retrieved by the retrieval module 10.
The implementation principle and technical effect of the association analysis implemented by the modules in the association analysis device of this embodiment are the same as those of the related method embodiment, and reference may be made to the description of the related method embodiment in detail, which is not described herein again.
Fig. 4 is a configuration diagram of a second embodiment of the association analysis apparatus according to the present invention. As shown in fig. 4, the association analysis apparatus of this embodiment describes the technical solution of the present invention in further detail on the basis of the technical solution of the embodiment shown in fig. 3.
In the association analysis apparatus of this embodiment, the construction module 11 is specifically configured to:
take the initial point as the starting point and, in order of the associated logical edges from nearest to farthest from the initial point, connect the initial point and the associated nodes corresponding to the associated logical edges through the corresponding associated logical edges, so as to construct the clue path and the corresponding subgraph of the event associated with the target entity.
Further optionally, in the association analysis apparatus of this embodiment, the association condition includes a logical attribute of the association edge; the retrieval module 10 is specifically configured to:
in the distributed graph storage structure, an initial point is used as a central node, an associated logical edge associated with the initial point is retrieved from the periphery of the initial point according to the logical attribute of the associated edge, and a node corresponding to the associated logical edge is obtained and used as an associated node;
in the distributed graph storage structure, the determined associated node is used as a central node, the associated logical edge associated with the associated node and the associated node corresponding to the retrieved associated logical edge are retrieved from the periphery of the associated node according to the logical attribute of the associated edge, and the associated node corresponding to the retrieved associated logical edge is not repeated with the determined associated node;
and re-executing the previous step until the retrieval termination condition is reached, and stopping the retrieval to sequentially obtain a plurality of associated logical edges associated with the initial point and associated nodes corresponding to the associated logical edges.
Further optionally, in the association analysis apparatus of this embodiment, the retrieval module 10 is specifically configured to:
calculating the association index of each logic edge and the initial point according to the logic attribute of the association edge, the logic attribute of each logic edge around the initial point and a preset expert rule;
and acquiring the logical edge with the largest association index from a plurality of logical edges around the initial point as the associated logical edge associated with the initial point.
Further optionally, in the association analysis apparatus of this embodiment, the termination condition includes that the depth of the completed search reaches a preset depth threshold, a maximum value of association indexes corresponding to each logical edge around the central node in the current search is smaller than a preset association index threshold, or no unselected association node exists around the association node that is searched out last in the distributed graph storage structure.
Further optionally, in the association analysis apparatus of this embodiment, the association condition further includes a weight threshold of the association edge, and at this time, the corresponding retrieval module 10 is specifically configured to:
in the distributed graph storage structure, an initial point is used as a central node, and according to the logic attribute of an associated edge and the weight of the associated edge, the associated logical edge which is associated with the initial point and has the weight larger than the weight threshold of the associated edge is searched from the periphery of the initial point, and a node corresponding to the associated logical edge is obtained and used as an associated node;
in the distributed graph storage structure, the determined associated nodes are used as central nodes, and associated logical edges which are associated with the associated nodes and have weights larger than the weight threshold of the associated edges and associated nodes corresponding to the currently retrieved associated logical edges are retrieved from the periphery of the associated nodes according to the logical attributes of the associated edges and the weight threshold of the associated edges;
and re-executing the previous step until the retrieval termination condition is reached, and stopping the retrieval to sequentially obtain a plurality of associated logical edges associated with the initial point and associated nodes corresponding to the associated logical edges.
Further optionally, as shown in fig. 4, the association analysis apparatus of this embodiment further includes:
the acquisition module 12 is used for acquiring source data;
the mining module 13 is used for mining a plurality of entities and attributes of the entities from the source data collected by the collecting module 12;
the structure creating module 14 is configured to store, as nodes, the entities mined by the mining module 13 in the distributed graph storage structure, and record attributes of the nodes in the distributed graph storage structure;
the mining module 13 is further configured to mine a logical association relationship between each node pair according to the source data acquired by the acquisition module 12;
the structure creating module 14 is further configured to establish a logical edge between the node pairs based on the logical association relationship between the node pairs mined by the mining module 13, and store a logical attribute of the corresponding logical edge in the created distributed graph storage structure;
the structure creating module 14 is further configured to set a physical edge in the distributed graph storage structure for each node pair that is mined by the mining module 13 and has a logical association relationship, so that at least one logical edge can be carried on the single physical edge between the same node pair. In this way, the structure creating module 14 creates the distributed graph storage structure for the retrieval module 10 to retrieve.
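The storage layout built by the structure creating module 14, in which one physical edge per node pair carries one or more logical edges, can be sketched as a small single-process Python class. The class name `GraphStore`, its method names, and the choice to key physical edges by an unordered node pair are assumptions for illustration; distribution across machines is not modeled.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


class GraphStore:
    """Hypothetical single-process stand-in for the distributed graph storage structure."""

    def __init__(self) -> None:
        self.node_attributes: Dict[str, dict] = {}
        # One physical edge per unordered node pair; each physical edge carries
        # a list of logical edges stored as (start, end, logical_attribute).
        self.physical_edges: Dict[FrozenSet[str], List[Tuple[str, str, str]]] = (
            defaultdict(list)
        )

    def add_node(self, name: str, **attributes: object) -> None:
        """Store an entity as a node and record its attributes."""
        self.node_attributes[name] = dict(attributes)

    def add_logical_edge(self, start: str, end: str, logical_attribute: str) -> None:
        """Attach a logical edge to the physical edge of the node pair."""
        for node in (start, end):
            if node not in self.node_attributes:
                raise KeyError(f"unknown node: {node}")
        self.physical_edges[frozenset((start, end))].append(
            (start, end, logical_attribute)
        )

    def logical_edges(self, a: str, b: str) -> List[Tuple[str, str, str]]:
        """Return all logical edges carried on the physical edge between a and b."""
        return list(self.physical_edges.get(frozenset((a, b)), []))


# Illustrative data only.
store = GraphStore()
store.add_node("Zhang", kind="person")
store.add_node("Li", kind="person")
store.add_logical_edge("Zhang", "Li", "colleague")
store.add_logical_edge("Li", "Zhang", "transferred_money_to")
print(len(store.physical_edges), store.logical_edges("Zhang", "Li"))
```

Keeping the start and end of each logical edge inside the record preserves direction even though the physical edge itself is shared by both directions.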
Further optionally, the association analysis apparatus of this embodiment further includes a setting module 15, which is configured to set a weight for each logical edge in the distributed graph storage structure created by the structure creating module 14 according to a preset weight setting rule and the logical attribute of each logical edge.
Further optionally, as shown in fig. 4, the association analysis apparatus of this embodiment further includes:
the calculation module 16 is configured to calculate an association degree of a logical edge between each node pair having a logical association relationship in the distributed graph storage structure created by the structure creation module 14;
the pruning processing module 17 is configured to prune, according to the association degree of each logical edge calculated by the calculation module 16 and a preset association degree threshold, the logical edges whose association degree is smaller than the association degree threshold from the distributed graph storage structure created by the structure creation module 14, so as to revise the distributed graph storage structure.
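The pruning performed by the pruning processing module 17 can be sketched as below. How the calculation module 16 computes the association degree is not modeled; the sketch assumes the degrees are already available in a dictionary keyed by a hypothetical (start, end, logical_attribute) tuple.

```python
from typing import Dict, Tuple

EdgeKey = Tuple[str, str, str]  # hypothetical (start, end, logical_attribute)


def prune_logical_edges(
    association_degree: Dict[EdgeKey, float], degree_threshold: float
) -> Dict[EdgeKey, float]:
    """Drop logical edges whose association degree is smaller than the threshold."""
    return {
        edge: degree
        for edge, degree in association_degree.items()
        if degree >= degree_threshold
    }


# Illustrative data only.
degrees = {
    ("A", "B", "colleague"): 0.7,
    ("A", "C", "same_city"): 0.1,
    ("B", "C", "relative"): 0.4,
}
remaining = prune_logical_edges(degrees, 0.4)
print(sorted(remaining))
```

An edge whose association degree equals the threshold is kept, since only edges strictly below the threshold are pruned.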
The implementation principle and technical effect of the association analysis implemented by the modules in the association analysis device of this embodiment are the same as those of the related method embodiment, and reference may be made to the description of the related method embodiment in detail, which is not described herein again.
FIG. 5 is a block diagram of an embodiment of a computer device of the present invention. As shown in fig. 5, the computer device of the present embodiment includes: one or more processors 30, and a memory 40, the memory 40 being configured to store one or more programs, which when executed by the one or more processors 30, cause the one or more processors 30 to implement the association analysis method of the embodiment shown in fig. 1 above. The embodiment shown in fig. 5 is exemplified by including a plurality of processors 30.
For example, fig. 6 is an exemplary diagram of a computer device provided by the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12a suitable for use in implementing embodiments of the present invention. The computer device 12a shown in FIG. 6 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 6, computer device 12a is in the form of a general purpose computing device. The components of computer device 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a that connects the various system components (including the system memory 28a and the processors 16a).
Bus 18a represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12a typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12a and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28a may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30a and/or cache memory 32a. Computer device 12a may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34a may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18a by one or more data media interfaces. System memory 28a may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of the various embodiments of the invention described above in fig. 1-4.
A program/utility 40a having a set (at least one) of program modules 42a may be stored, for example, in system memory 28a, such program modules 42a including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may include an implementation of a network environment. Program modules 42a generally perform the functions and/or methodologies described above in connection with the various embodiments of fig. 1-4 of the present invention.
Computer device 12a may also communicate with one or more external devices 14a (e.g., keyboard, pointing device, display 24a, etc.), with one or more devices that enable a user to interact with computer device 12a, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12a to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22a. Also, computer device 12a may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) through network adapter 20a. As shown, network adapter 20a communicates with the other modules of computer device 12a via bus 18a. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 16a executes various functional applications and data processing by executing programs stored in the system memory 28a, for example, to implement the association analysis method shown in the above-described embodiment.
The present invention also provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the association analysis method as shown in the above embodiments.
The computer-readable media of this embodiment may include RAM 30a, and/or cache memory 32a, and/or storage system 34a in system memory 28a in the embodiment illustrated in fig. 6 described above.
With the development of technology, the propagation path of computer programs is no longer limited to tangible media, and the computer programs can be directly downloaded from a network or acquired by other methods. Accordingly, the computer-readable medium in the present embodiment may include not only tangible media but also intangible media.
The computer-readable medium of the present embodiments may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (20)

1. An association analysis method, the method comprising:
according to the input attribute of a target entity, retrieving an initial point corresponding to the target entity from a pre-established distributed graph storage structure;
according to input association conditions, sequentially retrieving a plurality of association logic edges associated with the initial point and association nodes corresponding to the association logic edges by taking the initial point as a center in the distributed graph storage structure;
and constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge and the associated node corresponding to each associated logical edge.
2. The method of claim 1, wherein constructing a clue path and a corresponding subgraph of the event associated with the target entity based on the initial point, each of the associated logical edges, and the associated node corresponding to each of the associated logical edges comprises:
taking the initial point as a starting point and, in order of the associated logical edges from nearest to farthest from the initial point, connecting the initial point and the associated nodes corresponding to the associated logical edges through the corresponding associated logical edges, so as to construct the clue path and the corresponding subgraph of the event associated with the target entity.
3. The method of claim 1, wherein the association condition comprises a logical attribute of an associated edge; and sequentially retrieving, according to the input association condition and taking the initial point as a center in the distributed graph storage structure, a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges specifically comprises:
in the distributed graph storage structure, taking the initial point as a central node, retrieving an associated logical edge associated with the initial point from around the initial point according to the logical attribute of the associated edge, and acquiring the node corresponding to the associated logical edge as an associated node;
in the distributed graph storage structure, taking the determined associated node as the central node, and retrieving, from around the associated node and according to the logical attribute of the associated edge, an associated logical edge associated with the associated node and the associated node corresponding to the currently retrieved associated logical edge, wherein the associated node corresponding to the currently retrieved associated logical edge does not duplicate any previously determined associated node;
and repeating the previous step until a retrieval termination condition is reached, then stopping the retrieval, so as to sequentially obtain the plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges.
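The iterative retrieval of claim 3 can be illustrated with a minimal sketch. It assumes the graph is a plain in-memory dictionary rather than a distributed store, that an edge matches when its logical attribute equals the requested one, that the first match is taken at each step, and that a depth limit serves as the termination condition. The name `retrieve_chain` and its parameters are hypothetical.

```python
def retrieve_chain(graph, initial_point, wanted_relation, max_depth=3):
    """Walk outward from the initial point, one associated edge per step.

    `graph` maps a node id to a list of (relation, neighbor) pairs.
    Returns the retrieved hops as (center, relation, associated_node) tuples.
    """
    visited = {initial_point}  # nodes already chosen; they must not repeat
    hops = []
    center = initial_point
    while len(hops) < max_depth:  # assumed termination condition: depth limit
        candidates = [
            (relation, neighbor)
            for relation, neighbor in graph.get(center, [])
            if relation == wanted_relation and neighbor not in visited
        ]
        if not candidates:
            break  # nothing left to retrieve around the current central node
        relation, neighbor = candidates[0]  # assumption: take the first match
        hops.append((center, relation, neighbor))
        visited.add(neighbor)
        center = neighbor  # the associated node becomes the next central node
    return hops
```

For example, with `graph = {"A": [("calls", "B")], "B": [("calls", "A"), ("calls", "C")]}`, the walk from `"A"` skips the edge back to `"A"` because that node has already been determined.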
4. The method according to claim 3, wherein, in the distributed graph storage structure, taking the initial point as a central node and retrieving an associated logical edge associated with the initial point from around the initial point according to the logical attribute of the associated edge specifically comprises:
calculating an association index between each logical edge and the initial point according to the logical attribute of the associated edge, the logical attribute of each logical edge around the initial point, and a preset expert rule;
and acquiring, from the plurality of logical edges around the initial point, the logical edge with the largest association index as the associated logical edge associated with the initial point.
5. The method according to claim 4, wherein the termination condition includes: the depth of the completed retrieval reaching a preset depth threshold; the maximum value of the association indexes corresponding to the logical edges around the central node in the current retrieval being smaller than a preset association index threshold; or there being no unselected associated node around the most recently retrieved associated node in the distributed graph storage structure.
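Claims 4 and 5 together describe a greedy selection by association index with three stopping conditions. The sketch below is one possible reading, not the patented implementation: the expert rule is assumed to be a lookup table keyed by (requested attribute, edge attribute), and `association_index`, `greedy_retrieve`, and the reason strings are hypothetical.

```python
def association_index(wanted, edge_relation, expert_rules):
    """Look up the association index; unknown pairs score 0.0 (assumption)."""
    return expert_rules.get((wanted, edge_relation), 0.0)


def greedy_retrieve(graph, start, wanted, expert_rules, max_depth, min_index):
    """Pick the highest-index edge at each step until a stop condition holds.

    `graph` maps a node id to a list of (relation, neighbor) pairs.
    Returns (hops, reason) where reason names the condition that ended the walk.
    """
    visited = {start}
    hops = []
    center = start
    reason = "depth threshold reached"
    while len(hops) < max_depth:
        candidates = [
            (association_index(wanted, relation, expert_rules), relation, neighbor)
            for relation, neighbor in graph.get(center, [])
            if neighbor not in visited
        ]
        if not candidates:
            reason = "no unselected associated node"
            break
        score, relation, neighbor = max(candidates, key=lambda c: c[0])
        if score < min_index:
            reason = "association index below threshold"
            break
        hops.append((center, relation, neighbor))
        visited.add(neighbor)
        center = neighbor
    return hops, reason
```

Returning the stop reason alongside the hops is a convenience added here so that a caller can tell which of the three conditions ended the retrieval.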
6. The method according to claim 3, wherein the association condition further includes a weight threshold of an associated edge, and the sequentially retrieving, in the distributed graph storage structure, a plurality of associated logical edges associated with the initial point and associated nodes corresponding to the associated logical edges with the initial point as a center includes:
in the distributed graph storage structure, taking the initial point as a central node, retrieving an associated logical edge which is associated with the initial point and has a weight larger than a weight threshold of the associated edge from the periphery of the initial point according to the logical attribute of the associated edge and the weight of the associated edge, and acquiring a node corresponding to the associated logical edge as an associated node;
in the distributed graph storage structure, taking the determined associated nodes as central nodes, and retrieving, from around the associated nodes and according to the logical attribute of the associated edge and the weight threshold of the associated edge, the associated logical edges which are associated with the associated nodes and have weights larger than the weight threshold of the associated edge, together with the associated nodes corresponding to the retrieved associated logical edges;
and re-executing the previous step until the retrieval termination condition is reached, and stopping the retrieval to sequentially obtain a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges.
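The weight-filtered retrieval of claim 6 might look like the following sketch. It assumes a breadth-first expansion in which every qualifying associated node becomes a central node in turn, an in-memory dictionary in place of the distributed store, and a depth limit as the termination condition; `retrieve_weighted` is a hypothetical name.

```python
from collections import deque


def retrieve_weighted(graph, start, wanted, weight_threshold, max_depth=3):
    """Expand outward, keeping only matching edges above the weight threshold.

    `graph` maps a node id to a list of (relation, weight, neighbor) triples.
    Returns hops as (center, relation, weight, associated_node) tuples.
    """
    visited = {start}
    hops = []
    frontier = deque([(start, 0)])  # (central node, depth at which it was found)
    while frontier:
        center, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # assumed termination condition: depth limit
        for relation, weight, neighbor in graph.get(center, []):
            if (
                relation == wanted
                and weight > weight_threshold  # strictly larger, as in the claim
                and neighbor not in visited
            ):
                visited.add(neighbor)
                hops.append((center, relation, weight, neighbor))
                frontier.append((neighbor, depth + 1))
    return hops
```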
7. The method according to any one of claims 1-6, wherein before sequentially retrieving a plurality of associated logical edges associated with the initial point and associated nodes corresponding to each of the associated logical edges in the distributed graph storage structure centered on the initial point according to the input association condition, the method further comprises:
collecting source data;
mining a plurality of entities and attributes of each entity from the source data;
storing the entities as respective nodes in the distributed graph storage structure, and recording the attributes of each node in the distributed graph storage structure;
mining a logical association relationship between each node pair according to the source data, establishing a logical edge between the node pair based on the logical association relationship between the node pair, and storing the logical attribute of the corresponding logical edge in the distributed graph storage structure;
and, in the distributed graph storage structure, setting a physical edge for each node pair having a logical association relationship, so that at least one logical edge can be carried on a single physical edge between the same node pair.
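The storage layout described in claim 7, where one physical edge per node pair carries one or more logical edges, can be illustrated with a small in-memory class. Distribution across machines is not modeled, node pairs are treated as unordered, and the class and method names (`GraphStore`, `add_logical_edge`, and so on) are assumptions made for this sketch.

```python
class GraphStore:
    """Toy single-process stand-in for the distributed graph storage structure."""

    def __init__(self):
        self.nodes = {}           # node id -> attribute dict
        self.physical_edges = {}  # frozenset({u, v}) -> list of logical edges

    def add_node(self, node_id, **attributes):
        """Store an entity as a node and record its attributes."""
        self.nodes.setdefault(node_id, {}).update(attributes)

    def add_logical_edge(self, u, v, **logical_attributes):
        """Attach a logical edge to the single physical edge between u and v."""
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be stored as nodes first")
        pair = frozenset((u, v))  # assumption: node pairs are unordered
        self.physical_edges.setdefault(pair, []).append(dict(logical_attributes))

    def logical_edges(self, u, v):
        """Return every logical edge carried on the physical edge u-v."""
        return self.physical_edges.get(frozenset((u, v)), [])
```

Keying the physical edge on the node pair means that adding a second relationship between the same two entities appends to an existing list instead of creating a parallel edge.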
8. The method of claim 7, further comprising:
and setting a weight for each logical edge according to a preset weight setting rule and the logical attribute of each logical edge.
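One simple form a preset weight setting rule could take is a table from logical attribute to weight. The sketch below assumes exactly that; the relation names, the numeric weights, and the `assign_weights` function are illustrative values and names, not taken from the patent.

```python
# Hypothetical rule table: logical attribute -> weight.
WEIGHT_RULES = {"family": 1.0, "transfer": 0.8, "call": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed fallback for attributes without a rule


def assign_weights(logical_edges, rules=WEIGHT_RULES, default=DEFAULT_WEIGHT):
    """Return copies of the edges with a 'weight' set from the rule table."""
    return [
        {**edge, "weight": rules.get(edge["relation"], default)}
        for edge in logical_edges
    ]
```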
9. The method according to claim 7 or 8, characterized in that the method further comprises:
calculating the association degree of the logical edge between each pair of nodes having a logical association relationship in the distributed graph storage structure;
and, according to the association degree of each logical edge and a preset association degree threshold, pruning from the distributed graph storage structure the logical edges whose association degrees are smaller than the association degree threshold, so as to revise the distributed graph storage structure.
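The pruning step of claim 9 can be sketched as a filter over logical edges. The claim does not say how the association degree is computed, so this example assumes a stand-in measure: each edge carries an observation `count`, and the degree is that count divided by the largest count. Both that formula and the function names are assumptions.

```python
def association_degree(edge, max_count):
    """Assumed measure: observation count normalized by the largest count."""
    return edge["count"] / max_count if max_count else 0.0


def prune_logical_edges(edges, threshold):
    """Drop logical edges whose association degree is below the threshold."""
    max_count = max((edge["count"] for edge in edges), default=0)
    return [
        edge for edge in edges
        if association_degree(edge, max_count) >= threshold
    ]
```

With counts of 10, 5, and 1 and a threshold of 0.3, the degrees are 1.0, 0.5, and 0.1, so only the last edge is pruned.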
10. An association analysis apparatus, characterized in that the apparatus comprises:
a retrieval module, configured to retrieve, according to an input attribute of a target entity, an initial point corresponding to the target entity from a pre-established distributed graph storage structure;
the retrieval module being further configured to sequentially retrieve, according to an input association condition and taking the initial point as a center in the distributed graph storage structure, a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges;
and the construction module is used for constructing a clue path and a corresponding subgraph of the event associated with the target entity according to the initial point, each associated logical edge and the associated node corresponding to each associated logical edge.
11. The apparatus according to claim 10, wherein the construction module is specifically configured to:
take the initial point as a starting point and, in order of the associated logical edges from nearest to farthest from the initial point, connect the initial point and the associated nodes corresponding to the successively more distant associated logical edges through the corresponding associated logical edges, so as to construct the clue path and the corresponding subgraph of the event associated with the target entity.
12. The apparatus of claim 10, wherein the association condition comprises a logical attribute of an associated edge; the retrieval module is specifically configured to:
in the distributed graph storage structure, taking the initial point as a central node, retrieve an associated logical edge associated with the initial point from around the initial point according to the logical attribute of the associated edge, and acquire the node corresponding to the associated logical edge as an associated node;
in the distributed graph storage structure, taking the determined associated node as the central node, retrieve, from around the associated node and according to the logical attribute of the associated edge, an associated logical edge associated with the associated node and the associated node corresponding to the currently retrieved associated logical edge, wherein the associated node corresponding to the currently retrieved associated logical edge does not duplicate any previously determined associated node;
and re-executing the previous step until a retrieval termination condition is reached, and stopping the retrieval to sequentially obtain a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges.
13. The apparatus according to claim 12, wherein the retrieving module is specifically configured to:
calculate an association index between each logical edge and the initial point according to the logical attribute of the associated edge, the logical attribute of each logical edge around the initial point, and a preset expert rule;
and acquire, from the plurality of logical edges around the initial point, the logical edge with the largest association index as the associated logical edge associated with the initial point.
14. The apparatus according to claim 13, wherein the termination condition includes: the depth of the completed retrieval reaching a preset depth threshold; the maximum value of the association indexes corresponding to the logical edges around the central node in the current retrieval being smaller than a preset association index threshold; or there being no unselected associated node around the most recently retrieved associated node in the distributed graph storage structure.
15. The apparatus according to claim 12, wherein the association condition further includes a weight threshold of the associated edge, and the retrieving module is specifically configured to:
in the distributed graph storage structure, taking the initial point as a central node, retrieving an associated logical edge which is associated with the initial point and has a weight larger than a weight threshold of the associated edge from the periphery of the initial point according to the logical attribute of the associated edge and the weight of the associated edge, and acquiring a node corresponding to the associated logical edge as an associated node;
in the distributed graph storage structure, the determined associated nodes are used as central nodes, and associated logical edges which are associated with the associated nodes and have weights larger than the weight threshold of the associated edges and associated nodes corresponding to the retrieved associated logical edges are retrieved from the periphery of the associated nodes according to the logical attributes of the associated edges and the weight threshold of the associated edges;
and re-executing the previous step until the retrieval termination condition is reached, and stopping the retrieval to sequentially obtain a plurality of associated logical edges associated with the initial point and the associated nodes corresponding to the associated logical edges.
16. The apparatus of any of claims 10-15, further comprising:
the acquisition module is used for acquiring source data;
the mining module is used for mining a plurality of entities and the attribute of each entity from the source data;
the structure creating module is used for respectively taking the entities as nodes, storing the nodes in the distributed graph storage structure and recording the attribute of each node in the distributed graph storage structure;
the mining module is further used for mining the logical association relation between each node pair according to the source data;
the structure creating module is further configured to establish a logical edge between the node pairs based on the logical association relationship between the node pairs, and store a logical attribute of the corresponding logical edge in the distributed graph storage structure;
the structure creating module is further configured to set a physical edge for the node pairs having the logical association relationship in the distributed graph storage structure, so that at least one logical edge can be carried on a single physical edge between the same node pair.
17. The apparatus of claim 16, wherein:
the structure creating module is further configured to set a weight for each logical edge according to a preset weight setting rule and a logical attribute of each logical edge.
18. The apparatus of claim 16 or 17, further comprising:
a calculation module, configured to calculate a relevance degree of the logical edge between each pair of nodes having a logical relevance relationship in the distributed graph storage structure;
and a pruning processing module, configured to prune from the distributed graph storage structure, according to the association degree of each logical edge and a preset association degree threshold, the logical edges whose association degrees are smaller than the association degree threshold, so as to revise the distributed graph storage structure.
19. A computer device, the device comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
20. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN201810784909.9A 2018-07-17 2018-07-17 Correlation analysis method and device, computer equipment and readable medium Active CN110727740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810784909.9A CN110727740B (en) 2018-07-17 2018-07-17 Correlation analysis method and device, computer equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810784909.9A CN110727740B (en) 2018-07-17 2018-07-17 Correlation analysis method and device, computer equipment and readable medium

Publications (2)

Publication Number Publication Date
CN110727740A true CN110727740A (en) 2020-01-24
CN110727740B CN110727740B (en) 2023-03-14

Family

ID=69217555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810784909.9A Active CN110727740B (en) 2018-07-17 2018-07-17 Correlation analysis method and device, computer equipment and readable medium

Country Status (1)

Country Link
CN (1) CN110727740B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949839A (en) * 2020-08-24 2020-11-17 上海宏路数据技术股份有限公司 Data association method, electronic device and medium
CN112541043A (en) * 2020-12-24 2021-03-23 北京明略软件系统有限公司 Method, device and equipment for detecting connectivity of nodes of knowledge graph
CN112612832A (en) * 2020-12-17 2021-04-06 北京锐安科技有限公司 Node analysis method, device, equipment and storage medium
CN112633178A (en) * 2020-12-24 2021-04-09 深圳集智数字科技有限公司 Image identification method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3086189A2 (en) * 2015-04-24 2016-10-26 Accenture Global Services Limited System architecture for control systems via knowledge graph search
CN108153901A (en) * 2018-01-16 2018-06-12 北京百度网讯科技有限公司 Knowledge graph-based information push method and device

Also Published As

Publication number Publication date
CN110727740B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN110727740B (en) Correlation analysis method and device, computer equipment and readable medium
CN107992596B (en) Text clustering method, text clustering device, server and storage medium
CN110020422B (en) Feature word determining method and device and server
CN109241225B (en) Method and device for mining competition relationship of interest points, computer equipment and storage medium
CN111343161B (en) Abnormal information processing node analysis method, abnormal information processing node analysis device, abnormal information processing node analysis medium and electronic equipment
CN112148843B (en) Text processing method and device, terminal equipment and storage medium
CN109978619B (en) Method, system, equipment and medium for screening air ticket pricing strategy
CN111666346A (en) Information merging method, transaction query method, device, computer and storage medium
US11893073B2 (en) Method and apparatus for displaying map points of interest, and electronic device
CN113641994B (en) Data processing method and system based on graph data
CN107133263A (en) POI recommends method, device, equipment and computer-readable recording medium
CN114281823A (en) Table processing method, device, equipment, storage medium and product
US9213759B2 (en) System, apparatus, and method for executing a query including boolean and conditional expressions
CN110162518B (en) Data grouping method, device, electronic equipment and storage medium
CN114419631A (en) Network management virtual system based on RPA
CN112084448A (en) Similar information processing method and device
CN112970011A (en) Recording pedigrees in query optimization
CN115619245A (en) Portrait construction and classification method and system based on data dimension reduction method
CN105786929A (en) Information monitoring method and device
CN114357180A (en) Knowledge graph updating method and electronic equipment
CN110457705B (en) Method, device, equipment and storage medium for processing point of interest data
CN113742450A (en) User data grade label falling method and device, electronic equipment and storage medium
CN110968690B (en) Clustering division method and device for words, equipment and storage medium
CN112784113A (en) Data processing method and device, computer readable storage medium and electronic equipment
CN115168577B (en) Model updating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant