CN114186073A - Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query - Google Patents

Publication number
CN114186073A
Authority
CN
China
Prior art keywords
graph
sub
query
retrieval
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111520430.2A
Other languages
Chinese (zh)
Inventor
顾昊旻
陆宏波
袁以友
高德荃
来风刚
赵子岩
徐浩
曲延盛
王云霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Anhui Jiyuan Software Co Ltd
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Anhui Jiyuan Software Co Ltd
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Anhui Jiyuan Software Co Ltd, Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority to CN202111520430.2A
Publication of CN114186073A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query, comprising the steps of: 1) establishing a knowledge-graph fault handling measure retrieval model based on a subgraph matching method; 2) sorting the result subgraphs by a similarity calculation that combines graph structure and semantic information to obtain the optimal query result; 3) optimizing on the basis of a Top-k query model and accelerating the query with a distributed query method; 4) classifying the operation and maintenance alarm data and screening the related network element attributes; 5) regularizing the handling steps of each fault on the basis of the large-scale intelligent operation and maintenance knowledge graph; 6) directly calling the entity-relation-entity objects in an intelligent operation and maintenance decision analysis module built on the knowledge graph platform of steps 1), 2) and 3), finally forming a one-key operation and maintenance fault diagnosis analysis report. By optimizing subgraph matching, the retrieval algorithm and distributed processing, the invention addresses the usability and efficiency problems of the prior art.

Description

Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query
Technical Field
The invention relates to the field of intelligent retrieval analysis, in particular to an operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query.
Background
With the continuous development of artificial intelligence, intelligent retrieval and analysis methods based on knowledge graphs are gradually being applied in fields such as search engines, education, medical treatment and smart power grids. Semantic information such as entities, attributes and relations is extracted from the data of each field by extraction techniques, a knowledge base is constructed by techniques such as knowledge fusion and knowledge processing, and the retrieval and analysis services required by users are then realized through matching analysis among entities. The knowledge graph uses a representation based on ontology terms and semantics and has a standard conceptual model, so it can handle the large volume of multi-source heterogeneous operation data accumulated by a power grid system, including numbers, text and images. Moreover, the knowledge graph strengthens the associations among data through semantic links, making the data representation more standardized and more structured; it is therefore well suited to application scenarios such as intelligent question answering, intelligent retrieval and assisted decision making, and also to the retrieval and analysis of power grid knowledge.
The operation and maintenance data of the State Grid company targeted by the method are dispersed and large in scale, with a data volume reaching the ZB level. The constructed intelligent operation and maintenance knowledge graph collects data from a network of complex structure and exhibits dispersed data centers, a complex data network and a large data scale, which makes it difficult for a user to obtain a satisfactory query result quickly. Given these characteristics, how to realize fast and efficient knowledge graph query is a problem the current system urgently needs to solve. Traditional knowledge graph query work usually models knowledge graph query simply as a subgraph matching problem, which has several shortcomings in practical applications.
First, most conventional knowledge graph query models require query results to match the user query exactly; because knowledge graphs contain noisy data, such models can miss results that the user is interested in, which leads to poor usability.
Second, conventional knowledge graph query algorithms usually adopt graph indexing to speed up queries, but the intelligent operation and maintenance knowledge graph in this project is large, so building a graph index incurs high time and space overhead.
Finally, the intelligent operation and maintenance knowledge graph network is complex and large in scale, so the query process must be implemented in a distributed manner; however, conventional distributed graph data processing platforms are not optimized for the execution of knowledge graph queries and suffer from low execution efficiency.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query, which addresses the usability and efficiency problems of the prior art by optimizing subgraph matching, the retrieval algorithm and distributed processing.
The purpose of the invention is achieved as follows: an operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query comprises the following steps:
step 1) establishing a knowledge-graph fault handling measure retrieval model based on a subgraph matching method: in an existing operation and maintenance knowledge graph, a knowledge-graph-based retrieval model for operation and maintenance fault handling measures is constructed through five steps: defining the retrieval graph, matching subgraphs, dividing into sub-retrievals, performing the sub-retrievals and connecting the sub-retrieval results;
step 2) examining the topological structure characteristics of the query graph and the result graphs in the knowledge graph, and sorting the result graphs by a similarity calculation based on graph structure and semantic information to obtain the optimal query result: a graph-structure-based similarity is calculated between the query graph and each result subgraph, and a semantic similarity is calculated from the semantic information of the graphs through semantic feature description;
the graph-structure-based similarity and the semantic-information-based similarity are linearly superposed to obtain the final comprehensive score Score of each subgraph, and the result subgraphs are sorted by Score to obtain the optimal query result, that is, the optimal k result graphs;
step 3) optimizing on the basis of the Top-k query model, accelerating the query with a distributed query method, and optimizing the execution efficiency of distributed knowledge graph query on a distributed graph data processing platform from the two aspects of job scheduling and data storage: the query is optimized on the basis of a Top-k query model and accelerated using the computing power of the distributed environment, and the execution efficiency of distributed knowledge graph query on the distributed graph data processing platform is optimized in terms of job scheduling and data storage;
step 4) classifying the operation and maintenance alarm data and screening the related network element attributes: according to the problem information of different levels in the large volume of alarm data, important and critical alarms are captured first and the fault information is classified; when fault information occurs, its handling level and the affected services are preliminarily judged from the alarm classification, the network element attribution relation and the user capacity report of the performance system are looked up through the network element attribution relation, and the attribution relation, number of registered users and coverage area attributes of the faulty network element are screened out;
step 5) regularizing the handling steps of each fault on the basis of the large-scale intelligent operation and maintenance knowledge graph: the handling steps of each fault are regularized according to the information in a historical fault database;
step 6) directly calling the entity-relation-entity objects in an intelligent operation and maintenance decision analysis module built on the knowledge graph platform of steps 1), 2) and 3), finally forming a one-key operation and maintenance fault diagnosis analysis report: the entity-relation-entity objects are determined through the large-scale intelligent operation and maintenance knowledge graph and a fault diagnosis description is output; the conversion of fault diagnosis knowledge is automated by directly calling the entity-relation-entity objects in the knowledge-graph-platform-based intelligent operation and maintenance decision analysis prototype module, finally forming a one-key fault diagnosis analysis report.
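As an illustration of step 4), the following minimal Python sketch orders alarms by severity and looks up the attributes of the faulty network element. The severity levels, the `Alarm` record and the `ELEMENT_ATTRIBUTES` table are hypothetical stand-ins; the source only states that alarms have different levels and that the attribution relation, number of registered users and coverage area are screened out.

```python
from dataclasses import dataclass

# Hypothetical severity levels; the source only says alarms have "different levels".
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass
class Alarm:
    element_id: str
    severity: str
    message: str


# Hypothetical table standing in for the performance system's network element
# attribution relation and user capacity report.
ELEMENT_ATTRIBUTES = {
    "ne-01": {"attribution": "core-site-A", "registered_users": 12000, "coverage": "district-1"},
    "ne-02": {"attribution": "edge-site-B", "registered_users": 300, "coverage": "district-2"},
}


def prioritize(alarms):
    """Sort alarms so that the most severe come first; unknown levels go last."""
    return sorted(alarms, key=lambda a: SEVERITY_RANK.get(a.severity, len(SEVERITY_RANK)))


def screen_attributes(alarm):
    """Return attribution, registered users and coverage of the faulty element."""
    attrs = ELEMENT_ATTRIBUTES.get(alarm.element_id, {})
    return {key: attrs.get(key) for key in ("attribution", "registered_users", "coverage")}


alarms = [
    Alarm("ne-02", "minor", "link flapping"),
    Alarm("ne-01", "critical", "board failure"),
]
ordered = prioritize(alarms)
top_attrs = screen_attributes(ordered[0])
```

Here the critical alarm on `ne-01` is handled first, and its screened attributes can be passed on as supporting information for the later diagnosis steps.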
As a further limitation of the present invention, the step 1) specifically comprises:
step 1.1) defining the retrieval graph: a retrieval graph Q = (E_Q, R_Q) contains a point set E_Q and an edge set R_Q; each retrieval point corresponds to a specific entity description, and each edge represents the relation between two points;
step 1.2) matching subgraphs: for a given knowledge graph G = (E_G, R_G, E_G) and retrieval subgraph Q = (E_Q, R_Q), the purpose of subgraph matching is to find the matching subgraphs φ(Q) of Q in G, where φ maps the points E_Q of Q to points φ(E_G) in G and the edges R_Q of Q to edges φ(R_G) in G; the subgraphs of G that satisfy this mapping are defined as the matching subgraphs φ(Q);
step 1.3) sub-retrieval division: the retrieval graph is divided into several sub-retrieval graphs with few vertices and simple edge characteristics to reduce retrieval difficulty; each sub-retrieval graph is a two-layer tree consisting of a root node, one layer of child nodes and the edges between them; the results of the sub-retrievals are obtained by layer-by-layer matching, and from them the result of the whole retrieval graph;
step 1.4) performing the sub-retrievals: each sub-retrieval graph from step 1.3) is decomposed into a minimum spanning tree; the data graph and the divided sub-retrieval graphs are taken as input and a sub-retrieval result set D_i is initialized; if the matching point pair set T is empty, a candidate matching point pair set T is generated starting from the root node; if all nodes of the sub-retrieval graph Q are contained in T and the edges of the graph meet the criterion, the result is stored in D_i; after all matching is complete, the result set D_i is obtained;
step 1.5) connecting the sub-retrieval results: the results of all sub-retrievals obtained in step 1.4) are connected together to generate matching subgraphs; the results of two sub-retrievals Q_i and Q_j are connected if and only if Q_i and Q_j share a common vertex. The basic procedure is as follows: initialize a sub-retrieval result set D; for the divided sub-retrieval set Q_i ∈ (Q_1, Q_2, …, Q_N), execute every Q_i according to the sub-retrieval method to obtain all sub-retrieval results; hash-join all sub-retrieval results and store those whose matching degree meets the threshold λ into C, sorted by matching degree; evaluate the retrieval results stored in C with an evaluation model to obtain their importance degree f; return the retrieval result set C, completing the retrieval.
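The sub-retrieval division of step 1.3) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the source does not say how roots are chosen, so the greedy rule used here (pick the vertex with the most uncovered edges) and the sample edge labels are hypothetical.

```python
from collections import defaultdict


def decompose_into_stars(edges):
    """Split a query graph into two-layer trees (a root plus one layer of children).

    `edges` is a list of (u, label, v) triples. Greedy heuristic (an assumption,
    not taken from the source): repeatedly pick the vertex with the most
    uncovered incident edges as the root of the next sub-retrieval graph.
    """
    remaining = set(edges)
    stars = []
    while remaining:
        degree = defaultdict(int)
        for u, _, v in remaining:
            degree[u] += 1
            degree[v] += 1
        # Iterate over sorted names so ties are broken deterministically.
        root = max(sorted(degree), key=lambda node: degree[node])
        star_edges = sorted(e for e in remaining if root in (e[0], e[2]))
        stars.append((root, star_edges))
        remaining -= set(star_edges)
    return stars


# Hypothetical retrieval graph: an alarm on a device, its site, fault and measure.
query_edges = [
    ("alarm", "raised_on", "device"),
    ("device", "located_in", "site"),
    ("device", "has_fault", "fault"),
    ("fault", "handled_by", "measure"),
]
stars = decompose_into_stars(query_edges)
```

In this example the graph is split into one sub-retrieval rooted at `device` (three edges) and one rooted at `fault` (one edge); the two share the vertex `fault`, which is what later allows their results to be connected.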
As a further limitation of the present invention, the step 2) specifically includes:
step 2.1) similarity calculation based on graph structure: the structures of the query graph and the result subgraph are analyzed quantitatively. For two knowledge graphs G_1 and G_2, node a in G_1 is defined to be similar to node b in G_2 if their neighbor nodes are similar; likewise, two edges are similar if their start points and end points are similar. The higher the similarity of the nodes and edges, the higher the degree of subgraph matching. The structural similarity is measured mainly by matrices formed from the node similarities and the edge similarities. If G_1 has i nodes and G_2 has j nodes, the node similarity matrix has size i × j; x_ab denotes the similarity between node a in G_1 and node b in G_2, and y_cd denotes the similarity between edge c in G_1 and edge d in G_2. The scores of the nodes and edges are solved by the following formula:
Figure BDA0003407326380000051
wherein SX represents the node similarity score matrix of G_1 and G_2, and x_i(k) represents the similarity of each node pair in the two graphs after k iterations; SY represents the edge similarity score matrix of G_1 and G_2, and y_i(k) represents the similarity of each edge pair in the two graphs after k iterations. The structural similarity score S_sim of the query graph and the result subgraph is obtained by averaging the node similarities, averaging the edge similarities and adding the two averages, as follows:
Figure BDA0003407326380000052
wherein n_1 and n_2 are the numbers of nodes in G_1 and G_2, and m_1 and m_2 are the numbers of edges in G_1 and G_2;
step 2.2) similarity calculation based on semantic information: for a given query graph G_s = (g_1, g_2, …, g_n) and result subgraph G_r = (r_1, r_2, …, r_n), where each r_i is a triple, the similarity between the query graph and the result subgraph is expressed by the likelihood estimation probability p(G_s|G_r); the similarity is judged by this probability and the result subgraphs are sorted accordingly. The semantic similarity score based on the likelihood estimation probability p(G_s|G_r), denoted
Figure BDA0003407326380000053
is calculated as follows:
Figure BDA0003407326380000061
wherein p(g_i|G_r) represents the probability of generating the term g_i of the query graph G_s given the result subgraph G_r; it is expressed as the average of the probabilities p(g_i|r_j) of generating g_i from the individual triple models;
step 2.3) obtaining the linearly weighted similarity score: the structural similarity score S_sim from step 2.1) and the semantic similarity score from step 2.2),
Figure BDA0003407326380000062
are fused by linear weighting to obtain the final similarity score, as follows:
Figure BDA0003407326380000063
wherein η is an adjustable parameter taking values in [0, 1] that sets the proportion of the two similarity scores in the comprehensive similarity score; the result subgraphs are sorted by comprehensive similarity score to obtain the optimal query result, completing the fault retrieval.
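The linear fusion and ranking of step 2.3) can be sketched in Python as below. The formula image is not reproduced in this text, so the assignment of η to the structural score and (1 − η) to the semantic score is an assumption, and the candidate scores are made-up illustration values.

```python
import heapq


def combined_score(s_struct, s_sem, eta=0.5):
    """Linear fusion of the structural and semantic similarity scores.

    Assumption: eta weights the structural score and (1 - eta) the semantic
    score; the source only states that eta lies in [0, 1].
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return eta * s_struct + (1.0 - eta) * s_sem


def top_k(candidates, k, eta=0.5):
    """Return the k result subgraphs with the highest combined score."""
    return heapq.nlargest(
        k, candidates, key=lambda c: combined_score(c["s_struct"], c["s_sem"], eta)
    )


# Made-up scores for three result subgraphs.
candidates = [
    {"id": "r1", "s_struct": 0.9, "s_sem": 0.2},
    {"id": "r2", "s_struct": 0.6, "s_sem": 0.8},
    {"id": "r3", "s_struct": 0.3, "s_sem": 0.4},
]
best = top_k(candidates, k=2, eta=0.5)
```

With η = 0.5 the subgraph `r2` ranks first because it is reasonably good on both measures; raising η toward 1 shifts the ranking toward the structurally closest subgraph `r1`.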
As a further limitation of the present invention, the step 3) specifically includes:
step 3.1) the query is optimized on the basis of a Top-k query model and accelerated using the computing power of the distributed environment; the distances between entities in the knowledge graph are calculated in real time by a distributed breadth-first search; a query optimization method based on a bounding technique is proposed to speed up the query: exact distances are replaced by upper and lower bounds on the distances between entities, and the optimal k result graphs are derived from these bounds, reducing query time; the knowledge graph query algorithm is implemented in a distributed environment, and the storage scheme of the knowledge graph in the distributed environment and the interaction scheme among query tasks ensure that the distributed query algorithm can be executed in a real environment;
step 3.2) on the distributed graph data processing platform, the execution efficiency of distributed knowledge graph query is optimized from the two aspects of job scheduling and data storage: the data loading time of distributed graph query tasks is optimized; tasks are scheduled to the computing nodes where their data reside by a data-locality-oriented task scheduling algorithm; and, through a shared-memory-based data graph reuse technique, the knowledge graph data in memory are reused by multiple query tasks.
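The bounding idea of step 3.1) can be illustrated on a single machine with the Python sketch below. It is not the distributed algorithm of the invention: `bounded_bfs_distance` and `prune_with_bounds` are hypothetical helpers that show how a depth-limited breadth-first search yields distance bounds and how such bounds let candidates be discarded without computing exact distances (smaller distance is assumed to mean a better result).

```python
from collections import deque


def bounded_bfs_distance(adj, source, target, max_depth):
    """Return (lower, upper) bounds on the hop distance from source to target.

    Only max_depth levels are explored. If the target is not reached, its
    distance is at least max_depth + 1 and the upper bound stays infinite.
    """
    if source == target:
        return (0, 0)
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adj.get(node, ()):
            if nxt == target:
                return (depth + 1, depth + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return (max_depth + 1, float("inf"))


def prune_with_bounds(bounds, k):
    """Keep only candidates that can still be among the k smallest distances.

    A candidate is dropped when its lower bound exceeds the k-th smallest
    upper bound: at least k other candidates are then strictly closer.
    """
    if k <= 0:
        return []
    uppers = sorted(upper for _, upper in bounds.values())
    if len(uppers) <= k:
        return sorted(bounds)
    threshold = uppers[k - 1]
    return sorted(c for c, (lower, _) in bounds.items() if lower <= threshold)


adj = {"a": ["b"], "b": ["c"], "c": ["d"]}
near = bounded_bfs_distance(adj, "a", "c", max_depth=2)
far = bounded_bfs_distance(adj, "a", "d", max_depth=2)

# Made-up bounds for five candidate result graphs.
bounds = {"g1": (1, 2), "g2": (2, 4), "g3": (5, 7), "g4": (3, 3), "g5": (6, 9)}
survivors = prune_with_bounds(bounds, k=2)
```

Only the surviving candidates (`g1`, `g2`, `g4` in this example) would need an exact distance computation, which is where the saving in query time comes from.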
Compared with the prior art, the above technical scheme has the following beneficial effects: 1) an improved retrieval method is designed on the basis of subgraph matching; by linearly superposing the graph-structure-based similarity and the semantic-information-based similarity, retrieval accuracy is effectively improved and the influence of noisy data is reduced; 2) the query process is implemented in a distributed manner, which shortens query time and speeds up the query; 3) on the distributed graph data processing platform, the execution efficiency of distributed knowledge graph query is optimized in terms of job scheduling and data storage, reducing data I/O overhead and further shortening the overall query completion time.
Drawings
FIG. 1 is an overall block diagram of the present invention.
FIG. 2 is a conceptual diagram of a retrieval subgraph constructed by the present invention.
FIG. 3 is a conceptual diagram of the search subgraph partitioning of the present invention.
Detailed Description
The operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query shown in FIG. 1 comprises the following steps:
step 1) establishing a knowledge-graph fault handling measure retrieval model based on a subgraph matching method: in an existing operation and maintenance knowledge graph, a knowledge-graph-based retrieval model for operation and maintenance fault handling measures is constructed through five steps: defining the retrieval graph, matching subgraphs, dividing into sub-retrievals, performing the sub-retrievals and connecting the sub-retrieval results;
step 1.1) defining the retrieval graph: a retrieval graph Q = (E_Q, R_Q) contains a point set E_Q and an edge set R_Q; each retrieval point corresponds to a specific entity description, and each edge represents the relation between two points;
step 1.2) matching subgraphs: for a given knowledge graph G = (E_G, R_G, E_G) and retrieval subgraph Q = (E_Q, R_Q), the purpose of subgraph matching is to find the matching subgraphs φ(Q) of Q in G, where φ maps the points E_Q of Q to points φ(E_G) in G and the edges R_Q of Q to edges φ(R_G) in G; the subgraphs of G that satisfy this mapping are defined as the matching subgraphs φ(Q);
step 1.3) sub-retrieval division: because a retrieval graph may contain too many vertices and edges, it is divided into several sub-retrieval graphs with few vertices and simple edge characteristics to reduce retrieval difficulty; each sub-retrieval graph is a two-layer tree consisting of a root node, one layer of child nodes and the edges between them; the results of the sub-retrievals are obtained by layer-by-layer matching, and from them the result of the whole retrieval graph; the constructed retrieval subgraph is shown in FIG. 2, and the division of the retrieval subgraph is shown in FIG. 3;
step 1.4) performing the sub-retrievals: each sub-retrieval graph from step 1.3) is decomposed into a minimum spanning tree; the data graph and the divided sub-retrieval graphs are taken as input and a sub-retrieval result set D_i is initialized; if the matching point pair set T is empty, a candidate matching point pair set T is generated starting from the root node; if all nodes of the sub-retrieval graph Q are contained in T and the edges of the graph meet the criterion, the result is stored in D_i; after all matching is complete, the result set D_i is obtained;
step 1.5) connecting the sub-retrieval results: the results of all sub-retrievals obtained in step 1.4) are connected together to generate matching subgraphs; the results of two sub-retrievals Q_i and Q_j are connected if and only if Q_i and Q_j share a common vertex. The basic procedure is as follows: initialize a sub-retrieval result set D; for the divided sub-retrieval set Q_i ∈ (Q_1, Q_2, …, Q_N), execute every Q_i according to the sub-retrieval method to obtain all sub-retrieval results; hash-join all sub-retrieval results and store those whose matching degree meets the threshold λ into C, sorted by matching degree; evaluate the retrieval results stored in C with an evaluation model to obtain their importance degree f; return the retrieval result set C, completing the retrieval.
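Steps 1.4) and 1.5) can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not in the source: the data graph is a small list of (subject, relation, object) triples, query vertices are variables written with a leading `?`, matching is plain backtracking without self-loops or an injectivity check, and the matching-degree threshold λ and the evaluation model are omitted.

```python
from collections import defaultdict


def match_sub_query(sub_edges, triples):
    """Enumerate variable bindings for a small sub-retrieval graph by backtracking."""
    results = []

    def extend(index, binding):
        if index == len(sub_edges):
            results.append(dict(binding))
            return
        u, label, v = sub_edges[index]
        for s, p, o in triples:
            # Skip triples with the wrong relation or that contradict earlier bindings.
            if p != label or binding.get(u, s) != s or binding.get(v, o) != o:
                continue
            added = {x for x in (u, v) if x not in binding}
            binding[u], binding[v] = s, o
            extend(index + 1, binding)
            for x in added:
                del binding[x]  # undo only the bindings made at this level

    extend(0, {})
    return results


def hash_join(left, right):
    """Join two lists of bindings on their shared variables using a hash table."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    if not shared:
        return []  # results are only connected when the sub-retrievals share a vertex
    table = defaultdict(list)
    for binding in left:
        table[tuple(binding[var] for var in shared)].append(binding)
    joined = []
    for binding in right:
        for partner in table.get(tuple(binding[var] for var in shared), []):
            joined.append({**partner, **binding})
    return joined


# Hypothetical data graph and two sub-retrievals sharing the vertex ?fault.
triples = [
    ("alarm-7", "raised_on", "router-1"),
    ("router-1", "has_fault", "link-down"),
    ("router-2", "has_fault", "overheat"),
    ("link-down", "handled_by", "replace-cable"),
    ("overheat", "handled_by", "clean-fan"),
]
q1 = [("?alarm", "raised_on", "?device"), ("?device", "has_fault", "?fault")]
q2 = [("?fault", "handled_by", "?measure")]
matches = hash_join(match_sub_query(q1, triples), match_sub_query(q2, triples))
```

The join keeps only the combination that agrees on `?fault`, so the single match links `alarm-7` through `router-1` and `link-down` to the handling measure `replace-cable`.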
step 2) examining the topological structure characteristics of the query graph and the result graphs in the knowledge graph, and sorting the result graphs by a similarity calculation based on graph structure and semantic information to obtain the optimal query result: a graph-structure-based similarity is calculated between the query graph and each result subgraph, and a semantic similarity is calculated from the semantic information of the graphs through semantic feature description;
the graph-structure-based similarity and the semantic-information-based similarity are linearly superposed to obtain the final comprehensive score Score of each subgraph, and the result subgraphs are sorted by Score to obtain the optimal query result, that is, the optimal k result graphs;
step 2.1) similarity calculation based on graph structure: the structures of the query graph and the result subgraph are analyzed quantitatively. For two knowledge graphs G_1 and G_2, node a in G_1 is defined to be similar to node b in G_2 if their neighbor nodes are similar; likewise, two edges are similar if their start points and end points are similar. The higher the similarity of the nodes and edges, the higher the degree of subgraph matching. The structural similarity is measured mainly by matrices formed from the node similarities and the edge similarities. If G_1 has i nodes and G_2 has j nodes, the node similarity matrix has size i × j; x_ab denotes the similarity between node a in G_1 and node b in G_2, and y_cd denotes the similarity between edge c in G_1 and edge d in G_2. The scores of the nodes and edges are solved by the following formula:
Figure BDA0003407326380000091
wherein SX represents the node similarity score matrix of G_1 and G_2, and x_i(k) represents the similarity of each node pair in the two graphs after k iterations; SY represents the edge similarity score matrix of G_1 and G_2, and y_i(k) represents the similarity of each edge pair in the two graphs after k iterations. The structural similarity score S_sim of the query graph and the result subgraph is obtained by averaging the node similarities, averaging the edge similarities and adding the two averages, as follows:
Figure BDA0003407326380000092
wherein n_1 and n_2 are the numbers of nodes in G_1 and G_2, and m_1 and m_2 are the numbers of edges in G_1 and G_2;
step 2.2) similarity calculation based on semantic information: for a given query graph G_s = (g_1, g_2, …, g_n) and result subgraph G_r = (r_1, r_2, …, r_n), where each r_i is a triple, the similarity between the query graph and the result subgraph is expressed by the likelihood estimation probability p(G_s|G_r); the similarity is judged by this probability and the result subgraphs are sorted accordingly. The semantic similarity score based on the likelihood estimation probability p(G_s|G_r), denoted
Figure BDA0003407326380000101
is calculated as follows:
Figure BDA0003407326380000102
wherein p(g_i|G_r) represents the probability of generating the term g_i of the query graph G_s given the result subgraph G_r; it is expressed as the average of the probabilities p(g_i|r_j) of generating g_i from the individual triple models;
step 2.3) obtaining the linearly weighted similarity score: the structural similarity score S_sim from step 2.1) and the semantic similarity score from step 2.2),
Figure BDA0003407326380000103
are fused by linear weighting to obtain the final similarity score, as follows:
Figure BDA0003407326380000104
wherein η is an adjustable parameter taking values in [0, 1] that sets the proportion of the two similarity scores in the comprehensive similarity score; the result subgraphs are sorted by comprehensive similarity score to obtain the optimal query result, completing the fault retrieval.
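The semantic similarity of step 2.2) can be approximated by a query-likelihood style score, sketched below in Python. The formula image is not reproduced in this text, so two points are assumptions: that p(G_s|G_r) is the product over query terms of p(g_i|G_r), and that p(g_i|r_j) is a smoothed relative frequency of the term within the triple (the smoothing constant `alpha` is hypothetical). Only the averaging over triples is stated in the source.

```python
import math


def term_prob(term, triple, vocab_size, alpha=0.1):
    """p(term | triple) as a smoothed relative frequency (smoothing is an assumption)."""
    count = sum(1 for element in triple if element == term)
    return (count + alpha) / (len(triple) + alpha * vocab_size)


def log_semantic_similarity(query_terms, result_triples, vocab_size, alpha=0.1):
    """log p(G_s | G_r): sum over query terms of the log of the mean per-triple probability."""
    if not result_triples:
        raise ValueError("result subgraph must contain at least one triple")
    total = 0.0
    for term in query_terms:
        mean_p = sum(
            term_prob(term, triple, vocab_size, alpha) for triple in result_triples
        ) / len(result_triples)
        total += math.log(mean_p)
    return total


# Hypothetical query terms and two candidate result subgraphs.
query_terms = ["router-1", "link-down"]
result_a = [("router-1", "has_fault", "link-down")]
result_b = [("switch-9", "has_fault", "overheat")]
vocab_size = 5  # distinct terms across both result subgraphs
score_a = log_semantic_similarity(query_terms, result_a, vocab_size)
score_b = log_semantic_similarity(query_terms, result_b, vocab_size)
```

The log form avoids numerical underflow when many terms are multiplied; `result_a`, which contains both query terms, receives the higher score and would be ranked ahead of `result_b`.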
step 3) optimizing on the basis of the Top-k query model, accelerating the query with a distributed query method, and optimizing the execution efficiency of distributed knowledge graph query on a distributed graph data processing platform from the two aspects of job scheduling and data storage: the query is optimized on the basis of a Top-k query model and accelerated using the computing power of the distributed environment so that query requests are answered quickly, and the execution efficiency of distributed knowledge graph query on the distributed graph data processing platform is optimized in terms of job scheduling and data storage;
step 3.1) the query is optimized on the basis of a Top-k query model and accelerated using the computing power of the distributed environment so that query requests are answered quickly; to avoid building an index, the distances between entities in the knowledge graph are calculated in real time by a distributed breadth-first search, so that the distance between every pair of entities need not be calculated and stored in advance; to speed up the query, a query optimization method based on a bounding technique is proposed: exact distances are replaced by upper and lower bounds on the distances between entities, and the optimal k result graphs are derived from these bounds, effectively reducing query time; the knowledge graph query algorithm is implemented in a distributed environment, and the storage scheme of the knowledge graph in the distributed environment and the interaction scheme among query tasks ensure that the distributed query algorithm can be executed in a real environment;
step 3.2) on the distributed graph data processing platform, the execution efficiency of distributed knowledge graph query is optimized from the two aspects of job scheduling and data storage: the execution performance of distributed query tasks is improved by optimizing their data loading time; tasks are scheduled to the computing nodes where their data reside by a data-locality-oriented task scheduling algorithm, so as to avoid the impact of network I/O on query performance as far as possible; and, through a shared-memory-based data graph reuse technique, the knowledge graph data in memory are reused by multiple query tasks, avoiding the I/O overhead of loading the data graph repeatedly.
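The data-locality-oriented scheduling of step 3.2) can be sketched in Python as below. The source does not describe the scheduling algorithm itself, so this greedy rule (prefer a node that already holds the task's graph partition, otherwise fall back to the node with the most free slots) and all task, partition and node names are hypothetical.

```python
def schedule(tasks, block_locations, node_capacity):
    """Assign each query task to a node that already holds its graph partition if possible.

    tasks: {task_id: partition_id}
    block_locations: {partition_id: [node, ...]}
    node_capacity: {node: number of free task slots}
    Returns {task_id: (node, is_local)}; is_local is False when the task must
    read its partition over the network.
    """
    free = dict(node_capacity)
    assignment = {}
    for task_id, partition in sorted(tasks.items()):
        local = [n for n in block_locations.get(partition, []) if free.get(n, 0) > 0]
        pool = local or [n for n in free if free[n] > 0]
        if not pool:
            raise RuntimeError("no free slots left")
        # Among the eligible nodes, pick the one with the most free slots.
        node = max(sorted(pool), key=lambda n: free[n])
        free[node] -= 1
        assignment[task_id] = (node, bool(local))
    return assignment


tasks = {"t1": "p1", "t2": "p1", "t3": "p2"}
block_locations = {"p1": ["node-a"], "p2": ["node-b"]}
node_capacity = {"node-a": 1, "node-b": 2}
plan = schedule(tasks, block_locations, node_capacity)
```

In this example `t1` and `t3` run where their partitions are stored, while `t2` must run remotely because `node-a` has only one slot; a shared in-memory copy of the partition, as described for the data graph reuse technique, is one way such contention could be reduced.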
Step 4) Classify the operation and maintenance alarm data and screen the related network element attributes: among the problem information of different levels contained in the large volume of alarm data, important and critical alarms are captured first, and the fault information is classified. When a fault occurs, its handling level and the services that may be affected are preliminarily judged from the alarm classification; the network element home relationship and the user capacity report of the performance system are looked up via the network element home relationship, and the home relationship, number of registered users, and coverage area attributes of the faulty network element are screened out, providing supporting information for fault-assisted decision making;
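The triage in step 4) can be sketched as follows, under stated assumptions: the severity levels in `SEVERITY_RANK`, the `Alarm` record, and the shape of the `element_info` lookup table are hypothetical and stand in for whatever alarm and performance systems are actually used. The sketch orders alarms so that critical ones come first and attaches the three screened attributes of the faulty network element.

```python
from dataclasses import dataclass

# Hypothetical severity levels; lower rank means handled earlier.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass
class Alarm:
    element_id: str
    severity: str
    message: str


def triage(alarms, element_info):
    """Order alarms by severity and attach the faulty element's attributes."""
    ordered = sorted(
        alarms, key=lambda a: SEVERITY_RANK.get(a.severity, len(SEVERITY_RANK))
    )
    report = []
    for alarm in ordered:
        info = element_info.get(alarm.element_id, {})
        report.append({
            "element": alarm.element_id,
            "severity": alarm.severity,
            "home": info.get("home"),
            "registered_users": info.get("registered_users"),
            "coverage_area": info.get("coverage_area"),
        })
    return report


alarms = [
    Alarm("ne-2", "minor", "high latency"),
    Alarm("ne-1", "critical", "link down"),
]
element_info = {
    "ne-1": {"home": "region-A", "registered_users": 1200, "coverage_area": "north"},
}
rows = triage(alarms, element_info)
```

Elements missing from the lookup table keep `None` for their attributes, so gaps in the home-relationship data stay visible rather than being silently dropped.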
Step 5) Regularize the processing steps of each fault based on the large-scale intelligent operation and maintenance knowledge graph: the processing steps of each fault are turned into rules according to the information in the historical fault database, for example by specifying which equipment must be queried after a key alarm appears and what is to be queried for the equipment of each specialty.
Step 6) Based on the knowledge graph platform of steps 1), 2) and 3), directly call entity-relation-entity objects in the intelligent operation and maintenance decision analysis module to form a one-key operation and maintenance fault diagnosis analysis report: the entity-relation-entity objects are determined through the large-scale intelligent operation and maintenance knowledge graph, and a fault diagnosis description is output; the conversion of fault diagnosis knowledge is automated, and the entity-relation-entity objects are called directly in the intelligent operation and maintenance decision analysis prototype module built on the knowledge graph platform, finally forming a one-key fault diagnosis analysis report.
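To make the entity-relation-entity output of step 6) concrete, the following minimal sketch renders a list of triples as a plain-text diagnosis report. The function name `render_diagnosis`, the report layout, and the sample triples are illustrative assumptions and are not taken from the patent.

```python
def render_diagnosis(fault, triples):
    """Render (head, relation, tail) triples as a plain-text report."""
    lines = [f"Fault diagnosis report: {fault}"]
    for head, relation, tail in triples:
        lines.append(f"- {head} --{relation}--> {tail}")
    return "\n".join(lines)


report = render_diagnosis(
    "link down on ne-1",
    [
        ("ne-1", "belongs_to", "region-A"),
        ("link down", "handled_by", "restart port"),
    ],
)
```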
To address the heavy noise and large scale of intelligent operation and maintenance knowledge graph data in cloud data centers, the invention provides an operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query. By optimizing subgraph matching, the retrieval algorithm, and distributed processing, it addresses the usability and efficiency problems of the prior art and supports intelligent operation and maintenance decision analysis.
The present invention is not limited to the above-mentioned embodiments, and based on the technical solutions disclosed in the present invention, those skilled in the art can make some substitutions and modifications to some technical features without creative efforts according to the disclosed technical contents, and these substitutions and modifications are all within the protection scope of the present invention.

Claims (4)

1. An operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query is characterized by comprising the following steps:
step 1) establishing a fault handling measure retrieval model of the knowledge graph based on a subgraph matching method: in an existing operation and maintenance knowledge graph, an operation and maintenance fault treatment measure retrieval model based on the knowledge graph is constructed through five steps of defining a retrieval graph, matching sub graphs, sub retrieval division, performing sub retrieval and connecting sub retrieval results;
step 2) checking the topological structure characteristics of the query graph and the result graph in the knowledge graph, and sorting the result graphs based on similarity calculation of graph structures and semantic information to obtain an optimal query result: similarity calculation based on graph structures is carried out on the query graph and the result subgraph, and semantic similarity calculation is carried out on semantic information between graphs through semantic feature description;
the graph-structure-based similarity and the semantic-information-based similarity are linearly superposed to obtain a final comprehensive score Score for each subgraph, and the result subgraphs are ranked by this Score to obtain the optimal query result, that is, the optimal k result graphs;
step 3) optimizing based on the Top-k query model, accelerating the query speed by using a distributed query method, and optimizing the execution efficiency of distributed knowledge graph query from two aspects of job scheduling and data storage on a distributed graph data processing platform: the computing power of the distributed environment is used to accelerate queries, and the execution efficiency of distributed knowledge graph query is optimized on the distributed graph data processing platform from the two aspects of job scheduling and data storage;
step 4), classifying the operation and maintenance alarm data and screening the related network element attributes: according to the problem information of different levels in a large amount of alarm data, important and key alarms are preferentially grabbed, and fault information is classified; when fault information occurs, preliminarily judging the processing level of the fault information and the affected service according to alarm classification, searching a network element attribution relation and a user capacity report form of a performance system through the network element attribution relation, and screening out the attribution relation, the number of registered users and the coverage area attribute according to the fault network element;
step 5) regularizing the processing steps of each fault based on the large-scale intelligent operation and maintenance knowledge graph: the processing steps of each fault are regularized according to information in a historical fault database;
step 6) directly calling an entity-relation-entity object in an intelligent operation and maintenance decision analysis module based on the knowledge graph platform in the steps 1), 2) and 3), and finally forming a one-key operation and maintenance fault diagnosis analysis report: determining an entity-relation-entity object through the large-scale intelligent operation and maintenance knowledge graph, and outputting a fault diagnosis description; the fault diagnosis knowledge conversion adopts an automatic means, and directly calls an entity-relation-entity object in an intelligent operation and maintenance decision analysis prototype module based on the knowledge graph platform to finally form a one-key fault diagnosis analysis report.
2. The operation and maintenance fault diagnosis analysis method based on subgraph matching and distributed query according to claim 1, wherein the step 1) specifically comprises:
step 1.1) defining a retrieval graph: a retrieval graph $Q = (E_Q, R_Q)$ contains a point set $E_Q$ and an edge set $R_Q$; each retrieval point corresponds to a specific entity description, and each edge represents the relationship between two points;
step 1.2) matching subgraphs: for a given knowledge graph $G = (E_G, R_G, E_G)$ and retrieval subgraph $Q = (E_Q, R_Q)$, the purpose of matching subgraphs is to find a matching subgraph $\phi(Q)$ of subgraph Q in graph G, where $\phi$ maps the points $E_Q$ of subgraph Q to points $\phi(E_G)$ in graph G and maps the edges $R_Q$ of subgraph Q to edges $\phi(R_G)$ in graph G; the subgraphs of graph G that satisfy this mapping function are defined as matching subgraphs $\phi(Q)$;
step 1.3) sub-retrieval division: dividing the retrieval graph into a plurality of sub-retrieval graphs with few vertices and a single edge feature to reduce the retrieval difficulty, and organizing each sub-retrieval graph as a two-layer tree structure so that each sub-retrieval graph comprises a root node, one layer of child nodes, and edges; obtaining the retrieval results of the sub-retrievals through layer-by-layer matching so as to obtain the retrieval result of the retrieval graph;
step 1.4) performing sub-retrieval: decomposing the sub-retrieval graph of step 1.3) into a minimum spanning tree, inputting the data graph and the divided sub-retrieval graphs, and initializing a sub-retrieval result set $D_i$; if the matching point pair set T is empty, generating a candidate matching point pair set T from the root node; if all the nodes of the sub-retrieval graph Q are contained in the set T, checking whether the edges of the graph meet the criterion, storing the results that meet the criterion into the sub-retrieval result set $D_i$, and obtaining the final result set $D_i$ after all matching is completed;
step 1.5) connecting sub-retrieval results: connecting all the sub-retrieval results obtained in step 1.4) to generate a matching subgraph; the retrieval results of two sub-retrievals $Q_i$ and $Q_j$ are connected if and only if $Q_i$ and $Q_j$ have a common vertex; the basic process of connecting the sub-retrieval results is as follows: initializing a sub-retrieval result set D; for the divided sub-retrieval set $Q_i \in (Q_1, Q_2, \dots, Q_N)$, performing every $Q_i$ according to the sub-retrieval method to obtain all sub-retrieval results; performing a hash join on all the sub-retrieval results; storing the results whose matching degree meets the threshold $\lambda$ into C and sorting them by matching degree; evaluating the retrieval results stored in C with an evaluation model to obtain their importance degree f; and returning the retrieval result set C to complete the retrieval.
3. The operation and maintenance fault diagnosis analysis method based on subgraph matching and distributed query according to claim 1, wherein the step 2) specifically comprises:
step 2.1) similarity calculation based on graph structure: carrying out quantitative analysis on the structures of the query graph and the result subgraph; it is defined that, given a node a in knowledge graph $G_1$ and a node b in knowledge graph $G_2$, if their neighbor nodes in the two graphs are similar, then node a is similar to node b; similarly, if the start points and end points of two edges are similar, the edges are similar; it is defined that the higher the similarity of the nodes and edges, the higher the degree of subgraph matching; the structural similarity is mainly measured by matrices formed by the node similarity and the edge similarity; if graph $G_1$ has i nodes and graph $G_2$ has j nodes, the similarity matrix has size $i \times j$; $x_{ab}$ denotes the similarity of node a in graph $G_1$ and node b in graph $G_2$, and $y_{cd}$ denotes the similarity of edge c in graph $G_1$ and edge d in graph $G_2$; the scores of the nodes and edges are obtained by the following formula:
Figure FDA0003407326370000031
wherein SX denotes the node similarity score matrix of graph $G_1$ and graph $G_2$, and $x_i(k)$ denotes the similarity of each point in the two graphs after k iterations; SY denotes the edge similarity score matrix of graph $G_1$ and graph $G_2$, and $y_i(k)$ denotes the similarity of each edge in the two graphs after k iterations; the structural similarity score $S_{sim}$ of the query graph and the result subgraph is obtained by averaging the point similarities and the edge similarities of the graphs and adding the two averages, according to the following formula:
Figure FDA0003407326370000041
wherein $n_1$ and $n_2$ respectively denote the numbers of nodes in graph $G_1$ and graph $G_2$, and $m_1$ and $m_2$ respectively denote the numbers of edges in graph $G_1$ and graph $G_2$;
step 2.2) similarity calculation based on semantic information: for a given query graph $G_s = (g_1, g_2, \dots, g_n)$ and result subgraph $G_r = (r_1, r_2, \dots, r_n)$, wherein $r_i$ is a triple, the similarity of the query graph and the result subgraph is represented by the likelihood estimation probability $p(G_s \mid G_r)$; the similarity is judged according to this probability and the result subgraphs are sorted accordingly; based on the likelihood estimation probability $p(G_s \mid G_r)$, the semantic similarity score
Figure FDA0003407326370000043
is calculated as follows:
Figure FDA0003407326370000042
wherein $p(g_i \mid G_r)$ denotes the probability of generating the term $g_i$ of query graph $G_s$, and is expressed as the average of the probabilities $p(g_i \mid r_j)$ with which $g_i$ is generated by the individual triple models;
step 2.3) obtaining a linear weighted similarity score: the structural similarity score $S_{sim}$ of step 2.1) and the semantic similarity score of step 2.2)
Figure FDA0003407326370000044
are fused by linear weighting to obtain the final similarity score, according to the following formula:
Figure FDA0003407326370000045
wherein $\eta$ is an adjustable parameter with a value in [0, 1], used to adjust the proportion of the two similarity scores in the comprehensive similarity score; the result subgraphs are ranked by the comprehensive similarity score to obtain the optimal query result and complete the fault retrieval.
4. The operation and maintenance fault diagnosis analysis method based on subgraph matching and distributed query according to claim 1, wherein the step 3) specifically comprises:
step 3.1) optimizing based on the Top-k query model, accelerating query speed by utilizing the computing power of the distributed environment, and calculating the distance between entities in the knowledge graph in real time by adopting a distributed breadth-first search method; a graph query optimization method based on a bounding technique is adopted to accelerate queries, wherein the exact distance is replaced by the upper and lower bounds of the distance between the entities and the optimal k result graphs are derived from the upper and lower bounds, so that the query time is reduced; the knowledge graph query algorithm is implemented in a distributed environment, and the storage scheme of the knowledge graph in the distributed environment and the interaction scheme among query tasks ensure that the distributed query algorithm can be executed in an actual environment;
step 3.2) on the distributed graph data processing platform, optimizing the execution efficiency of the distributed knowledge graph query from two aspects of job scheduling and data storage: the data loading time of the distributed graph query tasks is optimized; tasks are scheduled to the computing nodes where their data are located through a data-locality-oriented task scheduling algorithm; and through a shared-memory-based data graph reuse technique, the knowledge graph data in memory is reused by a plurality of graph query tasks.
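The hash join of sub-retrieval results recited in step 1.5) of claim 2 can be illustrated by the following minimal sketch, given only as a non-limiting example. It assumes that each sub-retrieval result is a mapping from query vertices to matched entities of the data graph; the function name `hash_join` and the sample data are hypothetical. Two partial matches are merged only when they agree on the common vertex and on any other vertex they share.

```python
from collections import defaultdict


def hash_join(results_i, results_j, shared_vertex):
    """Join two sub-retrieval result sets on the query vertex they share.

    results_i, results_j: lists of dicts mapping query vertex -> matched entity.
    shared_vertex: the query vertex common to both sub-retrievals.
    """
    # Build phase: bucket the first result set by the entity matched to the shared vertex.
    buckets = defaultdict(list)
    for match in results_i:
        buckets[match[shared_vertex]].append(match)
    # Probe phase: look up each match of the second set in the buckets.
    joined = []
    for match in results_j:
        for partner in buckets.get(match[shared_vertex], []):
            # Merge only if the two partial matches agree on every common vertex.
            if all(partner[v] == match[v] for v in partner.keys() & match.keys()):
                joined.append({**partner, **match})
    return joined


d1 = [{"alarm": "A1", "device": "D1"}, {"alarm": "A2", "device": "D2"}]
d2 = [{"device": "D1", "measure": "M1"}, {"device": "D3", "measure": "M9"}]
merged = hash_join(d1, d2, "device")
```

Filtering by the threshold $\lambda$ and ranking by matching degree would follow this join; they are omitted here because the claim does not define how the matching degree is computed.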
CN202111520430.2A 2021-12-13 2021-12-13 Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query Pending CN114186073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111520430.2A CN114186073A (en) 2021-12-13 2021-12-13 Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111520430.2A CN114186073A (en) 2021-12-13 2021-12-13 Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query

Publications (1)

Publication Number Publication Date
CN114186073A true CN114186073A (en) 2022-03-15

Family

ID=80543518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111520430.2A Pending CN114186073A (en) 2021-12-13 2021-12-13 Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query

Country Status (1)

Country Link
CN (1) CN114186073A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114465874A (en) * 2022-04-07 2022-05-10 北京宝兰德软件股份有限公司 Fault prediction method, device, electronic equipment and storage medium
CN114912637A (en) * 2022-05-21 2022-08-16 重庆大学 Operation and maintenance decision method and system for man-machine knowledge map manufacturing production line and storage medium
CN115524002A (en) * 2022-09-19 2022-12-27 国家电投集团河南电力有限公司 Running state early warning method and system for power plant rotating equipment and storage medium
CN117272170A (en) * 2023-09-20 2023-12-22 东旺智能科技(上海)有限公司 Knowledge graph-based IT operation and maintenance fault root cause analysis method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114465874A (en) * 2022-04-07 2022-05-10 北京宝兰德软件股份有限公司 Fault prediction method, device, electronic equipment and storage medium
CN114465874B (en) * 2022-04-07 2022-07-29 北京宝兰德软件股份有限公司 Fault prediction method, device, electronic equipment and storage medium
CN114912637A (en) * 2022-05-21 2022-08-16 重庆大学 Operation and maintenance decision method and system for man-machine knowledge map manufacturing production line and storage medium
CN114912637B (en) * 2022-05-21 2023-08-29 重庆大学 Human-computer object knowledge graph manufacturing production line operation and maintenance decision method and system and storage medium
CN115524002A (en) * 2022-09-19 2022-12-27 国家电投集团河南电力有限公司 Running state early warning method and system for power plant rotating equipment and storage medium
CN115524002B (en) * 2022-09-19 2023-08-22 国家电投集团河南电力有限公司 Operation state early warning method, system and storage medium of power plant rotating equipment
CN117272170A (en) * 2023-09-20 2023-12-22 东旺智能科技(上海)有限公司 Knowledge graph-based IT operation and maintenance fault root cause analysis method
CN117272170B (en) * 2023-09-20 2024-03-08 东旺智能科技(上海)有限公司 Knowledge graph-based IT operation and maintenance fault root cause analysis method

Similar Documents

Publication Publication Date Title
CN114186073A (en) Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query
WO2021189729A1 (en) Information analysis method, apparatus and device for complex relationship network, and storage medium
US11442915B2 (en) Methods and systems for extracting and visualizing patterns in large-scale data sets
CN110825769A (en) Data index abnormity query method and system
US11755284B2 (en) Methods and systems for improved data retrieval and sorting
CN111552813A (en) Power knowledge graph construction method based on power grid full-service data
US11947596B2 (en) Index machine
CN112085072A (en) Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information
CN116611546B (en) Knowledge-graph-based landslide prediction method and system for target research area
CN113254630A (en) Domain knowledge map recommendation method for global comprehensive observation results
CN113220904A (en) Data processing method, data processing device and electronic equipment
Sabarish et al. Clustering of trajectory data using hierarchical approaches
CN117744784B (en) Medical scientific research knowledge graph construction and intelligent retrieval method and system
Dong et al. Rw-tree: A learned workload-aware framework for r-tree construction
Sasi Kumar et al. DeepQ Based Heterogeneous Clustering Hybrid Cloud Prediction Using K-Means Algorithm
CN117221087A (en) Alarm root cause positioning method, device and medium
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN113821550B (en) Road network topological graph dividing method, device, equipment and computer program product
CN116011564A (en) Entity relationship completion method, system and application for power equipment
US11768857B2 (en) Methods and systems for indexlet based aggregation
CN109582806B (en) Personal information processing method and system based on graph calculation
CN113127714A (en) Logistics big data acquisition method
CN111291102A (en) High-performance scale statistical calculation method for government affair data mining
CN116578676B (en) Method and system for inquiring space-time evolution of place name
CN116993307B (en) Collaborative office method and system with artificial intelligence learning capability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination