CN117435714B

CN117435714B - Knowledge graph-based database and middleware problem intelligent diagnosis system

Info

Publication number: CN117435714B
Application number: CN202311753237.2A
Authority: CN
Inventors: 李静; 李亚运; 董钢
Original assignee: Hunan Ziweiyuan Information System Co ltd
Current assignee: Hunan Ziweiyuan Information System Co ltd
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-03-08
Anticipated expiration: 2043-12-20
Also published as: CN117435714A

Abstract

The invention relates to the technical field of natural language processing, in particular to a knowledge graph-based database and middleware problem intelligent diagnosis system. The system comprises a data acquisition module, a knowledge graph database construction module and a middleware problem intelligent diagnosis module. Training data and middleware problem text data are acquired; a Bi-LSTM relation extraction model is trained; a middleware entity set is acquired; an entity distance matrix is acquired; the occurrence frequency of the entities in the text data is analyzed and a middleware entity distance co-occurrence matrix is constructed; the words of the middleware problem sentences are converted into word vectors; a middleware entity function similarity matrix and an entity association confidence matrix are constructed; a middleware entity association degree matrix is acquired; and triples are obtained with the Bi-LSTM model to complete the construction of the knowledge graph database. Intelligent diagnosis of middleware problems is thereby completed, the quality of the database is improved, and the performance of the middleware problem intelligent diagnosis system is ensured.

Description

Knowledge graph-based database and middleware problem intelligent diagnosis system

Technical Field

The invention relates to the technical field of natural language processing, in particular to a knowledge-graph-based database and middleware problem intelligent diagnosis system.

Background

Middleware refers to a software component or service located between a client and a server. It abstracts the complexity of the underlying layers so that developers can focus on implementing business logic without concern for low-level details. However, middleware has compatibility and version problems, may become a source of security vulnerabilities in the system, is complex to configure and manage, and may depend on other components or services. When these problems occur, the security and stability of system operation are seriously affected, and the problems need to be solved in time. Therefore, an intelligent diagnosis system for middleware problems needs to be constructed so as to locate problems quickly, improve efficiency, reduce cost and improve user experience. A knowledge graph-based method can be adopted to construct such a system, with knowledge induction and relation extraction performed through natural language processing and related technologies.

In the process of constructing a knowledge graph, named entity recognition is generally used first to mark and extract the entities in the text; relationships between entities are then extracted from the text using rule-based, machine-learning or deep-learning methods; the knowledge acquired from different sources is then fused to resolve consistency problems among entities and relations, for example through de-duplication, disambiguation and entity alignment; and finally a suitable database is selected for storage. When relations between entities are extracted, the diversity and complexity of the information sources and the presence of noisy interference data make the extraction difficult, which affects the final accuracy.

Disclosure of Invention

In order to solve the technical problems, the invention aims to provide a knowledge graph-based database and middleware problem intelligent diagnosis system, which adopts the following technical scheme:

the invention provides a knowledge graph-based database and middleware problem intelligent diagnosis system, which comprises:

the data acquisition module is used for acquiring training data and middleware problem text data;

the knowledge graph database construction module is used for training a Bi-LSTM relation extraction model according to the training data; acquiring a middleware entity set based on the sentences of the middleware problem text data; acquiring an entity distance matrix according to the position information of each element of the middleware entity set; acquiring a middleware entity distance co-occurrence matrix according to the occurrence frequency of each entity and the other entities in the text data, in combination with the middleware entity set; converting the words in the middleware problem sentences into word vectors by adopting a BERT pre-training language model; acquiring a middleware entity function similarity matrix according to the cosine similarity among the word vectors; acquiring an entity association confidence matrix according to the part-of-speech distribution of the words in each entity neighborhood; acquiring a middleware entity association degree matrix according to the middleware entity function similarity matrix and the entity association confidence matrix; acquiring triples by combining the Bi-LSTM model with the middleware entity set and the middleware entity association degree matrix; and completing construction of the knowledge graph database according to the triples, each triple being taken as one data item of the knowledge graph database;

and the middleware problem intelligent diagnosis module is used for completing the intelligent diagnosis of the middleware problem based on the constructed knowledge graph database.

Further, the training the Bi-LSTM relationship extraction model according to the training data includes:

the input of the relation extraction model is a segmented sentence, and the output of the relation extraction model is entity pairs contained in each sentence and the relation between the entity pairs.

Further, the acquiring a middleware entity set based on the sentences of the middleware problem text data includes:

for each sentence of the middleware problem text, extracting the entities of the sentence by adopting a named entity recognition model, and taking the entities of each sentence as elements of the middleware entity set.

Further, the obtaining the entity distance matrix according to the position information of each element of the middleware entity set includes:

taking a set formed by positions of each entity in sentences of the middleware entity set as an entity position set; the absolute value of the difference of the positions between each entity and other entities is taken as the distance between the two entities, and the distance is taken as each element of the entity distance matrix.

Further, the obtaining the middleware entity distance co-occurrence matrix according to the frequency of each entity and other entities in the text data and combining the middleware entity set includes:

counting the number of times the ith entity and the jth entity co-occur, denoted $N_{ij}$; counting the number of times the ith entity and the jth entity each occur individually, denoted $N_i$ and $N_j$ respectively; the middleware entity distance co-occurrence matrix has the expression:

$$B_{ij} = \frac{N_{ij}}{\max(N_i, N_j) \cdot A_{ij}}$$

where $B_{ij}$ represents the element at position $(i, j)$ of the middleware entity distance co-occurrence matrix; $\max(\cdot)$ represents the maximum function; and $A_{ij}$ represents the element in the ith row and jth column of the entity distance matrix.

Further, the obtaining the functional similarity matrix of the middleware entity according to cosine similarity between word vectors includes:

matching the words of the ith entity and the jth entity by adopting the Hungarian algorithm to obtain the number $M$ of matched word pairs; the middleware entity function similarity matrix has the expression:

$$Q_{ij} = B_{ij} \cdot \frac{1}{M} \sum_{k=1}^{M} \cos\left(w_i^k, w_j^k\right)$$

where $Q_{ij}$ represents the element at position $(i, j)$ of the middleware entity function similarity matrix; $B_{ij}$ represents the element at position $(i, j)$ of the middleware entity distance co-occurrence matrix; $\cos(\cdot)$ represents the computed cosine similarity; $w_i^k$ represents the word vector of the kth matching word of the ith entity in the middleware entity set; and $w_j^k$ represents the word vector of the kth matching word of the jth entity in the middleware entity set.

Further, the obtaining the entity association confidence coefficient matrix according to the word part of speech distribution in each entity neighborhood includes:

acquiring two entities corresponding to each element of the entity distance matrix, and storing the two entities as an ith entity and a jth entity;

setting a neighborhood window; taking the ith entity and the jth entity as centers respectively, if a verb or predicate exists in the neighborhood window of an entity, the judgment coefficient of that entity is equal to 1; if no verb or predicate exists in the neighborhood window, the judgment coefficient of that entity is equal to 0;

calculating the sum of the judgment coefficient of the ith entity, the judgment coefficient of the jth entity, and 1; and taking the product of this sum and the reciprocal of the corresponding element of the entity distance matrix as the corresponding element of the entity association confidence matrix, wherein i and j represent the serial numbers of the entities.

Further, the obtaining the middleware entity association degree matrix according to the middleware entity function similarity matrix and the entity association confidence coefficient matrix includes:

and taking the product of each element of the middleware entity function similarity matrix and the corresponding element of the entity association confidence matrix as the corresponding element of the middleware entity association degree matrix.

Further, the obtaining the triples by combining the Bi-LSTM model with the middleware entity set and the middleware entity association degree matrix includes:

taking the two entities, the corresponding element of the middleware entity association degree matrix, and the corresponding middleware problem sentence as the input of the Bi-LSTM model, the output being the association relationship type between the two entities;

the association relationship types comprise: data transfer relationship, dependency relationship, cooperative relationship, control relationship, monitoring relationship, and no relationship;

and the triples are formed by two entities and corresponding association relationship types.

Furthermore, the intelligent diagnosis of the middleware problem is completed based on the constructed knowledge graph database, specifically:

designing a user interface: allowing a user to seek a solution by inputting a problem or describing a phenomenon;

analysis of problems: analyzing the problems input by the user, and extracting corresponding keywords, error information, environment configuration and other entities;

inquiring the knowledge graph: inquiring keywords in the user problem by using the constructed knowledge graph to find out entities and relations related to the user input problem;

knowledge reasoning: based on the entities and the relations in the knowledge graph, the system performs reasoning and analysis to find out reasons and solutions;

presenting results: presenting the generated solution to the user in a manner that is easy to understand and use;

providing a feedback mechanism: user feedback is collected for continuous optimization of the knowledge graph and the diagnosis system.

The invention has the following beneficial effects:

the invention mainly extracts the association relation features among the entities for the middleware problem so as to improve the accuracy in the entity relation extraction process. Firstly, training a relation extraction model by adopting general data, then taking sentences of each middleware problem as input, and inputting the sentences into the model to extract entity triples with association relations in the sentences. In order to improve the accuracy of the relation extraction model on the middleware entity, an entity distance matrix is built for each sentence, then a middleware entity distance co-occurrence matrix, a middleware entity function similarity matrix and an entity association confidence coefficient matrix are calculated, and finally a middleware entity association degree matrix is obtained; providing corresponding reference information for the model when finally extracting the triples of the relationships among the entities, obtaining the triples with higher quality, and finally constructing a database and an intelligent inquiry system based on the knowledge graph.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a knowledge graph-based database and middleware problem intelligent diagnosis system according to an embodiment of the present invention;

FIG. 2 is a flowchart of obtaining the middleware entity association degree matrix.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of the specific implementation, structure, characteristics and effects of the database and middleware problem intelligent diagnosis system based on the knowledge graph according to the invention, which are provided by the invention, with reference to the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

A specific scheme of the knowledge graph-based database and middleware problem intelligent diagnosis system provided by the invention is described below with reference to the accompanying drawings.

Referring to fig. 1, a block diagram of a knowledge-graph-based database and middleware problem intelligent diagnosis system according to an embodiment of the invention is shown, where the system includes: the system comprises a data acquisition module 101, a knowledge graph database construction module 102 and a middleware problem intelligent diagnosis module 103.

The data acquisition module 101 acquires training data of the relation extraction model, and documents of middleware problems.

This embodiment uses the DuIE2.0 entity relationship dataset as training data for the relation extraction model; DuIE2.0 is the industry's largest schema-based Chinese relation extraction dataset. The dataset is used to train a generic relation extraction model. Data content on installation, configuration, use, troubleshooting and the like is then acquired from the official documentation of middleware products; the features of the middleware problems are extracted, and triples of the middleware problems are constructed through the knowledge extraction model.

When processing Chinese text data, the text must first be segmented into words; that is, each Chinese sentence is split into several independent words by a word segmentation tool. For example, for the input [I like Chinese food], the word segmentation result is [I, like, China, dish], so the sentence is divided into 4 words, which better matches how people understand the sentence. The segmentation result may, however, vary with the sentence context, the segmentation granularity and similar factors. The Chinese word segmentation tool jieba is used here to segment all sentences. In addition, before subsequent processing, words that are common but carry no meaning for text analysis, such as "yes", are removed according to a stop word list; the stop word list adopted is the Harbin Institute of Technology stop word list.
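The stop-word filtering step described above can be sketched as follows. This is only a minimal illustration, assuming the sentence has already been segmented into a token list (for example by jieba); the helper name `remove_stop_words`, the example tokens and the tiny stop-word set are hypothetical stand-ins, not the stop word list used in the embodiment.

```python
def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word set, preserving order."""
    return [tok for tok in tokens if tok not in stop_words]


# Hypothetical, already-segmented sentence (English glosses of the tokens).
tokens = ["Tomcat", "is", "unable", "to", "connect", "the", "database"]

# Tiny illustrative stand-in for a real stop word list.
stop_words = {"is", "to", "the"}

filtered = remove_stop_words(tokens, stop_words)
```

Using a set for the stop words keeps each membership test constant-time, which matters when the list contains thousands of entries.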

At this point, the training data and the middleware problem text data have been obtained, both after word segmentation and stop-word removal.

The knowledge graph database construction module 102 extracts corresponding features according to the acquired middleware problems and constructs a knowledge graph database.

In order to construct an intelligent inquiry system for middleware problems, this embodiment constructs a knowledge graph for middleware. The core of knowledge graph construction is accurate relation extraction, that is, acquiring triples of the relations between entity pairs. Generally, all entities in a sentence are first identified with named entity recognition technology, and the relations between different entities are then extracted with a relation classification model. However, since most entity pairs are unrelated, label imbalance arises during classification, and a large amount of labeled data is required.

A relation extraction model is first trained using the public DuIE2.0 entity relationship dataset. The relation extraction model adopts a Bi-LSTM model; the training data are the segmented sentences and the entity information contained in them, including triples representing the relations between entity pairs. In the inference stage, the input is a segmented sentence and the output is the triples of relations between the entity pairs contained in the sentence.

During relation extraction, accuracy is also affected by problems such as data scarcity, ambiguity and domain specificity. To enable the relation extraction model to extract the relations among middleware problem entities more accurately, this embodiment constructs a middleware entity association degree between the entities of each middleware problem and uses it to improve the accuracy of the relation extraction model.

1) Construct the entity distance matrix of each middleware problem sentence.

For the text of each middleware-related problem, each sentence is labeled with the named entity recognition model Bi-LSTM-CRF, all named entities are extracted, and a middleware entity set $E = \{e_1, e_2, \ldots, e_n\}$ is constructed, where $e_n$ represents the nth entity occurring in the sentence. An entity position sequence corresponding to the position of each entity in the sentence is also constructed, with the expression $P = \{p_1, p_2, \ldots, p_n\}$, where $p_n$ represents the position of the nth entity in the sentence. Considering that the same entity may appear multiple times in a sentence at different positions, and in order to ensure that each entity has a corresponding position, this embodiment treats entities appearing multiple times in the same sentence as different entities.

The likelihood that a relation exists between entities differs with their positions: entities that are closer together in the text are more likely to be related, and entities that are farther apart are less likely to be related. Thus, for each sentence, an entity distance matrix $A$ is constructed from its middleware entity set $E$, specifically as follows:

$$A_{ij} = \left| p_i - p_j \right|$$

where $A_{ij}$ represents the element in the ith row and jth column of the entity distance matrix, i.e. the distance between the ith entity and the jth entity of the middleware entity set $E$; $p_i$ represents the position of the ith entity in the middleware entity set; and $p_j$ represents the position of the jth entity in the middleware entity set.
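Step 1) can be sketched in a few lines. The sketch below assumes, for simplicity, that each entity is a single token and that positions are token indices; the function names and the example sentence are illustrative and do not come from the embodiment. Repeated mentions are kept as separate entries, in line with the treatment of repeated entities described above.

```python
def entity_positions(tokens, entity_names):
    """Return (entity, position) pairs in sentence order.

    Repeated mentions of the same entity are kept as separate entries.
    """
    names = set(entity_names)
    return [(tok, idx) for idx, tok in enumerate(tokens) if tok in names]


def distance_matrix(positions):
    """Each element is the absolute difference of two entity positions."""
    return [[abs(p_i - p_j) for p_j in positions] for p_i in positions]


# Hypothetical segmented sentence; entities are assumed to be single tokens.
tokens = ["Nginx", "forwards", "requests", "to", "Tomcat", "and", "Redis"]
mentions = entity_positions(tokens, ["Nginx", "Tomcat", "Redis"])
A = distance_matrix([pos for _, pos in mentions])
```

The resulting matrix is symmetric with a zero diagonal, so later steps that divide by a distance need to skip or guard the case where an entity is paired with itself.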

2) Calculate the middleware entity distance co-occurrence matrix $B$ from the middleware entity set $E$ and the entity distance matrix $A$.

In the text of a middleware problem, the probability that a relation exists between an entity pair is related to the probability that the pair co-occurs; that is, the greater the probability that two entities co-occur in the same sentence, the more likely the entity pair has an association relation.

The middleware entity distance co-occurrence matrix is calculated by counting how often the entities occur in each document, specifically as follows:

$$B_{ij} = \frac{N_{ij}}{\max(N_i, N_j) \cdot A_{ij}}$$

where $B_{ij}$ represents the element at position $(i, j)$ of the middleware entity distance co-occurrence matrix, i.e. the entity distance co-occurrence value between the ith entity and the jth entity in the middleware entity set; $\max(\cdot)$ represents the maximum function; $N_{ij}$ represents the number of times the ith entity and the jth entity co-occur; $N_i$ and $N_j$ represent the number of times the ith entity and the jth entity occur individually, respectively; and $A_{ij}$ represents the element in the ith row and jth column of the entity distance matrix, i.e. the distance between the ith entity and the jth entity of the middleware entity set.

When the ratio of the frequency of the co-occurrence of the ith entity and the jth entity to the total occurrence frequency of the ith entity is larger, the entity distance co-occurrence value between the ith entity and the jth entity is larger, which means that the probability of the association relationship between the ith entity and the jth entity is larger, otherwise, the probability of the association relationship between the ith entity and the jth entity is smaller; meanwhile, when the two entities are closer in distribution, the entity distance co-occurrence value between the two entities is larger, the association relationship is more likely to exist between the two entities, and otherwise, the association relationship is less likely to exist between the two entities.
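The counting behind step 2) can be sketched as below. Two assumptions are made and should be read as such: an entity is counted at most once per sentence, and the co-occurrence count is combined with the distance as "co-occurrence count divided by (the larger individual count times the distance)", which matches the monotonic behaviour described above but is not confirmed as the exact expression of the embodiment. All names and the small corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(sentence_entities):
    """Count per-entity and per-pair occurrences over a list of sentences.

    Each sentence is given as a list of entity names; an entity is counted
    at most once per sentence (an assumption of this sketch).
    """
    single, pair = Counter(), Counter()
    for entities in sentence_entities:
        unique = sorted(set(entities))
        single.update(unique)
        pair.update(combinations(unique, 2))  # keys are sorted name pairs
    return single, pair


def cooccurrence_value(e_i, e_j, distance, single, pair):
    """Assumed form: N_ij / (max(N_i, N_j) * distance); 0.0 if undefined."""
    n_ij = pair[tuple(sorted((e_i, e_j)))]
    denom = max(single[e_i], single[e_j]) * distance
    return n_ij / denom if denom else 0.0


# Hypothetical corpus: the entities found in four sentences.
corpus = [
    ["Nginx", "Tomcat"],
    ["Nginx", "Tomcat", "Redis"],
    ["Tomcat", "Redis"],
    ["Nginx"],
]
single, pair = count_occurrences(corpus)
b_nginx_tomcat = cooccurrence_value("Nginx", "Tomcat", 4, single, pair)
```

With these counts, a pair that co-occurs often relative to its individual frequencies and sits close together in the sentence receives a larger value, which is the behaviour the paragraph above describes.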

3) Calculate the middleware entity function similarity matrix $Q$ from the functional similarity between middleware entities and the middleware entity distance co-occurrence matrix $B$.

During middleware entity feature extraction, it is considered that when two different entities express the same or similar functions, the entity pair may have an association relation to some degree. For example, an "order management system" and an "inventory management system" are both management systems and therefore functionally similar, and there is also a certain relation between orders and inventory. The functional similarity of entities can be expressed by their distance in the vector space, specifically as follows:

firstly, through a BERT pre-training language model, words of a middleware sentence are converted into corresponding word vectors, and each entity can obtain a corresponding entity word vector sequence, wherein words with the same semantic meaning, the same part of speech or related functions are relatively close in spatial distribution. It should be noted that the BERT pre-training language model is a known technology, and this embodiment is not repeated.

The two entities are then maximally matched. Since each entity may be composed of multiple words, calculating the functional similarity between entities can be regarded as a bipartite graph problem: the words of each entity form one set, and the two sets are maximally matched using the Hungarian algorithm, with the cosine similarity of two word vectors used as the matching criterion. The Hungarian algorithm is a known technique and is not described in detail in this embodiment.

Finally, the middleware entity function similarity matrix $Q$ is calculated, with the expression:

$$Q_{ij} = B_{ij} \cdot \frac{1}{M} \sum_{k=1}^{M} \cos\left(w_i^k, w_j^k\right)$$

where $Q_{ij}$ represents the element at position $(i, j)$ of the middleware entity function similarity matrix, i.e. the functional similarity between the ith entity and the jth entity in the middleware entity set; $B_{ij}$ represents the element at position $(i, j)$ of the middleware entity distance co-occurrence matrix, i.e. the entity distance co-occurrence value between the ith entity and the jth entity in the middleware entity set; $M$ represents the number of word pairs matched between the two entities by the Hungarian algorithm; $\cos(\cdot)$ represents the computed cosine similarity; $w_i^k$ represents the word vector of the kth matching word of the ith entity in the middleware entity set; and $w_j^k$ represents the word vector of the kth matching word of the jth entity in the middleware entity set.

The larger the entity distance co-occurrence value of two entities and the larger the average cosine similarity of their maximum matching, the more similar the two entities are in semantics and function; that is, the greater the middleware entity function similarity, the more likely an association relation exists between the entity pair.
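The matching step in 3) can be illustrated with the sketch below. It makes several assumptions: word vectors are plain lists of floats (in the embodiment they would come from BERT), the maximum matching is found by brute force over permutations as a stand-in for the Hungarian algorithm (adequate only for entities with a handful of words), and the final value is taken as the co-occurrence value multiplied by the average cosine similarity of the matched pairs. The function names are hypothetical. In practice a routine such as SciPy's `scipy.optimize.linear_sum_assignment` could replace the brute-force search.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_matching(vecs_i, vecs_j):
    """Return (index_in_i, index_in_j) pairs maximising total cosine similarity.

    Brute force over permutations; a stand-in for the Hungarian algorithm.
    """
    swapped = len(vecs_i) > len(vecs_j)
    short, long_ = (vecs_j, vecs_i) if swapped else (vecs_i, vecs_j)
    best_pairs, best_score = [], -math.inf
    for perm in permutations(range(len(long_)), len(short)):
        score = sum(cosine(short[a], long_[b]) for a, b in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(b, a) if swapped else (a, b) for a, b in enumerate(perm)]
    return best_pairs


def functional_similarity(vecs_i, vecs_j, b_ij):
    """Assumed form: b_ij times the mean cosine similarity of matched pairs."""
    pairs = best_matching(vecs_i, vecs_j)
    if not pairs:
        return 0.0
    mean_cos = sum(cosine(vecs_i[a], vecs_j[b]) for a, b in pairs) / len(pairs)
    return b_ij * mean_cos


# Toy 2-dimensional word vectors for two two-word entities (illustrative only).
entity_i = [[1.0, 0.0], [0.0, 1.0]]
entity_j = [[0.0, 1.0], [1.0, 0.0]]
pairs = best_matching(entity_i, entity_j)
q_ij = functional_similarity(entity_i, entity_j, b_ij=0.5)
```

Here the first word of one entity is matched with the second word of the other and vice versa, so the average cosine similarity is 1 and the result equals the co-occurrence value passed in.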

4) Construct the entity association confidence matrix $C$ according to the parts of speech of the context words between entity pairs.

Generally, verbs or predicate words around entities indicate how the relation between the entities arises. For example, when the words around two middleware entities include words such as "access", "monitor" or "deploy", this indicates more strongly that some association exists between the two entities.

Based on this, part-of-speech tagging is performed on each sentence through a sequence tagging model CRF. The input of the model is a sentence sequence, the output is a corresponding part-of-speech tag sequence, and the tag sequence corresponds to the words of the input sentence one by one, namely, the part of speech of each corresponding position word is represented. It should be noted that, the sequence labeling model CRF is a known technology, and will not be described in detail in this embodiment.

Then, for the entity set $E$ constructed for each sentence in step 1), a neighborhood window of the 3 surrounding words is delimited with each entity as its center; the implementer can adjust the window range according to actual conditions. An entity association confidence matrix $C$ between entity pairs is constructed according to whether verbs or predicates appear in the window, specifically as follows:

$$C_{ij} = \frac{v_i + v_j + 1}{A_{ij}}$$

where $C_{ij}$ represents the element at position $(i, j)$ of the entity association confidence matrix, i.e. the confidence that a relation exists between the ith entity and the jth entity of the middleware entity set $E$ in the sentence; $A_{ij}$ represents the element in the ith row and jth column of the entity distance matrix, i.e. the distance between the ith entity and the jth entity in the middleware entity set; and $v_i$ and $v_j$ represent the judgment coefficients of the ith and jth entities, respectively. The judgment coefficient indicates whether a verb or predicate exists in the neighborhood window of each entity in the sentence: it is set to 1 if one exists and to 0 otherwise.

In the above formula, the longer the distance between two entity words, the weaker the probability that an association relation exists between them, i.e. the lower the entity association confidence between them. When verbs or predicates exist in the neighborhood windows around the entity words, the probability that an association relation exists between the entities increases greatly; that is, when the judgment coefficients of the ith entity and the jth entity are both 1, the entity association confidence between the two entities is greatly increased.
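Step 4) can be sketched as follows, using the rule stated earlier: the sum of the two judgment coefficients and 1, multiplied by the reciprocal of the distance. The sketch assumes that part-of-speech tags are already available per token, that verbs carry the tag "v" (a jieba-style convention chosen here for illustration), that the window spans 3 tokens on each side of the entity, and that diagonal elements are left at 0.0 because the distance there is zero. None of these choices is prescribed by the embodiment.

```python
VERB_TAGS = {"v"}  # assumed tag set for verbs/predicates


def judgment_coefficient(pos_tags, entity_index, window=3):
    """Return 1 if a verb tag occurs within `window` tokens of the entity."""
    lo = max(0, entity_index - window)
    hi = min(len(pos_tags), entity_index + window + 1)
    neighbours = pos_tags[lo:entity_index] + pos_tags[entity_index + 1:hi]
    return 1 if any(tag in VERB_TAGS for tag in neighbours) else 0


def confidence_matrix(pos_tags, positions, window=3):
    """Element (i, j) is (v_i + v_j + 1) / |p_i - p_j|; diagonal stays 0.0."""
    coeffs = [judgment_coefficient(pos_tags, p, window) for p in positions]
    n = len(positions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            distance = abs(positions[i] - positions[j])
            if distance:
                matrix[i][j] = (coeffs[i] + coeffs[j] + 1) / distance
    return matrix


# Hypothetical tags for: Nginx forwards requests to Tomcat and Redis
pos_tags = ["n", "v", "n", "p", "n", "c", "n"]
positions = [0, 4, 6]  # token indices of Nginx, Tomcat, Redis
C = confidence_matrix(pos_tags, positions)
```

In this example the verb lies within the windows of the first two entities but not the third, so the first pair gets the highest numerator even though its distance is larger.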

5) Extract the middleware entity association degree matrix $D$ of each sentence from the middleware entity function similarity matrix $Q$ and the entity association confidence matrix $C$.

The middleware entity distance co-occurrence matrix $B$ of each sentence can be obtained from the co-occurrence of the entities and the distances between them, from which the middleware entity function similarity matrix $Q$ is obtained; the entity association confidence between each entity pair can be calculated through part-of-speech tagging, yielding the entity association confidence matrix $C$ based on the distances between entity pairs and the part-of-speech distribution.

Next, for each sentence, the corresponding middleware entity association degree matrix $D$ is calculated as follows:

$$D_{ij} = Q_{ij} \cdot C_{ij}$$

where $D_{ij}$ represents the element in the ith row and jth column of the middleware entity association degree matrix, i.e. the association degree between the two middleware entities; $Q_{ij}$ represents the element at position $(i, j)$ of the middleware entity function similarity matrix, i.e. the functional similarity between the ith entity and the jth entity in the middleware entity set; and $C_{ij}$ represents the element at position $(i, j)$ of the entity association confidence matrix, i.e. the confidence that a relation exists between the ith entity and the jth entity of the middleware entity set $E$ in the sentence. The flow of obtaining the middleware entity association degree matrix is shown in FIG. 2.
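The element-wise product of step 5) is short enough to show directly. The sketch below uses nested lists and made-up 2x2 values purely for illustration; the function name is hypothetical.

```python
def association_matrix(similarity, confidence):
    """Element-wise product of the function similarity and confidence matrices."""
    return [
        [q * c for q, c in zip(q_row, c_row)]
        for q_row, c_row in zip(similarity, confidence)
    ]


# Illustrative values only.
Q = [[0.0, 0.5], [0.5, 0.0]]
C = [[0.0, 0.75], [0.75, 0.0]]
D = association_matrix(Q, C)
```

Because the result is a product, a pair needs both a non-trivial functional similarity and a non-trivial confidence to obtain a high association degree.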

The triples are constructed with the Bi-LSTM relation extraction model trained on the public DuIE2.0 entity relationship dataset. The input of the Bi-LSTM model is an entity pair, its corresponding middleware entity association degree, and the context (namely the sentence itself); the output is the association relation type between the entity pair, where the types comprise data transfer relation, dependency relation, cooperative relation, control relation, monitoring relation and no relation. Finally, each entity pair and its corresponding relation form a triple [entity 1, relation, entity 2], and the knowledge graph-based database is constructed from these triples.
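The assembly of triples from classifier outputs can be sketched as below. The trained Bi-LSTM is replaced by a hypothetical stand-in, `toy_classifier`, which merely thresholds the association degree; skipping pairs labeled "no relation" and visiting each unordered pair once are also assumptions of this sketch rather than details stated in the embodiment.

```python
from typing import Callable, List, NamedTuple, Sequence

RELATION_TYPES = (
    "data transfer", "dependency", "cooperative",
    "control", "monitoring", "no relation",
)


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def build_triples(
    entities: Sequence[str],
    association: Sequence[Sequence[float]],
    sentence: str,
    classify: Callable[[str, str, float, str], str],
) -> List[Triple]:
    """Classify every entity pair and keep those with an actual relation."""
    triples = []
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            relation = classify(entities[i], entities[j], association[i][j], sentence)
            if relation != "no relation":
                triples.append(Triple(entities[i], relation, entities[j]))
    return triples


def toy_classifier(e1: str, e2: str, degree: float, sentence: str) -> str:
    """Hypothetical stand-in for the trained model: thresholds the degree."""
    return "dependency" if degree > 0.3 else "no relation"


entities = ["Nginx", "Tomcat", "Redis"]
D = [[0.0, 0.375, 0.1], [0.375, 0.0, 0.2], [0.1, 0.2, 0.0]]
sentence = "Nginx forwards requests to Tomcat and Redis"
triples = build_triples(entities, D, sentence, toy_classifier)
```

Passing the classifier as a callable keeps the triple assembly independent of how the relation model is implemented.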

The middleware problem intelligent diagnosis module 103 completes intelligent diagnosis of the middleware problem based on the knowledge graph database.

A more accurate middleware problem intelligent diagnosis system is constructed on the basis of the knowledge graph database. The steps are as follows:

1) Designing a user interface: allowing the user to seek solutions by entering questions or describing phenomena. In this embodiment, the front-end pages are built with HTML, CSS and JavaScript, user requests and logic processing are implemented with Node.js, and interaction and communication with the back-end data are implemented through a RESTful API. The implementer can design the interface according to actual requirements.

2) Analyzing the problem: analyzing the problem input by the user, and extracting the corresponding keywords, error information, environment configuration and other entities. In this embodiment, part-of-speech tagging is performed on the problem description submitted by the user in the interactive interface, named entity recognition is performed with an LSTM-CRF model, and the keywords in the sentences are then extracted with the TextRank algorithm for the subsequent knowledge graph-based query.

3) Querying the knowledge graph: the constructed knowledge graph is queried with the keywords from the user's problem to find the entities and relations related to that problem. In this embodiment, the knowledge graph describes resources using RDF and is stored in the knowledge base in the form of triples, so the RDF data are queried with SPARQL, a query language and data access protocol developed for RDF.

4) Knowledge reasoning: based on the entities and relations in the knowledge graph, the system performs reasoning and analysis to find possible causes and solutions. According to the entities and relations related to the user's problem, the associated entities and their relations are retrieved from the knowledge graph database, and a DKRL model is used to reason over and combine them to generate the final solution. It should be noted that the DKRL model is prior art and is not described in detail in this embodiment. For example, for the question "My computer cannot connect to the network; how can this be solved?", the keyword information may be "computer cannot connect to the network"; the system may infer that the possible cause is a network connection problem, and possible solutions include checking whether the network cable is plugged in, restarting the router, and checking the network settings.

5) Displaying results: the generated solution is presented to the user in a manner that is easy to understand and use. In this embodiment, the answer is given in text form; the implementer may choose another way of displaying the result.

6) Providing a feedback mechanism: user feedback is collected for continuous optimization of the knowledge graph and the diagnosis system. In this embodiment, a reinforcement learning model is trained on the questions posed by users, the results given by the system and the users' feedback, so as to update and optimize the structure of the knowledge graph. It should be noted that the reinforcement learning model is prior art and is not described in detail in this embodiment.
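Steps 2) and 3) above can be illustrated with a minimal Python sketch: a simplified, unweighted TextRank ranks the tokens of a problem description, and the top keyword is placed into a SPARQL query string. The sketch assumes the text has already been tokenized and filtered by part of speech; the function names, the namespace `http://example.org/middleware#` and the predicate `mw:name` are hypothetical and do not come from the embodiment.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(tokens: List[str], window: int = 2, top_k: int = 3,
                      damping: float = 0.85, iterations: int = 50) -> List[str]:
    """Rank tokens with a simplified, unweighted TextRank."""
    # Undirected co-occurrence graph: tokens at most `window` positions apart are linked.
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # PageRank-style iteration over the co-occurrence graph.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def build_sparql(keyword: str,
                 prefix: str = "http://example.org/middleware#") -> str:
    """Build a SPARQL query for the relations of the entity named `keyword`."""
    # Escape backslashes and double quotes so the keyword is a valid string literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX mw: <{prefix}>\n"
        "SELECT ?relation ?entity WHERE {\n"
        f'  ?subject mw:name "{safe}" .\n'
        "  ?subject ?relation ?entity .\n"
        "  FILTER (?relation != mw:name)\n"
        "}"
    )


tokens = ["tomcat", "connection", "timeout", "tomcat", "restart", "tomcat", "log"]
keywords = textrank_keywords(tokens, window=1, top_k=1)
query = build_sparql(keywords[0])
print(keywords)
print(query)
```

The resulting query string would be sent to whichever SPARQL endpoint or RDF store holds the knowledge graph; that component is outside the scope of this sketch.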

Because the intelligent middleware problem diagnosis system relies mainly on the relations between entities in the knowledge graph when searching for a solution, the triples containing those relations are the core of knowledge graph construction, and their quality directly affects the performance of the diagnosis system.

In summary, the embodiment of the invention extracts association features between entities for middleware problems in order to improve the accuracy of entity relation extraction. First, a relation extraction model is trained on general data; the sentences of each middleware problem are then fed to the model to extract the entity triples that have association relations in each sentence. To improve the accuracy of the relation extraction model on middleware entities, an entity distance matrix is built for each sentence; the middleware entity distance co-occurrence matrix, the middleware entity function similarity matrix and the entity association confidence matrix are then calculated, and finally the middleware entity association degree matrix is obtained. This matrix provides reference information for the model when it extracts the inter-entity relation triples, yielding higher-quality triples, from which the knowledge-graph-based database and the intelligent query system are finally constructed.
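The entity distance matrix built for each sentence, whose elements are the absolute differences between entity positions, can be sketched in Python as follows. The sketch assumes each entity position is given as an integer token index within the sentence; the function name and sample positions are illustrative only.

```python
from typing import List


def entity_distance_matrix(positions: List[int]) -> List[List[int]]:
    """Element (i, j) is the absolute difference between the positions
    of the i-th and j-th entities in the sentence."""
    return [[abs(p - q) for q in positions] for p in positions]


# Three entities found at token indices 0, 3 and 7 of one sentence.
positions = [0, 3, 7]
matrix = entity_distance_matrix(positions)
for row in matrix:
    print(row)
# [0, 3, 7]
# [3, 0, 4]
# [7, 4, 0]
```

The matrix is symmetric with a zero diagonal, since the distance of an entity to itself is zero.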

It should be noted that the order of the embodiments of the present invention is for description only and does not indicate their relative merits. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments.

The foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. A knowledge-graph-based database and middleware problem intelligent diagnosis system, the system comprising:

the knowledge graph database construction module is used for training a Bi-LSTM relation extraction model according to training data; acquiring a middleware entity set based on sentences of the middleware problem text data; acquiring an entity distance matrix according to the position information of each element of the middleware entity set; acquiring a middleware entity distance co-occurrence matrix according to the occurrence frequency of each entity and other entities in text data and combining the entity distance matrix; converting words in the middleware problem sentences into word vectors by adopting a BERT pre-training language model; acquiring a middleware entity function similarity matrix according to cosine similarity among word vectors and a middleware entity distance co-occurrence matrix; acquiring an entity association confidence coefficient matrix according to word part-of-speech distribution in each entity neighborhood; acquiring a middleware entity association degree matrix according to the middleware entity function similarity matrix and the entity association confidence coefficient matrix; acquiring triples by combining Bi-LSTM model with middleware entity set and middleware entity association degree; completing construction of a knowledge graph database according to the triples, specifically taking each triplet as each data of the knowledge graph database;

the middleware problem intelligent diagnosis module is used for completing intelligent diagnosis of the middleware problem based on the constructed knowledge graph database;

the obtaining the entity distance matrix according to the position information of each element of the middleware entity set comprises the following steps:

taking a set formed by positions of each entity in sentences of the middleware entity set as an entity position set; taking the absolute value of the difference value of the positions between each entity and other entities as the distance between the two entities, and taking the distance as each element of an entity distance matrix;

the obtaining the functional similarity matrix of the middleware entity according to the cosine similarity among word vectors and the co-occurrence matrix of the distance between the middleware entities comprises the following steps:

where the terms of the formula denote, respectively: the element at position (i, j) of the middleware entity function similarity matrix; the element at position (i, j) of the middleware entity distance co-occurrence matrix; the cosine similarity function; the word vector of the k-th matching word of the i-th entity in the middleware entity set; and the word vector of the k-th matching word of the j-th entity in the middleware entity set;

the obtaining the entity association confidence coefficient matrix according to the word part of speech distribution in each entity neighborhood comprises the following steps:

calculating the sum of the judging coefficient of the i-th entity, the judging coefficient of the j-th entity, and 1; taking the product of this sum and the reciprocal of the corresponding entity matrix element as the corresponding element of the entity association confidence matrix, wherein i and j represent the serial numbers of the entities;

the obtaining the middleware entity association degree matrix according to the middleware entity function similarity matrix and the entity association confidence coefficient matrix comprises the following steps:

2. The knowledge-graph-based database and middleware problem intelligent diagnosis system according to claim 1, wherein training the Bi-LSTM relationship extraction model according to training data comprises:

3. The knowledge-graph-based database and middleware problem intelligent diagnosis system according to claim 1, wherein the sentence acquisition middleware entity set based on the middleware problem text data comprises:

4. The knowledge-graph-based database and middleware problem intelligent diagnosis system according to claim 1, wherein the obtaining the middleware entity distance co-occurrence matrix according to the frequency of occurrence of each entity and other entities in text data in combination with the entity distance matrix comprises:

counting the number of times the i-th entity and the j-th entity co-occur; counting the number of times the i-th entity and the j-th entity each occur independently; the middleware entity distance co-occurrence matrix has the expression:

where the terms of the formula denote, respectively: the element at position (i, j) of the middleware entity distance co-occurrence matrix; the maximum function; and the element in the i-th row and j-th column of the entity distance matrix.

5. The knowledge-graph-based database and middleware problem intelligent diagnosis system according to claim 1, wherein the obtaining the triples by combining the Bi-LSTM model with the middleware entity set and the middleware entity association matrix comprises:

6. The knowledge-graph-based database and middleware problem intelligent diagnosis system according to claim 1, wherein the knowledge-graph-based database is constructed to complete the intelligent diagnosis of the middleware problem, specifically:

analyzing the problem: the problem input by the user is analyzed, and the corresponding keywords, error information and environment configuration are extracted as entities;