CN117909754A - Auxiliary power plant equipment defect elimination method and system based on twin neural network


Info

Publication number
CN117909754A
Authority
CN
China
Prior art keywords
defect
entity
power plant
historical
neural network
Prior art date
Legal status
Pending
Application number
CN202311864162.5A
Other languages
Chinese (zh)
Inventor
谢黎
范小兵
杜辉
谭小元
米路中
李锐
贾顺杰
李洋
张艳萍
王骞
胡岿
杨孝锐
夏晶
Current Assignee
National Energy Changyuan Suizhou Power Generation Co ltd
Wuhan University WHU
Guoneng Xinkong Internet Technology Co Ltd
Original Assignee
National Energy Changyuan Suizhou Power Generation Co ltd
Wuhan University WHU
Guoneng Xinkong Internet Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by National Energy Changyuan Suizhou Power Generation Co ltd, Wuhan University WHU, Guoneng Xinkong Internet Technology Co Ltd filed Critical National Energy Changyuan Suizhou Power Generation Co ltd
Priority to CN202311864162.5A
Publication of CN117909754A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for assisted defect elimination of power plant equipment based on a twin (Siamese) neural network. The method defines the entity types and association patterns of a power plant equipment defect knowledge graph; applies entity recognition and relation extraction from natural language processing to automatically structure the data related to historical power plant defect cases; maps entity relations into the knowledge graph with "equipment-historical defect-discovery team-defect-elimination team-allocation step" as the core context; constructs and trains the twin neural network model Siamese-BERT to compute the semantic features of, and semantic similarity between, the historical defects in the graph and the equipment and fault information implied by a sudden defect; and, after ranking by similarity, intelligently recommends a suitable defect-elimination team and defect-elimination workflow for the sudden defect. The invention alleviates problems such as low defect-handling accuracy and poor timeliness caused by differences in accumulated knowledge and the shortage of experienced front-line staff, thereby effectively raising the level of intelligence in power plant equipment defect repair.

Description

Auxiliary power plant equipment defect elimination method and system based on twin neural network
Technical Field
The invention belongs to the field of auxiliary defect elimination of power plant equipment, and particularly relates to an auxiliary defect elimination method and system of power plant equipment based on a twin neural network.
Background
Power plant text data contain rich domain knowledge and experiential knowledge: the domain knowledge concerns the specialist knowledge of equipment relevant to defect elimination, while the experiential knowledge is mainly embodied in historical defect records. How to obtain normalized information from diverse data, effectively associate and integrate multi-source heterogeneous data, and use all of this knowledge to assist the defect-elimination process of power plant equipment is the key to achieving defect-knowledge management and intelligent defect elimination in power plants.
The knowledge graph is a concept proposed by Google in 2012 to make search engines more intelligent and efficient, and has in recent years been applied in many industries, including electric power. Power plants generate large volumes of unstructured and structured data, such as equipment defect records and personnel work records, in the course of production and operation, yet this data is used inefficiently. By connecting historical data as knowledge through a knowledge graph, its value can be fully exploited, better assisting power plant staff in knowledge querying, reasoning and defect elimination.
With the development of natural language processing, semantics-based reasoning and retrieval keep improving. Word2vec and GloVe are the most commonly used word vector models; a text's semantic vector can be obtained by averaging the word vectors of its words, and similarities between texts can then be computed from those vectors, but such methods ignore polysemy, word order and context. The advent of general-domain pre-trained language models such as BERT brought a clear improvement; however, text records in the power plant domain involve many technical terms and equipment names, so semantic vectors produced directly by a general-domain BERT yield low accuracy for similarity between power plant equipment defect descriptions. The semantic features of equipment defect descriptions are therefore computed with the twin neural network model Siamese-BERT, trained on a corpus of power plant equipment defect texts, and the similarity between defect descriptions is computed from these features, providing accurate association information for recommending defect-elimination teams and defect-elimination workflows for sudden defects.
Disclosure of Invention
To remedy the deficiencies of the prior art, the invention provides a method and system for assisted defect elimination of power plant equipment based on a twin neural network model. The semantic features of equipment defect descriptions are computed with the twin neural network model Siamese-BERT, the similarity between defect descriptions is computed from these features, and accurate association information is thereby provided for recommending defect-elimination teams and defect-elimination workflows for sudden defects, solving the prior art's low accuracy in computing the similarity between power plant equipment defect descriptions.
The invention adopts the following technical scheme.
The invention provides a method for assisted defect elimination of power plant equipment based on a twin neural network model, comprising the following steps. Step one: design the pattern layer of the power plant equipment defect knowledge graph, defining the core entity types and association patterns. Step two: build a power-plant domain dictionary from the historical defect data and the equipment data, and identify the entities in the power plant defect-related data with the natural language processing tool jieba combined with the custom domain dictionary.
Step three: construct an entity-relation corpus and fine-tune the BERT model.
Step four: use the trained BERT model to recognize the relations in unstructured defect texts; for structured data, extract entity relations by direct mapping; and, following the pattern layer designed in step one, map the extracted and recognized entities and relations into the graph database Neo4j with "equipment-historical defect-discovery team-defect-elimination team-allocation step" as the core context.
Step five: construct and train the twin neural network model Siamese-BERT.
Step six: input the text description of the sudden defect; use the trained Siamese-BERT model to compute the semantic features of that description and of the equipment and fault information implied by the historical defects in the graph; compute the cosine similarity between the two sets of semantic features to obtain their semantic similarity; rank the historical defects by semantic similarity; and select the ten historical defects most similar to the sudden defect.
Step seven: extract the entity information from the text description of the sudden defect with the jieba tool and search for it, in order, through the ten historical defects of highest semantic similarity; if the same entity information is found in the first historical defect, stop searching; otherwise search the next historical defect, and so on until all ten have been searched.
Step eight: if a similar historical defect whose entity information matches that of the sudden defect is found among the top ten, recommend the defect-elimination team associated with that historical defect and automatically generate the defect-elimination workflow; otherwise, recommend the defect-elimination team associated with the most similar historical defect. After the defect-elimination team has handled the sudden defect according to the automatically or manually assisted generated defect-elimination scheme, a defect-elimination record of the sudden defect is formed.
Preferably, the core entity types in step one include: equipment, historical defect, team, allocation step;
the association patterns include: "equipment-historical defect", "historical defect-discovery team", "historical defect-defect-elimination team", "historical defect-allocation step" and "personnel-team".
Preferably, the defect data in step two include: defect ticket number, defect name, defect description, defect ticket filler information, information on the defect discoverer, notifier, notified person and handler, defect classification, and defect-elimination status.
Preferably, step three comprises:
the corpus consists of entity pairs, the relation types between the entity pairs, and the defect texts; the corpus is divided into a training set, a validation set and a test set in the ratio 3:1:1; feature vectors of the defect text, head entity and tail entity are obtained by training on this corpus, and the BERT model is fine-tuned so that it classifies the relations between entities.
Preferably, the structured defect text has the format "head entity - relation type - tail entity", the head entity and the tail entity forming an entity pair.
Preferably, classifying the relations between the entities comprises:
when fine-tuning the BERT model, the outputs after the first added fully connected layer are:
H′cls = W0[tanh(H0)] + b0 (1)
H′e1 = W1[tanh((1/(j−i+1))·Σt=i..j Ht)] + b1 (2)
H′e2 = W2[tanh((1/(m−l+1))·Σt=l..m Ht)] + b2 (3)
where H′cls, H′e1 and H′e2 are the feature vectors of [CLS], the head entity and the tail entity after the fully connected layer, [CLS] representing the feature vector of the whole defect text; tanh is the activation function; W0, W1 and W2 are the weight matrices for [CLS], the head entity and the tail entity respectively, and b0, b1, b2 the corresponding bias vectors; i and j are the start and end positions of the head entity in the text; l and m are the start and end positions of the tail entity; Ht is the feature vector of the t-th token in the text.
The obtained feature vectors are concatenated into the feature vector Hr = [H′cls, H′e1, H′e2] of the entity pair to be classified; a second fully connected layer and a Softmax function compute the probability distribution p(y | Hr, θ) over relation types, and the relation with the highest probability is taken as the relation between the two entities:
p(y | Hr, θ) = softmax(W·Hr + b) (4)
where y ranges over the entity-relation types; θ denotes the parameters to be learned, comprising the bias b and the weight matrix W ∈ R^(N×3d), N being the number of relation types and d the dimension of the feature vector output by BERT. Cross-entropy is used as the loss function during model training.
Preferably, the Siamese-BERT model is trained as follows: first, a data set is constructed from the historical defect data, consisting of labelled defect pairs; the label is 0 or 1, where 0 indicates that the two defect descriptions do not refer to the same actual defect and 1 indicates that they do. The BERT model's output vectors for all tokens of the two defect description texts are pooled to obtain their semantic representations u and v, and the probability that the two defect texts are similar is
o = softmax(Wt[u, v, |u − v|]) (5)
where the trainable weight matrix Wt ∈ R^(k×3n), n is the dimension of the sentence semantic features and k the number of labels, here 2, corresponding to the labels "similar" and "dissimilar"; cross-entropy is used as the training loss function of Siamese-BERT.
The loss value of Siamese-BERT is computed; if it is below the set threshold, training stops; otherwise the weight parameters are adjusted and training continues until the threshold is met, yielding the trained Siamese-BERT model.
The invention also provides a system for assisted defect elimination of power plant equipment based on a twin neural network, being the system used by the above method and comprising:
a pattern layer for defining the core entity types and association patterns;
the data layer is used for storing data;
The entity relation corpus is used for storing entity pairs, relation types among the entity pairs and defect texts;
an analysis layer for computing the semantic similarity between the historical defects and the sudden defect, selecting the historical defects with the highest semantic similarity, and generating the defect-elimination workflow.
The invention also provides a terminal, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor operates according to the instructions to perform the steps of the method described above.
The invention also provides a computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the aforementioned method.
Compared with the prior art, the invention provides a twin-neural-network-based method for assisted defect elimination of power plant equipment that raises the utilization of historical defect data and can effectively guide power plant staff to eliminate sudden equipment defects accurately and rapidly. Addressing the unstructured, non-standardized and weakly associated character of the historical defect data, the method defines the entity types and association patterns of a power plant equipment defect knowledge graph. The semantic features and semantic similarity of the equipment and fault information implied in defect descriptions are computed with the twin neural network model Siamese-BERT: the two text inputs share one set of parameters, a pooling operation is added on the BERT output to obtain fixed-size text semantic features, and the model is trained with a Softmax loss, i.e. Softmax first classifies whether two sentences are similar, after which the model is trained with the cross-entropy loss function. The twin neural network model strengthens the semantic features of power plant equipment defect texts and can accurately and efficiently retrieve the historical defects most similar to a sudden defect, so as to recommend a suitable defect-elimination team and defect-elimination workflow.
By applying Softmax in combination with a twin neural network to the downstream task of text semantic extraction in the electric power domain, the method and system solve the problems of text semantic feature extraction and similar-defect retrieval in that domain.
Drawings
FIG. 1 is a flow chart of the method for assisted defect elimination of power plant equipment based on a twin neural network model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. The described embodiments of the application are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are within the scope of the present application.
Embodiment 1 of the present invention provides a method for assisted defect elimination of power plant equipment based on a twin neural network model; its implementation flow is shown in FIG. 1. The method comprises the following steps:
Step one: design the pattern layer of the power plant equipment defect knowledge graph and define the core entity types and association patterns.
The association patterns include: "equipment-historical defect", "historical defect-discovery team", "historical defect-defect-elimination team", "historical defect-allocation step" and "personnel-team".
The core entity types include: equipment, historical defect, team, allocation step, etc.
The relation types include: the involvement relation, the composition relation, the allocation relation, etc.
Each entity type has two basic attributes, id and name;
the allocation step additionally has a sequence attribute indicating the execution order of the steps.
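The pattern layer just described can be rendered as a small plain-Python sketch. The entity labels, relation names and the helper function below are illustrative choices for exposition, not part of the patent:

```python
# Sketch of the knowledge-graph pattern (schema) layer of step one.
# Entity-type and relation-type names are illustrative translations.

ENTITY_TYPES = {
    "Equipment":        ["id", "name"],
    "HistoricalDefect": ["id", "name"],
    "Team":             ["id", "name"],
    "Person":           ["id", "name"],
    "AllocationStep":   ["id", "name", "sequence"],  # sequence = execution order
}

# Association patterns: (head entity type, relation type, tail entity type)
ASSOCIATION_PATTERNS = [
    ("Equipment",        "HAS_DEFECT",  "HistoricalDefect"),
    ("HistoricalDefect", "FOUND_BY",    "Team"),
    ("HistoricalDefect", "FIXED_BY",    "Team"),
    ("HistoricalDefect", "HANDLED_VIA", "AllocationStep"),
    ("Person",           "BELONGS_TO",  "Team"),
]

def is_valid_triple(head_type: str, relation: str, tail_type: str) -> bool:
    """Check that a candidate triple conforms to the pattern layer."""
    return (head_type, relation, tail_type) in ASSOCIATION_PATTERNS
```

A data-layer triple would only be admitted to the graph if `is_valid_triple` accepts its types, which is how a pattern layer constrains the data layer.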
Step two: build a power-plant domain dictionary from the defect data and the equipment data, and identify the entities in the power plant defect-related data with the natural language processing tool jieba combined with the custom domain dictionary.
The defect data include: defect ticket number, defect name, defect description, whether it is a leakage-point defect, defect ticket filler information, information (including team and department) on the defect discoverer, notifier, notified person and handler, defect classification, defect-elimination status, etc.
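To illustrate dictionary-based entity recognition, here is a minimal forward-maximum-matching sketch standing in for jieba with a custom user dictionary. The dictionary entries and the function are hypothetical; real use would load the domain dictionary with `jieba.load_userdict` and rely on jieba's own tokenizers:

```python
# Simplified stand-in for jieba with a custom power-plant domain dictionary:
# forward maximum matching against a user dictionary. Entries are illustrative.

DOMAIN_DICT = {"flushing water pump", "No. m", "strong vibration"}

def extract_entities(text: str, dictionary=DOMAIN_DICT):
    """Return dictionary entries found in the text, longest match first."""
    found, i = [], 0
    while i < len(text):
        match = None
        # try the longest possible span starting at position i
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match:
            found.append(match)
            i += len(match)
        else:
            i += 1
    return found
```

Applied to a defect description such as "strong vibration of No. m flushing water pump", this returns the device and symptom entities that the dictionary knows about.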
Step three: construct an entity-relation corpus and fine-tune the BERT model. The corpus consists of entity pairs, the relation types between them, and the defect texts. For example, for a historical defect case described by the sentence "strong vibration of No. m flushing water pump", the corpus entries are: [(flushing water pump - composition - No. m, "strong vibration of No. m flushing water pump"), (strong vibration of No. m flushing water pump - involves - No. m, "strong vibration of No. m flushing water pump"), (strong vibration of No. m flushing water pump - involves - flushing water pump, "strong vibration of No. m flushing water pump")]. The corpus is divided into a training set, a validation set and a test set in the ratio 3:1:1.
Taking (flushing water pump - composition - No. m, "strong vibration of No. m flushing water pump") as an example: "flushing water pump" is the head entity and "No. m" the tail entity, together forming the entity pair; "composition" is the relation type between the entity pair; and "strong vibration of No. m flushing water pump" is the defect text. Feature vectors of the defect text, head entity and tail entity are obtained by training on this corpus, and the BERT model is fine-tuned so that it classifies the relations between entities. When fine-tuning the BERT model, the outputs after the first added fully connected layer are:
H′cls = W0[tanh(H0)] + b0 (1)
H′e1 = W1[tanh((1/(j−i+1))·Σt=i..j Ht)] + b1 (2)
H′e2 = W2[tanh((1/(m−l+1))·Σt=l..m Ht)] + b2 (3)
where H′cls, H′e1 and H′e2 are the feature vectors of [CLS], the head entity and the tail entity after the fully connected layer, [CLS] representing the feature vector of the whole defect text; tanh is the activation function; W0, W1 and W2 are the weight matrices for [CLS], the head entity and the tail entity respectively, and b0, b1, b2 the corresponding bias vectors; i and j are the start and end positions of the head entity in the text; l and m are the start and end positions of the tail entity; Ht is the feature vector of the t-th token in the text. The obtained feature vectors are concatenated into the feature vector Hr = [H′cls, H′e1, H′e2] of the entity pair to be classified. A second fully connected layer and a Softmax function compute the probability distribution p(y | Hr, θ) over relation types, and the relation with the highest probability is taken as the relation between the two entities:
p(y | Hr, θ) = softmax(W·Hr + b) (4)
where y ranges over the entity-relation types; θ denotes the parameters to be learned, comprising the bias b and the weight matrix W ∈ R^(N×3d), N being the number of relation types and d the dimension of the feature vector output by BERT. Cross-entropy is used as the loss function in model training.
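The arithmetic of equations (1)-(4) can be traced with a toy NumPy sketch. Random weights and toy dimensions stand in for the fine-tuned BERT parameters; this mirrors only the computation, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, T = 8, 3, 12          # toy sizes: BERT dim, relation types, tokens

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-ins for BERT token outputs; H[0] plays the role of [CLS].
H = rng.standard_normal((T, d))

# Eq. (1)-(3): fully connected layer over [CLS] and the averaged
# head-entity (tokens i..j) and tail-entity (tokens l..m) vectors.
W0, W1, W2 = (rng.standard_normal((d, d)) for _ in range(3))
b0, b1, b2 = (rng.standard_normal(d) for _ in range(3))
i, j, l, m = 2, 3, 6, 8

H_cls = W0 @ np.tanh(H[0]) + b0
H_e1  = W1 @ np.tanh(H[i:j + 1].mean(axis=0)) + b1
H_e2  = W2 @ np.tanh(H[l:m + 1].mean(axis=0)) + b2

# Concatenate into H_r and classify with a second layer + softmax (eq. 4).
H_r = np.concatenate([H_cls, H_e1, H_e2])        # shape (3d,)
W   = rng.standard_normal((N, 3 * d))            # W in R^(N x 3d)
b   = rng.standard_normal(N)
p   = softmax(W @ H_r + b)                       # p(y | H_r, theta)
predicted_relation = int(np.argmax(p))
```

The argmax over `p` corresponds to "taking the relation with the highest probability as the relation between the two entities".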
Step four: use the BERT model trained in step three to recognize the relations in unstructured defect texts; for the structured data in defect tickets, extract entity relations by direct mapping; and, following the pattern layer designed in step one, map the extracted and recognized entities and relations into the graph database Neo4j, with "equipment-historical defect-discovery team-defect-elimination team-allocation step" as the core context linking entity and relation types, thereby completing the construction of the data layer of the power plant equipment defect knowledge graph.
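Mapping extracted triples into Neo4j can be sketched by generating Cypher MERGE statements in Python. The node labels and relation names are illustrative; a deployment would execute such statements through the official Neo4j driver rather than build strings by hand:

```python
# Sketch: turn an extracted (head, relation, tail) triple into a Cypher
# MERGE statement for Neo4j. Labels and relation names are illustrative.

def triple_to_cypher(head, head_label, relation, tail, tail_label):
    return (
        f"MERGE (h:{head_label} {{name: '{head}'}}) "
        f"MERGE (t:{tail_label} {{name: '{tail}'}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )

stmt = triple_to_cypher(
    "flushing water pump", "Equipment",
    "HAS_DEFECT",
    "strong vibration of No. m flushing water pump", "HistoricalDefect",
)
```

Using MERGE rather than CREATE keeps the mapping idempotent: re-importing the same historical defect record does not duplicate nodes or relations.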
Step five: construct and train the twin neural network model Siamese-BERT. The training process is as follows: first, a data set is constructed from the historical defect data, consisting of labelled defect pairs; the label is 0 or 1, where 0 indicates that the two defect descriptions do not refer to the same actual defect and 1 indicates that they do. The BERT model's output vectors for all tokens of the two defect description texts are pooled to obtain their semantic representations u and v, and the probability that the two defect texts are similar is
o = softmax(Wt[u, v, |u − v|]) (5)
where the trainable weight matrix Wt ∈ R^(k×3n), n is the dimension of the sentence semantic features and k the number of labels, here 2, corresponding to the labels "similar" and "dissimilar". Cross-entropy is used as the training loss function of Siamese-BERT.
The loss value of Siamese-BERT is computed; if it is below the set threshold, training stops; otherwise the weight parameters are adjusted and training continues until the threshold is met, yielding the trained Siamese-BERT model.
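The similarity head of equation (5) can be traced in a toy NumPy sketch. Random vectors stand in for the pooled BERT outputs u and v, and the weight matrix is untrained; the point is only the [u, v, |u − v|] construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, T = 8, 2, 10           # toy sizes: sentence-feature dim, labels, tokens

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Mean-pool per-token vectors (stand-ins for the shared-weight BERT outputs)
u = rng.standard_normal((T, n)).mean(axis=0)   # defect description 1
v = rng.standard_normal((T, n)).mean(axis=0)   # defect description 2

# Eq. (5): o = softmax(W_t [u, v, |u - v|]), with W_t of shape (k, 3n)
W_t = rng.standard_normal((k, 3 * n))
features = np.concatenate([u, v, np.abs(u - v)])
o = softmax(W_t @ features)   # o[1] ~ probability "same actual defect"
```

During training, the cross-entropy between `o` and the 0/1 pair label would be minimized; at retrieval time only the pooled representations u and v are needed for cosine similarity.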
Step six: use the trained Siamese-BERT model to compute the semantic features of the sudden defect's text description and of the equipment and fault information implied by the historical defects in the graph, and compute the cosine similarity between the two sets of semantic features to obtain their semantic similarity. After ranking, the 10 historical defects most similar to the sudden defect are obtained.
For example, if the sudden defect is described as "strong vibration of No. m flushing water pump", its semantic features are computed by the model and its semantic similarity to the historical defects in the graph is calculated; the 10 historical defects of highest similarity (in descending order) might include "large vibration of No. n flushing water pump", "large vibration of No. n industrial water pump", "large vibration of No. k water pump", "strong vibration of No. n water pump", "large noise of No. n water pump", and so on. The m, n, k and similar symbols in this example are placeholders for illustration only, not real equipment numbers or names.
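The cosine-similarity ranking of step six can be sketched in plain Python; in practice the vectors would be Siamese-BERT semantic features rather than the toy vectors shown here:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_similar(query_vec, historical, k=10):
    """historical: list of (defect_name, semantic_vector) pairs.
    Returns the names of the k most similar historical defects."""
    ranked = sorted(historical,
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

With k=10 this yields exactly the "first ten historical defects with the highest semantic similarity" that steps seven and eight operate on.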
Step seven: extracting two devices of'm' and 'flushing water pump' of the burst defect description'm' in the vibration intensity of a flushing water pump by a jieba tool loading a custom dictionary, sequentially searching'm' and 'flushing water pump' in the most similar 10 history defect case-related devices, and stopping searching if the same device is searched in the 1 st similar history defect, wherein the defect elimination group and the defect elimination flow of the similar history defect are applicable to the burst defect; if the same equipment is not searched in the 1 st similar defect, searching in the 2 nd similar historical defects, and the like, until the same equipment is searched in a certain similar historical defect or all the most similar 10 historical defects are searched.
Step eight: according to the retrieval result of the step seven, when the equipment which is the same as the burst defect is retrieved from a certain similar historical defect, recommending the defect elimination group associated with the similar historical defect and automatically generating a defect elimination flow; if the same equipment as the sudden defect is not searched in the first 10 similar historical defects, the defect eliminating mode of the first 10 similar historical defects is not suitable for the sudden defect, so that the defect eliminating flow needs to be generated manually in an auxiliary mode under the condition, but the defect eliminating group can be recommended according to the 1 st similar historical defect. And after the defect elimination group completes the processing of the burst defect according to the defect elimination scheme generated automatically or with the aid of manpower, forming a defect elimination record of the burst defect.
Embodiment 2 of the present invention provides a system for assisted defect elimination of power plant equipment based on a twin neural network, being the system used by the above method and comprising:
a pattern layer for defining the core entity types and association patterns;
the data layer is used for storing data;
The entity relation corpus is used for storing entity pairs, relation types among the entity pairs and defect texts;
an analysis layer for computing the semantic similarity between the historical defects and the sudden defect, selecting the historical defects with the highest semantic similarity, and generating the defect-elimination workflow.
Compared with the prior art, the invention provides a twin-neural-network-based method for assisted defect elimination of power plant equipment that raises the utilization of historical defect data and can effectively guide power plant staff to eliminate sudden equipment defects accurately and rapidly. Addressing the unstructured, non-standardized and weakly associated character of the historical defect data, the method defines the entity types and association patterns of a power plant equipment defect knowledge graph. The semantic features and semantic similarity of the equipment and fault information implied in defect descriptions are computed with the twin neural network model Siamese-BERT: the two text inputs share one set of parameters, a pooling operation is added on the BERT output to obtain fixed-size text semantic features, and the model is trained with a Softmax loss, i.e. Softmax first classifies whether two sentences are similar, after which the model is trained with the cross-entropy loss function. The twin neural network model strengthens the semantic features of power plant equipment defect texts and can accurately and efficiently retrieve the historical defects most similar to a sudden defect, so as to recommend a suitable defect-elimination team and defect-elimination workflow.
By applying Softmax in combination with a twin neural network to the downstream task of text semantic extraction in the electric power domain, the method and system solve the problems of text semantic feature extraction and similar-defect retrieval in that domain.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. An auxiliary power plant equipment defect elimination method based on a twin neural network model, comprising the following steps:
Step one: designing the schema layer of the power plant equipment defect knowledge graph, defining the core entity types and association patterns;
Step two: building a power plant domain dictionary from historical defect data and equipment data, and identifying the entities in power plant defect-related data with the natural language processing tool jieba combined with the custom power plant domain dictionary;
Step three: constructing an entity-relation corpus and fine-tuning a BERT model on it;
Step four: recognizing the relations in unstructured defect text with the fine-tuned BERT model, extracting entity relations from structured data by direct mapping, and, following the schema layer designed in step one, mapping the extracted entities and relations into the graph database Neo4j with "equipment - historical defect - discovery team - defect-elimination team - assignment step" as the core context;
Step five: constructing and training the twin neural network model Siamese-BERT;
Step six: inputting the text description of a sudden defect, computing its semantic features and the semantic features of the equipment and fault information of the historical defects in the graph with the trained Siamese-BERT model, taking the cosine similarity between the two feature vectors as their semantic similarity, sorting the historical defects by semantic similarity, and selecting the ten historical defects most similar to the sudden defect;
Step seven: extracting the entity information in the text description of the sudden defect with jieba and searching the ten most similar historical defects in order: if the same entity information is found in the first historical defect, the search stops; otherwise the next historical defect is searched, and so on until all ten have been searched;
Step eight: if a similar historical defect whose entity information matches that of the sudden defect is retrieved among the top ten, recommending the defect-elimination team associated with that historical defect and automatically generating the defect-elimination workflow; otherwise, recommending the defect-elimination team associated with the historical defect of highest semantic similarity; after that team handles the sudden defect according to the automatically or manually assisted generated defect-elimination scheme, a defect-elimination record of the sudden defect is formed.
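Steps six to eight describe a retrieve-then-filter loop. A minimal sketch of that loop, assuming sentence embeddings have already been produced by a trained Siamese-BERT (here they are plain NumPy vectors, and the function names are illustrative, not from the patent):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine similarity between two semantic feature vectors (step six).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_top_k(query_vec, history_vecs, k=10):
    # Rank historical defects by semantic similarity to the sudden defect
    # and keep the k most similar (k = 10 in the claim).
    sims = [cosine_similarity(query_vec, h) for h in history_vecs]
    order = np.argsort(sims)[::-1][:k]
    return [(int(i), sims[i]) for i in order]

def match_entities(query_entities, candidates, history_entities):
    # Step seven: scan the top-k candidates in similarity order and stop at
    # the first historical defect that shares an entity with the new defect.
    for idx, _sim in candidates:
        if set(query_entities) & set(history_entities[idx]):
            return idx
    return None  # no entity match: fall back to the most similar (step eight)
```

In step eight, a `None` result would trigger the fallback recommendation of the team tied to `candidates[0]`, the single most similar historical defect.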
2. The twin neural network-based auxiliary power plant equipment defect elimination method according to claim 1, wherein:
the core entity types in step one include: equipment, historical defect, team, and assignment step;
the association patterns include: "equipment - historical defect", "historical defect - discovery team", "historical defect - defect-elimination team", "historical defect - assignment step", and "personnel - team".
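Assuming the Neo4j backend named in step four of claim 1, the association patterns above could be seeded with parameterized Cypher MERGE statements along these lines (the labels and relationship-type names are illustrative assumptions, not fixed by the patent):

```python
# One (source label, relationship type, target label) triple per association
# pattern of claim 2; the English names are illustrative renderings.
ASSOCIATIONS = [
    ("Equipment", "HAS_DEFECT", "HistoricalDefect"),
    ("HistoricalDefect", "FOUND_BY", "Team"),
    ("HistoricalDefect", "ELIMINATED_BY", "Team"),
    ("HistoricalDefect", "ASSIGNED_VIA", "AssignmentStep"),
    ("Person", "MEMBER_OF", "Team"),
]

def merge_statement(src, rel, dst):
    # Build an idempotent Cypher MERGE statement for one association pattern;
    # $src_id and $dst_id are query parameters supplied at execution time.
    return (f"MATCH (a:{src} {{id: $src_id}}), (b:{dst} {{id: $dst_id}}) "
            f"MERGE (a)-[:{rel}]->(b)")
```

Each statement would be run through the Neo4j driver once the entities on both ends have been created, so repeated ingestion of the same defect record does not duplicate edges.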
3. The twin neural network-based auxiliary power plant equipment defect elimination method according to claim 1, wherein:
the defect data in step two include: the defect list number, defect name, defect description, defect list filing information, defect discoverer, notifier, notified person and handler information, defect classification, and defect elimination status.
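The defect list fields enumerated above can be carried as a single record structure; a sketch with illustrative English attribute names (the patent does not fix these names):

```python
from dataclasses import dataclass

@dataclass
class DefectRecord:
    # Fields mirror the defect data of claim 3; the attribute names are
    # illustrative renderings, not defined by the patent.
    list_number: str
    name: str
    description: str
    filing_info: str
    discoverer: str
    notifier: str
    notified_person: str
    handler: str
    classification: str
    elimination_status: str
```

Records in this shape would feed both the domain-dictionary construction of step two and the graph ingestion of step four.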
4. The twin neural network-based auxiliary power plant equipment defect elimination method according to claim 1, wherein step three comprises:
the corpus consists of entity pairs, the relation types between the entity pairs, and the defect texts; the corpus is divided into a training set, a validation set, and a test set in a 3:1:1 ratio; feature vectors of the defect text, the head entity, and the tail entity are obtained by training; and the BERT model is fine-tuned to classify the relations between entities.
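The 3:1:1 corpus division can be sketched as follows (the function name and fixed seed are illustrative; any shuffling scheme with the same proportions would do):

```python
import random

def split_corpus(samples, seed=42):
    # Split the entity-relation corpus 3:1:1 into training, validation,
    # and test sets, as described in claim 4.
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 3 // 5
    n_val = n // 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```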
5. The twin neural network-based auxiliary power plant equipment defect elimination method according to claim 4, wherein:
the structured defect text takes the format "head entity - relation type between the entity pair - tail entity", the head entity and the tail entity forming the entity pair.
6. The twin neural network-based auxiliary power plant equipment defect elimination method according to claim 5, wherein classifying the relations between entities comprises:
when the BERT model is fine-tuned, the outputs of the first added fully connected layer are:
H′cls = W0[tanh(H0)] + b0 (1)
H′e1 = W1[tanh((1/(j−i+1)) Σ_{t=i}^{j} Ht)] + b1 (2)
H′e2 = W2[tanh((1/(m−l+1)) Σ_{t=l}^{m} Ht)] + b2 (3)
where H′cls, H′e1 and H′e2 are the feature vectors of [CLS], the head entity, and the tail entity after the fully connected layer, [CLS] representing the feature vector of the defect text; tanh is the activation function; W0, W1 and W2 are the weight matrices for [CLS], the head entity, and the tail entity respectively, and b0, b1 and b2 are the corresponding bias vectors; i and j are the start and end positions of the head entity in the text; l and m are the start and end positions of the tail entity; Ht is the feature vector of the t-th word in the text, H0 being that of [CLS];
the obtained feature vectors are concatenated into the feature vector Hr = [H′cls, H′e1, H′e2] of the entity pair to be classified; a second fully connected layer and a Softmax function compute the probability distribution of Hr over the relation types:
p(y|Hr, θ) = Softmax(W·Hr + b) (4)
and the relation with the maximum probability value is taken as the relation between the two entities; here y ranges over the entity relation types, and θ is the set of parameters to be learned, comprising the bias b ∈ R^N and the weight matrix W ∈ R^{N×3d}, where N is the number of relation types and d is the dimension of the feature vectors output by BERT; cross entropy is used as the loss function during model training.
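Equations (1)-(4) can be checked numerically. The sketch below follows the R-BERT-style scheme the claim describes, assuming the head- and tail-entity features are the averaged token vectors over positions i..j and l..m; the weight shapes are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def relation_probs(H, i, j, l, m, W0, b0, W1, b1, W2, b2, W, b):
    # H: (seq_len, d) token feature matrix from BERT; H[0] is the [CLS] vector.
    # Equations (1)-(3): tanh then a fully connected layer over [CLS] and the
    # averaged head/tail entity token vectors.
    h_cls = W0 @ np.tanh(H[0]) + b0
    h_e1 = W1 @ np.tanh(H[i:j + 1].mean(axis=0)) + b1
    h_e2 = W2 @ np.tanh(H[l:m + 1].mean(axis=0)) + b2
    # Equation (4): concatenate into Hr and classify with a second layer.
    h_r = np.concatenate([h_cls, h_e1, h_e2])
    return softmax(W @ h_r + b)

def predict_relation(probs):
    # The relation with the largest probability is the prediction.
    return int(np.argmax(probs))
```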
7. The twin neural network-based auxiliary power plant equipment defect elimination method according to claim 6, wherein:
the training process of the Siamese-BERT model is as follows: first, a data set is constructed from the historical defect data, consisting of labelled defect pairs; the label is 0 or 1, where 0 indicates that the two defect descriptions are not actually the same defect and 1 indicates that they are; the output vectors of all characters in the two defect description texts are pooled by the BERT model to obtain the semantic representations u and v of the two texts, and the probability that the defect texts are similar is:
o = softmax(Wt[u, v, |u − v|]) (5)
where Wt ∈ R^{k×3n} is the weight matrix to be trained, n is the dimension of the sentence semantic features, and k is the number of labels, with value 2, corresponding to the 'similar' and 'dissimilar' classes; cross entropy is used as the training loss function of Siamese-BERT;
the loss value of Siamese-BERT is calculated; if it is below the set threshold, training stops; if not, the weight parameters are adjusted and training continues until the threshold criterion is met, yielding the trained Siamese-BERT model.
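Equation (5) and the cross-entropy loss can be exercised directly. The sketch below assumes pooled embeddings u, v of dimension n = 3 and k = 2 classes; in the test, Wt is a hand-built illustration whose row 0 responds to the |u − v| features, i.e. the "not the same defect" label 0:

```python
import numpy as np

def siamese_probs(u, v, Wt):
    # Equation (5): classify a defect pair from the concatenation
    # [u, v, |u - v|]; u, v are the pooled sentence embeddings of the
    # two defect descriptions, Wt has shape (k, 3n).
    feats = np.concatenate([u, v, np.abs(u - v)])
    z = Wt @ feats
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    # Training loss for one pair; label 1 = same defect, 0 = different.
    return float(-np.log(probs[label]))
```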
8. A twin neural network-based auxiliary power plant equipment defect elimination system for carrying out the twin neural network-based auxiliary power plant equipment defect elimination method of any one of claims 1-7, comprising:
a schema layer for defining the core entity types and association patterns;
a data layer for storing data;
an entity-relation corpus for storing entity pairs, the relation types between the entity pairs, and defect texts;
an analysis layer for computing the semantic similarity between the historical defects and the sudden defect, selecting the historical defect with the highest semantic similarity, and generating the defect-elimination workflow.
9. A terminal comprising a processor and a storage medium, characterized in that:
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1-8.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-8.
CN202311864162.5A 2023-12-28 2023-12-28 Auxiliary power plant equipment defect elimination method and system based on twin neural network Pending CN117909754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311864162.5A CN117909754A (en) 2023-12-28 2023-12-28 Auxiliary power plant equipment defect elimination method and system based on twin neural network


Publications (1)

Publication Number Publication Date
CN117909754A 2024-04-19



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination