CN117708280A

CN117708280A - Knowledge graph-based intelligent retrieval method and system for power transmission work ticket

Info

Publication number: CN117708280A
Application number: CN202311588704.0A
Authority: CN
Inventors: 徐勇; 李军; 韩波; 陈诚; 赵三虎; 万宇; 刘远伟; 包成; 林芝茂
Original assignee: Yangzhou Power Supply Branch Of State Grid Jiangsu Electric Power Co ltd
Current assignee: Yangzhou Power Supply Branch Of State Grid Jiangsu Electric Power Co ltd
Priority date: 2023-11-27
Filing date: 2023-11-27
Publication date: 2024-03-15
Anticipated expiration: 2043-11-27
Also published as: CN117708280B

Abstract

The invention discloses an intelligent retrieval method and system for a power transmission work ticket based on a knowledge graph, and particularly relates to the technical field of knowledge graphs.

Description

Knowledge graph-based intelligent retrieval method and system for power transmission work ticket

Technical Field

The invention relates to the technical field of big data, in particular to an intelligent retrieval method and system for a power transmission work ticket based on a knowledge graph.

Background

With the development of the power industry, transmission voltage levels keep rising and the wiring of the power grid at every level grows more complex. Traditional manual ticket issuing is slow and error-prone, and cannot meet the requirements of modern work. Compared with manual ticket issuing, a work ticket system combined with computer technology offers short issuing time, standardized tickets and assured safety, and is an important way of realizing automation and informatization in the power industry.

How to process power transmission work tickets and retrieve them quickly is the problem to be solved.

Disclosure of Invention

In order to overcome the defects in the prior art, the embodiment of the invention provides an intelligent power transmission work ticket searching method based on a knowledge graph, which improves the searching accuracy and efficiency by constructing a knowledge graph covering related information of the power transmission work ticket so as to solve the problems in the background art.

In order to achieve the above purpose, the invention provides a power transmission work ticket intelligent retrieval method based on a knowledge graph, which specifically comprises the following steps:

101. establishing an entity and attribute relationship thereof by constructing a knowledge graph covering relevant information of the power transmission work ticket;

102. entity identification and extraction are carried out on the transmission work ticket text, and the transmission work ticket text corresponds to the entity in the knowledge graph;

103. analyzing the query statement input by the user, and identifying keywords, entities and attributes thereof;

104. using the relation and attribute information between the entities in the knowledge graph to find the relevant knowledge points of the power transmission working ticket matched with the query condition;

105. generating a power transmission work ticket list meeting the requirements according to the query result, and sequencing the result according to a sequencing strategy;

106. and presenting the basic information of the work ticket to the user according to the ordered query result in a list format.

In a preferred embodiment, in step 101, information related to a power transmission working ticket is obtained from a data source by using a programming technology, the power transmission working ticket data is converted into an RDF model, the power transmission working ticket data is analyzed, an entity, a relation and an attribute triplet are built by traversing the power transmission working ticket data, and a knowledge graph covering the information related to the power transmission working ticket is constructed, which specifically includes the following contents:

s1, converting power transmission work ticket data into an RDF model, analyzing the power transmission work ticket data, determining entities needing to be converted, including equipment, work tasks, personnel, work tickets, auditors, substations and power transmission line entities, and assigning unique URI identifiers to each entity and attribute, wherein the corresponding related attributes comprise work ticket numbers, work contents and executive names;

s2, building a triplet by traversing the power transmission work ticket data, and converting each entity and attribute into a triplet composed of a subject, a predicate and an object in the RDF model, wherein the subject is a URI of a corresponding entity, the predicate is a URI of a corresponding attribute, and the object is a URI of an associated entity;

s3, constructing a power knowledge graph according to the information of the entity, the relation and the attribute, organizing the power knowledge graph in the form of nodes and edges to form a directed graph structure, and quickly retrieving the related nodes and edges according to conditions by establishing indexes in the entity and the relation in the knowledge graph, wherein the specific steps are as follows:

step 1, entity index: selecting a plurality of attributes as index keys, and mapping the values of the attributes with the entity nodes to quickly locate related entity nodes;

step 2, relation index: and selecting a plurality of attributes as index keys, and mapping the values of the attributes and the relationship edges to quickly locate the related relationship edges.
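The triple construction and the two indexes described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the namespace `http://example.org/ticket/`, the field names (`number`, `content`, `executor`, `line`) and the helper names are all hypothetical, and literal attribute values such as the work content are stored as plain strings rather than URIs.

```python
from collections import defaultdict

BASE = "http://example.org/ticket/"  # hypothetical namespace, not from the source


def uri(kind, local_id):
    """Build a unique URI for an entity or attribute (illustrative scheme)."""
    return f"{BASE}{kind}/{local_id}"


def build_triples(tickets):
    """Traverse ticket records and emit (subject, predicate, object) triples."""
    triples = []
    for t in tickets:
        subject = uri("workTicket", t["number"])
        # Attribute triple: the object is a literal string (an assumption).
        triples.append((subject, uri("property", "workContent"), t["content"]))
        # Relation triples: the object is the URI of an associated entity.
        triples.append((subject, uri("property", "executedBy"), uri("person", t["executor"])))
        triples.append((subject, uri("property", "onLine"), uri("transmissionLine", t["line"])))
    return triples


def build_indexes(triples):
    """Entity index: (predicate, value) -> subjects. Relation index: predicate -> edges."""
    entity_index = defaultdict(set)
    relation_index = defaultdict(list)
    for s, p, o in triples:
        entity_index[(p, o)].add(s)
        relation_index[p].append((s, o))
    return entity_index, relation_index
```

Looking up `entity_index[(predicate, value)]` then returns the matching entity nodes directly instead of scanning every triple.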

In a preferred embodiment, in step 102, entity recognition and extraction are performed on the transmission ticket text, and the entity in the text is recognized, including the equipment name, the task, and the staff, and corresponds to the entity in the knowledge graph, which specifically includes the following contents:

s1, entity identification: the method comprises the steps of taking an input text and a corresponding word segmentation label and a labeled Chinese word segmentation data set as training data, extracting features from the training data, wherein the features comprise word vectors, parts of speech and context information which are used for training a model, and carrying out entity recognition, and specifically comprises the following steps:

step 1, data preparation: the input text and the corresponding word segmentation labels are used for representing the boundary position of each word, B represents the initial position of the word, I represents the middle position of the word, O represents the outside of the word, and a marked Chinese word segmentation data set is obtained and used as training data;

step 2, feature extraction: features are extracted from the training data, including word vectors, parts of speech and context information, and are used to train the model; words are converted into vector representations using a word embedding technique. Two word vectors x and y are taken at random, where x is expressed as $(x_1, x_2, \dots, x_n)$ and y is expressed as $(y_1, y_2, \dots, y_n)$. The similarity between x and y is represented by the straight-line distance between the two vectors, and the specific calculation formula is as follows:

$$P = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad |X| = \sqrt{\sum_{i=1}^{n} x_i^2}$$

wherein P represents the Euclidean distance between the point $(x_1, x_2, \dots, x_n)$ and the point $(y_1, y_2, \dots, y_n)$; |X| is the Euclidean distance from the point $(x_1, x_2, \dots, x_n)$ to the origin; the closer the calculated Euclidean distance P is to 0, the more similar the two vectors are;
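The B/I/O boundary labels and the straight-line distance described above can be illustrated with a short sketch. The function names and the `(text, is_word)` input shape are assumptions made for this example; the source does not specify how segments are represented.

```python
import math


def bio_labels(segments):
    """Return one boundary label per character.

    segments is a list of (text, is_word) pairs (hypothetical input shape):
    B marks the first character of a word, I the following characters,
    and O every character outside a word.
    """
    labels = []
    for text, is_word in segments:
        if not text:
            continue
        if is_word:
            labels.extend(["B"] + ["I"] * (len(text) - 1))
        else:
            labels.extend(["O"] * len(text))
    return labels


def euclidean_distance(x, y):
    """Straight-line distance between two word vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

A distance of 0 means the two vectors coincide; larger values indicate less similar words.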

s2, extracting entity relation: the perceptron machine learning algorithm is selected, and a model is constructed and trained; during training, the extracted features are used as the input of the model, predicted boundary labels are output, and the relations among entities are predicted, which specifically comprises the following steps:

step 1, a perceptron receives an input vector x, performs linear weighted summation on the input vector x and a weight vector w, judges an output result through an activation function, and a linear weighted summation formula is specifically as follows:

$$S = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

wherein S represents the result of the summation, $w_1, w_2, \dots, w_n$ represent the weights, and $x_1, x_2, \dots, x_n$ represent the corresponding input values.

A step function is a commonly used activation function that maps an input value to one of two discrete output values. In its standard form, with the critical point at 0, it is defined as follows:

$$f(S) = \begin{cases} 1, & S \ge 0 \\ 0, & S < 0 \end{cases}$$

The step function changes abruptly when the input crosses the critical point, jumping from 0 to 1 or from 1 to 0.

Step 2, initializing a weight vector w and a bias b, and calculating a predicted output value for each sample (x, y), wherein the specific formulas are as follows:

$$\hat{y} = \operatorname{sign}(w \cdot x + b)$$

$$w = w + \eta \times y \times x$$

$$b = b + \eta \times y$$

wherein x represents the input feature vector, y represents the label (1 or -1), and η represents the learning rate. If the prediction is correct ($\hat{y} = y$), training continues with the next sample; if the prediction is incorrect ($\hat{y} \ne y$), the weight vector and bias are updated using the last two formulas, and the update is repeated on the current sample until the prediction is correct;
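The perceptron update rule above can be sketched in a few lines. This is an illustrative version under stated assumptions: a score of exactly 0 is mapped to +1, and instead of repeating the update on one sample until it is classified correctly, the sketch sweeps the whole training set repeatedly until a sweep produces no errors (the common textbook variant). The names `sign`, `predict` and `train_perceptron` are hypothetical.

```python
def sign(value):
    # Assumption: a score of exactly 0 is treated as the positive class.
    return 1 if value >= 0 else -1


def predict(w, b, x):
    """y_hat = sign(w . x + b)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """samples is a list of (feature_vector, label) pairs with label in {1, -1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in samples:
            if predict(w, b, x) != y:
                # Misclassified: w = w + eta * y * x, b = b + eta * y.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                errors += 1
        if errors == 0:  # every sample is classified correctly
            break
    return w, b
```

For linearly separable data the loop stops as soon as one full sweep makes no mistakes; `max_epochs` only bounds the run time when the data are not separable.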

s3, the entities and relations extracted from the text are represented in structured form and stored in a relational database, i.e. a database management system based on the relational model, in which data are organized and stored in tables, each consisting of rows and columns.

In a preferred embodiment, in step 103, the query sentence input by the user is parsed, the keywords, the entities and the attributes thereof are identified, the model parameters are updated by adopting a back propagation algorithm, and the intention of the query is accurately predicted, which specifically includes the following contents:

s1, data preprocessing: preprocessing a text, and removing redundant spaces, punctuation marks and special characters to obtain a text data set;
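A minimal sketch of this cleaning step, assuming a regular-expression approach (the source does not name a specific technique). The function name `preprocess` is hypothetical. Note that `\w` in Python matches Unicode letters, digits and the underscore, so Chinese characters are kept while punctuation and other symbols are dropped.

```python
import re


def preprocess(query):
    """Strip punctuation and special characters, then collapse repeated spaces."""
    no_punct = re.sub(r"[^\w\s]", " ", query)
    return " ".join(no_punct.split())
```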

s2, grammar analysis: dividing a query sentence into individual words, labeling each word with the part of speech of the word, including verbs, nouns and adjectives, updating model parameters by adopting a back propagation algorithm, and accurately predicting the intention of the query, wherein the method specifically comprises the following steps:

step 1, forward propagation: the text data is input into a CNN model, the output of each layer is calculated and stored layer by layer, and the final prediction result is calculated, wherein the specific calculation formula is as follows:

$$\hat{y} = \arg\max_{x} f(x)$$

wherein $\hat{y}$ represents the final prediction result, the argmax function returns the value of x at which f(x) takes its maximum value, and f(x) represents the output of the CNN model;

step 2, calculating a loss function: the predicted result is compared with the real label, and the value of the loss function is calculated, wherein the specific calculation formula is as follows:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - h_\theta(x_i)\right)^2$$

wherein MSE represents the mean squared error, Σ represents the sum over all n samples, and $(y_i - h_\theta(x_i))^2$ represents the square of the difference between the true value $y_i$ and the predicted value $h_\theta(x_i)$;

step 3, back propagation: starting from the output layer, the gradient is passed backward layer by layer using the chain rule, and the gradient of each layer is calculated from the parameter weights and the derivative of the activation function. Let the network consist of L layers from the input layer to the output layer, where the input of layer l is $a^{[l-1]}$ and its output is $a^{[l]}$, and the loss is a function of the output $a^{[L]}$ of the output layer, $Loss = f(a^{[L]})$; the specific calculation formula is as follows:

$$\frac{\partial Loss}{\partial a^{[l-1]}} = \frac{\partial a^{[l]}}{\partial a^{[l-1]}} \cdot \frac{\partial Loss}{\partial a^{[l]}}$$

wherein $\partial a^{[l]} / \partial a^{[l-1]}$ is the derivative of the activation function of layer l with respect to its input, and $\partial Loss / \partial a^{[l]}$ is the gradient passed back from layer l+1;

step 4, parameter updating: the model parameters are updated according to the calculated gradients, which are multiplied by a learning rate to control the update magnitude and avoid excessively large update steps;

step 5, repeating training: repeating steps 1 to 4, and performing multiple iterations by using different training samples until the set training round number is reached.
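The five training steps above (forward pass, loss, back propagation, parameter update, repetition) can be illustrated on a deliberately tiny model. This sketch swaps the CNN for a single sigmoid unit so that the chain rule stays readable; it is not the model of the invention, and the names `forward`, `mse` and `train` as well as the default learning rate and epoch count are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Forward pass of one sigmoid unit: a = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mse(w, b, samples):
    """Mean squared error over all (x, y) samples."""
    return sum((y - forward(w, b, x)) ** 2 for x, y in samples) / len(samples)


def train(samples, lr=0.5, epochs=200):
    n = len(samples)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):                       # step 5: repeat
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in samples:
            a = forward(w, b, x)                  # step 1: forward pass
            dloss_da = -2.0 * (y - a) / n         # step 2: derivative of the MSE
            da_dz = a * (1.0 - a)                 # derivative of the sigmoid
            delta = dloss_da * da_dz              # step 3: chain rule
            for j, xj in enumerate(x):
                grad_w[j] += delta * xj
            grad_b += delta
        # step 4: gradient scaled by the learning rate
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

With all parameters initialized to 0 the unit outputs 0.5 for every input, so the loss before training on binary labels is 0.25; training should bring it below that value.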

In a preferred embodiment, in step 104, according to the query condition obtained by analysis, query is performed in a knowledge graph, and the related knowledge points of the power transmission working ticket matched with the query condition are found by using the relationship and attribute information between the entities in the knowledge graph, which specifically includes the following contents:

s1, entity relation expansion: according to the relativity of the power transmission working tickets, carrying out relation expansion on the found entities, searching the entities related to the power transmission working tickets, gradually expanding the relation between the entities related to the power transmission working tickets, extracting richer associated knowledge points, and specifically comprising the following steps:

step 1, analyzing the relationship types among entities: through statistics of the relation between the entities in the knowledge graph, the relation types between the found entities and the power transmission work ticket are analyzed, and the method specifically comprises the following steps:

applicant/writer: the applicant/writer of the power transmission working ticket is responsible for submitting the relevant information of the application and filling of the working ticket;

approver/auditor: the approver/auditor of the power transmission working ticket performs approval and audit on the content of the power transmission working ticket and decides whether to approve the working ticket;

executor/operator: personnel actually executing the power transmission work perform the work according to the instructions in the work ticket and ensure that the task is completed according to a specified program;

step 2, searching and discovering new associated entities: searching for other entities associated with the found entity based on the known relationship type;

step 3, expanding a relation path: according to the searched new entity, associating the new entity with the known entity to form a new relation path;

step 4, screening and verifying relation: screening and verifying the expanded relationship according to the domain knowledge, removing irrelevant relationship, and ensuring that the expanded relationship has rationality and accuracy;
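The four expansion steps above can be sketched as a breadth-first traversal over the graph edges. This is one possible reading, not the method fixed by the source: the relation names (`appliedBy`, `approvedBy`, `executedBy`), the `max_hops` limit and the function name `expand_relations` are hypothetical, and the screening of step 4 is reduced to an allow-list of relation types.

```python
from collections import deque


def expand_relations(edges, start, allowed_relations, max_hops=2):
    """Return {entity: relation path from start} for entities reachable from start.

    edges is a list of (subject, relation, object) triples. Only relation
    types in allowed_relations are followed (a simple stand-in for the
    domain-knowledge screening step).
    """
    adjacency = {}
    for s, r, o in edges:
        adjacency.setdefault(s, []).append((r, o))
        adjacency.setdefault(o, []).append((r, s))  # follow edges in both directions
    visited = {start}
    paths = {}
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in adjacency.get(node, []):
            if relation not in allowed_relations or neighbor in visited:
                continue
            visited.add(neighbor)
            paths[neighbor] = path + [(node, relation, neighbor)]
            queue.append((neighbor, paths[neighbor]))
    return paths
```

Starting from one work ticket, the first hop reaches its applicant, approver and executor; the second hop reaches other tickets that share one of those people.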

s2, filtering attributes: according to the attribute limit in the query condition, filtering out the entity and relation meeting the condition, sending the query request to the knowledge graph, obtaining the data containing the related entity and the attribute thereof, extracting the attribute value corresponding to the attribute condition from the query result, matching the extracted attribute value by the Trie algorithm, judging whether the attribute condition is met, filtering out the entity meeting the condition according to the result of the attribute matching, retaining the successfully matched entity, and removing the entity not meeting the attribute condition, which comprises the following steps:

step 1, trie construction: each string is split into individual characters, and the nodes of the tree are built in character order; starting from the root node, each node represents one character, and the path from the root node to a leaf node spells a complete string;

step 2, node structure: each node comprises a character, an array of pointers to its child nodes, and a flag marking whether the node is the end of a string;

step 3, insert operation: starting from the root node, nodes are inserted step by step following the character order of the string; if the child node for a character already exists, no new node is created and insertion continues downward from that child node, otherwise a new child node is created; this continues until the last character of the string, whose node is marked as the end of a string;

step 4, search operation: starting from the root node, nodes are matched step by step following the character order of the target string; the query succeeds if all characters are matched and the node of the last character is marked as the end of a string;

step 5, prefix matching: to find all strings with a specified prefix, the prefix is matched from the root node until its last character; every string stored in the subtree below that node has the prefix.
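The trie operations described above (node structure, insertion, exact search and prefix matching) can be sketched as follows. The class and method names are illustrative, and a dictionary is used for the child pointers instead of a fixed array, which is an assumption made to handle arbitrary characters.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_end = False   # True if a stored string ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # Reuse an existing child; create a node only when it is missing.
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def _walk(self, text):
        """Follow text from the root; return the last node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        """Exact match: every character matches and the last node ends a string."""
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """Return all stored strings that begin with prefix, sorted."""
        node = self._walk(prefix)
        if node is None:
            return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)
```

In the attribute-filtering step, the attribute values extracted from the query result would be inserted into the trie, and each attribute condition checked with `search` (exact match) or `starts_with` (prefix match).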

In a preferred embodiment, in step 105, a power transmission ticket list meeting requirements is generated according to the query result, the query result is filtered according to the requirements, and the ticket data meeting the requirements are screened, and the results are ordered, which specifically includes the following contents:

s1, acquiring a query result: the queried power transmission work ticket data is acquired from a database, ensuring that the data contains the required fields, including the work ticket number, the work content and the responsible person;

s2, filtering data: filtering the query result according to the requirement, selecting only the work ticket data meeting the conditions, and screening according to the date range, the work content keywords and the responsible person conditions, wherein the method specifically comprises the following steps of:

date range: comparing the date of the work ticket with the appointed starting date and ending date, and selecting only the work ticket data within the range;

work content keywords: searching the work ticket data containing specific keywords in the work content for screening;

responsible person: selecting work ticket data of a specific responsible person according to the name of the responsible person;

s3, sequencing data: and sequencing the filtered work ticket data, and arranging the work ticket data in ascending order according to the work ticket number and date fields.
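The filtering and sorting described above can be sketched as follows. The `WorkTicket` record and its field names are hypothetical, chosen only to mirror the fields named in the text (ticket number, date, work content, responsible person).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WorkTicket:
    number: str
    work_date: date
    content: str
    responsible: str


def filter_and_sort(tickets, start, end, keyword=None, responsible=None):
    """Keep tickets inside [start, end] that match the optional keyword and
    responsible person, then sort ascending by ticket number and date."""
    selected = [
        t for t in tickets
        if start <= t.work_date <= end
        and (keyword is None or keyword in t.content)
        and (responsible is None or t.responsible == responsible)
    ]
    return sorted(selected, key=lambda t: (t.number, t.work_date))
```

Passing `None` for `keyword` or `responsible` disables that condition, so the same function covers date-only queries.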

In a preferred embodiment, in step 106, a request is sent through an API interface, a ranked query result is obtained, response data returned by the API is parsed, information to be displayed is extracted, and the ranked query result is presented with basic information of a work ticket according to a list format and is displayed to a user, and specifically includes the following contents:

s1, sending an API request and obtaining a result: an API request is sent, request parameters are constructed according to documents and requirements of the API, the request is ensured to be sent to a correct URL, and a correct HTTP method is designated;

s2, analyzing response data returned by the API: analyzing according to the data format returned by the API, and converting the response data into a list;

s3, displaying the query result to the user: presenting basic information of the work ticket to the user according to the ordered query result and the list format, wherein the method specifically comprises the following steps:

step 1, determining the header of a list: determining the title of each column in the list, and displaying the title in the final list;

step 2, constructing a data line of a list: traversing the ordered query results, acquiring the related information of each work ticket row by row, and organizing the data into a row according to the sequence of the table head;

step 3, outputting a list: and outputting the constructed data lines in sequence, and displaying a work ticket list on a console by using a table form.
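The parsing and list output described above can be sketched as follows. The response shape (a JSON object with a `tickets` array) and the column keys are assumptions, since the source does not specify the API; no HTTP request is made here, and the response body is passed in as a string.

```python
import json


def parse_response(body):
    """Convert a JSON response body into a list of dicts (assumed shape)."""
    payload = json.loads(body)
    return payload["tickets"]


def render_table(rows, columns):
    """Render rows as a text table.

    columns is a list of (key, title) pairs that fixes the header and the
    order of the cells in every data line.
    """
    widths = []
    for key, title in columns:
        cells = [str(row.get(key, "")) for row in rows]
        widths.append(max([len(title)] + [len(c) for c in cells]))
    header = " | ".join(title.ljust(w) for (_, title), w in zip(columns, widths))
    separator = "-+-".join("-" * w for w in widths)
    lines = [header, separator]
    for row in rows:
        lines.append(
            " | ".join(str(row.get(key, "")).ljust(w) for (key, _), w in zip(columns, widths))
        )
    return "\n".join(lines)
```

Calling `print(render_table(rows, columns))` writes the work ticket list to the console, one ticket per line under the header.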

An intelligent power transmission work ticket retrieval system based on a knowledge graph comprises:

the knowledge graph construction module is used for constructing a knowledge graph covering the related information of the power transmission working ticket and establishing an entity and attribute relationship thereof;

the entity identification and extraction module is used for carrying out entity identification and extraction on the transmission work ticket text and corresponds to the entity in the knowledge graph;

the query analysis module is used for analyzing the query statement input by the user and identifying the keywords, the entities and the attributes thereof;

the knowledge graph query module is used for finding out related knowledge points of the power transmission work ticket matched with the query condition by utilizing the relation and attribute information among the entities in the knowledge graph;

the result generation and sorting module is used for generating a power transmission work ticket list meeting the requirements according to the inquired result and sorting the result according to a sorting strategy;

and the visual display module is used for displaying the basic information of the work ticket to the user according to the list format of the ordered query result.

In operation, the invention acquires information related to power transmission work tickets from a data source, traverses the work ticket data to establish entity, relation and attribute triples, and constructs a knowledge graph covering that information. Entity identification and extraction are performed on the work ticket text, and the extracted entities are mapped to entities in the knowledge graph. The query statement input by the user is parsed, with model parameters updated by a back propagation algorithm so that the query intention is predicted accurately. Using the relation and attribute information between entities in the knowledge graph, the knowledge points related to the work tickets that match the query conditions are found; a list of work tickets meeting the requirements is generated from the query results, sorted, and presented to the user as basic work ticket information in a list format. This enables quick retrieval over a large volume of power transmission work ticket data and improves retrieval efficiency.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a functional block diagram of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

The embodiment provides an intelligent power transmission work ticket retrieval method based on a knowledge graph, as shown in fig. 1, which specifically comprises the following steps:

further, the related information of the power transmission working ticket is obtained from a data source by utilizing a programming technology, the power transmission working ticket data is converted into an RDF model, the power transmission working ticket data is analyzed, the power transmission working ticket data is traversed to establish entity, relation and attribute triplets, and a knowledge graph covering the related information of the power transmission working ticket is constructed, and the method specifically comprises the following steps:

further, entity identification and extraction are carried out on the transmission work ticket text, and the entity in the text is identified, wherein the entity comprises a device name, a work task and a worker, and corresponds to the entity in the knowledge graph, and the entity identification method specifically comprises the following steps:

further, analyzing the query sentence input by the user, identifying the keyword, the entity and the attribute thereof, updating the model parameter by adopting a back propagation algorithm, and accurately predicting the intention of the query, wherein the method specifically comprises the following steps:

further, according to the query conditions obtained by analysis, querying is performed in the knowledge graph, and the related knowledge points of the power transmission working ticket matched with the query conditions are found by utilizing the relation and attribute information among the entities in the knowledge graph, wherein the related knowledge points comprise the following contents:

further, according to the query result, generating a power transmission working ticket list meeting the requirements, filtering the query result according to the requirements, screening working ticket data meeting the requirements, and sequencing the results, wherein the power transmission working ticket list comprises the following specific contents:

106. Presenting basic information of the work ticket to the user according to the ordered query result and the list format;

further, a request is sent through an API interface, a sequenced query result is obtained, response data returned by the API is analyzed, information to be displayed is extracted, the sequenced query result presents basic information of a work ticket according to a list format and is displayed to a user, and the method specifically comprises the following steps:

As shown in fig. 2, the intelligent power transmission work ticket retrieval system based on the knowledge graph specifically comprises a knowledge graph construction module, an entity identification and extraction module, a query analysis module, a knowledge graph query module, a result generation and sorting module and a visual display module;

knowledge graph construction module: acquiring information related to the power transmission working ticket from a data source by utilizing a programming technology, converting the power transmission working ticket data into an RDF model, analyzing the power transmission working ticket data, traversing the power transmission working ticket data to establish entity, relation and attribute triples, and constructing a knowledge graph covering the information related to the power transmission working ticket;

entity recognition and extraction module: entity identification and extraction are carried out on the transmission work ticket text, and the entity in the text, including equipment name, work task and staff, is identified and corresponds to the entity in the knowledge graph;

query analysis module: analyzing the query statement input by the user, identifying the keywords, the entities and the attributes thereof, updating the model parameters by adopting a back propagation algorithm, and accurately predicting the query intention;

knowledge graph query module: according to the query conditions obtained by analysis, querying in a knowledge graph, and finding out related knowledge points of the power transmission working ticket matched with the query conditions by utilizing the relation and attribute information among entities in the knowledge graph;

result generation and sorting module: generating a power transmission work ticket list meeting the requirements according to the query results, filtering the query results according to the requirements, screening the work ticket data meeting the requirements, and sorting the results;

visual display module: a request is sent through an API interface to obtain the sorted query results, the response data returned by the API is parsed, the information to be displayed is extracted, and the sorted query results are displayed to the user as basic work ticket information in a list format.

According to the invention, the knowledge graph covering the information related to the power transmission working ticket is constructed, and the structural representation of the entity, the attribute and the relation is established, so that the intention of the user query is more accurately understood, and the accuracy and the efficiency of the retrieval are improved.

The formulas in the invention are dimensionless formulas operating on numerical values; they were obtained by collecting a large amount of data and running software simulations so as to approximate the actual situation as closely as possible. The preset proportionality coefficients in the formulas are set by those skilled in the art according to the actual situation, or are obtained by simulation over a large amount of data.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Finally: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. An intelligent retrieval method of a power transmission work ticket based on a knowledge graph is characterized by comprising the following steps of: the method specifically comprises the following steps:

2. The knowledge-graph-based intelligent retrieval method for the power transmission working tickets according to claim 1, wherein the method is characterized by comprising the following steps of: in step 101, information related to a power transmission working ticket is obtained from a data source, power transmission working ticket data is converted into an RDF model, the power transmission working ticket data is analyzed, entity, relation and attribute triples are built through traversing the power transmission working ticket data, and a knowledge graph covering the information related to the power transmission working ticket is constructed.

3. The knowledge-graph-based intelligent retrieval method for the power transmission working tickets according to claim 1, wherein the method is characterized by comprising the following steps of: in step 102, entity recognition and extraction are performed on the transmission work ticket text, and the entity in the text, including the equipment name, the work task and the staff, is recognized and corresponds to the entity in the knowledge graph.

4. The knowledge-graph-based intelligent retrieval method for the power transmission working tickets according to claim 1, wherein the method is characterized by comprising the following steps of: in step 103, the query sentence input by the user is parsed, the keywords, the entities and the attributes thereof are identified, and the model parameters are updated by adopting a back propagation algorithm, so that the query intention is accurately predicted.

5. The knowledge-graph-based intelligent retrieval method for the power transmission working tickets according to claim 1, wherein the method is characterized by comprising the following steps of: in step 104, according to the query condition obtained by analysis, query is performed in the knowledge graph, and the related knowledge points of the power transmission working ticket matched with the query condition are found by utilizing the relation and attribute information among the entities in the knowledge graph.

6. The knowledge-graph-based intelligent retrieval method for the power transmission working tickets according to claim 1, wherein the method is characterized by comprising the following steps of: in step 105, a power transmission working ticket list meeting the requirements is generated according to the query result, the query result is filtered according to the requirements, working ticket data meeting the requirements is screened, and the results are ordered.

7. The knowledge-graph-based intelligent retrieval method for the power transmission working tickets according to claim 1, wherein the method is characterized by comprising the following steps of: in step 106, a request is sent through an API interface, the ordered query result is obtained, response data returned by the API is analyzed, information to be displayed is extracted, and the ordered query result presents basic information of the work ticket according to a list format and is displayed to a user.

8. An intelligent power transmission work ticket retrieval system based on a knowledge graph is characterized in that: comprising the following steps: