CN117708280A - Knowledge graph-based intelligent retrieval method and system for power transmission work ticket - Google Patents
Knowledge graph-based intelligent retrieval method and system for power transmission work ticket
- Publication number
- CN117708280A (application number CN202311588704.0A)
- Authority
- CN
- China
- Prior art keywords
- power transmission
- ticket
- entity
- query
- knowledge graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses an intelligent retrieval method and system for a power transmission work ticket based on a knowledge graph, and particularly relates to the technical field of knowledge graphs.
Description
Technical Field
The invention relates to the technical field of big data, in particular to an intelligent retrieval method and system for a power transmission work ticket based on a knowledge graph.
Background
With the development of the power industry, transmission voltage levels keep rising and the wiring of the power grid at every level grows more complex. Traditional manual ticket issuing is slow and error-prone, and cannot meet the demands of modern work. Compared with manual issuing, a work ticket system built on computer technology offers short issuing time, standardized tickets, and assured safety, and is an important route to automation and informatization in the power industry.
How to process power transmission work tickets and retrieve them quickly remains an open problem.
Disclosure of Invention
To overcome the defects of the prior art, an embodiment of the invention provides a knowledge graph-based intelligent retrieval method for power transmission work tickets. By constructing a knowledge graph covering the information related to power transmission work tickets, the method improves retrieval accuracy and efficiency and thereby addresses the problem described in the background.
To achieve this purpose, the invention provides a knowledge graph-based intelligent retrieval method for power transmission work tickets, which specifically comprises the following steps:
101. constructing a knowledge graph covering information related to power transmission work tickets, and establishing entities and their attribute relationships;
102. performing entity recognition and extraction on the power transmission work ticket text, and mapping the recognized entities to the entities in the knowledge graph;
103. parsing the query statement entered by the user, and identifying its keywords, entities, and attributes;
104. using the relationships between entities and the attribute information in the knowledge graph to find the power transmission work ticket knowledge points that match the query conditions;
105. generating a list of power transmission work tickets that meet the requirements from the query results, and sorting the results according to a sorting strategy;
106. presenting the basic work ticket information from the sorted query results to the user in list format.
In a preferred embodiment, in step 101, information related to power transmission work tickets is obtained programmatically from a data source, the work ticket data is converted into an RDF model and analyzed, entity, relationship, and attribute triples are built by traversing the data, and a knowledge graph covering the work ticket information is constructed, specifically as follows:
S1, converting the power transmission work ticket data into an RDF model: analyze the data and determine the entities to be converted, including equipment, work tasks, personnel, work tickets, auditors, substations, and power transmission lines; assign a unique URI to each entity and attribute, where the related attributes include the work ticket number, work content, and executor name;
S2, building triples by traversing the power transmission work ticket data: each entity and attribute is converted into an RDF triple of subject, predicate, and object, where the subject is the URI of the corresponding entity, the predicate is the URI of the corresponding attribute, and the object is the URI of the associated entity;
S3, constructing the power knowledge graph from the entity, relationship, and attribute information: the graph is organized as nodes and edges forming a directed graph, and indexes are built on its entities and relationships so that the relevant nodes and edges can be retrieved quickly by condition, in the following steps:
Step 1, entity index: select several attributes as index keys and map their values to entity nodes, so that related entity nodes can be located quickly;
Step 2, relationship index: select several attributes as index keys and map their values to relationship edges, so that related relationship edges can be located quickly.
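As an illustration of S1 to S3, the following is a minimal Python sketch rather than the patented implementation. The namespace `http://example.org/ticket/`, the field names (`ticket_no`, `content`, `executor`), the predicate names, and the sample records are all hypothetical. As a further assumption, attribute values such as the work content are stored as plain literal objects, which RDF permits, while relationships point to entity URIs.

```python
from collections import defaultdict

BASE = "http://example.org/ticket/"  # hypothetical namespace, not from the patent


def uri(kind, ident):
    """Build a unique URI for an entity, attribute, or relationship."""
    return f"{BASE}{kind}/{ident}"


def build_triples(records):
    """Traverse work ticket records and emit (subject, predicate, object) triples."""
    triples = []
    for r in records:
        ticket = uri("ticket", r["ticket_no"])
        person = uri("person", r["executor"])
        # Attribute triples: the object is a literal value (assumption).
        triples.append((ticket, uri("attr", "ticketNumber"), r["ticket_no"]))
        triples.append((ticket, uri("attr", "workContent"), r["content"]))
        # Relationship triple: the object is the URI of the associated entity.
        triples.append((ticket, uri("rel", "executedBy"), person))
    return triples


def build_indexes(triples):
    """Entity index: (predicate, value) -> subjects. Relationship index: predicate -> edges."""
    entity_index = defaultdict(set)
    relation_index = defaultdict(list)
    for s, p, o in triples:
        entity_index[(p, o)].add(s)
        relation_index[p].append((s, o))
    return entity_index, relation_index


records = [
    {"ticket_no": "T001", "content": "insulator replacement", "executor": "Zhang"},
    {"ticket_no": "T002", "content": "line inspection", "executor": "Li"},
]
triples = build_triples(records)
entity_index, relation_index = build_indexes(triples)
```

Looking up `entity_index[(uri("attr", "workContent"), "line inspection")]` returns the set of ticket URIs with that work content without scanning every triple, which is the purpose of the index keys described in Step 1.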
In a preferred embodiment, in step 102, entity recognition and extraction are performed on the power transmission work ticket text: the entities in the text, including equipment names, work tasks, and workers, are recognized and mapped to the entities in the knowledge graph, specifically as follows:
S1, entity recognition: the input text with its word segmentation labels, together with a labeled Chinese word segmentation data set, is used as training data; features including word vectors, parts of speech, and context information are extracted from the training data and used to train a model that performs entity recognition, in the following steps:
Step 1, data preparation: the input text is paired with word segmentation labels that mark the boundary position of each word, where B denotes the beginning of a word, I denotes the inside of a word, and O denotes a position outside any word; the resulting labeled Chinese word segmentation data set is used as training data;
Step 2, feature extraction: features including word vectors, parts of speech, and context information are extracted from the training data and used to train the model. Words are converted into vector representations using word embedding. Two word vectors x and y are drawn at random, with $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$. The similarity between x and y is represented by the straight-line distance between the two vectors, computed as:
$$P = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad |X| = \sqrt{\sum_{i=1}^{n} x_i^2}$$
where P is the Euclidean distance between the points $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$, and $|X|$ is the Euclidean distance from the point $(x_1, x_2, \dots, x_n)$ to the origin. The closer the computed Euclidean distance P is to 0, the more similar the two vectors are;
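The distance computation in Step 2 can be sketched in a few lines of Python. This is only an illustration: the function names and the three-dimensional sample vectors are hypothetical, and real word embeddings would have far more dimensions.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance P between two word vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def norm(x):
    """Distance |X| from the point x to the origin."""
    return euclidean_distance(x, [0.0] * len(x))


# Hypothetical embeddings for three words.
tower = [0.9, 0.1, 0.3]
pylon = [0.8, 0.2, 0.3]
auditor = [0.1, 0.9, 0.7]

# A smaller distance means the two word vectors are more similar.
closer = euclidean_distance(tower, pylon) < euclidean_distance(tower, auditor)
```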
S2, entity relationship extraction: the perceptron machine learning algorithm is selected, and a model is constructed and trained; during training, the input features serve as the model input, the predicted boundary labels are output, and the relationships between entities are predicted, in the following steps:
Step 1: the perceptron receives an input vector x, computes the linear weighted sum of x with a weight vector w, and determines the output through an activation function. The linear weighted sum is:
$$S = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
where S is the weighted sum, $w_1, w_2, \dots, w_n$ are the weights, and $x_1, x_2, \dots, x_n$ are the corresponding input values.
The step function is a commonly used activation function that maps an input value to one of two discrete output values. It is defined as:
$$f(S) = \begin{cases} 1, & S \ge \theta \\ 0, & S < \theta \end{cases}$$
where $\theta$ denotes the critical point: the output changes abruptly, from 0 to 1 or from 1 to 0, when the input crosses it.
Step 2: initialize the weight vector w and the bias b; for each sample (x, y), compute the predicted output and, on a misclassification, apply the update rules:
$$\hat{y} = \operatorname{sign}(w \cdot x + b)$$
$$w \leftarrow w + \eta\, y\, x$$
$$b \leftarrow b + \eta\, y$$
where x is the input feature vector, y is the label (1 or -1), and $\eta$ is the learning rate. If the prediction is correct ($\hat{y} = y$), move on to the next sample; if it is incorrect ($\hat{y} \ne y$), update the weight vector and bias and keep iterating on the current sample until it is predicted correctly;
S3: the entities and relationships extracted from the text are given a structured representation and stored in a relational database, that is, a database management system based on the relational model that organizes and stores data in tables of rows and columns.
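The perceptron update rules of S2 can be sketched as follows. This is a minimal illustration, not the patent's code: the function names, the toy feature vectors, and the `max_epochs` cap are assumptions. It also uses the common variant that sweeps over all samples repeatedly until no sample is misclassified, rather than looping on a single sample, and it treats a weighted sum of exactly 0 as class 1.

```python
def sign(value):
    """Map the weighted sum to a label in {1, -1}; 0 is treated as 1 (assumption)."""
    return 1 if value >= 0 else -1


def predict(w, b, x):
    """Compute y_hat = sign(w . x + b)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Train on (x, y) pairs with y in {1, -1}; returns (w, b)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in samples:
            if predict(w, b, x) != y:
                # Misclassified: w <- w + eta*y*x, b <- b + eta*y
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                errors += 1
        if errors == 0:  # every sample is predicted correctly
            break
    return w, b


# Toy, linearly separable data: label 1 only when both features are present.
samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(samples)
```

The epoch cap guards against data that is not linearly separable, in which case the perceptron rule never converges.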
In a preferred embodiment, in step 103, the query statement entered by the user is parsed, its keywords, entities, and attributes are identified, and the model parameters are updated with the back propagation algorithm so that the intent of the query can be predicted accurately, specifically as follows:
S1, data preprocessing: preprocess the text by removing redundant spaces, punctuation marks, and special characters to obtain a text data set;
S2, syntactic analysis: split the query statement into individual words and label each word with its part of speech, including verbs, nouns, and adjectives; update the model parameters with the back propagation algorithm and predict the intent of the query, in the following steps:
Step 1, forward propagation: input the text data into the CNN model, compute and store the output of each layer in turn, and compute the final prediction:
$$\hat{y} = \arg\max_{x} f(x)$$
where $\hat{y}$ is the final prediction, the argmax function returns the value of x for which f(x) is largest, and f(x) is the output of the CNN model;
Step 2, computing the loss function: compare the prediction with the true label and compute the value of the loss function:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - h_\theta(x_i) \right)^2$$
where MSE is the mean squared error, the sum runs over all n samples, and $(y_i - h_\theta(x_i))^2$ is the square of the difference between the true value and the predicted value;
Step 3, back propagation: starting from the last layer, the gradient is propagated from the output layer back through each layer using the chain rule, and the gradient of each layer is computed from the parameter weights and the derivative of the activation function. Suppose the network consists of L layers from the input layer to the output layer, the input of layer l is $a^{[l-1]}$ and its output is $a^{[l]}$, and the loss is a function of the output of the output layer, $\text{loss} = f(a^{[L]})$. Then:
$$\frac{\partial\, \text{loss}}{\partial a^{[l-1]}} = \frac{\partial\, \text{loss}}{\partial a^{[l]}} \cdot \frac{\partial a^{[l]}}{\partial a^{[l-1]}}$$
where $\frac{\partial a^{[l]}}{\partial a^{[l-1]}}$ is the derivative of the activation of layer l with respect to its input, and $\frac{\partial\, \text{loss}}{\partial a^{[l]}}$ is the gradient passed back from layer l+1;
Step 4, parameter update: update the model parameters using the computed gradients, scaling each update by a learning rate so that the parameters do not change too abruptly;
Step 5, repeated training: repeat Steps 1 to 4 over multiple iterations with different training samples until the configured number of training epochs is reached.
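The training loop of Steps 1 to 5 can be illustrated on a much smaller model than a CNN. The sketch below is an assumption-laden stand-in: it fits a single linear unit $h_\theta(x) = w x + b$ with the MSE loss, so the chain rule reduces to one step, and the learning rate, epoch count, and data are hypothetical.

```python
def mse(ys, preds):
    """Mean squared error between true values and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def train_linear(xs, ys, lr=0.05, epochs=2000):
    """Gradient descent on MSE for h(x) = w*x + b; returns (w, b, final_loss)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Step 1, forward propagation: compute the prediction for every sample.
        preds = [w * x + b for x in xs]
        # Steps 2 and 3: the loss is the MSE; by the chain rule,
        # d(loss)/dw = mean(-2 * (y - h) * x) and d(loss)/db = mean(-2 * (y - h)).
        grad_w = sum(-2 * (y - p) * x for x, y, p in zip(xs, ys, preds)) / n
        grad_b = sum(-2 * (y - p) for y, p in zip(ys, preds)) / n
        # Step 4, parameter update: the learning rate scales each step.
        w -= lr * grad_w
        b -= lr * grad_b
    # Step 5 is the surrounding loop over epochs.
    return w, b, mse(ys, [w * x + b for x in xs])


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, final_loss = train_linear(xs, ys)
```

In a multi-layer network the same update is applied per layer, with the gradient of each layer obtained from the layer after it as in the chain-rule formula above.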
In a preferred embodiment, in step 104, the knowledge graph is queried with the conditions obtained from parsing, and the relationships between entities and the attribute information in the knowledge graph are used to find the power transmission work ticket knowledge points that match the query conditions, specifically as follows:
S1, entity relationship expansion: based on their relevance to the power transmission work ticket, the entities found are expanded along their relationships; entities related to the work ticket are searched for, and the relationships between them are expanded step by step to extract richer associated knowledge points, in the following steps:
Step 1, analyzing the relationship types between entities: by collecting statistics on the relationships between entities in the knowledge graph, the types of relationship between the entities found and the power transmission work ticket are analyzed, including:
applicant/writer: the person who applies for and fills in the power transmission work ticket, responsible for submitting the information required for the application;
approver/auditor: the person who reviews and audits the content of the power transmission work ticket and decides whether to approve it;
executor/operator: the personnel who actually carry out the power transmission work, following the instructions in the work ticket and ensuring that the task is completed according to the prescribed procedure;
Step 2, searching for new associated entities: search for other entities associated with the entities already found, based on the known relationship types;
Step 3, expanding relationship paths: link each newly found entity to the known entities to form new relationship paths;
Step 4, screening and verifying relationships: screen and verify the expanded relationships against domain knowledge, removing irrelevant ones so that the expanded relationships are reasonable and accurate;
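Steps 1 to 4 of the relationship expansion can be sketched as a breadth-first traversal restricted to a set of allowed relationship types. Everything concrete here is hypothetical: the edge list, the relationship names, the `max_hops` limit, and the choice to follow edges in both directions. The domain-knowledge screening of Step 4 is reduced to the allow-list filter.

```python
from collections import deque


def expand_relations(edges, start, allowed_types, max_hops=2):
    """Return {entity: relationship path from start} for entities reachable
    through allowed relationship types within max_hops."""
    adjacency = {}
    for s, rel, o in edges:
        # Edges are followed in both directions (assumption).
        adjacency.setdefault(s, []).append((rel, o))
        adjacency.setdefault(o, []).append((rel, s))
    paths = {start: []}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for rel, neighbor in adjacency.get(node, []):
            # Screening: skip disallowed relationship types and visited entities.
            if rel not in allowed_types or neighbor in paths:
                continue
            # Each path step records (from, relationship, to) in traversal order.
            paths[neighbor] = paths[node] + [(node, rel, neighbor)]
            queue.append((neighbor, hops + 1))
    del paths[start]
    return paths


edges = [
    ("T001", "applicant", "Wang"),
    ("T001", "approver", "Zhao"),
    ("T001", "executor", "Zhang"),
    ("T002", "executor", "Zhang"),
    ("T001", "located_at", "Substation A"),
]
related = expand_relations(edges, "T001", {"applicant", "approver", "executor"})
```

Here ticket `T002` is discovered in the second hop because it shares the executor `Zhang` with `T001`, while `Substation A` is excluded because `located_at` is not in the allowed set.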
S2, attribute filtering: the entities and relationships that satisfy the attribute constraints in the query conditions are filtered out. The query request is sent to the knowledge graph, and data containing the related entities and their attributes is obtained; the attribute values corresponding to the attribute conditions are extracted from the query result and matched with a Trie to decide whether each condition is satisfied; entities that match are retained and entities that do not are removed. The Trie is used as follows:
Step 1, construction: split each string into individual characters and build the nodes of the tree in character order; starting from the root node, each node represents one character, and the path from the root node to a leaf node spells a complete string;
Step 2, node structure: each node contains a character, an array of pointers to its child nodes, and a marker indicating whether the node ends a string;
Step 3, insertion: starting from the root node, process the characters of the string in order; if the child node for a character already exists, descend into it without creating a new node, otherwise create it; continue until the last character of the string and mark that node as the end of a string;
Step 4, search: starting from the root node, match nodes in the character order of the target string; the search succeeds if all characters match and the node of the last character is marked as the end of a string;
Step 5, prefix matching: to find all strings with a given prefix, match the prefix from the root node up to its last character; every string that ends at or below that node has the prefix.
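The Trie operations of Steps 1 to 5 can be sketched in Python as below. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, a dictionary replaces the array of child pointers described in Step 2, and the sample attribute values are invented.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child node (dict instead of a pointer array)
        self.is_end = False  # marks that a complete string ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert a string, reusing existing nodes and creating missing ones."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def _walk(self, text):
        """Follow text from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        """True only if the exact string was inserted."""
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """Return all stored strings that begin with prefix, sorted."""
        node = self._walk(prefix)
        if node is None:
            return []
        results, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for value in ["line inspection", "line repair", "insulator replacement"]:
    trie.insert(value)
```

An attribute condition such as "work content starts with `line`" can then be checked with `trie.starts_with("line")`, and an exact-value condition with `trie.search(...)`.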
In a preferred embodiment, in step 105, a list of power transmission work tickets that meet the requirements is generated from the query results: the results are filtered according to the requirements, the work ticket data that qualifies is selected, and the results are sorted, specifically as follows:
S1, obtaining the query results: retrieve the queried power transmission work ticket data from the database, and make sure it contains the required fields, including the work ticket number, work content, and responsible person;
S2, filtering the data: filter the query results according to the requirements and keep only the work ticket data that satisfies the conditions, screening by date range, work content keywords, and responsible person, as follows:
date range: compare the date of each work ticket with the specified start and end dates, and keep only the work tickets within the range;
work content keywords: keep the work tickets whose work content contains the specified keywords;
responsible person: keep the work tickets of the specified responsible person, selected by name;
S3, sorting the data: sort the filtered work ticket data in ascending order by the work ticket number and date fields.
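The filtering and sorting of S2 and S3 can be sketched as follows. The record layout (`ticket_no`, `date`, `content`, `responsible`), the function name, and the sample tickets are hypothetical; the sort key follows the text, ascending by ticket number and then date.

```python
from datetime import date


def filter_and_sort(tickets, start, end, keyword=None, responsible=None):
    """Keep tickets inside [start, end] that match the optional keyword and
    responsible person, then sort ascending by ticket number and date."""
    selected = [
        t for t in tickets
        if start <= t["date"] <= end
        and (keyword is None or keyword in t["content"])
        and (responsible is None or t["responsible"] == responsible)
    ]
    return sorted(selected, key=lambda t: (t["ticket_no"], t["date"]))


tickets = [
    {"ticket_no": "T003", "date": date(2023, 5, 10), "content": "line inspection", "responsible": "Li"},
    {"ticket_no": "T001", "date": date(2023, 5, 2), "content": "insulator replacement", "responsible": "Zhang"},
    {"ticket_no": "T002", "date": date(2023, 6, 15), "content": "line repair", "responsible": "Li"},
    {"ticket_no": "T004", "date": date(2023, 5, 20), "content": "line repair", "responsible": "Li"},
]
may_line_tickets = filter_and_sort(
    tickets, date(2023, 5, 1), date(2023, 5, 31), keyword="line", responsible="Li"
)
```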
In a preferred embodiment, in step 106, a request is sent through an API to obtain the sorted query results, the response data returned by the API is parsed, the information to be displayed is extracted, and the basic work ticket information is presented to the user in list format, specifically as follows:
S1, sending the API request and obtaining the result: construct the request parameters according to the API documentation and requirements, make sure the request is sent to the correct URL, and specify the correct HTTP method;
S2, parsing the response data returned by the API: parse the response according to the data format returned by the API and convert it into a list;
S3, displaying the query results to the user: present the basic work ticket information from the sorted query results in list format, in the following steps:
Step 1, determining the list header: decide the title of each column and display the titles in the final list;
Step 2, constructing the data rows: traverse the sorted query results, obtain the information of each work ticket row by row, and arrange the data of each row in header order;
Step 3, outputting the list: output the constructed data rows in order, displaying the work ticket list as a table on the console.
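S2 and S3 can be sketched as below. The sketch skips the HTTP call itself and starts from a response body; the JSON shape with a top-level `results` array, the column names, and the sample data are assumptions, since the patent does not specify the API.

```python
import json

HEADERS = ["ticket_no", "date", "content", "responsible"]  # hypothetical columns


def parse_response(body):
    """Convert a JSON response body into a list of ticket dicts
    (assumes a top-level "results" array)."""
    return json.loads(body)["results"]


def format_table(rows, headers=HEADERS):
    """Render the header, a separator, and one line per ticket as text."""
    widths = [
        max([len(h)] + [len(str(r[h])) for r in rows]) for h in headers
    ]

    def line(cells):
        return " | ".join(str(c).ljust(w) for c, w in zip(cells, widths))

    out = [line(headers), "-+-".join("-" * w for w in widths)]
    out += [line([r[h] for h in headers]) for r in rows]
    return "\n".join(out)


# Hypothetical response body standing in for the API reply.
body = json.dumps({"results": [
    {"ticket_no": "T003", "date": "2023-05-10", "content": "line inspection", "responsible": "Li"},
    {"ticket_no": "T004", "date": "2023-05-20", "content": "line repair", "responsible": "Li"},
]})
rows = parse_response(body)
table = format_table(rows)
print(table)
```

The column widths are derived from the longest cell in each column so that the console output stays aligned regardless of content length.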
A knowledge graph-based intelligent retrieval system for power transmission work tickets comprises:
a knowledge graph construction module, used to construct a knowledge graph covering information related to power transmission work tickets and to establish entities and their attribute relationships;
an entity recognition and extraction module, used to perform entity recognition and extraction on the power transmission work ticket text and to map the recognized entities to the entities in the knowledge graph;
a query parsing module, used to parse the query statement entered by the user and to identify its keywords, entities, and attributes;
a knowledge graph query module, used to find the power transmission work ticket knowledge points that match the query conditions by using the relationships between entities and the attribute information in the knowledge graph;
a result generation and sorting module, used to generate a list of power transmission work tickets that meet the requirements from the query results and to sort the results according to a sorting strategy;
a visual display module, used to present the basic work ticket information from the sorted query results to the user in list format.
In operation, the invention obtains information related to power transmission work tickets from a data source, traverses the work ticket data to build entity, relationship, and attribute triples, and constructs a knowledge graph covering that information. It performs entity recognition and extraction on the work ticket text and maps the recognized entities to the entities in the knowledge graph. It parses the query statement entered by the user, updates the model parameters with the back propagation algorithm, and predicts the query intent. Using the relationships between entities and the attribute information in the knowledge graph, it finds the work ticket knowledge points that match the query conditions, generates a list of work tickets that meet the requirements, and presents the sorted results to the user in list format. This enables fast retrieval over large volumes of power transmission work ticket data and improves retrieval efficiency.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a functional block diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
This embodiment provides a knowledge graph-based intelligent retrieval method for power transmission work tickets, as shown in FIG. 1, which specifically comprises the following steps:
101. constructing a knowledge graph covering information related to power transmission work tickets, and establishing entities and their attribute relationships;
102. performing entity recognition and extraction on the power transmission work ticket text, and mapping the recognized entities to the entities in the knowledge graph;
103. parsing the query statement entered by the user, and identifying its keywords, entities, and attributes;
104. using the relationships between entities and the attribute information in the knowledge graph to find the power transmission work ticket knowledge points that match the query conditions;
105. generating a list of power transmission work tickets that meet the requirements from the query results, and sorting the results according to a sorting strategy;
106. presenting the basic work ticket information from the sorted query results to the user in list format.
The specific steps are as follows:
101. Constructing a knowledge graph covering information related to power transmission work tickets, and establishing entities and their attribute relationships;
further, the related information of the power transmission working ticket is obtained from a data source by utilizing a programming technology, the power transmission working ticket data is converted into an RDF model, the power transmission working ticket data is analyzed, the power transmission working ticket data is traversed to establish entity, relation and attribute triplets, and a knowledge graph covering the related information of the power transmission working ticket is constructed, and the method specifically comprises the following steps:
S1, converting the power transmission work ticket data into an RDF model: the work ticket data is analyzed to determine the entities to be converted, including equipment, work task, personnel, work ticket, auditor, substation and power transmission line entities, and a unique URI identifier is assigned to each entity and attribute, where the related attributes include the work ticket number, the work content and the executor name;
S2, building triples by traversing the power transmission work ticket data: each entity and attribute is converted into a triple consisting of a subject, a predicate and an object in the RDF model, where the subject is the URI of the corresponding entity, the predicate is the URI of the corresponding attribute, and the object is the URI of the associated entity;
S3, constructing the power knowledge graph from the entity, relationship and attribute information: the graph is organized as nodes and edges forming a directed graph, and indexes are built on the entities and relationships in the knowledge graph so that the relevant nodes and edges can be retrieved quickly by condition. The specific steps are as follows:
Step 1, entity index: several attributes are selected as index keys, and the values of these attributes are mapped to the entity nodes so that the relevant entity nodes can be located quickly;
Step 2, relationship index: several attributes are selected as index keys, and the values of these attributes are mapped to the relationship edges so that the relevant relationship edges can be located quickly.
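Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the namespace `http://example.org/ticket/`, the field names (`number`, `content`, `executor`, `line`) and the predicate names are hypothetical, and the work content is stored as a literal value (which RDF permits) rather than as a URI.

```python
from collections import defaultdict

BASE = "http://example.org/ticket/"  # hypothetical namespace, not from the patent


def uri(kind, key):
    """Build a unique URI for an entity or attribute (S1)."""
    return f"{BASE}{kind}/{key}"


def build_triples(tickets):
    """Traverse work ticket records and emit (subject, predicate, object) triples (S2)."""
    triples = []
    for t in tickets:
        ticket = uri("work_ticket", t["number"])
        # Attribute triple: the object is a literal value (an assumption of this sketch).
        triples.append((ticket, uri("attr", "work_content"), t["content"]))
        # Relationship triples: the object is the URI of the associated entity.
        triples.append((ticket, uri("rel", "executed_by"), uri("person", t["executor"])))
        triples.append((ticket, uri("rel", "concerns_line"), uri("line", t["line"])))
    return triples


def build_indexes(triples):
    """Build the entity index and the relationship index described in S3."""
    entity_index = defaultdict(set)    # (predicate, value) -> subject nodes
    relation_index = defaultdict(set)  # predicate -> (subject, object) edges
    for s, p, o in triples:
        entity_index[(p, o)].add(s)
        relation_index[p].add((s, o))
    return entity_index, relation_index
```

With these indexes, looking up all work tickets executed by a given person is a single dictionary access on `entity_index` instead of a scan over every triple.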
102. Performing entity recognition and extraction on the power transmission work ticket text, and mapping the extracted entities to the entities in the knowledge graph;
Further, entity recognition and extraction are performed on the power transmission work ticket text; the entities in the text, including equipment names, work tasks and staff, are recognized and mapped to the entities in the knowledge graph. The specific steps are as follows:
S1, entity recognition: the input text with its corresponding word segmentation labels, i.e. a labeled Chinese word segmentation data set, is used as training data; features including word vectors, parts of speech and context information are extracted from the training data and used to train the model, and entity recognition is then performed. The specific steps are as follows:
Step 1, data preparation: each input text is paired with word segmentation labels that mark the boundary position of each word, where B denotes the beginning of a word, I denotes the inside of a word and O denotes a position outside any word; the resulting labeled Chinese word segmentation data set is used as training data;
Step 2, feature extraction: features, including word vectors, parts of speech and context information, are extracted from the training data and used to train the model. Words are converted into vector representations using a word embedding technique. Two word vectors x and y are drawn at random, where x is represented as $(x_1, x_2, \ldots, x_n)$ and y as $(y_1, y_2, \ldots, y_n)$. The similarity between x and y is measured by the straight-line distance between the two vectors, calculated as follows:
$$P = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad |X| = \sqrt{\sum_{i=1}^{n} x_i^2}$$
where P is the Euclidean distance between the point $(x_1, x_2, \ldots, x_n)$ and the point $(y_1, y_2, \ldots, y_n)$, and $|X|$ is the Euclidean distance from the point $(x_1, x_2, \ldots, x_n)$ to the origin. The closer the calculated distance P is to 0, the more similar the two vectors are;
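As a small illustration of the distance-based similarity described in step 2, the following Python sketch computes the Euclidean distance between two word vectors and the distance of a vector to the origin. The function names are chosen for this example only.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def norm(x):
    """Euclidean distance from the point x to the origin."""
    return euclidean_distance(x, [0.0] * len(x))
```

A distance close to 0 indicates that the two word vectors are similar; identical vectors have distance exactly 0.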
S2, entity relationship extraction: the perceptron machine learning algorithm is selected, and a model is constructed and trained. During training, the extracted features serve as the input of the model and the predicted boundary labels as its output, and the relationships between entities are predicted. The specific steps are as follows:
Step 1, the perceptron receives an input vector x, computes the linear weighted sum of x and the weight vector w, and determines the output through an activation function. The linear weighted sum is:
$$S = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$
where S is the result of the summation, $w_1, w_2, \ldots, w_n$ are the weights, and $x_1, x_2, \ldots, x_n$ are the corresponding input values.
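The step function is a commonly used activation function that maps an input value to one of two discrete output values. With the critical point denoted $\theta$, it is defined as follows:
$$f(S) = \begin{cases} 1, & S \ge \theta \\ 0, & S < \theta \end{cases}$$
The output of the step function changes abruptly at the critical point: it jumps from 0 to 1 when the input rises past the critical point, and from 1 to 0 when the input falls below it.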
Step 2, the weight vector w and the bias b are initialized, and a predicted output value is calculated for each sample (x, y). The prediction and the update rules are as follows:
$$\hat{y} = \operatorname{sign}(w \cdot x + b)$$
$$w \leftarrow w + \eta\, y\, x$$
$$b \leftarrow b + \eta\, y$$
where x is the input feature vector, y is the label (1 or -1), and $\eta$ is the learning rate. If the prediction is correct ($\hat{y} = y$), training proceeds to the next sample; if it is incorrect ($\hat{y} \neq y$), the weight vector and bias are updated and the update is repeated on the current sample until it is predicted correctly;
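The perceptron update rule in step 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `sign(0)` is taken to be 1, the initial weights are zero, and `max_epochs` is a safety cap added for this example; none of these choices is specified in the source.

```python
def sign(v):
    """Sign function; mapping 0 to +1 is an assumption of this sketch."""
    return 1 if v >= 0 else -1


def predict(w, b, x):
    """y_hat = sign(w . x + b)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Train on (x, y) pairs with labels in {1, -1}.

    A misclassified sample is updated repeatedly until it is predicted
    correctly, then training moves on to the next sample.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        updates = 0
        for x, y in samples:
            while predict(w, b, x) != y:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                updates += 1
        if updates == 0:  # every sample was predicted correctly in this pass
            break
    return w, b
```

The inner loop always terminates because each update moves `y * (w . x + b)` upward by a fixed positive amount for the current sample. Full convergence over all samples is only guaranteed when the data is linearly separable.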
S3, the entities and relationships extracted from the text are represented in structured form and stored in a relational database, i.e. a database management system based on the relational model in which data is organized and stored in tables consisting of rows and columns.
103. Parsing the query statement entered by the user, and identifying its keywords, entities and attributes;
Further, the query statement entered by the user is parsed, its keywords, entities and attributes are identified, and the model parameters are updated with the back propagation algorithm so that the intent of the query is predicted accurately. The specific steps are as follows:
S1, data preprocessing: the text is preprocessed by removing redundant spaces, punctuation marks and special characters to obtain a text data set;
S2, syntactic analysis: the query statement is split into individual words, each word is labeled with its part of speech (verb, noun, adjective and so on), and the model parameters are updated with the back propagation algorithm so that the intent of the query is predicted accurately. The specific steps are as follows:
Step 1, forward propagation: the text data is input into the CNN model, the output of each layer is calculated and stored layer by layer, and the final prediction result is calculated as follows:
$$\hat{y} = \arg\max_{x} f(x)$$
where $\hat{y}$ is the final prediction result, $\arg\max$ returns the value of x at which f(x) takes its maximum, and f(x) is the output of the CNN model;
Step 2, calculating the loss function: the prediction result is compared with the true label and the value of the loss function is calculated as follows:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - h_\theta(x_i)\right)^2$$
where MSE is the mean squared error, n is the number of samples, $\sum$ denotes the sum over all samples, and $\left(y_i - h_\theta(x_i)\right)^2$ is the square of the difference between the true value $y_i$ and the predicted value $h_\theta(x_i)$;
Step 3, back propagation: starting from the output layer, the gradient is passed backward layer by layer using the chain rule, and the gradient of each layer is calculated from the parameter weights and the derivative of the activation function. The network is assumed to consist of L layers from the input layer to the output layer, where the input of layer l is $a^{[l-1]}$ and its output is $a^{[l]}$, and the loss is a function of the output of the output layer, $\text{loss} = f(a^{[L]})$. The specific calculation formula is as follows:
$$\frac{\partial\, \text{loss}}{\partial a^{[l-1]}} = \frac{\partial\, \text{loss}}{\partial a^{[l]}} \cdot \frac{\partial a^{[l]}}{\partial a^{[l-1]}}$$
where $\partial a^{[l]} / \partial a^{[l-1]}$ is the derivative of the activation function of layer l with respect to its input, and $\partial\, \text{loss} / \partial a^{[l]}$ is the gradient passed back from layer l+1;
Step 4, parameter update: the model parameters are updated according to the calculated gradients, and the size of each update is controlled by multiplying the gradient by a learning rate so that the parameters do not change too abruptly;
Step 5, repeated training: steps 1 to 4 are repeated over multiple iterations with different training samples until the set number of training rounds is reached.
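The training loop in steps 1 to 5 (forward propagation, loss, back propagation, parameter update, repetition) can be illustrated with a deliberately small Python sketch. It is not a CNN: it assumes a two-layer network with one scalar weight and bias per layer and sigmoid activations, chosen only to make the chain rule visible. All names and hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward propagation: compute and keep the output of each layer."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    return a1, a2


def mse(params, data):
    """Mean squared error over all (x, y) samples."""
    return sum((y - forward(params, x)[1]) ** 2 for x, y in data) / len(data)


def gradients(params, data):
    """Back propagation: apply the chain rule from the output layer backward."""
    w1, b1, w2, b2 = params
    g = [0.0, 0.0, 0.0, 0.0]
    for x, y in data:
        a1, a2 = forward(params, x)
        d_a2 = 2.0 * (a2 - y) / len(data)  # d loss / d a2
        d_z2 = d_a2 * a2 * (1.0 - a2)      # through the layer-2 activation
        d_a1 = d_z2 * w2                   # gradient passed back to layer 1
        d_z1 = d_a1 * a1 * (1.0 - a1)      # through the layer-1 activation
        g[0] += d_z1 * x
        g[1] += d_z1
        g[2] += d_z2 * a1
        g[3] += d_z2
    return g


def train(params, data, lr=0.5, epochs=200):
    """Repeat forward pass, gradient computation and update for a fixed number of rounds."""
    for _ in range(epochs):
        g = gradients(params, data)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params
```

The learning rate `lr` scales each update, which corresponds to the role described in step 4: a smaller value gives smaller, more cautious parameter changes.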
104. Using the relationship and attribute information between entities in the knowledge graph to find the knowledge points related to power transmission work tickets that match the query conditions;
Further, the knowledge graph is queried according to the query conditions obtained by parsing, and the relationship and attribute information between entities in the knowledge graph is used to find the knowledge points related to power transmission work tickets that match the query conditions. The specific contents are as follows:
S1, entity relationship expansion: according to the relevance of the power transmission work tickets, the relationships of the entities found are expanded; entities related to the work tickets are searched for and the relationships between them are expanded step by step so that richer associated knowledge points are extracted. The specific steps are as follows:
Step 1, analyzing the relationship types between entities: based on statistics of the relationships between entities in the knowledge graph, the types of relationship between the entities found and the power transmission work ticket are analyzed, specifically including:
Applicant/writer: the applicant or writer of the power transmission work ticket, who is responsible for submitting the application and filling in the relevant information of the work ticket;
Approver/auditor: the approver or auditor of the power transmission work ticket, who reviews the content of the work ticket and decides whether to approve it;
Executor/operator: the personnel who actually carry out the power transmission work, who work according to the instructions in the work ticket and ensure that the task is completed according to the specified procedure;
Step 2, searching for and discovering new associated entities: other entities associated with the entities found are searched for on the basis of the known relationship types;
Step 3, expanding the relationship path: each newly found entity is associated with the known entities to form a new relationship path;
Step 4, screening and verifying relationships: the expanded relationships are screened and verified against domain knowledge, and irrelevant relationships are removed so that the expanded relationships are reasonable and accurate;
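The relationship expansion in S1 can be sketched as a bounded breadth-first traversal over the graph's triples. The relation names (`applied_by`, `approved_by`, `executed_by`), the hop limit and the choice to traverse edges in both directions are assumptions made for this example; the whitelist of relation types stands in for the screening by domain knowledge in step 4.

```python
from collections import deque

# Hypothetical relation types corresponding to applicant, approver and executor.
ALLOWED_RELATIONS = frozenset({"applied_by", "approved_by", "executed_by"})


def expand_relations(edges, start, allowed=ALLOWED_RELATIONS, max_hops=2):
    """Return {entity: relationship path from start} for entities within max_hops.

    edges is an iterable of (subject, predicate, object) triples. Edges whose
    predicate is not in `allowed` are discarded before expansion.
    """
    adjacency = {}
    for s, p, o in edges:
        if p in allowed:
            adjacency.setdefault(s, []).append((p, o))
            adjacency.setdefault(o, []).append((p, s))  # follow edges in both directions
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:
            continue  # do not expand beyond the hop limit
        for p, nxt in adjacency.get(node, []):
            if nxt not in paths:
                paths[nxt] = paths[node] + [(node, p, nxt)]
                queue.append(nxt)
    del paths[start]
    return paths
```

For example, starting from one work ticket, a two-hop expansion reaches its executor and then the other work tickets handled by the same executor, each with the path that connects it to the starting ticket.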
S2, attribute filtering: the entities and relationships that satisfy the attribute constraints in the query conditions are selected. A query request is sent to the knowledge graph to obtain data containing the relevant entities and their attributes; the attribute values corresponding to the attribute conditions are extracted from the query results and matched with a Trie algorithm to determine whether the attribute conditions are satisfied; according to the matching result, the entities that match are retained and the entities that do not satisfy the attribute conditions are removed. The specific steps are as follows:
Step 1, each string is split into individual characters and the nodes of the tree are constructed in order; starting from the root node, each node represents one character, and the path from the root node to a leaf node spells a complete string;
Step 2, node structure: each node contains a character, an array of pointers to its child nodes, and a flag indicating whether the node marks the end of a string;
Step 3, insertion: starting from the root node, the characters of the string are inserted one by one in order; if the child node for a character already exists, insertion continues downward from that child without creating a new node, otherwise a new node is created; this continues until the last character of the string, whose node is marked as the end of a string;
Step 4, search: starting from the root node, the characters of the target string are matched node by node in order; the search succeeds if all characters are matched and the node of the last character is marked as the end of a string;
Step 5, prefix matching: to find all strings with a specified prefix, the prefix is matched from the root node up to its last character, and every string stored below that node has the prefix.
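The Trie operations in steps 1 to 5 can be sketched in Python as follows. This sketch stores child nodes in a dictionary keyed by character rather than in a fixed array of pointers, which is a simplification chosen for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child node
        self.is_end = False  # True if a stored string ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Insert a string, reusing existing nodes and creating only missing ones."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def _walk(self, text):
        """Follow text from the root; return the final node, or None if a character is missing."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        """Exact match: all characters match and the last node ends a string."""
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """Return all stored strings that begin with prefix, in sorted order."""
        node = self._walk(prefix)
        if node is None:
            return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)
```

In the attribute-filtering step, the attribute values allowed by the query condition would be inserted once, after which each extracted attribute value can be checked with `search` (exact match) or `starts_with` (prefix condition).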
105. Generating a list of power transmission work tickets that meet the requirements according to the query results, and sorting the results according to a sorting strategy;
Further, a list of power transmission work tickets that meet the requirements is generated according to the query results: the query results are filtered according to the requirements, the work ticket data that meets the requirements is selected, and the results are sorted. The specific contents are as follows:
S1, obtaining the query results: the queried power transmission work ticket data is obtained from the database, and it is ensured that the data contains the required fields, including the work ticket number, the work content and the responsible person;
S2, filtering the data: the query results are filtered according to the requirements so that only the work ticket data meeting the conditions is selected, screening by date range, work content keywords and responsible person, specifically:
Date range: the date of each work ticket is compared with the specified start date and end date, and only work ticket data within the range is selected;
Work content keywords: the work ticket data whose work content contains the specified keywords is selected;
Responsible person: the work ticket data of a specific responsible person is selected according to the name of the responsible person;
S3, sorting the data: the filtered work ticket data is sorted in ascending order by the work ticket number and date fields.
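The filtering and sorting in S2 and S3 can be sketched in Python as follows. The record layout (dictionaries with `number`, `date`, `content` and `responsible` keys) is hypothetical, and sorting by work ticket number first and date second is one reading of "ascending order by the work ticket number and date fields".

```python
from datetime import date


def filter_and_sort(tickets, start=None, end=None, keyword=None, responsible=None):
    """Keep tickets that satisfy every given condition, then sort ascending.

    A condition left as None is not applied.
    """
    selected = []
    for t in tickets:
        if start is not None and t["date"] < start:
            continue
        if end is not None and t["date"] > end:
            continue
        if keyword is not None and keyword not in t["content"]:
            continue
        if responsible is not None and t["responsible"] != responsible:
            continue
        selected.append(t)
    # Ascending by work ticket number, then by date.
    return sorted(selected, key=lambda t: (t["number"], t["date"]))


if __name__ == "__main__":
    sample = [
        {"number": "T002", "date": date(2023, 5, 2), "content": "line inspection", "responsible": "li"},
        {"number": "T001", "date": date(2023, 5, 1), "content": "insulator replacement", "responsible": "li"},
    ]
    for ticket in filter_and_sort(sample, start=date(2023, 5, 1), responsible="li"):
        print(ticket["number"], ticket["date"], ticket["content"])
```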
106. Presenting the basic information of the work tickets to the user in list format according to the sorted query results;
Further, a request is sent through an API to obtain the sorted query results, the response data returned by the API is parsed, the information to be displayed is extracted, and the basic information of the work tickets is presented to the user in list format according to the sorted query results. The specific steps are as follows:
S1, sending the API request and obtaining the results: the request parameters are constructed according to the documentation and requirements of the API, and the request is sent to the correct URL with the correct HTTP method;
S2, parsing the response data returned by the API: the response is parsed according to the data format returned by the API and converted into a list;
S3, displaying the query results to the user: the basic information of the work tickets is presented to the user in list format according to the sorted query results, specifically:
Step 1, determining the header of the list: the title of each column is determined and displayed in the final list;
Step 2, constructing the data rows of the list: the sorted query results are traversed, the information of each work ticket is obtained row by row, and the data of each ticket is organized into one row in the order of the header;
Step 3, outputting the list: the constructed data rows are output in order, and the work ticket list is displayed on the console in table form.
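Steps S2 and S3 can be sketched in Python as follows. The sketch assumes that the API returns a JSON array of objects; the source does not specify the response format, and the column names used in the usage example are hypothetical. Sending the request itself is omitted because the endpoint is not given in the source.

```python
import json


def parse_response(body):
    """Parse the response body into a list of records (assumes a JSON array)."""
    records = json.loads(body)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of work tickets")
    return records


def render_table(headers, rows):
    """Build a fixed-width text table with a header row, a separator and one line per record."""
    table = [list(headers)] + [[str(row.get(h, "")) for h in headers] for row in rows]
    widths = [max(len(line[i]) for line in table) for i in range(len(headers))]

    def fmt(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    lines = [fmt(table[0]), "-+-".join("-" * width for width in widths)]
    lines.extend(fmt(cells) for cells in table[1:])
    return "\n".join(lines)


if __name__ == "__main__":
    body = '[{"number": "T001", "content": "inspect"}]'
    print(render_table(["number", "content"], parse_response(body)))
```

The rows are rendered in the order they are received, so the sorted order produced in step 105 is preserved in the displayed list.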
As shown in Fig. 2, the knowledge graph-based intelligent retrieval system for power transmission work tickets comprises a knowledge graph construction module, an entity recognition and extraction module, a query parsing module, a knowledge graph query module, a result generation and sorting module, and a visual display module;
Knowledge graph construction module: obtains the information related to power transmission work tickets programmatically from a data source, converts the work ticket data into an RDF model, analyzes the work ticket data, builds entity, relationship and attribute triples by traversing the work ticket data, and constructs a knowledge graph covering the information related to power transmission work tickets;
Entity recognition and extraction module: performs entity recognition and extraction on the power transmission work ticket text, recognizes the entities in the text, including equipment names, work tasks and staff, and maps them to the entities in the knowledge graph;
Query parsing module: parses the query statement entered by the user, identifies its keywords, entities and attributes, and updates the model parameters with the back propagation algorithm so that the query intent is predicted accurately;
Knowledge graph query module: queries the knowledge graph according to the query conditions obtained by parsing, and uses the relationship and attribute information between entities in the knowledge graph to find the knowledge points related to power transmission work tickets that match the query conditions;
Result generation and sorting module: generates a list of power transmission work tickets that meet the requirements according to the query results, filters the query results according to the requirements, selects the work ticket data that meets the requirements, and sorts the results;
Visual display module: sends a request through an API, obtains the sorted query results, parses the response data returned by the API, extracts the information to be displayed, and presents the basic information of the work tickets to the user in list format according to the sorted query results.
By constructing a knowledge graph covering the information related to power transmission work tickets and establishing a structured representation of entities, attributes and relationships, the invention understands the intent of a user query more accurately and improves the accuracy and efficiency of retrieval.
The formulas in the invention are dimensionless formulas operating on numerical values; they were obtained by collecting a large amount of data and performing software simulation so as to approximate the actual situation as closely as possible, and any preset proportionality coefficients in the formulas are set by those skilled in the art according to the actual situation or obtained by simulation on a large amount of data.
Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Finally, the foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (8)
1. A knowledge graph-based intelligent retrieval method for power transmission work tickets, characterized in that the method comprises the following steps:
101. constructing a knowledge graph covering information related to power transmission work tickets, and establishing entities and their attribute relationships;
102. performing entity recognition and extraction on the power transmission work ticket text, and mapping the extracted entities to the entities in the knowledge graph;
103. parsing the query statement entered by the user, and identifying its keywords, entities and attributes;
104. using the relationship and attribute information between entities in the knowledge graph to find the knowledge points related to power transmission work tickets that match the query conditions;
105. generating a list of power transmission work tickets that meet the requirements according to the query results, and sorting the results according to a sorting strategy;
106. presenting the basic information of the work tickets to the user in list format according to the sorted query results.
2. The knowledge graph-based intelligent retrieval method for power transmission work tickets according to claim 1, characterized in that: in step 101, the information related to power transmission work tickets is obtained from a data source, the work ticket data is converted into an RDF model and analyzed, entity, relationship and attribute triples are built by traversing the work ticket data, and a knowledge graph covering the information related to power transmission work tickets is constructed.
3. The knowledge graph-based intelligent retrieval method for power transmission work tickets according to claim 1, characterized in that: in step 102, entity recognition and extraction are performed on the power transmission work ticket text, and the entities in the text, including equipment names, work tasks and staff, are recognized and mapped to the entities in the knowledge graph.
4. The knowledge graph-based intelligent retrieval method for power transmission work tickets according to claim 1, characterized in that: in step 103, the query statement entered by the user is parsed, its keywords, entities and attributes are identified, and the model parameters are updated with the back propagation algorithm so that the query intent is predicted accurately.
5. The knowledge graph-based intelligent retrieval method for power transmission work tickets according to claim 1, characterized in that: in step 104, the knowledge graph is queried according to the query conditions obtained by parsing, and the relationship and attribute information between entities in the knowledge graph is used to find the knowledge points related to power transmission work tickets that match the query conditions.
6. The knowledge graph-based intelligent retrieval method for power transmission work tickets according to claim 1, characterized in that: in step 105, a list of power transmission work tickets that meet the requirements is generated according to the query results, the query results are filtered according to the requirements, the work ticket data that meets the requirements is selected, and the results are sorted.
7. The knowledge graph-based intelligent retrieval method for power transmission work tickets according to claim 1, characterized in that: in step 106, a request is sent through an API, the sorted query results are obtained, the response data returned by the API is parsed, the information to be displayed is extracted, and the basic information of the work tickets is presented to the user in list format according to the sorted query results.
8. A knowledge graph-based intelligent retrieval system for power transmission work tickets, characterized in that the system comprises:
a knowledge graph construction module, configured to construct a knowledge graph covering information related to power transmission work tickets and to establish entities and their attribute relationships;
an entity recognition and extraction module, configured to perform entity recognition and extraction on the power transmission work ticket text and to map the extracted entities to the entities in the knowledge graph;
a query parsing module, configured to parse the query statement entered by the user and to identify its keywords, entities and attributes;
a knowledge graph query module, configured to use the relationship and attribute information between entities in the knowledge graph to find the knowledge points related to power transmission work tickets that match the query conditions;
a result generation and sorting module, configured to generate a list of power transmission work tickets that meet the requirements according to the query results and to sort the results according to a sorting strategy;
a visual display module, configured to present the basic information of the work tickets to the user in list format according to the sorted query results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311588704.0A CN117708280B (en) | 2023-11-27 | 2023-11-27 | Intelligent power transmission work ticket retrieval method based on knowledge graph |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117708280A (en) | 2024-03-15 |
CN117708280B (en) | 2024-06-21 |
Family
ID=90161451
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311588704.0A Active CN117708280B (en) | 2023-11-27 | 2023-11-27 | Intelligent power transmission work ticket retrieval method based on knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117708280B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180232443A1 (en) * | 2017-02-16 | 2018-08-16 | Globality, Inc. | Intelligent matching system with ontology-aided relation extraction |
CN111353030A (en) * | 2020-02-26 | 2020-06-30 | 陕西师范大学 | Knowledge question and answer retrieval method and device based on travel field knowledge graph |
CN111522910A (en) * | 2020-04-14 | 2020-08-11 | 浙江大学 | Intelligent semantic retrieval method based on cultural relic knowledge graph |
CN115080694A (en) * | 2022-06-27 | 2022-09-20 | 国网甘肃省电力公司电力科学研究院 | Power industry information analysis method and equipment based on knowledge graph |
CN115455935A (en) * | 2022-09-14 | 2022-12-09 | 华东师范大学 | Intelligent text information processing system |
CN116450776A (en) * | 2023-04-23 | 2023-07-18 | 北京石油化工学院 | Oil-gas pipe network law and regulation and technical standard retrieval system based on knowledge graph |
CN116881436A (en) * | 2023-08-09 | 2023-10-13 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Knowledge graph-based document retrieval method, system, terminal and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN117708280B (en) | 2024-06-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |