CN114428862A - Oil and gas pipeline-based knowledge graph construction method and processor - Google Patents


Publication number
CN114428862A
Authority
CN
China
Prior art keywords
oil
entity
gas pipeline
data
field
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111578731.0A
Other languages
Chinese (zh)
Inventor
赵明华
吴张中
李莉
张斌
李秋扬
温文
杨瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Oil and Gas Pipeline Network Corp
National Pipe Network Group North Pipeline Co Ltd
Original Assignee
China Oil and Gas Pipeline Network Corp
National Pipe Network Group North Pipeline Co Ltd
Application filed by China Oil and Gas Pipeline Network Corp and National Pipe Network Group North Pipeline Co Ltd
Priority to CN202111578731.0A
Publication of CN114428862A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of oil and gas pipelines, and in particular to an oil and gas pipeline-based knowledge graph construction method, processor, device and storage medium. The method comprises: acquiring text data from the oil and gas pipeline field; preprocessing the text data and labeling the preprocessed text data to construct a labeled data corpus for oil and gas pipelines; inputting the sentences in the labeled data corpus into an entity recognition learning model to extract the entities they contain; inputting the entities into an entity relationship extraction model to determine the relationships among them; and constructing an oil and gas pipeline knowledge graph from the entities and entity relationships. By extracting entities and entity relationships from text data in the oil and gas pipeline field and building a knowledge graph from them, the technical solution better supports intelligent pipeline network applications such as knowledge retrieval and decision support.

Description

Oil and gas pipeline-based knowledge graph construction method and processor
Technical Field
The application relates to the technical field of oil and gas pipelines, and in particular to an oil and gas pipeline-based knowledge graph construction method, processor, device and storage medium.
Background
A knowledge graph describes the complex relationships between concepts and entities in the real world in graph form, allowing computers to express, organize and manage information in a way that is closer to human cognition and helping people understand knowledge better. As a way of representing knowledge in knowledge engineering in the big data era, a knowledge graph represents knowledge graphically and structurally through concepts, entities, entity attributes and the semantic relationships among entities, so that a machine can understand and explain objective phenomena and facts. The machine's deep knowledge reasoning capability and steadily expanding cognitive capability can help practitioners in related industries analyze and reason about specific problems and support their decision-making. Because a knowledge graph approaches problems from the perspective of relationships, its strong semantic capabilities give it clear advantages in data analysis, semantic search, intelligent recommendation, natural human-computer interaction and decision support. Knowledge graphs have already been widely applied in fields such as finance, healthcare, transportation, education, e-commerce and power grids.
The oil and gas pipeline industry is knowledge-intensive. Over many years of development it has accumulated a large body of knowledge, including experience, specifications, axioms, workflows, common knowledge and calculation formulas, across the whole life cycle from pipeline design and construction to operation and maintenance, forming a complete industry knowledge system. Knowledge in the oil and gas pipeline industry currently falls into three types, structured, semi-structured and unstructured, with unstructured knowledge making up the largest share. In the prior art, knowledge graphs built by extracting entity relationships from large amounts of unstructured data are mostly constructed semi-automatically or manually, so extraction cannot be automated, and many entities and the potential relationships between them are not fully mined.
Disclosure of Invention
The embodiments of the application aim to solve the problem that no effective knowledge graph has been constructed for the oil and gas pipeline field in the prior art, and provide an oil and gas pipeline-based knowledge graph construction method, processor, device and storage medium.
To achieve the above object, a first aspect of the present application provides an oil and gas pipeline-based knowledge graph construction method, comprising:
acquiring text data from the oil and gas pipeline field;
preprocessing the text data, and labeling the preprocessed text data to construct a labeled data corpus for oil and gas pipelines;
inputting the sentences contained in the labeled data corpus into an entity recognition learning model, so as to extract the entities contained in the sentences through the entity recognition learning model;
inputting the entities into an entity relationship extraction model, so as to determine the entity relationships among the entities through the entity relationship extraction model;
and constructing an oil and gas pipeline knowledge graph according to the entities and the entity relationships.
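The five claimed steps form a linear pipeline. The Python sketch below shows only that data flow under stated assumptions: each stage is a hypothetical pluggable callable (the names `KnowledgeGraphPipeline`, `acquire`, `preprocess_and_label` and so on are illustrative, not taken from the patent), the relation extractor is assumed to see the sentence as well as its entities, and the knowledge graph is reduced to a de-duplicated list of (head, relation, tail) triples.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]       # (surface text, entity type)
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class KnowledgeGraphPipeline:
    """Hypothetical skeleton of the claimed steps; every stage is a pluggable callable."""
    acquire: Callable[[], Iterable[str]]                            # acquire field text data
    preprocess_and_label: Callable[[Iterable[str]], List[str]]      # build the labeled corpus
    recognize_entities: Callable[[str], List[Entity]]               # entity recognition model
    extract_relations: Callable[[str, List[Entity]], List[Triple]]  # relation extraction model

    def build(self) -> List[Triple]:
        corpus = self.preprocess_and_label(self.acquire())
        triples: List[Triple] = []
        for sentence in corpus:
            entities = self.recognize_entities(sentence)
            triples.extend(self.extract_relations(sentence, entities))
        # Final step: the graph is the de-duplicated set of triples (insertion order kept).
        return list(dict.fromkeys(triples))
```

For example, with stub callables that tag "P1" as a pipeline and "C1" as a coating, two identical input sentences collapse into the single triple `("P1", "uses", "C1")`.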
In one embodiment of the application, before the text data from the oil and gas pipeline field is acquired, a domain ontology based on the full life cycle business of oil and gas pipelines is constructed, and the preprocessed text data is labeled according to the domain ontology.
In one embodiment of the present application, constructing the domain ontology comprises: constructing a resource library for the oil and gas pipeline field, the resource library comprising at least one of vocabularies, terms and rule templates in the oil and gas pipeline field; summarizing and organizing the data in the resource library according to each stage of the full life cycle of the oil and gas pipeline to construct a base text set; classifying the resources in the base text set according to the content and characteristics of the oil and gas pipeline business; performing word segmentation on the resources in the base text set and determining the words included in each category to obtain a base lexicon; and labeling the words in the base lexicon according to a preset set of semantic types and semantic relations for the oil and gas pipeline field, to construct a base semantic concept set for oil and gas pipelines.
In one embodiment of the application, MySQL is used to store user information, knowledge graph information and the domain ontology data; MongoDB is used to store the entities and entity relationships; the graph database Neo4j is used to store the graph structure of the knowledge graph; and Elasticsearch is used to store text index data.
In one embodiment of the present application, the method further includes training the entity relationship extraction model, comprising: acquiring text data from the oil and gas pipeline field as sample data; preprocessing the sample data; inputting the preprocessed sample data into the entity relationship extraction model; segmenting, by the entity relationship extraction model, the sentences in the input sample data to obtain a plurality of tokens for each sentence, together with the word feature and the type feature vector of each token; concatenating and fusing the word features and type feature vectors to obtain a word embedding representation of each sentence; using the word embedding representation of each sentence as input to a convolutional layer of the entity relationship extraction model, and obtaining the local features of each sentence output by the convolutional layer; using the local features of each sentence as input to a pooling layer of the entity relationship extraction model, and determining the global features of each sentence through the pooling layer; classifying the global features, and using the classified global features as input to a fully connected layer of the entity relationship extraction model; obtaining the predicted entity relationship output by the fully connected layer for each pair of entities; determining the prediction accuracy of the entity relationship extraction model from the pre-labeled entity relationships and the predicted entity relationships; and determining that training of the entity relationship extraction model is complete when the prediction accuracy reaches a preset threshold.
In one embodiment of the application, the text data includes at least one of: business data contained in at least one of a management system, a production system, a scientific research management system and a standard query system in the oil and gas pipeline field; journal and literature data in the oil and gas pipeline field contained in external systems; and oil and gas pipeline field data contained in web pages.
In one embodiment of the present application, the entity recognition learning model is a bidirectional long short-term memory (BiLSTM) network, and the entity relationship extraction model is a convolutional neural network (CNN).
A second aspect of the present application provides a processor configured to perform the oil and gas pipeline-based knowledge graph construction method of any of the above embodiments.
A third aspect of the application provides an oil and gas pipeline-based knowledge graph construction device comprising the above processor.
A fourth aspect of the present application provides a storage medium storing instructions which, when executed by a processor, cause the processor to perform the oil and gas pipeline-based knowledge graph construction method of any of the above embodiments.
With the above technical solution, text data from the oil and gas pipeline field is acquired and processed, entities and entity relationships are extracted from the processed text data, and an oil and gas pipeline knowledge graph is constructed from them. This better supports intelligent pipeline network applications such as knowledge retrieval and decision support, and promotes knowledge sharing in the oil and gas pipeline field.
Additional features and advantages of embodiments of the present application will be described in detail in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the embodiments of the disclosure, but are not intended to limit the embodiments of the disclosure. In the drawings:
FIG. 1 schematically illustrates a flowchart of an oil and gas pipeline-based knowledge graph construction method according to an embodiment of the present application;
FIG. 2 schematically illustrates an example visualization of an oil and gas pipeline knowledge graph according to an embodiment of the present application;
FIG. 3 schematically illustrates the internal structure of a computer device according to an embodiment of the present application.
Detailed Description
The following detailed description of embodiments of the present application will be made with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present application, are given by way of illustration and explanation only, and are not intended to limit the present application.
It should be noted that if directional indications (such as up, down, left, right, front, and back … …) are referred to in the embodiments of the present application, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are changed accordingly.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present application, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; where a combination of technical solutions is contradictory or cannot be realized, the combination shall be deemed not to exist and is not within the protection scope of the present application.
As shown in FIG. 1, which schematically illustrates a flowchart of an oil and gas pipeline-based knowledge graph construction method according to an embodiment of the present application, an embodiment of the present application provides an oil and gas pipeline-based knowledge graph construction method comprising the following steps:
Step 101, acquiring text data from the oil and gas pipeline field.
In one embodiment, the text data includes at least one of: business data contained in at least one of a management system, a production system, a scientific research management system and a standard query system in the oil and gas pipeline field; journal and literature data in the oil and gas pipeline field contained in external systems; and oil and gas pipeline field data contained in web pages.
The oil and gas pipeline knowledge graph needs to be constructed from massive, multi-source and heterogeneous data in the oil and gas pipeline field, where multi-source mainly refers to the diversity of data sources and heterogeneous mainly refers to differences in data structure. The processor can acquire text data from the oil and gas pipeline field. This text data can come from the business data of pipeline-related business systems such as an integrity management system, a production system, a scientific research management system and a standard query system; from journal and literature data in the oil and gas pipeline field contained in external systems of the industry; and from oil and gas pipeline field data in related web pages. That is, the processor may acquire all relevant, queryable text data in the oil and gas pipeline field.
In one embodiment, before the text data from the oil and gas pipeline field is acquired, a domain ontology based on the full life cycle business of oil and gas pipelines is constructed, and the preprocessed text data is labeled according to the domain ontology.
Before acquiring the text data, the processor can construct a domain ontology for oil and gas pipelines based on their full life cycle business. Once the domain ontology has been constructed, the processor can label the preprocessed text data according to it.
In one embodiment, constructing a domain ontology based on the full life cycle business of oil and gas pipelines comprises: constructing a resource library for the oil and gas pipeline field, the resource library comprising at least one of vocabularies, terms and rule templates in the oil and gas pipeline field; summarizing and organizing the data in the resource library according to each stage of the full life cycle of the oil and gas pipeline to construct a base text set; classifying the resources in the base text set according to the content and characteristics of the oil and gas pipeline business; performing word segmentation on the resources in the base text set and determining the words included in each category to obtain a base lexicon; and labeling the words in the base lexicon according to a preset set of semantic types and semantic relations for the oil and gas pipeline field, to construct a base semantic concept set for oil and gas pipelines.
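One way to picture the ontology artifacts listed above (lexicon per category, semantic types, semantic relations) is as a small data model. In this sketch the five top-level categories are the ones named in the description, while the class names, the example semantic types and the validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class LifecycleStage(Enum):
    """The five top-level knowledge categories named in the description."""
    PLANNING = "planning"
    DESIGN = "design"
    CONSTRUCTION = "construction"
    OPERATION_AND_MAINTENANCE = "operation and maintenance"
    ABANDONMENT = "abandonment"


@dataclass
class DomainOntology:
    semantic_types: Set[str]   # preset by domain experts (concrete values are hypothetical)
    relation_types: Set[str]   # preset semantic relation set (hypothetical values)
    lexicon: Dict[LifecycleStage, Set[str]] = field(default_factory=dict)  # base lexicon per category
    concept_types: Dict[str, str] = field(default_factory=dict)           # word -> semantic type
    relations: List[Tuple[str, str, str]] = field(default_factory=list)   # (word, relation, word)

    def add_word(self, stage: LifecycleStage, word: str, semantic_type: str) -> None:
        """Place a segmented word under a category and label it with a preset semantic type."""
        if semantic_type not in self.semantic_types:
            raise ValueError(f"unknown semantic type: {semantic_type}")
        self.lexicon.setdefault(stage, set()).add(word)
        self.concept_types[word] = semantic_type

    def relate(self, head: str, relation: str, tail: str) -> None:
        """Record a semantic relation between two words already in the lexicon."""
        if relation not in self.relation_types:
            raise ValueError(f"unknown relation: {relation}")
        for word in (head, tail):
            if word not in self.concept_types:
                raise KeyError(word)
        self.relations.append((head, relation, tail))
```

Rejecting unknown types keeps labeling restricted to the expert-defined sets, which is the role the description assigns to the preset semantic types and relations.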
When constructing the domain ontology based on the full life cycle business of oil and gas pipelines, the processor can first process literature resources in the oil and gas pipeline field. The processor can acquire related literature data such as vocabularies, terms and rule templates in the field and use it as the resource library for the oil and gas pipeline field. The data in the resource library is then summarized and organized according to each stage of the full life cycle of the oil and gas pipeline, to construct a base text set for the oil and gas field.
After the base text set has been built according to the stages of the full life cycle of the oil and gas pipeline, the processor can classify it according to the business content and characteristics of each stage. For example, the processor may divide oil and gas pipeline domain knowledge into five major categories, namely planning, design, construction, operation and maintenance, and abandonment; list the concept definitions of each category; and decompose each category layer by layer according to the content of the base text set.
Based on these categories, the processor can perform word segmentation on the base text set, for example using a Chinese word segmentation system, to determine the words under each category and obtain the base lexicon for the oil and gas pipeline field. The processor can then define and label the words in the base lexicon according to the preset semantic types and semantic relation set, thereby constructing the base semantic concept set for oil and gas pipelines. The semantic types and semantic relations can be defined by experts in the oil and gas pipeline field and stored in the processor, so that the processor can label concept semantic types and semantic relations in the field accordingly.
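The description does not name the Chinese word segmentation system it uses. As a stand-in, the sketch below uses dictionary-based forward maximum matching, a classic Chinese segmentation technique, followed by a lookup that labels each word with a preset semantic type. The function names, the default maximum word length of 4 and the "O" label for words outside the ontology are assumptions.

```python
from typing import Dict, List, Set, Tuple


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word;
    characters not covered by any dictionary word become single-character tokens."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def label_words(words: List[str], semantic_types: Dict[str, str]) -> List[Tuple[str, str]]:
    """Attach the preset semantic type to each word; "O" marks words outside the ontology."""
    return [(word, semantic_types.get(word, "O")) for word in words]
```

With a dictionary containing 管道 (pipeline), 阴极保护 (cathodic protection) and 检测 (inspection), the string 管道阴极保护检测 segments into exactly those three words. Forward maximum matching is greedy, so a production system would more likely rely on a statistical segmenter seeded with the base lexicon.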
Step 102, preprocessing the text data, and labeling the preprocessed text data to construct a labeled data corpus for oil and gas pipelines.
The processor may preprocess the acquired text data, for example by digitizing it, and use the digitized text data as the labeling object. The processor can then label the preprocessed text data, using the constructed oil and gas pipeline domain ontology as the labeling basis for type labeling and semantic labeling, to build a labeled data corpus for oil and gas pipelines. Technicians can perform the type and semantic labeling manually according to the domain ontology built by the processor, or label the text data semi-automatically with the assistance of the processor, to build the labeled data corpus of the oil and gas pipeline field.
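Because the entity recognition model works on individual characters, the type labels in the corpus must eventually become one tag per character. A common convention for this, assumed here since the patent does not specify its tagging scheme, is BIO tagging (B = beginning of an entity, I = inside, O = outside). The span format and the entity type names used with this sketch are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type) over character offsets


def spans_to_bio(sentence: str, spans: List[Span]) -> List[str]:
    """Turn type-labeled character spans into one BIO tag per character."""
    tags = ["O"] * len(sentence)
    for start, end, entity_type in spans:
        if not 0 <= start < end <= len(sentence):
            raise ValueError(f"span out of range: {(start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        tags[start] = f"B-{entity_type}"
        for k in range(start + 1, end):
            tags[k] = f"I-{entity_type}"
    return tags
```

Overlapping spans are rejected because flat BIO tagging cannot represent nested entities.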
Step 103, inputting the sentences contained in the labeled data corpus into the entity recognition learning model, so as to extract the entities contained in the sentences through the entity recognition learning model.
After the processor constructs the labeled data corpus of the oil and gas pipeline field, the sentences it contains can be input into the entity recognition learning model, which extracts the entities in each sentence.
In one embodiment, the entity recognition learning model is a bidirectional long short-term memory (BiLSTM) network and the entity relationship extraction model is a convolutional neural network (CNN).
The processor inputs the sentences of the labeled data corpus into the entity recognition model. On receiving them, the model uses a bidirectional long short-term memory network to extract features for each character of a sentence, so that context information is used effectively to recognize the entities in the sentence. The network is bidirectional: one direction propagates forward through the sentence and the other backward, and the outputs of the two directions are concatenated to form the output features of the model. These output features are fed into a conditional random field (CRF) layer for training, so as to perform entity recognition. The entity relationship extraction model uses a CNN to extract relationships.
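In a BiLSTM-CRF tagger, the CRF layer scores whole tag sequences rather than single tags, and at prediction time the best sequence is found with Viterbi decoding. The sketch below shows only that decoding step in plain Python, assuming the per-character emission scores (from the BiLSTM) and the tag transition matrix (learned by the CRF) are already available; training is out of scope, start/end transitions are omitted, and the toy scores in the usage note are made up.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag sequence for one sentence.

    emissions[t][j]   score of tag j at character t (e.g. a projection of BiLSTM output)
    transitions[i][j] score of moving from tag i to tag j (learned by the CRF layer)
    """
    if not emissions:
        return []
    n_tags = len(tags)
    score = list(emissions[0])           # best score of any path ending in each tag
    backpointers: List[List[int]] = []   # best previous tag, per position and tag
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            candidates = [score[i] + transitions[i][j] for i in range(n_tags)]
            best_i = max(range(n_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]
```

In a toy case with tags O, B, I, emissions [[2, 1, 0], [0, 0, 3]] and a heavy penalty on O→I transitions, greedy per-character decoding would produce the invalid sequence O, I, whereas Viterbi returns B, I.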
Step 104, inputting the entities into the entity relationship extraction model, so as to determine the entity relationships among the entities through the entity relationship extraction model.
After the processor extracts the entities in a sentence through the entity recognition model, it can input the obtained entities into the entity relationship extraction model to determine the relationships among them. The processor may train the entity relationship extraction model before using it.
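Passing "the entities" to the relation extraction model usually involves two steps: collecting the character tags produced by the recognizer into entity strings, and enumerating the entity pairs to classify. The sketch below assumes BIO tags and ordered pairs; neither detail is stated in the patent, and the function names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (surface text, entity type)


def bio_to_entities(chars: str, tags: List[str]) -> List[Entity]:
    """Collect B-/I- runs into entities; an I- tag that does not continue an
    open entity of the same type is treated like O."""
    entities: List[Entity] = []
    start: Optional[int] = None
    entity_type: Optional[str] = None
    for k, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == entity_type
        if continues:
            continue
        if start is not None:
            entities.append((chars[start:k], entity_type))
            start, entity_type = None, None
        if tag.startswith("B-"):
            start, entity_type = k, tag[2:]
    return entities


def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Ordered pairs, because relations such as "located in" are directional."""
    return list(permutations(entities, 2))
```

A sentence with n recognized entities yields n(n-1) ordered candidate pairs, each of which the relation model would classify (possibly as "no relation").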
In one embodiment, training the entity relationship extraction model comprises: acquiring text data from the oil and gas pipeline field relating to a plurality of oil and gas pipelines as sample data; preprocessing the sample data; inputting the preprocessed sample data into the entity relationship extraction model; segmenting, by the entity relationship extraction model, the sentences in the input sample data to obtain a plurality of tokens for each sentence, together with the word feature and the type feature vector of each token; concatenating and fusing the word features and type feature vectors to obtain a word embedding representation of each sentence; using the word embedding representation of each sentence as input to a convolutional layer of the entity relationship extraction model, and obtaining the local features of each sentence output by the convolutional layer; using the local features of each sentence as input to a pooling layer of the entity relationship extraction model, and determining the global features of each sentence through the pooling layer; classifying the global features, and using the classified global features as input to a fully connected layer of the entity relationship extraction model; obtaining the predicted entity relationship output by the fully connected layer for each pair of entities; determining the prediction accuracy of the entity relationship extraction model from the pre-labeled entity relationships and the predicted entity relationships; and determining that training of the entity relationship extraction model is complete when the prediction accuracy reaches a preset threshold.
The processor can acquire text data relating to a plurality of oil and gas pipelines as sample data. The text data can come from the business data of pipeline-related business systems such as an integrity management system, a production system, a scientific research management system and a standard query system, or from journal and literature data in the oil and gas pipeline field contained in external systems of the industry. After the sample data is obtained, it may be preprocessed: the sample data is digitized and used as the labeling object, and type labeling and semantic labeling are performed using the constructed oil and gas pipeline domain ontology as the labeling basis, to build a labeled data corpus for oil and gas pipelines. The sentences in this corpus contain entities and entity relationship information.
The processor may input the preprocessed sample data, that is, the sentences in the corpus that contain entities and entity relationships, into the entity relationship extraction model. On receiving the preprocessed sample data, the entity relationship extraction model segments each sentence into a plurality of tokens and determines, for each token, its word feature and its type feature vector. The word feature and type feature vector of each token are then concatenated and fused to obtain the word embedding representation of each sentence. The word embedding representation of each sentence is used as input to the convolutional layer of the entity relationship extraction model, which outputs the local features of each sentence. These local features are used as input to the pooling layer of the model, which applies max pooling to determine the global features of each sentence. Obtaining global features through the pooling layer reduces the dimensionality of the output while, to some extent, preserving the most salient features of each sentence.
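The layer sequence described above (feature concatenation, convolution, max pooling) can be sketched in dependency-free Python to make the shapes concrete. The ReLU activation, the single filter width and the toy numbers are assumptions; a real model would use a deep learning framework and learned weights.

```python
from typing import List, Sequence

Vector = List[float]


def embed(word_features: Sequence[Vector], type_features: Sequence[Vector]) -> List[Vector]:
    """Concatenate each token's word feature with its type feature vector."""
    if len(word_features) != len(type_features):
        raise ValueError("one type feature vector is needed per token")
    return [list(w) + list(t) for w, t in zip(word_features, type_features)]


def conv1d_relu(x: Sequence[Vector], filters: Sequence[Sequence[Vector]],
                bias: Sequence[float]) -> List[Vector]:
    """'Valid' 1-D convolution over the token axis, followed by ReLU.
    Each filter is a window of `width` rows, each as long as one token embedding;
    the result holds one row of filter responses (local features) per window position."""
    width = len(filters[0])
    local: List[Vector] = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        row = []
        for filt, b in zip(filters, bias):
            s = b + sum(w * v for f_row, x_row in zip(filt, window) for w, v in zip(f_row, x_row))
            row.append(max(0.0, s))
        local.append(row)
    return local


def max_pool(local: Sequence[Vector]) -> Vector:
    """Max over positions: one global feature per filter, whatever the sentence length."""
    return [max(column) for column in zip(*local)]
```

Max pooling keeps the largest response of each filter over all window positions, so the global feature vector has one entry per filter regardless of sentence length; this is the dimensionality reduction the description refers to.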
After obtaining the global features of each sentence, the processor can classify them and input the classified global features into the fully connected layer of the entity relationship extraction model. The fully connected layer outputs a predicted entity relationship for each pair of entities; for example, a softmax classifier can be used at the output to obtain a specific relationship type. The processor then compares the predicted entity relationships with the pre-labeled entity relationships to determine the accuracy of the predictions. When the accuracy of the predicted entity relationships output by the model reaches the preset threshold set by the processor, the processor may determine that training of the entity relationship extraction model is complete.
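The output stage and the stopping rule can be sketched the same way: a fully connected layer maps the global feature vector to one score per relation type, softmax turns the scores into probabilities, and training is declared complete once accuracy against the pre-labeled relations reaches a preset threshold. The function names, relation type names and threshold values used with this sketch are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(global_features: Sequence[float], weights: Sequence[Sequence[float]],
                     bias: Sequence[float], relation_types: Sequence[str]) -> str:
    """Fully connected layer (one weight row per relation type) followed by softmax."""
    logits = [b + sum(w * x for w, x in zip(row, global_features))
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return relation_types[max(range(len(probs)), key=probs.__getitem__)]


def training_complete(gold: Sequence[str], predicted: Sequence[str], threshold: float) -> bool:
    """Compare predictions with the pre-labeled relations against a preset accuracy threshold."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need one prediction per labeled entity pair")
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return accuracy >= threshold
```

Accuracy is used here only because the description names it; with imbalanced relation types, per-class precision and recall would usually be checked as well.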
Step 105: constructing a knowledge graph based on the oil and gas pipeline according to the entities and the entity relationships.
The processor determines the entities in the labeled data corpus of the oil and gas pipeline domain by using the entity recognition learning model. It then determines the relationships between those entities by using the trained entity relationship extraction model. From the obtained entities and entity relationships, a knowledge graph of the oil and gas pipeline domain is constructed.
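Assembling the graph from the recognized entities and extracted relationships can be sketched as below. This is a minimal in-memory version, assuming entities arrive as `(name, type)` pairs and relationships as `(head, relation, tail)` triples. The function name and the example entity names are hypothetical.

```python
from collections import defaultdict

def build_graph(entities, relations):
    """Assemble a knowledge graph from recognized entities and extracted relations.

    entities:  iterable of (name, entity_type)
    relations: iterable of (head, relation, tail)
    Returns (nodes, adjacency, triples); duplicate triples collapse into one."""
    nodes = {}
    for name, etype in entities:
        nodes.setdefault(name, etype)  # first type seen wins
    adjacency = defaultdict(set)
    triples = set()
    for head, rel, tail in relations:
        if head not in nodes or tail not in nodes:
            continue  # drop relations whose endpoints were not recognized as entities
        triples.add((head, rel, tail))
        adjacency[head].add((rel, tail))
    return nodes, dict(adjacency), triples
```

Using a set of triples makes repeated extractions of the same fact from different sentences collapse into one edge.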
In one embodiment, MySQL is used to store user information, the graph information of the knowledge graph, and the data of the domain ontology. MongoDB is used to store the entities and entity relationships. The graph database Neo4j is used to store the graph structure of the knowledge graph, and Elasticsearch is used to store the text index data.
The processor may use MySQL, a relational database management system, to store user information, the information of the oil and gas pipeline knowledge graph, and the data of the oil and gas pipeline domain ontology. MongoDB, a document database based on distributed file storage, may be used to store the entity layer, including entities and entity relationships. The graph structure of the constructed oil and gas pipeline knowledge graph is stored in the graph database Neo4j. The text index data of the knowledge graph is stored in Elasticsearch.
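The storage split above can be sketched as a routing table plus a parameterized Cypher statement for writing one triple into Neo4j. The category keys, the `Entity` label, and the generic `RELATED` relationship are hypothetical choices. The relation name is stored as a property because relationship types are not ordinary query parameters in classic Cypher. No database connection is made here. The returned statement and parameters could be passed to a driver call such as `session.run`.

```python
# Hypothetical mapping from data category to store, following the embodiment above.
STORE_FOR = {
    "user_info": "MySQL",
    "graph_metadata": "MySQL",
    "domain_ontology": "MySQL",
    "entity": "MongoDB",
    "entity_relation": "MongoDB",
    "graph_structure": "Neo4j",
    "text_index": "Elasticsearch",
}

def route(kind):
    """Return the store responsible for a category of data."""
    try:
        return STORE_FOR[kind]
    except KeyError:
        raise ValueError(f"no store configured for {kind!r}") from None

# Parameterized Cypher; MERGE helps avoid duplicate nodes and edges on re-import.
MERGE_TRIPLE = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATED {type: $rel}]->(t)"
)

def cypher_for_triple(head, rel, tail):
    """Statement plus parameters for writing one (head, relation, tail) triple."""
    return MERGE_TRIPLE, {"head": head, "rel": rel, "tail": tail}
```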
In one embodiment, a processor configured to perform the above oil and gas pipeline-based knowledge graph construction method is provided.
Constructing the oil and gas pipeline knowledge graph requires massive, multi-source, heterogeneous data from the oil and gas pipeline domain. Here, "multi-source" refers mainly to the diversity of data sources, and "heterogeneous" refers mainly to differences in data structure.
The processor may construct a domain ontology of the oil and gas pipeline based on the full-life-cycle business of the oil and gas pipeline. To do so, the processor first processes the literature resources of the oil and gas pipeline domain. It acquires related literature data such as vocabularies, terms, and rule templates, and uses this data as a resource library of the oil and gas pipeline domain. The data in the resource library is summarized and organized according to each stage of the full life cycle of the oil and gas pipeline, yielding a basic text set for the domain. The processor then classifies the basic text set according to the business content and characteristics of each life-cycle stage. For example, the processor may classify oil and gas pipeline domain knowledge into five major categories: planning, design, construction, operation and maintenance, and disposal. After classification, the processor determines the concepts of each category and decomposes them layer by layer.
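The patent does not say how documents are assigned to life-cycle stages. One simple way to sketch it is keyword matching against the five categories listed above. The cue words below are illustrative assumptions; in practice they would come from domain experts. A document may fall into several stages, and unmatched documents are kept aside rather than dropped.

```python
from collections import defaultdict

LIFECYCLE_STAGES = ("planning", "design", "construction", "operation_maintenance", "disposal")

# Hypothetical cue words per stage; a real system would take these from domain experts.
STAGE_CUES = {
    "planning": ("feasibility", "route selection", "planning"),
    "design": ("design", "wall thickness", "specification"),
    "construction": ("welding", "construction", "trench"),
    "operation_maintenance": ("inspection", "corrosion", "maintenance", "leak"),
    "disposal": ("abandonment", "decommission", "disposal"),
}

def classify_documents(docs):
    """Assign each document to every stage whose cue words it contains;
    documents matching no stage go to 'unclassified'."""
    groups = defaultdict(list)
    for doc in docs:
        text = doc.lower()
        stages = [s for s in LIFECYCLE_STAGES if any(cue in text for cue in STAGE_CUES[s])]
        for stage in stages or ["unclassified"]:
            groups[stage].append(doc)
    return dict(groups)
```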
The processor may then perform word segmentation on the basic text set, category by category. Using a Chinese word segmentation system, the processor segments the basic text set of the oil and gas pipeline to determine the words included in each category, thereby obtaining a basic lexicon of the oil and gas pipeline domain. The processor then defines and labels the words in the basic lexicon according to a preset set of semantic types and semantic relations, thereby constructing a basic semantic concept set of the oil and gas pipeline. The semantic types and semantic relations may be defined by experts in the oil and gas pipeline domain and stored in the processor. The processor then uses them to define and label the semantic types and semantic relations of concepts in the domain. In this way, the domain ontology of the oil and gas pipeline is constructed.
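The patent does not name the segmentation system. As a stand-in, the sketch below uses dictionary-based forward maximum matching, a classic Chinese segmentation technique, followed by lookup of preset semantic types. The lexicon and type names are hypothetical. In the example, 管道 is "pipeline", 发生 is "occur", 腐蚀 is "corrosion", and 泄漏 is "leakage".

```python
def forward_max_match(text, lexicon, max_len=None):
    """Dictionary-based forward maximum matching segmentation.
    At each position, take the longest lexicon entry that matches; characters
    not covered by any entry are emitted as single-character tokens."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def label_words(words, semantic_types):
    """Attach a preset semantic type to each word; words without a type get None."""
    return [(w, semantic_types.get(w)) for w in words]
```

For example, segmenting 管道发生腐蚀泄漏 with a lexicon containing the four words above yields `["管道", "发生", "腐蚀", "泄漏"]`.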
After the domain ontology of the oil and gas pipeline has been constructed, the processor may acquire text data of the oil and gas pipeline domain. The text data may come from three kinds of sources:
- business data of business systems related to the oil and gas pipeline, such as the integrity management system, the production system, the scientific research management system, and the standard query system;
- journal and literature data of the oil and gas pipeline domain contained in external systems of the oil and gas pipeline industry;
- oil and gas pipeline domain data contained in related web pages.

That is, the processor may acquire all relevant, queryable text data of the oil and gas pipeline domain, which may be structured, semi-structured, or unstructured. The acquired text data is then preprocessed. For example, it may be digitized, and the digitized text data is used as the annotation object.
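The patent says only that the text data may be structured, semi-structured, or unstructured and that it is digitized before annotation. The sketch below shows one assumed way to bring all three into plain text for annotation. Structured rows are treated as dicts, semi-structured records as JSON strings, and unstructured text as raw strings. The function names and the "key: value" rendering are hypothetical.

```python
import json

def _flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted keys."""
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            items.update(_flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            items.update(_flatten(v, f"{prefix}{i}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

def to_text(record, kind):
    """Turn one source record into plain text for annotation.
    kind: 'structured' (dict row), 'semi_structured' (JSON string) or 'unstructured' (str)."""
    if kind == "structured":
        return "; ".join(f"{k}: {v}" for k, v in record.items())
    if kind == "semi_structured":
        return to_text(_flatten(json.loads(record)), "structured")
    if kind == "unstructured":
        return " ".join(record.split())  # collapse whitespace left by OCR or PDF extraction
    raise ValueError(f"unknown source kind: {kind!r}")
```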
After the domain ontology has been constructed, the preprocessed text data can be labeled according to it. Using the oil and gas pipeline domain ontology as the annotation basis, the processor performs type annotation and semantic annotation on the text data of the oil and gas pipeline domain, building an oil and gas pipeline labeled data corpus. Technicians may also label the types and semantics of the text data manually or semi-automatically with the assistance of the processor, following the domain ontology constructed by the processor, to build the labeled data corpus of the oil and gas pipeline domain.
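The annotation scheme is not specified in the text. A common choice for character-level entity recognition is BIO tagging, and a first, automatic pass can be sketched by longest-match lookup of ontology concepts. This fits the semi-automatic labeling described above, with technicians correcting the result. The concept dictionary and type codes are hypothetical.

```python
def bio_annotate(sentence, concept_types):
    """Character-level BIO tags from a concept -> type dictionary, longest match first.
    B-X marks the first character of a concept of type X, I-X the rest, O everything else."""
    tags = ["O"] * len(sentence)
    max_len = max((len(c) for c in concept_types), default=0)
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if piece in concept_types:
                etype = concept_types[piece]
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + size):
                    tags[j] = f"I-{etype}"
                i += size
                break
        else:
            i += 1  # no concept starts here; leave the character tagged O
    return tags
```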
After the labeled data corpus of the oil and gas pipeline domain has been built, the sentences it contains can be input into the entity recognition learning model. The model uses a bidirectional long short-term memory (BiLSTM) network to extract features for each character of a sentence, making effective use of context information to recognize the entities in the sentence. In this way, the entity recognition learning model extracts the entities in each sentence.
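The BiLSTM itself is not reproduced here. Assuming the model emits one BIO tag per character, the last step of entity recognition is turning those tags into entity spans. The sketch below decodes tags into `(text, type, start, end)` tuples with an exclusive end. The tuple layout is an assumption.

```python
def decode_bio(chars, tags):
    """Collect (entity_text, entity_type, start, end) spans from per-character BIO tags.
    An I- tag that does not continue an open entity of the same type starts a new one."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == etype
        if start is not None and not continues:
            entities.append(("".join(chars[start:i]), etype, start, i))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return entities
```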
After the processor has extracted the entities in a sentence through the entity recognition learning model, the obtained entities may be input into the entity relationship extraction model. This model uses a convolutional neural network (CNN) to extract the entity relationships between them.
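Because the relationship model predicts a relationship for each pair of entities, the recognized entities must first be paired. A minimal sketch, assuming entities are `(text, type, start, end)` tuples and relationships are directional, generates both orders of every pair. The optional character-distance filter is a hypothetical knob, not something the patent specifies.

```python
from itertools import permutations

def candidate_pairs(entities, max_distance=None):
    """Ordered (head, tail) entity pairs from one sentence.
    Relations are directional, so both orders are produced. If max_distance is set,
    pairs whose spans are more than max_distance characters apart are skipped."""
    pairs = []
    for head, tail in permutations(entities, 2):
        if max_distance is not None:
            gap = max(head[2], tail[2]) - min(head[3], tail[3])  # characters between spans
            if gap > max_distance:
                continue
        pairs.append((head, tail))
    return pairs
```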
The processor determines the entities in the labeled data corpus of the oil and gas pipeline domain by using the entity recognition learning model. It then determines the relationships between those entities by using the trained entity relationship extraction model. From the obtained entities and entity relationships, a knowledge graph of the oil and gas pipeline domain is constructed.
Further, after the oil and gas pipeline knowledge graph is constructed, the processor can retrieve the knowledge stored in the document database and the graph database through the knowledge graph and display it visually. Fig. 2 shows an example in which data related to geological hazards along oil and gas pipelines is visualized through the oil and gas pipeline knowledge graph.
Before relationships between entities can be extracted, the processor needs to train the entity relationship extraction model. For training, the processor may acquire text data of the oil and gas pipeline domain as sample data. The text data may come from business data of business systems related to the oil and gas pipeline, such as the integrity management system, the production system, the scientific research management system, and the standard query system. It may also come from journal and literature data of the oil and gas pipeline domain contained in external systems of the industry. The sample data is then preprocessed: it is digitized and used as the annotation object. Type annotation and semantic annotation are then performed using the constructed oil and gas pipeline domain ontology as the annotation basis, yielding an oil and gas pipeline labeled data corpus. The sentences in this corpus contain entity and entity relationship information.
The processor may input the preprocessed sample data into the entity relationship extraction model. That is, the preprocessed sentences in the corpus that contain entities and entity relationships are fed to the model. The model segments each sentence into a plurality of tokens and determines, for each token, its word features and its type feature vector. The word features and type feature vector of each token are then concatenated and fused to obtain the word embedding representation of each sentence. This representation is the input of the convolutional layer of the entity relationship extraction model, which outputs the local features of each sentence. The local features are in turn the input of the pooling layer, which applies max pooling to determine the global features of each sentence. Obtaining global features through the pooling layer reduces the dimensionality of the output while preserving, to some extent, the most salient features of each sentence.
After the global features of each sentence are obtained, the processor may classify them and feed the classified global features into the fully connected layer of the entity relationship extraction model. The fully connected layer outputs a predicted entity relationship for each pair of entities; for example, a softmax classifier may be used at the output to obtain a specific relationship type. The processor then compares the predicted entity relationships with the pre-labeled entity relationships to determine the accuracy of the predictions. When the prediction accuracy of the entity relationship extraction model reaches a preset threshold set by the processor, the processor may determine that training of the model is complete. The trained entity relationship extraction model is then used to extract entity relationships.
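The stopping rule above can be sketched as a plain accuracy check over labeled entity pairs. The dict layout and the 0.9 default threshold are hypothetical, since the patent leaves the threshold value open. Evaluating on held-out pairs rather than the training pairs is a common refinement; the text does not say which pairs are used.

```python
def relation_accuracy(gold, predicted):
    """Fraction of entity pairs whose predicted relation equals the pre-labeled relation.
    gold, predicted: dicts mapping (head, tail) -> relation label."""
    if not gold:
        raise ValueError("no labeled entity pairs to evaluate")
    correct = sum(1 for pair, rel in gold.items() if predicted.get(pair) == rel)
    return correct / len(gold)

def is_trained(gold, predicted, threshold=0.9):
    """Training is considered complete once accuracy reaches the preset threshold."""
    return relation_accuracy(gold, predicted) >= threshold
```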
With the above technical solution, text data of the oil and gas pipeline domain is acquired and processed, entities and entity relationships are extracted from the processed text data, and an oil and gas pipeline knowledge graph is constructed from them. This better supports intelligent pipeline network applications such as knowledge retrieval and decision support, and promotes knowledge sharing in the oil and gas pipeline domain. Further, the entity recognition and entity relationship extraction models are trained by combining machine learning with feature fusion. This addresses the need to automatically extract relationships between entities from large amounts of unstructured data when constructing a knowledge graph. In addition, the knowledge stored in the document database and the graph database can be retrieved through the oil and gas pipeline knowledge graph and displayed visually.
The processor includes a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the oil and gas pipeline-based knowledge graph construction method is implemented by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The embodiment of the application provides a storage medium, wherein a program is stored on the storage medium, and the program is executed by a processor to realize the oil and gas pipeline-based knowledge graph construction method.
In one embodiment, a computer device is provided, which may be a server; its internal structure may be as shown in Fig. 3. The computer device includes a processor A01, a network interface A02, a memory (not shown in the figure), and a database (not shown in the figure), connected through a system bus. The processor A01 provides computing and control capabilities. The memory includes an internal memory A03 and a non-volatile storage medium A04. The non-volatile storage medium A04 stores an operating system B01, a computer program B02, and a database (not shown in the figure). The internal memory A03 provides an environment for running the operating system B01 and the computer program B02 stored in the non-volatile storage medium A04. The database of the computer device stores the acquired massive, multi-source, heterogeneous data of the oil and gas pipeline domain. The network interface A02 communicates with external terminals over a network connection. When executed by the processor A01, the computer program B02 implements the oil and gas pipeline-based knowledge graph construction method.
Those skilled in the art will appreciate that the architecture shown in Fig. 3 is merely a block diagram of the parts of the structure related to the disclosed solution. It does not limit the computer devices to which the disclosed solution applies. A particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
An embodiment of the present application provides a device including a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps: acquiring text data of the domain in which the oil and gas pipeline is located; preprocessing the text data, and labeling the preprocessed text data to construct a labeled data corpus of the oil and gas pipeline; inputting the sentences contained in the labeled data corpus into an entity recognition learning model, so as to extract the entities contained in the sentences through the entity recognition learning model; inputting the entities into an entity relationship extraction model, so as to determine the entity relationships among the entities through the entity relationship extraction model; and constructing an oil and gas pipeline knowledge graph according to the entities and the entity relationships.
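The step sequence above can be sketched as a small orchestration function in which each stage is injected as a callable. Real models (the BiLSTM recognizer, the CNN relation extractor) could then be swapped in without changing the flow. The function and parameter names are hypothetical, and the toy stages in the check below exist only to exercise the flow.

```python
def build_knowledge_graph(raw_docs, preprocess, build_corpus, recognize, extract):
    """Run the claimed steps in order and return the set of (head, relation, tail) triples.

    preprocess(doc) -> str                      digitize/clean one source document
    build_corpus(text) -> list[str]             labeled corpus sentences for that text
    recognize(sentence) -> list                 entities found by the recognition model
    extract(sentence, entities) -> iterable     (head, relation, tail) triples
    """
    triples = set()
    for doc in raw_docs:
        for sentence in build_corpus(preprocess(doc)):
            entities = recognize(sentence)
            triples.update(extract(sentence, entities))
    return triples
```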
In one embodiment, before the text data of the oil and gas pipeline domain is acquired, a domain ontology based on the full-life-cycle business of the oil and gas pipeline is constructed, and the preprocessed text data is labeled according to the domain ontology.
In one embodiment, a resource library of the oil and gas pipeline domain is constructed, the resource library including at least one of a plurality of vocabularies, terms, and rule templates of the oil and gas pipeline domain. The data in the resource library is summarized and organized according to each stage of the full life cycle of the oil and gas pipeline to construct a basic text set. The resources in the basic text set are classified according to the content and characteristics of the oil and gas pipeline business. Word segmentation is performed on the resources in the basic text set, and the words included in each category are determined to obtain a basic lexicon. The words in the basic lexicon are then labeled according to the preset set of semantic types and semantic relations of the oil and gas pipeline domain to construct a basic semantic concept set of the oil and gas pipeline.
In one embodiment, MySQL is used to store user information, the graph information of the knowledge graph, and the data of the domain ontology. MongoDB is used to store the entities and entity relationships. The graph database Neo4j is used to store the graph structure of the knowledge graph, and Elasticsearch is used to store the text index data.
In one embodiment, the method further includes training the entity relationship extraction model. Training includes: acquiring text data of the oil and gas pipeline domain as sample data; preprocessing the sample data; inputting the preprocessed sample data into the entity relationship extraction model; segmenting the sentences in the input sample data through the entity relationship extraction model to obtain a plurality of tokens of each sentence, together with the word features and type feature vector of each token; concatenating and fusing the word features and the type feature vectors to obtain the word embedding representation of each sentence; using the word embedding representation of each sentence as the input of the convolutional layer of the entity relationship extraction model, and obtaining the local features of each sentence output by the convolutional layer; using the local features of each sentence as the input of the pooling layer of the entity relationship extraction model, and determining the global features of each sentence through the pooling layer; classifying the global features, and using the classified global features as the input of the fully connected layer of the entity relationship extraction model; obtaining the predicted entity relationship for each pair of entities output by the fully connected layer; determining the prediction accuracy of the entity relationship extraction model according to the pre-labeled entity relationships and the predicted entity relationships; and determining that training of the entity relationship extraction model is complete when the prediction accuracy reaches a preset threshold.
In one embodiment, the text data includes at least one of the following: business data contained in at least one of a management system, a production system, a scientific research management system, and a standard query system in the oil and gas pipeline domain; journal and literature data of the oil and gas pipeline domain contained in external systems; and oil and gas pipeline domain data contained in web pages.
In one embodiment, the entity recognition learning model is a bidirectional long short-term memory (BiLSTM) network, and the entity relationship extraction model is a convolutional neural network (CNN).
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A knowledge graph construction method based on an oil and gas pipeline is characterized by comprising the following steps:
acquiring text data of the field where the oil and gas pipeline is located;
preprocessing the text data, and labeling the preprocessed text data to construct a labeled data corpus of the oil and gas pipeline;
inputting the sentences contained in the annotated data corpus into an entity recognition learning model so as to extract the entities contained in the sentences through the entity recognition learning model;
inputting the entities into an entity relationship extraction model to determine entity relationships between the entities through the entity relationship extraction model;
and constructing a knowledge graph based on the oil and gas pipeline according to the entities and the entity relationships.
2. The method of claim 1, further comprising:
before acquiring the text data of the domain in which the oil and gas pipeline is located, constructing a domain ontology based on the full-life-cycle business of the oil and gas pipeline;
and labeling the preprocessed text data according to the domain ontology.
3. The method of claim 2, wherein constructing the domain ontology based on the full-life-cycle business of the oil and gas pipeline comprises:
constructing a resource library of a field of oil and gas pipelines, the resource library comprising at least one of a plurality of vocabularies, terms, and rule templates within the field of oil and gas pipelines;
summarizing and sorting the data in the resource library according to each stage of the full life cycle of the oil and gas pipeline to construct a basic text set;
classifying the resources in the basic text set according to the content and the characteristics of the oil and gas pipeline service;
performing word segmentation on the resources in the basic text set, and determining the words included in each category to obtain a basic lexicon;
and labeling the words contained in the basic lexicon according to a preset set of semantic types and semantic relations of the oil and gas pipeline domain, so as to construct a basic semantic concept set of the oil and gas pipeline.
4. The method of claim 2, further comprising:
storing user information, graph information of the knowledge graph, and data of the domain ontology using MySQL;
storing the entities and the entity relationships using MongoDB;
storing the graph structure of the knowledge graph using the graph database Neo4j;
and storing the text index data using Elasticsearch.
5. The method of claim 1, further comprising the step of training the entity-relationship extraction model, comprising:
acquiring text data of the field where the oil and gas pipelines are located as sample data;
preprocessing the sample data;
inputting the preprocessed sample data into the entity relationship extraction model;
segmenting the sentences contained in the input sample data through the entity relationship extraction model to obtain a plurality of tokens of each sentence, the word features of each token, and the type feature vector of each token;
concatenating and fusing the word features and the type feature vectors to obtain the word embedding representation of each sentence;
using the word embedding representation of each sentence as the input data of a convolutional layer of the entity relationship extraction model, and acquiring the local features of each sentence output by the convolutional layer;
using the local features of each sentence as the input data of a pooling layer of the entity relationship extraction model, and determining the global features of each sentence through the pooling layer;
classifying the global features, and using the classified global features as the input data of a fully connected layer of the entity relationship extraction model;
acquiring the predicted entity relationship for each pair of entities output by the fully connected layer;
determining the prediction accuracy of the entity relationship extraction model according to the pre-labeled entity relationships and the predicted entity relationships;
and determining that the entity relationship extraction model is completely trained under the condition that the prediction accuracy reaches a preset threshold value.
6. The method of claim 1, wherein the text data comprises at least one of:
business data contained in at least one of a management system, a production system, a scientific research management system and a standard query system in the field of oil and gas pipelines;
journal and literature data in the field of said oil and gas pipelines contained in external systems;
data within the oil and gas pipeline domain included in a web page.
7. The method according to any one of claims 1 to 6, wherein the entity recognition learning model is a bidirectional long short-term memory (BiLSTM) network, and the entity relationship extraction model is a convolutional neural network (CNN).
8. A processor configured to perform the oil and gas pipeline-based knowledge graph construction method of any one of claims 1 to 7.
9. An oil and gas pipeline-based knowledge graph construction apparatus, comprising the processor of claim 8.
10. A machine-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, cause the processor to perform the oil and gas pipeline-based knowledge graph construction method according to any one of claims 1 to 7.
CN202111578731.0A 2021-12-22 2021-12-22 Oil and gas pipeline-based knowledge graph construction method and processor Pending CN114428862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111578731.0A CN114428862A (en) 2021-12-22 2021-12-22 Oil and gas pipeline-based knowledge graph construction method and processor

Publications (1)

Publication Number Publication Date
CN114428862A true CN114428862A (en) 2022-05-03

Family

ID=81310735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111578731.0A Pending CN114428862A (en) 2021-12-22 2021-12-22 Oil and gas pipeline-based knowledge graph construction method and processor

Country Status (1)

Country Link
CN (1) CN114428862A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495593A (en) * 2022-10-13 2022-12-20 中原工学院 Mathematical knowledge graph construction method based on big data
CN115495593B (en) * 2022-10-13 2023-08-01 中原工学院 Mathematical knowledge graph construction method based on big data
CN116450776A (en) * 2023-04-23 2023-07-18 北京石油化工学院 Oil-gas pipe network law and regulation and technical standard retrieval system based on knowledge graph
CN117114092A (en) * 2023-08-09 2023-11-24 昆仑数智科技有限责任公司 Conduction updating method, system, equipment and medium for oil and gas reserves calculation data
CN117114092B (en) * 2023-08-09 2024-04-30 昆仑数智科技有限责任公司 Conduction updating method, system, equipment and medium for oil and gas reserves calculation data

Similar Documents

Publication Publication Date Title
Trupthi et al. Sentiment analysis on twitter using streaming API
CN110968700B (en) Method and device for constructing domain event map integrating multiple types of affairs and entity knowledge
Nguyen et al. Recurrent neural network-based models for recognizing requisite and effectuation parts in legal texts
CN114428862A (en) Oil and gas pipeline-based knowledge graph construction method and processor
US9058317B1 (en) System and method for machine learning management
CN111552766B (en) Using machine learning to characterize reference relationships applied on reference graphs
CN112559734B (en) Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium
US11620453B2 (en) System and method for artificial intelligence driven document analysis, including searching, indexing, comparing or associating datasets based on learned representations
CN112528010A (en) Knowledge recommendation method and device, computer equipment and readable storage medium
CN116228383A (en) Risk prediction method and device, storage medium and electronic equipment
Kaur et al. A review of artificial intelligence techniques for requirement engineering
CN110968776A (en) Policy knowledge recommendation method, device storage medium and processor
US10387472B2 (en) Expert stance classification using computerized text analytics
Adhikari et al. Privacy policy analysis with sentence classification
US11341188B2 (en) Expert stance classification using computerized text analytics
Saleh et al. Finding semantic relationships in folksonomies
CA3104292C (en) Systems and methods for identifying and linking events in structured proceedings
Amato et al. A hybrid approach for document analysis in digital forensic domain
Moreira et al. Deepex: A robust weak supervision system for knowledge base augmentation
Rajesh et al. Significance of natural language processing in data analysis using business intelligence
Sharma et al. An efficient development framework for the generation of a local knowledge graph
CN117633197B (en) Search information generation method and device applied to paraphrasing document and electronic equipment
CN115269851B (en) Article classification method, apparatus, electronic device, storage medium and program product
CN116595192B (en) Technological front information acquisition method and device, electronic equipment and readable storage medium
Kabra et al. Automated Content Generation System Using Neural Text Generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination