CN114218333A - Geological knowledge map construction method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114218333A
Authority
CN
China
Prior art keywords
geological
entity
data
model
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111422212.5A
Other languages
Chinese (zh)
Inventor
黄进
王晴
杨涛
刘鑫
李剑波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Application filed by Southwest Jiaotong University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a geological knowledge graph construction method and device, an electronic device, and a storage medium. It relates to the technical field of natural language processing and addresses the technical problem that existing unstructured data cannot be used directly to construct a knowledge graph. The geological knowledge graph construction method comprises the following steps: S1, model establishment; S2, structuring of entity information; S3, structuring of entity relationship information; and S4, knowledge graph construction. The invention extracts the entity data and entity relationship data from unstructured data and constructs a more complete knowledge graph for the geological field, so that users can conveniently query and understand related information. The invention also uses the Neo4j graph database to construct the knowledge graph of the geological and mineral results field and to optimize the visual interface; the knowledge graph is presented in 3D, which makes queries more convenient for users.

Description

Geological knowledge map construction method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a geological knowledge graph construction method and device, an electronic device, and a storage medium.
Background
The concept of the knowledge graph was first proposed by Google in 2012 and has since become an active direction in natural language processing. Knowledge graphs were proposed to describe accurately the relationships between people and things, and were first applied to search engines. As the technology has developed, it has been widely applied in many important fields such as chatbots, intelligent healthcare, recommendation systems and geological data. Collected geological results data, a major and information-rich source of geological data, refers to the geological results data that a geological data collector submits according to regulations and that a collection institution then stores and makes available for use. Collected geological results data is an important basic information resource of the country and a public good.
At present, natural resource administrative departments and geological data collection institutions have set up an "Internet + geological cloud" data management and service system, which is expected to enable cross-region and cross-level information sharing, service collaboration and multi-level linkage, so that on-site archive users and qualified Internet users can obtain more valuable information. However, existing geological data knowledge graph construction has the following problems:
(1) The collected geological results data is drawn from different data sources. There are many source channels, of which two are dominant: the first is business data, and the second is data published on and crawled from the web. As a result, the data structures obtained from different sources differ greatly. By structure, the data can be divided into structured, semi-structured and unstructured data; structured and semi-structured data can be used directly, whereas unstructured data cannot. How to extract and use the information in unstructured data is therefore the first difficulty in constructing a geological data knowledge graph;
(2) Extracting the relevant information is difficult. The difficulty of information extraction lies in processing unstructured data, which mainly involves text information extraction, entity recognition, relation extraction, concept extraction, event extraction and the like. Information extraction can also be divided into domain-specific extraction and open-domain extraction. How to extract the relationships between entities is another difficulty in constructing a geological data knowledge graph.
Disclosure of Invention
The purpose of the invention is as follows. To solve the above technical problems and make up for the shortcomings of existing corpora, a corpus for the field of collected geological results data needs to be designed. On this basis, the invention provides a geological knowledge graph construction method and device, an electronic device and a storage medium, which focus on dataset construction, named entity recognition and relation extraction for the field of collected geological results data, thereby promoting both dataset construction and knowledge graph construction in that field. The invention addresses the root cause of the two problems described above: at present, no open corpus has been developed for the field of collected geological and mineral results data or for related fields.
The invention sorts out the entity types, entity relationships and entity attributes of geological data and uses natural language processing techniques to extract the entities, relationships and attributes from text. The data is then organized as a graph in a Neo4j graph database. In this way, a geological-domain corpus is constructed by organizing the multi-source heterogeneous data in the collected geological results data, and entities and relationships are extracted from its unstructured text to construct a knowledge graph of the geological results field. This facilitates querying and borrowing by on-site archive users, enables cross-region and cross-level information sharing, service collaboration and multi-level linkage, and allows on-site archive users and qualified Internet users to obtain more valuable information.
In order to achieve the above object, the present invention discloses the following:
In a first aspect, the invention discloses a geological knowledge graph construction method, which comprises the following steps:
S1, model establishment: training, through a deep learning neural network model, a geological entity information extraction model for extracting geology-related entity information from acquired data; and training, through a deep learning neural network model, a geological entity relationship information extraction model for extracting geology-related entity relationship information from the acquired data;
S2, structuring entity information: the geological entity information extraction model performs entity information extraction on the acquired data to obtain structured data of geological information;
S3, structuring entity relationship information: the geological entity relationship information extraction model performs entity relationship information extraction on the acquired data to obtain structured data of geological entity relationship information;
S4, knowledge graph construction: storing the data extracted by the models and constructing a geological knowledge graph.
Further, the geology-related entity information includes one or more of the following: mineral type, administrative region, stratum, metal element, orientation, organization.
Further, the geology-related entity relationship information includes one or more of the following: spatial relationship, semantic relationship, temporal relationship.
Further, in step S4, the data extracted by the models is stored in the Neo4j database using the Cypher language, and the geological knowledge graph is constructed from the data stored in the Neo4j database.
Further, a three-dimensional knowledge graph is constructed from the constructed geological knowledge graph.
Further, the model training in S1 includes the following steps:
a. constructing a geological and mineral domain corpus;
b. labeling, according to the content of the corpus, the feature data related to geological entity information and geological relationship information in the acquired data;
c. inputting the labeled data into a network model consisting of BERT + LSTM + CRF for training to obtain the geological entity information extraction model;
d. applying triple labeling to the labeled data and inputting it into a BERT network model for training to obtain the geological entity relationship information extraction model.
Further, entities in the corpus are labeled in the BIO format, and relationship information is labeled using triple labeling.
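In a BERT + LSTM + CRF tagger, the CRF layer selects the best-scoring tag sequence from per-token emission scores and tag-to-tag transition scores. The following is a minimal plain-Python sketch of that decoding step (Viterbi decoding). The function name `viterbi_decode`, the tag set and all scores are illustrative assumptions, not values taken from the invention.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag path; scores are summed along the path."""
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0].get(tag, 0.0) for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that gives the best path into `cur`.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["O", "B-ORG", "I-ORG"]
# Hypothetical transition scores: "O -> I-ORG" is invalid in BIO, so it is penalized.
TRANSITIONS = {("O", "I-ORG"): -100.0, ("B-ORG", "I-ORG"): 1.0, ("I-ORG", "I-ORG"): 1.0}
# Hypothetical emission scores for a three-token input.
EMISSIONS = [
    {"O": 2.0, "B-ORG": 0.1, "I-ORG": 0.0},
    {"O": 0.5, "B-ORG": 1.0, "I-ORG": 1.2},
    {"O": 0.2, "B-ORG": 0.1, "I-ORG": 1.5},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy input, picking the highest emission per token would label the second token "I-ORG" directly after "O", which the BIO scheme does not allow; the transition penalty steers the decoded path to a valid sequence instead.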
In a second aspect, the invention discloses a geological knowledge graph construction device, comprising:
an entity model acquisition module, used for obtaining a trained geological entity information extraction model;
an entity relationship model acquisition module, used for obtaining a trained geological entity relationship information extraction model;
a data input module, used for inputting the data to be processed into the trained models;
a data storage module, used for storing the data output by the entity model acquisition module and the entity relationship model acquisition module;
a knowledge graph construction module, used for constructing a knowledge graph from the data output by the entity model acquisition module and the entity relationship model acquisition module.
In a third aspect, the invention discloses an electronic device comprising one or more processors and a storage device for storing one or more programs which, when executed by the processors, cause the processors to implement any one of the geological knowledge graph construction methods described above.
In a fourth aspect, the invention discloses a storage medium, which is a computer-readable storage medium storing a computer program; the computer program comprises program instructions which, when executed by a processor, implement any one of the geological knowledge graph construction methods described above.
The invention has the following beneficial effects:
Geology-related data is labeled to obtain entity-labeled data and entity-relationship-labeled data, which are then input into a network model consisting of BERT + LSTM + CRF and into a BERT model, respectively, for training, yielding a geological entity information extraction model and a geological entity relationship information extraction model. Data to be processed, such as unstructured data, is then input into the two models, which apply natural language processing techniques to perform entity extraction and relation extraction, yielding structured data of geological information and structured data of geological entity relationship information, respectively. The resulting data is then organized as a graph in a Neo4j graph database to obtain the geological knowledge graph. In this way, existing geology-related unstructured data can be used: its entity data and entity relationship data are extracted, a more complete knowledge graph of the geological field is constructed, and users can conveniently query and understand related information.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the apparatus of the present invention;
FIG. 3 is a schematic diagram of the three-dimensional knowledge graph;
FIG. 4 is a partial schematic view of the knowledge graph;
FIG. 5 is an overall schematic view of the knowledge graph;
FIG. 6 is a flow chart of relationship extraction for the geological archive knowledge graph;
FIG. 7 shows the graph construction process based on collected geological archives and web geological data.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIGS. 1 to 7, this embodiment provides a geological knowledge graph construction method, including:
S1, model establishment:
a. constructing a geological and mineral domain corpus;
b. acquiring geology-related data and labeling the geology-related entity features in it, the entity features corresponding to geology-related entity information; the YEDDA annotation tool is preferably used; the geology-related entity information includes one or more of the following: mineral type, administrative region, stratum, metal element, orientation, organization; entity relationship information is labeled as "entity-relation-entity" triples using the LANn annotation tool;
c. inputting the data labeled in step b into a network model consisting of BERT + LSTM + CRF for training to obtain the geological entity information extraction model;
d. acquiring geology-related data and labeling the geology-related entity features in it using the YEDDA annotation tool, the labeled information mainly comprising mineral information, organizations, names and geographical areas; then labeling entity relationship information as "entity-relation-entity" triples using the Lann annotation tool to obtain geology-related entity relationship information; the geology-related entity relationship information includes one or more of the following: spatial relationship, semantic relationship, temporal relationship;
e. inputting the data labeled in step d into a BERT model for training to obtain the geological entity relationship information extraction model;
S2, structuring entity information: inputting the acquired unstructured geology-related information into the geological entity information extraction model to obtain structured data of geological information;
S3, structuring entity relationship information: inputting the acquired unstructured geology-related information into the geological entity relationship information extraction model to obtain structured data of geological entity relationship information, namely structured data in the "entity-relation-entity" triple form;
S4, knowledge graph construction: storing the structured data obtained from the two models in a Neo4j database using the Cypher language, and constructing the geological knowledge graph from the stored data.
Geology-related data is labeled to obtain entity-labeled data and entity-relationship-labeled data, which are then input into a network model consisting of BERT + LSTM + CRF and into a BERT model, respectively, for training, yielding a geological entity information extraction model and a geological entity relationship information extraction model. Data to be processed, such as unstructured data, is then input into the two models, which apply natural language processing techniques to perform entity extraction and relation extraction, yielding structured data of geological information and structured data of geological entity relationship information, respectively. The resulting data is then organized as a graph in a Neo4j graph database to obtain the geological knowledge graph.
In this way, existing geology-related unstructured data can be used: its entity data and entity relationship data are extracted, a more complete knowledge graph of the geological field is constructed, and users can conveniently query and understand related information.
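Step S4 stores "entity-relation-entity" triples in Neo4j using Cypher. The sketch below shows one way such a triple could be turned into a parameterized Cypher `MERGE` statement. The function name `triple_to_cypher`, the `name` node property and the relation type `LOCATED_IN` are hypothetical; the entity labels and names are borrowed from the examples in this description. Because Cypher does not allow labels or relationship types to be passed as parameters, they are validated before being interpolated.

```python
import re
from typing import Dict, Tuple

Entity = Tuple[str, str]  # (entity name, entity label such as "ROCK" or "LOC")

# Labels and relationship types are interpolated into the query text,
# so only plain identifiers are accepted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head: Entity, relation: str, tail: Entity) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher statement and its parameters for one triple."""
    for ident in (head[1], relation, tail[1]):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    query = (
        f"MERGE (h:{head[1]} {{name: $head}}) "
        f"MERGE (t:{tail[1]} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head[0], "tail": tail[0]}


# Illustrative triple; the pairing of these two entities is made up for the example.
query, params = triple_to_cypher(("Hematite", "ROCK"), "LOCATED_IN", ("Zuangjiang county", "LOC"))
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so loading the same triple twice does not duplicate nodes or edges. With the official Neo4j Python driver, the returned pair could be passed to `session.run(query, params)`; that call is not exercised here.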
Example 2
The invention addresses the problem of updating the knowledge in the knowledge graph by using the Cypher graph database query language, which is easy to understand and makes it convenient for users to correct unreasonable content in the graph database. The knowledge graph is constructed from the extracted entities, relationships and attributes, taking into account the characteristics of knowledge in the geological and mineral field. Neo4j is then used to store the mineral knowledge graph; that is, the constructed "entity-relation-entity" triples are stored in the database using the Cypher language. Entities and attributes are then extracted from the user's natural-language sentence and injected into a Cypher query template to query the geological and mineral knowledge graph. On this basis, a geological and mineral knowledge graph query system is developed that provides functions such as entity query, relationship query and visual query.
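The query flow described above (find the entity in the user's sentence, then inject it into a Cypher query template) can be sketched as follows. This is a minimal illustration under stated assumptions: the template texts, the `name` property, and the helpers `find_entity` and `build_query` are hypothetical, and a simple dictionary lookup stands in for the trained entity extraction model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical query templates; the entity name is passed as the $name parameter.
QUERY_TEMPLATES: Dict[str, str] = {
    "entity": "MATCH (n {name: $name}) RETURN n",
    "relation": (
        "MATCH (n {name: $name})-[r]->(m) "
        "RETURN n.name AS head, type(r) AS relation, m.name AS tail"
    ),
}


def find_entity(question: str, known_entities: List[str]) -> Optional[str]:
    """Return the longest known entity name mentioned in the question, if any."""
    lowered = question.lower()
    matches = [e for e in known_entities if e.lower() in lowered]
    return max(matches, key=len) if matches else None


def build_query(kind: str, question: str, known_entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Select a template and bind the entity found in the question to it."""
    entity = find_entity(question, known_entities)
    if entity is None:
        raise ValueError("no known entity found in the question")
    return QUERY_TEMPLATES[kind], {"name": entity}


query, params = build_query(
    "relation",
    "Which places are related to hematite?",
    ["Hematite", "Siderite ore"],
)
print(query)
print(params)
```

Passing the entity as a query parameter instead of concatenating it into the query string avoids Cypher injection and lets the database reuse the query plan.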
Example 3
In this embodiment, the invention is further described on the basis of Embodiment 1: in S4, the data extracted by the models is stored in the Neo4j database using the Cypher language, and the geological knowledge graph is constructed from the data stored in the Neo4j database.
Further, a three-dimensional knowledge graph is constructed from the constructed geological knowledge graph.
The steps of constructing the three-dimensional knowledge graph are as follows:
Owing to the high query performance and customizable query language of the Neo4j graph database, the method can not only query the relationships between entities but also query geological and mineral knowledge, returning fast, accurate and structured results. Geological and mineral knowledge queries are based on the customizable Cypher query language of the Neo4j graph database: entities and attributes are injected into a Cypher query template to query the corresponding node data. As shown in FIG. 3, real-time visual display and analysis of geological and mineral data can be realized.
Example 4
The text annotation for "constructing a geological and mineral domain corpus" in Embodiment 1 above is carried out as follows: entities in the corpus are labeled in the BIO format, and relationship information is labeled using triple labeling.
A corpus is a collection of a large number of texts. Text preprocessing is the basic work for all text processing; after preprocessing, the texts in the corpus can be used more conveniently in subsequent data processing. In this embodiment, the BIO labeling format is used, i.e., each element is labeled "B-X", "I-X" or "O". "B-X" indicates that the segment containing the element is of type X and that the element is at the beginning of the segment; "I-X" indicates that the segment containing the element is of type X and that the element is inside the segment, where "X" is a geology-related entity type such as geological mineral name, geographical region name or organization; "O" means that the element does not belong to any type. Part of the corpus constructed in the BIO format is shown in Table 1. For example, the first character of the organization name "Southwest Geological Survey Institute" (the character for "West") is labeled "B-ORG", indicating that it is the first character of an entity of type "ORG", where "ORG" denotes an organization entity; similarly, "LOC" denotes a geographical region entity and "ROCK" denotes a geological mineral entity.
TABLE 1 Corpus constructed in the BIO format (example sentence: "An investigation was arranged by the Southwest Geological Survey Institute.")
Chinese corpus (character)    Label
由    O
西    B-ORG
南    I-ORG
地    I-ORG
质    I-ORG
调    I-ORG
查    I-ORG
所    I-ORG
安    O
排    O
进    O
行    O
调    O
查    O
。    O
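The character-level BIO labeling illustrated in Table 1 can be produced from entity spans with a few lines of code. The sketch below is illustrative only: the function name `bio_tag` and the span format `(start, end, type)` are assumptions, and the Chinese sentence is an assumed reconstruction of the example sentence behind Table 1 ("An investigation was arranged by the Southwest Geological Survey Institute.").

```python
from typing import List, Tuple


def bio_tag(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Label each character B-X / I-X / O from (start, end, type) spans (end exclusive)."""
    labels = ["O"] * len(text)
    for start, end, entity_type in spans:
        labels[start] = f"B-{entity_type}"      # first character of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{entity_type}"      # remaining characters of the entity
    return list(zip(text, labels))


sentence = "由西南地质调查所安排进行调查。"
# Characters 1 to 7 form the organization name (Southwest Geological Survey Institute).
tagged = bio_tag(sentence, [(1, 8, "ORG")])
for char, label in tagged:
    print(char, label)
```

Each output line pairs one character with its label, which is the layout used for the corpus excerpt in Table 1.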
The established models use existing techniques such as named entity recognition and relation extraction to convert the unstructured data in geology-related data into structured data.
Named entity recognition refers to recognizing the named entities in text, including names of people, organizations, places, and the like. Named entity recognition is an important fundamental task in natural language processing; improving its performance further facilitates the conversion of unstructured text into structured text. Entities are the core elements of a knowledge graph. Geology-related data has rich domain characteristics; sorting out the types, inherent relationships and attributes of geological entities and completing the recognition and labeling of geological entities is a prerequisite for establishing a "geological results content label" corpus of geological data. The knowledge graph is built from multi-source heterogeneous data, that is, data from different sources and with different structures; the differences mainly concern structured, semi-structured and unstructured data. Unstructured data is converted into triples that conform to the graph structure through named entity recognition and relation extraction.
The method performs Chinese geological named entity recognition under the BERT framework and uses a model pre-trained on a corpus to extract entities automatically from large-scale unstructured geological data. BERT mainly comprises two stages: pre-training and fine-tuning. The BERT model uses the encoder structure of the original Transformer; it is pre-trained on a large-scale unlabeled corpus, with the model parameters adjusted gradually during training to learn semantic representations of text, and is then fine-tuned for the specific downstream NLP task. Fine-tuning the pre-trained model requires only one additional output layer. For example, from unstructured text such as "An investigation was arranged by the Southwest Geological Survey Institute. Iron ore occurs in the lower Jurassic strata; the ore is hematite with an oolitic or conglomerate structure", entities such as the organization name "Southwest Geological Survey Institute" and the geological mineral name "hematite" need to be extracted.
Training on the labeled corpus yields the entity extraction results shown in Table 2. As the table shows, the extracted entities include organizations, geographic locations and geological minerals, where "LOC" denotes a geographical region entity, "ROCK" denotes a geological mineral entity, and "ORG" denotes an organization entity.
TABLE 2 Entity extraction example
Entity                                 Entity label
Zuangjiang county                      LOC
Southwest geological survey institute  ORG
Jurassic period                        LOC
Hematite (iron ore)                    ROCK
Manchurian willow beach                LOC
White stone pond                       LOC
Sichuan geological survey institute    ORG
Hematite (iron ore)                    ROCK
Siderite ore                           ROCK
Sichuan province government            ORG
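Entity lists like the one in Table 2 are obtained by grouping the per-token BIO labels predicted by the model back into entity spans. The sketch below shows one way to do this; the function name `bio_to_entities` is hypothetical, and the token sequence is a made-up English illustration rather than output from the trained model.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], labels: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Group BIO-labeled tokens into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current:  # close the previous entity before starting a new one
                entities.append((sep.join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current and label[2:] == current_type:
            current.append(token)
        else:
            # "O", or an "I-" tag with no matching open entity (dropped as invalid)
            if current:
                entities.append((sep.join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((sep.join(current), current_type))
    return entities


tokens = ["Sichuan", "geological", "survey", "institute", "surveyed", "hematite"]
labels = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "B-ROCK"]
print(bio_to_entities(tokens, labels))
```

For character-level Chinese text, `sep=""` would join the characters without spaces.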
The core idea of a knowledge graph is to represent data as a graph in which nodes represent specific objects, information or concepts and edges represent semantic relationships. Data about geological minerals, organizations, geographical locations and the like is acquired from the geological results data. After the extracted structured data of geological information and structured data of geological entity relationship information are imported into Neo4j, the geological knowledge graph can be constructed using existing knowledge graph construction methods. The invention constructs the knowledge graph by extracting two kinds of information, entities and entity relationships, and stores both in the graph database Neo4j to obtain the layout of the mineral knowledge graph, shown in part in FIG. 4 and in full in FIG. 5. Circles of the same color belong to the same entity type, circles of different colors represent different entity types, and the lines between circles represent the relationships between entities, so that "circle-line-circle" corresponds to an "entity-relation-entity" triple. In FIG. 5, red represents geological minerals, dark grey nodes with more connecting lines represent geographical locations, light grey represents the names of geological briefs, and dark grey nodes with fewer connecting lines represent organization names; the relationships between them are drawn as lines, thereby constructing the knowledge graph of the geological and mineral field. In practice, different colors such as yellow, red and blue may be used to distinguish different entity information; the drawings of this specification are shown with the colors removed.
Example 4
Structuring entity relationship information involves entity relationship extraction, which is important throughout the construction of the geological knowledge graph. Entity relationship extraction for geological information comprises spatial relationship extraction, semantic relationship extraction and temporal relationship extraction. Taking collected geological results archive data as an example, the technical flow is shown in FIG. 6. FIG. 7 shows only one technical flow; in practice, entity recognition and entity relationship extraction can be performed separately as required, and attribute recognition and the like can be added. In FIG. 6, the collected archive data is first organized according to existing archive rules, temporal and spatial relationships are extracted, and entity linking is then performed through relationship fusion. The structure of the collected archive data is then analyzed. The collected archive data comprises structured, semi-structured and unstructured data; structured and semi-structured data can be used directly, that is, entities and entity relationships are mapped into the knowledge graph using rule-based methods. Data such as "Geological brief of the iron ore in Penxian County, Sichuan" is structured data.
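For structured and semi-structured records, the rule-based mapping mentioned above can be as simple as a table of field-to-field rules. The sketch below is an assumption-laden illustration: the field names (`title`, `mineral`, `region`), the relation names (`DESCRIBES`, `LOCATED_IN`) and the helper `record_to_triples` are all hypothetical, and only the record title comes from the example in the text.

```python
from typing import Dict, List, Tuple

# Each rule maps (head field, relation name, tail field) to one triple per record.
RULES: List[Tuple[str, str, str]] = [
    ("title", "DESCRIBES", "mineral"),
    ("mineral", "LOCATED_IN", "region"),
]


def record_to_triples(
    record: Dict[str, str], rules: List[Tuple[str, str, str]] = RULES
) -> List[Tuple[str, str, str]]:
    """Apply the mapping rules to one structured record, skipping missing fields."""
    triples = []
    for head_field, relation, tail_field in rules:
        head, tail = record.get(head_field), record.get(tail_field)
        if head and tail:
            triples.append((head, relation, tail))
    return triples


record = {
    "title": "Geological brief of the iron ore in Penxian County, Sichuan",
    "mineral": "iron ore",
    "region": "Penxian County, Sichuan",
}
for triple in record_to_triples(record):
    print(triple)
```

Because the rules are plain data, new mappings can be added without changing the function, and records with missing fields simply produce fewer triples.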
For unstructured data, entities and entity relationships must be extracted from the text; entity relationship extraction covers spatial, semantic, and temporal relationships. To extract from unstructured data, the collected archive data are first preprocessed (for example, word segmentation, part-of-speech tagging, and syntactic analysis); the language features are then converted into distributed representations; entities and entity relationships are then extracted; and finally the entities are linked.
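Since claim 7 states that entities in the corpus are labeled in BIO format, the output of the entity extraction step can be thought of as one tag per token. The following sketch shows one possible way to decode such a tag sequence into entity spans; the tag names (MIN, LOC), the sample sentence, and the choice to join tokens with spaces are assumptions for an English illustration, not details from the patent.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from parallel token and tag lists.

    B-X opens an entity of type X, I-X continues the open entity of the same
    type, and O (or a stray I- tag that does not match) closes any open entity.
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical tagged sentence.
tokens = ["iron", "ore", "in", "Sichuan"]
tags = ["B-MIN", "I-MIN", "O", "B-LOC"]
print(decode_bio(tokens, tags))
```

For character-level Chinese text the tokens would more naturally be joined without a separator.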
Example 5
There are two main schemes for storing knowledge graph data: RDF storage and graph database storage. Knowledge representation in the knowledge graph takes an ontology as its core and the RDF triple pattern as its basic framework, while also reflecting multi-granularity, multi-level semantic relationships among entities, categories, attributes, relations, and so on. The triple is the general representation form of the knowledge graph. It consists of two geological mineral entities that have a semantic connection and the relation between them, and is a visual representation of mineral knowledge, namely G = (head, relation, tail), where head is the head entity of the triple, tail is the tail entity of the triple, and relation is drawn from R = {r1, r2, …, rn}, the relation set of the knowledge base, which contains |R| different relations.
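The triple form described above (head entity, relation, tail entity) maps naturally onto a small data structure. The sketch below is illustrative only: the entity names and relation names are hypothetical, and the relation set is simply derived from the triples that are present.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) fact of the knowledge graph."""
    head: str
    relation: str
    tail: str


# Hypothetical facts, not taken from the patent's data.
graph = [
    Triple("iron ore", "LOCATED_IN", "Sichuan"),
    Triple("copper ore", "LOCATED_IN", "Sichuan"),
    Triple("Iron ore geological brief", "DESCRIBES", "iron ore"),
]

# The relation set: every distinct relation used by some triple.
relation_set = {t.relation for t in graph}
# Every entity that appears as a head or a tail.
entities = {t.head for t in graph} | {t.tail for t in graph}

print(len(relation_set), sorted(relation_set))
print(len(entities), sorted(entities))
```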
Example 6
A geological knowledge map construction apparatus which can execute the knowledge map construction method provided in any embodiment of the present invention, comprising:
an entity model acquisition module: used for obtaining a trained geological entity information extraction model;
an entity relationship model acquisition module: used for obtaining a trained geological entity relationship information extraction model;
a data input module: used for inputting data into the entity model acquisition module and the entity relationship model acquisition module;
a data storage module: used for storing the data output by the entity model acquisition module and the entity relationship model acquisition module;
a knowledge graph construction module: used for constructing a knowledge graph according to the data output by the entity model acquisition module and the entity relationship model acquisition module.
The geological knowledge map construction device disclosed by the embodiment comprises an entity model acquisition module, an entity relation model acquisition module, a data input module, a data storage module and a knowledge map construction module, and solves the technical problems that the existing unstructured data cannot be utilized and the constructed knowledge map is incomplete.
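One way to picture how the five modules of the device fit together is the following sketch. It is not the patent's implementation: the class name, method names, and the two toy extraction functions that stand in for the trained models are all hypothetical, and storage is an in-memory list rather than a database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]          # (text, entity type)
Relation = Tuple[str, str, str]   # (head, relation, tail)


@dataclass
class KnowledgeGraphDevice:
    # Entity model and entity relationship model acquisition modules:
    # here they simply hold callables that stand in for trained models.
    entity_model: Callable[[str], List[Entity]]
    relation_model: Callable[[str], List[Relation]]
    # Data storage module: in-memory stand-in for a database.
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def input_data(self, text: str) -> None:
        """Data input module: feed text to both models and store their outputs."""
        self.entities.extend(self.entity_model(text))
        self.relations.extend(self.relation_model(text))

    def build_graph(self) -> Dict[str, List[Tuple[str, str]]]:
        """Knowledge graph construction module: head -> [(relation, tail), ...]."""
        graph: Dict[str, List[Tuple[str, str]]] = {}
        for head, relation, tail in self.relations:
            graph.setdefault(head, []).append((relation, tail))
        return graph


# Toy stand-ins for the trained extraction models (keyword matching only).
def toy_entity_model(text: str) -> List[Entity]:
    return [("iron ore", "Mineral")] if "iron ore" in text else []


def toy_relation_model(text: str) -> List[Relation]:
    if "iron ore" in text and "Sichuan" in text:
        return [("iron ore", "LOCATED_IN", "Sichuan")]
    return []


device = KnowledgeGraphDevice(toy_entity_model, toy_relation_model)
device.input_data("Geological brief of iron ore in Sichuan")
print(device.build_graph())
```

Passing the models in as callables keeps the wiring independent of how the models are trained or loaded.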
Example 7
An electronic device comprising one or more processors and a memory for storing one or more programs which, when executed by the processors, cause the processors to carry out a geological knowledge map construction method as claimed in any one of the preceding claims. The memory is preferably connected to the processor by a bus.
Example 8
A storage medium being a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, implement a geological knowledge map construction method as defined in any one of the preceding claims.
The memory is a readable storage medium that can store software programs and computer-executable programs. The processor runs the software programs, instructions, and modules stored in the memory, thereby executing the various functional applications and data processing of the server, that is, implementing the above-described knowledge graph construction method.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A geological knowledge graph construction method is characterized by comprising the following steps:
S1, model establishment: training, through a deep learning neural network model, a geological entity information extraction model for extracting geologically relevant entity information from the acquired data; and training, through a deep learning neural network model, a geological entity relationship information extraction model for extracting geologically relevant entity relationship information from the acquired data;
S2, structuring entity information: the geological entity information extraction model extracts entity information from the acquired data to obtain structured data of geological information;
S3, structuring entity relationship information: the geological entity relationship information extraction model extracts entity relationship information from the acquired data to obtain structured data of geological entity relationship information;
S4, knowledge graph construction: storing the data extracted by the models and constructing a geological knowledge map.
2. The geological knowledge map construction method of claim 1, wherein the geologically relevant entity information comprises one or more of the following: mineral type, administrative area, strata, metallic elements, orientation, organizational structure.
3. The geological knowledge map construction method of claim 1, wherein the geologically relevant entity relationship information comprises one or more of the following: spatial relationships, semantic relationships, temporal relationships.
4. The geological knowledge map construction method according to claim 1, wherein in the step S4, the data extracted by the model are stored in a Neo4j database in Cypher language, and the geological knowledge map is constructed according to the data stored in the Neo4j database.
5. The geological knowledge map construction method of claim 4, wherein a three-dimensional knowledge map is constructed from the constructed geological knowledge map.
6. The geological knowledge graph construction method according to any one of claims 1-5, wherein the model training step in S1 comprises the following steps:
a. constructing a geological mineral domain corpus;
b. marking feature data related to geological entity information and geological relation information in the acquired data according to the content of the corpus;
c. inputting the marked data into a network model consisting of BERT + LSTM + CRF for training to obtain a geological entity information extraction model;
d. performing triple labeling on the labeled data, and inputting the data into a BERT network model for training to obtain a geological entity relationship information extraction model.
7. The geological knowledge graph construction method according to claim 6, wherein entities in the corpus are labeled in a BIO format, and the relationship information is labeled by a triple labeling method.
8. A geological knowledge map construction apparatus, comprising:
an entity model acquisition module: used for obtaining a trained geological entity information extraction model;
an entity relationship model acquisition module: used for obtaining a trained geological entity relationship information extraction model;
a data input module: used for inputting data into the entity model acquisition module and the entity relationship model acquisition module;
a data storage module: used for storing the data output by the entity model acquisition module and the entity relationship model acquisition module;
a knowledge graph construction module: used for constructing a knowledge graph according to the data output by the entity model acquisition module and the entity relationship model acquisition module.
9. An electronic device comprising one or more processors and storage means for storing one or more programs which, when executed by the processors, cause the processors to implement a geological knowledge map construction method as claimed in any one of claims 1 to 7.
10. A storage medium being a computer readable storage medium storing a computer program comprising program instructions, wherein the program instructions, when executed by a processor, implement a geological knowledge map construction method according to any of claims 1-7.
CN202111422212.5A 2021-11-26 2021-11-26 Geological knowledge map construction method and device, electronic equipment and storage medium Pending CN114218333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111422212.5A CN114218333A (en) 2021-11-26 2021-11-26 Geological knowledge map construction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114218333A true CN114218333A (en) 2022-03-22

Family

ID=80698518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111422212.5A Pending CN114218333A (en) 2021-11-26 2021-11-26 Geological knowledge map construction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114218333A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598000A (en) * 2019-08-01 2019-12-20 达而观信息科技(上海)有限公司 Relationship extraction and knowledge graph construction method based on deep learning model
CN112100401A (en) * 2020-09-14 2020-12-18 北京大学 Knowledge graph construction method, device, equipment and storage medium for scientific and technological service
CN113312501A (en) * 2021-06-29 2021-08-27 中新国际联合研究院 Construction method and device of safety knowledge self-service query system based on knowledge graph
CN113535917A (en) * 2021-06-30 2021-10-22 山东师范大学 Intelligent question-answering method and system based on travel knowledge map

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116307123A (en) * 2023-02-23 2023-06-23 中国地质大学(武汉) Knowledge graph driving-based mineral resource prediction method and storage medium
CN116307123B (en) * 2023-02-23 2023-11-14 中国地质大学(武汉) Knowledge graph driving-based mineral resource prediction method and storage medium
CN115983381A (en) * 2023-02-28 2023-04-18 华院计算技术(上海)股份有限公司 Knowledge base rapid construction method and system based on online encyclopedia
CN116090662A (en) * 2023-03-02 2023-05-09 中国地质科学院矿产资源研究所 Knowledge-graph-based method and system for predicting potential of copper mine outside environment and electronic equipment
CN116090662B (en) * 2023-03-02 2024-05-24 中国地质科学院矿产资源研究所 Knowledge-graph-based method and system for predicting potential of copper mine outside environment and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220322