CN112487212A - Method and device for constructing domain knowledge graph - Google Patents

Method and device for constructing domain knowledge graph

Info

Publication number
CN112487212A
Authority
CN
China
Prior art keywords
data
concept
knowledge graph
knowledge
instance
Prior art date
Legal status
Pending
Application number
CN202011507759.0A
Other languages
Chinese (zh)
Inventor
侯磊
刘丁枭
张益�
李涓子
张鹏
唐杰
许斌
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202011507759.0A
Publication of CN112487212A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for constructing a domain knowledge graph. The method comprises: acquiring a seed vocabulary of a target domain; performing vocabulary expansion using the seed vocabulary of the target domain until the expanded vocabulary meets a preset condition, to obtain a related vocabulary of the target domain; extracting raw data corresponding to the related vocabulary from an existing database; and constructing a knowledge graph based on the raw data to generate the knowledge graph of the target domain. In the embodiments of the invention, the seed vocabulary of the target domain is expanded to obtain a related vocabulary, raw data is obtained based on the related vocabulary, and the knowledge graph is constructed from the raw data. This provides a knowledge graph construction method applicable to any domain, one that does not depend on expert knowledge or industry research in a specific domain, so that construction efficiency is effectively improved and manpower and material resources are saved.

Description

Method and device for constructing domain knowledge graph
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for constructing a domain knowledge graph.
Background
The knowledge graph is a database for storing knowledge. The concept was formally proposed by Google in 2012, mainly to improve search efficiency and user experience in an era of rapid internet development and explosive growth of network data. With its strong semantic processing capability and interconnectivity, the knowledge graph lays a foundation for intelligent information applications, is widely used in search, question answering, information analysis and the like, and has driven information technology from information services toward knowledge services. In recent years, many industries have been researching and applying knowledge graphs in their professional fields to better serve specific domains.
However, current knowledge graph construction must start from raw data of a specific domain, does not generalize across domains, and can only be carried out with the help of expert knowledge and industry research in that domain, so it consumes considerable manpower and material resources.
Disclosure of Invention
The invention provides a method and a device for constructing a domain knowledge graph, to address the problems that existing knowledge graph construction must start from raw data of a specific domain, does not generalize across domains, depends on expert knowledge and industry research in that domain, and therefore consumes considerable manpower and material resources.
The invention provides a method for constructing a domain knowledge graph, comprising the following steps:
acquiring a seed vocabulary of a target domain;
performing vocabulary expansion using the seed vocabulary of the target domain until the expanded vocabulary meets a preset condition, to obtain a related vocabulary of the target domain;
extracting raw data corresponding to the related vocabulary from an existing database;
and constructing a knowledge graph based on the raw data to generate the knowledge graph of the target domain.
According to the method for constructing a domain knowledge graph provided by the invention, constructing a knowledge graph based on the raw data to generate the knowledge graph of the target domain comprises:
preprocessing the raw data to obtain preprocessed target data;
performing a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data;
performing a knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data;
and performing a knowledge fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target domain.
According to the method for constructing a domain knowledge graph provided by the invention, performing the knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data comprises:
performing a concept acquisition operation based on the preprocessed target data to obtain the concept data;
performing a concept hierarchy generation operation based on the preprocessed target data to obtain the hypernym-hyponym relations among different concepts;
and performing a concept attribute acquisition operation based on the preprocessed target data to obtain the concept attribute data.
According to the method for constructing a domain knowledge graph provided by the invention, performing the knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data comprises:
performing an instance extraction operation based on the preprocessed target data to obtain the instance data;
performing an instance classification operation based on the preprocessed target data to obtain the relations between instances and concepts;
and performing an instance attribute extraction operation based on the preprocessed target data to obtain the instance attribute data.
According to the method for constructing a domain knowledge graph provided by the invention, performing the knowledge fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target domain, comprises:
performing a concept fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the concept layer;
performing an instance fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the instance layer;
and performing a relationship fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the relations among concepts, the relations between concepts and instances, and the relations among instances, and to generate the knowledge graph of the target domain.
According to the method for constructing a domain knowledge graph provided by the invention, preprocessing the raw data to obtain the preprocessed target data comprises:
performing abstract extraction, body-text extraction and infobox extraction on the raw data to obtain the preprocessed target data.
The invention also provides a device for constructing a domain knowledge graph, comprising:
a seed vocabulary acquisition module, configured to acquire a seed vocabulary of a target domain;
a vocabulary expansion module, configured to perform vocabulary expansion using the seed vocabulary of the target domain until the expanded vocabulary meets a preset condition, to obtain a related vocabulary of the target domain;
a raw data extraction module, configured to extract raw data corresponding to the related vocabulary from an existing database;
and a knowledge graph construction module, configured to construct a knowledge graph based on the raw data to generate the knowledge graph of the target domain.
According to the device for constructing a domain knowledge graph provided by the invention, the knowledge graph construction module comprises:
a preprocessing submodule, configured to preprocess the raw data to obtain preprocessed target data;
a knowledge modeling submodule, configured to perform a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data;
a knowledge acquisition submodule, configured to perform a knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data;
and a knowledge fusion submodule, configured to perform a knowledge fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target domain.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the method for constructing the domain knowledge graph.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of domain knowledge graph construction as described in any of the above.
In the embodiments of the invention, the seed vocabulary of the target domain is expanded to obtain a related vocabulary, raw data is obtained based on the related vocabulary, and the knowledge graph is constructed from the raw data. This provides a knowledge graph construction method applicable to any domain, one that does not depend on expert knowledge or industry research in a specific domain, so that construction efficiency is effectively improved and manpower and material resources are saved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flowchart of a method for constructing a domain knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of constructing a knowledge graph based on the raw data to generate the knowledge graph of the target domain according to an embodiment of the present invention;
FIG. 3 is a schematic view of the "Yuanmingyuan" interface provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of abstract extraction in preprocessing provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of an infobox extraction result in preprocessing according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the abstract extraction result of knowledge acquisition for "Yuanmingyuan" according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the infobox extraction result of knowledge acquisition for "Yuanmingyuan" according to an embodiment of the present invention;
FIG. 8 is a schematic view of the home page of a tourism knowledge base according to an embodiment of the present invention;
FIG. 9 is a partial schematic view of the final "Yuanmingyuan" interface provided in an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a domain knowledge graph constructing apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and apparatus for constructing the domain knowledge graph according to the present invention will be described with reference to fig. 1 to 11.
The terms to which the present invention relates will be explained first.
Knowledge graph: a database for storing knowledge. It stores triples, such as (Yao Ming, birthplace, Shanghai), each of which represents a fact. A knowledge graph can also be viewed as a graph: in the triple above, Yao Ming and Shanghai are nodes, and birthplace is a labeled edge pointing from Yao Ming to Shanghai.
Concept: a class of entities in a knowledge graph, such as fruits or pome fruits.
Entity: a specific real-world object in a knowledge graph, such as an apple or a hawthorn.
Attribute: a characteristic of a concept or entity in a knowledge graph, such as the place of origin or the color of an apple.
Relationship: a relation among concepts, entities and attributes in a knowledge graph. For example, the entity apple is one of the entities under the concept fruit, and the color attribute of apple may be red, pink, golden yellow and so on.
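The terms above can be illustrated with a minimal sketch that stores facts as triples and regroups them into the graph view (nodes joined by labeled edges). The predicate names `instance_of` and `color` and the helper `to_graph` are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple.
triples = [
    ("Yao Ming", "birthplace", "Shanghai"),
    ("apple", "instance_of", "fruit"),   # entity -> concept (assumed predicate name)
    ("apple", "color", "red"),           # entity attribute (assumed predicate name)
]


def to_graph(facts):
    """Group triples into an adjacency map: node -> list of (edge label, target)."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(triples)
```

Here `graph["Yao Ming"]` holds the single labeled edge pointing to "Shanghai", which corresponds to the graph view of the triple described above.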
FIG. 1 is a schematic flowchart of a method for constructing a domain knowledge graph according to an embodiment of the present invention. The method may be executed by a terminal or a server. As shown in FIG. 1, the method includes:
Step 100, acquiring a seed vocabulary of a target domain;
To construct a knowledge graph for a given target domain, a seed vocabulary of the target domain is acquired first.
Optionally, user input may be received, and the seed vocabulary contained in the user input obtained in response.
Optionally, the seed vocabulary of the target domain may also be obtained directly once the target domain has been determined.
Step 101, performing vocabulary expansion using the seed vocabulary of the target domain until the expanded vocabulary meets a preset condition, to obtain a related vocabulary of the target domain;
Using the seed vocabulary of the target domain, vocabulary expansion is performed through a pre-implemented vocabulary expansion function until the expanded vocabulary meets a preset condition, finally yielding the related vocabulary of the target domain.
Optionally, the preset condition may be that the number of expanded words reaches a preset number, that the proportion of words in different languages within the expanded vocabulary reaches a preset ratio, or any other condition that can be used to determine that vocabulary expansion is finished.
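A minimal sketch of the iterative expansion in step 101 follows, using the "preset number of words" stopping condition. The patent does not specify how the vocabulary expansion function works, so `expand_fn`, the `RELATED` lookup table, and the `max_rounds` safeguard are hypothetical stand-ins.

```python
def expand_vocabulary(seeds, expand_fn, max_words=1000, max_rounds=10):
    """Iteratively expand a seed vocabulary until a preset condition is met.

    expand_fn is a hypothetical callable mapping one word to related words.
    Expansion stops when max_words is reached, when no new words appear,
    or after max_rounds iterations.
    """
    vocabulary = set(seeds)
    frontier = list(seeds)
    for _ in range(max_rounds):
        if len(vocabulary) >= max_words or not frontier:
            break
        next_frontier = []
        for word in frontier:
            for candidate in expand_fn(word):
                if candidate not in vocabulary:
                    vocabulary.add(candidate)
                    next_frontier.append(candidate)
                    if len(vocabulary) >= max_words:
                        return vocabulary
        frontier = next_frontier
    return vocabulary


# Toy related-word table standing in for a real expansion source.
RELATED = {
    "Beijing": ["Tiananmen", "Yuanmingyuan"],
    "Yuanmingyuan": ["Summer Palace"],
}

related_vocabulary = expand_vocabulary(
    ["Beijing"], lambda word: RELATED.get(word, []), max_words=3
)
```

A language-ratio condition, as mentioned above, could replace the size check by testing the vocabulary after each round instead of counting words.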
Step 102, extracting raw data corresponding to the related vocabulary from an existing database;
The corresponding raw data is extracted from the existing database according to the related vocabulary obtained by expansion.
The existing database is a database that already exists.
The raw data is the source data of the related vocabulary. Optionally, the raw data may be encyclopedia page information, raw web page data, or raw data in other forms.
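Step 102 can be sketched as a lookup of each related word in an existing store. Modeling the database as an in-memory dictionary keyed by word, and reporting words without a page separately, are assumptions made only for illustration.

```python
def extract_raw_data(related_words, database):
    """Look up the raw page for each related word.

    database is assumed to be a mapping from word to raw page content.
    Returns the pages that were found and the words with no entry.
    """
    found = {}
    missing = []
    for word in related_words:
        page = database.get(word)
        if page is None:
            missing.append(word)
        else:
            found[word] = page
    return found, missing


# Placeholder content standing in for an encyclopedia page.
database = {"Yuanmingyuan": "<raw page content>"}
found, missing = extract_raw_data(["Yuanmingyuan", "Ming Tombs"], database)
```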
Step 103, constructing a knowledge graph based on the raw data to generate the knowledge graph of the target domain.
The knowledge graph construction process includes steps such as preprocessing, knowledge modeling, knowledge acquisition and knowledge fusion.
Based on the raw data, these steps are carried out to finally obtain the knowledge graph of the target domain.
In the embodiments of the invention, the seed vocabulary of the target domain is expanded to obtain a related vocabulary, raw data is obtained based on the related vocabulary, and the knowledge graph is constructed from the raw data. This provides a knowledge graph construction method applicable to any domain, one that does not depend on expert knowledge or industry research in a specific domain, so that construction efficiency is effectively improved and manpower and material resources are saved.
On the basis of the foregoing embodiment, optionally, as shown in FIG. 2, constructing a knowledge graph based on the raw data to generate the knowledge graph of the target domain includes:
Step 200, preprocessing the raw data to obtain preprocessed target data;
Preprocessing refers to normalizing the data before the main processing.
Optionally, in an embodiment, preprocessing the raw data to obtain the preprocessed target data includes:
performing abstract extraction, body-text extraction and infobox extraction on the raw data to obtain the preprocessed target data.
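The three preprocessing operations can be sketched as splitting one raw page into an abstract, body text and infobox fields. The patent does not describe the page format, so the plain-text layout below (abstract first, then `== Infobox ==` and `== Text ==` sections) is a hypothetical one chosen to keep the example short.

```python
def preprocess(raw_page):
    """Split a raw page into abstract, body text and infobox fields.

    Assumes a hypothetical plain-text layout: the abstract comes first,
    followed by '== Infobox ==' and '== Text ==' sections.
    """
    sections = {"abstract": [], "infobox": [], "text": []}
    current = "abstract"
    for line in raw_page.splitlines():
        stripped = line.strip()
        if stripped == "== Infobox ==":
            current = "infobox"
        elif stripped == "== Text ==":
            current = "text"
        elif stripped:
            sections[current].append(stripped)

    # Infobox lines are assumed to look like "key: value".
    infobox = {}
    for entry in sections["infobox"]:
        key, separator, value = entry.partition(":")
        if separator:
            infobox[key.strip()] = value.strip()

    return {
        "abstract": " ".join(sections["abstract"]),
        "text": " ".join(sections["text"]),
        "infobox": infobox,
    }


raw_page = """Yuanmingyuan is an imperial garden in Beijing.

== Infobox ==
Category: scenic spot
Location: Beijing

== Text ==
The garden was built during the Qing dynasty.
"""

target_data = preprocess(raw_page)
```

Real encyclopedia pages would need an HTML parser in place of the line-based splitting used here.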
Step 201, performing a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data;
Knowledge modeling includes processes such as concept acquisition, concept hierarchy generation and concept attribute acquisition. Concept acquisition extracts concept data from the preprocessed data; concept hierarchy generation derives the hypernym-hyponym relations among different concepts from the raw data according to certain rules; and concept attribute acquisition extracts the attributes of the concepts.
Optionally, in an embodiment, performing the knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data includes:
performing a concept acquisition operation based on the preprocessed target data to obtain the concept data;
performing a concept hierarchy generation operation based on the preprocessed target data to obtain the hypernym-hyponym relations among different concepts;
and performing a concept attribute acquisition operation based on the preprocessed target data to obtain the concept attribute data.
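The three knowledge modeling operations can be sketched as follows. The patent only states that the hierarchy is derived "according to certain rules", so the rule used here is an assumption: each preprocessed page carries a hypothetical `category_path` ordered from general to specific, adjacent entries give a hypernym-hyponym pair, and infobox keys become attributes of the most specific concept.

```python
def model_knowledge(pages):
    """Derive concepts, hypernym-hyponym pairs and concept attributes.

    pages is a list of dicts with a 'category_path' list (general to
    specific) and an 'infobox' dict; both field names are assumptions.
    """
    concepts = set()
    hierarchy = set()    # (hyponym concept, hypernym concept)
    attributes = {}      # concept -> set of attribute names
    for page in pages:
        path = page["category_path"]
        concepts.update(path)
        # Pair each concept with the one directly above it in the path.
        hierarchy.update(zip(path[1:], path[:-1]))
        if path:
            attributes.setdefault(path[-1], set()).update(page["infobox"])
    return concepts, hierarchy, attributes


pages = [
    {
        "category_path": ["tourism", "scenic spot"],
        "infobox": {"opening hours": "n/a", "ticket price": "n/a"},
    }
]
concepts, hierarchy, attributes = model_knowledge(pages)
```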
Step 202, performing a knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data;
Knowledge acquisition mainly includes instance extraction, instance classification and instance attribute extraction. Instance extraction extracts instance data from the preprocessed data; instance classification extracts the relations between instances and concepts from the preprocessed data; and instance attribute extraction extracts the attribute data of the instances from the preprocessed data.
Optionally, in an embodiment, performing the knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data includes:
performing an instance extraction operation based on the preprocessed target data to obtain the instance data;
performing an instance classification operation based on the preprocessed target data to obtain the relations between instances and concepts;
and performing an instance attribute extraction operation based on the preprocessed target data to obtain the instance attribute data.
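A minimal sketch of the three knowledge acquisition operations is given below. It assumes each preprocessed page is keyed by its title (taken as the instance name) and that a hypothetical infobox field named `Category` identifies the concept; neither convention is stated in the patent.

```python
def acquire_knowledge(pages):
    """Extract instances, instance-concept relations and instance attributes.

    pages maps a page title to a dict with an 'infobox' dict. The
    'Category' key is an assumed convention for instance classification.
    """
    instances = []
    instance_of = []           # (instance, concept)
    instance_attributes = []   # (instance, attribute, value)
    for title, page in pages.items():
        instances.append(title)
        infobox = page["infobox"]
        concept = infobox.get("Category")
        if concept:
            instance_of.append((title, concept))
        for key, value in infobox.items():
            if key != "Category":
                instance_attributes.append((title, key, value))
    return instances, instance_of, instance_attributes


pages = {
    "Yuanmingyuan": {
        "infobox": {"Category": "scenic spot", "Location": "Beijing"}
    }
}
instances, instance_of, instance_attributes = acquire_knowledge(pages)
```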
Step 203, performing a knowledge fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target domain.
Knowledge fusion mainly includes concept fusion, instance fusion and relationship fusion. Concept fusion refers to the fusion of concept-layer data, instance fusion refers to the fusion of instance-layer data, and relationship fusion refers to the fusion of relations among concepts, relations between concepts and instances, and relations among instances.
Optionally, in an embodiment, performing the knowledge fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target domain, includes:
performing a concept fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the concept layer;
performing an instance fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the instance layer;
and performing a relationship fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the relations among concepts, the relations between concepts and instances, and the relations among instances, and to generate the knowledge graph of the target domain.
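The alignment performed during fusion can be sketched with a deliberately simple rule: names that match after case and whitespace normalization are treated as the same concept or instance, and relations are rewritten to the canonical names and de-duplicated. The patent does not state its alignment criteria, so this normalization rule and the function names are assumptions.

```python
def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def align_names(names):
    """Map every surface name to a canonical name (the first spelling seen)."""
    canonical = {}
    alignment = {}
    for name in names:
        key = normalize(name)
        canonical.setdefault(key, name)
        alignment[name] = canonical[key]
    return alignment


def fuse_relations(triples, alignment):
    """Rewrite triples with canonical names and drop duplicates, keeping order."""
    fused = []
    seen = set()
    for subject, predicate, obj in triples:
        triple = (alignment.get(subject, subject), predicate, alignment.get(obj, obj))
        if triple not in seen:
            seen.add(triple)
            fused.append(triple)
    return fused


alignment = align_names(["Scenic Spot", "scenic  spot", "Yuanmingyuan"])
fused = fuse_relations(
    [
        ("Yuanmingyuan", "instance_of", "scenic  spot"),
        ("Yuanmingyuan", "instance_of", "Scenic Spot"),
    ],
    alignment,
)
```

The same `align_names` helper can be applied separately to concept names and instance names, mirroring the concept-layer and instance-layer alignment described above.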
In the embodiment of the invention, starting from a small number of seed words in a domain, the vocabulary of that domain is expanded through the vocabulary expansion function. This process can be iterated multiple times until the preset end condition for vocabulary expansion is met. Raw data is then extracted from the existing database according to the domain vocabulary. The specific steps for constructing the knowledge graph from the raw data are also given: the knowledge graph is constructed through preprocessing, knowledge modeling, knowledge acquisition, knowledge fusion and related steps.
A specific example is given below to further illustrate the method for constructing the domain knowledge graph provided by the present invention.
Take the construction of a knowledge graph about Beijing tourism as an example. First, 50 seed words such as "Beijing", "tourism", "Tiananmen" and "Guochong" are input, and the vocabulary of the tourism domain is expanded through the vocabulary expansion function, yielding 1043 related words of the tourism domain, such as "Yuanmingyuan", "Ming Tombs", "Xiangshan Temple", "geological park", "Hijongchang Museum", "Yanqing Museum" and "China Horse Culture Museum". The corresponding encyclopedia page information is then extracted from the database according to the related words obtained by expansion.
For example, the "Yuanmingyuan" interface is shown in FIG. 3.
Through this acquisition, the raw data for the vocabulary related to the tourism-domain knowledge graph is obtained from the source database, and knowledge graph construction is then carried out on that raw data. The raw data is first preprocessed: the raw web page data undergoes abstract extraction, body-text extraction, infobox extraction and similar processing.
The abstract extraction in preprocessing is shown in FIG. 4, and the infobox extraction result of preprocessing is shown in FIG. 5.
Knowledge modeling is carried out on the basis of preprocessing. Knowledge modeling includes processes such as concept acquisition, concept hierarchy generation and concept attribute acquisition. Concept acquisition extracts concept data from the preprocessed data; concept hierarchy generation derives the hypernym-hyponym relations among different concepts from the raw data according to certain rules; and concept attribute acquisition extracts the attributes of the concepts. The concepts obtained include "scenic spot", "time-honored brand", "cultural relic" and so on, where "scenic spot" is a lower-level concept of the "tourism" concept and has attributes such as "opening hours", "category" and "ticket price".
Knowledge acquisition mainly includes instance extraction, instance classification and instance attribute extraction. Instance extraction extracts instance data from the preprocessed data; instance classification extracts the relations between instances and concepts from the preprocessed data; and instance attribute extraction extracts the attribute data of the instances from the preprocessed data.
FIG. 6 shows the abstract extraction result of knowledge acquisition for "Yuanmingyuan"; FIG. 7 shows the infobox extraction result of knowledge acquisition for "Yuanmingyuan".
Through the steps of vocabulary expansion, preprocessing, knowledge modeling, knowledge acquisition and so on, the tourism-domain knowledge graph data is successfully constructed, comprising 115 concepts, 1043 instances and 827 attributes. All the resulting triples are stored in a Virtuoso database and then displayed as pages. FIG. 8 is a schematic view of the home page of the tourism knowledge graph, and FIG. 9 is a partial view of the final "Yuanmingyuan" interface.
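Before triples can be loaded into an RDF store, they are typically serialized in a standard format. The sketch below writes triples as N-Triples lines. The patent does not say which serialization or loading procedure was used, so the format choice, the `http://example.org/kg/` namespace and the rule for deciding which objects are literals are all assumptions.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"  # hypothetical namespace for illustration


def iri(name):
    """Build an IRI reference by percent-encoding the name under BASE."""
    return "<" + BASE + quote(name, safe="") + ">"


def literal(value):
    """Build a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples, literal_predicates):
    """Serialize (subject, predicate, object) triples as N-Triples text.

    Objects of predicates listed in literal_predicates are written as
    string literals; all other objects are written as IRIs.
    """
    lines = []
    for subject, predicate, obj in triples:
        target = literal(obj) if predicate in literal_predicates else iri(obj)
        lines.append(f"{iri(subject)} {iri(predicate)} {target} .")
    return "\n".join(lines)


document = to_ntriples(
    [
        ("Yuanmingyuan", "instance_of", "scenic spot"),
        ("Yuanmingyuan", "location", "Beijing"),
    ],
    literal_predicates={"location"},
)
```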
The apparatus for constructing a domain knowledge graph provided by the invention is described below. The apparatus described below and the method described above may be cross-referenced.
FIG. 10 is a schematic structural diagram of a domain knowledge graph constructing apparatus provided by the present invention, including: a seed vocabulary acquisition module 1010, a vocabulary expansion module 1020, a raw data extraction module 1030, and a knowledge graph construction module 1040, wherein:
the seed vocabulary acquisition module 1010 is configured to acquire a seed vocabulary of a target domain;
the vocabulary expansion module 1020 is configured to perform vocabulary expansion using the seed vocabulary of the target domain until the expanded vocabulary meets a preset condition, to obtain a related vocabulary of the target domain;
the raw data extraction module 1030 is configured to extract raw data corresponding to the related vocabulary from an existing database;
the knowledge graph construction module 1040 is configured to construct a knowledge graph based on the raw data to generate the knowledge graph of the target domain.
Optionally, the knowledge graph construction module 1040 includes:
a preprocessing submodule, configured to preprocess the raw data to obtain preprocessed target data;
a knowledge modeling submodule, configured to perform a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data;
a knowledge acquisition submodule, configured to perform a knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data;
and a knowledge fusion submodule, configured to perform a knowledge fusion operation on the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target domain.
Optionally, the knowledge modeling sub-module is specifically configured to:
executing concept acquisition operation based on the preprocessed target data to obtain concept data;
executing a concept hypernym-hyponym relation generation operation based on the preprocessed target data to obtain the hypernym-hyponym relations among different concepts;
and executing concept attribute acquisition operation based on the preprocessed target data to obtain concept attribute data.
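The three outputs of the knowledge modeling submodule (concept data, hypernym-hyponym relations, and concept attribute data) can be held in a small data structure. The following is a minimal sketch under the assumption that each concept has at most one direct hypernym; the class name, field names, and sample concepts are illustrative and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    name: str
    parent: Optional[str] = None                         # direct hypernym, if any
    attributes: List[str] = field(default_factory=list)  # concept attribute names


def ancestors(concepts: Dict[str, Concept], name: str) -> List[str]:
    """Return the hypernyms of a concept, nearest first.

    The walk stops at a root, at an unknown parent, or when a cycle is
    detected, so malformed hierarchies cannot cause an endless loop.
    """
    chain: List[str] = []
    seen = {name}
    current = concepts[name].parent
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = concepts[current].parent if current in concepts else None
    return chain


if __name__ == "__main__":
    concepts = {
        "Place": Concept("Place", attributes=["address"]),
        "ScenicSpot": Concept("ScenicSpot", "Place", ["openingHours"]),
        "Garden": Concept("Garden", "ScenicSpot"),
    }
    print(ancestors(concepts, "Garden"))
```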
Optionally, the knowledge acquisition sub-module is specifically configured to:
executing instance extraction operation based on the preprocessed target data to obtain instance data;
executing instance classification operation based on the preprocessed target data to obtain the relation between the instances and the concepts;
and executing instance attribute extraction operation based on the preprocessed target data to obtain instance attribute data.
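The outputs of the knowledge acquisition submodule map naturally onto triples. The sketch below shows one possible representation, assuming a hypothetical `instanceOf` predicate for the relation between an instance and its concept; the extraction and classification operations themselves are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Instance:
    name: str                                                  # instance data
    concept: str                                               # result of instance classification
    attributes: Dict[str, str] = field(default_factory=dict)  # instance attribute data


def instance_to_triples(instance: Instance) -> List[Triple]:
    """Emit one instance-to-concept triple followed by one triple per attribute."""
    triples: List[Triple] = [(instance.name, "instanceOf", instance.concept)]
    for attribute, value in instance.attributes.items():
        triples.append((instance.name, attribute, value))
    return triples


if __name__ == "__main__":
    spot = Instance("SpotA", "ScenicSpot", {"openingHours": "08:00-17:00"})
    for triple in instance_to_triples(spot):
        print(triple)
```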
Optionally, the knowledge fusion submodule is specifically configured to:
executing a concept fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the concept layer;
executing an instance fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the instance layer;
and executing a relation fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the relations among concepts, the relations between concepts and instances, and the relations among instances, and to generate the knowledge graph of the target field.
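Instance-layer alignment can be illustrated with a deliberately simple stand-in: records from different sources are treated as the same instance when their labels match after normalization. This is an assumption made for the sketch, not the alignment technique of the embodiment; practical alignment usually also relies on similarity measures or attribute comparison.

```python
from typing import Any, Dict, Iterable, Tuple


def normalize(label: str) -> str:
    """Lower-case a label and remove all whitespace so trivial variants match."""
    return "".join(label.lower().split())


def fuse_instances(
    records: Iterable[Tuple[str, Dict[str, str]]]
) -> Dict[str, Dict[str, Any]]:
    """Merge (label, attributes) records whose normalized labels are equal.

    On conflicting attribute values, the value seen first is kept.
    """
    fused: Dict[str, Dict[str, Any]] = {}
    for label, attributes in records:
        entry = fused.setdefault(normalize(label), {"labels": [], "attributes": {}})
        if label not in entry["labels"]:
            entry["labels"].append(label)
        for key, value in attributes.items():
            entry["attributes"].setdefault(key, value)
    return fused


if __name__ == "__main__":
    merged = fuse_instances([
        ("Spot A", {"city": "CityX"}),
        ("spot a", {"city": "City X", "area": "3 km2"}),
    ])
    print(merged)
```

Keeping every original label alongside the merged attributes preserves provenance, which helps when an alignment later turns out to be wrong.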
Optionally, the preprocessing sub-module is specifically configured to:
and performing abstract extraction, body text extraction, and infobox extraction on the original data to obtain the preprocessed target data.
The apparatus for constructing a domain knowledge graph provided by the present invention can implement each process of the method embodiments of FIGS. 1 to 9 and achieve the same technical effect; the details are not repeated here.
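The preprocessing operation can be pictured as reducing each raw record to three parts: abstract, body text, and infobox. The sketch below assumes the raw record is a dictionary with the hypothetical keys `title`, `abstract`, `body`, and `infobox`; the length cap on the body text is likewise an assumption.

```python
from typing import Any, Dict


def preprocess(raw: Dict[str, Any], max_body_chars: int = 2000) -> Dict[str, Any]:
    """Keep the abstract, a truncated body text, and the cleaned infobox of a record."""
    abstract = str(raw.get("abstract", "")).strip()
    body = str(raw.get("body", "")).strip()[:max_body_chars]
    infobox: Dict[str, str] = {}
    for key, value in dict(raw.get("infobox") or {}).items():
        key, value = str(key).strip(), str(value).strip()
        if key and value:            # drop infobox rows with an empty key or value
            infobox[key] = value
    return {
        "title": str(raw.get("title", "")).strip(),
        "abstract": abstract,
        "body": body,
        "infobox": infobox,
    }


if __name__ == "__main__":
    record = {
        "title": " Spot A ",
        "abstract": " A park. ",
        "body": "Long description " * 3,
        "infobox": {"Area ": " 3 km2 ", "Phone": ""},
    }
    print(preprocess(record, max_body_chars=20))
```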
FIG. 11 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 11, the electronic device may include a processor 1110, a communications interface 1120, a memory 1130, and a communication bus 1140, wherein the processor 1110, the communications interface 1120, and the memory 1130 communicate with one another via the communication bus 1140. The processor 1110 may invoke logic instructions in the memory 1130 to perform a method for constructing a domain knowledge graph, the method comprising: acquiring a seed vocabulary of a target field; performing vocabulary expansion by using the seed vocabulary of the target field until the expanded vocabulary meets a preset condition, and obtaining a related vocabulary of the target field; extracting original data corresponding to the related vocabulary from an existing database; and constructing a knowledge graph based on the original data to generate the knowledge graph of the target field.
In addition, the logic instructions in the memory 1130 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the method for constructing a domain knowledge graph provided by the above methods, the method comprising: acquiring seed vocabularies of a target field; performing vocabulary expansion by using the seed vocabulary of the target field until the expanded vocabulary meets a preset condition, and obtaining related vocabulary of the target field; extracting original data corresponding to the related vocabulary from an existing database; and constructing a knowledge graph based on the original data to generate the knowledge graph of the target field.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the method for constructing the domain knowledge graph provided in the above aspects, the method comprising: acquiring a seed vocabulary of a target field; performing vocabulary expansion by using the seed vocabulary of the target field until the expanded vocabulary meets a preset condition, and obtaining a related vocabulary of the target field; extracting original data corresponding to the related vocabulary from an existing database; and constructing a knowledge graph based on the original data to generate the knowledge graph of the target field.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for constructing a domain knowledge graph is characterized by comprising the following steps:
acquiring seed vocabularies of a target field;
performing vocabulary expansion by using the seed vocabulary of the target field until the expanded vocabulary meets a preset condition, and obtaining related vocabulary of the target field;
extracting original data corresponding to the related vocabulary from an existing database;
and constructing a knowledge graph based on the original data to generate the knowledge graph of the target field.
2. The method for constructing the domain knowledge graph according to claim 1, wherein the constructing the knowledge graph based on the raw data to generate the knowledge graph of the target domain comprises:
preprocessing the original data to obtain preprocessed target data;
executing a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data;
executing a knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data;
and executing a knowledge fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target field.
3. The method for constructing a domain knowledge graph according to claim 2, wherein the executing a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data comprises:
executing concept acquisition operation based on the preprocessed target data to obtain concept data;
executing a concept hypernym-hyponym relation generation operation based on the preprocessed target data to obtain the hypernym-hyponym relations among different concepts;
and executing concept attribute acquisition operation based on the preprocessed target data to obtain concept attribute data.
4. The method for constructing a domain knowledge graph according to claim 2, wherein the performing knowledge acquisition operations based on the preprocessed target data to obtain instance data, relationships between instances and concepts, and instance attribute data comprises:
executing instance extraction operation based on the preprocessed target data to obtain instance data;
executing instance classification operation based on the preprocessed target data to obtain the relation between the instances and the concepts;
and executing instance attribute extraction operation based on the preprocessed target data to obtain instance attribute data.
5. The method for constructing a domain knowledge graph according to claim 2, wherein the executing a knowledge fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target field comprises:
executing a concept fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the concept layer;
executing an instance fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the data of the instance layer;
and executing a relation fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to align the relations among concepts, the relations between concepts and instances, and the relations among instances, and to generate the knowledge graph of the target field.
6. The method for constructing a domain knowledge graph according to claim 2, wherein the preprocessing the raw data to obtain preprocessed target data comprises:
and performing abstract extraction, body text extraction, and infobox extraction on the original data to obtain the preprocessed target data.
7. An apparatus for constructing a domain knowledge graph, comprising:
the seed vocabulary acquisition module is used for acquiring seed vocabularies of the target field;
the vocabulary expansion module is used for expanding the vocabulary by utilizing the seed vocabulary of the target field until the expanded vocabulary meets a preset condition, and obtaining the related vocabulary of the target field;
the original data extraction module is used for extracting original data corresponding to the related vocabulary from an existing database;
and the knowledge graph construction module is used for constructing a knowledge graph based on the original data to generate the knowledge graph of the target field.
8. The domain knowledge graph building apparatus of claim 7, wherein the knowledge graph building module comprises:
the preprocessing submodule is used for preprocessing the original data to obtain preprocessed target data;
the knowledge modeling submodule is used for executing a knowledge modeling operation based on the preprocessed target data to obtain concept data, hypernym-hyponym relations among different concepts, and concept attribute data;
the knowledge acquisition submodule is used for executing a knowledge acquisition operation based on the preprocessed target data to obtain instance data, relations between instances and concepts, and instance attribute data;
and the knowledge fusion submodule is used for executing a knowledge fusion operation according to the concept data, the hypernym-hyponym relations among different concepts, the concept attribute data, the instance data, the relations between instances and concepts, and the instance attribute data, to generate the knowledge graph of the target field.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of building a domain knowledge graph as claimed in any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for constructing a domain knowledge graph as claimed in any one of claims 1 to 7.
CN202011507759.0A 2020-12-18 2020-12-18 Method and device for constructing domain knowledge graph Pending CN112487212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011507759.0A CN112487212A (en) 2020-12-18 2020-12-18 Method and device for constructing domain knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011507759.0A CN112487212A (en) 2020-12-18 2020-12-18 Method and device for constructing domain knowledge graph

Publications (1)

Publication Number Publication Date
CN112487212A true CN112487212A (en) 2021-03-12

Family

ID=74914246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011507759.0A Pending CN112487212A (en) 2020-12-18 2020-12-18 Method and device for constructing domain knowledge graph

Country Status (1)

Country Link
CN (1) CN112487212A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158688A (en) * 2021-05-11 2021-07-23 科大讯飞股份有限公司 Domain knowledge base construction method, device, equipment and storage medium
CN115905559A (en) * 2022-11-10 2023-04-04 北京大学 Method and device for constructing knowledge graph in field of intelligent careless care
WO2023246007A1 (en) * 2022-06-23 2023-12-28 广州大学 Value chain knowledge discovery method under personalized customization

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488724A (en) * 2013-09-16 2014-01-01 复旦大学 Book-oriented reading field knowledge map construction method
US20170193393A1 (en) * 2016-01-04 2017-07-06 International Business Machines Corporation Automated Knowledge Graph Creation
CN109446341A (en) * 2018-10-23 2019-03-08 国家电网公司 The construction method and device of knowledge mapping
CN110390023A (en) * 2019-07-02 2019-10-29 安徽继远软件有限公司 A kind of knowledge mapping construction method based on improvement BERT model
CN110750698A (en) * 2019-09-09 2020-02-04 深圳壹账通智能科技有限公司 Knowledge graph construction method and device, computer equipment and storage medium
CN111488467A (en) * 2020-04-30 2020-08-04 北京建筑大学 Construction method and device of geographical knowledge graph, storage medium and computer equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158688A (en) * 2021-05-11 2021-07-23 科大讯飞股份有限公司 Domain knowledge base construction method, device, equipment and storage medium
CN113158688B (en) * 2021-05-11 2023-12-01 科大讯飞股份有限公司 Domain knowledge base construction method, device, equipment and storage medium
WO2023246007A1 (en) * 2022-06-23 2023-12-28 广州大学 Value chain knowledge discovery method under personalized customization
CN115905559A (en) * 2022-11-10 2023-04-04 北京大学 Method and device for constructing knowledge graph in field of intelligent careless care
CN115905559B (en) * 2022-11-10 2024-01-23 北京大学 Knowledge graph construction method and device for field of care of mental retardation

Similar Documents

Publication Publication Date Title
CN110837550B (en) Knowledge graph-based question answering method and device, electronic equipment and storage medium
CN107679039B (en) Method and device for determining statement intention
CN108595583B (en) Dynamic graph page data crawling method, device, terminal and storage medium
CN112487212A (en) Method and device for constructing domain knowledge graph
CN109960810B (en) Entity alignment method and device
CN111488467B (en) Construction method and device of geographical knowledge graph, storage medium and computer equipment
US20210097089A1 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN109408811B (en) Data processing method and server
CN106991175B (en) Customer information mining method, device, equipment and storage medium
CN109522562B (en) Webpage knowledge extraction method based on text image fusion recognition
CN107657048A (en) user identification method and device
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN111524593A (en) Medical question-answering method and system based on context language model and knowledge embedding
CN112527924A (en) Dynamically updated knowledge graph expansion method and device
CN112183102A (en) Named entity identification method based on attention mechanism and graph attention network
CN110737820B (en) Method and apparatus for generating event information
CN112528146B (en) Content resource recommendation method and device, electronic equipment and storage medium
CN109783471A (en) Enterprise's portrait small routine method, apparatus, computer equipment and storage medium
CN117312531A (en) Power distribution network fault attribution analysis method based on large language model with enhanced knowledge graph
CN114579796B (en) Machine reading understanding method and device
CN112541087A (en) Cross-language knowledge graph construction method and device based on encyclopedia
CN112767933B (en) Voice interaction method, device, equipment and medium of highway maintenance management system
CN112328812B (en) Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
CN114357164A (en) Emotion-reason pair extraction method, device and equipment and readable storage medium
Pu et al. A vision-based approach for deep web form extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination