CN110569371A - Knowledge graph construction method and device and storage equipment - Google Patents

Knowledge graph construction method and device and storage equipment

Info

Publication number
CN110569371A
Authority
CN
China
Prior art keywords
knowledge
result
triples
specific entity
triple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910875545.XA
Other languages
Chinese (zh)
Inventor
林凤绿
雷欣
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Go Out And Ask (wuhan) Information Technology Co Ltd
Original Assignee
Go Out And Ask (wuhan) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Go Out And Ask (wuhan) Information Technology Co Ltd
Priority to CN201910875545.XA
Publication of CN110569371A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge graph construction method, a knowledge graph construction device and computer storage equipment. Knowledge is first extracted from a data source associated with a specific entity word to obtain triples of the specific entity word; the obtained triples are then fused with the existing triples in the concept graph to obtain a fusion result; upper and lower (hypernym-hyponym) reasoning is performed on the fusion result to obtain an inference result that expands the triples of the specific entity word; the expanded triples in the inference result are translated by a knowledge generation engine to obtain a translation result; and finally, the translation result is written into a storage service through a storage engine.

Description

Knowledge graph construction method and device and storage equipment
Technical Field
The invention relates to the technical field of information processing, in particular to a large-scale knowledge graph construction method and device and computer storage equipment.
Background
The knowledge graph, also called knowledge domain visualization or knowledge domain mapping, is a series of graphs that display the development process and structural relationships of knowledge. It describes knowledge resources and their carriers using visualization technology, and mines, analyzes, constructs, draws and displays knowledge and the relationships among knowledge resources and carriers. The knowledge graph can be applied in many scenarios, such as knowledge-graph-based information recommendation in a recommendation system, or knowledge-graph-based classification in a text classification process. Therefore, to support the wide application of the knowledge graph, a number of methods have been studied for constructing it.
According to the knowledge graph construction method recorded in the patent document with publication number CN108563710A, when the knowledge graph is constructed, the labels of published texts and the entity information in a basic graph are used as the information of graph nodes in the knowledge graph to be constructed, and the number of times the information of two graph nodes occurs in the same published text is then used as node relation information to complete the construction of the knowledge graph. According to the knowledge graph construction method described in another patent document, with publication number CN108694177A, a knowledge graph is constructed mainly by constructing relationship data between the respective entities.
The concept knowledge graph constructed by the existing knowledge graph construction method has the problems of small scale, single language (Chinese or English) support, single knowledge extraction method, inaccurate isA concept relation, incapability of dynamic updating and the like.
Disclosure of Invention
In order to effectively overcome the defects of the conventional knowledge graph construction methods, the embodiments of the invention provide a large-scale knowledge graph construction method, a large-scale knowledge graph construction device and computer storage equipment.
According to a first aspect of the embodiments of the present invention, there is provided a method for constructing a knowledge graph, the method including: extracting knowledge from a data source associated with a specific entity word to obtain a triple of the specific entity word; fusing the obtained triples of the specific entity words with the existing triples in the concept map to obtain a fusion result; performing upper and lower reasoning on the obtained fusion result to obtain a reasoning result of the triple of the specific entity word; translating the extended triples in the inference result by using a knowledge generation engine to obtain a translation result; the resulting translation results are written to the storage service by the storage engine.
According to an embodiment of the present invention, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data; accordingly, knowledge extraction from data sources associated with particular entity words includes: and performing knowledge extraction from the data source associated with the specific entity word by adopting a knowledge extraction mode corresponding to the type of the data source, wherein different data sources correspond to different knowledge extraction methods.
According to one embodiment of the invention, the knowledge extraction from the data source associated with the specific entity word comprises the following steps: if the data source type is structured data, extracting knowledge from a relational database by using a D2R method or extracting knowledge from link data by using a graph mapping method; and/or, if the data source type is semi-structured data, extracting knowledge from the semi-structured data by using a wrapper; and/or if the data source type is unstructured data, extracting knowledge from the free text by using an information extraction method.
According to an embodiment of the present invention, the fusing the obtained triplet of the specific entity word with the existing triplet in the concept graph to obtain a fused result includes: judging whether the obtained triple of the specific entity word is contained in the existing triple in the concept map; and if the obtained triple of the specific entity word is not contained in the existing triple in the concept map, adding the obtained triple of the specific entity word into the concept map to obtain a fusion result.
According to an embodiment of the present invention, the determining whether the obtained triple of the specific entity word is included in the existing triple in the concept graph includes: performing word expansion on the entity words in the obtained triples of the specific entity words to obtain triples after the word expansion; and judging whether the triples of the expanded words are contained in the existing triples in the concept map.
According to one embodiment of the invention, the knowledge generation engine comprises a knowledge base query engine, a neural network translation engine and an online translation engine; correspondingly, the method for translating the triples expanded in the inference result by using the knowledge generation engine to obtain the translation result comprises the following steps: translating the extended triple in the inference result by using a plurality of different knowledge generation engines to obtain a plurality of processing results; performing fusion comparison on the obtained multiple processing results to obtain a fusion comparison result; and determining the processing result with the highest rank in the fusion comparison results as a translation result.
According to an embodiment of the present invention, after writing the obtained translation result into the storage service by the storage engine, the method further includes: and reading the written translation result from the storage service by adopting a query engine matched with the storage engine.
According to a second aspect of the embodiments of the present invention, there is also provided a knowledge-graph constructing apparatus, including: the knowledge extraction module is used for extracting knowledge from a data source associated with a specific entity word to obtain a triple of the specific entity word; the fusion module is used for fusing the obtained triples of the specific entity words with the existing triples in the concept map to obtain a fusion result; the reasoning module is used for carrying out upper and lower reasoning on the obtained fusion result to obtain a reasoning result of the triple of the specific entity word; the knowledge generation module is used for translating the extended triples in the inference result by using a knowledge generation engine to obtain a translation result; and the storage module is used for writing the obtained translation result into the storage service through the storage engine.
According to an embodiment of the present invention, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data; correspondingly, the knowledge extraction module is specifically configured to extract knowledge from the data source associated with the specific entity word in a knowledge extraction manner corresponding to the type of the data source, where different data source types correspond to different knowledge extraction methods.
According to an embodiment of the present invention, the knowledge extraction module is specifically configured to, if the data source type is structured data, extract knowledge from a relational database using a D2R method or extract knowledge from link data using a graph mapping method; and/or, if the data source type is semi-structured data, extracting knowledge from the semi-structured data by using a wrapper; and/or if the data source type is unstructured data, extracting knowledge from the free text by using an information extraction method.
According to an embodiment of the invention, the fusion module comprises: the judging unit is used for judging whether the obtained triple of the specific entity word is contained in the existing triple in the concept map; and the adding unit is used for adding the obtained triple of the specific entity word into the concept map to obtain a fusion result if the obtained triple of the specific entity word is not contained in the existing triple in the concept map.
According to an embodiment of the present invention, the determining unit is specifically configured to perform word expansion on an entity word in the obtained triple of the specific entity word to obtain a word-expanded triple, and to judge whether the word-expanded triple is contained in the existing triples in the concept graph.
According to one embodiment of the invention, the knowledge generation engine comprises a knowledge base query engine, a neural network translation engine and an online translation engine; correspondingly, the knowledge generation module is specifically configured to utilize a plurality of different knowledge generation engines to perform translation processing on the extended triples in the inference result to obtain a plurality of processing results; perform fusion comparison on the obtained multiple processing results to obtain a fusion comparison result; and determine the processing result with the highest rank in the fusion comparison result as the translation result.
According to an embodiment of the present invention, the apparatus further includes a query module, configured to read the written translation result from the storage service using a query engine matching the storage engine.
According to a third aspect of embodiments of the present invention, there is provided a computer storage device comprising a set of computer-executable instructions which, when executed, perform any of the above-described methods of knowledge-graph construction.
The knowledge graph construction method, the knowledge graph construction device and the computer storage equipment disclosed by the embodiment of the invention firstly extract knowledge from a data source associated with a specific entity word to obtain a triple of the specific entity word; then fusing the obtained triples of the specific entity words with the existing triples in the concept map to obtain a fusion result; performing upper and lower reasoning on the obtained fusion result to obtain a reasoning result of the triples for expanding the specific entity words; further translating the extended triples in the inference result by using a knowledge generation engine to obtain a translation result; and finally, writing the obtained translation result into a storage service through a storage engine. Therefore, the invention is strictly organized according to the entity in the construction process of the knowledge graph, which is beneficial to the accurate understanding of the entity; moreover, by carrying out fusion, reasoning and translation processing on the triples obtained by knowledge extraction, a large-scale high-quality concept knowledge graph can be constructed and completed, so that the accuracy and the recall rate of natural language understanding are improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a diagram of a system architecture for implementing knowledge graph construction according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating an implementation of a knowledge graph construction method according to an embodiment of the present invention;
FIG. 3 shows an architecture diagram of a knowledge generation module of an embodiment of the invention;
FIG. 4 illustrates a conceptual knowledge graph effect diagram of an application example of the present invention;
FIG. 5 is a schematic diagram showing the composition structure of the knowledge graph constructing apparatus according to the embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
FIG. 1 is a diagram illustrating a system architecture for implementing knowledge graph construction according to an embodiment of the present invention. Referring to FIG. 1, the system architecture at least includes the following modules: knowledge extraction, knowledge fusion, knowledge reasoning, knowledge generation, a storage engine and a query engine. The knowledge extraction stage can extract knowledge from different types of data sources, such as structured data, semi-structured data and unstructured data; the extracted knowledge is then subjected to fusion, reasoning and knowledge generation (namely translation processing), so that a large-scale, high-quality concept knowledge graph is constructed; finally, the constructed concept knowledge graph is written into a storage service through the storage engine, so that it can subsequently be queried through the query engine.
FIG. 2 is a schematic flow chart of a method for constructing a knowledge graph according to an embodiment of the present invention; please refer to fig. 2. The method for constructing the knowledge graph comprises the following steps: operation 201, performing knowledge extraction from a data source associated with a specific entity word to obtain a triple of the specific entity word; operation 202, fusing the obtained triples of the specific entity words with the existing triples in the concept graph to obtain a fusion result; operation 203, performing upper and lower reasoning on the obtained fusion result to obtain a reasoning result of the triplet for expanding the specific entity word; operation 204, translating the extended triple in the inference result by using a knowledge generation engine to obtain a translation result; in operation 205, the resulting translation result is written into the storage service by the storage engine.
A triple in the knowledge graph generally has the form (entity, entity relationship, entity). If each entity is regarded as a node and each entity relationship (including attributes, categories, etc.) as an edge, a knowledge base containing a large number of triples becomes a huge knowledge graph. For example, the triples that include Liu Dehua may be (Liu Dehua, isA, actor), (Liu Dehua, isA, singer), (Liu Dehua, isA, lyricist) and (Liu Dehua, isA, producer), among others.
At operation 201, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data. Accordingly, operation 201 includes: performing knowledge extraction from the data source associated with the specific entity word using the knowledge extraction mode corresponding to the type of the data source, wherein different data source types correspond to different knowledge extraction methods.
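The node-and-edge view of triples described above can be illustrated with a minimal sketch. The `Triple` type and the `build_graph` helper are hypothetical names introduced only for this illustration; the patent does not prescribe any particular in-memory representation.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (entity, relation, entity) fact; the names are illustrative."""
    head: str
    relation: str
    tail: str


def build_graph(triples):
    """Index triples as an adjacency map: head -> set of (relation, tail)."""
    graph = defaultdict(set)
    for t in triples:
        graph[t.head].add((t.relation, t.tail))
    return graph


triples = [
    Triple("Liu Dehua", "isA", "actor"),
    Triple("Liu Dehua", "isA", "singer"),
    Triple("Liu Dehua", "isA", "lyricist"),
    Triple("Liu Dehua", "isA", "producer"),
]
graph = build_graph(triples)

# All concepts reachable from the entity node over an isA edge.
concepts = sorted(tail for rel, tail in graph["Liu Dehua"] if rel == "isA")
```

Here each entity is a key (node) and each relation is stored as an outgoing edge, so looking up the concepts of an entity is a single dictionary access.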
In an example, if the data source type is structured data, then knowledge is extracted from a relational database using the D2R method or from linked data using the graph mapping method.
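As a rough illustration of the idea behind mapping relational data to triples, the following sketch turns table rows into triples with an in-memory SQLite database. It is not the D2R tooling itself: the table `person`, its columns, the relation name `bornIn` and the helper `rows_to_triples` are all assumptions made for this example.

```python
import sqlite3


def rows_to_triples(conn, table, subject_col, mapping):
    """Map each row to triples: (subject value, relation, column value).

    `mapping` is a hypothetical column -> relation dictionary.
    Table and column names are trusted here; this is only a sketch.
    """
    cols = [subject_col] + list(mapping)
    query = f"SELECT {', '.join(cols)} FROM {table}"
    triples = []
    for row in conn.execute(query):
        subject = row[0]
        for col, value in zip(mapping, row[1:]):
            if value is not None:  # skip NULL cells
                triples.append((subject, mapping[col], value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, occupation TEXT, birthplace TEXT)")
conn.execute("INSERT INTO person VALUES ('Liu Dehua', 'actor', NULL)")

d2r_triples = rows_to_triples(
    conn, "person", "name", {"occupation": "isA", "birthplace": "bornIn"}
)
```

The primary entity column supplies the head of each triple, and every other mapped column contributes one edge when its cell is not empty.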
In another example, if the data source type is semi-structured data, a wrapper is used to extract knowledge from the semi-structured data. A wrapper may also be referred to as an extractor. Specifically, taking a web page as an example of semi-structured data, the web page is parsed and knowledge in the form of triples is extracted from it. For example, from the Baidu encyclopedia page of Liu Dehua, multiple triples can be extracted from the occupations listed at the top of the page (such as actor, singer, lyricist and film producer) and from the entry tags (such as music figure, actor, singer, entertainment figure and producer), e.g. (Liu Dehua, isA, actor), (Liu Dehua, isA, singer), (Liu Dehua, isA, lyricist), (Liu Dehua, isA, film producer), (Liu Dehua, isA, music figure), (Liu Dehua, isA, entertainment figure), etc.
In yet another example, if the data source type is unstructured data, knowledge is extracted from free text using an information extraction method. Specifically, the information extraction method may include: 1) regular expressions; 2) templates; 3) word segmentation and syntactic dependency parsing (subject and object, i.e., S & O); and 4) sequence labeling, namely a multi-label classification model BERT + BiLSTM + CRF.
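A wrapper of this kind can be sketched with the standard-library HTML parser. The page structure assumed here (entry tags marked up as `<span class="tag">`) is purely hypothetical and does not reflect the real layout of any encyclopedia site; `TagWrapper` and `extract_isa_triples` are illustrative names.

```python
from html.parser import HTMLParser


class TagWrapper(HTMLParser):
    """Collects the text of every <span class="tag"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_tag = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "tag") in attrs:
            self._in_tag = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_tag = False

    def handle_data(self, data):
        if self._in_tag and data.strip():
            self.values.append(data.strip())


def extract_isa_triples(entity, html):
    """Turn each extracted tag into an (entity, isA, tag) triple."""
    wrapper = TagWrapper()
    wrapper.feed(html)
    return [(entity, "isA", value) for value in wrapper.values]


page = (
    '<div><span class="tag">actor</span>'
    '<span class="tag">singer</span><span>ignored</span></div>'
)
wrapper_triples = extract_isa_triples("Liu Dehua", page)
```

In practice a wrapper is written per site or per page template, since the extraction rule depends on where the target fields sit in the markup.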
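The first two methods in this list (regular expressions and templates) can be sketched together as a single "X is a Y" pattern. The pattern below is an assumption chosen for illustration and only handles simple English sentences; it is not the rule set used in the patent, and the dependency-parsing and sequence-labeling methods are not shown.

```python
import re

# Template "<Entity> is a/an <concept>": capitalized words form the entity,
# lowercase words form the concept. Deliberately narrow; a sketch only.
ISA_TEMPLATE = re.compile(
    r"(?P<entity>[A-Z]\w*(?: [A-Z]\w*)*) is an? (?P<concept>[a-z]+(?: [a-z]+)*)"
)


def extract_isa(text):
    """Return (entity, isA, concept) triples for every template match."""
    return [
        (m.group("entity"), "isA", m.group("concept"))
        for m in ISA_TEMPLATE.finditer(text)
    ]


text = "Liu Dehua is an actor. Neo4j is a graph database."
free_text_triples = extract_isa(text)
```

Pattern-based extraction of this kind tends to be precise but brittle, which is one reason it is usually combined with learned models such as the sequence-labeling approach named in the text.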
In operation 202, it is first determined whether the obtained triple of the specific entity word is contained in the existing triples in the concept graph. If it is not, the obtained triple is added to the concept graph to obtain the fusion result; conversely, if it is already contained in the existing triples in the concept graph, it may be ignored. For example, for the triple (Liu Dehua, isA, actor), it is first determined whether this isA triple exists in the concept graph; if so, it is ignored, and if not, it is newly added.
According to an embodiment of the present invention, before determining whether the obtained triple of the specific entity word is contained in the existing triples in the concept graph, word expansion may be performed on the entity word in the obtained triple to obtain a word-expanded triple; it is then judged whether the word-expanded triple is contained in the existing triples in the concept graph. For example, for the triple (Liu Dehua, isA, actor), Liu Dehua may first be expanded into Hua Zi to obtain an expanded triple such as (Hua Zi, isA, actor); it is then judged whether the expanded isA triple exists in the concept graph; if so, it is ignored, and if not, it is added.
At operation 203, an inference engine may perform upper and lower (hypernym-hyponym) reasoning on the fusion result to supplement more isA relationships. For example, from the triples (Liu Dehua, isA, actor), (actor, isA, entertainment figure) and (entertainment figure, isA, person), the triples (Liu Dehua, isA, entertainment figure) and (Liu Dehua, isA, person) can be inferred.
FIG. 3 is an architecture diagram of a knowledge generation module according to an embodiment of the invention. Referring to FIG. 3, the knowledge generation engine includes a knowledge base query engine, a neural network translation engine, and an online translation engine; correspondingly, in operation 204, a plurality of different knowledge generation engines are first used to translate the extended triples in the inference result to obtain a plurality of processing results; the obtained processing results are then fused and compared to obtain a fusion comparison result; and finally the processing result with the highest rank in the fusion comparison result is determined as the translation result.
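The fusion check with word expansion can be sketched as follows. The alias table, the function names and the choice to store the triple in its original (unexpanded) form when no variant is found are all assumptions of this sketch; the text itself only specifies that a triple already present, directly or after expansion, is ignored.

```python
def expand_entity(entity, aliases):
    """Return the entity together with its known aliases (hypothetical table)."""
    return {entity} | set(aliases.get(entity, ()))


def fuse(new_triples, concept_graph, aliases):
    """Add triples that are absent from the graph under every alias.

    `concept_graph` is a set of (head, relation, tail) tuples and is
    modified in place; the list of newly added triples is returned.
    """
    added = []
    for head, rel, tail in new_triples:
        variants = {(name, rel, tail) for name in expand_entity(head, aliases)}
        if variants & concept_graph:
            continue  # already known, possibly under an alias
        concept_graph.add((head, rel, tail))
        added.append((head, rel, tail))
    return added


concept_graph = {("Hua Zi", "isA", "actor")}
aliases = {"Liu Dehua": ["Hua Zi"]}
new_triples = [("Liu Dehua", "isA", "actor"), ("Liu Dehua", "isA", "singer")]

added = fuse(new_triples, concept_graph, aliases)
```

In this run the actor triple is skipped because the graph already holds it under the alias, while the singer triple is new and is added.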
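One simple way to realize this kind of reasoning is a transitive closure over the isA edges. The sketch below assumes that isA is treated as transitive and uses a plain graph traversal; the patent does not describe the internals of its inference engine, so `infer_isa` is only an illustration.

```python
def infer_isa(triples):
    """Return isA triples implied by transitivity but not stated explicitly."""
    parents = {}
    for head, rel, tail in triples:
        if rel == "isA":
            parents.setdefault(head, set()).add(tail)

    inferred = set()
    for entity, direct in parents.items():
        stack = list(direct)
        seen = set()
        while stack:  # depth-first walk up the isA hierarchy
            concept = stack.pop()
            if concept in seen:
                continue
            seen.add(concept)
            stack.extend(parents.get(concept, ()))
        seen.discard(entity)  # guard against cycles in noisy data
        for concept in seen - direct:
            inferred.add((entity, "isA", concept))
    return inferred


facts = [
    ("Liu Dehua", "isA", "actor"),
    ("actor", "isA", "entertainment figure"),
    ("entertainment figure", "isA", "person"),
]
inferred = infer_isa(facts)
```

Besides the two triples for the entity itself, the closure also yields (actor, isA, person), since the same rule applies to concepts.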
Different knowledge generation engines correspond to different data sources or models: for example, the knowledge base query engine corresponds to DBpedia and Wikipedia; the neural network translation engine corresponds to a neural machine translation model; and the online translation engine corresponds to Baidu translation, Google translation, track translation and the like. Specifically, a multi-strategy translation mode is used: the triples obtained after upper and lower reasoning are translated by the knowledge base query engine, the neural network machine translation engine and the online translation engine respectively, with Chinese entities translated into English entities and English entities translated into Chinese entities; the recalled results are then fused and compared, and the translation result with the highest ranking (i.e., top 1) in the fusion comparison result is returned.
At operation 205, the types of storage engine include gStore, Neo4j and Dgraph, and different storage engines correspond to different data formats. Specifically, in operation 205, a storage service may be selected according to the type of the storage engine, and the translation result (i.e., the knowledge generation result) obtained by the knowledge generation module is written into that storage service in the data format corresponding to the selected storage engine type.
Those skilled in the art will appreciate that the engine type is typically chosen according to the storage volume; in order of increasing storage volume, the engines are gStore, Neo4j and Dgraph. By default, the engine type given by the default parameter at device start-up is used.
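The multi-engine translation step can be sketched as a vote over candidate translations. The text does not say how the fusion comparison ranks candidates, so majority voting is an assumption of this sketch, and the three engines below are stand-in stubs rather than calls to any real knowledge base or translation service.

```python
from collections import Counter


def fuse_translations(term, engines):
    """Query every engine and return the most frequent candidate (top 1).

    Ties are resolved in favor of the engine listed first, because
    Counter.most_common keeps first-seen order for equal counts.
    """
    candidates = []
    for engine in engines:
        result = engine(term)
        if result:  # an engine may have no answer
            candidates.append(result.strip())
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical stand-ins for the three engine types named in the text.
knowledge_base = {"演员": "actor"}.get
neural_mt = lambda term: "actor"
online_mt = lambda term: "performer"

best = fuse_translations("演员", [knowledge_base, neural_mt, online_mt])
missing = fuse_translations("演员", [lambda term: None])
```

A production system would more likely weight engines by reliability or score candidates, but the structure (recall from several engines, compare, keep the top-ranked result) is the same.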
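The selection rule described above can be sketched as a lookup over size tiers. The numeric thresholds are placeholders invented for this illustration, since the text gives only the ordering of the engines and no figures; the third engine name is written as Dgraph on the assumption that this is the engine meant.

```python
# (upper bound on triple count, engine) pairs in increasing order.
# The bounds are placeholder values, not figures from the patent.
ENGINE_TIERS = [
    (10_000_000, "gStore"),
    (1_000_000_000, "Neo4j"),
]
LARGEST_ENGINE = "Dgraph"


def choose_storage_engine(triple_count=None, default="Neo4j"):
    """Pick an engine by storage volume, or fall back to the start-up default."""
    if triple_count is None:
        return default  # no estimate available: use the configured default
    for upper_bound, engine in ENGINE_TIERS:
        if triple_count < upper_bound:
            return engine
    return LARGEST_ENGINE


small = choose_storage_engine(500_000)
large = choose_storage_engine(250_000_000)
huge = choose_storage_engine(5_000_000_000)
fallback = choose_storage_engine()
```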
According to an embodiment of the present invention, after operation 205, the method may further include: and reading the written translation result from the storage service by adopting a query engine matched with the storage engine.
In the knowledge graph construction method, knowledge is first extracted from a data source associated with a specific entity word to obtain triples of the specific entity word; the obtained triples are then fused with the existing triples in the concept graph to obtain a fusion result; upper and lower reasoning is performed on the fusion result to obtain an inference result that expands the triples of the specific entity word; the extended triples in the inference result are translated by a knowledge generation engine to obtain a translation result; and finally, the translation result is written into a storage service through a storage engine. Taking the concept knowledge graph effect diagram of Liu Dehua shown in FIG. 4 as an example, the finally constructed concept knowledge graph comprises more than 500,000 concepts, 50 million entities and 250 million isA relations. The invention is thus strictly organized by entity during the construction of the knowledge graph, which is beneficial to the accurate understanding of entities; moreover, by fusing, reasoning over and translating the triples obtained by knowledge extraction, a large-scale, high-quality concept knowledge graph can be constructed, so that the accuracy and recall of natural language understanding are improved. For example, given a text about Huawei's P30 and Apple's iPhone 10, traditional natural language understanding only extracts the terms Huawei, P30, Apple and iPhone 10. With the help of the concept knowledge graph, however, it is known that Huawei is a company and that P30 and iPhone 10 are electronic products; and although "apple" may denote either a fruit or the company Apple, it can be inferred that it here refers to the company, and thus that the subject of the text is the product launch events of Huawei and Apple.
Also, based on the knowledge-graph constructing method as described above, an embodiment of the present invention further provides a computer-readable storage medium storing a program that, when executed by a processor, causes the processor to perform at least the following operation steps: operation 201, performing knowledge extraction from a data source associated with a specific entity word to obtain a triple of the specific entity word; operation 202, fusing the obtained triples of the specific entity words with the existing triples in the concept graph to obtain a fusion result; operation 203, performing upper and lower reasoning on the obtained fusion result to obtain a reasoning result of the triplet for expanding the specific entity word; operation 204, translating the extended triple in the inference result by using a knowledge generation engine to obtain a translation result; in operation 205, the resulting translation result is written into the storage service by the storage engine.
Further, based on the above-mentioned method for constructing a knowledge graph, an embodiment of the present invention further provides an apparatus for constructing a knowledge graph, as shown in fig. 5, where the apparatus 50 includes: a knowledge extraction module 501, configured to perform knowledge extraction from a data source associated with a specific entity word to obtain a triple of the specific entity word; the fusion module 502 is configured to fuse the obtained triples of the specific entity words with existing triples in the concept graph to obtain a fusion result; the inference module 503 is configured to perform upper and lower inference on the obtained fusion result to obtain an inference result of the triplet that expands the specific entity word; a knowledge generation module 504, configured to perform translation processing on the extended triple in the inference result by using a knowledge generation engine to obtain a translation result; and a storage module 505, configured to write the obtained translation result into the storage service through the storage engine.
According to an embodiment of the present invention, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data; correspondingly, the knowledge extraction module 501 is specifically configured to extract knowledge from a data source associated with a specific entity word in a knowledge extraction manner corresponding to a data source type, where different data source types correspond to different knowledge extraction methods.
According to an embodiment of the present invention, the knowledge extraction module 501 is specifically configured to, if the data source type is structured data, extract knowledge from a relational database using a D2R method or extract knowledge from link data using a graph mapping method; and/or, if the data source type is semi-structured data, extracting knowledge from the semi-structured data by using a wrapper; and/or if the data source type is unstructured data, extracting knowledge from the free text by using an information extraction method.
According to an embodiment of the present invention, the fusion module 502 includes: a judging unit, configured to judge whether the obtained triples of the specific entity word are contained in the existing triples in the concept graph; and an adding unit, configured to add the obtained triples of the specific entity word to the concept graph to obtain the fusion result if the obtained triples are not contained in the existing triples in the concept graph.
According to an embodiment of the present invention, the judging unit is specifically configured to perform word expansion on the entity words in the obtained triples of the specific entity word to obtain word-expanded triples, and to judge whether the word-expanded triples are contained in the existing triples in the concept graph.
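The judging and adding units can be illustrated with the following sketch. It assumes that "word expansion" means replacing an entity word with entries from a synonym table; the embodiment does not fix the expansion method in this passage, so the `synonyms` mapping and the function names `expand` and `fuse` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def expand(triple: Triple, synonyms: Dict[str, Set[str]]) -> Set[Triple]:
    """Return the triple together with its variants under word expansion."""
    head, rel, tail = triple
    heads = {head} | synonyms.get(head, set())
    tails = {tail} | synonyms.get(tail, set())
    return {(h, rel, t) for h in heads for t in tails}


def fuse(
    new_triples: Iterable[Triple],
    concept_graph: Set[Triple],
    synonyms: Dict[str, Set[str]],
) -> Set[Triple]:
    """Add a triple only if neither it nor any expanded variant already exists."""
    fused = set(concept_graph)
    for triple in new_triples:
        # Judging unit: is any expanded variant already in the graph?
        if expand(triple, synonyms).isdisjoint(fused):
            # Adding unit: the triple is new knowledge, so add it.
            fused.add(triple)
    return fused


graph = {("dog", "is_a", "animal")}
synonyms = {"canine": {"dog"}}
fused = fuse(
    [("canine", "is_a", "animal"), ("husky", "is_a", "dog")],
    graph,
    synonyms,
)
```

Here (canine, is_a, animal) is skipped because its expanded variant (dog, is_a, animal) is already in the graph, while (husky, is_a, dog) is added.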
According to an embodiment of the present invention, the knowledge generation engine includes a knowledge base query engine, a neural network translation engine, and an online translation engine; correspondingly, the knowledge generation module 504 is specifically configured to translate the expanded triples in the inference result by using a plurality of different knowledge generation engines to obtain a plurality of processing results; perform fusion comparison on the obtained plurality of processing results to obtain a fusion comparison result; and determine the highest-ranked processing result in the fusion comparison result as the translation result.
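One possible reading of the fusion comparison is a vote across engines, sketched below. The voting rule, the tie-breaking by engine order, and the three stub engines are assumptions for illustration; this passage of the embodiment does not specify how the processing results are ranked.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

Engine = Callable[[str], Optional[str]]  # returns None when it has no answer


def fuse_translations(term: str, engines: Sequence[Engine]) -> Optional[str]:
    """Run every engine, rank the candidates by vote count, return the top one."""
    candidates: List[str] = []
    for engine in engines:
        output = engine(term)
        if output is not None:
            candidates.append(output)
    if not candidates:
        return None
    # most_common keeps first-seen order among equal counts (Python 3.7+),
    # so engines listed earlier win ties.
    ranking = Counter(candidates).most_common()
    return ranking[0][0]


# Stub engines standing in for the three engine types named above.
def kb_engine(term: str) -> Optional[str]:
    return {"dog": "狗"}.get(term)


def nmt_engine(term: str) -> Optional[str]:
    return {"dog": "狗", "husky": "哈士奇"}.get(term)


def online_engine(term: str) -> Optional[str]:
    return {"dog": "犬", "husky": "雪橇犬"}.get(term)


engines = [kb_engine, nmt_engine, online_engine]
best_dog = fuse_translations("dog", engines)
best_husky = fuse_translations("husky", engines)
```

For "dog", two engines agree, so their shared output ranks highest. For "husky", the knowledge base stub has no entry and the remaining two disagree, so the earlier engine in the list wins the tie.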
According to an embodiment of the present invention, as shown in FIG. 5, the apparatus 50 further includes a query module 506, configured to read the written translation result from the storage service by using a query engine matching the storage engine.
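The pairing of a storage engine with a matching query engine can be sketched with an in-memory SQLite database. SQLite, the `triples` table layout, and the function names are assumptions chosen to keep the example self-contained; the embodiment does not name a particular storage service here.

```python
import sqlite3
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def open_store() -> sqlite3.Connection:
    """Create an in-memory storage service with one table of triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        "head TEXT, relation TEXT, tail TEXT, "
        "UNIQUE (head, relation, tail))"
    )
    return conn


def write_triples(conn: sqlite3.Connection, triples: Iterable[Triple]) -> None:
    """Storage engine: write triples, silently skipping exact duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", list(triples)
        )


def query_by_head(conn: sqlite3.Connection, head: str) -> List[Triple]:
    """Query engine: read back all triples for one head entity."""
    cursor = conn.execute(
        "SELECT head, relation, tail FROM triples "
        "WHERE head = ? ORDER BY relation, tail",
        (head,),
    )
    return cursor.fetchall()


conn = open_store()
write_triples(
    conn,
    [
        ("哈士奇", "is_a", "动物"),
        ("哈士奇", "is_a", "动物"),  # duplicate, ignored on insert
        ("哈士奇", "is_a", "狗"),
    ],
)
rows = query_by_head(conn, "哈士奇")
```

The query engine "matches" the storage engine in the sense that both agree on the same table schema; a graph database with its own query language would play the same two roles.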
It should be noted here that the above description of the embodiment of the knowledge graph construction apparatus is similar to the description of the method embodiment shown in FIG. 2 and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the embodiment of the knowledge graph construction apparatus of the present invention, refer to the description of the method embodiment shown in FIG. 2 of the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be completed by hardware under the direction of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable memory device, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a magnetic disk, an optical disk, or any other medium that can store program code.
The above describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of knowledge graph construction, the method comprising:
extracting knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word;
fusing the obtained triples of the specific entity word with existing triples in a concept graph to obtain a fusion result;
performing upper and lower inference on the obtained fusion result to obtain an inference result in which the triples of the specific entity word are expanded;
translating the expanded triples in the inference result by using a knowledge generation engine to obtain a translation result; and
writing the obtained translation result to a storage service through a storage engine.
2. The method of claim 1, wherein the data source associated with the specific entity word comprises at least one of the following types: structured data, semi-structured data, and unstructured data;
correspondingly, extracting knowledge from the data source associated with the specific entity word comprises:
extracting knowledge from the data source associated with the specific entity word in a knowledge extraction manner corresponding to the type of the data source, wherein different data source types correspond to different knowledge extraction methods.
3. The method of claim 2, wherein extracting knowledge from the data source associated with the specific entity word comprises:
if the data source type is structured data, extracting knowledge from a relational database by using a D2R method, or extracting knowledge from linked data by using a graph mapping method;
and/or, if the data source type is semi-structured data, extracting knowledge from the semi-structured data by using a wrapper;
and/or, if the data source type is unstructured data, extracting knowledge from free text by using an information extraction method.
4. The method of claim 1, wherein fusing the obtained triples of the specific entity word with the existing triples in the concept graph to obtain the fusion result comprises:
judging whether the obtained triples of the specific entity word are contained in the existing triples in the concept graph; and
if the obtained triples of the specific entity word are not contained in the existing triples in the concept graph, adding the obtained triples of the specific entity word to the concept graph to obtain the fusion result.
5. The method of claim 4, wherein judging whether the obtained triples of the specific entity word are contained in the existing triples in the concept graph comprises:
performing word expansion on the entity words in the obtained triples of the specific entity word to obtain word-expanded triples; and
judging whether the word-expanded triples are contained in the existing triples in the concept graph.
6. The method of claim 1, wherein the knowledge generation engine comprises a knowledge base query engine, a neural network translation engine, and an online translation engine;
correspondingly, translating the expanded triples in the inference result by using the knowledge generation engine to obtain the translation result comprises:
translating the expanded triples in the inference result by using a plurality of different knowledge generation engines to obtain a plurality of processing results;
performing fusion comparison on the obtained plurality of processing results to obtain a fusion comparison result; and
determining the highest-ranked processing result in the fusion comparison result as the translation result.
7. The method of any one of claims 1 to 6, wherein after writing the obtained translation result to the storage service through the storage engine, the method further comprises:
reading the written translation result from the storage service by using a query engine matching the storage engine.
8. An apparatus for knowledge graph construction, the apparatus comprising:
a knowledge extraction module, configured to extract knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word;
a fusion module, configured to fuse the obtained triples of the specific entity word with existing triples in a concept graph to obtain a fusion result;
an inference module, configured to perform upper and lower inference on the obtained fusion result to obtain an inference result in which the triples of the specific entity word are expanded;
a knowledge generation module, configured to translate the expanded triples in the inference result by using a knowledge generation engine to obtain a translation result; and
a storage module, configured to write the obtained translation result to a storage service through a storage engine.
9. The apparatus of claim 8, wherein the data source associated with the specific entity word comprises at least one of the following types: structured data, semi-structured data, and unstructured data;
correspondingly, the knowledge extraction module is specifically configured to extract knowledge from the data source associated with the specific entity word in a knowledge extraction manner corresponding to the type of the data source, wherein different data source types correspond to different knowledge extraction methods.
10. A computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform the method of knowledge graph construction of any one of claims 1 to 7.
CN201910875545.XA 2019-09-17 2019-09-17 Knowledge graph construction method and device and storage equipment Pending CN110569371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910875545.XA CN110569371A (en) 2019-09-17 2019-09-17 Knowledge graph construction method and device and storage equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910875545.XA CN110569371A (en) 2019-09-17 2019-09-17 Knowledge graph construction method and device and storage equipment

Publications (1)

Publication Number Publication Date
CN110569371A true CN110569371A (en) 2019-12-13

Family

ID=68780587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910875545.XA Pending CN110569371A (en) 2019-09-17 2019-09-17 Knowledge graph construction method and device and storage equipment

Country Status (1)

Country Link
CN (1) CN110569371A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070250493A1 (en) * 2006-04-19 2007-10-25 Peoples Bruce E Multilingual data querying
CN103678714A (en) * 2013-12-31 2014-03-26 北京百度网讯科技有限公司 Construction method and device for entity knowledge base
CN107368468A (en) * 2017-06-06 2017-11-21 广东广业开元科技有限公司 A kind of generation method and system of O&M knowledge mapping
CN109271529A (en) * 2018-10-10 2019-01-25 内蒙古大学 Cyrillic Mongolian and the double language knowledge mapping construction methods of traditional Mongolian
CN109378053A (en) * 2018-11-30 2019-02-22 安徽影联云享医疗科技有限公司 A kind of knowledge mapping construction method for medical image
CN110008355A (en) * 2019-04-11 2019-07-12 华北科技学院 The disaster scene information fusion method and device of knowledge based map

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222918A (en) * 2020-01-04 2020-06-02 厦门二五八网络科技集团股份有限公司 Keyword mining method and device, electronic equipment and storage medium
CN111222918B (en) * 2020-01-04 2023-06-30 厦门二五八网络科技集团股份有限公司 Keyword mining method and device, electronic equipment and storage medium
CN111444181A (en) * 2020-03-20 2020-07-24 腾讯科技(深圳)有限公司 Knowledge graph updating method and device and electronic equipment
CN111897972A (en) * 2020-08-06 2020-11-06 南方电网科学研究院有限责任公司 Data track visualization method and device
CN111897972B (en) * 2020-08-06 2023-10-17 南方电网科学研究院有限责任公司 Data track visualization method and device
CN111767440A (en) * 2020-09-03 2020-10-13 平安国际智慧城市科技股份有限公司 Vehicle portrayal method based on knowledge graph, computer equipment and storage medium
CN112380864A (en) * 2020-11-03 2021-02-19 广西大学 Text triple labeling sample enhancement method based on translation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191213)