CN110347798B - Knowledge graph auxiliary understanding system based on natural language generation technology - Google Patents


Info

Publication number
CN110347798B
Authority
CN
China
Prior art keywords
knowledge graph
subject
predicate
array
graph
Legal status
Active
Application number
CN201910629843.0A
Other languages
Chinese (zh)
Other versions
CN110347798A (en)
Inventor
李劲松
吕可伟
尚勇
周天舒
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Application filed by Zhejiang Lab
Priority to CN201910629843.0A
Publication of CN110347798A
Priority to PCT/CN2020/083591 (WO2020233261A1)
Priority to JP2021532885A (JP7064262B2)
Application granted
Publication of CN110347798B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

The invention discloses a knowledge graph auxiliary understanding system based on natural language generation technology, comprising a knowledge graph selection module, a knowledge graph translation module and a result display module. The system uses natural language generation to convert a knowledge graph into natural language text, so that domain experts can understand a knowledge graph in their field accurately, deeply and comprehensively before using it, without needing to know its source code or the software used to build it. Each short sentence is linked to the corresponding source code of the knowledge graph, so redundant or erroneous information found in the graph can be corrected promptly, and the method is broadly applicable. The invention additionally uses visualization to speed up domain experts' understanding of the knowledge graph.

Description

Knowledge graph auxiliary understanding system based on natural language generation technology
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a knowledge graph auxiliary understanding system based on a natural language generation technology.
Background
A knowledge graph is a semantic knowledge base that usually represents each knowledge point as a subject-predicate-object triple. Compared with an ontology, which imposes strict logical and semantic requirements, a knowledge graph emphasizes weak semantics and weak logic. Knowledge graphs have therefore spread widely in academia and industry, and large internet companies, including Google, have begun researching them to improve search quality. Reports from 2014 indicate that Google's knowledge graph had collected over 1.6 billion facts, of which 271 million were considered more than 90% likely to be true. In May 2016, the Knowledge Graph answered about one third of the roughly 100 billion Google searches made that month.
Natural language generation is one of the major branches of natural language processing. Unlike natural language understanding, it focuses on how a computer expresses a given meaning or idea as natural language text. Knowledge graphs, especially domain-specific ones, must be highly accurate for practical use; for a medical knowledge graph, for example, the quality of the graph directly determines the accuracy of the whole system. However, knowledge graphs are built with the same languages as ontologies, mainly RDF (Resource Description Framework) and OWL (Web Ontology Language), and the main authoring software is Protégé, developed by Stanford University. These languages and tools are highly specialized, and people outside the field find it hard to understand their specific meaning without long-term study and training. Moreover, the knowledge points stored in OWL and RDF are unordered, and knowledge points about the same content are scattered across different parts of the file, which makes it even harder for domain experts to read the source code of a knowledge graph directly. Knowledge graphs are mostly built by practitioners in the computer industry, while their users are scholars and experts in the field the graph covers. Because of this mismatch, domain experts cannot understand the content of a knowledge graph in advance: the graph can only be improved gradually through use, rather than being understood and improved before deployment. This indirectly leads to unstable knowledge graph quality and to frequent redundant redevelopment of knowledge graphs covering the same content.
In 2017, researchers randomly sampled 200 biomedical ontologies from the US National Center for Biomedical Ontology and found that the design documents of only 17 of them stated that they had been formally evaluated by domain experts.
Knowledge graphs in many fields require domain experts to understand their content deeply and comprehensively before use, to ensure that the content is accurate in practice. However, the languages and software for knowledge graphs are highly specialized and knowledge points on the same topic are scattered, so domain experts struggle to master them in a short time. Most existing tools for understanding knowledge graphs use search to visually present the associations between individual knowledge nodes, so the knowledge they present is local and does not cover the knowledge graph as a whole. In addition, these methods only uncover problems in a knowledge graph during use, without a full understanding and evaluation of the graph before it is used.
Disclosure of Invention
The invention aims to provide a knowledge graph auxiliary understanding system based on natural language generation technology, addressing insufficient quality control of knowledge graphs and the difficulty domain experts have in understanding the knowledge graphs of their fields.
The invention is realized by the following technical scheme: a knowledge graph auxiliary understanding system based on natural language generation technology comprises a knowledge graph selection module, a knowledge graph translation module and a result display module;
the knowledge graph selection module is used to acquire a target knowledge graph that conforms to the RDF or OWL syntax specification;
the knowledge graph translation module first extracts the triples of the target knowledge graph and splits the extracted triples into strings to obtain three dynamic arrays: a subject array, a predicate array and an object array, whose elements correspond one to one. The subjects, predicates and objects are then assembled in nested loops with the SimpleNLG tool into complete short sentences. At the same time, to handle one-to-many and many-to-many subject-predicate-object relationships, special characters are added to the predicate array and the object array as markers, so that each predicate can be matched to its subject and each object to its subject and predicate; the special characters are then checked in nested loops to determine which subject, predicate and object belong together, and these are assembled with SimpleNLG into complete long sentences. Triples belonging to annotations are not turned into separate sentences but are used as annotation information supplementing other sentences. After the target knowledge graph has been translated into short and long sentences, the sentences are further normalized and stored in a local database (for example, a MySQL database). In addition, the class-subclass and class-instance relations are selected from the subject, predicate and object arrays and assembled into a JSON file;
the result display module retrieves the translated content of the target knowledge graph (that is, the short and long sentences) from the local database and displays it together with the source file of the target knowledge graph (in RDF or OWL). It also obtains the JSON (JavaScript Object Notation) file, draws a tree diagram with a visualization tool (for example, D3), and visually presents the classes and subclasses of the knowledge graph and the hierarchy of classes and instances.
Further, the knowledge graph selection module acquires the target knowledge graph in two ways:
the first way: knowledge graphs that conform to the RDF (Resource Description Framework) or OWL (Web Ontology Language) syntax specification are crawled from an open-source knowledge graph database (when the system is applied to knowledge graphs in the biomedical field, the National Center for Biomedical Ontology (NCBO) can be chosen as this database); the crawled knowledge graphs are translated by the knowledge graph translation module, and the translation results are stored in a local database. When a user searches for knowledge graphs on a given topic, the similarity between the input name and the English name of each knowledge graph is calculated, and the knowledge graphs are sorted in descending order of similarity to obtain candidate target knowledge graphs;
the second way: the user uploads a knowledge graph that conforms to the RDF or OWL syntax specification as the target knowledge graph.
Further, in the first way of acquiring the target knowledge graph, the similarity coefficient is the Jaccard similarity coefficient, which is commonly used to compare the similarity and difference between finite sample sets; the larger the Jaccard coefficient, the more similar the samples.
Denote the concept set of the user's input name as $C_1$ and the concept set of the English name of the knowledge graph as $C_2$. The Jaccard similarity coefficient between them is:
$$J(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}$$
If $C_1$ and $C_2$ are identical, $J(C_1, C_2) = 1$. The search results are sorted by similarity and the top N results are presented, where N is user-defined.
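As a minimal sketch of the ranking step, the following Python code computes $J(C_1, C_2)$ and returns the top N knowledge graph names. How a name is turned into a "concept set" is not specified in the text; for illustration only, the sketch assumes the concepts are the lower-cased alphanumeric words of the name. The helper names `concept_set`, `jaccard` and `top_n` are hypothetical, and the default `n=15` follows the example value given in the detailed description.

```python
import re


def concept_set(name: str) -> set[str]:
    """Hypothetical tokenizer: the lower-cased alphanumeric words of a name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(c1: set[str], c2: set[str]) -> float:
    """J(C1, C2) = |C1 ∩ C2| / |C1 ∪ C2|; defined here as 0.0 when both sets are empty."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def top_n(query: str, graph_names: list[str], n: int = 15) -> list[tuple[str, float]]:
    """Rank stored knowledge graph English names by similarity to the user's query."""
    q = concept_set(query)
    scored = [(name, jaccard(q, concept_set(name))) for name in graph_names]
    scored.sort(key=lambda item: item[1], reverse=True)  # descending similarity
    return scored[:n]
```

Python's `sort` is stable, so graphs with equal similarity keep their original order; a real deployment might break ties differently.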
Further, the knowledge graph translation module extracts the triples of the target knowledge graph as follows: SPARQL (SPARQL Protocol and RDF Query Language) is used to extract the subjects, predicates and objects of all knowledge points in the target knowledge graph (classes, instances, object properties, data properties, annotations and so on), which are encoded as RDF (Resource Description Framework) triples.
Further, the knowledge graph translation module generates short sentences of the target knowledge graph as follows: the obtained triples are first split into strings to obtain the names of the subjects, predicates and objects, and three dynamic arrays are constructed. Because short sentence generation treats subject, predicate and object as one-to-one, each subject can be assembled directly with its predicate and object into a short sentence using SimpleNLG in a nested loop.
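The segmentation and short-sentence steps can be sketched in Python as below. SimpleNLG is a Java library, so this sketch swaps in a plain subject-predicate-object string template and only capitalizes the first letter; it does not perform SimpleNLG's morphological realization. It also assumes that every subject, predicate and object is an IRI whose readable name follows the last `#` or `/`, and that camelCase names should be split into words. All function names are hypothetical. Because the arrays are index-aligned and one-to-one, a single `zip` loop suffices here.

```python
import re


def local_name(iri: str) -> str:
    """Keep the part after the last '#' or '/', e.g. '...#hasSymptom' -> 'hasSymptom'."""
    return re.split(r"[#/]", iri.rstrip("/#"))[-1]


def humanize(name: str) -> str:
    """Split camelCase and underscores into lower-case words: 'hasSymptom' -> 'has symptom'."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).replace("_", " ")
    return spaced.lower()


def build_arrays(triples):
    """Segment each (s, p, o) IRI triple into three index-aligned dynamic arrays."""
    subjects, predicates, objects = [], [], []
    for s, p, o in triples:
        subjects.append(humanize(local_name(s)))
        predicates.append(humanize(local_name(p)))
        objects.append(humanize(local_name(o)))
    return subjects, predicates, objects


def short_sentences(subjects, predicates, objects):
    """One sentence per triple; a string template stands in for SimpleNLG."""
    sentences = []
    for s, p, o in zip(subjects, predicates, objects):
        text = f"{s} {p} {o}."
        sentences.append(text[0].upper() + text[1:])
    return sentences
```

Literal objects (for example free-text strings containing `/`) would need separate handling, since the sketch treats every value as an IRI.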
Further, the knowledge graph translation module generates long sentences of the target knowledge graph as follows: the obtained triples are first split into strings to obtain the names of the subjects, predicates and objects, and three dynamic arrays are constructed. In long sentence generation, one subject can have several predicates and each predicate can have several objects. Therefore, in the predicate array, the predicates of different subjects are separated by a special marker, and in the object array, the objects of different predicates of different subjects are separated by another special marker, so that subjects, predicates and objects can be matched unambiguously. The markers are then checked in nested loops, and the matching subjects, predicates and objects are assembled with SimpleNLG. The different predicates of the same subject form sentences, all sentences about the same subject form one paragraph, and multiple objects are joined by conjunctions (and/or).
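A sketch of the marker-based long-sentence step follows, again in Python with a string template in place of SimpleNLG. The marker strings `@@S` and `@@P` are hypothetical stand-ins for the unspecified special identifiers, and the sketch assumes no real predicate or object equals a marker. It reads "the different predicates of the same subject form sentences" as one sentence per subject-predicate pair, with that pair's objects joined by "and"; this interpretation is an assumption.

```python
SUBJECT_MARK = "@@S"    # hypothetical marker: predicates of the next subject start here
PREDICATE_MARK = "@@P"  # hypothetical marker: objects of the next predicate start here


def encode(triples):
    """Flatten (s, p, o) triples into three marker-delimited dynamic arrays."""
    grouped: dict[str, dict[str, list[str]]] = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    subjects, predicates, objects = [], [], []
    for s, preds in grouped.items():
        subjects.append(s)
        predicates.append(SUBJECT_MARK)
        for p, objs in preds.items():
            predicates.append(p)
            objects.append(PREDICATE_MARK)
            objects.extend(objs)
    return subjects, predicates, objects


def join_objects(objs, conj="and"):
    """'a' / 'a and b' / 'a, b and c'."""
    if len(objs) == 1:
        return objs[0]
    return ", ".join(objs[:-1]) + f" {conj} " + objs[-1]


def long_sentences(subjects, predicates, objects):
    """Decode the markers with nested loops; returns one paragraph per subject."""
    paragraphs, s_idx, o_idx = [], -1, 0
    for p in predicates:
        if p == SUBJECT_MARK:          # next subject: open a new paragraph
            s_idx += 1
            paragraphs.append([])
            continue
        o_idx += 1                     # step over this predicate's PREDICATE_MARK
        objs = []
        while o_idx < len(objects) and objects[o_idx] != PREDICATE_MARK:
            objs.append(objects[o_idx])
            o_idx += 1
        text = f"{subjects[s_idx]} {p} {join_objects(objs)}."
        paragraphs[-1].append(text[0].upper() + text[1:])
    return [" ".join(sentences) for sentences in paragraphs]
```

For the triples (chronic kidney disease, has symptom, fatigue), (chronic kidney disease, has symptom, edema), (chronic kidney disease, is a, kidney disease) and (fatigue, is a, symptom), `long_sentences(*encode(...))` yields two paragraphs: "Chronic kidney disease has symptom fatigue and edema. Chronic kidney disease is a kidney disease." and "Fatigue is a symptom."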
Further, the knowledge graph translation module supplements sentences of the target knowledge graph with annotation information as follows: the predicate array is first traversed, and whenever the predicate is "comment" (meaning the object is an annotation of the subject), the corresponding subject and object are extracted into a new dynamic array, the annotation array, in which elements with odd subscripts store subjects and elements with even subscripts store objects. The subject, predicate and object arrays are then traversed in nested loops. If a subject or object appears in the annotation array, its annotation is appended in parentheses after it. Finally, the predicate is checked: if it is not "comment", the triple is assembled into a sentence; otherwise it is skipped.
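The annotation step might look like the following Python sketch. It uses the flat annotation array described above (each subject followed by its annotation, so subjects occupy odd positions when counting from one), a string template in place of SimpleNLG, and assumes the annotation predicate's local name is `comment`, as for `rdfs:comment`. A single pass over the aligned arrays stands in for the nested loop; all function names are hypothetical.

```python
def build_annotation_array(subjects, predicates, objects, comment_pred="comment"):
    """Flat array [subject, annotation, subject, annotation, ...] from 'comment' triples.

    Counting from one, subjects sit at odd positions and annotations at even
    positions (0-based: subjects at even indices, annotations at odd indices).
    """
    notes = []
    for s, p, o in zip(subjects, predicates, objects):
        if p == comment_pred:
            notes.extend([s, o])
    return notes


def annotate(term, notes):
    """Append '(annotation)' if the term has an entry in the annotation array."""
    for i in range(0, len(notes), 2):
        if notes[i] == term:
            return f"{term} ({notes[i + 1]})"
    return term


def sentences_with_annotations(subjects, predicates, objects, comment_pred="comment"):
    """Build sentences, bracketing annotations and skipping the comment triples."""
    notes = build_annotation_array(subjects, predicates, objects, comment_pred)
    out = []
    for s, p, o in zip(subjects, predicates, objects):
        if p == comment_pred:  # annotation triples never become sentences of their own
            continue
        text = f"{annotate(s, notes)} {p} {annotate(o, notes)}."
        out.append(text[0].upper() + text[1:])
    return out
```

For example, with subjects `["anemia", "anemia"]`, predicates `["comment", "is a"]` and objects `["a low red blood cell count", "complication"]`, the sketch produces the single sentence "Anemia (a low red blood cell count) is a complication."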
Further, the knowledge graph translation module inserts the short and long sentences of the target knowledge graph into the database as follows: the database is connected through the JDBC (Java Database Connectivity) API. A database and a data table for storing translation results are first created, defining the table name, the table fields, the primary key and so on. The English name of the knowledge graph is then matched against the names stored in the database: if translation results for the knowledge graph already exist, no insertion is performed; otherwise, the generated short sentence array and long sentence array are added to the data table.
Further, the translated content and the source file are displayed in the result display module as follows: after a target knowledge graph is selected in the web interface, all of its translated content is retrieved from the database using Ajax and displayed, and the source file of the target knowledge graph is read from the local server and displayed in the same interface.
Further, the visual display in the result display module works as follows: after a target knowledge graph is selected in the web interface, the corresponding JSON file is fetched from the back end using Ajax and a tree diagram is drawn. In the tree diagram, each node represents a subject or an object and is connected by lines to its associated nodes.
The beneficial effects of the invention are as follows: natural language generation converts a knowledge graph into natural language text, so that domain experts can understand a knowledge graph in their field accurately, deeply and comprehensively before using it, without needing to know its source code or the software used to build it. Each short sentence is linked to the corresponding source code of the knowledge graph, so redundant or erroneous information found in the graph can be corrected promptly, and the method is broadly applicable. Visualization further speeds up domain experts' understanding of the knowledge graph.
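The check-then-insert logic can be sketched with Python's built-in `sqlite3` module. This is a substitute for the JDBC/MySQL setup described above, chosen only so the example is self-contained; the table name `translation` and its columns are hypothetical. Parameterized queries (`?` placeholders) keep knowledge graph names and sentences from being interpreted as SQL.

```python
import sqlite3


def store_translation(conn: sqlite3.Connection, kg_name: str,
                      short_sents: list[str], long_sents: list[str]) -> bool:
    """Insert a graph's sentences unless results for that graph already exist.

    Returns True if rows were inserted and False if the graph was already stored.
    """
    # Create the result table once (hypothetical schema).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS translation (
               id       INTEGER PRIMARY KEY AUTOINCREMENT,
               kg_name  TEXT NOT NULL,
               kind     TEXT NOT NULL CHECK (kind IN ('short', 'long')),
               sentence TEXT NOT NULL)"""
    )
    # Match the graph's English name against the names already stored.
    exists = conn.execute(
        "SELECT 1 FROM translation WHERE kg_name = ? LIMIT 1", (kg_name,)
    ).fetchone()
    if exists:
        return False  # already translated: skip the insert
    rows = [(kg_name, "short", s) for s in short_sents]
    rows += [(kg_name, "long", s) for s in long_sents]
    conn.executemany(
        "INSERT INTO translation (kg_name, kind, sentence) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return True
```

Usage: `conn = sqlite3.connect(":memory:")`, then call `store_translation(conn, name, short, long)` after each translation run; a second call for the same name returns `False` and leaves the table unchanged.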
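On the back end, the JSON file for the tree diagram could be assembled as in this Python sketch. It assumes that, after string segmentation, class-subclass links carry the predicate `subClassOf` and class-instance links carry `type`, and it skips `type` objects that are OWL/RDF vocabulary terms rather than domain classes (the `SKIP_TYPES` list is an assumption). The nested `{"name": ..., "children": [...]}` shape is the form `d3.hierarchy` reads by default, so the front end could pass it to a D3 tree layout; the artificial root name `Thing` is hypothetical.

```python
import json

SUBCLASS = "subClassOf"  # assumed local names after string segmentation
INSTANCE = "type"
# rdf:type objects that are vocabulary terms, not domain classes (assumed list)
SKIP_TYPES = {"Class", "NamedIndividual", "ObjectProperty", "DatatypeProperty",
              "AnnotationProperty", "Ontology"}


def build_tree(subjects, predicates, objects, root_name="Thing"):
    """Nest class->subclass and class->instance links as {'name', 'children'} dicts."""
    children: dict[str, list[str]] = {}
    has_parent: set[str] = set()
    for s, p, o in zip(subjects, predicates, objects):
        if p == SUBCLASS or (p == INSTANCE and o not in SKIP_TYPES):
            children.setdefault(o, []).append(s)  # o is the parent of s
            has_parent.add(s)

    def node(name, ancestors):
        path = ancestors | {name}
        # Skip any child already on the current path to guard against cycles.
        kids = [node(c, path) for c in children.get(name, []) if c not in path]
        return {"name": name, "children": kids} if kids else {"name": name}

    tops = [n for n in children if n not in has_parent]
    return {"name": root_name, "children": [node(t, frozenset()) for t in tops]}


def tree_json(subjects, predicates, objects) -> str:
    """Serialize the tree for the front end."""
    return json.dumps(build_tree(subjects, predicates, objects), ensure_ascii=False)
```

Classes that take part in no subclass or instance link do not appear in the tree, which matches the goal of showing only the main hierarchy.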
Drawings
FIG. 1 is a block diagram of a knowledge-graph aided understanding system based on natural language generation technology according to the present invention;
FIG. 2 is a flow chart of an implementation of the knowledge-graph aided understanding system based on natural language generation technology according to the present invention;
FIG. 3 is a flow diagram of natural language generation by the knowledge-graph translation module of the present invention;
FIG. 4 is a schematic diagram of part of the source code of a knowledge graph;
FIG. 5 shows short sentences generated using natural language generation;
FIG. 6 shows long sentences generated using natural language generation;
FIG. 7 is a tree diagram of classes and subclasses.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in fig. 1 and 2, the knowledge graph aided understanding system based on the natural language generation technology provided by the invention comprises a knowledge graph selection module, a knowledge graph translation module and a result display module;
First, knowledge graph selection module
The knowledge graph selection module is used to acquire a target knowledge graph that conforms to the RDF or OWL syntax specification; the target knowledge graph is acquired in two ways:
the first way: knowledge graphs that conform to the RDF (Resource Description Framework) or OWL (Web Ontology Language) syntax specification are crawled from an open-source knowledge graph database (when the system is applied to knowledge graphs in the biomedical field, the National Center for Biomedical Ontology (NCBO) can be chosen as this database); the crawled knowledge graphs are translated by the knowledge graph translation module, and the translation results are stored in a local database. When a user searches for knowledge graphs on a given topic, the similarity between the input name and the English name of each knowledge graph is calculated, and the knowledge graphs are sorted in descending order of similarity to obtain candidate target knowledge graphs;
the similarity coefficient can be the Jaccard similarity coefficient, which is commonly used to compare the similarity and difference between finite sample sets; the larger the Jaccard coefficient, the more similar the samples.
Denote the concept set of the user's input name as $C_1$ and the concept set of the English name of the knowledge graph as $C_2$. The Jaccard similarity coefficient between them is:
$$J(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}$$
If $C_1$ and $C_2$ are identical, $J(C_1, C_2) = 1$. The search results are sorted by similarity and the top N results are presented, where N is user-defined and can be set to 15.
The second way: the user uploads a knowledge graph that conforms to the RDF or OWL syntax specification as the target knowledge graph.
Second, knowledge graph translation module
As shown in FIG. 3, the translation module first extracts the triples of the target knowledge graph and splits them into strings to obtain three dynamic arrays: a subject array, a predicate array and an object array, whose elements correspond one to one. The subjects, predicates and objects are then assembled in nested loops with the SimpleNLG tool into complete short sentences. At the same time, to handle one-to-many and many-to-many subject-predicate-object relationships, special characters are added to the predicate array and the object array as markers, so that each predicate can be matched to its subject and each object to its subject and predicate; the special characters are then checked in nested loops to determine which subject, predicate and object belong together, and these are assembled with SimpleNLG into complete long sentences. Triples belonging to annotations are not turned into separate sentences but are used as annotation information supplementing other sentences. After the target knowledge graph has been translated into short and long sentences, the generated sentences need further normalization, such as capitalizing the first English letter of each sentence and adding hyperlinks to some names. The normalized sentences are inserted into a local database, and the class-subclass and class-instance relations are selected from the subject, predicate and object arrays and assembled into a JSON file. The local database can be MySQL, currently a popular open-source relational database management system; MySQL stores data in separate tables rather than putting all data in one large store, which increases speed.
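The normalization step can be sketched in Python as follows. The capitalization and hyperlink rules come from the text above; the end-punctuation rule, the case-insensitive whole-word matching, and the `links` mapping from names to URLs are assumptions for illustration. Matching all names in one regular-expression pass, longest first, avoids wrapping a link inside another link.

```python
import html
import re


def normalize(sentence: str, links: dict[str, str]) -> str:
    """Capitalize the first letter, ensure end punctuation, hyperlink known names."""
    text = sentence.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    if links:
        lookup = {name.lower(): url for name, url in links.items()}
        # Longest names first so "chronic kidney disease" wins over "kidney disease".
        names = sorted(lookup, key=len, reverse=True)
        pattern = re.compile(
            r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
            re.IGNORECASE,
        )
        text = pattern.sub(
            lambda m: f'<a href="{html.escape(lookup[m.group(0).lower()])}">{m.group(0)}</a>',
            text,
        )
    return text
```

With `links = {"chronic kidney disease": "https://example.org/ckd"}` (a hypothetical URL), the sentence "chronic kidney disease has symptom fatigue" becomes `<a href="https://example.org/ckd">Chronic kidney disease</a> has symptom fatigue.`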
The triples of the target knowledge graph are extracted as follows: SPARQL (SPARQL Protocol and RDF Query Language) is used to extract the subjects, predicates and objects of all knowledge points in the target knowledge graph (classes, instances, object properties, data properties, annotations and so on), which are encoded as RDF (Resource Description Framework) triples.
The short sentences of the target knowledge graph are generated as follows: the obtained triples are first split into strings to obtain the names of the subjects, predicates and objects, and three dynamic arrays are constructed. Because short sentence generation treats subject, predicate and object as one-to-one, each subject can be assembled directly with its predicate and object into a short sentence using SimpleNLG in a nested loop.
The long sentences of the target knowledge graph are generated as follows: the obtained triples are first split into strings to obtain the names of the subjects, predicates and objects, and three dynamic arrays are constructed. In long sentence generation, one subject can have several predicates and each predicate can have several objects. Therefore, in the predicate array, the predicates of different subjects are separated by a special marker, and in the object array, the objects of different predicates of different subjects are separated by another special marker, so that subjects, predicates and objects can be matched unambiguously. The markers are then checked in nested loops, and the matching subjects, predicates and objects are assembled with SimpleNLG. The different predicates of the same subject form sentences, all sentences about the same subject form one paragraph, and multiple objects are joined by conjunctions (and/or).
Sentences of the target knowledge graph are supplemented with annotation information as follows: the predicate array is first traversed, and whenever the predicate is "comment" (meaning the object is an annotation of the subject), the corresponding subject and object are extracted into a new dynamic array, the annotation array, in which elements with odd subscripts store subjects and elements with even subscripts store objects. The subject, predicate and object arrays are then traversed in nested loops. If a subject or object appears in the annotation array, its annotation is appended in parentheses after it. Finally, the predicate is checked: if it is not "comment", the triple is assembled into a sentence; otherwise it is skipped.
The short and long sentences of the target knowledge graph are inserted into the database as follows: the JDBC (Java Database Connectivity) API connects Java to the database. A database and a data table for storing translation results are first created, defining the table name, the table fields, the primary key and so on. The English name of the knowledge graph is then matched against the names stored in the database: if translation results for the knowledge graph already exist, no insertion is performed; otherwise, the generated short sentence array and long sentence array are added to the data table.
Third, result display module
The result display consists of three parts. When a target knowledge graph is selected or uploaded on the website, the file or parameters are submitted to the back end via Ajax. Once the file reaches the back end, its source code is displayed on the web page and natural language generation runs automatically; the generated results are inserted into the database, and the relevant content is then read from the database and displayed on the web page. Meanwhile, the system selects the class-subclass and class-instance relations from the subject, predicate and object arrays, assembles them into a JSON file, sends it to the front end, and draws a tree diagram with the visualization tool D3 to display the main hierarchy of the knowledge graph. Taking the chronic kidney disease knowledge graph published by the US National Center for Biomedical Ontology as an example, the results are shown in FIGS. 4-7; FIG. 7 shows part of the tree diagram.
With the system of the invention, after a target knowledge graph is uploaded to the website or a knowledge graph in the library is selected on the website, the system automatically queries the relevant content of the knowledge graph, splits the strings, translates the RDF triples into short and long sentences, further normalizes the sentence patterns, and finally shows the generated text to the domain expert, with each sentence linked to the corresponding source code of the knowledge graph. The system also presents the important classes and subclasses, and the class-instance relations, of the knowledge graph as a tree diagram, helping experts quickly understand the content of the knowledge graph so that they can control its quality in a short time.
The above are merely embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention without inventive effort shall fall within the scope of protection of the present invention.

Claims (10)

1. A knowledge graph auxiliary understanding system based on natural language generation technology is characterized by comprising a knowledge graph selection module, a knowledge graph translation module and a result display module;
the knowledge graph selection module is used to acquire a target knowledge graph that conforms to the RDF or OWL syntax specification;
the knowledge graph translation module first extracts the triples of the target knowledge graph and splits the extracted triples into strings to obtain three dynamic arrays: a subject array, a predicate array and an object array, whose elements correspond one to one; the subjects, predicates and objects are then assembled in nested loops with the SimpleNLG tool into complete short sentences; at the same time, to handle one-to-many and many-to-many subject-predicate-object relationships, special characters are added to the predicate array and the object array as markers, so that each predicate can be matched to its subject and each object to its subject and predicate; the special characters are then checked in nested loops to determine which subject, predicate and object belong together, and these are assembled with SimpleNLG into complete long sentences; triples belonging to annotations are not turned into separate sentences but are used as annotation information supplementing other sentences; after the target knowledge graph has been translated into short and long sentences, the sentences are further normalized and stored in a local database, and the class-subclass and class-instance relations are selected from the subject, predicate and object arrays and assembled into a JSON (JavaScript Object Notation) file;
the result display module retrieves the translated content of the target knowledge graph from the local database, displays it together with the source file of the target knowledge graph, obtains the JSON file, draws a tree diagram with a visualization tool, and visually presents the classes and subclasses of the knowledge graph and the hierarchy of classes and instances.
2. The system of claim 1, wherein the knowledge graph selection module obtains the target knowledge graph in two ways:
First way: knowledge graphs that conform to the RDF or OWL syntax specifications are crawled from open-source knowledge graph repositories. The crawled graphs are translated by the knowledge graph translation module, and the translation results are stored in the local database. When the system is used to search for a knowledge graph on a given topic, the similarity between the name entered by the user and the English name of each knowledge graph is calculated. The knowledge graphs are then sorted in descending order of similarity to obtain the candidate target knowledge graphs;
Second way: the user uploads a knowledge graph that conforms to the RDF or OWL syntax specifications, and it is used as the target knowledge graph.
3. The system of claim 2, wherein, in the first way of obtaining the target knowledge graph, the similarity is measured by the Jaccard similarity coefficient;
the concept set of the name entered by the user is denoted $C_1$, and the concept set of the English name of the knowledge graph is denoted $C_2$; the Jaccard similarity coefficient $J(C_1, C_2)$ between them is:
$$J(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}$$
if $C_1$ and $C_2$ are identical, $J(C_1, C_2)$ equals 1; the search results are sorted by this similarity.
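Claims 2 and 3 rank candidate knowledge graphs by the Jaccard coefficient between two concept sets. The claims do not say how a name becomes a concept set. The Python sketch below therefore assumes it is simply the set of lowercase words in the name, with camelCase split first. `concept_set` and `rank_graphs` are hypothetical names, and names written in non-Latin scripts would need a different tokenizer.

```python
import re


def concept_set(name):
    """Assumed notion of a 'concept set': the lowercase words of a name.

    camelCase is split first, so "GeneOntology" -> {"gene", "ontology"}.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {word for word in re.split(r"[^a-z0-9]+", spaced.lower()) if word}


def jaccard(c1, c2):
    """J(C1, C2) = |C1 ∩ C2| / |C1 ∪ C2|; taken as 0.0 when both sets are empty."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def rank_graphs(query, graph_names):
    """Pair each knowledge graph name with its score, highest similarity first."""
    query_concepts = concept_set(query)
    scored = [(name, jaccard(query_concepts, concept_set(name))) for name in graph_names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, score in rank_graphs(
    "disease ontology", ["Human Disease Ontology", "GeneOntology", "Disease Ontology"]
):
    print(f"{score:.2f}  {name}")
```

Because `sorted` is stable, graphs with equal scores keep their input order.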
4. The system of claim 1, wherein the knowledge graph translation module extracts the triples of the target knowledge graph as follows: SPARQL is used to extract the subjects, predicates and objects of all knowledge points in the target knowledge graph, encoded as Resource Description Framework (RDF) triples. The knowledge points include classes, instances, object properties, data properties and annotations.
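Claims 4 and 5 pair SPARQL extraction with character-string segmentation. The patent does not give its query, so the sketch below makes these assumptions:

- a generic `SELECT ?s ?p ?o WHERE { ?s ?p ?o }` is run by any SPARQL engine, for example Apache Jena or rdflib;
- the result rows are handed over as plain strings in N-Triples style, with IRIs in angle brackets and literals bare;
- segmentation keeps the local name after the last `#` or `/` of each IRI.

`local_name` and `to_arrays` are illustrative names.

```python
# Generic query assumed here for reference; the patent does not publish its own.
SPARQL_ALL_TRIPLES = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"


def local_name(term):
    """Return the part of an <IRI> after its last '#' or '/'; literals pass through."""
    if not (term.startswith("<") and term.endswith(">")):
        return term  # a literal such as an rdfs:comment text is kept whole
    iri = term[1:-1]
    for sep in ("#", "/"):
        if sep in iri:
            return iri.rsplit(sep, 1)[1]
    return iri


def to_arrays(rows):
    """Split (s, p, o) result rows into three index-aligned dynamic arrays."""
    subjects, predicates, objects = [], [], []
    for s, p, o in rows:
        subjects.append(local_name(s))
        predicates.append(local_name(p))
        objects.append(local_name(o))
    return subjects, predicates, objects


rows = [
    ("<http://example.org/zoo#Dog>",
     "<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
     "<http://example.org/zoo#Animal>"),
    ("<http://example.org/zoo/Rex>",
     "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
     "<http://example.org/zoo#Dog>"),
    ("<http://example.org/zoo#Dog>",
     "<http://www.w3.org/2000/01/rdf-schema#comment>",
     "a domestic animal"),
]
print(to_arrays(rows))
```

Keeping the three arrays index-aligned is what lets the later steps treat position `i` in each array as one triple.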
5. The system of claim 1, wherein the knowledge graph translation module generates the short sentences of the target knowledge graph as follows: first, character-string segmentation is performed on the obtained triples to obtain the names of the subjects, predicates and objects, and three dynamic arrays are constructed. In short-sentence generation, the subject, predicate and object stand in a one-to-one relationship. Each subject can therefore be assembled directly with its corresponding predicate and object into a short sentence using SimpleNLG in a nested loop.
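SimpleNLG is a Java library, so the following Python sketch does not call it. Instead it swaps in plain string templating to show the one-to-one assembly of claim 5. The camelCase splitting and the small phrase table for `subClassOf` and `type` are assumptions added for readability, not part of the patent.

```python
import re

# Hypothetical verb phrases for common RDF(S) predicates; others are humanized.
PREDICATE_PHRASES = {"subClassOf": "is a subclass of", "type": "is an instance of"}


def humanize(name):
    """Split camelCase and lowercase it: 'hasSymptom' -> 'has symptom'."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).lower()


def short_sentence(subject, predicate, obj):
    """Realize one triple: subject + verb phrase + object, capitalized, with a period."""
    verb = PREDICATE_PHRASES.get(predicate, humanize(predicate))
    text = f"{humanize(subject)} {verb} {humanize(obj)}."
    return text[0].upper() + text[1:]


def short_sentences(subjects, predicates, objects):
    """Walk the index-aligned arrays together; each position is one short sentence."""
    return [short_sentence(s, p, o) for s, p, o in zip(subjects, predicates, objects)]


for sentence in short_sentences(
    ["Dog", "ChronicKidneyDisease"], ["subClassOf", "hasSymptom"], ["Animal", "Fatigue"]
):
    print(sentence)
```

Lowercasing also affects proper nouns such as instance names. A real realizer would need part-of-speech or entity information to avoid that.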
6. The system of claim 1, wherein the knowledge graph translation module generates the long sentences of the target knowledge graph as follows: first, character-string segmentation is performed on the obtained triples to obtain the names of the subjects, predicates and objects, and three dynamic arrays are constructed. In long-sentence generation, one subject may have several predicates and each predicate may have several objects. To handle this, the predicates belonging to different subjects are separated in the predicate array by one special identifier. The objects belonging to different predicates of different subjects are separated in the object array by another special identifier. Together, the two identifiers restore the correspondence among subjects, predicates and objects. Nested loops check these identifiers, and SimpleNLG assembles the corresponding subjects, predicates and objects. Each predicate of the same subject forms a sentence, all sentences of the same subject form a paragraph, and multiple objects are joined by conjunctions.
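Claim 6 separates subject groups and predicate groups with special identifiers but does not name them. The Python sketch below makes these assumptions:

- `#` in the predicate array closes one subject's predicates;
- `|` in the object array closes one predicate's objects;
- string templating stands in for SimpleNLG.

All function names are illustrative, and the markers are assumed to be well-formed (one object group per predicate).

```python
import re

SUBJECT_BREAK = "#"    # hypothetical marker in the predicate array
PREDICATE_BREAK = "|"  # hypothetical marker in the object array


def humanize(name):
    """Split camelCase and lowercase it: 'hasPart' -> 'has part'."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).lower()


def split_on(items, marker):
    """Split a flat list into groups at each marker element."""
    groups, current = [], []
    for item in items:
        if item == marker:
            groups.append(current)
            current = []
        else:
            current.append(item)
    groups.append(current)
    return groups


def join_objects(objs):
    """Join objects with commas and a final 'and' connective."""
    if len(objs) <= 2:
        return " and ".join(objs)
    return ", ".join(objs[:-1]) + " and " + objs[-1]


def long_paragraphs(subjects, predicates, objects):
    """One paragraph per subject; one sentence per predicate of that subject."""
    predicate_groups = split_on(predicates, SUBJECT_BREAK)
    object_groups = split_on(objects, PREDICATE_BREAK)
    paragraphs, k = [], 0  # k walks object_groups in step with every predicate
    for subject, preds in zip(subjects, predicate_groups):
        sentences = []
        for pred in preds:
            objs = [humanize(o) for o in object_groups[k]]
            k += 1
            text = f"{humanize(subject)} {humanize(pred)} {join_objects(objs)}."
            sentences.append(text[0].upper() + text[1:])
        paragraphs.append(" ".join(sentences))
    return paragraphs


subjects = ["Dog", "Cat"]
predicates = ["hasPart", "eats", "#", "eats"]
objects = ["Tail", "Leg", "|", "Meat", "|", "Fish", "Milk"]
for paragraph in long_paragraphs(subjects, predicates, objects):
    print(paragraph)
```

Malformed marker sequences (fewer object groups than predicates) raise an `IndexError` rather than producing misaligned sentences.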
7. The system of claim 1, wherein the knowledge graph translation module supplements the sentences of the target knowledge graph with annotations as follows. First, the predicate array is traversed. If a predicate is "comment", meaning that the object is a comment on the subject, the corresponding subject and object are extracted into a new dynamic array, the annotation array, in which the odd-indexed elements store subjects and the even-indexed elements store objects. Then, in a nested loop over the subject, predicate and object arrays, the system checks whether the subject or the object appears in the annotation array. If it does, brackets containing its annotation are appended after that subject or object. Finally, the predicate is checked: if it is not "comment", the triple is assembled into a sentence; otherwise, it is not assembled.
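The annotation step of claim 7 can be sketched in Python as below. It rests on two assumptions. First, the annotation predicate's local name is `comment`, as for `rdfs:comment`. Second, a dict replaces the claim's interleaved subject/object array, so only the last comment per term is kept. The function name `annotate` is hypothetical.

```python
COMMENT = "comment"  # assumed local name of the annotation predicate (rdfs:comment)


def annotate(subjects, predicates, objects):
    """Attach comment values in brackets instead of emitting them as sentences."""
    # A dict stands in for the claim's interleaved subject/object annotation array.
    notes = {s: o for s, p, o in zip(subjects, predicates, objects) if p == COMMENT}

    def with_note(term):
        return f"{term} ({notes[term]})" if term in notes else term

    sentences = []
    for s, p, o in zip(subjects, predicates, objects):
        if p == COMMENT:
            continue  # comment triples only supplement other sentences
        sentences.append(f"{with_note(s)} {p} {with_note(o)}.")
    return sentences


print(annotate(
    ["Dog", "Dog", "Meat"],
    ["comment", "eats", "comment"],
    ["a domestic animal", "Meat", "animal flesh"],
))
```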
8. The system of claim 1, wherein the knowledge graph translation module inserts the short and long sentences of the target knowledge graph into the database as follows: the JDBC API is used to connect to the database. First, a database and a data table for storing the translation results are created, defining the table name, the table fields and the primary key. Then the English name of the knowledge graph is matched against the names stored in the database. If translation results for the knowledge graph already exist in the local database, no insertion is performed; otherwise, the generated short-sentence array and long-sentence array are added to the data table.
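Claim 8 uses JDBC from Java. The sketch below swaps in Python's built-in `sqlite3` module to show the same check-then-insert logic. The table name `translation`, its columns, the function name `store_translation`, and storing each sentence array as newline-joined text are all hypothetical choices.

```python
import sqlite3

# Hypothetical schema: the claim fixes only "table name, fields and primary key".
SCHEMA = """
CREATE TABLE IF NOT EXISTS translation (
    kg_name         TEXT PRIMARY KEY,
    short_sentences TEXT NOT NULL,
    long_sentences  TEXT NOT NULL
)
"""


def store_translation(conn, kg_name, short_sentences, long_sentences):
    """Insert a translation once per knowledge graph name; return True if inserted."""
    conn.execute(SCHEMA)
    exists = conn.execute(
        "SELECT 1 FROM translation WHERE kg_name = ?", (kg_name,)
    ).fetchone()
    if exists:
        return False  # already translated: skip the insert
    conn.execute(
        "INSERT INTO translation VALUES (?, ?, ?)",
        (kg_name, "\n".join(short_sentences), "\n".join(long_sentences)),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
print(store_translation(conn, "PizzaOntology",
                        ["Pizza is a subclass of food."],
                        ["Pizza has topping cheese and tomato."]))
```

Because `kg_name` is the primary key, `INSERT OR IGNORE` would achieve the same skip in a single statement. The explicit `SELECT` is kept to mirror the order of steps in the claim.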
9. The system of claim 1, wherein the result display module displays the translated content and the source file as follows: after a target knowledge graph is selected in the web interface, all translated content for that knowledge graph is retrieved from the database with Ajax and shown in the interface. The source file of the target knowledge graph is read from the local server and displayed alongside it.
10. The system of claim 1, wherein the result display module performs visualization as follows: after a target knowledge graph is selected in the web interface, the corresponding JSON file is fetched from the back end with Ajax and a tree diagram is drawn. In the tree diagram, each node represents a subject or an object and is connected to its associated nodes by lines.
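Claim 10 leaves the visualization tool unspecified. Front-end tree libraries typically consume a nested `{"name", "children"}` shape; D3's `d3.hierarchy`, for instance, reads `children` by default. The Python sketch below uses hypothetical function names and assumes such a JSON tree. It only derives the connecting lines and an indented outline, as a text stand-in for the drawn diagram.

```python
import json


def tree_edges(node, parent=None):
    """Yield one (parent, child) pair per connecting line in the tree diagram."""
    if parent is not None:
        yield parent, node["name"]
    for child in node.get("children", []):
        yield from tree_edges(child, node["name"])


def render_outline(node, depth=0):
    """Indented text outline of the same tree, two spaces per level."""
    lines = ["  " * depth + node["name"]]
    for child in node.get("children", []):
        lines.extend(render_outline(child, depth + 1))
    return lines


tree = json.loads(
    '{"name": "LivingThing", "children": [{"name": "Animal", "children": '
    '[{"name": "Dog", "children": [{"name": "Rex"}]}, {"name": "Cat"}]}]}'
)
print(list(tree_edges(tree)))
print("\n".join(render_outline(tree)))
```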
CN201910629843.0A 2019-07-12 2019-07-12 Knowledge graph auxiliary understanding system based on natural language generation technology Active CN110347798B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910629843.0A CN110347798B (en) 2019-07-12 2019-07-12 Knowledge graph auxiliary understanding system based on natural language generation technology
PCT/CN2020/083591 WO2020233261A1 (en) 2019-07-12 2020-04-07 Natural language generation-based knowledge graph understanding assistance system
JP2021532885A JP7064262B2 (en) 2019-07-12 2020-04-07 Knowledge graph understanding support system based on natural language generation technology

Publications (2)

Publication Number Publication Date
CN110347798A CN110347798A (en) 2019-10-18
CN110347798B CN110347798B (en) 2021-06-01

Family

ID=68176110

Country Status (3)

Country Link
JP (1) JP7064262B2 (en)
CN (1) CN110347798B (en)
WO (1) WO2020233261A1 (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347798B (en) * 2019-07-12 2021-06-01 Zhejiang Lab Knowledge graph auxiliary understanding system based on natural language generation technology
CN111370127B (en) * 2020-01-14 2022-06-10 Zhejiang Lab Knowledge graph-based cross-department decision support system for early diagnosis of chronic nephropathy
US20210295036A1 (en) * 2020-03-18 2021-09-23 International Business Machines Corporation Systematic language to enable natural language processing on technical diagrams
CN112100322B (en) * 2020-08-06 2022-09-16 Fudan University API element comparison result automatic generation method based on knowledge graph
CN112101040B (en) * 2020-08-20 2024-03-29 Huaiyin Institute of Technology Ancient poetry semantic retrieval method based on knowledge graph
CN112380864B (en) * 2020-11-03 2021-05-28 Guangxi University Text triple labeling sample enhancement method based on translation
CN112749184B (en) * 2021-01-13 2024-02-20 Guangdong Yuetong Tianxia Technology Co., Ltd. SPARQL joint query data source selection method
US11829726B2 (en) 2021-01-25 2023-11-28 International Business Machines Corporation Dual learning bridge between text and knowledge graph
CN112966493A (en) * 2021-02-07 2021-06-15 Chongqing Huitong Smart Technology Co., Ltd. Knowledge graph construction method and system
CN113111458B (en) * 2021-04-13 2022-10-21 Hefei University of Technology DXF-based sheet metal part automatic identification and positioning method
CN113094517A (en) * 2021-04-27 2021-07-09 China Academy of Art Method and system for constructing product knowledge unit
CN113157891B (en) * 2021-05-07 2023-11-17 Taikang Insurance Group Co., Ltd. Knowledge graph path ordering method, system, equipment and storage medium
CN113282762B (en) * 2021-05-27 2023-06-02 Shenzhen Shulian Tianxia Intelligent Technology Co., Ltd. Knowledge graph construction method, knowledge graph construction device, electronic equipment and storage medium
CN113407688B (en) * 2021-06-15 2022-09-16 Xi'an University of Technology Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN113377349B (en) * 2021-06-21 2022-05-13 Zhejiang University of Technology Method for detecting difference between service processes and translating natural language
CN113467755B (en) * 2021-07-12 2022-07-26 CASCO Signal Ltd. Demand compliance analysis method, system, electronic device and storage medium
CN113553443B (en) * 2021-07-18 2023-08-22 Beijing Zhihui Xingguang Information Technology Co., Ltd. Relation map generation method and system for recording knowledge graph migration path
CN113434626B (en) * 2021-08-27 2021-12-07 Zhejiang Lab Multi-center medical diagnosis knowledge graph representation learning method and system
CN113810480B (en) * 2021-09-03 2022-09-16 Hainan University Emotion communication method based on DIKW content object
CN113890899B (en) * 2021-09-13 2022-11-18 Beijing Jiaotong University Protocol conversion method based on knowledge graph
CN113805847A (en) * 2021-09-15 2021-12-17 Nantong Zaidu Education Consulting Co., Ltd. On-line codeless development system
CN114153943A (en) * 2021-11-22 2022-03-08 Zhejiang Lab System and method for constructing robot behavior tree based on knowledge graph
CN114201618B (en) * 2022-02-17 2022-09-13 Yaodu Jingwei Information Technology (Beijing) Co., Ltd. Drug development literature visualization interpretation method and system
WO2023159650A1 (en) * 2022-02-28 2023-08-31 Microsoft Technology Licensing, Llc Mining and visualizing related topics in knowledge base
CN115271683B (en) * 2022-09-26 2023-01-13 Southwest Jiaotong University BIM automatic standard checking system based on standard knowledge graph element structure
CN115545006B (en) * 2022-10-10 2024-02-13 Tsinghua University Rule script generation method, device, computer equipment and medium
CN115577713B (en) * 2022-12-07 2023-03-17 Zhongke Yuchen Technology Co., Ltd. Text processing method based on knowledge graph
CN116628229B (en) * 2023-07-21 2023-11-10 Alipay (Hangzhou) Information Technology Co., Ltd. Method and device for generating text corpus by using knowledge graph
CN117436420A (en) * 2023-12-18 2024-01-23 Wuhan Big Data Industry Development Co., Ltd. Method and device for generating business process model based on natural language processing

Citations (7)

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN103020148A (en) * 2012-11-23 2013-04-03 Fudan University System and method for converting Chinese phrase-structure treebanks into dependency-structure treebanks
US20170024405A1 (en) * 2015-07-24 2017-01-26 Samsung Electronics Co., Ltd. Method for automatically generating dynamic index for content displayed on electronic device
CN110741389A (en) * 2017-11-21 2020-01-31 Google LLC Improved data communication of entities
CN110019471B (en) * 2017-12-15 2024-03-08 Microsoft Technology Licensing, LLC Generating text from structured data
CN109033260B (en) * 2018-07-06 2021-08-31 Tianjin University Knowledge graph interactive visual query method based on RDF
CN108959613B (en) * 2018-07-17 2021-09-03 Hangzhou Dianzi University RDF knowledge graph-oriented semantic approximate query method
CN110347798B (en) * 2019-07-12 2021-06-01 Zhejiang Lab Knowledge graph auxiliary understanding system based on natural language generation technology

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
US10216839B2 (en) * 2017-06-22 2019-02-26 International Business Machines Corporation Relation extraction using co-training with distant supervision
CN107766483A (en) * 2017-10-13 2018-03-06 Huazhong University of Science and Technology Interactive question answering method and system based on a knowledge graph
CN107798136A (en) * 2017-11-23 2018-03-13 Beijing Baidu Netcom Science and Technology Co., Ltd. Entity relation extraction method, apparatus and server based on deep learning
CN108829696A (en) * 2018-04-18 2018-11-16 Xi'an University of Technology Automatic construction method for knowledge graph nodes in metro design codes
CN109062939A (en) * 2018-06-20 2018-12-21 Guangdong University of Foreign Studies Intelligent guided-learning method for international Chinese education
CN109146078A (en) * 2018-07-19 2019-01-04 Guilin University of Electronic Technology Knowledge graph representation learning method based on dynamic paths
CN109408811A (en) * 2018-09-29 2019-03-01 Lenovo (Beijing) Co., Ltd. Data processing method and server

Non-Patent Citations (1)

Title
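Research and implementation of knowledge extraction methods for Chinese knowledge graph construction; He Zhonghe (赫中翮); Information Science and Technology (《信息科技》); 20190115; I138-4375 *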

Also Published As

Publication number Publication date
JP7064262B2 (en) 2022-05-10
WO2020233261A1 (en) 2020-11-26
JP2022510031A (en) 2022-01-25
CN110347798A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN110347798B (en) Knowledge graph auxiliary understanding system based on natural language generation technology
CN110399457B (en) Intelligent question answering method and system
Shigarov et al. Rule-based spreadsheet data transformation from arbitrary to relational tables
CN111475623A (en) Case information semantic retrieval method and device based on knowledge graph
CN102955848B (en) Semantics-based three-dimensional model retrieval system and method
JP2017513134A (en) Ontology mapping method and apparatus
WO2021213314A1 (en) Data processing method and device, and computer readable storage medium
Lin et al. OWL Ontology Extraction from Relational Databases via Database Reverse Engineering.
Rauf et al. Logical structure extraction from software requirements documents
Florescu Managing Semi-Structured Data
Gupta et al. KG4ASTRA: question answering over Indian missiles knowledge graph
Karkar et al. Illustrate it! An Arabic multimedia text-to-picture m-learning system
Borsje et al. Graphical query composition and natural language processing in an RDF visualization interface
CN110750632A (en) Improved Chinese ALICE intelligent question-answering method and system
Emani et al. NALDO: From natural language definitions to OWL expressions
US11816770B2 (en) System for ontological graph creation via a user interface
Fudholi et al. Code (common ontology development): A knowledge integration approach from multiple ontologies
CN114490930A (en) Cultural relic question-answering system and question-answering method based on knowledge graph
Alabbas et al. Online multilingual plagiarism detection system using multi search engines
Wang et al. Question answering system of discipline inspection laws and regulations based on knowledge graph
Samih et al. * Improving Natural Language Queries Search and Retrieval through Semantic Image Annotation Understanding
US11940964B2 (en) System for annotating input data using graphs via a user interface
Klang et al. Docforia: A multilayer document model
CN116204618A (en) Intelligent question and answer generation method and device, electronic equipment and storage medium
Ting et al. Query refinement for ontology information extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant