WO2022107989A1 - Method and apparatus for knowledge completion using relationship learning between a query and a knowledge graph - Google Patents
- Publication number
- WO2022107989A1 (PCT/KR2020/018966)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- embedding
- knowledge
- value
- knowledge graph
- Prior art date
Classifications
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
- G06F16/3344—Query execution using natural language analysis
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
- G06F40/205—Parsing
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/04—Inference or reasoning models
Definitions
- the present invention relates to a knowledge completion method and apparatus that use relationship learning between a query and a knowledge graph.
- a knowledge graph refers to a network composed of relationships between entities.
- knowledge graphs are often incomplete because relationships for specific entities are omitted or connected incorrectly.
- the present invention therefore proposes a knowledge completion method and apparatus that can infer missing knowledge from a specific query sentence and the knowledge graph by learning the relationship between them.
- disclosed is a knowledge completion apparatus comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts a topic from the input query; a knowledge graph embedding module that outputs embedding values for the plurality of predicates, subjects, and objects included in the knowledge graph; a similarity calculation module that determines the predicate most similar to the query by calculating the similarity between the query embedding value and the embedding value of each of the plurality of predicates; an embedding concatenation module that concatenates the embedding value of the query with the embedding value of the most similar predicate; and a scoring module that infers a new triple using the extracted topic, the concatenated embedding value, and the subjects and objects of the knowledge graph.
- the query embedding module may determine an embedding value corresponding to the input query using a BERT-based RoBERT model.
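As a rough sketch of the pattern described above (tokenize, feed to an encoder, take the first output vector as the query embedding), the toy `encode_tokens` stub below stands in for the actual RoBERT model; its hashing scheme and 8-dimensional vectors are illustrative assumptions, not part of the patent:

```python
import hashlib
import numpy as np

def encode_tokens(tokens, dim=8):
    # Stand-in for a BERT-style contextual encoder: each token gets a
    # deterministic vector (seeded by a stable hash of the token), and the
    # mean of all token vectors is mixed in so every output depends on the
    # whole query. A real system would run a pretrained model such as the
    # BERT-based RoBERT model named in the text.
    vecs = []
    for tok in tokens:
        seed = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "little")
        vecs.append(np.random.default_rng(seed).standard_normal(dim))
    vecs = np.stack(vecs)
    return vecs + vecs.mean(axis=0)  # crude "context" mixing

def query_embedding(query, dim=8):
    # Tokenize into words, prepend a [CLS]-style token, and use the first
    # output vector as the embedding value of the input query.
    tokens = ["[CLS]"] + query.lower().rstrip("?").split()
    return encode_tokens(tokens, dim=dim)[0]

emb = query_embedding("What does Christian Bale star in?")
print(emb.shape)  # (8,)
```

Because the stub mixes in the mean of all token vectors, two different queries get different embeddings even though the first token is always the same, loosely mimicking the context dependence described above.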
- the topic extraction module may perform tokenization to separate the input query into words, and may extract the topic by excluding words that appear a preset number of times or more in the knowledge graph.
- the similarity calculation module may perform a dot product operation between the query embedding value and the embedding values of all predicates obtained through the knowledge graph embedding module, apply a sigmoid, and select the highest value to find the predicate most similar to the query.
- the scoring module may place the extracted topic in the subject position, place the value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate position, and place the subjects and objects of the knowledge graph in the candidate object position to infer a new triple.
- the scoring module may place the plurality of candidate objects in the score calculation function in turn, and determine the entity with the highest score as the object related to the extracted topic and the concatenated embedding value.
- also disclosed is a method of completing knowledge using query and knowledge graph relationship learning in an apparatus including a processor and a memory, the method comprising: outputting a query embedding value corresponding to an input query; extracting a topic from the input query; outputting embedding values for a plurality of predicates, subjects, and objects included in the knowledge graph; determining the predicate most similar to the query by calculating the similarity between the query embedding value and the embedding value of each of the plurality of predicates; concatenating the embedding value of the query and the embedding value of the most similar predicate; and inferring a new triple using the extracted topic, the concatenated embedding value, and the subjects and objects of the knowledge graph.
- FIG. 1 is a diagram showing the configuration of a knowledge completion device according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating a detailed configuration of a query embedding module according to the present embodiment.
- FIG. 3 is a diagram illustrating a case in which the number of appearances is used for topic extraction according to the present embodiment.
- FIG. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
- FIG. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
- the present invention provides a method for inferring missing knowledge by using a specific query and a knowledge graph.
- a topic is automatically extracted from a question-type query to obtain the corresponding topic embedding value, and a new triple is created by learning the relationship between the topic and the query from the knowledge graph by using the query embedding and the knowledge graph embedding.
- predicate embedding of a knowledge graph related to a specific query is used together.
- FIG. 1 is a diagram showing the configuration of a knowledge completion device according to an embodiment of the present invention.
- the knowledge completion device includes a query embedding module 100, a topic extraction module 102, a knowledge graph embedding module 104, a similarity calculation module 106, an embedding concatenation module 108, and a scoring module 110.
- the query embedding module 100 outputs an embedding value corresponding to the query input by the user.
- query embedding means representing an input query as a vector in a multidimensional space through various algorithms.
- FIG. 2 is a diagram illustrating a detailed configuration of a query embedding module according to the present embodiment.
- a BERT-based RoBERT model is used for query embedding.
- the BERT model on which RoBERT is based is a context-dependent model.
- for example, the word 'bank' can carry two meanings, as in 'bank deposit' or 'river bank', so a context-dependent model has the advantage of producing an embedding value that captures the characteristics of a word well by considering its context, that is, the surrounding sentences.
- each word constituting the query is divided into tokens and fed to the model, and the first value of the output is used as the embedding value of the input query.
- the topic extraction module 102 extracts a topic in consideration of the number of appearances in the knowledge graph of each word included in the input query.
- for example, suppose the input query is "What does Christian Bale star in?".
- the topic is marked separately in the query, as in "What does [Christian Bale] star in?", so marking work is required when extracting the topic.
- FIG. 3 is a diagram illustrating a case in which the number of appearances is used for topic extraction according to the present embodiment.
- a topic is extracted by computing, for each word in the query, the number of times it appears in the knowledge graph, and selecting the words that appear only a few times. For example, in "What does Christian Bale star in?", the words 'What', 'does', 'star', 'in', and '?' are far more likely to appear in other queries than "Christian Bale", so it is desirable to extract the rarely appearing words as the topic.
- a topic may be extracted by tokenizing the query into words and excluding words that appear a preset number of times or more (e.g., 2,000 or more).
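A minimal sketch of this frequency filter, assuming a precomputed word-count table; the counts below are made-up values for illustration (the text's actual threshold is on the order of 2,000):

```python
def extract_topic(query, word_counts, threshold=2000):
    # Tokenize the query into words and keep only the rare ones: any word
    # whose appearance count reaches the threshold is excluded as a common
    # word, and the surviving words form the topic.
    tokens = query.lower().rstrip("?").split()
    return " ".join(t for t in tokens if word_counts.get(t, 0) < threshold)

# Hypothetical appearance counts, not taken from the patent.
counts = {"what": 50000, "does": 41000, "star": 9000,
          "in": 120000, "christian": 12, "bale": 9}

print(extract_topic("What does Christian Bale star in?", counts))
# christian bale
```

The frequent function words are filtered out and the rare entity words survive, matching the marking example above.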
- the knowledge graph embedding module 104 outputs an embedding matrix that represents the knowledge graph (KG) well.
- a triple is composed of a predicate corresponding to a relation and entities corresponding to the subject and object, and the knowledge graph embedding module 104 outputs embedding values for all predicates and entities included in the knowledge graph.
- the ComplEx model, which can express symmetric as well as asymmetric relationships using real and imaginary components, may be used, and all triples of the KG can be learned through its score function.
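The ComplEx score of a triple is Re(⟨s, r, conj(o)⟩) over complex-valued embeddings. A minimal sketch with toy one-dimensional embeddings, chosen so the asymmetry is visible (real models learn high-dimensional embeddings):

```python
import numpy as np

def complex_score(s, r, o):
    # ComplEx triple score: Re(<s, r, conj(o)>). A purely real relation
    # vector r gives a symmetric score; an imaginary part makes the score
    # asymmetric, which is how the model covers both relation types.
    return float(np.real(np.sum(s * r * np.conj(o))))

s = np.array([1 + 1j])   # subject embedding (toy value)
r = np.array([1j])       # purely imaginary relation -> asymmetric
o = np.array([1 + 0j])   # object embedding (toy value)

# Swapping subject and object flips the score for this relation.
print(complex_score(s, r, o), complex_score(o, r, s))  # -1.0 1.0
```

During training, such scores are pushed high for true triples and low for corrupted ones, which is what "all triple learning of the KG through the score function" refers to.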
- the embedding values of the query sentence and the knowledge graph are used to find a predicate similar to the query sentence.
- the similarity calculation module 106 calculates the similarity between the embedding values output through the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104 to determine the predicate most similar to the query.
- specifically, the similarity calculation module 106 performs a dot product operation between the embedding value output through the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104, applies a sigmoid, and looks for the highest value to find the most similar predicate embedding.
- FIG. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
- an embedding value is obtained through the query embedding module 100 for "What does Christian Bale star in?". Taking the dot product with all of the knowledge graph's predicate embedding values in matrix form and applying a sigmoid yields the similarity between the query and each predicate, as shown in the box on the right. Among these, the highest value, "starred_actors", becomes the predicate embedding value most similar to the query.
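The similarity search just described reduces to a dot product, a sigmoid, and an argmax. In this sketch the query and predicate vectors are invented toy values, not outputs of the actual modules:

```python
import numpy as np

def most_similar_predicate(q_emb, pred_embs, pred_names):
    # Dot product of the query embedding with every predicate embedding,
    # squashed through a sigmoid; the index of the highest similarity
    # selects the most similar predicate.
    sims = 1.0 / (1.0 + np.exp(-(pred_embs @ q_emb)))
    return pred_names[int(np.argmax(sims))], sims

q = np.array([1.0, 0.5, -0.2])          # toy query embedding
preds = np.array([
    [0.9, 0.4, -0.1],                    # starred_actors (aligned with q)
    [-0.8, 0.1, 0.7],                    # directed_by
    [0.0, -0.9, 0.3],                    # release_year
])
names = ["starred_actors", "directed_by", "release_year"]

best, sims = most_similar_predicate(q, preds, names)
print(best)  # starred_actors
```

Because the sigmoid is monotonic, the argmax over sigmoid values is the same as the argmax over raw dot products; the sigmoid simply scales the similarities into (0, 1) as in the figure.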
- the embedding concatenation module 108 concatenates the query embedding value output from the query embedding module 100 and the predicate embedding value most similar to the query sentence determined by the similarity calculation module 106.
- the scoring module 110 infers the missing triple by using the score calculation function (Equation 1) of the knowledge graph embedding module.
- the scoring module 110 places the topic extracted from the query in the subject position of the score calculation function, places the value obtained by concatenating the query embedding and the most similar predicate embedding in the predicate position, and places the entities (subjects and objects) of the knowledge graph as candidates in the object position; after calculation, the object with the highest value is found and the corresponding triple is newly inferred.
- FIG. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
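Putting the pieces together, the inference step can be sketched as below. The projection matrix `W`, the DistMult-style product, and all embedding values are illustrative assumptions standing in for the model's actual score calculation function (Equation 1):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

def score(subj, pred_cat, obj, W):
    # Hypothetical score function: project the concatenated
    # (query || most-similar-predicate) embedding back to entity space,
    # then take a DistMult-style triple product with subject and object.
    return float(np.sum(subj * (W @ pred_cat) * obj))

# Toy entity and predicate embeddings (random, for illustration only).
entities = {name: rng.standard_normal(d)
            for name in ["Christian_Bale", "Batman_Begins",
                         "The_Prestige", "Titanic"]}
q_emb = rng.standard_normal(d)          # query embedding
p_emb = rng.standard_normal(d)          # "starred_actors" embedding
pred_cat = np.concatenate([q_emb, p_emb])
W = rng.standard_normal((d, 2 * d))

subj = entities["Christian_Bale"]       # extracted topic -> subject slot
candidates = [n for n in entities if n != "Christian_Bale"]
best = max(candidates, key=lambda n: score(subj, pred_cat, entities[n], W))
new_triple = ("Christian_Bale", "starred_actors", best)
print(len(new_triple))  # 3
```

Each candidate entity is placed in the object slot in turn, and the highest-scoring one completes the new triple, as the scoring module description above states.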
- the knowledge completion process using the query and knowledge graph relationship learning may be performed in an apparatus including a processor and a memory.
- the processor may include a central processing unit (CPU) or other virtual machine capable of executing a computer program.
- the memory may include a non-volatile storage device such as a fixed hard drive or a removable storage device.
- the removable storage device may include a compact flash unit, a USB memory stick, and the like.
- the memory may also include volatile memory, such as various random access memories.
Abstract
Disclosed are a method and device for completing knowledge using relationship learning between a query and a knowledge graph. According to the present invention, a device for completing knowledge using relationship learning between a query and a knowledge graph is described, comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts topics from the input query; a knowledge graph embedding module that outputs embedding values for a plurality of predicates, subjects, and objects included in the knowledge graph; a similarity calculation module that determines the predicate most similar to the query by calculating the similarity between the query embedding value and the embedding value of each of the plurality of predicates; an embedding concatenation module that concatenates the embedding value of the query with the embedding value of the most similar predicate; and a scoring module that infers a new triple using the extracted topic, the concatenated value of the query embedding and the most similar predicate embedding, and the subjects and objects of the knowledge graph.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2020-0157981 | 2020-11-23 | ||
KR1020200157981A KR102442422B1 (ko) | 2020-11-23 | 2020-11-23 | 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 방법 및 장치 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022107989A1 (fr) | 2022-05-27 |
Family
ID=81709225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/018966 WO2022107989A1 (fr) | 2020-11-23 | 2020-12-23 | Procédé et dispositif permettant de compléter des connaissances à l'aide d'un apprentissage de relation entre une interrogation et un graphe de connaissances |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102442422B1 (fr) |
WO (1) | WO2022107989A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102583818B1 (ko) * | 2022-09-14 | 2023-10-04 | 주식회사 글로랑 | Bert를 기반으로한 응답자 집단을 대표하는 질의 응답 네트워크를 활용한 인적성 검사의 표집 과정 방법 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101267038B1 (ko) * | 2011-02-25 | 2013-05-24 | 주식회사 솔트룩스 | 벡터 공간 모델을 이용한 rdf 트리플 선택 방법, 장치, 및 그 방법을 실행하기 위한 프로그램 기록매체 |
KR101662450B1 (ko) * | 2015-05-29 | 2016-10-05 | 포항공과대학교 산학협력단 | 다중 소스 하이브리드 질의응답 방법 및 시스템 |
US20170357906A1 (en) * | 2016-06-08 | 2017-12-14 | International Business Machines Corporation | Processing un-typed triple store data |
KR20180108257A (ko) * | 2017-03-24 | 2018-10-04 | (주)아크릴 | 온톨로지에 의해 표현되는 자원들을 이용하여 상기 온톨로지를 확장하는 방법 |
CN111639171A (zh) * | 2020-06-08 | 2020-09-08 | 吉林大学 | 一种知识图谱问答方法及装置 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101228865B1 (ko) | 2011-11-23 | 2013-02-01 | 주식회사 한글과컴퓨터 | 문서 표시 장치 및 문서 내 중요 단어 추출 방법 |
- 2020-11-23: KR application KR1020200157981A granted as KR102442422B1 (active, IP Right Grant)
- 2020-12-23: PCT application PCT/KR2020/018966 filed as WO2022107989A1 (active, Application Filing)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101267038B1 (ko) * | 2011-02-25 | 2013-05-24 | 주식회사 솔트룩스 | 벡터 공간 모델을 이용한 rdf 트리플 선택 방법, 장치, 및 그 방법을 실행하기 위한 프로그램 기록매체 |
KR101662450B1 (ko) * | 2015-05-29 | 2016-10-05 | 포항공과대학교 산학협력단 | 다중 소스 하이브리드 질의응답 방법 및 시스템 |
US20170357906A1 (en) * | 2016-06-08 | 2017-12-14 | International Business Machines Corporation | Processing un-typed triple store data |
KR20180108257A (ko) * | 2017-03-24 | 2018-10-04 | (주)아크릴 | 온톨로지에 의해 표현되는 자원들을 이용하여 상기 온톨로지를 확장하는 방법 |
CN111639171A (zh) * | 2020-06-08 | 2020-09-08 | 吉林大学 | 一种知识图谱问答方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
KR102442422B1 (ko) | 2022-09-08 |
KR20220070919A (ko) | 2022-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111259653B (zh) | 基于实体关系消歧的知识图谱问答方法、系统以及终端 | |
CN111597314B (zh) | 推理问答方法、装置以及设备 | |
CN109783817A (zh) | 一种基于深度强化学习的文本语义相似计算模型 | |
WO2018196718A1 (fr) | Procédé et dispositif de désambiguïsation d'image, support de stockage et dispositif électronique | |
WO2018092936A1 (fr) | Procédé de regroupement de documents pour des données de texte non structurées à l'aide d'un apprentissage profond | |
Üstün et al. | Characters or morphemes: How to represent words? | |
WO2020111314A1 (fr) | Appareil et procédé d'interrogation-réponse basés sur un graphe conceptuel | |
CN108446404B (zh) | 面向无约束视觉问答指向问题的检索方法及系统 | |
CN107679070B (zh) | 一种智能阅读推荐方法与装置、电子设备 | |
CN110245353B (zh) | 自然语言表示方法、装置、设备及存储介质 | |
CN113593661A (zh) | 临床术语标准化方法、装置、电子设备及存储介质 | |
WO2022107989A1 (fr) | Procédé et dispositif permettant de compléter des connaissances à l'aide d'un apprentissage de relation entre une interrogation et un graphe de connaissances | |
WO2021129411A1 (fr) | Procédé et dispositif de traitement de texte | |
CN110543551B (zh) | 一种问题语句处理方法和装置 | |
CN111444313B (zh) | 基于知识图谱的问答方法、装置、计算机设备和存储介质 | |
JP2020057359A (ja) | 訓練データ生成方法、訓練データ生成装置、電子機器およびコンピュータ読み取り可能な記憶媒体 | |
CN112434533A (zh) | 实体消歧方法、装置、电子设备及计算机可读存储介质 | |
CN114722174A (zh) | 提词方法和装置、电子设备及存储介质 | |
CN109033318B (zh) | 智能问答方法及装置 | |
McClendon et al. | The use of paraphrase identification in the retrieval of appropriate responses for script based conversational agents | |
CN111241276A (zh) | 题目搜索方法、装置、设备及存储介质 | |
CN115774782A (zh) | 多语种文本分类方法、装置、设备及介质 | |
CN114463822A (zh) | 用于图像处理的神经网络训练方法、人脸识别方法及装置 | |
CN114491060A (zh) | 动态联想知识网络的更新方法、语义纠错方法 | |
CN113591004A (zh) | 游戏标签生成方法、装置、存储介质及电子设备 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20962595; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20962595; Country of ref document: EP; Kind code of ref document: A1 |