WO2022107989A1 - Method and device for completing knowledge using relationship learning between a query and a knowledge graph - Google Patents

Method and device for completing knowledge using relationship learning between a query and a knowledge graph

Info

Publication number
WO2022107989A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
embedding
knowledge
value
knowledge graph
Prior art date
Application number
PCT/KR2020/018966
Other languages
English (en)
Korean (ko)
Inventor
박영택
이완곤
김민성
이민호
Original Assignee
숭실대학교산학협력단
Priority date
Filing date
Publication date
Application filed by 숭실대학교산학협력단
Publication of WO2022107989A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/04 Inference or reasoning models

Definitions

  • the present invention relates to a knowledge completion method and apparatus using relationship learning between a query and a knowledge graph.
  • a knowledge graph refers to a network composed of relationships between entities.
  • a knowledge graph may be incomplete due to problems such as missing relationships for specific entities or incorrectly connected relationships.
  • the present invention therefore proposes a knowledge completion method and apparatus that can infer missing knowledge using relationship learning between a query sentence and a knowledge graph.
  • a knowledge completion apparatus comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts a topic from the input query; a knowledge graph embedding module that outputs embedding values for a plurality of predicates, subjects, and objects included in a knowledge graph; a similarity calculation module that determines the predicate most similar to the query by calculating the similarity between the query embedding value and the embedding value of each of the plurality of predicates; an embedding concatenation module that concatenates the query embedding value with the embedding value of the most similar predicate; and a scoring module that infers a new triple using the extracted topic, the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate, and the subjects and objects of the knowledge graph.
  • the query embedding module may determine an embedding value corresponding to the input query using a BERT-based RoBERT model.
  • the topic extraction module may perform tokenization to separate the input query into words, and may extract a topic by excluding words having a preset number of appearances or more in the knowledge graph.
  • the similarity calculation module may perform a dot product operation between the query embedding value and the embedding values of all predicates obtained through the knowledge graph embedding module, apply a sigmoid, and find the highest value to search for the predicate most similar to the query.
  • the scoring module may place the extracted topic in the subject position, place the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate position, and place the subjects and objects of the knowledge graph as candidate objects to infer a new triple.
  • the scoring module may sequentially place the plurality of candidate objects into the score calculation function and determine the entity with the highest score as the object related to the extracted topic and the embedding value obtained by concatenating the query embedding value with the embedding value of the most similar predicate.
  • a method of completing knowledge using query and knowledge graph relationship learning in an apparatus including a processor and a memory, the method comprising: outputting a query embedding value corresponding to an input query; extracting a topic from the input query; outputting embedding values for a plurality of predicates, subjects, and objects included in a knowledge graph; determining the predicate most similar to the query by calculating the similarity between the query embedding value and the embedding value of each of the plurality of predicates; concatenating the query embedding value and the embedding value of the most similar predicate; and inferring a new triple using the extracted topic, the embedding value obtained by the concatenation, and the subjects and objects of the knowledge graph.
  • FIG. 1 is a diagram showing the configuration of a knowledge completion device according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating a detailed configuration of a query embedding module according to the present embodiment.
  • FIG. 3 is a diagram illustrating a case in which the number of appearances is used for topic extraction according to the present embodiment.
  • FIG. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
  • FIG. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
  • the present invention provides a method for inferring missing knowledge by using a specific query and a knowledge graph.
  • a topic is automatically extracted from a question-type query to obtain the corresponding topic embedding value, and a new triple is created by learning the relationship between the topic and the query from the knowledge graph by using the query embedding and the knowledge graph embedding.
  • predicate embedding of the knowledge graph related to a specific query is used together with the query embedding.
  • FIG. 1 is a diagram showing the configuration of a knowledge completion device according to an embodiment of the present invention.
  • the knowledge completion device includes a query embedding module 100, a topic extraction module 102, a knowledge graph embedding module 104, a similarity calculation module 106, an embedding concatenation module 108, and a scoring module 110.
  • the query embedding module 100 outputs an embedding value corresponding to the query input by the user.
  • query embedding means embedding an input query in vector form in a multidimensional space through various algorithms.
  • FIG. 2 is a diagram illustrating a detailed configuration of a query embedding module according to the present embodiment.
  • a BERT-based RoBERT model is used for query embedding.
  • the BERT model on which RoBERT is based is a context-dependent model.
  • the word 'bank' can have two different meanings, as in 'bank deposit' or 'river bank', so this model has the advantage of obtaining an embedding value that expresses the characteristics of a word well by considering the context, that is, the surrounding words and sentences.
  • each word constituting the query is divided into tokens and input to the model, and the first value of the output is used as the embedding value of the input query (a minimal code sketch of this step is given after this list).
  • the topic extraction module 102 extracts a topic in consideration of the number of appearances in the knowledge graph of each word included in the input query.
  • the input query is “What does Christian Bale star in?”
  • the topic is separately marked in the query, such as “What does [Christian Bale] star in?”; therefore, when extracting a topic, a marking step is required.
  • FIG. 3 is a diagram illustrating a case in which the number of appearances is used for topic extraction according to the present embodiment.
  • a topic is extracted by counting, for each word included in the query, its number of appearances in the knowledge graph and keeping the words that appear rarely. For example, in “What does Christian Bale star in?”, the words 'What', 'does', 'star', 'in', and '?' are far more likely to appear in other queries than “Christian Bale”, so it is desirable to extract the rarely appearing words as the topic.
  • a topic may be extracted by performing tokenization to divide the query into words and excluding words having a preset number of appearances or more in the knowledge graph (e.g., 2,000 or more); a frequency-based sketch of this rule is given after this list.
  • the knowledge graph embedding module 104 outputs an embedding matrix that represents the knowledge graph (KG) well.
  • a triple is composed of a predicate corresponding to a relation and entities corresponding to a subject and an object, and the knowledge graph embedding module 104 outputs embedding values for all predicates and entities included in the knowledge graph.
  • the ComplEx model, which uses real and imaginary components and can express both symmetric and asymmetric relationships, can be used, and all triples of the KG can be learned through its score function (the standard ComplEx score function is reproduced after this list for reference).
  • the embedding values of the query sentence and of the knowledge graph are used to find a predicate similar to the query sentence.
  • the similarity calculation module 106 calculates the similarity between the embedding values output through the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104 to determine the predicate most similar to the query.
  • the similarity calculation module 106 performs a dot product operation between the embedding value output through the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104, applies a sigmoid, and looks for the highest value to find the most similar predicate embedding (a sketch of this search appears after this list).
  • FIG. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
  • an embedding value is obtained through the query embedding module 100 for “What does Christian Bale star in?”. Taking the dot product with all predicate embedding values of the knowledge graph in matrix form and applying a sigmoid gives the similarity value between the query and each predicate, as shown in the box on the right. Among them, the highest value, “starred_actors”, becomes the predicate embedding value most similar to the corresponding query.
  • the embedding concatenation module 108 concatenates the query embedding value output from the query embedding module 100 and the predicate embedding value most similar to the query as determined by the similarity calculation module 106.
  • the scoring module 110 infers the missing triple by using the score calculation function (Equation 1) of the knowledge graph embedding module.
  • the scoring module 110 places the topic extracted from the query in the subject position of the score calculation function, places the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate position, and places the entities (subjects and objects) of the knowledge graph as candidates in the object position; after calculation, the candidate with the highest value is found and the corresponding triple is newly inferred (a sketch of this inference step appears after this list).
  • FIG. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
  • the knowledge completion process using the query and knowledge graph relationship learning may be performed in an apparatus including a processor and a memory.
  • the processor may include a central processing unit (CPU) or a virtual machine capable of executing a computer program.
  • the memory may include a non-volatile storage device such as a fixed hard drive or a removable storage device.
  • the removable storage device may include a compact flash unit, a USB memory stick, and the like.
  • the memory may also include volatile memory, such as various random access memories.
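The following minimal sketch illustrates the query embedding step described above: the query is tokenized and the output vector of the first token is used as the query embedding value. It is written in Python with the Hugging Face transformers library and uses the public roberta-base checkpoint as a stand-in for the BERT-based RoBERT model named in the text; the library, checkpoint name, and pooling choice are assumptions, not the patented implementation.

```python
# Sketch of the query embedding step: tokenize the query and take the hidden
# state of the first token as the query embedding value. The roberta-base
# checkpoint is an assumed stand-in for the BERT-based model named in the text.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

def embed_query(query: str) -> torch.Tensor:
    """Return a single vector representing the input query."""
    inputs = tokenizer(query, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # The first token of the last hidden layer serves as the query embedding value.
    return outputs.last_hidden_state[0, 0]

query_embedding = embed_query("What does Christian Bale star in?")
print(query_embedding.shape)  # torch.Size([768]) for roberta-base
```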
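The frequency-based topic extraction rule (tokenize the query and drop words that appear a preset number of times or more, e.g. 2,000, in the knowledge graph) can be sketched as follows. The threshold comes from the text; the simple tokenization and the data structures are illustrative assumptions.

```python
# Sketch of topic extraction: words that occur frequently in the knowledge
# graph are treated as generic and excluded; the remaining rare words form
# the topic (e.g. "Christian Bale" in "What does Christian Bale star in?").
import re
from collections import Counter

def count_word_appearances(triples):
    """Count how often each lower-cased word appears across the KG triples."""
    counts = Counter()
    for subject, predicate, obj in triples:
        for element in (subject, predicate, obj):
            counts.update(re.findall(r"\w+", element.lower()))
    return counts

def extract_topic(query, word_counts, threshold=2000):
    """Keep only the query words that appear fewer than `threshold` times."""
    words = re.findall(r"\w+", query)
    rare = [w for w in words if word_counts.get(w.lower(), 0) < threshold]
    return " ".join(rare)

# Toy usage: with a full knowledge graph, frequent words such as "what",
# "does", "star" and "in" would exceed the threshold and be dropped, leaving
# "Christian Bale" as the topic; this tiny example keeps every word.
kg = [("Christian Bale", "starred_actors", "The Dark Knight")]
print(extract_topic("What does Christian Bale star in?", count_word_appearances(kg)))
```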
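For reference, the knowledge graph embedding model named in the text is ComplEx, whose published score function for a triple (s, p, o) with complex-valued embeddings is given below. The document itself does not reproduce its score calculation function (Equation 1), so this is background rather than the exact formula used.

```latex
% Standard ComplEx score function (background reference)
\phi(s, p, o)
  = \operatorname{Re}\!\left( \langle \mathbf{w}_p, \mathbf{e}_s, \bar{\mathbf{e}}_o \rangle \right)
  = \operatorname{Re}\!\left( \sum_{k=1}^{K} w_{pk} \, e_{sk} \, \bar{e}_{ok} \right),
\qquad \mathbf{w}_p, \mathbf{e}_s, \mathbf{e}_o \in \mathbb{C}^{K}
```

Here the bar denotes the complex conjugate of the object embedding; the real and imaginary parts are what let the model capture both symmetric and asymmetric relations.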
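The predicate search step (dot product between the query embedding and every predicate embedding, a sigmoid, and selection of the highest value) can be sketched as below. It assumes the query embedding has already been brought to the same dimension as the knowledge graph predicate embeddings, a detail the text does not spell out.

```python
# Sketch of the similarity calculation: sigmoid of the dot product between the
# query embedding and every predicate embedding; the highest value identifies
# the predicate most similar to the query (e.g. "starred_actors").
import numpy as np

def most_similar_predicate(query_emb, predicate_embs, predicate_names):
    """Return (name, similarity) of the predicate closest to the query."""
    similarities = 1.0 / (1.0 + np.exp(-(predicate_embs @ query_emb)))  # sigmoid
    best = int(np.argmax(similarities))
    return predicate_names[best], float(similarities[best])

# Toy usage with random vectors; with trained embeddings, the query
# "What does Christian Bale star in?" would select "starred_actors".
rng = np.random.default_rng(0)
names = ["starred_actors", "directed_by", "release_year"]
print(most_similar_predicate(rng.normal(size=200), rng.normal(size=(3, 200)), names))
```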
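Finally, the triple inference step places the extracted topic in the subject slot, the concatenation of the query embedding and the most similar predicate embedding in the relation slot, and scores every knowledge graph entity as a candidate object. The sketch below uses a simple real-valued trilinear score and a linear projection of the concatenated vector back to the relation dimension; both are illustrative assumptions, since the text only states that the score calculation function of the knowledge graph embedding module is reused.

```python
# Sketch of triple inference: subject = extracted topic, relation = projection
# of [query embedding ; most-similar-predicate embedding], object = each
# candidate entity in turn; the highest-scoring entity completes the triple.
# The projection matrix and trilinear score are assumptions for illustration.
import numpy as np

DIM = 200

def score(subject_emb, relation_emb, object_emb):
    """Trilinear score standing in for the KG embedding score function."""
    return float(np.sum(subject_emb * relation_emb * object_emb))

def infer_object(topic_emb, query_emb, predicate_emb,
                 entity_embs, entity_names, projection):
    """Return the candidate entity that best completes the new triple."""
    relation_emb = projection @ np.concatenate([query_emb, predicate_emb])
    scores = [score(topic_emb, relation_emb, e) for e in entity_embs]
    best = int(np.argmax(scores))
    return entity_names[best], scores[best]

# Toy usage with random vectors; a trained system would learn the projection
# and embeddings so that a film Christian Bale starred in scores highest.
rng = np.random.default_rng(1)
projection = rng.normal(size=(DIM, 2 * DIM))
entities = ["The Dark Knight", "Batman Begins", "Christian Bale"]
print(infer_object(rng.normal(size=DIM), rng.normal(size=DIM),
                   rng.normal(size=DIM), rng.normal(size=(3, DIM)),
                   entities, projection))
```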

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Disclosed are a method and a device for completing knowledge using relationship learning between a query and a knowledge graph. According to the present invention, a device for completing knowledge using relationship learning between a query and a knowledge graph is described, comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts topics from the input query; a knowledge graph embedding module that outputs embedding values for a plurality of predicates, subjects, and objects included in the knowledge graph; a similarity calculation module that determines the predicate most similar to the query by calculating the similarity between the query embedding value and the embedding value of each of the plurality of predicates; an embedding concatenation module that connects the query embedding value to the embedding value of the most similar predicate; and a scoring module that infers a new triple using the extracted topic, the embedding value connecting the query embedding value and the embedding value of the most similar predicate, and the subjects and objects of the knowledge graph.
PCT/KR2020/018966 2020-11-23 2020-12-23 Method and device for completing knowledge using relationship learning between a query and a knowledge graph WO2022107989A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0157981 2020-11-23
KR1020200157981A KR102442422B1 (ko) 2020-11-23 2020-11-23 Knowledge completion method and apparatus using query and knowledge graph relationship learning

Publications (1)

Publication Number Publication Date
WO2022107989A1 (fr) 2022-05-27

Family

ID=81709225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/018966 WO2022107989A1 (fr) 2020-11-23 2020-12-23 Method and device for completing knowledge using relationship learning between a query and a knowledge graph

Country Status (2)

Country Link
KR (1) KR102442422B1 (fr)
WO (1) WO2022107989A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102583818B1 (ko) * 2022-09-14 2023-10-04 주식회사 글로랑 Sampling process method for an aptitude and personality test using a BERT-based question answering network representing a respondent group

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101228865B1 (ko) 2011-11-23 2013-02-01 주식회사 한글과컴퓨터 Document display device and method for extracting important words from a document

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101267038B1 (ko) * 2011-02-25 2013-05-24 주식회사 솔트룩스 Method, apparatus, and program recording medium for selecting RDF triples using a vector space model
KR101662450B1 (ko) * 2015-05-29 2016-10-05 포항공과대학교 산학협력단 Multi-source hybrid question answering method and system
US20170357906A1 (en) * 2016-06-08 2017-12-14 International Business Machines Corporation Processing un-typed triple store data
KR20180108257A (ko) * 2017-03-24 2018-10-04 (주)아크릴 Method for extending an ontology using resources represented by the ontology
CN111639171A (zh) * 2020-06-08 2020-09-08 吉林大学 Knowledge graph question answering method and apparatus

Also Published As

Publication number Publication date
KR102442422B1 (ko) 2022-09-08
KR20220070919A (ko) 2022-05-31

Similar Documents

Publication Publication Date Title
CN111259653B Knowledge graph question answering method, system, and terminal based on entity relationship disambiguation
CN111597314B Reasoning question answering method, apparatus, and device
CN109783817A Text semantic similarity calculation model based on deep reinforcement learning
WO2018196718A1 Image disambiguation method and device, storage medium, and electronic device
WO2018092936A1 Method for clustering documents of unstructured text data using deep learning
Üstün et al. Characters or morphemes: How to represent words?
WO2020111314A1 Concept-graph-based question answering apparatus and method
CN108446404B Retrieval method and system for pointing questions in unconstrained visual question answering
CN107679070B Intelligent reading recommendation method and device, and electronic device
CN110245353B Natural language representation method, apparatus, device, and storage medium
CN113593661A Clinical terminology standardization method, apparatus, electronic device, and storage medium
WO2022107989A1 Method and device for completing knowledge using relationship learning between a query and a knowledge graph
WO2021129411A1 Text processing method and device
CN110543551B Question sentence processing method and apparatus
CN111444313B Knowledge-graph-based question answering method, apparatus, computer device, and storage medium
JP2020057359A Training data generation method, training data generation apparatus, electronic device, and computer-readable storage medium
CN112434533A Entity disambiguation method, apparatus, electronic device, and computer-readable storage medium
CN114722174A Word prompting method and apparatus, electronic device, and storage medium
CN109033318B Intelligent question answering method and apparatus
McClendon et al. The use of paraphrase identification in the retrieval of appropriate responses for script based conversational agents
CN111241276A Question search method, apparatus, device, and storage medium
CN115774782A Multilingual text classification method, apparatus, device, and medium
CN114463822A Neural network training method for image processing, and face recognition method and apparatus
CN114491060A Method for updating a dynamic associative knowledge network, and semantic error correction method
CN113591004A Game tag generation method, apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962595

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20962595

Country of ref document: EP

Kind code of ref document: A1