WO2021107231A1 - Method and device for encoding sentences using hierarchical word information - Google Patents

Method and device for encoding sentences using hierarchical word information

Info

Publication number
WO2021107231A1
WO2021107231A1 (PCT/KR2019/016770)
Authority
WO
WIPO (PCT)
Prior art keywords
word
context
attention
sentence
vector
Prior art date
Application number
PCT/KR2019/016770
Other languages
English (en)
Korean (ko)
Inventor
맹성현
김경민
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술원
Publication of WO2021107231A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • the present disclosure relates to a method and apparatus for encoding a sentence by using hierarchical word information.
  • The reading comprehension system is a type of question answering system that outputs an appropriate answer from the body of a document when a natural language query is input. Recently, the performance of reading comprehension systems has improved substantially with the development of deep learning technology. These systems find the correct answer by comparing the natural language query sentence with the entire text of the document.
  • One embodiment provides a method of encoding a sentence using hierarchical word information.
  • Another embodiment provides an encoder for encoding a sentence using hierarchical word information.
  • Another embodiment provides a question answering apparatus including an encoder using hierarchical word information.
  • A method for encoding a sentence includes generating a word embedding vector of a word included in a sentence, generating at least two context attention vectors based on the word embedding vector and hierarchical word information about the word collected from an external domain, and combining the at least two context attention vectors to generate a sentence embedding vector.
  • The at least two context attention vectors may include a place attention vector for the place context and a time attention vector for the temporal context.
  • In the sentence encoding method, generating the sentence embedding vector by combining the at least two context attention vectors may include combining the place attention vector and the time attention vector by element-wise summation.
  • The hierarchical word information may hierarchically include higher-level concepts of the context word.
  • the hierarchical word information may include the name of a region in which the place is located or the name of a country in which the place is located.
  • The hierarchical word information may include a decade-level year expression or a century-level year expression that includes a specific year.
  • An encoder for encoding a sentence includes a word embedder that generates a word embedding vector of a word included in a sentence, a database that stores hierarchical word information about the word collected from an external domain, an attention determiner that generates at least two context attention vectors based on the hierarchical word information and the word embedding vector, and an attention combiner that generates a sentence embedding vector by combining the at least two context attention vectors.
  • The at least two context attention vectors may include a place attention vector for the place context and a time attention vector for the temporal context.
  • The attention combiner may combine the place attention vector and the time attention vector by element-wise summation.
  • the hierarchical word information may include hierarchical concepts of the context word.
  • the hierarchical word information may include the name of a region in which the place is located or the name of a country in which the place is located.
  • The hierarchical word information may include a decade-level year expression or a century-level year expression that includes a specific year.
  • Another embodiment provides a question answering apparatus for responding to an open-domain query.
  • The question answering apparatus includes an encoder that generates a sentence embedding vector of a query based on hierarchical word information collected from an external domain and generates a context embedding vector of a document, an embedding comparison unit that determines the similarity between the query and the document by comparing the sentence embedding vector with the context embedding vector, and a correct answer candidate determiner that determines a correct answer candidate for the query in the document based on the similarity.
  • Yet another embodiment provides a question answering apparatus for responding to an open-domain query.
  • The question answering apparatus includes a processor and a memory. By executing a program stored in the memory, the processor generates a sentence embedding vector of a query based on hierarchical word information collected from an external domain, generates context embedding vectors for a plurality of sentences in a document, determines the similarity between the query and the plurality of sentences by comparing the sentence embedding vector with each context embedding vector, and determines a correct answer candidate for the query from among the plurality of sentences based on the similarity.
  • Learning of the language model may be performed so that sentence embedding vectors of sentences having similar contextual meanings are output similarly.
  • FIG. 1 is a block diagram illustrating a sentence encoder according to an embodiment.
  • FIG. 2 is a flowchart illustrating a sentence embedding method according to an embodiment.
  • FIG. 3 is a conceptual diagram illustrating a sentence embedding method according to an embodiment.
  • FIG. 4 is a diagram illustrating hierarchical word information related to a place context according to an exemplary embodiment.
  • FIG. 5 is a diagram illustrating hierarchical word information related to a temporal context according to an embodiment.
  • FIG. 6 is a block diagram illustrating a question answering apparatus according to an embodiment.
  • FIG. 7 is a block diagram illustrating an encoder according to another embodiment.
  • 'Context information' may refer to a word including at least one of a place, a time, or a topic of a query or document.
  • For example, the context information may be 'Ohio' indicating a place, '2011' indicating a time, or 'history' indicating the topic of a question.
  • the 'embedding vector' may refer to a k-dimensional vector of a fixed length that implicitly represents the meaning of context information.
  • the 'embedding vector' may be expressed as a 'context embedding vector' or a 'context space'.
  • FIG. 1 is a block diagram illustrating a sentence encoder according to an embodiment.
  • the sentence encoder 100 may generate a sentence embedding vector by encoding a query sentence or a sentence in a document.
  • The sentence encoder 100 includes a word embedder 110, an attention determiner 120, and an attention combiner 130.
  • The word embedder 110 encodes a word (e.g., a context word) in a sentence to generate a word embedding vector.
  • the method for determining the context word in the sentence is not limited herein.
  • A word embedding vector of the i-th word among a plurality of words included in one sentence may be expressed as y_i.
  • The word embedder 110 may be based on an artificial intelligence language model such as word2vec, GloVe, or BERT.
  • The word embedder 110 may generate word embedding vectors by encoding words in a query sentence, and may also generate word embedding vectors by encoding words in a document related to the query or in a document containing the correct answer to the query.
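  • As a rough illustration of the word embedder 110, the sketch below treats word embedding as a lookup from tokens to vectors. A toy random table stands in for a real pretrained model such as word2vec, GloVe, or BERT; the vocabulary, the 4-dimensional embeddings, and the whitespace tokenizer are assumptions made for illustration only.

```python
import numpy as np

# Toy stand-in for a pretrained embedding model (word2vec, GloVe, BERT, ...).
# A real system would load trained vectors instead of sampling random ones.
rng = np.random.default_rng(0)
vocab = {"ohio": 0, "governor": 1, "2011": 2}
embedding_table = rng.normal(size=(len(vocab), 4))  # |V| x k embedding matrix

def embed_words(sentence: str) -> np.ndarray:
    """Return the word embedding vectors y_1..y_m of the known context words."""
    tokens = [t for t in sentence.lower().split() if t in vocab]
    return np.stack([embedding_table[vocab[t]] for t in tokens])

Y = embed_words("Ohio governor 2011")  # shape (m, k); rows are y_1..y_m
```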
  • The attention determiner 120 determines an attention matrix, an attention weight matrix, and a hierarchy-based attention matrix, which are used to output a sentence embedding from the word embedding vectors.
  • Specifically, the attention determiner 120 may determine the attention weight matrix and the hierarchy-based attention matrix based on the attention matrix.
  • The attention determiner 120 may search the database 200 for hierarchical word information on a context word, and generate a context attention vector based on the word embedding vector of the context word and the retrieved hierarchical word information.
  • In this process, the attention weight matrix may also be used.
  • the hierarchical word information is hierarchical information about one context word.
  • When the context word is a place, the database 200 may pre-store, as hierarchical word information collected from an external domain, the name of the region in which the place is located, the name of the country in which the place is located, and the like.
  • When the context word is a specific year, a decade-level year expression including the year and a century-level year expression (i.e., the century) may be stored in advance as hierarchical word information.
  • The attention determiner 120 may query the database 200 for the parent word set of a context word and determine the hierarchy-based attention matrix using the parent word set.
  • The parent word set of a context word is a kind of hierarchical word information about that word.
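  • As a hedged illustration, the database 200 can be pictured as a mapping from a context word to its ordered parent word set (layer 1 up to layer p). The entries below mirror the 'Ohio' and '2011' examples used in this disclosure; the dictionary structure and lookup function are illustrative assumptions, not the patented implementation.

```python
# Illustrative stand-in for database 200: each context word maps to its
# parent word set, ordered from the nearest ancestor (layer 1) outward.
HIERARCHY_DB = {
    "ohio": ["united states", "americas"],  # place hierarchy (see FIG. 4)
    "2011": ["2010s", "21c"],               # time hierarchy (see FIG. 5)
}

def parent_word_set(context_word: str) -> list[str]:
    """Look up the hierarchical word information for one context word."""
    return HIERARCHY_DB.get(context_word.lower(), [])
```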
  • The context attention vectors may include a place attention vector for the place context and a time attention vector for the temporal context.
  • The place attention vector may be generated based on the word embedding vectors in which contextual attention regarding place is reflected, and the time attention vector may be generated based on the word embedding vectors in which contextual attention regarding time is reflected.
  • the context attention vector may relate to a context (name of a person, a thing, etc.) other than place and time.
  • The attention combiner 130 generates a sentence embedding by combining the place attention vector and the time attention vector generated from the word embedding vectors.
  • For example, the attention combiner 130 may combine the two attention vectors through element-wise operations (sum, multiplication, etc.), an average operation, a weighted sum, and the like.
  • FIG. 2 is a flowchart illustrating a sentence embedding method according to an embodiment
  • FIG. 3 is a conceptual diagram illustrating a sentence embedding method according to an embodiment.
  • The word embedder 110 of the sentence encoder 100 encodes each word in the sentence to generate a word embedding vector (S110).
  • For example, the m context words in a sentence may include 'Ohio', 'governor', and '2011', and the word embedding vectors generated by encoding each context word are y_1 to y_m.
  • The attention determiner 120 generates an attention vector for each context attention by using the attention weight matrix corresponding to each word embedding vector and the hierarchy-based attention matrix (S120).
  • For example, the attention determiner 120 may determine a place attention vector (Z_LOC) and a time attention vector (Z_TIME) as the attention vectors for contextual attention.
  • The attention determiner 120 may determine the attention matrix based on one word embedding vector y_j, the weight W_s assigned to the word embedding vector, and the bias b_s. The attention matrix a_j for the embedding vector of the j-th word in one sentence may be determined, for example, as in Equation 1 below:
    $a_j = W_s y_j + b_s$  (Equation 1)
  • The weight W_s may be predetermined for each context so that the corresponding contextual attention is reflected in the word embedding vector.
  • The weight W_s corresponding to a word related to the place context may be predetermined to be larger than the weights of other words (not related to place). For example, when the context word is 'Ohio', the weight corresponding to the word embedding vector generated from 'Ohio' is greater than the weights of the word embedding vectors of other words that do not indicate a place.
  • Likewise, the weight W_s corresponding to a word related to the temporal context among the words included in a sentence may be predetermined to be larger than the weights of other words (not related to time). For example, when the context word is '2011', the weight corresponding to the word embedding vector generated from '2011' is greater than the weights of the word embedding vectors of other words that do not represent time.
  • The attention determiner 120 may determine the attention weight matrix based on the attention matrix. The attention weight matrix α_j for the j-th word embedding vector in a sentence may be determined, for example, as in Equation 2 below:
    $\alpha_j = \exp(a_j) / \sum_{k=1}^{m} \exp(a_k)$  (Equation 2)
  • In Equation 2, m is the number of context words included in one sentence.
  • Based on the attention matrix, the attention determiner 120 may determine the hierarchy-based attention matrix. When a context word in one sentence is located in the i-th layer, the hierarchy-based attention matrix h_i may be determined, for example, as in Equation 3 below:
    $h_i = \exp(a_i) / \sum_{k=1}^{p} \exp(a_k)$  (Equation 3)
  • In Equation 3, i is a natural number between 1 and p, and p is the number of layers of the hierarchical word information for the context word.
  • The context attention vector z generated by the attention determiner 120 based on the attention weight matrix and the hierarchy-based attention matrix may be expressed, for example, as in Equation 4 below:
    $z = \sum_{j=1}^{m} \alpha_j h_j y_j$  (Equation 4)
  • That is, one context attention vector may be determined based on the word embedding vectors y_j, the attention weight matrix α_j, and the hierarchy-based attention matrix h_j. Referring to FIG. 3, the attention determiner 120 may determine a place attention vector Z_LOC and a time attention vector Z_TIME in this manner.
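  • The computation in Equations 1 to 4 can be sketched as follows. Since the original equation images are not reproduced in this text, the linear score, the two softmax normalizations, and the final weighted sum are plausible reconstructions from the surrounding description, and the way layer scores are obtained from the parent word set is an assumption.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def context_attention_vector(Y, layer_scores, W_s, b_s):
    """Sketch of Equations 1-4 for one context (e.g., LOC or TIME).

    Y            : (m, k) word embedding vectors y_1..y_m
    layer_scores : length-m list; entry j holds the scores of word j's
                   p hierarchy layers, derived from its parent word set
    W_s          : (k,) context-specific weight vector
    b_s          : scalar bias
    """
    a = Y @ W_s + b_s        # Eq. 1: attention score a_j per word
    alpha = softmax(a)       # Eq. 2: attention weights over the m words
    # Eq. 3: per word, normalize its p layer scores and keep the weight of
    # the layer in which the context word itself sits (index 0 here).
    h = np.array([softmax(np.asarray(s))[0] if len(s) else 1.0
                  for s in layer_scores])
    return (alpha * h) @ Y   # Eq. 4: z = sum_j alpha_j * h_j * y_j

# Example: three words ('Ohio', 'governor', '2011'); 'governor' has no hierarchy.
rng = np.random.default_rng(1)
Y = rng.normal(size=(3, 4))
z_loc = context_attention_vector(Y, [[0.9, 0.4], [], [0.8, 0.3]],
                                 rng.normal(size=4), 0.0)
```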
  • Hierarchical word information about the place context and hierarchical word information about the time context, which are collected from an external domain and stored in advance in the database 200, will be described with reference to FIGS. 4 and 5.
  • FIG. 4 is a diagram illustrating hierarchical word information related to a place context according to an exemplary embodiment
  • FIG. 5 is a diagram illustrating hierarchical word information related to a temporal context according to an exemplary embodiment.
  • hierarchical word information about context words included in a sentence may include hierarchical concepts of context words and may be collected from an external domain.
  • 'Ohio', which is a place context word included in a sentence, has 'United States' and 'Americas' as parent layers in the hierarchical word information stored in the database 200.
  • Similarly, the temporal context word '2011' included in the sentence has '2010s' and '21C' as parent layers in the hierarchical word information stored in the database 200.
  • the hierarchical word information about a place shown in FIG. 4 and the hierarchical word information about a time shown in FIG. 5 make it possible to more accurately express the context of a sentence including each context word. That is, the attention determiner 120 according to an embodiment may more accurately express a spatial or temporal inclusion relationship between contextual words in a sentence using hierarchical word information.
  • The attention combiner 130 may generate a sentence embedding vector of the sentence by combining at least two context attention vectors (S130).
  • For example, the attention combiner 130 may generate the sentence embedding vector by combining the place attention vector (Z_LOC) and the time attention vector (Z_TIME).
  • The attention combiner 130 may combine the place attention vector (Z_LOC) and the time attention vector (Z_TIME) through element-wise vector operations (sum, multiplication, etc.), an average operation, a weighted sum, and the like.
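  • A minimal sketch of this combination step, assuming equal-length attention vectors; the mode names and the 0.5/0.5 weights in the weighted sum are illustrative assumptions:

```python
import numpy as np

def combine_attention(z_loc: np.ndarray, z_time: np.ndarray,
                      mode: str = "sum") -> np.ndarray:
    """Combine place and time attention vectors into a sentence embedding."""
    if mode == "sum":       # element-wise summation
        return z_loc + z_time
    if mode == "mul":       # element-wise multiplication
        return z_loc * z_time
    if mode == "avg":       # average operation
        return (z_loc + z_time) / 2.0
    if mode == "weighted":  # weighted sum (illustrative 0.5/0.5 weights)
        return 0.5 * z_loc + 0.5 * z_time
    raise ValueError(f"unknown mode: {mode}")
```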
  • The sentence encoder 100 can be trained, by utilizing the hierarchical word information collected from an external domain and stored in the database 200, so that sentence embedding vectors of sentences having similar contextual meanings are output similarly. That is, when the sentence encoder 100 learns the embedding of a context word in a sentence, it can also learn the attention matrix from the hierarchical word information collected from the external domain, and thus the finally generated sentence embedding vector of the sentence can be learned more accurately.
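  • One way to realize such training is a similarity objective that pulls the embeddings of contextually similar sentence pairs together and pushes dissimilar pairs apart. The cosine-based contrastive loss below is an illustrative assumption; the disclosure does not fix a particular loss function.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def similarity_loss(z_a, z_b, similar: bool, margin: float = 0.5) -> float:
    """Contrastive-style loss on two sentence embeddings (illustrative form).

    Similar pairs are pushed toward cosine similarity 1; dissimilar pairs
    are penalized only above the margin. The form and margin are assumptions.
    """
    s = cosine(z_a, z_b)
    return 1.0 - s if similar else max(0.0, s - margin)
```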
  • FIG. 6 is a block diagram illustrating a question answering apparatus according to an embodiment.
  • The question answering apparatus 10 includes an encoder 100, a database 200, an embedding comparison unit 300, and a correct answer candidate determiner 400.
  • the encoder 100 may generate an embedding vector of a query sentence or document using hierarchical word information previously stored in the database 200 .
  • When the query is an open-domain query, the question answering apparatus 10 may determine a predetermined number of correct answer candidates for the open-domain query.
  • The embedding comparison unit 300 may determine the similarity between the query and a document, or between the query and a plurality of sentences in the document, by comparing the sentence embedding vector of the query sentence with the context embedding vector of the document or of each sentence. In either case, the similarity may be determined as the vector similarity between the sentence embedding vector of the query sentence and the corresponding context embedding vector.
  • The correct answer candidate determiner 400 may determine a correct answer candidate for the query based on this similarity: the correct answer candidate may be determined within the document or sentence having the highest vector similarity.
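  • A hedged sketch of the comparison and candidate selection: cosine similarity between the query's sentence embedding vector and each sentence's context embedding vector, followed by a top-k selection of correct answer candidates. Cosine similarity and the argsort-based top-k are implementation choices, not mandated by the disclosure.

```python
import numpy as np

def rank_candidates(query_vec: np.ndarray, sentence_vecs: np.ndarray,
                    k: int = 3) -> np.ndarray:
    """Return the indices of the k sentences most similar to the query.

    query_vec     : (d,)   sentence embedding vector of the query
    sentence_vecs : (n, d) context embedding vectors of the document sentences
    """
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    S = sentence_vecs / (np.linalg.norm(sentence_vecs, axis=1,
                                        keepdims=True) + 1e-12)
    sims = S @ q                  # cosine similarity per sentence
    return np.argsort(-sims)[:k]  # top-k correct answer candidates
```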
  • FIG. 7 is a block diagram illustrating a sentence encoder according to another embodiment.
  • A sentence encoder may be implemented in a computer system, for example, by means of a computer-readable medium.
  • The computer system 700 includes a processor 710, a memory 730, an input interface device 750, an output interface device 760, and a storage device 740, which communicate via a bus 770.
  • Computer system 700 may also include a communication device 720 coupled to a network.
  • the processor 710 may be a central processing unit (CPU) or a semiconductor device that executes instructions stored in the memory 730 or the storage device 740 .
  • the memory 730 and the storage device 740 may include various types of volatile or nonvolatile storage media.
  • the memory may include read only memory (ROM) and random access memory (RAM).
  • the memory may be located inside or outside the processor, and the memory may be connected to the processor through various known means.
  • an embodiment of the present invention may be implemented as a computer-implemented method, or as a non-transitory computer-readable medium having computer-executable instructions stored thereon.
  • the computer readable instructions when executed by a processor, may perform a method according to at least one aspect of the present disclosure.
  • the communication device 720 may transmit or receive a wired signal or a wireless signal.
  • The embodiments of the present invention are not implemented only through the apparatus and/or method described above; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded. Such an implementation can easily be achieved by those skilled in the art from the description of the embodiments above.
  • the computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination.
  • the program instructions recorded on the computer-readable medium may be specially designed and configured for the embodiment of the present invention, or may be known and used by those skilled in the computer software field.
  • the computer-readable recording medium may include a hardware device configured to store and execute program instructions.
  • For example, the computer-readable recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like.
  • the program instructions may include not only machine language codes such as those generated by a compiler, but also high-level language codes that can be executed by a computer through an interpreter or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and an encoder for encoding a sentence, comprising the steps of: generating a word embedding vector of a word included in a sentence; generating at least two context attention vectors on the basis of the word embedding vector and hierarchical word information about the word, collected from an external domain; and generating a sentence embedding vector by combining the at least two context attention vectors.
PCT/KR2019/016770 2019-11-28 2019-11-29 Method and device for encoding sentences using hierarchical word information WO2021107231A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2019-0155815 2019-11-28
KR1020190155815A KR102483927B1 (ko) 2019-11-28 2019-11-28 Method and apparatus for sentence encoding using hierarchical word information

Publications (1)

Publication Number Publication Date
WO2021107231A1 (fr) 2021-06-03

Family

ID=76130268

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/016770 WO2021107231A1 (fr) 2019-11-28 2019-11-29 Method and device for encoding sentences using hierarchical word information

Country Status (2)

Country Link
KR (1) KR102483927B1 (fr)
WO (1) WO2021107231A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102642488B1 (ko) * 2023-09-04 2024-03-04 주식회사 그래디언트바이오컨버전스 Data providing apparatus, method, and computer program for generating answers to questions using artificial intelligence technology

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180047409A * 2016-10-31 2018-05-10 삼성전자주식회사 Apparatus and method for generating sentences
KR20180113438A * 2017-04-06 2018-10-16 네이버 주식회사 Automatic extraction and structuring of subtopics for topic-based queries
KR20190038243A * 2017-09-28 2019-04-08 한국과학기술원 System and method for retrieving documents using context
KR20190076452A * 2017-12-22 2019-07-02 삼성전자주식회사 Method and apparatus for natural language generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONGKEON LEE, KYOJOONG OH, HO-JIN CHOI, AND JEONG HEO: "Measuring Sentence Similarity Using Morpheme Embedding Model and GRU Encoder for Question and Answering System", CS-CONFERENCE PAPERS, 2018, pages 1 - 6, XP055832617, Retrieved from the Internet <URL:https://koasas.kaist.ac.kr/handle/10203/219384> [retrieved on 20200814] *

Also Published As

Publication number Publication date
KR20210066505A (ko) 2021-06-07
KR102483927B1 (ko) 2023-01-04

Similar Documents

Publication Publication Date Title
CN112131366B Method and apparatus for training a text classification model and for text classification, and storage medium
CN111737476B Text processing method and apparatus, computer-readable storage medium, and electronic device
CN111382868B Neural network architecture search method and neural network architecture search apparatus
WO2020122456A1 System and method for matching similarities between images and texts
WO2021096009A1 Method and device for knowledge enrichment based on a relation network
CN112131883B Language model training method and apparatus, computer device, and storage medium
CN113486665B Privacy-preserving text named entity recognition method, apparatus, device, and storage medium
WO2020111314A1 Concept-graph-based question answering apparatus and method
WO2019098418A1 Neural network training method and device
WO2022163996A1 Device for predicting drug-target interaction by using self-attention-based deep neural network model, and method therefor
KR101963404B1 Two-step optimized deep learning method, computer-readable recording medium storing a program for executing the same, and deep learning system
WO2021095987A1 Method and apparatus for multi-type entity-based knowledge completion
CN115017911A Cross-modal processing for vision and language
CN113806582B Image retrieval method and apparatus, electronic device, and storage medium
CN113779225B Entity linking model training method, entity linking method, and apparatus
WO2022050724A1 Device, method, and system for determining answers to queries
KR101985900B1 Method and computer program for inferring meta information of a text content creator
CN113128431A Video clip retrieval method, apparatus, medium, and electronic device
WO2021107231A1 Method and device for encoding sentences using hierarchical word information
WO2019107625A1 Machine translation method and apparatus therefor
WO2022108206A1 Method and apparatus for completing a describable knowledge graph
CN118133839A Image-text retrieval method and system based on semantic information reasoning and cross-modal interaction
CN113590578B Cross-lingual knowledge unit transfer method, apparatus, storage medium, and terminal
CN117034942B Named entity recognition method, apparatus, device, and readable storage medium
WO2021091096A1 Method and apparatus for visual question answering using a fairness classification network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953910

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19953910

Country of ref document: EP

Kind code of ref document: A1