WO2020204364A3 - Word embedding method and device considering contextual and morphological information of words - Google Patents

Info

Publication number
WO2020204364A3
Authority
WO
WIPO (PCT)
Prior art keywords
word
embedding
context
sentence
character model
Prior art date
Application number
PCT/KR2020/003000
Other languages
English (en)
French (fr)
Other versions
WO2020204364A2 (ko)
Inventor
원민섭
이지형
이상헌
신윤섭
정동언
Original Assignee
성균관대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 성균관대학교 산학협력단
Publication of WO2020204364A2
Publication of WO2020204364A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

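The present invention relates to a word embedding method and device that take into account the contextual information and morphological information of words. A word embedding method according to one embodiment of the present invention comprises the steps of: processing a sentence to be learned by replacing out-of-vocabulary (OOV) words in the sentence with an unknown token; inputting the characters of a target word, excluding the OOV words, from the processed sentence into a context character model that is the object of training; setting the initial state of the context character model to a combination of the surrounding context vectors of the words around the target word in the sentence; and training the context character model so as to minimize the error between the real embedding of the target word and its predicted embedding, which is generated by concatenating the forward hidden state and the backward hidden state produced by the context character model.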
PCT/KR2020/003000 2019-04-02 2020-03-03 Word embedding method and device considering contextual and morphological information of words WO2020204364A2 (ko)
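
The steps listed in the abstract can be illustrated with a small forward-pass sketch. This is not the patented implementation: the plain tanh recurrent cell, the averaging used to combine neighbouring vectors, the split of the context vector into two halves for the forward and backward initial states, the squared-error loss, and every name in the code (`replace_oov`, `context_vector`, `predict_embedding`, `<unk>`, the toy vocabulary) are assumptions made only for illustration. The gradient update that would actually train the model is left out.

```python
import math
import random

UNK = "<unk>"  # assumed spelling of the unknown token


def replace_oov(tokens, vocab):
    """Replace every out-of-vocabulary token with the unknown token."""
    return [t if t in vocab else UNK for t in tokens]


def context_vector(tokens, idx, emb, window=2):
    """Combine neighbouring word vectors by averaging (an assumed rule)."""
    lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
    near = [emb[tokens[i]] for i in range(lo, hi) if i != idx]
    dim = len(emb[UNK])
    if not near:
        return [0.0] * dim
    return [sum(v[k] for v in near) / len(near) for k in range(dim)]


def rnn_last_state(chars, h0, char_vecs, w_in, w_rec):
    """Run a plain tanh RNN over the characters; return the final hidden state."""
    h = list(h0)
    for c in chars:
        x = char_vecs[c]
        h = [math.tanh(sum(a * b for a, b in zip(w_in[j], x))
                       + sum(a * b for a, b in zip(w_rec[j], h)))
             for j in range(len(h))]
    return h


def predict_embedding(word, ctx, p):
    """Concatenate the forward and backward final states of the character RNN."""
    half = len(ctx) // 2  # assumption: each direction starts from half of ctx
    fwd = rnn_last_state(word, ctx[:half], p["chars"], p["f_in"], p["f_rec"])
    bwd = rnn_last_state(word[::-1], ctx[half:], p["chars"], p["b_in"], p["b_rec"])
    return fwd + bwd


def squared_error(pred, real):
    """Error between the predicted and the real embedding."""
    return sum((a - b) ** 2 for a, b in zip(pred, real))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy data: random "real" embeddings and random, untrained model parameters.
rng = random.Random(0)
dim, char_dim = 4, 3
vocab = {"the", "cat", "sat", "on", "mat"}
emb = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in sorted(vocab | {UNK})}
params = {
    "chars": {c: [rng.uniform(-1, 1) for _ in range(char_dim)]
              for c in "abcdefghijklmnopqrstuvwxyz"},
    "f_in": rand_matrix(dim // 2, char_dim, rng),
    "f_rec": rand_matrix(dim // 2, dim // 2, rng),
    "b_in": rand_matrix(dim // 2, char_dim, rng),
    "b_rec": rand_matrix(dim // 2, dim // 2, rng),
}

sentence = replace_oov("the cat sat on the zorp mat".split(), vocab)
losses = []
for i, word in enumerate(sentence):
    if word == UNK:
        continue  # unknown tokens are never used as training targets
    pred = predict_embedding(word, context_vector(sentence, i, emb), params)
    losses.append((word, squared_error(pred, emb[word])))
print(sentence)
print(losses)
```

In this sketch the unknown token still contributes a vector to the context of its neighbours, but it is skipped as a target, which mirrors the abstract's wording that target words exclude the OOV words. A trained model of this shape could then be given the characters and context of an unseen word to produce an embedding for it.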

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2019-0038587 2019-04-02
KR1020190038587A KR102227939B1 (ko) 2019-04-02 2019-04-02 Word embedding method and device considering contextual and morphological information of words

Publications (2)

Publication Number Publication Date
WO2020204364A2 (ko) 2020-10-08
WO2020204364A3 (ko) 2020-11-19

Family

ID=72667172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/003000 WO2020204364A2 (ko) 2019-04-02 2020-03-03 Word embedding method and device considering contextual and morphological information of words

Country Status (2)

Country Link
KR (1) KR102227939B1 (ko)
WO (1) WO2020204364A2 (ko)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
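KR102586569B1 (ko) 2020-11-12 2023-10-10 주식회사 엔씨소프트 Apparatus and method for item embedding
KR102614912B1 (ko) * 2021-02-10 2023-12-19 주식회사 페어랩스 Deep-learning-based apparatus and method for evaluating the latent value of patents
CN113190602B (zh) * 2021-04-09 2022-03-25 桂林电子科技大学 Joint event extraction method fusing character and word features with deep learning
CN113254637B (zh) * 2021-05-07 2023-04-07 山东师范大学 Aspect-level text sentiment classification method and system incorporating grammar
KR102574512B1 (ko) * 2021-08-19 2023-09-05 성균관대학교산학협력단 Apparatus and method for metaphor detection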

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
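JP2004070636A (ja) * 2002-08-06 2004-03-04 Mitsubishi Electric Corp Concept search device
JP2019021206A (ja) * 2017-07-20 2019-02-07 ヤフー株式会社 Learning device, program parameter, learning method, and model
KR20190019661A (ko) * 2017-08-18 2019-02-27 동아대학교 산학협력단 Natural language understanding method using the distribution of correct-answer labels per language analyzer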

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BAZZI, ISSAM AND GLASS, JAMES R.: "Modelling Out-of-Vocabulary Words for Robust Speech Recognition", MASSACHUSETTS INSTITUTE OF TECHNOLOGY. DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, June 2002 (2002-06-01), pages 1 - 153, XP055753396 *
FRANZISKA HORN: "Context encoders as a simple but powerful extension of word2vec", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 8 June 2017 (2017-06-08), XP080768410, DOI: 10.18653/v1/W17-2602 *
SEO, MINJOON ET AL.: "Bidirectional Attention Flow for Machine Comprehension", ARXIV:1611.01603V6, 21 August 2018 (2018-08-21), XP055543095, Retrieved from the Internet <URL:https://arxiv.org/pdf/1611.01603.pdf> *
WON MIN-SUB; LEE JEE-HYONG: "Embedding for Out of Vocabulary Words Considering Contextual and Morphosyntactic Information", 2018 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), IEEE, 14 November 2018 (2018-11-14), pages 212 - 215, XP033571606, DOI: 10.1109/iFUZZY.2018.8751687 *

Also Published As

Publication number Publication date
KR20200116760A (ko) 2020-10-13
WO2020204364A2 (ko) 2020-10-08
KR102227939B1 (ko) 2021-03-15

Similar Documents

Publication Publication Date Title
WO2020204364A3 (ko) Word embedding method and device considering contextual and morphological information of words
CN109446534B (zh) Machine translation method and device
Xu et al. Contextual domain classification in spoken language understanding systems using recurrent neural network
US9818409B2 (en) Context-dependent modeling of phonemes
US11669712B2 (en) Robustness evaluation via natural typos
US10181098B2 (en) Generating representations of input sequences using neural networks
CN109313719B (zh) Generating dependency parses of text segments using neural networks
US20240054767A1 (en) Multi-modal Model Training Method, Apparatus and Device, and Storage Medium
JP5788953B2 (ja) Method and apparatus for correcting errors in speech recognition
US10866877B2 (en) Automated repair of bugs and security vulnerabilities in software
KR20210046840A (ko) Modality learning on mobile devices
US20210174162A1 (en) Spatial-Temporal Reasoning Through Pretrained Language Models for Video-Grounded Dialogues
Cho et al. Punctuation insertion for real-time spoken language translation
CN105074817A (zh) Systems and methods for switching processing modes using gestures
US20170133016A1 (en) Speech recognition candidate selection based on non-acoustic input
US11113335B2 (en) Dialogue system and computer program therefor
US9099091B2 (en) Method and apparatus of adaptive textual prediction of voice data
US20150242386A1 (en) Using language models to correct morphological errors in text
JP2024522328A (ja) Processing multimodal inputs using language models
US20200043493A1 (en) Translation device
KR20200132619A (ko) Attention-based neural machine translation method and apparatus from spoken language to sign language
IL185752A (en) Adjustable system and method for recognizing distorted text in computer images
Guo The Re-Label Method For Data-Centric Machine Learning
Srinivasan et al. Analyzing utility of visual context in multimodal speech recognition under noisy conditions
US20130231933A1 (en) Addressee Identification of Speech in Small Groups of Children and Adults

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20784581
Country of ref document: EP
Kind code of ref document: A2

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20784581
Country of ref document: EP
Kind code of ref document: A2