WO2020204364A3 - Word embedding method and apparatus considering contextual and morphological information of words - Google Patents
- Publication number
- WO2020204364A3 (application PCT/KR2020/003000)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- embedding
- context
- sentence
- character model
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
Abstract
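The present invention relates to a word embedding method and apparatus that consider the contextual information and morphological information of words. A word embedding method according to an embodiment of the present invention comprises: processing a sentence to be learned by replacing each out-of-vocabulary (OOV) word in the sentence with an unknown token; inputting the characters of a target word, other than the OOV words, in the processed sentence into a context character model that is to be trained; combining the surrounding context vectors of the words neighboring the target word in the sentence and setting the result as the initial state of the context character model; and training the context character model so as to minimize the error between a predicted embedding of the target word, generated by concatenating a forward hidden state and a backward hidden state computed by the context character model, and the real embedding of the target word.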
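The steps in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the abstract does not say how the context vectors are combined (averaging is assumed here), which recurrent cell is used (a plain tanh RNN is assumed), how the combined context is fitted to the hidden size (a projection matrix `w_c` is assumed), or which error measure is used (mean squared error is assumed). All function names, the `<unk>` spelling, and the toy dimensions are hypothetical.

```python
import math
import random

UNK = "<unk>"  # hypothetical spelling of the unknown token

def replace_oov(tokens, vocab):
    """Step 1: replace out-of-vocabulary words with the unknown token."""
    return [t if t in vocab else UNK for t in tokens]

def combine_context(tokens, idx, word_emb, window=2):
    """Combine the embeddings of the words around position idx.
    Averaging is an assumption; the abstract only says they are combined."""
    lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
    vecs = [word_emb[tokens[j]] for j in range(lo, hi) if j != idx]
    dim = len(next(iter(word_emb.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def run_rnn(char_vecs, h0, w_x, w_h):
    """Plain tanh RNN (assumed cell type); returns the final hidden state."""
    h = h0
    for x in char_vecs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
    return h

def predict_embedding(word, context, p):
    """Characters in, context as initial state, forward and backward states concatenated."""
    chars = [p["char_emb"][c] for c in word]
    h0 = [math.tanh(v) for v in matvec(p["w_c"], context)]  # assumed projection
    fwd = run_rnn(chars, h0, p["fw_x"], p["fw_h"])         # forward hidden state
    bwd = run_rnn(chars[::-1], h0, p["bw_x"], p["bw_h"])   # backward hidden state
    return fwd + bwd  # list concatenation gives the predicted embedding

def mse(pred, real):
    """Mean squared error between predicted and real embedding (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(pred, real)) / len(real)

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(alphabet, char_dim, hidden, word_dim, rng):
    return {
        "char_emb": {c: [rng.uniform(-0.5, 0.5) for _ in range(char_dim)] for c in alphabet},
        "w_c": rand_matrix(hidden, word_dim, rng),
        "fw_x": rand_matrix(hidden, char_dim, rng),
        "fw_h": rand_matrix(hidden, hidden, rng),
        "bw_x": rand_matrix(hidden, char_dim, rng),
        "bw_h": rand_matrix(hidden, hidden, rng),
    }

rng = random.Random(0)
vocab = {"the", "cat", "sat", "on", "mat"}
word_dim, hidden = 4, 2  # 2 * hidden == word_dim so the concatenation matches
# Toy "real" embeddings; in practice these would come from a pretrained model.
word_emb = {w: [rng.uniform(-1, 1) for _ in range(word_dim)] for w in sorted(vocab | {UNK})}
params = init_params("abcdefghijklmnopqrstuvwxyz", 3, hidden, word_dim, rng)

sentence = replace_oov("the cat sat on the zyx mat".split(), vocab)
losses = []
for i, target in enumerate(sentence):
    if target == UNK:  # OOV words are excluded from the target words
        continue
    context = combine_context(sentence, i, word_emb)
    predicted = predict_embedding(target, context, params)
    losses.append(mse(predicted, word_emb[target]))
# A real trainer would backpropagate the mean of `losses` into `params`.
```

The sketch stops at the loss; the gradient update is left out to keep it short. Because the model reads characters and neighboring words rather than a vocabulary index, the same forward pass could in principle be run on a word that has no real embedding, which is presumably how the trained model would be applied to OOV words.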
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2019-0038587 | 2019-04-02 | | |
KR1020190038587A / KR102227939B1 (ko) | 2019-04-02 | 2019-04-02 | Word embedding method and apparatus considering contextual and morphological information of words |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2020204364A2 (ko) | 2020-10-08 |
WO2020204364A3 (ko) | 2020-11-19 |
Family
ID=72667172
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/KR2020/003000 / WO2020204364A2 (ko) | 2019-04-02 | 2020-03-03 | Word embedding method and apparatus considering contextual and morphological information of words |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102227939B1 (ko) |
WO (1) | WO2020204364A2 (ko) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
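KR102586569B1 (ko) | 2020-11-12 | 2023-10-10 | 주식회사 엔씨소프트 | Item embedding apparatus and method |
KR102614912B1 (ko) * | 2021-02-10 | 2023-12-19 | 주식회사 페어랩스 | Deep-learning-based apparatus and method for evaluating the potential value of patents |
CN113190602B (zh) * | 2021-04-09 | 2022-03-25 | 桂林电子科技大学 | Joint event extraction method fusing character and word features with deep learning |
CN113254637B (zh) * | 2021-05-07 | 2023-04-07 | 山东师范大学 | Aspect-level text sentiment classification method and system incorporating grammar |
KR102574512B1 (ko) * | 2021-08-19 | 2023-09-05 | 성균관대학교산학협력단 | Metaphor detection apparatus and method |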
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004070636A (ja) * | 2002-08-06 | 2004-03-04 | Mitsubishi Electric Corp | Concept search device |
JP2019021206A (ja) * | 2017-07-20 | 2019-02-07 | ヤフー株式会社 | Learning device, program parameters, learning method, and model |
KR20190019661A (ko) * | 2017-08-18 | 2019-02-27 | 동아대학교 산학협력단 | Natural language understanding method using the answer label distribution of each language analyzer |
- 2019-04-02: KR application KR1020190038587A, patent KR102227939B1 (ko), active, IP Right Grant
- 2020-03-03: WO application PCT/KR2020/003000, publication WO2020204364A2 (ko), active, Application Filing
Non-Patent Citations (4)
Title |
---|
BAZZI, ISSAM AND GLASS, JAMES R.: "Modelling Out-of-Vocabulary Words for Robust Speech Recognition", MASSACHUSETTS INSTITUTE OF TECHNOLOGY. DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, June 2002 (2002-06-01), pages 1 - 153, XP055753396 * |
FRANZISKA HORN: "Context encoders as a simple but powerful extension of word2vec", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 8 June 2017 (2017-06-08), XP080768410, DOI: 10.18653/v1/W17-2602 * |
SEO, MINJOON ET AL.: "Bidirectional Attention Flow for Machine Comprehension", ARXIV:1611.01603V6, 21 August 2018 (2018-08-21), XP055543095, Retrieved from the Internet <URL:https://arxiv.org/pdf/1611.01603.pdf> * |
WON MIN-SUB; LEE JEE-HYONG: "Embedding for Out of Vocabulary Words Considering Contextual and Morphosyntactic Information", 2018 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), IEEE, 14 November 2018 (2018-11-14), pages 212 - 215, XP033571606, DOI: 10.1109/iFUZZY.2018.8751687 * |
Also Published As
Publication number | Publication date |
---|---|
KR20200116760A (ko) | 2020-10-13 |
WO2020204364A2 (ko) | 2020-10-08 |
KR102227939B1 (ko) | 2021-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020204364A3 (ko) | Word embedding method and apparatus considering contextual and morphological information of words | |
CN109446534B (zh) | Machine translation method and apparatus | |
Xu et al. | Contextual domain classification in spoken language understanding systems using recurrent neural network | |
US9818409B2 (en) | Context-dependent modeling of phonemes | |
US11669712B2 (en) | Robustness evaluation via natural typos | |
US10181098B2 (en) | Generating representations of input sequences using neural networks | |
CN109313719B (zh) | Generating dependency parses of text segments using neural networks | |
US20240054767A1 (en) | Multi-modal Model Training Method, Apparatus and Device, and Storage Medium | |
JP5788953B2 (ja) | Method and apparatus for correcting speech recognition errors | |
US10866877B2 (en) | Automated repair of bugs and security vulnerabilities in software | |
KR20210046840A (ko) | Modality learning on mobile devices | |
US20210174162A1 (en) | Spatial-Temporal Reasoning Through Pretrained Language Models for Video-Grounded Dialogues | |
Cho et al. | Punctuation insertion for real-time spoken language translation | |
CN105074817A (zh) | Systems and methods for switching processing modes using gestures | |
US20170133016A1 (en) | Speech recognition candidate selection based on non-acoustic input | |
US11113335B2 (en) | Dialogue system and computer program therefor | |
US9099091B2 (en) | Method and apparatus of adaptive textual prediction of voice data | |
US20150242386A1 (en) | Using language models to correct morphological errors in text | |
JP2024522328A (ja) | Processing multimodal inputs using language models | |
US20200043493A1 (en) | Translation device | |
KR20200132619A (ko) | Attention-based neural machine translation method and apparatus for translating spoken language into sign language | |
IL185752A (en) | Adjustable system and method for recognizing distorted text in computer images | |
Guo | The Re-Label Method For Data-Centric Machine Learning | |
Srinivasan et al. | Analyzing utility of visual context in multimodal speech recognition under noisy conditions | |
US20130231933A1 (en) | Addressee Identification of Speech in Small Groups of Children and Adults |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20784581; Country of ref document: EP; Kind code of ref document: A2 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 20784581; Country of ref document: EP; Kind code of ref document: A2 |