WO2012030053A3 - Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus - Google Patents

Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus

Info

Publication number
WO2012030053A3
Authority
WO
WIPO (PCT)
Prior art keywords
idiomatic expression
recognizing
parallel corpus
expression
phrase alignment
Prior art date
Application number
PCT/KR2011/003832
Other languages
English (en)
French (fr)
Other versions
WO2012030053A2 (ko)
Inventor
김상범
윤창호
황영숙
임해창
이형규
Original Assignee
에스케이텔레콤 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 에스케이텔레콤 주식회사
Priority to US13/820,199 (US20140303955A1)
Publication of WO2012030053A2
Publication of WO2012030053A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/191 Automatic line break hyphenation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/49 Data-driven translation using very large corpora, e.g. the web

Abstract

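The present invention relates to an apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus. More specifically, idiom candidate expressions are extracted using the phrase alignment information of a parallel corpus, an idiomatic expression index is measured for each extracted candidate, and the candidates are recognized as idiomatic expressions on that basis. This resolves errors in measuring the translation entropy of a word and in extracting the representative translation of a word, and improves the accuracy of idiomatic expression recognition.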
PCT/KR2011/003832 2010-09-02 2011-05-25 Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus WO2012030053A2 (ko)
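
The pipeline named in the abstract (collect candidates from phrase alignments, then score each candidate) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the idiomatic expression index, so the score below (Shannon entropy of a phrase's own aligned translations) is a hypothetical stand-in, and the function names, the `min_count` threshold, and the toy data are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def translation_entropy(counts):
    """Shannon entropy (in bits) of a translation count distribution."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def score_candidates(aligned_pairs, min_count=2):
    """Group phrase-aligned pairs by source phrase and score each candidate.

    The score (entropy of the phrase's aligned translations) is a
    hypothetical stand-in for the idiomatic expression index, which the
    abstract names but does not define.
    """
    translations = defaultdict(Counter)
    for source_phrase, target_phrase in aligned_pairs:
        translations[source_phrase][target_phrase] += 1

    scores = {}
    for phrase, counts in translations.items():
        # Keep only multi-word phrases that were aligned often enough.
        if len(phrase.split()) < 2 or sum(counts.values()) < min_count:
            continue
        scores[phrase] = translation_entropy(counts)
    return scores


# Toy phrase-aligned pairs (invented data, English source, Korean target).
pairs = [
    ("kick the bucket", "죽다"),
    ("kick the bucket", "죽다"),
    ("the bucket", "양동이"),
    ("the bucket", "그 통"),
    ("kick", "차다"),
]
scores = score_candidates(pairs)
```

In this toy data the phrase that is always aligned to the same target gets the lowest entropy, while single words are skipped as candidates. A real index would presumably also compare the phrase's translations with those of its component words, which this sketch does not attempt.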

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/820,199 US20140303955A1 (en) 2010-09-02 2011-05-25 Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2010-0085959 2010-09-02
KR1020100085959A KR101745349B1 (ko) 2010-09-02 2010-09-02 Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus

Publications (2)

Publication Number Publication Date
WO2012030053A2 WO2012030053A2 (ko) 2012-03-08
WO2012030053A3 WO2012030053A3 (ko) 2012-04-19

Family

ID=45773336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2011/003832 WO2012030053A2 (ko) 2010-09-02 2011-05-25 Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus

Country Status (3)

Country Link
US (1) US20140303955A1 (en)
KR (1) KR101745349B1 (ko)
WO (1) WO2012030053A2 (ko)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9785704B2 (en) * 2012-01-04 2017-10-10 Microsoft Technology Licensing, Llc Extracting query dimensions from search results
KR102013230B1 (ko) 2012-10-31 2019-08-23 십일번가 주식회사 Apparatus and method for syntactic parsing based on syntactic preprocessing
US10347240B2 (en) * 2015-02-26 2019-07-09 Nantmobile, Llc Kernel-based verbal phrase splitting devices and methods
CN106202068B (zh) * 2016-07-25 2019-01-22 哈尔滨工业大学 Machine translation method using semantic vectors based on multilingual parallel corpora
US11288452B2 (en) * 2019-07-26 2022-03-29 Beijing Didi Infinity Technology And Development Co., Ltd. Dual monolingual cross-entropy-delta filtering of noisy parallel data and use thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
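KR19990047856A (ko) * 1997-12-05 1999-07-05 정선종 Multilingual idiom recognition system for a multilingual machine translation apparatus
KR20010027882A (ko) * 1999-09-16 2001-04-06 정선종 Apparatus and method for recognizing phrase-level idioms based on translation sentence frames
KR20030094632A (ko) * 2002-06-07 2003-12-18 인터내셔널 비지네스 머신즈 코포레이션 Method and apparatus for generating a transfer dictionary used in a transfer-based machine translation system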

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6161083A (en) * 1996-05-02 2000-12-12 Sony Corporation Example-based translation method and system which calculates word similarity degrees, a priori probability, and transformation probability to determine the best example for translation
JP2005527894A (ja) * 2002-03-28 2005-09-15 ユニバーシティ・オブ・サザン・カリフォルニア Statistical machine translation
US7249012B2 (en) * 2002-11-20 2007-07-24 Microsoft Corporation Statistical method and apparatus for learning translation relationships among phrases
US7765098B2 (en) * 2005-04-26 2010-07-27 Content Analyst Company, Llc Machine translation using vector space representations
US7536295B2 (en) * 2005-12-22 2009-05-19 Xerox Corporation Machine translation using non-contiguous fragments of text
US7657421B2 (en) * 2006-06-28 2010-02-02 International Business Machines Corporation System and method for identifying and defining idioms
US8594992B2 (en) * 2008-06-09 2013-11-26 National Research Council Of Canada Method and system for using alignment means in matching translation
US8244519B2 (en) * 2008-12-03 2012-08-14 Xerox Corporation Dynamic translation memory using statistical machine translation
KR101266361B1 (ko) * 2009-09-10 2013-05-22 한국전자통신연구원 Automatic translation system and automatic translation method based on structured translation memory
US8548796B2 (en) * 2010-01-20 2013-10-01 Xerox Corporation Statistical machine translation system and method for translation of text into languages which produce closed compound words
US8543374B2 (en) * 2010-08-12 2013-09-24 Xerox Corporation Translation system combining hierarchical and phrase-based models

Also Published As

Publication number Publication date
KR101745349B1 (ko) 2017-06-09
KR20120022390A (ko) 2012-03-12
US20140303955A1 (en) 2014-10-09
WO2012030053A2 (ko) 2012-03-08

Similar Documents

Publication Publication Date Title
WO2009035863A3 (en) Mining bilingual dictionaries from monolingual web pages
WO2012030053A3 (ko) Apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus
WO2013064752A3 (en) Machine translation quality measurement
IL240549B (en) Device, system and method for imaging and labeling whole blood
MX2016005225A (es) Fingerprint recognition method and apparatus.
MX2018003490A (es) Universal translation.
EP2466541A3 (en) Image processing apparatus, image processing method and image processing program
BR112012011091A2 (pt) method and apparatus for word quality extraction and evaluation
CL2014002526A1 (es) Method for detecting a face, comprising preprocessing an image, extracting the corners of the preprocessed image, obtaining a connected component of the corners, extracting the centroids of the component, matching the centroids with a template, calculating a probability of the centroids matching the template, identifying the regions formed by the centroids having a matching probability greater than or equal to a predetermined value; system; storage medium
MX340339B (es) Calibration transfer methods for a test instrument.
WO2011051817A3 (en) System and method for increasing the accuracy of optical character recognition (ocr)
MX340429B (es) System and method for contextual and free-format address matching.
MX347895B (es) Device and method for obtaining and processing measurement readings of a living being.
WO2010140779A3 (ko) Sample collection/injection instrument and biometric data measurement set including the same
WO2011021198A3 (en) Gas chromatographic analysis method and system
MX357547B (es) Methods and apparatus for identifying fluid attributes.
BR112014010208A2 (pt) method, apparatus and system for enabling retrieval of content of interest for later review
WO2015050321A8 (ko) Apparatus and method for generating an aligned corpus based on self-learning alignment, and apparatus and method for morphological analysis of destructive expressions using the aligned corpus
BR112012011377A2 (pt) computer-implemented apparatus and method for recognizing image features independently of image orientation or scale, and non-transitory computer-readable storage medium for storing instructions
GB2547350A (en) Molecular cell imaging using optical spectroscopy
WO2013159972A3 (de) Sensor with time stamp for sampling time
NZ589039A (en) Recognition of a word image with a plurality of characters by way of comparing two possible candidates based on an evaluation value
WO2011074772A3 (ko) Apparatus and method for simulating grammatical errors.
BR112012031056A2 (pt) device, method and program for identifying evaluation information, and computer-readable recording medium
WO2011143141A3 (en) Method and apparatus for performing asynchronous and synchronous reset removal during synthesis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11822028

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 13820199

Country of ref document: US

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.06.2013)

122 Ep: pct application non-entry in european phase

Ref document number: 11822028

Country of ref document: EP

Kind code of ref document: A2