KR101445904B1

KR101445904B1 - System and methods for maintaining speech-to-speech translation in the field

Info

Publication number: KR101445904B1
Application number: KR1020107025487A
Authority: KR
Inventors: Ian R. Lane; Alexander Waibel
Original assignee: Facebook, Inc.
Priority date: 2008-04-15
Filing date: 2009-04-15
Publication date: 2014-09-29
Also published as: CN102084417B; JP2016053726A; EP2274742A1; JP6345638B2; KR20110031274A; BRPI0910706A2; JP2011524991A; CN102084417A

Abstract

A method and apparatus are provided for updating the vocabulary of a speech translation system that translates a first language into a second language, including written and spoken words. The method includes adding a new word in the first language to a first recognition lexicon of the first language and associating a description with the new word, the description including pronunciation and word class information. The new word and the description are then updated in a first machine translation module associated with the first language. The first machine translation module includes a first tagging module, a first translation model, and a first language module, and is configured to translate the new word into a corresponding translated word in the second language. Optionally, the invention may be used for unidirectional or multi-directional translation.

Description

SYSTEM AND METHOD FOR MAINTAINING SPEECH TRANSLATION IN THE FIELD

The present invention relates generally to speech translation systems for cross-lingual communication and, more particularly, to a method and apparatus for field maintenance that allows users to add new vocabulary items in the field and to improve and modify the content and usage of the system without requiring linguistic or technical knowledge or expertise.

Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Patent Application No. 61/045,079 (filed April 15, 2008), U.S. Provisional Patent Application No. 61/092,581 (filed August 28, 2008), and U.S. Provisional Patent Application No. 61/093,898 (filed September 3, 2008).

Automatic speech recognition (ASR) and machine translation (MT) technologies have matured to the point where practical speech translation systems can be developed on laptops and mobile devices for limited and unlimited domains. Domain-limited speech translation systems in particular have been developed in research settings and laboratories for a variety of application domains, including tourism, medical deployment, and military applications. Such systems have been described, for example, in A. Waibel, C. Fugen, "Spoken language translation," in Signal Processing Magazine, IEEE, May 2008; 25(3): 70-79; In Proc. HLT, 2003; and in Nguyen Bach, Matthias Eck, Paisarn Charoenpornsawat, Thilo Kohler, Sebastian Stuker, ThuyLinh Nguyen, Roger Hsiao, Alex Waibel, Stephan Vogel, Tanja Schultz and Alan W. Black, "The CMU TransTac 2007 eyes-free and hands-free two-way speech-to-speech translation system." These systems are limited, however, in that they operate with a limited vocabulary that is defined in advance by the system developers and determined by the application domain and the location where the system is expected to be used. Vocabulary and language usage are therefore determined largely from example scenarios and from data collected or assumed for those scenarios.

In field situations, however, actual words and language usage deviate from the scenarios anticipated in the laboratory. Even in simple domains such as tourism, usage in the field varies dramatically as users travel to different locations, interact with different people, and pursue different goals and needs. New words and new expressions will therefore always arise. Such new words, called out-of-vocabulary (OOV) words in speech recognition parlance, are misrecognized as in-vocabulary words and then translated incorrectly. The user may attempt a paraphrase, but if a critical word or concept (such as a person's name or a city name) cannot be entered or communicated, the absence of the word or expression may lead to a breakdown in communication.

A user-modifiable speech translation system is needed, but no practical solution has been proposed so far. Although adding a word to the system may seem easy, making such modifications has proven to be extraordinarily difficult. Appropriate modifications must be made to many component modules throughout the system, and most modules have to be retrained to restore the balance and integrated functioning of the components. Indeed, about 20 different modules have to be modified or re-optimized to learn a single new word. Such modifications require expertise and experience with the components of a speech translation system. As a result, to the inventors' knowledge, such modifications have so far been made only in the laboratory by experts, at considerable cost in expertise, time, and money.

For example, if a system designed for users in Europe does not contain the name "Hong Kong" in its vocabulary and a speaker says the sentence "Let's go to Hong Kong," the system will recognize the closest-sounding words in its dictionary and produce "Let's go to home call." At this point it is not clear whether the error results from a recognition error or from the absence of the word from the speech translation system. The user therefore proceeds to correct the system. This can be done by one of several correction techniques. The simplest might be re-speaking or typing, but it may be more effective to use the cross-modal error correction techniques described in other literature and in the prior art (Waibel et al., U.S. Pat. No. 5,855,000). Once the correct spelling of the desired word sequence has been established ("Let's go to Hong Kong"), the system performs the translation. If "Hong Kong" is in the dictionary, the system proceeds normally from there, performing translation and synthesis. If, however, the word is absent from the recognition and translation dictionaries, the system needs to establish whether the word is a named entity. Finally, and most importantly, even if a name or word can be translated properly into the output language through user intervention, without learning it the system may fail to recognize and translate it the next time the user speaks the word.

Unfortunately, learning a new word cannot be addressed simply by typing the new word into a word list. It requires changes at about 20 different points and at all levels of the speech translation system. Presently, it involves tagging and editing entries, collecting extensive databases that contain the required word, retraining language model and translation model probabilities, and re-optimizing the entire system, in order to re-establish consistency among all the components and their dictionaries and to restore the statistical balance among the words, phrases, and concepts in the system (probabilities have to add up to 1, so all words are affected by even a single word addition).
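The remark that probabilities have to add up to 1 can be illustrated with a toy unigram model. The following is a minimal sketch, not the retraining procedure of the invention; the function name `add_word` and the reserved probability mass `new_mass` are assumptions introduced here for illustration.

```python
def add_word(unigram, word, new_mass=0.01):
    """Return a new unigram distribution that includes `word`.

    `new_mass` (a hypothetical parameter) is the probability assigned to
    the new word; every existing word is scaled down so the total stays 1.
    """
    if word in unigram:
        raise ValueError(f"{word!r} is already in the vocabulary")
    if not 0.0 < new_mass < 1.0:
        raise ValueError("new_mass must be strictly between 0 and 1")
    scale = 1.0 - new_mass
    updated = {w: p * scale for w, p in unigram.items()}
    updated[word] = new_mass
    return updated


# Toy distribution over three in-vocabulary words.
unigram = {"go": 0.5, "home": 0.3, "call": 0.2}
updated = add_word(unigram, "Hong Kong", new_mass=0.1)
```

Even in this toy case, every existing probability changes when one word is added, which is why a single addition touches the whole model.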

As a result, even small modifications to existing speech translation systems have generally required the advanced computing tools and linguistic resources found in research laboratories. For actual field use, however, it is not feasible in terms of time, effort, and cost to require every modification to be made in the laboratory. Instead, a learning and customization module is needed that hides the complexity from the user, performs all critical operations and language processing steps semi-autonomously or autonomously, and interacts with the user as simply and clearly as possible through a simple, intuitive interface, thereby eliminating the need for linguistic or technical expertise in the field. The present invention provides a detailed description of a learning and customization module that satisfies these needs.

Unfortunately, translation systems are often too complex for the user to access in practice. A need therefore exists for systems and methods that use machine translation techniques and that users can readily modify without expert knowledge of languages or technology, enabling cross-lingual communication, overcoming language barriers, and bringing people closer together.

SUMMARY OF THE INVENTION

In various embodiments, the present invention solves the foregoing problems by providing a method and apparatus for updating the vocabulary of a speech translation system. In various embodiments, a method is provided for updating the vocabulary of a speech translation system that translates a first language into a second language, including written and spoken words. The method includes adding a new word in the first language to a first recognition lexicon of the first language and associating a description with the new word, the description including pronunciation and word class information. The new word and the description are then updated in a first machine translation module associated with the first language. The first machine translation module includes a first tagging module, a first translation model, and a first language module, and is configured to translate the new word into a corresponding translated word in the second language.
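The update steps in the preceding paragraph can be sketched as a small data structure. This is an illustrative sketch only; `LexiconEntry`, `TranslationModule`, `add_new_word`, the "@CITY" class label, and the phoneme string are hypothetical names and values chosen here, not identifiers from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    word: str
    pronunciation: str  # illustrative phoneme string
    word_class: str     # illustrative class label, e.g. "@CITY"


@dataclass
class TranslationModule:
    # Maps a class label to {source word: target word} for its members.
    class_members: dict = field(default_factory=dict)

    def update(self, entry, translation):
        """Register the new word and its translation under its class."""
        members = self.class_members.setdefault(entry.word_class, {})
        members[entry.word] = translation

    def translate_word(self, word, word_class):
        return self.class_members[word_class][word]


def add_new_word(lexicon, mt, entry, translation):
    # Add the word, with its pronunciation and class, to the recognition lexicon.
    lexicon[entry.word] = entry
    # Update the machine translation module with the word and its description.
    mt.update(entry, translation)


lexicon = {}
mt = TranslationModule()
entry = LexiconEntry("Hong Kong", "HH AO NG K AO NG", "@CITY")
add_new_word(lexicon, mt, entry, "香港")
```

Keying the translation table by word class reflects the class-based design described in the text: a new member of an existing class can reuse whatever the models already know about that class.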

Optionally, for bidirectional translation, the method further includes back-translating the translated word from the second language into the new word of the first language, correlating the new word with the corresponding translated word of the second language, and adding the translated word and its description to a second recognition lexicon of the second language. A second machine translation module associated with the second language is then updated with the translated word and the description. The second machine translation module includes a second tagging module, a second translation model, and a second language module.

In embodiments, the method further includes inputting the first word into a text-to-speech pronunciation lexicon associated with the first language and inputting the second word into a text-to-speech pronunciation lexicon associated with the second language. The input signals may be of different modalities (e.g., speech and nonverbal spelling, speech and verbal spelling, writing and speech), referred to herein as "cross-modal," or of the same modality (speech and re-speaking, writing and rewriting, etc.).

An embodiment of the invention is directed to a field-maintainable, class-based speech translation system for communicating between a first language and a second language. The system includes two speech recognition units, each configured to accept sound comprising the spoken words of the first or second language and to produce text corresponding to the spoken words, and two corresponding machine translation units, each configured to receive text from one of the speech recognition units and to output a translation of the text into text of the other language. The system also includes a user field customization module that enables the system to learn new words in cooperation with the user. The user field customization module is configured to accept user-selected input comprising sounds or text corresponding to one or both of the languages and to update the machine translation units appropriately with the user-selected input.

In one embodiment, the system has four primary features that make it a field-maintainable, class-based speech translation system. First, it includes a speech translation framework that enables the addition of new words to the active system vocabulary and switching between location- or task-specific vocabularies. This allows words to be added dynamically to the speech recognition module without restarting the module. The system uses a multilingual system dictionary and language-independent word classes across all system components in the speech translation device, class-based machine translation (phrase-based statistical MT, syntactic, example-based, etc.), multilingual word-class tagging during model training based on a combination of monolingual taggers, and word-class tagging in a new language through alignment via a parallel corpus from a known tagged language. Second, a multimodal interactive interface enables non-experts to add new words to the system. Third, the system is designed to accommodate ASR and SMT model adaptation using multimodal feedback provided by the user. Fourth, the system has networking capability for sharing corrections or words.

In another embodiment, a multimodal interactive interface is disclosed that enables a user to add new words to a speech translation device in the field without technical expertise. Examples include: (1) a method for automatically classifying the class of a word or word phrase to be added to the system and automatically generating the pronunciation and translation of the word; (2) a method for entering new words cross-modally by one or more of speaking, typing, spelling, handwriting, browsing, and paraphrasing; (3) multimodal feedback to help a linguistically untrained user determine whether the phonetic transcription and translation are adequate: multiple textual forms (i.e., romanized form as well as written form in the other language's script) and acoustic forms via text-to-speech (TTS; i.e., does it sound right); (4) a method for setting language model and translation probabilities for new words; and (5) boosting or discounting language model and translation probabilities for newly learned words based on relevance to user activities, interests, and history of use.

In another embodiment, an online system is disclosed that makes corrections in the field via multimodal user feedback. Examples include: (1) an interface and methods that enable users to correct automatic speech recognition results and use this feedback information to adapt the speech recognition module; (2) an interface and methods that enable users to correct machine translation output and use this feedback information to improve the machine translation components; and (3) a method for automatically adjusting (boosting or discounting) language model, dictionary, and translation model probabilities for correct or corrected words based on user corrections.

In another embodiment, an internet application is disclosed that allows users to share corrections or new word additions made in the field across multiple devices. Examples include: (1) methods for uploading, downloading, and editing models used in speech translation devices via the World Wide Web; (2) methods for collating in-the-field new word additions and corrections across the entire community of users; and (3) methods for uploading, downloading, and editing location- or task-specific vocabularies used in speech translation devices.

The accompanying drawings illustrate examples of embodiments of the present invention. In the drawings:
FIG. 1 is a block diagram illustrating a speech translation system constructed in accordance with an embodiment of the present invention.
FIG. 2 illustrates an example of a graphical user interface displayed to the user via a tablet interface.
FIG. 3 is a flow chart illustrating the speech translation steps performed in accordance with the embodiment of the invention of FIG. 1.
FIG. 4 is a flow chart illustrating the steps by which the system learns from corrections made by the user (correction and repair module).
FIG. 5 is a flow chart illustrating the steps by which a user can add new words to the system (user field customization module).
FIG. 6 is a flow chart illustrating one example of a method by which the device automatically generates the translation and pronunciation of a new word that the user wishes to add to the system.
FIG. 7 is a flow chart illustrating one example of a method for verifying new word input via a multimodal interface.
FIG. 8 illustrates an example of a visual interface for displaying automatically generated word information.
FIG. 9 is a flow chart illustrating the steps required to train class-based MT models.
FIG. 10 is a flow chart illustrating the steps of applying class-based MT to an input sentence.
FIG. 11 is a diagram illustrating possible features used during word-class tagging via statistical or machine learning approaches.

Various embodiments of the present invention describe methods and systems for speech translation. Embodiments may be used to adapt to the user's voice and speaking style via model adaptation. In further embodiments, the user can correct recognition errors, and the system can explicitly learn from the errors the user has corrected, making it less likely that these errors will recur. In accordance with the present invention, users can tailor the vocabulary to their personal needs and environment, either by adding new words to the system or by selecting predefined dictionaries optimized for a specific location or task. When adding a new word, the user can correct and verify the automatically generated translation and pronunciation through a multimodal interface. This allows the user to add new words to the system even without knowledge of the other language. In an embodiment, the system is further configured to transmit new vocabulary entered by a user to a community of users. This data is collated, dictionaries are generated automatically, and any user can then download them.

FIG. 1 is a block diagram illustrating an example of a field-maintainable speech translation system in accordance with an embodiment of the present invention. In this example, the system operates between two languages, L_a and L_b. This is the typical implementation of a speech dialog system involving speech translation in both directions, from L_a to L_b and from L_b to L_a. The bidirectionality of this configuration, however, is not a prerequisite for the present invention. A unidirectional system from L_a to L_b, or a multi-directional system involving several languages L_1, ..., L_n, could equally benefit from the present invention. The system has two ASR modules (2, 9) that recognize speech for L_a and L_b, respectively, and produce text corresponding to L_a and L_b, respectively, using an acoustic model (18), an ASR class-based language model (19), and a recognition lexicon model (20) (these models are shown in FIG. 3). This example uses the "Ninja" speech recognition system developed at Mobile Technologies, LLC. Other types of ASR that may be used include speech recognizers developed by IBM Corp., SRI, BBN, or at Cambridge or Aachen.

The system also includes two machine translation modules (3, 8), which translate text from L_a to L_b and from L_b to L_a, respectively. The MT module used in this example is the "PanDoRA" system developed at Mobile Technologies, LLC. Other MT modules that may be used include those developed by IBM Corp., SRI, BBN, or at Aachen University.

Two text-to-speech engines (4, 7), each corresponding to one of the machine translation modules (3, 8), are configured to receive text produced by the corresponding ASR unit. The output text is transferred to the respective MT module (3 or 8), which translates the text from L_a to L_b or from L_b to L_a, respectively. The TTS modules generate audio output that converts at least one text word in L_a to speech via an output device (5), such as a loudspeaker, and at least one text word in L_b to speech via the output device (5) or another output device, such as a loudspeaker (6). A Cepstral TTS module is used in this example. Any TTS module that supports the Windows SAPI (speech application programming interface) could also be used.
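The chain of modules described so far (ASR, then MT, then TTS, one chain per translation direction) can be sketched as follows. This is a structural sketch under stated assumptions: `Direction`, `translate_speech`, and the three stub functions are hypothetical stand-ins and do not represent the Ninja, PanDoRA, or Cepstral interfaces.

```python
from typing import Callable, NamedTuple


class Direction(NamedTuple):
    """One translation direction: its ASR, MT, and TTS components."""
    asr: Callable[[bytes], str]   # audio -> source-language text
    mt: Callable[[str], str]      # source text -> target-language text
    tts: Callable[[str], bytes]   # target text -> audio


def translate_speech(audio, direction):
    """Run recognition, translation, and synthesis in sequence."""
    text = direction.asr(audio)
    translation = direction.mt(text)
    return text, translation, direction.tts(translation)


# Stub components standing in for real engines (illustration only).
def fake_asr(audio):
    return audio.decode("utf-8")


def fake_mt(text):
    return {"hello": "hola"}.get(text, text)


def fake_tts(text):
    return text.encode("utf-8")


a_to_b = Direction(fake_asr, fake_mt, fake_tts)
result = translate_speech(b"hello", a_to_b)
```

A bidirectional system would hold two such `Direction` values, one for L_a to L_b and one for L_b to L_a, and pick between them according to which language the user is speaking.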

The correction and repair module (11) allows the user to correct the system output via multiple modalities, including speech, gesture, writing, tactile, touch-sensitive, and keyboard interfaces, and enables the system to learn from the user's corrections. The correction and repair module may be of the type disclosed in U.S. Pat. No. 5,855,000. The user field customization module (12) provides an interface for users to add new vocabulary to the system and can also select an appropriate system vocabulary for the user's current situation. For example, this selection may be triggered by a change in location, as determined by GPS coordinates indicating the current location of the device, or by an explicit selection of task or location by the user.
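One way the location-triggered vocabulary selection could work is sketched below. The text does not specify a selection rule, so the nearest-region rule, the `REGION_CENTERS` table, and the function names are assumptions made here for illustration; the distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical location-specific vocabularies, each with a
# representative coordinate (latitude, longitude) for its region.
REGION_CENTERS = {
    "hong_kong": (22.3193, 114.1694),
    "paris": (48.8566, 2.3522),
}


def select_vocabulary(lat, lon, explicit_choice=None):
    """An explicit user choice wins; otherwise pick the nearest region."""
    if explicit_choice is not None:
        return explicit_choice
    return min(
        REGION_CENTERS,
        key=lambda name: haversine_km(lat, lon, *REGION_CENTERS[name]),
    )
```

For example, `select_vocabulary(22.28, 114.16)` picks the Hong Kong vocabulary, while passing `explicit_choice="paris"` overrides the GPS-based choice, mirroring the two triggers named in the paragraph above.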

The user can access the user field customization module (12) and interact with the system via a graphical user interface displayed on the screen (or active touch screen) of the device (13) and via a pointing device (14), such as a mouse or pen. An example of a graphical user interface is shown in FIG. 2. In this example, the device (13) displays in a window (15) the text of the audio input in L_a and the corresponding text. The machine translation of the L_a text into the second language L_b is displayed in a window (16).

In an embodiment, the same microphone and loudspeaker can be used for both languages. Thus, the microphones (1, 10) can be a single physical device, and the speakers (5, 6) can be a single physical device.

A flow chart illustrating the operation of an example of the method of the present invention is shown in FIG. 3. First, the speech recognition system is activated by the user at step (15b). For instance, a button can be selected on the graphical user interface (item 15b in FIG. 2) or on an external physical button (not shown). The user's speech (item 25) is then recognized at step (27) by one of the ASR modules: by module (2) if the user is speaking L_a, and by module (9) if the user is speaking L_b. The ASR modules (2, 9) apply three models: the acoustic model (18), the ASR class-based language model (19), and the recognition lexicon model (20). These models are language specific, and each ASR module contains its own set of models. At step (28), the resulting text of the user's speech is displayed via the GUI on the device screen (13).

Translation is then applied via MT module (3 or 8), depending on the input language. The MT modules (3, 8) apply three main models: a tagging or parsing [Collins02] model (model (22)) to identify word classes, a class-based translation model (model (23)), and a class-based language model (model (24)). The tagging model (22) may be any suitable type of tagging or parsing model, such as the types described in J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," In Proceedings of 18th International Conference on Machine Learning, pages 282-289, 2001 ("Lefferty01") or Michael Collins, "Parameter estimation for statistical parsing models: Theory and practice of distribution-free methods" (2004), In Harry Bunt, John Carroll, and Giorgio Satta, New Developments in Parsing Technology, Kluwer. Other models applied during machine translation include distortion models, which constrain how words are reordered during translation, and sentence length models. Class-based machine translation is described in detail below. The resulting translation is displayed via the GUI on the device (13), as shown at step (30).

To help the user determine whether the translation output is adequate, the automatically generated translation (Fig. 2, item 16) is translated back into the input language via MT module 3 or 8 and displayed in parentheses under the original input, as shown for example in item 15a of Fig. 2. If the confidence of both speech recognition and translation is high (step 31), as determined by the ASR module (2 or 9) and the MT module (3 or 8), spoken output (item 26) is generated via TTS module 4 or 7 and played through loudspeaker 5 or 6 (step 33). Otherwise, the system indicates via the GUI, audio and/or tactile feedback that the translation may be wrong. The specific TTS module used in step 33 is selected according to the output language.
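The recognition, translation, back-translation and confidence check described above can be sketched as follows. This is only an illustrative sketch: the function names, the `TurnResult` type and the numeric threshold are assumptions introduced here, since the description states only that the confidence must be "high".

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical threshold; the description only requires "high" confidence.
CONFIDENCE_THRESHOLD = 0.8

# Each component returns (output text, confidence in [0, 1]).
Component = Callable[..., Tuple[str, float]]


@dataclass
class TurnResult:
    recognized: str        # ASR text displayed in step 28
    translation: str       # MT output displayed in step 30
    back_translation: str  # re-translation shown under the original input
    speak: bool            # True if TTS output should be produced (step 33)


def translate_turn(audio: bytes, asr: Component, mt_forward: Component,
                   mt_backward: Component,
                   threshold: float = CONFIDENCE_THRESHOLD) -> TurnResult:
    """Run one utterance through ASR and MT and decide whether to speak it."""
    text, asr_conf = asr(audio)
    translation, mt_conf = mt_forward(text)
    back_translation, _ = mt_backward(translation)
    # Step 31: speak only if both recognition and translation are confident.
    speak = asr_conf >= threshold and mt_conf >= threshold
    return TurnResult(text, translation, back_translation, speak)
```

When `speak` is false, a caller would show the warning feedback instead of invoking TTS.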

Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech translation process at any of steps 27 through 33, or after the process has completed. This invokes the correction and repair module 11 (step 35). The correction and repair module 11 records and logs any corrections the user makes, which can later be used to update the ASR modules 2 and 9 and the MT modules 3 and 8, as described in more detail later in this document. The user field customization module 12 is invoked if the correction contains a new vocabulary item (step 36), if the user enters field customization mode to explicitly add a new word to the system in step 15c, or if a new word is automatically detected in the input audio in step 15d using confidence measures or new-word models, such as the method described in Thomas Schaaf, "Detection of OOV words using generalized word models and a semantic class language model," in Proc. of Eurospeech, 2001. Module 12 provides a multimodal interface that enables the user to add new words to the active system vocabulary. When the user adds a new word or phrase, the ASR, MT and TTS models (items 17, 21 and 33a) are updated as needed. The functionality of this module is described in more detail later for both languages.

A common set of classes (for example person names, place names, and organization names) is used in both ASR and MT for both languages. This provides a system-wide set of semantic slots through which new words can be added to the system. The names, special terms and expressions that occur within these classes vary the most across user deployments, locations, cultures, customs and tasks, and are therefore the words in greatest need of user customization.

The specific classes used in the preferred example depend on the application domain of the system. The classes may include semantic classes for named entities (person, place and organization names) or for task-specific noun phrases (for example names of foods, illnesses or medicines), as well as an additional open class for words or phrases that do not fit any of the predefined classes. Syntactic classes or word equivalence classes, such as synonyms, may also be used. Examples of application domains include, but are not limited to, tourism, medical care, and peacekeeping. For example, the classes required in a tourism application include names of persons, cities and foods. In a medical application, the required classes include names of diseases, medications and anatomical terms. In a peacekeeping application, the required classes include names of weapons, vehicles and the like. To enable field-customizable speech translation, the system permits error correction, and later learning from these errors, through the operation of the correction and repair module 11 together with the user field customization module 12.

Correction and Repair Module

The correction and repair module 11 enables the user to intervene in the speech translation process at any time. The user may identify and log an error or, if desired, correct an error in the speech recognition or translation output. Such user intervention is valuable because it allows errors to be corrected immediately during human-to-human communication and gives the system an opportunity to adapt to the user's needs and interests and to learn from its mistakes. A flowchart illustrating this error feedback functionality is shown in Fig. 4. If the user is dissatisfied with the translation of an utterance (that is, an error has occurred), the user can log the current input (step 40). The system stores the audio of the utterance, along with other information, in a log file. The log file can later be accessed and corrected by the user, or uploaded to a community database so that expert users can identify and correct errors.

The user can also correct the speech recognition or machine translation output through a number of modalities. The user can correct the entire utterance by re-speaking it or by entering the sentence via a keyboard or handwriting interface. Alternatively, the user can highlight the erroneous segment of the output hypothesis via the touch screen, mouse or cursor keys and correct only that phrase or word, using the keyboard, handwriting, or speech, or by spelling the word out letter by letter. The user can also select the erroneous segment of the output hypothesis via the touch screen and correct it either by choosing a competing hypothesis from an automatically generated drop-down list, or by re-entering it by speech or by another complementary modality (for example handwriting, spelling, or paraphrasing). These methods, and the appropriate ways of combining complementary repair actions, build on the methods proposed in U.S. Pat. No. 5,855,000 to Waibel et al. for multimodal speech recognition correction and repair. Here they are applied to the speech recognition and translation modules of an interactive speech translation system.

If the user corrects the speech recognition output (step 43), the system first determines whether the correction contains a new word (step 44). This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, L_a and L_b. If the word is not found, the system prompts the user to add the new word to the active system vocabulary, if desired (Fig. 5, step 50). Otherwise, the probabilities in the ASR models (Fig. 3, item 17) are updated to reduce the likelihood of the same error occurring again. This can be done in a discriminative manner, in which the probability of the corrected word sequence is increased and that of close-competing hypotheses is reduced.
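As a rough illustration of such a discriminative update, the sketch below raises the probability of the corrected hypothesis, lowers that of its close competitors, and renormalizes. The multiplicative `factor` and the flat table of hypothesis probabilities are assumptions made for this example; the description does not specify how the ASR model probabilities are stored or by how much they change.

```python
from typing import Dict, Iterable


def discriminative_update(probs: Dict[str, float], corrected: str,
                          competitors: Iterable[str],
                          factor: float = 2.0) -> Dict[str, float]:
    """Boost the corrected hypothesis, penalize competitors, renormalize.

    `probs` must be a non-empty mapping from hypothesis to probability.
    """
    updated = dict(probs)
    # A hypothesis not seen before starts from the smallest existing value.
    floor = min(updated.values())
    updated[corrected] = updated.get(corrected, floor) * factor
    for hyp in competitors:
        if hyp in updated and hyp != corrected:
            updated[hyp] /= factor
    total = sum(updated.values())
    return {hyp: p / total for hyp, p in updated.items()}
```

For example, with `{"wheeling": 0.2, "wheeler": 0.5, "willing": 0.3}`, correcting to "wheeling" against the competitor "wheeler" raises the first and lowers the second while the distribution still sums to one.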

A user with sufficient language expertise can also correct the machine translation output. The same modalities used in the ASR case can be used. If the machine translation output is corrected by the user (step 45) and the correction contains a new word, the user is presented with a dialog for adding the new word to the active system vocabulary (Fig. 5, step 50). If the correction contains only words that are already in the active system vocabulary, the machine translation models (Fig. 3, item 21) are updated. In particular, an implementation can be used in which phrases are extracted from the corrected sentence pair and folded into the translation models. The target language model used can be updated in a manner similar to the ASR case.

User Field Customization Module

The user field customization module 12 enables the system to learn new words in cooperation with the user. Prior systems do not allow users to modify the vocabularies of a speech translation system. Unlike prior systems, the user field customization module 12 enables a non-expert user to make incremental modifications to a running system relatively easily, without knowledge of computer speech and language processing technology or of linguistics. Module 12 offers such field customization by presenting and accepting feedback that the user can readily understand, and by deriving all necessary parameters and system configurations autonomously from that feedback. The field customization module 12 accomplishes this through 1) an intuitive interface for user customization, and 2) internal tools that automatically estimate all the internal parameters and settings needed for customization, thereby relieving the user of this burden.

For unidirectional translation, the system processes at least four pieces of information about a word or phrase in order to add it to the active system vocabulary. This information includes:

● Class (that is, the semantic or syntactic class of the new entry)

● The word in language L_a (that is, its written form in L_a)

● The pronunciation of the word in L_a

● The translation of the word into L_b (that is, its written form in L_b)

For bidirectional translation, the system also requires the pronunciation of the new word in L_b as input. The L_b pronunciation enables the TTS to generate audio output and enables the ASR module for L_b to recognize the new word in the reverse direction.
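The information listed above can be pictured as a single record per vocabulary entry. The sketch below is illustrative only: the `VocabularyEntry` type, its field names, and the sample pronunciation and Japanese spelling are assumptions introduced for this example, not data from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VocabularyEntry:
    word_class: str                         # semantic or syntactic class
    text_a: str                             # written form in L_a
    pronunciation_a: str                    # pronunciation in L_a
    text_b: str                             # translation (written form in L_b)
    pronunciation_b: Optional[str] = None   # needed only for bidirectional use

    def missing_fields(self, bidirectional: bool = False) -> List[str]:
        """Return the names of required fields that are still empty."""
        required = ["word_class", "text_a", "pronunciation_a", "text_b"]
        if bidirectional:
            required.append("pronunciation_b")
        return [name for name in required if not getattr(self, name)]


# Illustrative values only.
entry = VocabularyEntry("city", "Wheeling", "W IY L IH NG", "ウィーリング")
```

Here `entry.missing_fields()` is empty for unidirectional use, while `entry.missing_fields(bidirectional=True)` reports that the L_b pronunciation has not yet been supplied.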

A flowchart illustrating the steps of operation of the user field customization module 12 is shown, for example, in Fig. 5. When the system encounters a new word as a result of a corrective intervention via the correction and repair module 11 described in the previous section, it prompts the user to decide whether the word should be "learned", that is, added to the active system vocabulary (Fig. 5, step 50). If so, a word learning mode is activated and the field customization module 12 begins to operate. Note that field customization or new-word learning need not result only from an error correction dialog. The user may also explicitly choose to enter word learning mode from a pull-down menu in order to add a new word or a list of new words a priori. New-word learning can also be triggered when there is a sudden need for different words, such as special terms, names or locations. In all such cases, however, the system must collect the information listed above.

Once the user indicates a desire to add a new word to the system vocabulary (step 50), the system first looks the word up in a large external dictionary, which is either stored locally on the device, or is a dictionary service accessible over the Internet, or is a combination of both. The external dictionary consists of entries of word translation pairs. Each entry contains pronunciation and word class information, which allows the new word to be added easily to the active system vocabulary. Each entry also contains a description of each word pair in both languages. This allows the user to select the appropriate translation of the word even without knowledge of the target language. If the new word is contained in the external dictionary (step 51), the system displays a list of alternative translations of the word together with a description of each (step 52). If the user selects one of the predefined translations from the dictionary (step 53), the user can verify the pronunciation and other information provided by the dictionary (step 53a) and edit it if necessary. The new word is then added to the active system vocabulary.

Three steps are required to add a new word to the active system vocabulary (steps 59, 59a, 59b). First, the word and its translation are added to the ASR recognition lexicons of modules 2 and 9 (step 59). The word is added to the recognition lexicon 20 together with the pronunciation given by the dictionary. Because the user has just entered the word, its probability of occurrence is set higher than that of competing members of the same class in the ASR class-based language model 19. This makes words that the user has specifically added more likely to be recognized. Next, the word and its translation are added to the MT models (Fig. 3, item 21), enabling the system to translate the new word in both translation directions. Finally, the word is registered with the TTS pronunciation model (Fig. 3, model 33a), which enables the system to pronounce the word correctly in both languages.

If the new word entered by the user is not found in the external dictionary, the system automatically generates the information required to register the word in the active system vocabulary and verifies this information with the user. First, the class of the new word is estimated via a tagging model (Fig. 3, model 22) using the surrounding word context, if available (step 54). Next, the pronunciation and translation of the new word are automatically generated via rule-based or statistical models (step 55). The generated information is then shown to the user via a multimodal interface (step 58). The system prompts the user to verify (step 58) or correct (step 57) the automatically generated translation or pronunciation. Finally, once the user has verified this information, the new word is added to the active system vocabulary (steps 59, 59a, 59b). To dynamically add a new word (specifically, "word + pronunciation + word class") to the ASR vocabularies (step 59), the recognition lexicon 20 (typically stored as a tree structure within ASR module 2 or 9) is searched and then updated to include the new word. This allows the new word to be added to the recognition vocabulary dynamically, so that it can be recognized immediately if spoken in the following utterance. The ASR system does not need to be re-initialized or restarted, as it does in prior systems.

Similarly, a new word (specifically, "word + pronunciation + word class") can be appended to the MT translation model (step 59a): the translation model 23 (which can be stored as a hash map within MT module 3 and/or 8) is searched, and a new translation pair containing the new word, its translation and its word class is appended. This allows the new word to be added to MT modules 3 and/or 8 dynamically, so that it is translated correctly in subsequent utterances. The MT system does not need to be re-initialized or restarted, as it does in prior systems.
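The two dynamic updates described above can be sketched with a pronunciation prefix tree for the recognition lexicon and a hash map for the translation model. This is a minimal sketch under stated assumptions: the class names, the phone symbols and the sample Japanese spelling are hypothetical, and a real recognizer would hold far more state per node.

```python
from typing import Dict, List, Optional, Tuple


class LexiconNode:
    def __init__(self) -> None:
        self.children: Dict[str, "LexiconNode"] = {}
        # (word, word_class) pairs whose pronunciation ends at this node.
        self.entries: List[Tuple[str, str]] = []


class RecognitionLexicon:
    """Tree-structured lexicon keyed by phone sequence (assumed layout)."""

    def __init__(self) -> None:
        self.root = LexiconNode()

    def add(self, word: str, phones: List[str], word_class: str) -> None:
        # Walk or extend the tree in place; no rebuild or restart is needed.
        node = self.root
        for phone in phones:
            node = node.children.setdefault(phone, LexiconNode())
        node.entries.append((word, word_class))

    def lookup(self, phones: List[str]) -> List[Tuple[str, str]]:
        node: Optional[LexiconNode] = self.root
        for phone in phones:
            node = node.children.get(phone)
            if node is None:
                return []
        return list(node.entries)


class TranslationModel:
    """Hash map from source word to (translation, word_class)."""

    def __init__(self) -> None:
        self.pairs: Dict[str, Tuple[str, str]] = {}

    def add(self, word: str, translation: str, word_class: str) -> None:
        self.pairs[word] = (translation, word_class)

    def translate(self, word: str) -> Optional[str]:
        pair = self.pairs.get(word)
        return pair[0] if pair else None
```

Because both structures are updated in place, a word added during one utterance is available to the lookup for the next.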

Estimating all of this information automatically is essential so that even a non-expert can perform customization in the field. The following describes in detail how this critical information about a word is estimated automatically, and how it can then be obtained from or verified by the user intuitively.

Generation of Pronunciations and Translations of New Words

Because users of speech translation systems usually have no knowledge of phonetics, linguistics or language technology, and often do not know the words and usage of the other language, they cannot be expected to provide a translation and all the related information (pronunciation, orthography, word usage, and so on) for each new word they wish to add to the system. Therefore, when the user enters a new word, the system estimates the word class and automatically generates the translation and pronunciation information for the word in both languages.

To register a new word in the active system vocabulary, the translation of the word and the pronunciations of both the word and its translation are required. Generating this information can be implemented as a three-step process, as shown for example in Fig. 6. First, the pronunciation of the word is generated (step 60). A translation is then generated based on the character sequence of the word and its pronunciation (step 61). Next, the pronunciation of the new word in the target language is generated based on the information generated in the previous steps (step 62). Two examples of generating this information using different techniques within a Japanese-English field-maintainable S2S translation system are shown on the left-hand side of Fig. 6. To add the new English word "Wheeling" (item 61) to the system, the English pronunciation is first generated via machine learning (step 61). Machine learning may be performed by any suitable technique, such as those described in Damper, R. I. (Ed.), Data-Driven Techniques in Speech Synthesis, Dordrecht, The Netherlands: Kluwer Academic Publishers (2001). Next, the translation of the word into Japanese is automatically generated via statistical machine translation (step 66), and the Japanese pronunciation is then generated via manually defined rules (step 67). Translation may be accomplished using any suitable statistical machine translation engine. Examples include K. Knight and J. Graehl, "Machine transliteration," Computational Linguistics 24(4) (1998), pp. 599-612, and Bing Zhao, Nguyen Bach, Ian Lane, and Stephan Vogel, "A Log-linear Block Transliteration Model based on Bi-Stream HMMs," in HLT/NAACL-2007. The generated information (item 68) is then verified by the user, by means of acoustic playback and the phonetic string, before the word is registered in the active system vocabulary.

Similarly, to add the new Japanese word "Wakayama" (item 70) to the system, the Japanese pronunciation is first generated via manually defined rules (step 71). Next, the translation of this Japanese word is automatically generated via rule-based translation (step 72), and the English pronunciation is then generated via manually defined rules (step 73). The rule-based translation may be performed using the methods of Mansur Arbabi, Scott M. Fischthal, Vincent C. Cheng, and Elizabeth Bar, "Algorithms for Arabic name transliteration," IBM Journal of Research and Development, 38(2): 183-193, 1994. The generated information (item 74) is then verified by the user before the word is registered in the active system vocabulary.
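The three-step process of Fig. 6 can be sketched as a small pipeline into which any pronunciation or translation technique is plugged. The function and type names below are assumptions made for illustration; the three callables stand in for whichever machine-learned, statistical or rule-based component is used for a given language pair.

```python
from typing import Callable, NamedTuple


class GeneratedEntry(NamedTuple):
    word: str
    source_pronunciation: str
    translation: str
    target_pronunciation: str


def generate_entry(word: str,
                   source_g2p: Callable[[str], str],
                   translate: Callable[[str, str], str],
                   target_g2p: Callable[[str], str]) -> GeneratedEntry:
    """Derive everything needed to register a word from its spelling alone."""
    # Step 60: pronunciation of the word in the source language.
    source_pron = source_g2p(word)
    # Step 61: translation from the character sequence and its pronunciation.
    translation = translate(word, source_pron)
    # Step 62: pronunciation of the translation in the target language.
    target_pron = target_g2p(translation)
    return GeneratedEntry(word, source_pron, translation, target_pron)
```

The returned record is what would be presented to the user for verification before registration; swapping the callables gives the English-to-Japanese and Japanese-to-English variants without changing the pipeline.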

The user can verify the generated translation and pronunciation via audible output. Alternatively, a written form may be used if the user finds it more suitable given their native language (that is, "Hanyu Pinyin" for Chinese, or "Romaji" for Japanese, if the user is an English speaker). The user may edit the translation and/or pronunciation if required. Once approved by the user, the word and its characteristics are added to the multilingual system dictionary.

The system also eliminates the need to provide a translation for each new word added to the dictionary, by automatically generating the required information with the assistance of interactive user input. An example of a user interface is shown in Fig. 3.

Interactive User Interface

The system then prompts the user to confirm and verify the estimated linguistic information. This is done in an intuitive manner that does not presume any special linguistic or technical knowledge, so a suitable interface is used. The following describes the user interaction during new-word learning.

In the interface, the user may select a "new word" mode from the menu, or the new-word learning mode may be invoked after a user correction has yielded a new or unknown word. In the window that appears, the user can type the desired new word, name, special term, concept or expression. Based on the orthographic input in the user's language (which may use a character set other than English, for example Chinese, Japanese or Russian), the system then generates a transliteration in the Roman alphabet and a hypothesized pronunciation of the word. This is done by conversion rules that are either written by hand, extracted from existing phonetic dictionaries, or learned from transliterated speech data. The user can then view the automatic conversion and play back the generated pronunciation via TTS. The user may iterate on and modify any of these representations (the script in either language, the Romanized transliteration, the phonetic transcription, and its sound), and the other corresponding entries are regenerated accordingly (so a modified transcription in one language may modify the transcription in the other).

The system can also automatically select the most likely word class for the new word, based on co-occurrence statistics of other words (of known class) in similar sentence contexts. The new-word window also allows this class identity to be selected (and/or corrected) manually, however, so that the user can override any such estimated class assignment.
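One simple way to realize such a class estimate is to let each context word vote with the class counts it has been observed alongside. The sketch below assumes a precomputed co-occurrence table and uses a plain vote count; both the table layout and the voting rule are assumptions for illustration, and the tagging model 22 could equally be a conditional random field or a parser as cited earlier.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def guess_word_class(context_words: Iterable[str],
                     cooccurrence: Dict[str, Counter],
                     user_override: Optional[str] = None,
                     default: str = "unknown") -> str:
    """Pick the class most often seen next to the given context words."""
    # A manual selection by the user always takes precedence.
    if user_override is not None:
        return user_override
    votes: Counter = Counter()
    for word in context_words:
        votes.update(cooccurrence.get(word, {}))
    if not votes:
        return default
    return votes.most_common(1)[0][0]


# Hypothetical counts of the classes observed after each context word.
table = {
    "flew": Counter({"city": 8, "person": 1}),
    "to": Counter({"city": 5, "person": 4}),
}
```

With this table, the context "flew to" yields the class "city", a context with no known words falls back to "unknown", and a user override replaces either result.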

In summary, given a new word or phrase from the user, the system will:

● Automatically classify the semantic class of the entry (used by the ASR and MT components)

● Automatically generate a pronunciation for the word (used by ASR and TTS for L₁)

● Automatically generate a translation of the word (used by both MT components)

● Automatically generate a pronunciation for the word (used by ASR and TTS for L₂)

● Allow the user to correct or edit the automatically generated data as required

● Provide multiple modalities through which the user can verify whether the automatically generated translation is adequate (for example, by listening to the pronunciation of the word via TTS)

If the user enters a word that does not match any of the predefined classes in the system, the user can assign it to the "unknown" class. For ASR, the "unknown" class is defined by words that occurred in the training data but are not in the recognition lexicon. For SMT, bilingual entries that do not occur in the translation lexicon are mapped to the unknown tag in the target language model.

In-Class Probability and Relevance Boosting

None of these input methods requires linguistic training, and each provides an intuitive way for the user to judge whether a new word has been represented suitably. The user may then accept the new word entry by adding the word to a "multilingual system dictionary", which is the user's personal lexicon. The overall system merges the standard lexicon with the customized lexicon to form the user's runtime dictionary.

In addition to the five entries above, an in-class probability P(w|C) is also defined. In this way the system can differentiate between words that belong to the same class. Words that are closer to the user's tasks, preferences and habits are thus preferred and assigned a higher in-class probability. This boost in in-class probability is determined by relevance to the user, which is assessed by observing the following:

● New word entries and their recency.

○ A newly entered word is likely to be used in the near future, since by entering it the user has indicated a need for it; its intra-class probability is therefore boosted (increased) relative to the other existing class entries.

● Correlation between the new word and the user's activities, interests and tasks, including:

○ City names, landmarks, places of interest, etc.

○ History of past use

○ Co-occurrence statistics (Sushi correlates better with Tokyo than with Bogota)

● General saliency of the new word, including:

○ Population of cities

○ Recent mentions in the media

Such observations and relevance statistics are collected based on the user's observed location, history or activity, and/or by observing occurrences of the system's new word in a large background language resource such as the Internet. Such statistics may be collected monolingually, in a data-rich language, and then applied to the translation dictionary and the translation language model.

The relevance of boosted words may also decay over time, as the user's new activities and tasks make such words less likely to be used, and/or if new information (such as arrival in a different city) makes a subclass of words less relevant.
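The boosting and decay of the intra-class probability P(w|C) described above can be illustrated with a minimal sketch. The `ClassLexicon` name, the additive boost, and the exponential half-life decay are assumptions made for illustration; the source does not specify how the boost is computed or how quickly it decays.

```python
from dataclasses import dataclass, field


@dataclass
class ClassLexicon:
    """Words of one class (e.g. city names) with unnormalized relevance weights."""
    weights: dict = field(default_factory=dict)  # word -> base weight
    boosts: dict = field(default_factory=dict)   # word -> (boost, time added)
    half_life: float = 7.0                       # assumed: days until a boost halves

    def add_word(self, word, now, base=1.0, boost=4.0):
        """Register a user-entered word and give it a recency boost."""
        self.weights[word] = base
        self.boosts[word] = (boost, now)

    def _weight(self, word, now):
        """Base weight plus whatever is left of the decayed boost."""
        boost, added = self.boosts.get(word, (0.0, now))
        decay = 0.5 ** ((now - added) / self.half_life)
        return self.weights[word] + boost * decay

    def prob(self, word, now):
        """Return P(w | C): the word's weight normalized over the class."""
        total = sum(self._weight(w, now) for w in self.weights)
        return self._weight(word, now) / total
```

With two existing entries of weight 1.0, a freshly added word starts at 5/7 of the class probability mass and drifts back toward 1/3 as the boost decays.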

Cross-Modal Entry

Optionally, a new word is entered by one of the following:

● Speaking: The user speaks the new word. All information, such as pronunciation and translation, is estimated as before by the new word models, translation models and background dictionaries, but based on the acoustic input. The system may engage in a verbal dialog to select the class identity and other pertinent information.

● Spelling: The user spells the new word acoustically. This input method generally improves the likelihood of a correct translation compared with speaking the word. It may also be used to complement speaking and other input modalities.

● Handwriting: The user enters the new word by handwriting. This input method generally improves the likelihood of a correct translation compared with speaking the word. It may also be used to complement speaking, spelling and other input modalities.

● Browsing: New words may also be selected by interactive browsing. Here, the system may propose related, relevant new words by searching the Internet for texts whose statistical profile is similar to the user's recent usage history and/or recently selected or entered new words.

Remote New Word Learning and Shared Lexicon Development over the Internet

The methods described in the previous sections are all aimed at allowing an individual user to customize a speech translation system to his or her individual needs and tasks in the field. Many of these user customizations could, however, be useful to other users as well. In an embodiment, user customizations are uploaded to a community-wide database in which names, special terms or expressions are shared among interested parties. Vocabulary entries, translations and class tags are collected and related to communities with similar interests. Subsequent users can download these shared community resources and add them as resources to their own systems.

Alternatively, users may choose to upload only poorly translated sentences and request manual translation from the community. For such incorrect or incomplete source words or sentences and their missing or incorrect translations, other users can provide online corrections and translations on a voluntary (i.e., unpaid) basis. These corrections and translations are then resubmitted to the updated shared community translation database.

Unsupervised Adaptation

After correction, repair and new word learning, a finally corrected hypothesis, and thus a true transcript or translation of the spoken sentence, is obtained. The speech translation device or system can exploit the fact that such ground truth has been provided to further adapt the ASR modules (FIG. 1, module 2 or 9) to the primary user of the device. Such adaptation is designed to improve the accuracy and usability of the device. Two specific methods of adaptation are performed: first, adaptation of the system to better recognize the user's voice, through acoustic model and pronunciation model adaptation; and second, adaptation to the user's style of speech through language model adaptation. Profiles are used to store adaptation data for specific users and can be switched in the field.

Class-Based Machine Translation

The previous sections described error repair and new word learning. Those modules referred to class-based machine translation. The following describes in detail how such class-based machine translation operates.

Approach

State-of-the-art machine translation systems perform translation at the word level. This is evident from prior translation systems, including those described in the following three documents: (1) P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst, "Moses: Open source toolkit for statistical machine translation," In Proc. ACL, 2007 ("Koehn07"); (2) D. Chiang, A. Lopez, N. Madnani, C. Monz, P. Resnik and M. Subotin, "The Hiero machine translation system: extensions, evaluation, and analysis," In Proc. Human Language Technology and Empirical Methods in Natural Language Processing, pp. 779-786, 2005 ("Chiang05"); and (3) K. Yamada and K. Knight, "A decoder for syntax-based statistical MT," In Proc. Association for Computational Linguistics, 2002 ("Yamada02"). Alignment is performed word by word; translation examples, or phrase pairs, are matched at the word level; and word-based language models are applied. Hierarchical translation models such as that of Chiang05 and syntax-based translation models such as that of Yamada02 extend this by introducing intermediate structure. However, these approaches still require exact word matches. Because each word is treated as a separate entity, these models do not generalize to unseen words.

One embodiment of class-based machine translation is class-based statistical machine translation, in which a foreign-language sentence $f_1^J = f_1, f_2, \ldots, f_J$ is translated into another language $e_1^I = e_1, e_2, \ldots, e_I$ by searching for the hypothesis $\hat{e}_1^I$ with maximum likelihood, given by $\hat{e}_1^I = \arg\max_{e_1^I} P(f_1^J \mid e_1^I) \cdot P(e_1^I)$.

A class may be a semantic class such as named entities, a syntactic class, or a class consisting of equivalent words or word phrases. As an example, the case in which named-entity classes are incorporated into the system is described.

The two most informative models applied during translation are the target language model $P(e_1^I)$ and the translation model $P(f_1^J \mid e_1^I)$. In the class-based statistical machine translation framework, $P(f_1^J \mid e_1^I)$ is a class-based translation model (FIG. 3, model 23) and $P(e_1^I)$ is a class-based language model (FIG. 3, model 24).

Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpus of sentence pairs is normalized (step 100) and tagged using the tagging model (FIG. 3, model 22) (step 101). One approach to doing this is described in Lafferty01. In this step, the sentences that combine to form a training pair can be tagged independently or jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, the words within the sentence pairs are aligned (step 102). Alignment can be accomplished using current approaches such as those described in Franz Josef Och, Christoph Tillmann, Hermann Ney, "Improved Alignment Models for Statistical Machine Translation," pp. 20-28, Proc. of the Joint Conf. of Empirical Methods in Natural Language Processing and Very Large Corpora, University of Maryland, College Park, MD, June 1999; and Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and R. L. Mercer, "The mathematics of statistical machine translation: Parameter estimation," Computational Linguistics, vol. 19(2): 263-311, 1993. In this step, multi-word phrases within a tagged entity (e.g., "New York") are treated as a single token. Next, phrases are extracted (step 103) using a method such as that of Koehn07 to generate the class-based translation model (FIG. 3, model 23). The tagged corpus is also used to train the class-based target language model (FIG. 3, model 24). Training can be accomplished using a procedure such as that described in B. Suhm and A. Waibel, "Towards better language models for spontaneous speech," in Proc. ICSLP-1994, 1994 ("Suhm94") (step 104).
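The treatment of a tagged multi-word entity as a single token before alignment can be sketched as follows. This is an illustrative sketch only: it assumes tagged text is written in an `@CLASS{words}` notation, and the function name `tokenize_tagged` is hypothetical.

```python
import re

# Matches a tagged entity such as @PLACE.city{New York}.
TAG = re.compile(r"@([A-Za-z.]+)\{([^}]*)\}")


def tokenize_tagged(sentence):
    """Split on whitespace, but emit each tagged entity as one class token."""
    tokens = []
    pos = 0
    for match in TAG.finditer(sentence):
        tokens.extend(sentence[pos:match.start()].split())
        tokens.append("@" + match.group(1))  # "New York" becomes one token
        pos = match.end()
    tokens.extend(sentence[pos:].split())
    return tokens
```

Because the aligner then sees one token per entity, the number of words inside the entity no longer affects the alignment.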

To translate an input sentence, the method shown in FIG. 11 is applied. First, the input sentence is normalized (step 105) and tagged (step 106) using a procedure similar to that applied to the training corpus. The input sentence is tagged using a monolingual tagger (FIG. 3, model 22). Next, the input sentence is decoded using the class-based MT models (FIG. 3, models 23 and 24). For class-based statistical machine translation, decoding is performed using the same procedure as in standard statistical machine translation. However, phrase pairs are matched at the class level rather than at the word level, as shown in the example below.

Given the following tagged input sentence:

the train to @PLACE.city{Wheeling} leaves at @TIME{4:30}

the following phrases can be matched:

Words or phrases within a class (e.g., @PLACE.city{Wheeling}, @TIME{4:30}) are either passed straight through, which is the case for numbers and times, or their translation is determined from the translation model. Users can add new words to the translation model via the "User Field Customization Module" (FIG. 1, module 12). If the user has previously added the city name "Wheeling" (as detailed in the example of FIG. 6), the translation model will also contain the following phrase:

Given the translation model probability $P(f_1^J \mid e_1^I)$ (FIG. 3, model 23) and the MT class-based language model probability $P(e_1^I)$ (FIG. 3, model 24), a search is performed to find the translation hypothesis with maximum likelihood $P(f_1^J \mid e_1^I) \cdot P(e_1^I)$.

Given the above input sentence and phrases, the resulting translation would be:

In this example, even though the word "Wheeling" did not appear in the training corpus, the system is able to translate it correctly once the user has entered it via the "User Field Customization Module" (FIG. 1, module 12). Furthermore, because the word class is known (in this example "@PLACE.city"), the system can select better translations for the surrounding words and will order them correctly in the translation output.
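Class-level matching with pass-through and user-dictionary lookup can be sketched as below. This is a deliberately simplified illustration, not the decoder of the source: it matches the whole class-level sentence as a single phrase instead of searching over phrase segmentations, and the Spanish target phrase, the `phrase_table` and `user_dict` contents, and all function names are hypothetical.

```python
import re

TAG = re.compile(r"@([A-Za-z.]+)\{([^}]*)\}")


def to_class_level(tagged):
    """Return the class-level sentence and the (class, text) fillers in order."""
    fillers = []

    def strip_filler(match):
        fillers.append((match.group(1), match.group(2)))
        return "@" + match.group(1)

    return TAG.sub(strip_filler, tagged), fillers


def translate(tagged, phrase_table, user_dict, pass_through=("TIME",)):
    """Match at the class level, then fill each class slot in the target."""
    source, fillers = to_class_level(tagged)
    target = phrase_table[source]  # class-level match, not word-level
    for cls, text in fillers:
        if cls.split(".")[0] in pass_through:
            rendered = text                    # numbers and times pass through
        else:
            rendered = user_dict[(cls, text)]  # entry added by the user
        target = target.replace("@" + cls, rendered, 1)
    return target


# Hypothetical toy resources for illustration only.
phrase_table = {
    "the train to @PLACE.city leaves at @TIME":
        "el tren a @PLACE.city sale a las @TIME",
}
user_dict = {("PLACE.city", "Wheeling"): "Wheeling"}
```

The phrase table never needs an entry containing "Wheeling" itself; only the class slot is matched, which is why a word added in the field can be translated immediately.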

Parallel Tagging of Multilingual Corpora

In an embodiment, a labeled parallel corpus is obtained by independently tagging each side of the training corpus with monolingual taggers and then removing inconsistent labels from each sentence pair. In this approach, for each sentence pair $(S_a, S_b)$, the label-sequence pair $(T_a, T_b)$ with maximum conditional probabilities $P(T_a, S_a)$ and $P(T_b, S_b)$ is selected. If the number of occurrences of any class tag differs between $T_a$ and $T_b$, that class tag is removed from the label-sequence pair $(T_a, T_b)$. One method of estimating $P(T_a, S_a)$ and $P(T_b, S_b)$ is to apply a conditional random field-based tagging model (Lafferty01). An example of the features used during monolingual tagging is shown in FIG. 11.
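The removal of inconsistent labels from an independently tagged sentence pair can be sketched as follows. The representation of each side as a list of `(class, text)` entities and the function name `remove_inconsistent` are assumptions made for illustration.

```python
from collections import Counter


def remove_inconsistent(entities_a, entities_b):
    """Keep only entities whose class occurs equally often on both sides."""
    count_a = Counter(cls for cls, _ in entities_a)
    count_b = Counter(cls for cls, _ in entities_b)
    # A class missing on one side has count 0 there, so it is dropped.
    keep = {cls for cls in count_a if count_a[cls] == count_b[cls]}
    kept_a = [entity for entity in entities_a if entity[0] in keep]
    kept_b = [entity for entity in entities_b if entity[0] in keep]
    return kept_a, kept_b
```

Counts are taken per entity, not per word, so a two-word city name still counts as one occurrence of its class.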

In an embodiment, labeling consistency across sentence pairs can be further improved by using, in addition to monolingual features, the target word extracted from word alignment ($w_{b,j}$ in FIG. 11).

In another embodiment, both sentences in the translation pair are labeled jointly, subject to the constraint that the class-tag sets must be equivalent. Specifically, for the sentence pair $(S_a, S_b)$, the label-sequence pair $(T_a, T_b)$ is searched for that maximizes the joint maximum conditional probability

$\lambda_a P(T_a, S_a) \cdot \lambda_b P(T_b, S_b)$, where $O_i(T_a) = O_i(T_b)$ for $1 \le i \le M$

$O_i(T_a)$: number of occurrences of class tag $i$ in label sequence $T_a$ (number of entities, not number of words)

$M$: total number of classes

$\lambda_a, \lambda_b$: scaling factors

If the performance of the monolingual models differs significantly, $\lambda_a$ and $\lambda_b$ can be optimized to improve bilingual tagging performance.
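The constrained joint search can be sketched over candidate lists. This is an assumption-laden illustration: the source does not say how the search is carried out, so the sketch simply enumerates n-best candidates from each monolingual tagger, where each candidate is a tuple of entity class labels paired with a probability. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def tag_counts(labels):
    """O_i(T): number of entities per class in a label sequence."""
    return Counter(labels)


def joint_best(nbest_a, nbest_b, lam_a=1.0, lam_b=1.0):
    """Pick (Ta, Tb) maximizing lam_a*P(Ta,Sa) * lam_b*P(Tb,Sb), given O_i(Ta) = O_i(Tb)."""
    best, best_score = None, float("-inf")
    for (ta, pa), (tb, pb) in product(nbest_a, nbest_b):
        if tag_counts(ta) != tag_counts(tb):
            continue  # violates the equal class-tag count constraint
        score = (lam_a * pa) * (lam_b * pb)
        if score > best_score:
            best, best_score = (ta, tb), score
    return best, best_score
```

Applied purely as multipliers, as the formula is written, the scaling factors rescale every score equally and do not change which pair wins; they would affect the choice only if used as exponents or log-domain weights, which is an assumption the source does not state.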

In an embodiment, if no manually annotated corpus is available for a specific language, labels can be generated by projecting labels from a first language, for which labels are known, across the sentence pairs in the training corpus to the non-annotated language. One approach to doing this is described in D. Yarowsky, G. Ngai and R. Wicentowski, "Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora," In Proc. HLT, pages 161-168, 2001 ("Yarowsky01").

Example System and Evaluation of Class-Based Machine Translation

Experimental evaluation shows that class-based machine translation, as described above, improves translation performance compared with previous approaches. It also shows that translation accuracy improves further when the parallel tagging approach described in Section 2.2.2 is used.

A system for translation between Japanese and English, developed for the tourist domain, was evaluated. A description of the training and test data is shown in Table 1.

                                                  English     Japanese
Parallel training corpus
  Number of sentence pairs                              400k
  Number of tokens                                3,257k      3,171k
  Average sentence length                         8.7         8.5
Manually tagged training data (subset of the above data)
  Training (no. sentence pairs)                         12,600
  Held-out test (no. sentence pairs)                    1,400
Test set
  Number of sentence pairs                              600
  Number of tokens                                4,393       4,669
  Average sentence length                         7.3         7.8
  OOV rate                                        0.3%        0.5%

Table 1: Training and test data

To realize effective class-based SMT, accurate and consistent tagging across sentence pairs is vital. Two approaches to improving tagging quality were investigated: first, the introduction of bilingual features from word alignment; and second, bilingual tagging, in which both sides of a sentence pair are tagged jointly. From the parallel training corpus, 14,000 sentence pairs were manually tagged using the 16 class labels shown in Table 2.

Class           Class labels
Number          cardinal, ordinal, sequence, letter
Time            time, date, day, month
Person          first name, last name
Place           city, country, landmark
Organization    airline, hotel, company name

Table 2: Classes used in the evaluation system

From this manually labeled set, 10% (1,400 sentence pairs), each containing one or more tags, was selected as held-out data to evaluate tagging accuracy.

First, the performance of the baseline monolingual CRF-based taggers was evaluated. Each side of the held-out data set was labeled independently using language-dependent models. The output was then compared with the manual reference. Tagging accuracy for various metrics is shown in Table 3.

Tagging scheme                English            Japanese           Bilingual          % correctly tagged
                              P     R     F      P     R     F      P     R     F      sentence pairs
Monolingual                   0.95  0.89  0.92   0.94  0.88  0.91   0.88  0.80  0.84   80%
  + alignment features        0.97  0.85  0.91   0.98  0.93  0.95   0.95  0.82  0.88   82%
  + remove inconsistent tags  0.99  0.83  0.90   0.99  0.82  0.90   0.99  0.81  0.89   82%
Bilingual tagging             0.98  0.92  0.95   0.98  0.92  0.95   0.97  0.90  0.93   92%
  + alignment features        0.98  0.93  0.96   0.98  0.93  0.96   0.98  0.92  0.95   92%

Table 3: Monolingual and bilingual tagging accuracy on the held-out training set

For bilingual tagging, a tag is considered correct if the entity is correctly labeled on both sides of the corpus. The right-hand column shows the percentage of sentence pairs in which both sides were tagged correctly. Although the F-score is 0.90 or above for each language individually, the bilingual tagging accuracy is significantly lower at 0.84, and only 80% of the sentence pairs were correctly tagged. Incorporating alignment features into the monolingual taggers improved precision for both languages and significantly improved recall for the Japanese side; however, the percentage of correctly tagged sentence pairs increased only slightly. Removing inconsistent tags across sentence pairs improved precision, but the number of correctly tagged sentence pairs did not improve.

Next, the effectiveness of bilingual tagging was evaluated using the approach described above. The tagging accuracy of this approach, both alone and with word-alignment features incorporated, is shown in the lower two rows of Table 3. Compared with the monolingual case, bilingual tagging significantly improved tagging accuracy. Not only did tagging consistency improve (the F-score for bilingual tagging increased from 0.84 to 0.95), but the tagging accuracy on both the English and Japanese sides also improved. Incorporating word-alignment features gave a further small improvement in tagging accuracy for all measures.

The effectiveness of the system was evaluated by comparing the performance of three class-based systems with that of a baseline system that did not use class models.

For the baseline system, phrase-based translation models were trained using the Moses toolkit as described in Koehn07 and GIZA++ (as used in Franz Josef Och, Hermann Ney, "A Systematic Comparison of Various Statistical Alignment Models," Computational Linguistics, volume 29, number 1, pp. 19-51, March 2003). 3-gram language models were trained using the SRILM toolkit of A. Stolcke, "SRILM - an extensible language modeling toolkit," In Proc. of ICSLP, pp. 901-904, 2002. Decoding was performed using the applicant's PanDoRA decoder, which is described in Ying Zhang, Stephan Vogel, "PanDoRA: A Large-scale Two-way Statistical Machine Translation System for Hand-Held Devices," In the Proceedings of MT Summit XI, Copenhagen, Denmark, Sep. 10-14, 2007. Systems were built for both translation directions, J→E (Japanese to English) and E→J (English to Japanese), using the training set described in Table 1. The data used to train the target language models was limited to this corpus. The translation quality of the baseline system was evaluated on a test set of 600 sentences. One reference was used during evaluation. The BLEU scores for the J→E and E→J systems were 0.4381 and 0.3947, respectively. The BLEU score is described in Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu, "BLEU: a Method for Automatic Evaluation of Machine Translation," In Proc. Association for Computational Linguistics, pp. 311-318, 2002. Translation quality was then evaluated using the following three tagging schemes:

+num: 8 classes related to numbers and times

+NE-class: the above, plus 8 classes for named entities

+Bi-Tagging: the above 16 classes, with the training corpus tagged bilingually

For the +num and +NE-class cases, monolingual tagging was applied and tags that were inconsistent across a sentence pair were removed. For the +Bi-Tagging case, bilingual tagging incorporating word-alignment features was used. For each tagging scheme, the entire training corpus was tagged with the appropriate set of class labels. Class-based translation and language models were then trained using the same procedure as for the baseline system. During testing, the input sentence was tagged using a monolingual tagger. All named entities in the test set were entered into the user dictionary to be used during translation.

The performance of the baseline and class-based systems on the 600-sentence test set is shown in Table 4 as BLEU scores for the J→E and E→J systems.

System          Translation quality (BLEU [Papineni02])
                J→E        E→J
Baseline        0.4381     0.3947
+num            0.4441     0.4104
+NE-class       0.5014     0.4464
+Bi-Tagging     0.5083     0.4542

Table 4: Translation quality of class-based SMT

The class-based SMT system using number and time tags (+num) obtained improved translation quality compared with the baseline system for both translation directions. For these models, the BLEU scores were 0.4441 and 0.4104. When a class-based system using named-entity classes in addition to number and time tags was applied, translation quality improved significantly. The BLEU score was 0.5014 for the J→E system and 0.4464 for the E→J system. When bilingual tagging was used to tag the training corpus (+Bi-Tagging), BLEU increased by a further 0.7 to 0.8 points for both translation directions (0.5083 and 0.4542). On the 14% of sentences in the test set that contain one or more named entities, the +Bi-Tagging system outperformed the monolingually tagged system (+NE-class) by 3.5 BLEU points.

While the present invention has been described in considerable detail, the drawings and detailed embodiments are presented for purposes of illustration and are not intended to limit the invention. Various changes in design and configuration are possible, and these remain within the principles of the invention. Those skilled in the art will appreciate that such modifications, variations, combinations of elements, alterations, equivalents or improvements are within the scope of the invention as defined in the appended claims.

Claims

A method for updating a vocabulary of a speech translation system that translates a first language into a second language, the method comprising:
Receiving, by at least one microphone of the speech translation system, speech from a user of the speech translation system, wherein the speech translation system translates the speech from the first language into the second language and outputs an audible translation of the speech in the second language from at least one speaker of the speech translation system;
After receiving the speech, receiving from the user, via a user interface of the speech translation system, an indication to add a new word to a first recognition lexicon of the first language of an automatic speech recognition module of the speech translation system, wherein the automatic speech recognition module comprises the first recognition lexicon, an acoustic model for the first language, and a language model for the first language, and wherein the new word is not included in the first recognition lexicon of the first language, the acoustic model, or the language model;
Determining, by the speech translation system, word class information for the new word, a pronunciation in the first language, and a translation in the second language;
Adding, by the speech translation system, the new word to the first recognition lexicon of the first language of the speech translation system, together with the word class information determined by the speech translation system and the pronunciation in the first language; and
Adding, by the speech translation system, the new word to a first machine translation module associated with the first language of the speech translation system, together with the word class information determined by the speech translation system and the translation in the second language, wherein the first machine translation module includes a first tagging module, a first translation model, and a first language module, and is configured to translate the new word into a corresponding translation word in the second language.
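
By way of illustration only, the two adding steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `RecognitionLexicon`, `MachineTranslationModule`, and `add_new_word`, the `@CITY` class label, and the phoneme string are all hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass, field


@dataclass
class RecognitionLexicon:
    """Hypothetical ASR lexicon: word -> (pronunciation, word class)."""
    entries: dict = field(default_factory=dict)

    def add(self, word, pronunciation, word_class):
        self.entries[word] = (pronunciation, word_class)


@dataclass
class MachineTranslationModule:
    """Hypothetical class-based MT store: (word class, word) -> translation."""
    class_entries: dict = field(default_factory=dict)

    def add(self, word, translation, word_class):
        self.class_entries[(word_class, word)] = translation

    def translate_word(self, word, word_class):
        return self.class_entries.get((word_class, word))


def add_new_word(lexicon, mt, word, pronunciation, word_class, translation):
    """Register a new word in both components without rebuilding either."""
    if word in lexicon.entries:
        return False  # already known, nothing to add
    lexicon.add(word, pronunciation, word_class)
    mt.add(word, translation, word_class)
    return True


lexicon = RecognitionLexicon()
mt = MachineTranslationModule()
# Illustrative values only: pronunciation and class label are assumptions.
added = add_new_word(
    lexicon, mt, "Pittsburgh", "P IH T S B ER G", "@CITY", "ピッツバーグ"
)
```

Because both stores in this sketch are plain in-memory mappings, the new entry is usable immediately after the call, which mirrors the idea of updating without reinitializing or restarting a module.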

The method according to claim 1,
Wherein adding the new word to the first recognition lexicon of the first language and adding the new word to the first machine translation module are performed without reinitializing or restarting the automatic speech recognition module.

The method according to claim 1,
Wherein the step of adding the new word to the first machine translation module is performed without reinitializing or restarting the first machine translation module.

The method according to claim 1, further comprising:
Translating the corresponding translation word from the second language back into the new word of the first language by a second machine translation module associated with the second language of the speech translation system;
Correlating the new word in the first language with the corresponding translation word in the second language, and adding the corresponding translation word and the word class information to a second recognition lexicon of the second language; and
Updating the second machine translation module with the corresponding translation word and the word class information, wherein the second machine translation module includes a second tagging module, a second translation model, and a second language module.

The method according to claim 1,
Further comprising inputting the corresponding translation word into a text-to-speech pronunciation lexicon associated with the second language.

The method according to claim 4,
Further comprising inputting the new word into a text-to-speech pronunciation lexicon associated with the first language.

The method according to claim 4,
Wherein the step of adding the corresponding translation word to the second recognition lexicon further comprises increasing the relative word probability of the new word within its class in a class-based language model associated with the second language.

The method according to claim 1, further comprising:
Translating the new word of the first language into a corresponding translation word of the second language and of one or more other languages;
Correlating the new word with the corresponding third or further words of the one or more other languages, respectively;
Adding the third or further words of the one or more other languages to a recognition lexicon associated with each of the one or more other languages of the speech translation system; and
Updating machine translation modules associated with the one or more other languages, wherein each machine translation module includes a tagging module, a translation model, and a language module.

The method according to claim 1, further comprising:
Recognizing, by the speech translation system, the new word in the speech received from the user using a confidence measure and a new-word model; and
Prompting the user, through the user interface of the speech translation system, to add the new word.
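
As a hedged illustration of how a confidence measure and a new-word model might be combined to flag candidate new words, consider the sketch below. The function name `find_new_word_candidates`, the threshold value, and the `<unk>` token standing in for the output of a new-word model are assumptions made for this example only.

```python
def find_new_word_candidates(hypothesis, threshold=0.5, unknown_token="<unk>"):
    """Return positions in a recognition hypothesis that may hold a new word.

    `hypothesis` is a list of (word, confidence) pairs. A position is flagged
    when the recognizer emitted the hypothetical new-word token or when its
    confidence falls below `threshold`.
    """
    candidates = []
    for position, (word, confidence) in enumerate(hypothesis):
        if word == unknown_token or confidence < threshold:
            candidates.append(position)
    return candidates


# Illustrative hypothesis: the final segment was absorbed by the new-word model.
hyp = [("I", 0.95), ("am", 0.92), ("going", 0.90), ("to", 0.93), ("<unk>", 0.31)]
positions = find_new_word_candidates(hyp)
```

A user interface could then prompt the user to add a word at each flagged position.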

The method according to claim 1,
Wherein adding the new word to the first recognition lexicon of the first language further comprises increasing the relative word probability of the new word within its class in a class-based language model associated with the first language.
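
For illustration, a class-based language model commonly factors a word probability as P(w | h) = P(C | h) · P(w | C), so raising the relative probability of a word within its class amounts to reweighting P(w | C) and renormalizing. The sketch below assumes this factorization; the function name `boost_word_in_class` and the multiplicative `boost` parameter are hypothetical and not taken from the original text.

```python
def boost_word_in_class(class_probs, word, boost):
    """Return a new within-class distribution P(w | C) with `word` boosted.

    `class_probs` maps each word of one class to P(w | C); `boost` is a
    hypothetical multiplicative factor (> 1 raises the word's relative weight).
    """
    updated = dict(class_probs)
    floor = min(updated.values()) if updated else 1.0
    # A word new to the class starts from the smallest existing probability.
    updated[word] = updated.get(word, floor) * boost
    total = sum(updated.values())
    # Renormalize so the within-class probabilities sum to one again.
    return {w: p / total for w, p in updated.items()}


city_class = {"Tokyo": 0.5, "Osaka": 0.3, "Kyoto": 0.2}
city_class = boost_word_in_class(city_class, "Pittsburgh", boost=2.0)
```

Only the within-class distribution changes in this sketch; the class-level probabilities P(C | h) are left untouched, so the rest of the model does not need to be retrained.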

The method according to claim 10,
Wherein increasing the relative word probability of the new word associated with the first language is performed outside the known classes, by associating the new word with an unknown class and increasing the probability of the new word within the unknown-word class.

The method according to claim 10,
Wherein adding the new word to the first recognition lexicon of the first language further comprises increasing the translation probability of the new word.

The method according to claim 1,
Wherein the step of determining the word class information comprises accepting word class information provided by the user.

The method according to claim 1,
Wherein the step of determining the word class information comprises selecting one or more possible descriptions from a dictionary associated with the speech translation system and displaying the one or more possible descriptions for user acceptance.

The method according to claim 1,
Wherein the step of determining the word class information comprises automatically generating a hypothesis using a user field customization module of the speech translation system.

The method according to claim 15,
Wherein the step of generating the hypothesis is learned from translated speech data.

The method according to claim 15,
Further comprising selecting, by the speech translation system, a most probable word class for the new word based on co-occurrence statistics of other words having a similar known class.
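
One possible way to select a most probable word class from co-occurrence statistics is sketched below, purely as an illustration. It scores each known class by an add-one smoothed log-likelihood of the context words seen around the new word; the function name `most_probable_class`, the class labels, and the counts are all hypothetical.

```python
import math
from collections import Counter


def most_probable_class(context_words, class_context_counts):
    """Pick the class whose known members co-occur most with `context_words`.

    `class_context_counts` maps a class label to a Counter of context words
    observed around known members of that class (hypothetical statistics).
    """
    vocab = set(context_words)
    for counts in class_context_counts.values():
        vocab.update(counts)
    best_class, best_score = None, -math.inf
    for label, counts in class_context_counts.items():
        total = sum(counts.values())
        # Add-one smoothed log-likelihood of the observed context under the class.
        score = sum(
            math.log((counts[w] + 1) / (total + len(vocab))) for w in context_words
        )
        if score > best_score:
            best_class, best_score = label, score
    return best_class


stats = {
    "@CITY": Counter({"to": 8, "flight": 5, "in": 7}),
    "@FOOD": Counter({"eat": 9, "order": 6, "delicious": 4}),
}
guess = most_probable_class(["flight", "to"], stats)
```

In this toy example, a new word heard next to "flight" and "to" is assigned to the class whose known members appear in similar contexts.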

The method according to claim 1, further comprising:
After the step of determining the pronunciation and the translation for the new word, prompting the user, through the user interface of the speech translation system, to verify the pronunciation and the translation of the new word; and
Upon receiving from the user verification of the pronunciation and the translation of the new word:
Adding the new word to the first recognition lexicon of the first language of the speech translation system, along with the word class information determined by the speech translation system and the pronunciation in the first language; and
Adding the new word to the first machine translation module associated with the first language of the speech translation system, along with the word class information determined by the speech translation system and the pronunciation in the second language.

The method according to claim 18,
Wherein the speech translation system determines the word class information, the pronunciation, and the translation for the new word without prompting the user for input that requires expertise in computer speech and language processing techniques.

The method according to claim 1,
Wherein the speech translation system determines the word class information using the first tagging module.

The method according to claim 20,
Wherein the speech translation system determines the word class information based on the surrounding context of the new word.

The method according to claim 1,
Wherein the step of determining the pronunciation of the new word in the first language comprises determining the pronunciation by machine learning.

The method according to claim 22,
Wherein the step of determining the translation of the new word in the second language comprises determining the translation by statistical machine translation based on the pronunciation of the new word in the first language.

The method according to claim 23,
Further comprising determining, by the speech translation system, a pronunciation of the new word in the second language based on the translation of the new word in the second language.

A field-maintainable class-based translation device, comprising:
At least one microphone for receiving audible speech in a first language from a user of the device;
An automatic speech recognition module for the first language in communication with the at least one microphone, the automatic speech recognition module comprising a recognition lexicon for the first language, an acoustic model for the first language, and a language model for the first language;
A machine translation module in communication with the automatic speech recognition module for translating the output of the automatic speech recognition module from the first language into a second language;
A user interface through which the user inputs an indication to add a new word to the recognition lexicon of the automatic speech recognition module for the first language; and
A correction and repair module for determining word class information for the new word, a pronunciation in the first language, and a translation in the second language,
Wherein, when the new word is not included in the recognition lexicon of the first language and is not included in the language model of the first language, the new word is added to the recognition lexicon for the first language together with the word class information determined by the correction and repair module and the pronunciation in the first language; and
Wherein the new word is added to the machine translation module together with the word class information determined by the correction and repair module and the translation in the second language, so that the machine translation module translates the new word into a corresponding translation word in the second language.

The device according to claim 25,
Further comprising a second machine translation module associated with the second language for translating a second new word in the second language into a second translated word in the first language.

The device according to claim 25,
Wherein the user interface accepts an orthographic input in the language of the user.

The device according to claim 25,
Wherein the language model of the first language is updated with at least one update based on a correction of an error made in a speech recognition hypothesis, and the at least one update modifies probabilities in the language model for the first language of the automatic speech recognition module so as to reduce the likelihood of the same error recurring, by increasing the language model probability of the corrected word sequence and decreasing the language model probability of closely competing hypotheses.
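
The kind of update described above can be illustrated with a toy bigram model. This is a sketch under stated assumptions rather than the claimed update rule: the function name `apply_correction`, the `step` size, and the multiplicative scaling followed by renormalization are choices made only for this example.

```python
def apply_correction(bigram_probs, corrected, misrecognized, step=0.1):
    """Shift probability toward a corrected word sequence.

    `bigram_probs[h][w]` holds P(w | h). `step` is a hypothetical update size.
    Bigrams of the corrected sequence are scaled up, bigrams that occur only
    in the misrecognized hypothesis are scaled down, and each touched history
    is renormalized so its probabilities still sum to one.
    """
    good = set(zip(corrected, corrected[1:]))
    bad = set(zip(misrecognized, misrecognized[1:])) - good
    touched = set()
    for pairs, factor in ((good, 1.0 + step), (bad, 1.0 - step)):
        for history, word in pairs:
            if word in bigram_probs.get(history, {}):
                bigram_probs[history][word] *= factor
                touched.add(history)
    for history in touched:
        total = sum(bigram_probs[history].values())
        for word in bigram_probs[history]:
            bigram_probs[history][word] /= total
    return bigram_probs


# Toy model: the recognizer preferred "Tokyo" but the user corrected to "Kyoto".
lm = {"to": {"Kyoto": 0.4, "Tokyo": 0.6}}
lm = apply_correction(lm, ["go", "to", "Kyoto"], ["go", "to", "Tokyo"])
```

After the update, the corrected continuation is more probable and the competing one less probable, so the same misrecognition becomes less likely on the next utterance.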

The device according to claim 25, further comprising:
An automatic speech recognition module for the second language in communication with the at least one microphone for recognizing speech in the second language received by the at least one microphone, the automatic speech recognition module including an acoustic model for the second language and a language model for the second language;
A machine translation module in communication with the automatic speech recognition module for the second language for translating the output of that automatic speech recognition module from the second language into the first language; and
A second text-to-speech module in communication with the machine translation module for the second language for generating an audible translation of the speech in the first language.

The device according to claim 25,
Further comprising a text-to-speech module in communication with the machine translation module for generating an audible translation of the speech in the second language.

The device according to claim 30,
Further comprising at least one speaker in communication with the text-to-speech module for outputting the audible translation of the speech in the second language.

(Deleted)