KR20110066622A

KR20110066622A - Apparatus and method of interpreting an international conference based speech recognition

Info

Publication number: KR20110066622A
Application number: KR1020090123354A
Authority: KR
Inventors: 강점자; 이윤근; 정호영; 이성주; 강병옥; 박기영; 김종진; 박전규; 왕지현; 전형배; 정의석; 정훈; 박상규
Original assignee: 한국전자통신연구원
Priority date: 2009-12-11
Filing date: 2009-12-11
Publication date: 2011-06-17
Also published as: KR101233655B1

Abstract

PURPOSE: A speech-recognition-based international conference interpretation apparatus and method are provided to supply attendees with text data or synthesized speech interpreted into their native languages. CONSTITUTION: A conference participant information registration unit (100) registers conference participant information, including the language used by each conference participant. A speech recognition unit (200) registers in advance key words drawn from each participant's presentation content and outputs a speech recognition result in key word form. A language translation unit (300) converts the result into the target language corresponding to the language used by each conference participant.

Description

Apparatus and method of interpreting an international conference based speech recognition

The present invention relates to a speech-recognition-based international conference interpretation apparatus and method, and more particularly to an apparatus and method that support smooth communication among international conference attendees who speak different languages.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Project of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2006-S-036-04, Project title: Development of large-capacity interactive distributed-processing speech interface technology for new growth engine industries].

International conferences are generally conducted either with interpreters who simultaneously interpret the presenter's language, or in English as the common world language. The first approach requires a simultaneous interpreter for each language involved; with the second, comprehension varies widely with each attendee's English proficiency.

The present invention has been proposed in view of the conventional circumstances described above. Its object is to provide a speech-recognition-based international conference interpretation apparatus and method that interpret the key words mentioned in a conference into various languages, so that attendees with differing levels of English comprehension can understand them.

To achieve the above object, a speech-recognition-based international conference interpretation apparatus according to a preferred embodiment of the present invention includes: a conference participant information registration unit that registers in advance, for each conference participant, conference participant information including the language used by that participant in a multi-party conference; a speech recognition unit that registers in advance key words drawn from each participant's presentation content, recognizes the speech accompanying a participant's presentation on the basis of the pre-registered key words, and outputs a speech recognition result in key word form; and a language translation unit that analyzes the key-word-form speech recognition result, converts it into the target language corresponding to the pre-registered language of each conference participant, and outputs the result.

The speech recognition unit includes: a key word extraction unit that extracts key words from each participant's presentation content and stores them in a key word database; a speech receiving unit that receives the speech accompanying a participant's presentation; a preprocessing unit that extracts feature vectors from the received speech; and a decoding unit that decodes the extracted feature vectors and outputs a speech recognition result in key word form on the basis of the key words stored in the key word database.

The speech recognition unit registers the key words in advance, for each conference participant, before the multi-party conference.

The key words registered in advance in the speech recognition unit include subjects, nouns, and verbs.

The speech recognition unit outputs a result in text form based on the key words.

The language translation unit includes: a language analysis unit that receives the speech recognition result from the speech recognition unit and identifies the input language; a user language information analysis unit that determines the languages used by the conference participants; and a conversion unit that, for each conference participant, maps the received speech recognition result from the input language to the output language on the basis of a bilingual dictionary, thereby converting it into that participant's target language.

The language translation unit produces its output as synthesized speech or text.

The speech-recognition-based international conference interpretation apparatus interprets and relays a multi-party conference among remotely located conference participants over a communication network.

A speech-recognition-based international conference interpretation method according to a preferred embodiment of the present invention includes: a conference participant information registration step in which a conference participant information registration unit registers in advance, for each conference participant, conference participant information including the language used by that participant in a multi-party conference; a speech recognition step in which a speech recognition unit registers in advance key words drawn from each participant's presentation content, recognizes the speech accompanying a participant's presentation on the basis of the pre-registered key words, and outputs a speech recognition result in key word form; and a language translation step in which a language translation unit analyzes the key-word-form speech recognition result, converts it into the target language corresponding to the pre-registered language of each conference participant, and outputs the result.

The speech recognition step includes: a key word extraction step of extracting key words from each participant's presentation content and storing them in a key word database; a speech receiving step of receiving the speech accompanying a participant's presentation; a preprocessing step of extracting feature vectors from the received speech; and a decoding step of decoding the extracted feature vectors and outputting a speech recognition result in key word form on the basis of the key words stored in the key word database.

In the speech recognition step, the key words are registered in advance, for each conference participant, before the multi-party conference.

The key words registered in advance in the speech recognition step include subjects, nouns, and verbs.

The speech recognition result produced by the speech recognition step is a result in text form based on the key words.

The language translation step includes: a language analysis step of receiving the speech recognition result from the speech recognition step and identifying the input language; a user language information analysis step of determining the languages used by the conference participants; and a conversion step of mapping, for each conference participant, the received speech recognition result from the input language to the output language on the basis of a bilingual dictionary, thereby converting it into that participant's target language.

In the language translation step, the output is produced as synthesized speech or text.

According to the present invention configured in this way, presenters at an international conference can speak freely in their native languages, and attendees are provided with text data or synthesized speech interpreted into their own native languages. This facilitates the smooth progress of the conference and improves attendees' understanding of its content.

Hereinafter, a speech-recognition-based international conference interpretation apparatus and method according to an embodiment of the present invention are described with reference to the accompanying drawings. Before the detailed description, note that the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. The embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical ideas; it should therefore be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

FIG. 1 is a diagram illustrating a system to which a speech-recognition-based international conference interpretation apparatus according to an embodiment of the present invention is applied.

The apparatus (15) of the embodiment is connected to a communication network (for example, the Internet (14)). A remotely located conference participant connected to the Internet (14) can access the apparatus (15) through his or her terminal (any one of 11, 12, and 13) and receive the speech-recognition-based international conference interpretation service.

The terminals (11, 12, 13) may be any terminals that can connect to the apparatus (15) via the Internet (14) and that support voice input and output, on-screen text display, and the like.

The apparatus (15) performs international conference interpretation based on speech recognition during a multi-party international conference. The apparatus (15) is described in more detail below.

In the case of FIG. 1, conference participants are asked to register the language they use before the conference begins.

Then, when an attendee in one of the attendee groups says in Korean, through a terminal (for example, 11), “안녕하세요 저는 강점자입니다” (“Hello, I am Jeomja Kang”), the voice data is transmitted over the Internet (14) to the apparatus (15) (16, 17).

The apparatus (15) performs speech recognition on the received voice data, “안녕하세요. 저는 강점자입니다.”

As a result of speech recognition, the apparatus (15) recognizes the utterance as the key word sequence “안녕 저 강점자” (“hello”, “I”, “Jeomja Kang”). The apparatus (15) translates this key-word-based recognition result into each target language and delivers the result (18, 19, 20, 21).

For example, if the language registered in advance by the conference participants at terminals (11, 13) is English, those participants receive the service in English, as in “Hello, I’m JeomJakang” (20, 21). If the language registered in advance by the conference attendee at terminal (12) is Japanese, that participant receives the service in Japanese, as in “こんにちは。私はカン　ジョンジャと申します” (19).

Here, the interpretation result displayed on the screen uses a keyword (key word) output method that outputs only the core words rather than the full sentence uttered by the speaker. The keyword output method is used because the recognition success rate for conversational continuous speech is low, so the system relies on recognizing and outputting the keywords that matter for the progress of the conference. At present, the AT&T LVCSR (Large Vocabulary Continuous Speech Recognition) system, developed for conversational telephone speech, has a word recognition rate of 71.6%, which is a low success rate.

The apparatus (15) described above can also be used effectively offline.

FIG. 2 is a block diagram showing the configuration of a speech-recognition-based international conference interpretation apparatus according to an embodiment of the present invention.

The speech-recognition-based international conference interpretation apparatus according to an embodiment of the present invention includes a conference participant information registration unit (100), a call control unit (102), a speech recognition unit (200), and a language translation unit (300).

The conference participant information registration unit (100) registers in advance, for each conference participant, conference participant information including the language used by that participant in the multi-party conference. Here, the conference participant information includes the language used, the participant's personal details, an ID, a password, and the like.
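
The registration described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the class names, field names, and language codes are hypothetical and do not come from the specification, and the password is kept in plain text purely to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    participant_id: str   # hypothetical field names, not from the specification
    password: str         # plain text only to keep the sketch short
    language: str         # language code chosen by the participant, e.g. "ko"
    name: str = ""        # optional personal detail


class ParticipantRegistry:
    """Holds participant records registered before the conference starts."""

    def __init__(self):
        self._by_id = {}

    def register(self, participant):
        if participant.participant_id in self._by_id:
            raise ValueError(f"duplicate ID: {participant.participant_id}")
        self._by_id[participant.participant_id] = participant

    def language_of(self, participant_id):
        return self._by_id[participant_id].language

    def languages(self):
        """Distinct languages that the translation stage must produce."""
        return {p.language for p in self._by_id.values()}


registry = ParticipantRegistry()
registry.register(Participant("p1", "pw1", "ko"))
registry.register(Participant("p2", "pw2", "ja"))
registry.register(Participant("p3", "pw3", "en"))
```

Keying the records by participant ID lets a later stage look up each listener's language with a single call such as `registry.language_of("p2")`.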

The call control unit (102) performs the call control required for the multi-party conference.

The speech recognition unit (200) registers in advance key words (for example, subjects, nouns, and verbs) drawn from each participant's presentation content, recognizes the speech accompanying a participant's presentation on the basis of the pre-registered key words, and outputs a speech recognition result in key word form.

The speech recognition unit (200) includes a speech receiving unit (202), a preprocessing unit (204), a decoding unit (206), a key word extraction unit (208), a key word database (210), and an acoustic model database (212).

The speech receiving unit (202) receives and buffers the speech accompanying a participant's presentation. The preprocessing unit (204) removes the noise mixed into the voice data from the speech receiving unit (202), finds the start point and end point of the speech segment in the noise-free data, and extracts feature vectors.

The key word extraction unit (208) extracts key words from each participant's presentation content, which is entered before the multi-party conference begins, and stores them in the key word database (210). The key word extraction unit (208) extracts the key words from presentation content supplied by the conference participant information registration unit (100) or the call control unit (102). Of course, the key words may instead be extracted from presentation content entered through a separate microphone rather than through the conference participant information registration unit (100) or the call control unit (102). For convenience, the remainder of this specification assumes that the call control unit (102) is equipped with a microphone (not shown), that the presentation content is entered in advance into the speech recognition unit (200) through the call control unit (102), and that the key word extraction unit (208) extracts the key words from it.

The acoustic model database (212) stores standard-pattern acoustic models, one per speech sound, built on hidden Markov models. The decoding unit (206) decodes the feature vectors extracted by the preprocessing unit (204) and outputs a speech recognition result in key word form on the basis of the key words stored in the key word database (210), the standard-pattern acoustic models in the acoustic model database (212), and the like. Preferably, the decoding unit (206) performs a Viterbi search using the feature vectors extracted by the preprocessing unit (204), the hidden-Markov-based standard-pattern acoustic models, a class-based finite state network (FSN), and a dictionary. As the result of decoding, the decoding unit (206) outputs a speech recognition result in key word (keyword) form.
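
The Viterbi search mentioned above can be illustrated on a toy discrete hidden Markov model. The sketch below is an assumption-laden illustration, not the decoder of the specification: the two states, the coarse "low"/"high" observation labels, and all probabilities are invented, and the class-based FSN and the dictionary are left out entirely.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def logs(table):
    """Convert a nested dict of probabilities into log-probabilities."""
    return {k: {k2: math.log(v2) for k2, v2 in v.items()} for k, v in table.items()}


# Toy model with invented numbers: silence ("sil") versus key word ("kw").
STATES = ("sil", "kw")
LOG_START = {"sil": math.log(0.8), "kw": math.log(0.2)}
LOG_TRANS = logs({"sil": {"sil": 0.6, "kw": 0.4}, "kw": {"sil": 0.3, "kw": 0.7}})
LOG_EMIT = logs({"sil": {"low": 0.9, "high": 0.1}, "kw": {"low": 0.2, "high": 0.8}})

score, path = viterbi(["low", "high", "high", "low"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
# path is ["sil", "kw", "kw", "sil"]
```

Working in log-probabilities avoids numeric underflow on long observation sequences, which is why the sketch sums scores instead of multiplying probabilities.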

The language translation unit (300) analyzes the key-word-form speech recognition result, converts it into the target language corresponding to the pre-registered language of each conference participant, and outputs the result.

The language translation unit (300) includes a language analysis unit (302), a user language information analysis unit (304), a conversion unit (308), an output unit (310), and a user information database (312).

The language analysis unit (302) receives the speech recognition result from the speech recognition unit (200) and determines which language was input (for example, Korean, English, Japanese, Chinese, or French).
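
The specification does not say how the input language is determined. One simple possibility, shown here purely as an assumed stand-in, is to look at the Unicode script of the recognized text. The function name and the returned codes are hypothetical, and the approach cannot separate languages that share a script (English and French both come back as "latin").

```python
def guess_language(text: str) -> str:
    """Guess a language code from the Unicode script of the characters."""
    counts = {"ko": 0, "ja": 0, "zh": 0, "latin": 0}
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3 or 0x1100 <= code <= 0x11FF:
            counts["ko"] += 1          # Hangul syllables and jamo
        elif 0x3040 <= code <= 0x30FF:
            counts["ja"] += 1          # Hiragana and Katakana
        elif 0x4E00 <= code <= 0x9FFF:
            counts["zh"] += 1          # CJK unified ideographs
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    if counts["ja"]:
        return "ja"                    # kana are treated as decisive, even next to kanji
    best = max(counts, key=counts.get)
    return best if counts[best] else "unknown"
```

For the example utterance, `guess_language("안녕 저 강점자")` returns `"ko"`. A production system would more likely know the speaker's language from the registration data or use a statistical language identifier.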

The user language information analysis unit (304) determines which languages the conference participants use, based on the information stored in the user information database (312). This analysis is possible because the user information database (312) holds the user information registered by the conference participant information registration unit (100), that is, the conference participant information including the language used, the participant ID, the password, and the like.

The conversion unit (308) converts the received speech recognition result into each participant's target language by performing, for each conference participant, a mapping from the input language to the output language based on its built-in bilingual dictionary database (306).
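
The per-participant dictionary mapping can be sketched as a word-by-word lookup. This is a minimal illustration, assuming a dictionary keyed by (source language, target language); the table contents, function names, and participant IDs are hypothetical, and unknown words are simply passed through unchanged.

```python
# Hypothetical bilingual dictionary: (source language, target language) -> word map.
BILINGUAL = {
    ("ko", "en"): {"안녕": "hello", "저": "I", "강점자": "Jeomja Kang"},
    ("ko", "ja"): {"안녕": "こんにちは", "저": "私", "강점자": "カン ジョンジャ"},
}


def convert(keywords, source_lang, target_lang):
    """Map each key word to the target language; unknown words pass through."""
    if source_lang == target_lang:
        return list(keywords)
    table = BILINGUAL.get((source_lang, target_lang), {})
    return [table.get(word, word) for word in keywords]


def convert_for_all(keywords, source_lang, participant_langs):
    """Return one converted key word list per participant ID."""
    return {
        pid: convert(keywords, source_lang, lang)
        for pid, lang in participant_langs.items()
    }


result = convert_for_all(
    ["안녕", "저", "강점자"], "ko", {"p1": "ko", "p2": "ja", "p3": "en"}
)
# result["p3"] is ["hello", "I", "Jeomja Kang"]
```

Because the recognizer emits key words rather than full sentences, a plain lookup of this kind needs no grammar handling; that trade-off is what makes the key word approach tractable.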

The output unit (310) outputs the result that the conversion unit (308) has converted into each participant's target language: as synthesized speech when the output device has no screen, and in text form when the output device has a screen.
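
The screen-dependent choice of output form can be sketched as below. The `OutputDevice` type and the `synthesize` placeholder are assumptions made for illustration; the placeholder returns fake bytes and stands in for whatever text-to-speech engine an implementation would use.

```python
from dataclasses import dataclass


@dataclass
class OutputDevice:
    device_id: str     # hypothetical descriptor of a participant's terminal
    has_screen: bool


def synthesize(text: str, language: str) -> bytes:
    """Placeholder for a text-to-speech engine; returns fake audio bytes."""
    return f"<audio lang={language}>{text}</audio>".encode("utf-8")


def render(keywords, language, device):
    """Return ("text", str) for devices with a screen, else ("audio", bytes)."""
    text = " ".join(keywords)
    if device.has_screen:
        return "text", text
    return "audio", synthesize(text, language)
```

For instance, `render(["hello", "I"], "en", OutputDevice("d1", True))` returns `("text", "hello I")`, while the same call for a screenless device returns an `"audio"` payload.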

FIG. 3 is a flowchart illustrating a speech-recognition-based international conference interpretation method according to an embodiment of the present invention.

First, before the speech-recognition-based international conference interpretation service starts, each participant in the multi-party conference registers conference participant information, including the language used, and registers the key words in the content to be presented at the conference (S10). That is, the conference participant information entered through the conference participant information registration unit (100) is stored in the user information database (312) of the language translation unit (300). The presentation content, entered in advance through the microphone (not shown) of the call control unit (102), is input to the key word extraction unit (208) of the speech recognition unit (200), and the key words in the presentation content are stored in the key word database (210) through the extraction process of the key word extraction unit (208). The information stored in the user information database (312) and the key word database (210) is classified and stored by country and by conference participant, or stored in some other form. The present invention extracts and stores the key words in advance for key-word-based speech recognition in order to raise the recognition success rate: registering them beforehand makes recognition both faster and more accurate. For example, subjects, nouns, and verbs are defined as key words. The speech recognition unit (200) builds a grammar network subdivided into classes by domain, which increases system speed and the recognition success rate.
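
The advance registration of key words can be sketched as follows. The specification defines key words by part of speech (subjects, nouns, verbs), which would normally require a part-of-speech tagger; the stop-word filter below is only a crude, assumed stand-in, and the class name, method names, and storage layout keyed by (country, participant ID) are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stop list; a real system would use a part-of-speech tagger.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on", "we"}


def extract_keywords(presentation_text: str) -> list:
    """Return distinct non-stop-word tokens in order of first appearance."""
    seen = []
    for token in re.findall(r"[a-z]+", presentation_text.lower()):
        if token not in STOP_WORDS and token not in seen:
            seen.append(token)
    return seen


class KeywordDatabase:
    """Key words grouped by (country, participant ID), as one possible layout."""

    def __init__(self):
        self._store = defaultdict(list)

    def register(self, country: str, participant_id: str, text: str) -> None:
        for word in extract_keywords(text):
            if word not in self._store[(country, participant_id)]:
                self._store[(country, participant_id)].append(word)

    def vocabulary(self, country: str, participant_id: str) -> list:
        return list(self._store.get((country, participant_id), []))


db = KeywordDatabase()
db.register("KR", "p1", "We present a keyword decoder for the conference.")
# db.vocabulary("KR", "p1") is ["present", "keyword", "decoder", "conference"]
```

Restricting the decoder to a small per-speaker vocabulary of this kind is what the text credits for the gains in speed and recognition success rate.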

After this registration process is complete, the call control unit (102) issues a call start command to all conference participants, thereby starting the international conference interpretation service (S12).

As long as the call has not been terminated ("No" in S14), the presenter begins speaking naturally in his or her native language (S16).

Accordingly, the speech receiving unit (202) of the speech recognition unit (200) buffers the incoming voice data, and the preprocessing unit (204) preprocesses the buffered voice data. That is, the preprocessing unit (204) removes the noise mixed into the voice data, finds the start point and end point of the speech segment in the noise-free data, and extracts feature vectors. The decoding unit (206) then performs a Viterbi search using the feature vectors extracted by the preprocessing unit (204), the hidden-Markov-based standard-pattern acoustic models, a class-based finite state network (FSN), and a dictionary (that is, the key word database (210)). As the result of decoding, the decoding unit (206) outputs a speech recognition result in key word (keyword) form (S18).
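
The start-point and end-point detection in the preprocessing stage can be illustrated with a simple frame-energy threshold. This is a sketch under assumptions: the specification does not name a detection method, the frame length and threshold are arbitrary, noise removal is omitted, and the per-frame energy used as a "feature" is a toy substitute for the spectral feature vectors a real recognizer would compute.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of the speech region, or None.

    start is the first sample of the first frame above the threshold;
    end is one past the last sample of the last such frame.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a burst of "speech", then silence again.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4] * 3 + [0.0] * 8
start, end = find_endpoints(signal)
features = frame_energies(signal[start:end], 4)   # one toy feature per frame
# (start, end) is (8, 20)
```

Trimming the signal to the detected segment before feature extraction keeps the decoder from spending search effort on leading and trailing silence.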

When the keyword-form speech recognition result is input to the language translation unit (300), the language analysis unit (302) of the language translation unit (300) analyzes the recognized text and determines which language was input (for example, Korean, English, Japanese, or Chinese) (S20).

The user language information analysis unit (304) then determines which languages the conference attendees use, based on the information stored in the user information database (312) (S22). Because each attendee may use a different language, this step is needed so that the output can be produced in the language that corresponds to each attendee.

The conversion unit (308) then uses its built-in bilingual dictionary database (306) to perform a bilingual dictionary mapping from the input language to the output language, converting the result into each attendee's target language (S24, S26).

When the conversion into each attendee's target language is complete, the output unit (310) outputs the converted result as synthesized speech if the output device has no screen, and in text form if the output device has a screen (S28).

This speech-recognition-based international conference interpretation service ends when the call is terminated by the call control unit (102) ("Yes" in S14).
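
The loop formed by steps S14 to S28 can be summarized in a short control-flow sketch. It assumes that registration (S10) and call start (S12) have already happened, uses `None` as a hypothetical call-termination marker, and takes the recognition, translation, and delivery stages as caller-supplied functions; the stub lambdas in the usage lines are placeholders, not real recognizers or translators.

```python
def run_session(utterances, recognize, translate, deliver):
    """Process utterances until a call-termination marker (None) arrives.

    utterances: iterable of (speaker_id, audio) pairs, or None to end the call.
    recognize, translate, deliver: callables standing in for steps S18 to S28.
    Returns the number of utterances handled.
    """
    handled = 0
    for item in utterances:
        if item is None:                                   # S14: call terminated
            break
        speaker_id, audio = item                           # S16: presenter speaks
        keywords = recognize(audio)                        # S18: key word recognition
        for pid, output in translate(keywords).items():    # S20 to S26
            deliver(pid, output)                           # S28: text or synthesized speech
        handled += 1
    return handled


# Usage with stub stages: "audio" is plain text and translation is upper-casing.
sent = []
count = run_session(
    [("p1", "hello world"), None, ("p1", "ignored")],
    recognize=lambda audio: audio.split(),
    translate=lambda kws: {"p2": [k.upper() for k in kws]},
    deliver=lambda pid, out: sent.append((pid, out)),
)
# count is 1 and sent is [("p2", ["HELLO", "WORLD"])]
```

Passing the stages in as functions keeps the loop independent of any particular recognizer or dictionary, which mirrors the block structure of FIG. 2.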

The present invention is not limited to the embodiments described above and may be practiced with modifications and variations within a scope that does not depart from the gist of the invention; technical ideas incorporating such modifications and variations should also be regarded as falling within the scope of the following claims.

Claims

A speech-recognition-based international conference interpretation apparatus comprising:

a conference participant information registration unit that registers in advance, for each conference participant, conference participant information including the language used by that participant in a multi-party conference;

a speech recognition unit that registers in advance key words drawn from each conference participant's presentation content, recognizes the speech accompanying a conference participant's presentation on the basis of the pre-registered key words, and outputs a speech recognition result in key word form; and

a language translation unit that analyzes the key-word-form speech recognition result, converts it into the target language corresponding to the pre-registered language of each conference participant, and outputs the result.

The apparatus according to claim 1,

wherein the speech recognition unit comprises:

a key word extraction unit that extracts key words from each conference participant's presentation content and stores them in a key word database;

a speech receiving unit that receives the speech accompanying a conference participant's presentation;

a preprocessing unit that extracts feature vectors from the received speech; and

a decoding unit that decodes the extracted feature vectors and outputs a speech recognition result in key word form on the basis of the key words stored in the key word database.

The apparatus according to claim 1,

wherein the speech recognition unit registers the key words in advance, for each conference participant, before the multi-party conference.

The apparatus according to claim 1,

wherein the key words registered in advance in the speech recognition unit include subjects, nouns, and verbs.

The apparatus according to claim 1,

wherein the speech recognition unit outputs a result in text form based on the key words.

The apparatus according to claim 1,

wherein the language translation unit comprises:

a language analysis unit configured to receive the speech recognition result from the speech recognition unit and analyze the type of the input language;

a user language information analysis unit configured to analyze the language used by each conference participant; and

a conversion unit configured to convert the received speech recognition result into the target language for each conference participant by mapping the input language to the output language based on a bilingual dictionary for each conference participant.

The apparatus according to claim 1,

wherein the language translation unit outputs its result in the form of synthesized speech or text.

The apparatus according to claim 1,

wherein the apparatus interprets the multi-party conference between conference participants at remote locations over a communication network.

A speech recognition-based international conference interpretation method comprising:

a conference participant information registration step in which a conference participant information registration unit registers in advance, for each conference participant, conference participant information including the language used by that participant in a multi-party conference;

a speech recognition step in which a speech recognition unit registers in advance key words drawn from the presentation content of each conference participant, recognizes the speech accompanying a participant's presentation based on the pre-registered key words, and outputs a speech recognition result in key word form; and

a language translation step of translating the speech recognition result in key word form into a target language corresponding to the language registered in advance for each conference participant, and outputting the translation.

The method according to claim 9,

wherein the speech recognition step comprises:

a key word extraction step of extracting key words from the presentation content of each conference participant and storing them in a key word database;

a speech receiving step of receiving the speech accompanying a conference participant's presentation;

a preprocessing step of extracting a feature vector from the received speech; and

a decoding step of decoding the extracted feature vector and outputting a speech recognition result in key word form based on the key words stored in the key word database.

The method according to claim 9,

wherein, in the speech recognition step, the key words are registered in advance for each conference participant before the multi-party conference.

The method according to claim 9,

wherein the key words pre-registered in the speech recognition step include a subject, a noun, and a verb.

The method according to claim 9,

wherein the speech recognition result of the speech recognition step is a result in text form based on the key words.

The method according to claim 9,

wherein the language translation step comprises:

a language analysis step of receiving the speech recognition result of the speech recognition step and analyzing the type of the input language;

a user language information analysis step of analyzing the language used by each conference participant; and

a conversion step of converting the received speech recognition result into the target language for each conference participant by mapping the input language to the output language based on a bilingual dictionary for each conference participant.

The method according to claim 9,

wherein, in the language translation step, the result is output in the form of synthesized speech or text.