JPH02183371A - Automatic interpreting device - Google Patents

Automatic interpreting device

Info

Publication number
JPH02183371A
JPH02183371A
Authority
JP
Japan
Prior art keywords
speaker
sound
emotional information
emotion
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1003581A
Other languages
Japanese (ja)
Inventor
Toshinori Ito
伊東 俊紀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP1003581A priority Critical patent/JPH02183371A/en
Publication of JPH02183371A publication Critical patent/JPH02183371A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

PURPOSE: To convey the speaker's emotion and improve translation accuracy by extracting emotional information from images of the speaker and performing translation and speech synthesis corresponding to that emotional information. CONSTITUTION: The speaker's facial expression is recognized at a suitable time scale by an image recognition device via a camera 11a; a change extraction device 6 reads individual changes in the required targets, such as hand movements, facial movements, eyebrow movements, changes of the eyes, and changes of the mouth, and an emotion extraction device 7 judges these comprehensively and selects prescribed emotional information. The speaker's speech is recognized by a speech recognition device 1 and translated by a machine translation device 2; at that time, either a common-sense phrasing stored in a knowledge base 4 or a phrasing corresponding to the emotional information extracted by the emotion extraction device 7 is selected. A speech synthesis device 3 then synthesizes speech, adjusting the strength and pitch of the voice to match the emotional information. Translated speech matching the speaker's emotion is thus output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to an automatic interpreting device that interprets a speaker's spoken language into another spoken language.

[Prior Art] Recently, against the background of advances in speech recognition, speech synthesis, and machine translation technology, automatic speech translation has become possible at a certain level, and practical use is close at hand.

One known automatic interpreting device of this kind performs machine translation in a machine translation unit based on the recognition result of a speech recognition unit that recognizes the speaker's voice, and then synthesizes the translation result into speech in a speech synthesis unit and outputs it.

[Problems to Be Solved by the Invention] The conventional automatic interpreting device described above, however, cannot output translated speech that incorporates the speaker's actual feelings and emotions, and because those emotions cannot be conveyed, its translation accuracy is correspondingly lower.

An object of the present invention is therefore to make it possible to output translated speech that incorporates the speaker's emotions.

[Means for Solving the Problems] The technical means of the present invention for solving this problem is an automatic interpreting device comprising: speech recognition means for recognizing the speaker's voice; facial expression recognition means for recognizing the speaker's facial expression; emotion extraction means for extracting emotional information corresponding to changes in the facial expression recognized by the facial expression recognition means; machine translation means for performing machine translation based on the speech recognition result and the emotional information; and speech synthesis means for synthesizing speech based on the translation result of the machine translation means and the emotional information.
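To make the division of labor among these five means concrete, the following minimal Python sketch models each means as a small interface. Every name and type shape here is a hypothetical illustration; the patent specifies functional blocks, not an implementation.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Emotion:
    """Emotional information shared by translation and synthesis (assumed shape)."""
    label: str        # e.g. "joy", "sadness", "neutral"
    intensity: float  # 0.0 (none) .. 1.0 (strong)

class SpeechRecognizer(Protocol):       # speech recognition means
    def recognize(self, audio: bytes) -> str: ...

class ExpressionRecognizer(Protocol):   # facial expression recognition means
    def recognize(self, frame: bytes) -> dict: ...

class ChangeExtractor(Protocol):        # reads individual expression changes
    def extract(self, observations: list) -> dict: ...

class EmotionExtractor(Protocol):       # emotion extraction means
    def extract(self, changes: dict) -> Emotion: ...

class Translator(Protocol):             # machine translation means
    def translate(self, text: str, emotion: Emotion) -> str: ...

class Synthesizer(Protocol):            # speech synthesis means
    def synthesize(self, text: str, emotion: Emotion) -> bytes: ...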

[Embodiment] An automatic interpreting device according to an embodiment of the present invention will now be described with reference to the accompanying drawing.

As shown in Fig. 1, the automatic interpreting device according to the embodiment comprises a speech recognition device 1 that recognizes the speaker's voice input from a microphone 10 or the like, and facial expression recognition means that recognizes the speaker's facial expression. The facial expression recognition means is an image recognition device 5 that is connected to a video device 11, which exchanges images between the speaker and the listener, and that receives and recognizes images of the speaker from a camera 11a. The image recognition device 5 is in turn connected to emotion extraction means that extracts emotional information corresponding to changes in the speaker's facial expression recognized by the device 5. The emotion extraction means comprises: a change extraction device 6 that extracts changes in the speaker's expression, for example hand movements, facial movements, and movements of the eyebrows, eyes, and mouth; a knowledge base 8 that stores predetermined emotional information corresponding to these expression changes; and an emotion extraction device 7 that extracts and outputs the emotional information in the knowledge base 8 corresponding to the expression changes. The automatic interpreting device further comprises a machine translation device 2 that performs a prescribed machine translation based on the speech recognition result of the speech recognition device 1 and the emotional information, and a speech synthesis device 3 that synthesizes prescribed speech based on the translation result of the machine translation device 2 and the emotional information and outputs it to a loudspeaker. The machine translation device 2 selects the appropriate data from a knowledge base 4 that stores phrasings corresponding to speech recognition results and emotional information.
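Under the interfaces sketched above, the data flow among the numbered devices of Fig. 1 could be wired as follows. The function name and parameter shapes are assumptions for illustration; only the device numbering comes from the patent.

def interpret_utterance(
    audio: bytes,                       # speech from microphone 10
    frames: list[bytes],                # images from camera 11a via video device 11
    recognizer: SpeechRecognizer,       # speech recognition device 1
    translator: Translator,             # machine translation device 2 (consults knowledge base 4)
    synthesizer: Synthesizer,           # speech synthesis device 3
    expression: ExpressionRecognizer,   # image recognition device 5
    change_dev: ChangeExtractor,        # change extraction device 6
    emotion_dev: EmotionExtractor,      # emotion extraction device 7 (consults knowledge base 8)
) -> bytes:
    """One pass through the Fig. 1 pipeline (illustrative only)."""
    # Visual path: expression images -> individual changes -> emotional information.
    observations = [expression.recognize(f) for f in frames]
    changes = change_dev.extract(observations)
    emotion = emotion_dev.extract(changes)

    # Audio path: speech -> recognized text -> emotion-conditioned translation.
    text = recognizer.recognize(audio)
    translated = translator.translate(text, emotion)

    # Synthesis adjusts strength and pitch to the same emotional information.
    return synthesizer.synthesize(translated, emotion)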

In operation, the automatic interpreting device of this embodiment works as follows. The speaker's voice is recognized by the speech recognition device 1 and sent, usually in units of words or phrases, to the machine translation device 2. The machine translation device 2 performs syntactic analysis, semantic analysis, context analysis, and the like to produce the translation. At this time, a phrasing is selected from among the common-sense phrasings stored in the knowledge base 4 and the phrasings corresponding to the emotional information extracted by the emotion extraction device 7. The selected phrasing is synthesized into speech by the speech synthesis device 3; here too, the strength and pitch of the voice are adjusted to match the emotional information extracted by the emotion extraction device 7.
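The two emotion-dependent choices described here, selecting a phrasing from knowledge base 4 and adjusting the strength and pitch of the synthesized voice, might be pictured as in the sketch below. The phrasing table, prosody parameters, and numeric values are invented for illustration and assume the Emotion type sketched earlier.

# Hypothetical fragment of knowledge base 4: phrasings keyed by
# (concept, emotion label), with a common-sense default.
PHRASINGS = {
    ("greeting", "joy"):     "It's wonderful to see you!",
    ("greeting", "neutral"): "Hello.",
}

def select_phrasing(concept: str, emotion: Emotion) -> str:
    # Prefer an emotion-specific phrasing; fall back to the common-sense one.
    return PHRASINGS.get((concept, emotion.label),
                         PHRASINGS[(concept, "neutral")])

def prosody_for(emotion: Emotion) -> dict:
    # Map emotional information to synthesis controls (illustrative values).
    direction = 1.0 if emotion.label in ("joy", "anger") else -0.5
    return {
        "volume_gain": 1.0 + 0.5 * emotion.intensity,              # stronger voice when aroused
        "pitch_shift": 1.0 + 0.2 * emotion.intensity * direction,  # raise or lower pitch
    }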

In more detail, the emotion extraction device 7 extracts emotional information by the following process. The speaker's facial expression is recognized by the image recognition device 5 at a suitable time scale through the camera 11a. From the recognized images, the change extraction device 6 reads individual changes in the main targets, for example hand movements, facial movements (nodding, a negative shake of the head, etc.), eyebrow movements, changes of the eyes (tears), and changes of the mouth (laughter). The individual targets thus read are judged comprehensively by the emotion extraction device 7, and prescribed emotional information is selected. This emotional information is stored in the knowledge base 8 as conditional knowledge for each target.
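Since knowledge base 8 holds emotional information as conditional knowledge per target, the comprehensive judgment in the emotion extraction device 7 can be pictured as rule matching with weighted voting, as in this sketch; the specific rules, weights, and labels are hypothetical.

# Hypothetical fragment of knowledge base 8: per-target conditional rules,
# each mapping an observed change to (emotion label, weight).
RULES: dict[str, dict[str, tuple[str, float]]] = {
    "mouth":    {"laughter": ("joy", 0.9)},
    "eyes":     {"tears": ("sadness", 0.8)},
    "head":     {"nod": ("agreement", 0.6), "shake": ("denial", 0.7)},
    "eyebrows": {"raised": ("surprise", 0.5)},
}

def judge_emotion(changes: dict[str, str]) -> Emotion:
    # Each observed target change votes for an emotion with its rule's weight.
    votes: dict[str, float] = {}
    for target, observed in changes.items():
        label, weight = RULES.get(target, {}).get(observed, ("", 0.0))
        if label:
            votes[label] = votes.get(label, 0.0) + weight
    if not votes:
        return Emotion("neutral", 0.0)
    best = max(votes, key=votes.get)  # comprehensive judgment: highest total weight
    return Emotion(best, min(1.0, votes[best]))

For example, judge_emotion({"mouth": "laughter", "head": "nod"}) tallies 0.9 for joy and 0.6 for agreement, so the dominant emotional information returned is joy with intensity 0.9.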

As a result, the translation uses phrasing that fits the speaker's emotion, and the translated speech is output with intonation and pitch that match that emotion.

[Effects of the Invention] As described above, the automatic interpreting device of the present invention extracts emotional information from images of the speaker and performs translation and speech synthesis corresponding to that emotional information. The speaker's emotion can therefore be expressed, and translation accuracy improves to the extent that this emotion can be conveyed.

[Brief Description of the Drawing]

Fig. 1 is a block diagram showing the configuration of an automatic interpreting device according to an embodiment of the present invention.

1: speech recognition device; 2: machine translation device; 3: speech synthesis device; 4: knowledge base for machine translation; 5: image recognition device; 6: change extraction device; 7: emotion extraction device; 8: knowledge base for emotion extraction.

Claims (1)

[Claims] An automatic interpreting device comprising: speech recognition means for recognizing the voice of a speaker; facial expression recognition means for recognizing the facial expression of the speaker; emotion extraction means for extracting emotional information corresponding to changes in the facial expression recognized by the facial expression recognition means; machine translation means for performing machine translation based on the speech recognition result and the emotional information; and speech synthesis means for synthesizing speech based on the translation result of the machine translation means and the emotional information.
JP1003581A 1989-01-10 1989-01-10 Automatic interpreting device Pending JPH02183371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1003581A JPH02183371A (en) 1989-01-10 1989-01-10 Automatic interpreting device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1003581A JPH02183371A (en) 1989-01-10 1989-01-10 Automatic interpreting device

Publications (1)

Publication Number Publication Date
JPH02183371A true JPH02183371A (en) 1990-07-17

Family

ID=11561421

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1003581A Pending JPH02183371A (en) 1989-01-10 1989-01-10 Automatic interpreting device

Country Status (1)

Country Link
JP (1) JPH02183371A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5887069A (en) * 1992-03-10 1999-03-23 Hitachi, Ltd. Sign recognition apparatus and method and sign translation system using same
EP0585098A3 (en) * 1992-08-24 1995-01-11 Hitachi Ltd Sign recognition apparatus and method and sign translation system using same.
EP0585098A2 (en) * 1992-08-24 1994-03-02 Hitachi, Ltd. Sign recognition apparatus and method and sign translation system using same
US5659764A (en) * 1993-02-25 1997-08-19 Hitachi, Ltd. Sign language generation apparatus and sign language translation apparatus
US5953693A (en) * 1993-02-25 1999-09-14 Hitachi, Ltd. Sign language generation apparatus and sign language translation apparatus
US7962345B2 (en) 2001-04-11 2011-06-14 International Business Machines Corporation Speech-to-speech generation system and method
US8238566B2 (en) * 2004-03-15 2012-08-07 Samsung Electronics Co., Ltd. Apparatus for providing sound effects according to an image and method thereof
US20050201565A1 (en) * 2004-03-15 2005-09-15 Samsung Electronics Co., Ltd. Apparatus for providing sound effects according to an image and method thereof
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method
JP2008021058A (en) * 2006-07-12 2008-01-31 Nec Corp Portable telephone apparatus with translation function, method for translating voice data, voice data translation program, and program recording medium
JP6290479B1 (en) * 2017-03-02 2018-03-07 株式会社リクルートライフスタイル Speech translation device, speech translation method, and speech translation program
JP2020134719A (en) * 2019-02-20 2020-08-31 ソフトバンク株式会社 Translation device, translation method, and translation program
CN109949794A (en) * 2019-03-14 2019-06-28 合肥科塑信息科技有限公司 A kind of intelligent sound converting system based on Internet technology
CN112102831A (en) * 2020-09-15 2020-12-18 海南大学 Cross-data, information and knowledge modal content encoding and decoding method and component

Similar Documents

Publication Publication Date Title
CN113454708A (en) Linguistic style matching agent
US8224652B2 (en) Speech and text driven HMM-based body animation synthesis
US8131551B1 (en) System and method of providing conversational visual prosody for talking heads
CN112650831A (en) Virtual image generation method and device, storage medium and electronic equipment
KR102098734B1 (en) Method, apparatus and terminal for providing sign language video reflecting appearance of conversation partner
WO2021196645A1 (en) Method, apparatus and device for driving interactive object, and storage medium
CN107972028A (en) Man-machine interaction method, device and electronic equipment
US20230082830A1 (en) Method and apparatus for driving digital human, and electronic device
KR20190114150A (en) Method and apparatus for translating speech of video and providing lip-synchronization for translated speech in video
KR102174922B1 (en) Interactive sign language-voice translation apparatus and voice-sign language translation apparatus reflecting user emotion and intention
WO2017195775A1 (en) Sign language conversation assistance system
KR20200090355A (en) Multi-Channel-Network broadcasting System with translating speech on moving picture and Method thererof
US20240022772A1 (en) Video processing method and apparatus, medium, and program product
JPH02183371A (en) Automatic interpreting device
CN113689879A (en) Method, device, electronic equipment and medium for driving virtual human in real time
CN110162598A (en) A kind of data processing method and device, a kind of device for data processing
WO2024088321A1 (en) Virtual image face driving method and apparatus, electronic device and medium
WO2022072752A1 (en) Voice user interface using non-linguistic input
US20240221753A1 (en) System and method for using gestures and expressions for controlling speech applications
Hrúz et al. Automatic fingersign-to-speech translation system
CN113112575A (en) Mouth shape generation method and device, computer equipment and storage medium
CN116129852A (en) Training method of speech synthesis model, speech synthesis method and related equipment
CN117275485B (en) Audio and video generation method, device, equipment and storage medium
CN117351929A (en) Translation method, translation device, electronic equipment and storage medium
Eguchi et al. Development of Mobile Device-Based Speech Enhancement System Using Lip-Reading