JPH02183371A - Automatic interpreting device - Google Patents

Automatic interpreting device

Info

Publication number
JPH02183371A
JPH02183371A
Authority
JP
Japan
Prior art keywords
speaker
sound
emotional information
emotion
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1003581A
Other languages
Japanese (ja)
Inventor
Toshinori Ito
伊東 俊紀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP1003581A priority Critical patent/JPH02183371A/en
Publication of JPH02183371A publication Critical patent/JPH02183371A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

PURPOSE: To convey the speaker's emotion and improve translation accuracy by extracting emotional information from images of the speaker and performing translation and speech synthesis corresponding to that emotional information. CONSTITUTION: The speaker's facial expression is recognized at a suitable time scale by an image recognition device via a camera 11a; a change extraction device 6 reads individual changes in the required targets, such as hand movements, facial movements, eyebrow movements, changes of the eyes, and changes of the mouth, and an emotion extraction device 7 judges these comprehensively and selects prescribed emotional information. The speaker's speech is recognized by a speech recognition device 1 and translated by a machine translation device 2; at that time, either a common-sense phrasing stored in a knowledge base 4 or a phrasing corresponding to the emotional information extracted by the emotion extraction device 7 is selected. A speech synthesis device 3 then synthesizes speech, adjusting the strength and pitch of the voice to match the emotional information. Translated speech matching the speaker's emotion is thus output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to an automatic interpreting device that interprets a speaker's spoken language into another spoken language.

[Prior Art] Recently, against the background of advances in speech recognition, speech synthesis, and machine translation technology, automatic speech translation has become possible at a certain level, and practical use is close at hand.

One known automatic interpreting device of this kind performs machine translation in a machine translation unit based on the recognition result of a speech recognition unit that recognizes the speaker's voice, and then synthesizes the translation result into speech in a speech synthesis unit and outputs it.

[Problems to Be Solved by the Invention] The conventional automatic interpreting device described above, however, cannot output translated speech that incorporates the speaker's actual feelings and emotions, and because those emotions cannot be conveyed, its translation accuracy is correspondingly lower.

An object of the present invention is therefore to make it possible to output translated speech that incorporates the speaker's emotions.

[Means for Solving the Problems] The technical means of the present invention for solving this problem is an automatic interpreting device comprising: speech recognition means for recognizing the speaker's voice; facial expression recognition means for recognizing the speaker's facial expression; emotion extraction means for extracting emotional information corresponding to changes in the facial expression recognized by the facial expression recognition means; machine translation means for performing machine translation based on the speech recognition result and the emotional information; and speech synthesis means for synthesizing speech based on the translation result of the machine translation means and the emotional information.
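To make the division of labor among these five means concrete, the following minimal Python sketch models each means as a small interface. Every name and type shape here is a hypothetical illustration; the patent specifies functional blocks, not an implementation.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Emotion:
    """Emotional information shared by translation and synthesis (assumed shape)."""
    label: str        # e.g. "joy", "sadness", "neutral"
    intensity: float  # 0.0 (none) .. 1.0 (strong)

class SpeechRecognizer(Protocol):       # speech recognition means
    def recognize(self, audio: bytes) -> str: ...

class ExpressionRecognizer(Protocol):   # facial expression recognition means
    def recognize(self, frame: bytes) -> dict: ...

class ChangeExtractor(Protocol):        # reads individual expression changes
    def extract(self, observations: list) -> dict: ...

class EmotionExtractor(Protocol):       # emotion extraction means
    def extract(self, changes: dict) -> Emotion: ...

class Translator(Protocol):             # machine translation means
    def translate(self, text: str, emotion: Emotion) -> str: ...

class Synthesizer(Protocol):            # speech synthesis means
    def synthesize(self, text: str, emotion: Emotion) -> bytes: ...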

[Embodiment] An automatic interpreting device according to an embodiment of the present invention will now be described with reference to the accompanying drawing.

As shown in Fig. 1, the automatic interpreting device according to the embodiment comprises a speech recognition device 1 that recognizes the speaker's voice input from a microphone 10 or the like, and facial expression recognition means that recognizes the speaker's facial expression. The facial expression recognition means is an image recognition device 5 that is connected to a video device 11, which exchanges images between the speaker and the listener, and that receives and recognizes images of the speaker from a camera 11a. The image recognition device 5 is in turn connected to emotion extraction means that extracts emotional information corresponding to changes in the speaker's facial expression recognized by the device 5. The emotion extraction means comprises: a change extraction device 6 that extracts changes in the speaker's expression, for example hand movements, facial movements, and movements of the eyebrows, eyes, and mouth; a knowledge base 8 that stores predetermined emotional information corresponding to these expression changes; and an emotion extraction device 7 that extracts and outputs the emotional information in the knowledge base 8 corresponding to the expression changes. The automatic interpreting device further comprises a machine translation device 2 that performs a prescribed machine translation based on the speech recognition result of the speech recognition device 1 and the emotional information, and a speech synthesis device 3 that synthesizes prescribed speech based on the translation result of the machine translation device 2 and the emotional information and outputs it to a loudspeaker. The machine translation device 2 selects the appropriate data from a knowledge base 4 that stores phrasings corresponding to speech recognition results and emotional information.
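Under the interfaces sketched above, the data flow among the numbered devices of Fig. 1 could be wired as follows. The function name and parameter shapes are assumptions for illustration; only the device numbering comes from the patent.

def interpret_utterance(
    audio: bytes,                       # speech from microphone 10
    frames: list[bytes],                # images from camera 11a via video device 11
    recognizer: SpeechRecognizer,       # speech recognition device 1
    translator: Translator,             # machine translation device 2 (consults knowledge base 4)
    synthesizer: Synthesizer,           # speech synthesis device 3
    expression: ExpressionRecognizer,   # image recognition device 5
    change_dev: ChangeExtractor,        # change extraction device 6
    emotion_dev: EmotionExtractor,      # emotion extraction device 7 (consults knowledge base 8)
) -> bytes:
    """One pass through the Fig. 1 pipeline (illustrative only)."""
    # Visual path: expression images -> individual changes -> emotional information.
    observations = [expression.recognize(f) for f in frames]
    changes = change_dev.extract(observations)
    emotion = emotion_dev.extract(changes)

    # Audio path: speech -> recognized text -> emotion-conditioned translation.
    text = recognizer.recognize(audio)
    translated = translator.translate(text, emotion)

    # Synthesis adjusts strength and pitch to the same emotional information.
    return synthesizer.synthesize(translated, emotion)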

In operation, the automatic interpreting device of this embodiment works as follows. The speaker's voice is recognized by the speech recognition device 1 and sent, usually in units of words or phrases, to the machine translation device 2. The machine translation device 2 performs syntactic analysis, semantic analysis, context analysis, and the like to produce the translation. At this time, a phrasing is selected from among the common-sense phrasings stored in the knowledge base 4 and the phrasings corresponding to the emotional information extracted by the emotion extraction device 7. The selected phrasing is synthesized into speech by the speech synthesis device 3; here too, the strength and pitch of the voice are adjusted to match the emotional information extracted by the emotion extraction device 7.
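The two emotion-dependent choices described here, selecting a phrasing from knowledge base 4 and adjusting the strength and pitch of the synthesized voice, might be pictured as in the sketch below. The phrasing table, prosody parameters, and numeric values are invented for illustration and assume the Emotion type sketched earlier.

# Hypothetical fragment of knowledge base 4: phrasings keyed by
# (concept, emotion label), with a common-sense default.
PHRASINGS = {
    ("greeting", "joy"):     "It's wonderful to see you!",
    ("greeting", "neutral"): "Hello.",
}

def select_phrasing(concept: str, emotion: Emotion) -> str:
    # Prefer an emotion-specific phrasing; fall back to the common-sense one.
    return PHRASINGS.get((concept, emotion.label),
                         PHRASINGS[(concept, "neutral")])

def prosody_for(emotion: Emotion) -> dict:
    # Map emotional information to synthesis controls (illustrative values).
    direction = 1.0 if emotion.label in ("joy", "anger") else -0.5
    return {
        "volume_gain": 1.0 + 0.5 * emotion.intensity,              # stronger voice when aroused
        "pitch_shift": 1.0 + 0.2 * emotion.intensity * direction,  # raise or lower pitch
    }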

In more detail, the emotion extraction device 7 extracts emotional information by the following process. The speaker's facial expression is recognized by the image recognition device 5 at a suitable time scale through the camera 11a. From the recognized images, the change extraction device 6 reads individual changes in the main targets, for example hand movements, facial movements (nodding, a negative shake of the head, etc.), eyebrow movements, changes of the eyes (tears), and changes of the mouth (laughter). The individual targets thus read are judged comprehensively by the emotion extraction device 7, and prescribed emotional information is selected. This emotional information is stored in the knowledge base 8 as conditional knowledge for each target.
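Since knowledge base 8 holds emotional information as conditional knowledge per target, the comprehensive judgment in the emotion extraction device 7 can be pictured as rule matching with weighted voting, as in this sketch; the specific rules, weights, and labels are hypothetical.

# Hypothetical fragment of knowledge base 8: per-target conditional rules,
# each mapping an observed change to (emotion label, weight).
RULES: dict[str, dict[str, tuple[str, float]]] = {
    "mouth":    {"laughter": ("joy", 0.9)},
    "eyes":     {"tears": ("sadness", 0.8)},
    "head":     {"nod": ("agreement", 0.6), "shake": ("denial", 0.7)},
    "eyebrows": {"raised": ("surprise", 0.5)},
}

def judge_emotion(changes: dict[str, str]) -> Emotion:
    # Each observed target change votes for an emotion with its rule's weight.
    votes: dict[str, float] = {}
    for target, observed in changes.items():
        label, weight = RULES.get(target, {}).get(observed, ("", 0.0))
        if label:
            votes[label] = votes.get(label, 0.0) + weight
    if not votes:
        return Emotion("neutral", 0.0)
    best = max(votes, key=votes.get)  # comprehensive judgment: highest total weight
    return Emotion(best, min(1.0, votes[best]))

For example, judge_emotion({"mouth": "laughter", "head": "nod"}) tallies 0.9 for joy and 0.6 for agreement, so the dominant emotional information returned is joy with intensity 0.9.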

As a result, the translation uses phrasing that fits the speaker's emotion, and the translated speech is output with intonation and pitch that match that emotion.

[Effects of the Invention] As described above, the automatic interpreting device of the present invention extracts emotional information from images of the speaker and performs translation and speech synthesis corresponding to that emotional information. The speaker's emotion can therefore be expressed, and translation accuracy improves to the extent that this emotion can be conveyed.

[Brief Description of the Drawing]

Fig. 1 is a block diagram showing the configuration of an automatic interpreting device according to an embodiment of the present invention.

1: speech recognition device; 2: machine translation device; 3: speech synthesis device; 4: knowledge base for machine translation; 5: image recognition device; 6: change extraction device; 7: emotion extraction device; 8: knowledge base for emotion extraction.

Claims (1)

[Claims] An automatic interpreting device comprising: speech recognition means for recognizing the voice of a speaker; facial expression recognition means for recognizing the facial expression of the speaker; emotion extraction means for extracting emotional information corresponding to changes in the facial expression recognized by the facial expression recognition means; machine translation means for performing machine translation based on the speech recognition result and the emotional information; and speech synthesis means for synthesizing speech based on the translation result of the machine translation means and the emotional information.
JP1003581A 1989-01-10 1989-01-10 Automatic interpreting device Pending JPH02183371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1003581A JPH02183371A (en) 1989-01-10 1989-01-10 Automatic interpreting device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1003581A JPH02183371A (en) 1989-01-10 1989-01-10 Automatic interpreting device

Publications (1)

Publication Number Publication Date
JPH02183371A true JPH02183371A (en) 1990-07-17

Family

ID=11561421

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1003581A Pending JPH02183371A (en) 1989-01-10 1989-01-10 Automatic interpreting device

Country Status (1)

Country Link
JP (1) JPH02183371A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5887069A (en) * 1992-03-10 1999-03-23 Hitachi, Ltd. Sign recognition apparatus and method and sign translation system using same
EP0585098A3 (en) * 1992-08-24 1995-01-11 Hitachi Ltd Sign recognition apparatus and method and sign translation system using same.
EP0585098A2 (en) * 1992-08-24 1994-03-02 Hitachi, Ltd. Sign recognition apparatus and method and sign translation system using same
US5659764A (en) * 1993-02-25 1997-08-19 Hitachi, Ltd. Sign language generation apparatus and sign language translation apparatus
US5953693A (en) * 1993-02-25 1999-09-14 Hitachi, Ltd. Sign language generation apparatus and sign language translation apparatus
US7962345B2 (en) 2001-04-11 2011-06-14 International Business Machines Corporation Speech-to-speech generation system and method
US8238566B2 (en) * 2004-03-15 2012-08-07 Samsung Electronics Co., Ltd. Apparatus for providing sound effects according to an image and method thereof
US20050201565A1 (en) * 2004-03-15 2005-09-15 Samsung Electronics Co., Ltd. Apparatus for providing sound effects according to an image and method thereof
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method
JP2008021058A (en) * 2006-07-12 2008-01-31 Nec Corp Portable telephone apparatus with translation function, method for translating voice data, voice data translation program, and program recording medium
JP6290479B1 (en) * 2017-03-02 2018-03-07 株式会社リクルートライフスタイル Speech translation device, speech translation method, and speech translation program
JP2020134719A (en) * 2019-02-20 2020-08-31 ソフトバンク株式会社 Translation device, translation method, and translation program
CN109949794A (en) * 2019-03-14 2019-06-28 合肥科塑信息科技有限公司 A kind of intelligent sound converting system based on Internet technology
CN112102831A (en) * 2020-09-15 2020-12-18 海南大学 Cross-data, information and knowledge modal content encoding and decoding method and component

Similar Documents

Publication Publication Date Title
CN113454708A (en) Linguistic style matching agent
US8224652B2 (en) Speech and text driven HMM-based body animation synthesis
US8131551B1 (en) System and method of providing conversational visual prosody for talking heads
CN112650831A (en) Virtual image generation method and device, storage medium and electronic equipment
KR102098734B1 (en) Method, apparatus and terminal for providing sign language video reflecting appearance of conversation partner
WO2021196645A1 (en) Method, apparatus and device for driving interactive object, and storage medium
CN107972028A (en) Man-machine interaction method, device and electronic equipment
US20230082830A1 (en) Method and apparatus for driving digital human, and electronic device
KR20190114150A (en) Method and apparatus for translating speech of video and providing lip-synchronization for translated speech in video
KR102174922B1 (en) Interactive sign language-voice translation apparatus and voice-sign language translation apparatus reflecting user emotion and intention
WO2017195775A1 (en) Sign language conversation assistance system
KR20200090355A (en) Multi-Channel-Network broadcasting System with translating speech on moving picture and Method thererof
US20240022772A1 (en) Video processing method and apparatus, medium, and program product
JPH02183371A (en) Automatic interpreting device
CN113689879A (en) Method, device, electronic equipment and medium for driving virtual human in real time
CN110162598A (en) A kind of data processing method and device, a kind of device for data processing
WO2024088321A1 (en) Virtual image face driving method and apparatus, electronic device and medium
WO2022072752A1 (en) Voice user interface using non-linguistic input
US20240221753A1 (en) System and method for using gestures and expressions for controlling speech applications
Hrúz et al. Automatic fingersign-to-speech translation system
CN113112575A (en) Mouth shape generation method and device, computer equipment and storage medium
CN116129852A (en) Training method of speech synthesis model, speech synthesis method and related equipment
CN117275485B (en) Audio and video generation method, device, equipment and storage medium
CN117351929A (en) Translation method, translation device, electronic equipment and storage medium
Eguchi et al. Development of Mobile Device-Based Speech Enhancement System Using Lip-Reading