JP5338350B2

JP5338350B2 - Information processing apparatus and voice correction program

Info

Publication number: JP5338350B2
Application number: JP2009026417A
Authority: JP
Inventors: 恵理子田丸; 英二西川; 仁阿部
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2009-02-06
Filing date: 2009-02-06
Publication date: 2013-11-13
Anticipated expiration: 2029-02-06
Also published as: JP2010183444A

Description

The present invention relates to an information processing apparatus and a voice correction program.

An information processing apparatus is known that provides spaces of the same quality for both communicating parties and generates, in each space, a shadow of the communication partner.

For example, the information processing apparatus of Patent Document 1 assists communication between a user and a communication partner by exchanging the user's voice and video with the partner over the Internet or a similar network. One such apparatus is installed, together with a space of the same quality, at each of two distant points, and is connected to a camera that photographs the user and a projection device that projects the partner's shadow into the space. In addition to the explicit information carried by voice and video, the user thus obtains implicit information, distinct from that explicit information, from the movement of the partner's shadow projected by the projection device.

Patent Document 1: JP 2006-157959 A

An object of the present invention is to provide an information processing apparatus and a voice correction program that can convey a user's emotion to a communication partner while suppressing the transmission of inappropriate voice.

[1] An information processing apparatus comprising:
emotion extraction means for extracting the emotion of each of a plurality of users based on a video signal obtained by photographing the facial expressions of the plurality of users and/or a voice signal obtained from the voices of the plurality of users;
storage means for storing a plurality of emotion expression images, each expressing a different emotion;
emotion expression image selection means for selecting, for each user, an emotion expression image from the plurality of emotion expression images based on the emotions of the plurality of users extracted by the emotion extraction means;
voice correction means for correcting the voice signal, based on the emotion extracted by the emotion extraction means, so as to suppress transmission of that emotion; and
video processing means for generating video information in which the selected emotion expression images are displayed on the display video of the video signal together with markers that each point to the user corresponding to an emotion expression image.

[2] The information processing apparatus according to [1], further comprising voice processing means for generating voice information from the voice signal corrected by the voice correction means, and communication means for transmitting the video information and the voice information to an external information processing apparatus.

[3] The information processing apparatus according to [1], further comprising voice processing means for reproducing voice from the voice signal corrected by the voice correction means, wherein the video processing means reproduces the video information.

[4] The information processing apparatus according to [1], wherein the emotion extraction means further acquires biometric information of the user and extracts the user's emotion based on the biometric information.

[5] A voice correction program for causing a computer that can refer to a plurality of emotion expression images, each expressing a different emotion, to execute:
an emotion extraction function of extracting the emotion of each of a plurality of users based on a video signal obtained by photographing the facial expressions of the plurality of users and/or a voice signal obtained from the voices of the users;
an emotion expression image selection function of selecting, for each user, an emotion expression image based on the extracted emotions of the plurality of users;
a voice correction function of correcting the voice signal, based on the extracted emotion, so as to suppress transmission of that emotion; and
a video processing function of generating video information in which the selected emotion expression images are displayed on the display video of the video signal together with markers that each point to the user corresponding to an emotion expression image.

According to the invention of claim 1 or 5, a user's emotion can be presented as an emotion expression image while inappropriate voice is corrected.

According to the invention of claim 2, a user's emotion can be conveyed to a communication partner while the transmission of inappropriate voice is suppressed.

According to the invention of claim 3, the user's own emotion can be conveyed back to the user.

According to the invention of claim 4, a user's emotion can be extracted more accurately.

FIG. 1 is a schematic diagram showing a configuration example of an electronic conference system according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing a configuration example of an information processing apparatus according to the first embodiment of the present invention.
FIGS. 3(a) to 3(f) are schematic diagrams showing an operation example of the voice correction program according to the first embodiment of the present invention.
FIGS. 4(a) to 4(d) are schematic diagrams showing an operation example of the voice correction program according to the first embodiment of the present invention.
FIG. 5 is a flowchart showing an operation example of the voice correction program according to the first embodiment of the present invention.
FIG. 6 is a flowchart showing an operation example of the voice correction program according to the second embodiment of the present invention.

[First Embodiment]
(Configuration of the electronic conference system)
FIG. 1 is a schematic diagram showing a configuration example of an electronic conference system according to the first embodiment of the present invention.

The electronic conference system 1 is configured by connecting information processing apparatuses 2A and 2B, installed at location A and location B respectively, so that they can communicate over the Internet 10. Because the equipment installed at location A and location B has the same configuration, only the configuration at location A is described below for brevity.

The information processing apparatus 2A is a server apparatus that communicates with the information processing apparatus 2B via the Internet 10 and processes the information sent and received through that communication. It is connected to a display unit 3A that displays video of location B captured by a camera 4B and processed by the information processing apparatus 2B, a camera 4A that photographs the situation at location A, a speaker 5A that outputs the voice collected by a microphone 6B and the information processing apparatus 2B, and a microphone 6A that collects voice at location A.

FIG. 2 is a schematic diagram showing a configuration example of the information processing apparatus according to the first embodiment of the present invention.

The information processing apparatus 2A has a control unit 20A, which consists of a CPU (Central Processing Unit) or the like, controls each unit, and executes various programs; a storage unit 21A, which consists of a storage device such as an HDD (Hard Disc Drive) and stores information; and a communication unit 22A, which communicates with the outside. The information processing apparatus 2A is also connected to the display unit 3A, which consists of a display device such as an LCD (Liquid Crystal Display); the camera 4A, which consists of a sensor such as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge Coupled Device) sensor and an optical system such as a lens; the speaker 5A, which outputs voice; the microphone 6A, which collects voice; and a sensor 7A. The sensor 7A is an infrared camera that measures the body temperature of users A1 and A2, a band-type sensor that acquires biometric information such as blood pressure and body temperature, or a vibration sensor, pressure sensor, or the like installed in the chairs, desk, or floor where users A1 and A2 sit.

By executing a voice correction program 210, described later, the control unit 20A operates an emotion extraction means 200, an emotion expression image selection means 201, a voice correction means 202, a video processing means 203, and a voice processing means 204.

The emotion extraction means 200 extracts the emotions of users A1 and A2 based on information obtained from the camera 4A, the microphone 6A, and the sensor 7A. The emotion expression image selection means 201 selects, according to the emotion extracted by the emotion extraction means 200, an emotion expression image used to express that emotion. The voice correction means 202 corrects the voice signal according to the emotion extracted by the emotion extraction means 200 or the characteristics of the voice collected by the microphone 6A. The video processing means 203 processes the signal captured by the camera 4A, displays the image selected by the emotion expression image selection means 201, and displays video information received via the communication unit 22A as video on the display unit 3A. The voice processing means 204 processes the signal collected by the microphone 6A or the signal corrected by the voice correction means 202, and converts voice information received via the communication unit 22A into a voice signal that is output from the speaker 5A.

The storage unit 21A stores the voice correction program 210, which causes the control unit 20A to operate as each of the means described above, and emotion expression images 211, a set of images that express the emotions extracted by the emotion extraction means 200. The emotion expression images 211 may be icons showing a facial expression, avatars showing a facial expression or the whole body, photographs, text, or the like.

(Operation)
The operation of the electronic conference system according to an embodiment of the present invention is described below with reference to the drawings.

Users A1 and A2 proceed with the conference while watching the video of users B1 to B3 shown on the display unit 3A and listening to the voices of users B1 to B3 output from the speaker 5A. The video shown on the display unit 3A and the voice output from the speaker 5A are reproduced based on information that the information processing apparatus 2A receives from the information processing apparatus 2B via the Internet 10. At the same time, the information processing apparatus 2A processes the video signal of users A1 and A2 captured by the camera 4A and the voice signal of the voices of users A1 and A2 collected by the microphone 6A, and transmits them as video information and voice information to the information processing apparatus 2B via the Internet 10.

In parallel with the above operation, the emotion extraction means 200 of the information processing apparatus 2A extracts the emotions of users A1 and A2 based on information obtained from the camera 4A, the microphone 6A, and the sensor 7A, and the emotion expression image selection means 201 and the voice correction means 202 perform the operations described below according to the extracted emotions.

The emotion extraction means 200 analyzes the video information obtained from the camera 4A and extracts, for example, emotions such as "anger", "enjoyment", and "sadness" from facial expressions. It also extracts "indifference" when the facial expression does not change, "boredom" when the user yawns, and "sleepiness" when the user looks down for a long time.

The emotion extraction means 200 also analyzes the voice information obtained from the microphone 6A and extracts, for example, "excited" when the inflection of the voice waveform is large, and "tense" when the key of the voice is high or its pitch (speaking pace) is fast.

The emotion extraction means 200 also analyzes the information obtained from the sensor 7A and extracts, for example, emotions such as "anger", "excited", or "tense" when blood pressure or body temperature deviates from normal values.
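The voice and sensor rules above can be pictured as simple threshold checks. The following is a minimal sketch only: the feature names, units, and threshold values are hypothetical and are not given in this document, and the embodiment does not state how the features are measured.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One user's measurements; all fields and units are illustrative assumptions."""
    inflection: float         # spread of the voice waveform's inflection (0..1)
    key_hz: float             # key (fundamental frequency) of the voice
    syllables_per_sec: float  # speaking pace
    body_temp_c: float        # body temperature from sensor 7A


# Hypothetical thresholds, chosen only for illustration.
INFLECTION_LIMIT = 0.6
KEY_LIMIT_HZ = 250.0
PACE_LIMIT = 7.0
NORMAL_TEMP_C = (35.5, 37.5)


def extract_emotions(obs: Observation) -> set:
    """Return candidate emotion labels suggested by the voice and sensor rules."""
    emotions = set()
    if obs.inflection > INFLECTION_LIMIT:
        emotions.add("excited")
    if obs.key_hz > KEY_LIMIT_HZ or obs.syllables_per_sec > PACE_LIMIT:
        emotions.add("tense")
    low, high = NORMAL_TEMP_C
    if not (low <= obs.body_temp_c <= high):
        # An abnormal reading only narrows the emotion to a set of candidates.
        emotions.update({"anger", "excited", "tense"})
    return emotions
```

A practical system would presumably combine these candidates with the facial-expression analysis before settling on one emotion per user.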

FIGS. 3(a) to 3(f) are schematic diagrams showing an operation example of the voice correction program according to the first embodiment of the present invention.

For example, when voice with a large waveform amplitude is input as shown in FIG. 3(a), the emotion extraction means 200 judges that the user is raising their voice in "anger". As shown in FIG. 3(b), it causes the voice correction means 202 to smooth any voice whose volume is at or above a predetermined amplitude A1, so that the partner is not subjected to an unpleasant volume.
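As a rough illustration of the correction in FIG. 3(b), the sketch below caps every sample at the amplitude A1. Hard clipping is an assumption made here for brevity; the document only says the loud voice is smoothed, and a real implementation might use a softer limiter. The function name and the sample representation (a list of floats) are hypothetical.

```python
def limit_amplitude(samples, a1):
    """Cap each sample's magnitude at a1, leaving quieter samples untouched."""
    return [max(-a1, min(a1, s)) for s in samples]
```

For example, `limit_amplitude([0.2, 0.9, -1.2, 0.5], 0.6)` leaves the first and last samples alone and pulls the two loud ones back to 0.6 and -0.6.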

Likewise, when voice with a fast waveform pitch is input as shown in FIG. 3(c), the emotion extraction means 200 judges that the user is speaking quickly because of "tension". As shown in FIG. 3(d), it causes the voice correction means 202 to change voice whose pitch is at or below a predetermined interval T1 to a voice interval T2, so that the tension is not conveyed to the partner.
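One way to read FIG. 3(d) is as a re-spacing of speech events: any gap at or below T1 is widened to T2. The sketch below works on a list of onset times in milliseconds rather than on raw audio, and it assumes T2 is longer than T1; both choices, and the function name, are assumptions for illustration and not something this document specifies.

```python
def respace_onsets(onsets_ms, t1, t2):
    """Widen every gap between consecutive onsets that is <= t1 to exactly t2."""
    if not onsets_ms:
        return []
    result = [onsets_ms[0]]
    for prev, cur in zip(onsets_ms, onsets_ms[1:]):
        gap = cur - prev
        if gap <= t1:
            gap = t2  # hurried gap: stretch it to the target interval
        result.append(result[-1] + gap)
    return result
```

With onsets at 0, 100, 200, and 600 ms, T1 = 150, and T2 = 250, the two hurried 100 ms gaps become 250 ms each and the unhurried 400 ms gap is preserved.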

Likewise, when voice with a small waveform amplitude is input as shown in FIG. 3(e), the emotion extraction means 200 judges that the user's voice is quiet because they are "low in spirits". As shown in FIG. 3(f), it causes the voice correction means 202 to expand voice whose volume is at or below a predetermined amplitude A2 up to the amplitude A2, so that the lack of energy is not conveyed.
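The expansion in FIG. 3(f) could be sketched as a uniform gain that raises the peak of a quiet passage to A2. Uniform scaling of the whole passage is an assumption; the document states only that the voice is expanded to amplitude A2. The function name is hypothetical.

```python
def boost_quiet_voice(samples, a2):
    """Scale a passage whose peak is at or below a2 so that its peak equals a2."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0 or peak > a2:
        return list(samples)  # silence or already loud enough: leave unchanged
    gain = a2 / peak
    return [s * gain for s in samples]
```

Scaling by a single gain keeps the relative shape of the waveform, so the passage becomes louder without changing its inflection.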

FIGS. 4(a) to 4(d) are schematic diagrams showing an operation example of the voice correction program according to the first embodiment of the present invention.

When the emotion extraction means 200 of the information processing apparatus 2A has not extracted an emotion, the video information captured by the camera 4A and processed by the video processing means 203 is transmitted to the information processing apparatus 2B and shown on the display unit 3B as display video 30B, as in FIG. 4(a).

When the emotion extraction means 200 of the information processing apparatus 2A extracts an emotion, then, for example as shown in FIG. 4(b), emotion expression images 300a and 300b, which are icons or the like with facial expressions based on the extracted emotions, are displayed on the display video 30B, and markers 301a and 301b are displayed so as to point to the users A1 and A2 to whom the emotion expression images 300a and 300b correspond.
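The pairing of an emotion expression image with a marker pointing at its user might be represented as below. This is a sketch under stated assumptions: the image file names, the dictionary layout, and the idea of using face coordinates as the marker target are all hypothetical and do not come from this document.

```python
# Hypothetical lookup from emotion label to stored emotion expression image.
EMOTION_IMAGES = {
    "anger": "icon_anger.png",
    "tense": "icon_tense.png",
    "boredom": "icon_boredom.png",
}


def build_overlays(user_emotions, face_positions):
    """Return one overlay entry (image plus marker target) per user with a known emotion."""
    overlays = []
    for user, emotion in user_emotions.items():
        image = EMOTION_IMAGES.get(emotion)
        if image is None:
            continue  # no stored image for this emotion: show nothing for this user
        overlays.append({
            "image": image,
            "points_to": user,                   # whom the marker identifies
            "marker_target": face_positions[user],
        })
    return overlays
```

Each entry carries both the icon and the location its marker should point to, which is what lets a viewer tell which of several users an icon belongs to.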

Alternatively, when the emotion extraction means 200 of the information processing apparatus 2A extracts an emotion, then, for example as shown in FIG. 4(c), a real video 31B may be displayed together with an emotion expression video 32B that uses emotion expression images 320a and 320b, which are whole-body avatars based on the extracted emotions.

Alternatively, when the emotion extraction means 200 of the information processing apparatus 2A extracts an emotion, then, for example as shown in FIG. 4(d), an emotion expression image 330a, which is an icon with a facial expression based on the extracted emotion, may be displayed on the display video 30B so as to overlap the face of user A2.

FIG. 5 is a flowchart showing an operation example of the voice correction program according to the first embodiment of the present invention.

First, the emotion extraction means 200 extracts the emotions of users A1 and A2 using the camera 4A, the microphone 6A, and the sensor 7A (S10). Next, the emotion expression image selection means 201 selects an emotion expression image such as those shown in FIGS. 4(b) to 4(d) based on the extracted emotions (S11) and superimposes it on the display video (S12).

Next, when the voice correction means 202 judges that correction is necessary (S13; Yes), for example because the voice corresponds to one of the cases shown in FIGS. 3(a), 3(c), and 3(e), it corrects the voice as shown in FIGS. 3(b), 3(d), and 3(f) (S14). When no correction is necessary (S13; No), processing proceeds to step S15.

Next, the video processing means 203 and the voice processing means 204 transmit the video information superimposed in step S12, together with the voice information corrected in step S14 or the uncorrected voice information, to the information processing apparatus 2B via the communication unit 22A and the Internet 10 (S15).
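The sequence S10 to S15 can be summarized as one processing cycle. In the sketch below each means is passed in as a callable so the control flow stands on its own; the function name, the parameter names, and the data shapes are hypothetical and are not taken from this document.

```python
def run_cycle(frame, voice, *, extract, select_images, superimpose,
              needs_correction, correct, output):
    """Run one pass of the flow: extract, select, superimpose, correct if needed, output."""
    emotions = extract(frame, voice)       # S10: emotion extraction
    images = select_images(emotions)       # S11: pick emotion expression images
    video = superimpose(frame, images)     # S12: overlay them on the display video
    if needs_correction(emotions, voice):  # S13: is correction necessary?
        voice = correct(emotions, voice)   # S14: correct the voice
    output(video, voice)                   # S15: hand off video and voice
    return video, voice
```

Keeping the output step as a parameter means the same cycle could feed either a network sender or a local display and speaker, depending on what is passed in.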

[Second Embodiment]
In the second embodiment, the emotion expression image and the corrected voice information that are transmitted to the communication partner in the first embodiment are instead fed back to the users themselves, in order to influence the users' emotions.

FIG. 6 is a flowchart showing an operation example of the voice correction program according to the second embodiment of the present invention.

First, the emotion extraction means 200 extracts the emotions of users A1 and A2 using the camera 4A, the microphone 6A, and the sensor 7A (S20). Next, the emotion expression image selection means 201 selects an emotion expression image such as those shown in FIGS. 4(b) to 4(d) based on the extracted emotions (S21) and superimposes it on the display video (S22).

Next, when the voice correction means 202 judges that correction is necessary (S23; Yes), for example because the voice corresponds to one of the cases shown in FIGS. 3(a), 3(c), and 3(e), it corrects the voice as shown in FIGS. 3(b), 3(d), and 3(f) (S24). When no correction is necessary (S23; No), processing proceeds to step S25.

Next, the video processing means 203 and the voice processing means 204 output the video information superimposed in step S22, together with the voice information corrected in step S24 or the uncorrected voice information, from the display unit 3A and the speaker 5A, thereby feeding back to users A1 and A2 their own emotions (S25).

The feedback is not limited to emotion expression images and voice correction. The information processing apparatus 2A may be connected to the lighting installed at location A to adjust the illuminance, or to the air conditioning to adjust the temperature. It may also be connected to a device that emits a fragrance that calms the users' emotions.

[Other Embodiments]
The present invention is not limited to the above embodiments, and various modifications are possible without departing from the spirit of the invention. For example, the emotion expression image is not limited to a still image and may be a moving image.

The emotion extraction means 200, the emotion expression image selection means 201, the voice correction means 202, the video processing means 203, and the voice processing means 204 used in the above embodiments may be read into the storage unit of the apparatus from a storage medium such as a CD-ROM, or downloaded to the storage unit of the apparatus from a server apparatus or the like connected to a network such as the Internet. Some or all of the means used in the above embodiments may also be implemented in hardware such as an ASIC.

Description of symbols: 1 ... electronic conference system, 2A ... information processing apparatus, 2B ... information processing apparatus, 3A ... display unit, 3B ... display unit, 4A ... camera, 4B ... camera, 5A ... speaker, 5B ... speaker, 6A ... microphone, 6B ... microphone, 7A ... sensor, 7B ... sensor, 10 ... Internet, 20A ... control unit, 21A ... storage unit, 22A ... communication unit, 30B ... display video, 31B ... real video, 32B ... emotion expression video, 200 ... emotion extraction means, 201 ... emotion expression image selection means, 202 ... voice correction means, 203 ... video processing means, 204 ... voice processing means, 210 ... voice correction program, 211 ... emotion expression images, 300a ... emotion expression image, 301a ... marker, 320a ... emotion expression image, 330a ... emotion expression image

Claims

1. An information processing apparatus comprising:
emotion extraction means for extracting the emotion of each of a plurality of users based on a video signal obtained by photographing the facial expressions of the plurality of users and/or a voice signal obtained from the voices of the plurality of users;
storage means for storing a plurality of emotion expression images, each expressing a different emotion;
emotion expression image selection means for selecting, for each user, an emotion expression image from the plurality of emotion expression images based on the emotions of the plurality of users extracted by the emotion extraction means;
voice correction means for correcting the voice signal, based on the emotion extracted by the emotion extraction means, so as to suppress transmission of that emotion; and
video processing means for generating video information in which the selected emotion expression images are displayed on the display video of the video signal together with markers that each point to the user corresponding to an emotion expression image.

2. The information processing apparatus according to claim 1, further comprising:
voice processing means for generating voice information from the voice signal corrected by the voice correction means; and
communication means for transmitting the video information and the voice information to an external information processing apparatus.

3. The information processing apparatus according to claim 1, further comprising voice processing means for reproducing voice from the voice signal corrected by the voice correction means,
wherein the video processing means reproduces the video information.

4. The information processing apparatus according to claim 1, wherein the emotion extraction means further acquires biometric information of the user and extracts the user's emotion based on the biometric information.

5. A voice correction program for causing a computer that can refer to a plurality of emotion expression images, each expressing a different emotion, to execute:
an emotion extraction function of extracting the emotion of each of a plurality of users based on a video signal obtained by photographing the facial expressions of the plurality of users and/or a voice signal obtained from the voices of the users;
an emotion expression image selection function of selecting, for each user, an emotion expression image based on the extracted emotions of the plurality of users;
a voice correction function of correcting the voice signal, based on the extracted emotion, so as to suppress transmission of that emotion; and
a video processing function of generating video information in which the selected emotion expression images are displayed on the display video of the video signal together with markers that each point to the user corresponding to an emotion expression image.