JP4425172B2

JP4425172B2 - Call device, call system, and program

Info

Publication number: JP4425172B2
Application number: JP2005113628A
Authority: JP
Inventors: 穣丸山; 雅史高橋; 俊幸岩井
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2005-04-11
Filing date: 2005-04-11
Publication date: 2010-03-03
Anticipated expiration: 2025-04-11
Also published as: JP2006295552A

Description

The present invention relates to a call device, a call system, and a program. More specifically, it relates to a call device, call system, and program for two-way voice calls over a network, such as a telephone network. During such a call, the device carries out voice communication while outputting several audio streams at the same time. One is conversation audio, meaning the speaker's voice and surrounding sounds handled in an ordinary voice call. Another is content audio, meaning the sound of content such as a video file on the network or a program recorded on a video deck.

Conventionally, techniques are known for controlling the volume of the other party's call device so that the other party hears one's own voice. Patent Document 1 is one example. It concerns an emergency call system terminal mounted in a vehicle such as an automobile. In an emergency such as a traffic accident or sudden illness, the terminal transmits data such as the vehicle's current location and registration to a center that administers the emergency call system, such as the police or an emergency call center. In particular, the document discloses an emergency call system that controls the volume level of the audio signal from that center during a hands-free voice call made when reporting an emergency. The technique keeps the volume level of the hands-free call at or above a certain level during an emergency report, so the audio received from the emergency call center is reliably heard.
Patent Document 1: Japanese Patent Laid-Open No. 2000-222662

However, consider voice communication in which conversation audio and content audio are output at the same time. In that situation, keeping the conversation audio on the other party's call device at or above a certain volume level is not enough to ensure that the other party hears the conversation.

FIG. 30 is a conceptual diagram of a conventional system in which voice communication takes place while conversation audio and content audio are output at the same time. In the figure, 501a denotes an information processing terminal serving as the call device at Mr. A's home, 501b an information processing terminal serving as the call device at Mr. B's home, 502 a network, and 503 a content server capable of providing various contents. In this example, the two parties talk while video content of an athletic meet plays on both terminals (information processing terminals 501a and 501b). Terminal 501a outputs the content audio "Waa-waa" (cheering) together with the conversation audio "That's great" from the other party, Mr. B, the user of terminal 501b. Terminal 501b outputs the same content audio "Waa-waa" together with the conversation audio "The athletic meet was fun!" from the other party, Mr. A, the user of terminal 501a.

In such a situation, one may want to be sure the other party hears the conversation. A function that only keeps the conversation audio on the other party's side at or above a certain volume level does not guarantee this. The details are explained below with reference to FIG. 31.

FIG. 31 illustrates a problem that arises in the conventional system shown in FIG. 30. Suppose first that Mr. A raises the volume level of the conversation audio output on Mr. B's side, to make sure Mr. B hears the conversation. The conversation audio "The athletic meet was fun!" output by terminal 501b then becomes louder. Suppose, however, that at the same time Mr. B sets the volume level of the video content audio equal to or higher than that of the conversation audio. The content audio "Waa-waa" output by terminal 501b is then as loud as, or louder than, "The athletic meet was fun!". On Mr. B's side the conversation audio is drowned out by the content audio, and Mr. B cannot hear it well. Contrary to Mr. A's intention, Mr. B does not hear the conversation, and communication breaks down.

The present invention has been made in view of the above circumstances. Its object is to provide a call device, a call system, and a program for voice communication in which conversation audio and content audio are output at the same time. An operation on the sending side gives priority to either the conversation audio or the content audio on the other (receiving) side, so that the other party reliably hears the chosen audio.

For example, suppose one wants the other party to hear the conversation audio with priority. An operation on one's own (sending) side changes the output level settings on the other party's (receiving) side so that conversation audio is output with priority. The invention also provides means to prohibit or restrict the other party from adjusting those output level settings during this time.

To solve the above problem, a first technical means of the present invention is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for transmitting output level setting change instruction information to the other call device. This information sets the other call device to output either the conversation audio or the content audio with priority. When the conversation volume level of the call device exceeds a certain amount, the call device transmits the output level setting change instruction information to the other call device.
The call device described here provides at least voice communication. Like an ordinary videophone, it may also carry video alongside voice, and various forms are possible, such as a landline phone, a videophone, a mobile phone, or a PC.
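The first technical means triggers the instruction when the device's conversation volume level exceeds "a certain amount", but the source does not say how that level is measured. The sketch below is a minimal one, under these assumptions: the level is the RMS of one frame of floating-point samples in [-1.0, 1.0], expressed in dBFS, and the -20 dBFS threshold is a hypothetical value chosen for illustration.

```python
import math
from typing import Sequence

# Hypothetical threshold: the source only says "a certain amount".
THRESHOLD_DBFS = -20.0


def rms_dbfs(frame: Sequence[float]) -> float:
    """RMS level of a frame of samples in [-1.0, 1.0], in dBFS."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def should_send_instruction(frame: Sequence[float],
                            threshold_dbfs: float = THRESHOLD_DBFS) -> bool:
    """True when the local conversation level exceeds the threshold, i.e. when
    this device would send output level setting change instruction
    information to the other call device."""
    return rms_dbfs(frame) > threshold_dbfs


if __name__ == "__main__":
    loud = [0.5, -0.5] * 160     # about -6 dBFS
    quiet = [0.05, -0.05] * 160  # about -26 dBFS
    print(should_send_instruction(loud), should_send_instruction(quiet))  # True False
```

A practical detector would probably add a hold time or hysteresis so that brief peaks do not trigger repeated instructions. That refinement is also an assumption, not something the source specifies.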

A second technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for transmitting output level setting change instruction information to the other call device, setting it to output either the conversation audio or the content audio with priority. The output setting of the other call device is made by changing the settings of an echo canceller.

A third technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for transmitting output level setting change instruction information to the other call device, setting it to output either the conversation audio or the content audio with priority. The output setting of the other call device raises the volume of the conversation audio and lowers the volume of the content audio.

A fourth technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for transmitting output level setting change instruction information to the other call device, setting it to output either the conversation audio or the content audio with priority. The output setting of the other call device outputs the audio streams from different speakers.

A fifth technical means is the call device of any one of the first to fourth technical means. After changing the output setting of the other call device, the call device prohibits or restricts, for a predetermined time, changes to that output setting made by operating the other call device.
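The fifth technical means, and the tenth on the receiving side, lock a changed output setting for a predetermined time. The sketch below shows such a time-limited lock as it might run on the device whose setting was changed. It is a minimal sketch: the class and field names and the 0 to 10 volume scale are hypothetical, and times are seconds from `time.monotonic()` unless passed explicitly.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputSettings:
    conversation_volume: int = 5  # hypothetical 0-10 scale
    content_volume: int = 5


@dataclass
class OutputSettingLock:
    """Rejects local output-setting changes until a deadline has passed."""
    locked_until: float = float("-inf")

    def lock_for(self, seconds: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        # Never shorten a lock that is already in force.
        self.locked_until = max(self.locked_until, now + seconds)

    def is_locked(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now < self.locked_until


def request_local_change(settings: OutputSettings, lock: OutputSettingLock,
                         conversation_volume: int, content_volume: int,
                         now: Optional[float] = None) -> bool:
    """Apply a change requested by the local user unless the lock is active."""
    if lock.is_locked(now):
        return False
    settings.conversation_volume = conversation_volume
    settings.content_volume = content_volume
    return True
```

Here "prohibit" is modelled as rejecting every local change. A "restrict" variant could instead clamp the requested values, for example by keeping the content volume below the conversation volume.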

A sixth technical means is the call device of any one of the first to fifth technical means. When the call device changes the output setting of the other call device, it changes its own output setting in conjunction with that change.

A seventh technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for receiving output level setting change instruction information from the other call device, setting this call device to output either the conversation audio or the content audio with priority. In accordance with the received information, the call device changes the settings of an echo canceller as its output setting.

An eighth technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for receiving output level setting change instruction information from the other call device, setting this call device to output either the conversation audio or the content audio with priority. In accordance with the received information, the call device raises the volume of the conversation audio and lowers the volume of the content audio as its output setting.

A ninth technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises means for receiving output level setting change instruction information from the other call device, setting this call device to output either the conversation audio or the content audio with priority. In accordance with the received information, the call device outputs the audio streams from different speakers as its output setting.

A tenth technical means is the call device of any one of the seventh to ninth technical means. After its output setting has been changed, the call device prohibits or restricts, for a predetermined time, changes to that output setting made by operating the call device itself.

An eleventh technical means is the call device of any one of the seventh to ninth technical means. When the call device receives output level setting change instruction information from the other call device, it invalidates the instruction contained in that information and leaves its own output setting unchanged.

A twelfth technical means is a call device that can make a call, including conversation audio, with another call device while both synchronously play content that includes audio. The call device comprises two means. The first transmits to the other call device mixed audio, in which conversation audio and content audio are mixed according to mixing volume ratio information received from the other call device. The second changes the mixing volume ratio of that mixed audio so that the other call device outputs either the conversation audio or the content audio with priority. The call device changes this mixing volume ratio when its own conversation volume level exceeds a certain amount.

A thirteenth technical means is the call device of the twelfth technical means. The call device sets the mixing volume ratio of the mixed audio transmitted to the other call device so as to raise the volume of the conversation audio and lower the volume of the content audio.

A fourteenth technical means is the call device of the twelfth or thirteenth technical means. After changing the mixing volume ratio of the mixed audio transmitted to the other call device, the call device prohibits or restricts, for a predetermined time, changes to the mixing volume ratio made by operating the other call device.

A fifteenth technical means is the call device of any one of the twelfth to fourteenth technical means. When the call device changes the mixing volume ratio of the mixed audio transmitted to the other call device, it also changes its own output setting to give priority to either the conversation audio or the content audio.

A sixteenth technical means is a call system comprising the call device of any one of the first to fifteenth technical means and another call device.

A seventeenth technical means is a program for executing the functions of the call device of any one of the first to fifteenth technical means.

According to the present invention, the sending side can make its own conversation audio or content audio clearly audible to the other party (receiving side) at any time. This holds even when voice communication takes place while conversation audio and content audio are output at the same time.
Specifically, an operation on the sending side can make the conversation audio easier to hear on the other party's (receiving) side in three ways: by raising its volume level, by lowering the volume level of the content audio, or by doing both at once. Alternatively, stereo speakers can output the content audio and the conversation audio on different channels, which makes the two easier to tell apart.
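The stereo alternative in the last sentence amounts to routing each stream to its own channel instead of summing them. The sketch below assumes mono float sample lists and places conversation on the left channel and content on the right; the source does not say which channel carries which. An ordinary summed mix is shown for comparison.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple

Frame = Tuple[float, float]  # (left, right)


def route_to_separate_channels(conversation: Sequence[float],
                               content: Sequence[float]) -> List[Frame]:
    """Conversation audio on the left channel, content audio on the right;
    the shorter stream is padded with silence."""
    return [(c, k) for c, k in zip_longest(conversation, content, fillvalue=0.0)]


def mix_to_both_channels(conversation: Sequence[float],
                         content: Sequence[float]) -> List[Frame]:
    """Summed mix on both channels, clipped to [-1.0, 1.0], for comparison."""
    frames = []
    for c, k in zip_longest(conversation, content, fillvalue=0.0):
        m = max(-1.0, min(1.0, c + k))
        frames.append((m, m))
    return frames
```

With separate channels neither stream masks the other in the signal path. The listener's ability to tell them apart still depends on speaker placement.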

(Example 1)
The following examples describe concretely how the call device of the present invention operates.
FIG. 1 is a conceptual diagram of an information processing system to which the present invention is applied, in which voice communication takes place while conversation audio and content audio are output at the same time. In the figure, 101a denotes an information processing terminal serving as the call device at Mr. A's home, 101b an information processing terminal serving as the call device at Mr. B's home, 102 a network, and 103 a content server capable of providing content.

In FIG. 1, suppose first that Mr. A is talking normally, without any operation to make Mr. B hear the conversation audio with priority. Mr. B can then freely change the output level settings of his own terminal. Now suppose Mr. A operates terminal 101a, for example with a remote control, to give the conversation audio priority for the other party. Terminal 101b at Mr. B's home switches to settings under which the conversation audio (in this example, "The athletic meet was fun!") is output with priority. At the same time, Mr. B can no longer change the output level settings of terminal 101b. During this period, Mr. A's conversation audio "The athletic meet was fun!" reliably reaches Mr. B, whatever Mr. B does.

Suppose next that Mr. A operates terminal 101a, for example with a remote control, to stop giving the conversation audio priority for the other party (Mr. B). Terminal 101b at Mr. B's home then returns to the settings in effect just before the conversation audio was given priority. At the same time, Mr. B can again change the output level settings of terminal 101b.

In the information processing (call) system of this example, content on the network 102 is output synchronously on terminal 101a at Mr. A's home and terminal 101b at Mr. B's home. An example is a video file of the athletic meet stored on a WWW (World Wide Web) server on the network 102, i.e. the content server 103. Mr. A and Mr. B can talk while the content plays. Because the users can talk while sharing content, the system is designed to give communication a stronger sense of connection than before. "Synchronous output" here means that the content plays at the same timing at Mr. A's home and Mr. B's home. The timing need not match exactly: small discrepancies, for example those caused by communication network delay, are tolerated.
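The tolerance for small discrepancies can be made concrete by comparing playback positions. The sketch below is one possible approach and is not taken from the source. It assumes that each terminal periodically reports its playback position, that the one-way delay is roughly half the measured round-trip time, and a hypothetical tolerance of 0.5 s.

```python
# Hypothetical tolerance; the source only says small discrepancies are allowed.
SYNC_TOLERANCE_S = 0.5


def estimated_partner_position(reported_position_s: float, rtt_s: float) -> float:
    """The partner's position now: its reported position advanced by the
    estimated one-way delay (half the round-trip time), assuming it kept
    playing while the report was in transit."""
    return reported_position_s + rtt_s / 2.0


def in_sync(local_position_s: float, reported_position_s: float, rtt_s: float,
            tolerance_s: float = SYNC_TOLERANCE_S) -> bool:
    """True if local playback is within the tolerance of the partner's."""
    drift = local_position_s - estimated_partner_position(reported_position_s, rtt_s)
    return abs(drift) <= tolerance_s
```

When `in_sync` returns False, a terminal might seek to the estimated position. The source does not say whether or how the system resynchronizes.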

FIG. 2 shows an example of the schematic configuration of the information processing system of Example 1. The system consists of terminal 101a (Mr. A's home), terminal 101b (Mr. B's home), and the content server 103, connected via the network 102. Here, terminal 101a is assumed to be a videophone operated by Mr. A, and terminal 101b a videophone operated by Mr. B. The content server 103 is assumed to be a WWW server that holds content. It delivers the specified content as stream data to terminals 101a and 101b at the same time.

Terminal 101a transmits Mr. A's conversation audio 11a to terminal 101b. It also mixes two inputs and outputs the result as mixed audio 14a: the content audio 13a received from the content server 103 via the network 102, and Mr. B's conversation audio 12a received from terminal 101b. Likewise, terminal 101b transmits Mr. B's conversation audio 12b to terminal 101a. It mixes the content audio 13b received from the content server 103 via the network 102 with Mr. A's conversation audio 11b received from terminal 101a, and outputs the result as mixed audio 14b. Like an ordinary videophone, the system of this example carries video alongside voice. Because the present invention concerns voice communication, video communication is not described.

FIG. 3 is a block diagram showing a configuration example of terminal 101a of Example 1. Terminal 101a consists of the following units:
- a communication unit 110
- a conversation audio input unit 111
- an audio output unit 112
- a conversation audio transmission unit 113
- a conversation audio reception unit 114
- a content audio reception unit 115
- an output audio mixing unit 116
- a mixing volume change input unit 117
- a mode change input unit 118
- a mode change transmission unit 119
- a mode change reception unit 120
- an output control unit 121

Terminals 101a and 101b have the same configuration. Hereinafter, either one is often referred to simply as the information processing terminal 101.

The communication unit 110 performs wired or wireless communication via the network 102 with the partner's terminal (terminal 101b) and with the content server 103. It consists of, for example, a network card or a LAN (Local Area Network) connection terminal.

The conversation audio input unit 111 converts Mr. A's voice and surrounding sounds into an electrical signal to generate conversation audio. It is an audio input device such as a microphone. The audio output unit 112 outputs mixed audio, an electrical signal in which the conversation audio of the communication partner, Mr. B, is mixed with the content audio. It is an audio output device such as a speaker. The conversation audio transmission unit 113 sends the conversation audio it receives from the conversation audio input unit 111 to the network 102 via the communication unit 110.

The conversation audio reception unit 114 receives conversation audio from terminal 101b via the network 102. The content audio reception unit 115 receives content audio from the content server 103 via the network 102.

The output audio mixing unit 116 sets the volumes of two streams according to given mixing volume information and mixes them to generate mixed audio. The streams are the conversation audio received by the conversation audio reception unit 114 and the content audio received by the content audio reception unit 115. The mixing volume information specifies, for example, the volume ratio of conversation audio to content audio in a form such as "7:3". In that case, the output audio mixing unit 116 sets the conversation and content volumes in a 7:3 ratio and mixes them. The ratio may be computed in any reference unit, such as the graduated volume scale of a TV or an acoustic unit such as dB. The mixing volume information need not be a ratio; it may instead specify the absolute volume of each stream.
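The 7:3 example can be illustrated with a simple mixer. This is a minimal sketch under these assumptions: samples are mono floats in [-1.0, 1.0], and the ratio parts are linear amplitude gains normalised to sum to 1, so two full-scale inputs cannot clip. A dB-based ratio, which the text also allows, would need a conversion step that is not shown here. The function name and the "conversation:content" string format are hypothetical.

```python
from itertools import zip_longest
from typing import List, Sequence


def mix_by_ratio(conversation: Sequence[float], content: Sequence[float],
                 ratio: str = "7:3") -> List[float]:
    """Mix conversation and content audio with linear gains from a
    "conversation:content" ratio string such as "7:3"."""
    conv_part, content_part = (float(x) for x in ratio.split(":"))
    if conv_part < 0 or content_part < 0 or conv_part + content_part == 0:
        raise ValueError(f"invalid ratio: {ratio!r}")
    total = conv_part + content_part
    g_conv, g_content = conv_part / total, content_part / total
    # Pad the shorter stream with silence so no samples are dropped.
    return [g_conv * c + g_content * k
            for c, k in zip_longest(conversation, content, fillvalue=0.0)]
```

Giving conversation audio priority then only requires a different ratio string, for example "9:1" instead of "5:5".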

Mr. A operates the mixing volume change input unit 117 to change the mixing volume information for the audio output by terminal 101a. The unit generates a mixing volume change signal according to his operation. It is an operation device such as a remote control.

Mr. A operates the mode change input unit 118 to switch the mode of terminal 101b, which belongs to the communication partner, Mr. B. The unit generates a mode change signal according to Mr. A's operation; this signal is the output level setting change instruction information of the present invention. The unit is an operation device such as a remote control. There are, for example, two modes. In the normal mode, conversation proceeds with the usual output level settings. In the conversation priority output mode, conversation audio is output with priority. When the mode change signal selects the conversation priority output mode, this side (Mr. A's side) forcibly changes the output level settings of Mr. B's terminal 101b, so the effect is deliberately kept temporary. For example, a mode button 1016 may be provided on the remote control 1013 described later with reference to FIG. 28. The mode is then effective either only while Mr. A holds down a predetermined key on the remote control, or only for a predetermined time (for example, 10 seconds) after the key is pressed. An implementation of the mode button 1016 on the remote control 1013 is described in detail with reference to FIG. 28.
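The two modes, and the two ways of keeping the effect temporary, can be sketched as a small message type. The JSON wire format, the field names, and the helper names below are hypothetical; the source does not specify how the mode change signal is encoded.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    NORMAL = "normal"
    CONVERSATION_PRIORITY = "conversation_priority"


@dataclass(frozen=True)
class ModeChangeSignal:
    mode: Mode
    # None: valid until a mode release signal arrives (hold-to-activate).
    # Otherwise: the mode expires this many seconds after it takes effect.
    duration_s: Optional[float] = None

    def to_json(self) -> str:
        return json.dumps({"type": "mode_change", "mode": self.mode.value,
                           "duration_s": self.duration_s})

    @staticmethod
    def from_json(text: str) -> "ModeChangeSignal":
        data = json.loads(text)
        if data.get("type") != "mode_change":
            raise ValueError("not a mode change signal")
        return ModeChangeSignal(Mode(data["mode"]), data.get("duration_s"))


def is_active(signal: ModeChangeSignal, received_at_s: float, now_s: float) -> bool:
    """Whether conversation priority output mode is in force at now_s."""
    if signal.mode is not Mode.CONVERSATION_PRIORITY:
        return False
    if signal.duration_s is None:
        return True
    return now_s - received_at_s < signal.duration_s
```

For the 10-second variant, the sender would transmit `ModeChangeSignal(Mode.CONVERSATION_PRIORITY, 10.0)`. For the hold-to-activate variant, it would omit the duration and later send a separate release signal.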

The mode change transmission unit 119 transmits the mode change signal passed from the mode change input unit 118 to the network 102 via the communication unit 110. When the transmitted mode change signal specifies the conversation priority sound output mode, the information processing terminal 101b changes its setting so that conversation voice is output preferentially. At the same time, Mr. B is prohibited or restricted from changing the sound output level setting of the information processing terminal 101b.

Here, changes to the sound output level setting on the information processing terminal 101b may be prohibited or restricted for a defined period. The period runs from when the information processing terminal 101a transmits the mode change signal (sound output level setting change instruction information) to the information processing terminal 101b until it transmits sound output level setting change end instruction information that cancels the instruction. The sound output level setting change end instruction information is a mode release signal transmitted from the information processing terminal 101a. It cancels the sound output level setting change instruction that the mode change signal previously gave to the information processing terminal 101b.

For example, the remote controller that operates the information processing terminal 101a may be provided with an operation key for transmitting the mode change signal. When the user of the information processing terminal 101a presses the key, the mode change signal is transmitted to the information processing terminal 101b and the conversation priority sound output mode takes effect. When the user releases the key, a mode release signal is transmitted to the information processing terminal 101b.

Alternatively, when the user of the information processing terminal 101a presses the operation key, a mode change signal may be transmitted to the information processing terminal 101b, followed by a mode release signal once a predetermined time (for example, 10 seconds) has elapsed. On receiving the mode release signal, the information processing terminal 101b again accepts sound output level setting operations. It may also restore the setting that was in effect immediately before the setting was prohibited or restricted, that is, the setting before conversation voice was given priority.
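The two sender-side release strategies described above can be sketched as follows. One sends the release on key-up; the other sends it a fixed time after the change signal. This is a minimal Python sketch under stated assumptions, not the patent's implementation. The `send` callback stands in for the mode change transmission unit 119. The message strings `"CONVERSATION_PRIORITY"` and `"RELEASE"` and all class and method names are hypothetical. Time is passed in explicitly so the logic can be exercised without a real clock.

```python
from typing import Callable, Optional


class ModeChangeSender:
    """Sender-side sketch for terminal 101a (hypothetical names throughout)."""

    def __init__(self, send: Callable[[str], None], hold_to_talk: bool = True,
                 timeout_s: float = 10.0) -> None:
        self.send = send                  # stands in for the mode change transmission unit 119
        self.hold_to_talk = hold_to_talk  # True: release on key-up; False: release after timeout_s
        self.timeout_s = timeout_s
        self.sent_at: Optional[float] = None  # time the change signal was sent, if still active

    def key_down(self, now: float) -> None:
        # Send the mode change signal once; repeated key-down events are ignored.
        if self.sent_at is None:
            self.send("CONVERSATION_PRIORITY")
            self.sent_at = now

    def key_up(self) -> None:
        # Press-and-hold variant: releasing the key sends the mode release signal.
        if self.hold_to_talk and self.sent_at is not None:
            self._release()

    def tick(self, now: float) -> None:
        # Timed variant: send the release once timeout_s has elapsed since the change signal.
        if (not self.hold_to_talk and self.sent_at is not None
                and now - self.sent_at >= self.timeout_s):
            self._release()

    def _release(self) -> None:
        self.send("RELEASE")
        self.sent_at = None
```

In the timed variant, `key_up` does nothing. The caller is assumed to call `tick` periodically, for example from the terminal's event loop.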

As another method, changes to the sound output level setting on the information processing terminal 101b may be prohibited or restricted for a fixed time after the information processing terminal 101a transmits the mode change signal (sound output level setting change instruction information). In this case, no sound output level setting change end instruction information is needed. The information processing terminal 101b accepts sound output level setting operations again once the fixed time has elapsed after it received the change instruction. At that point, it may also restore the sound output level setting in effect immediately before the setting was prohibited or restricted, that is, the setting before conversation voice was given priority.
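For the timed variant, in which the receiving terminal lifts the restriction on its own, a receiver-side sketch might look like the following. It makes two assumptions: the mixed volume setting is a `(conversation, content)` ratio tuple, and the priority setting is `(9, 1)`. Both values, and all class and method names, are illustrative rather than taken from the patent. The setting in force before the override is saved and restored when the lock expires.

```python
from typing import Optional, Tuple

Mix = Tuple[int, int]  # (conversation, content) volume ratio


class TimedPriorityLock:
    """Receiver-side sketch for terminal 101b: the override expires by itself."""

    def __init__(self, mix: Mix, priority_mix: Mix = (9, 1), lock_s: float = 10.0) -> None:
        self.mix = mix                    # setting currently applied to the output
        self.priority_mix = priority_mix  # setting forced by the change instruction
        self.lock_s = lock_s              # how long local changes stay blocked
        self.saved: Optional[Mix] = None
        self.locked_until: Optional[float] = None

    def on_change_instruction(self, now: float) -> None:
        if self.locked_until is None:
            self.saved = self.mix         # remember the setting in force before the override
        self.mix = self.priority_mix
        self.locked_until = now + self.lock_s  # a repeated instruction extends the lock

    def tick(self, now: float) -> None:
        if self.locked_until is not None and now >= self.locked_until:
            if self.saved is not None:
                self.mix = self.saved     # restore the pre-override setting
            self.saved = None
            self.locked_until = None

    def user_set_mix(self, mix: Mix, now: float) -> bool:
        """Apply a local change unless the lock is active; return whether it was applied."""
        self.tick(now)
        if self.locked_until is not None:
            return False
        self.mix = mix
        return True
```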

The mode change receiving unit 120 receives, via the communication unit 110, a mode change signal sent over the network 102 from the information processing terminal 101b. If the received mode change signal specifies the conversation priority sound output mode, the information processing terminal 101a changes its setting so that conversation voice is output preferentially. At the same time, Mr. A is prohibited or restricted from changing the sound output level setting of the information processing terminal 101a.

The sound output control unit 121 receives two inputs: the mixed volume change signal of the information processing terminal 101a and the mode change signal received from the information processing terminal 101b. Based on them, it determines and holds the mixed volume information.

FIG. 4 is a block diagram showing a detailed configuration example of the sound output control unit 121. The sound output control unit 121 comprises a mixed volume control unit 121a, a mixed volume information holding unit 121b, a mode information holding unit 121c, an original-setting mixed volume information holding unit 121d, and a conversation priority mixed volume information holding unit 121e.

The mixed volume control unit 121a receives the mixed volume change signal 123 of the information processing terminal 101a and the mode change signal 124 received from the information processing terminal 101b. Based on them, it determines mixed volume information 122 and writes it to the mixed volume information holding unit 121b. As needed, the mixed volume control unit 121a writes information to, and reads information from, four holding units: the mixed volume information holding unit 121b, the mode information holding unit 121c, the original-setting mixed volume information holding unit 121d, and the conversation priority mixed volume information holding unit 121e.

The mixed volume information holding unit 121b holds the mixed volume information. The mode information holding unit 121c holds mode information indicating whether the terminal is in the conversation priority sound output mode or the normal mode. The original-setting mixed volume information holding unit 121d holds the mixed volume information to be set when the terminal returns from the conversation priority sound output mode to the normal mode. This is the sound output level setting in effect immediately before the conversation priority sound output mode was entered. The conversation priority mixed volume information holding unit 121e holds the mixed volume information to be set in the conversation priority sound output mode. This mixed volume information may be determined in advance or may be specified by the information processing terminal 101b.

An operation example of the information processing terminal 101a in the first embodiment is described below with reference to flowcharts. The description centers on the information processing terminal 101a, but the information processing terminal 101b operates in the same way.
(Voice call)
FIG. 5 is a flowchart illustrating an example of voice call processing by the information processing terminal 101a in the first embodiment. This example is described on the basis of the configuration shown in FIGS. 3 and 4. The transmission process and the reception process actually run concurrently, but they are shown separately for clarity.

In the transmission process shown in FIG. 5(A), the information processing terminal 101a first captures Mr. A's conversation voice with the conversation voice input unit 111 (step S1). It then transmits the captured voice to the network 102 via the communication unit 110 (step S2). In the reception process shown in FIG. 5(B), the information processing terminal 101a receives Mr. B's conversation voice from the network 102 via the communication unit 110, together with the content voice (step S3). To mix these two sounds, the output sound mixing unit 116 first reads the mixed volume information from the mixed volume information holding unit 121b (step S4). It then mixes the conversation voice and the content voice according to the ratio that information specifies, generating a mixed sound (step S5). Finally, it outputs the mixed sound from the sound output unit 112 (step S6). The transmission process of steps S1 to S2 and the reception process of steps S3 to S6 are assumed to run concurrently.
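Steps S4 and S5 amount to a weighted sum of the two signals. The following is a minimal sketch with three assumptions: both signals are equal-length mono buffers of floating-point samples, the mixed volume information is a `(conversation, content)` ratio, and the function name is hypothetical. The sketch normalizes the weights so they sum to 1. The output is then a convex combination of the inputs, so full-scale inputs cannot push the mix past full scale.

```python
from typing import List, Tuple


def mix_audio(conversation: List[float], content: List[float],
              ratio: Tuple[float, float]) -> List[float]:
    """Mix two mono buffers according to a (conversation, content) volume ratio."""
    if len(conversation) != len(content):
        raise ValueError("buffers must have the same length")
    conv_w, cont_w = ratio
    total = conv_w + cont_w
    if total <= 0:
        raise ValueError("ratio must have a positive sum")
    a, b = conv_w / total, cont_w / total  # normalized weights summing to 1
    return [a * x + b * y for x, y in zip(conversation, content)]
```

For example, a ratio of `(3, 1)` weights conversation by 0.75 and content by 0.25.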

(Mixed volume change)
FIG. 6 is a flowchart illustrating an example of processing for changing the mixed volume of the sound output by the information processing terminal 101a in the first embodiment. This example is described on the basis of the configuration shown in FIGS. 3 and 4. First, Mr. A performs an operation to change the mixed volume. The information processing terminal 101a converts the resulting input signal into a mixed volume change signal 123 via the mixed volume change input unit 117 (step S11) and passes it to the mixed volume control unit 121a. Next, the mixed volume control unit 121a reads the current mode information from the mode information holding unit 121c. It then checks whether the terminal is in the conversation priority sound output mode or the normal mode (step S12).

If step S12 finds the normal mode, the mixed volume control unit 121a writes mixed volume information 122 based on the mixed volume change signal 123 to the mixed volume information holding unit 121b, thereby changing the mixed volume information 122 (step S14). It then displays the result (step S15). If step S12 finds the conversation priority sound output mode, it leaves the mixed volume information 122 unchanged and displays that result (step S13).
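The branch at step S12 can be sketched as below. The `OutputState` fields stand in for the mode information holding unit 121c and the mixed volume information holding unit 121b. The names, the default values, and the message strings (modelled loosely on FIG. 7) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

NORMAL = "normal"
CONVERSATION_PRIORITY = "conversation_priority"


@dataclass
class OutputState:
    mode: str = NORMAL             # stands in for mode information holding unit 121c
    mix: Tuple[int, int] = (5, 5)  # stands in for mixed volume information holding unit 121b


def handle_mix_change(state: OutputState, requested: Tuple[int, int]) -> str:
    """Apply a local mix change only in normal mode; return the on-screen message."""
    if state.mode == NORMAL:
        state.mix = requested                                           # step S14
        x, y = requested
        return f"Changed / Conversation {x} : Content {y}"              # step S15
    return "Cannot change / Conversation priority output in progress"  # step S13
```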

FIG. 7 shows the screen feedback for both cases. In the normal mode, the mixed volume of the sound output by the information processing terminal 101a can be changed. The terminal's screen then shows information telling the user that the change has been made. In this example, the terminal displays "Changed / Conversation X : Content Y", where conversation X : content Y means that the volume ratio of conversation voice to content voice is X:Y. In the conversation priority sound output mode, the mixed volume cannot be changed, and the screen shows information telling the user so. In this example, the terminal displays "Cannot change / Conversation priority output in progress".

(Conversation priority transmission)
FIG. 8 is a flowchart illustrating an operation example in the first embodiment, in which the information processing terminal 101a performs a mode change operation on the information processing terminal 101b. This example is described on the basis of the configuration shown in FIGS. 3 and 4. First, Mr. A performs an operation to switch the mode of the information processing terminal 101b. The information processing terminal 101a converts the resulting input signal into a mode change signal via the mode change input unit 118 (step S21). The mode change transmission unit 119 then transmits that signal to the network 102 via the communication unit 110 (step S22).

Next, the information processing terminal 101a checks whether the transmitted mode change signal specifies the normal mode or the conversation priority sound output mode (step S23). It then displays the result for the conversation priority sound output mode (step S24) or for the normal mode (step S25). FIG. 9 shows display examples of the transmission results in steps S24 and S25. In this way, the screen of the information processing terminal 101a tells the user which of the two modes the transmitted mode change signal specified.

In the screen examples of FIG. 9, when the conversation priority sound output mode has been transmitted, the information processing terminal 101a displays "Sending conversation priority". When the normal mode has been transmitted, it displays "Conversation priority sending released".

(Conversation priority reception)
FIG. 10 is a flowchart illustrating an operation example in the first embodiment, in which the information processing terminal 101a receives a mode change operation from the information processing terminal 101b. This example is described on the basis of the configuration shown in FIGS. 3 and 4. First, the mode change receiving unit 120 of the information processing terminal 101a receives the mode change signal 124 from the information processing terminal 101b and passes it to the mixed volume control unit 121a (step S31). The mixed volume control unit 121a then checks whether the mode change signal 124 specifies the conversation priority sound output mode or the normal mode (step S32). If it specifies the conversation priority sound output mode, the mixed volume control unit 121a writes that mode to the mode information holding unit 121c as the mode information (step S33). Next, to preserve the original setting, it reads the current mixed volume information from the mixed volume information holding unit 121b and writes it to the original-setting mixed volume information holding unit 121d (step S34). It then reads the sound output level setting for the conversation priority sound output mode from the conversation priority mixed volume information holding unit 121e and writes it to the mixed volume information holding unit 121b (step S35). Finally, it displays on screen which mode was received (step S36).

If step S32 finds that the mode change signal 124 specifies the normal mode, the mixed volume control unit 121a writes the normal mode to the mode information holding unit 121c as the mode information (step S37). Next, to return to the original mixed volume, it reads the mixed volume information from the original-setting mixed volume information holding unit 121d and writes it to the mixed volume information holding unit 121b (step S38). Finally, it displays on screen which mode was received (step S39). FIG. 11 shows display examples of the reception results in steps S36 and S39. In this way, the screen of the information processing terminal 101a tells the user which of the two modes the received mode change signal specified.
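Steps S31 to S39 can be sketched as a single handler. The four fields mirror the holding units 121b to 121e. The field names, defaults and message strings are hypothetical. The sketch adds one guard that the flowchart does not show. If a second conversation priority signal arrives while that mode is already active, the saved original setting is not overwritten with the priority setting.

```python
from dataclasses import dataclass
from typing import Tuple

Mix = Tuple[int, int]  # (conversation, content)
NORMAL = "normal"
CONVERSATION_PRIORITY = "conversation_priority"


@dataclass
class OutputControl:
    mix: Mix = (5, 5)           # 121b: mixed volume information in use
    mode: str = NORMAL          # 121c: mode information
    original_mix: Mix = (5, 5)  # 121d: setting restored on return to normal mode
    priority_mix: Mix = (9, 1)  # 121e: setting used in conversation priority mode

    def on_mode_change(self, signal: str) -> str:
        """Handle a received mode change signal (steps S32-S39); return the on-screen message."""
        if signal == CONVERSATION_PRIORITY:
            if self.mode != CONVERSATION_PRIORITY:  # guard added in this sketch
                self.original_mix = self.mix        # S34: save the original setting
            self.mode = CONVERSATION_PRIORITY       # S33
            self.mix = self.priority_mix            # S35
            return "Conversation priority output in progress"  # S36
        self.mode = NORMAL                          # S37
        self.mix = self.original_mix                # S38: restore the original setting
        x, y = self.mix
        return f"Conversation priority output released / Conversation {x} : Content {y}"  # S39
```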

In the screen examples of FIG. 11, when the conversation priority sound output mode has been received, the information processing terminal 101a displays "Conversation priority output in progress". When the normal mode has been received, it displays "Conversation priority output released / Conversation X : Content Y". Here, conversation X : content Y is the original sound output level setting from before the mode change.

FIG. 12 shows an example of the data structure of the mixed volume information and the mode information. As shown in FIG. 12(A), the mixed volume information and the mixed volume change signal are expressed as the ratio of conversation voice to content voice. By treating the values of the ratio as volume levels, the structure can convey absolute volume information in addition to the ratio. As shown in FIG. 12(B), the mode information and the mode change signal take one of two values. One is the conversation priority sound output mode, in which conversation voice is output preferentially. The other is the normal mode, in which conversation takes place with the ordinary sound output level setting.
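One way to represent the FIG. 12 data is sketched below. It assumes integer volume levels, for example on a 0 to 10 scale; that scale is an assumption. The structure keeps both levels rather than only their reduced ratio, which is what lets it carry absolute volume: 8:2 and 4:1 describe the same ratio at different loudness.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Mode(Enum):
    CONVERSATION_PRIORITY = "conversation_priority"  # FIG. 12(B)
    NORMAL = "normal"


@dataclass(frozen=True)
class MixedVolume:
    """FIG. 12(A) sketch: each value is read as a volume level, not only a ratio term."""
    conversation: int
    content: int

    def ratio(self) -> Tuple[int, int]:
        # Reduce to lowest terms; gcd(0, 0) == 0, so fall back to 1 to avoid dividing by zero.
        g = math.gcd(self.conversation, self.content) or 1
        return (self.conversation // g, self.content // g)
```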

(Function for not entering conversation priority sound output mode)
In actual use of the information processing system of this embodiment, a user may sometimes want to concentrate on the content sound, for example when watching the content closely. For such cases, the information processing terminal 101 may have a function that keeps it from entering the conversation priority sound output mode. When this function is enabled, the terminal ignores mode change signals received from the communication partner. The user can then set the sound output level of the information processing terminal 101 freely and concentrate on the content sound. For example, the remote controller or other device that operates the information processing terminal 101 may have an operation key for disabling mode change signals. Once the user presses that key, the sound output level setting of the information processing terminal 101 is not changed even if a mode change signal is received.

(Function allowing limited operation in conversation priority sound output mode)
Also in actual use, a user may want to fine-tune the sound output level setting applied in the conversation priority sound output mode so that it is easier to hear. For such cases, the information processing terminal 101 may have a function that allows some sound output operations even in the conversation priority sound output mode. When this function is enabled, the sound output level setting of the information processing terminal 101 can be adjusted during the conversation priority sound output mode. Adjustments are allowed only within a range in which conversation voice is still output preferentially.

(Function for putting both terminals into conversation priority sound output mode)
Also in actual use, a user who wants the other side to output conversation voice preferentially often does so in order to hold a two-way conversation. Such a user will probably also want the other party's conversation voice to be output preferentially on this side. For such cases, the information processing terminal 101 may have a function that puts both terminals into the conversation priority sound output mode. Suppose this function is enabled and a user performs a mode change operation to make the partner (receiving) side output conversation voice preferentially. The information processing terminal 101 on the partner side then enters the conversation priority sound output mode. The information processing terminal 101 on the user's own (sending) side is also changed to a setting that outputs the partner's conversation voice preferentially.

(Function for separating output sound by channel using stereo speakers)
There may also be a function that outputs the conversation voice and the content voice from different locations so that the two can be clearly told apart. FIG. 13 is a conceptual diagram illustrating an example of a function that separates output sound by channel using stereo speakers.

The example in FIG. 13 assumes stereo speakers, that is, a left and a right speaker that can output two-channel (left and right) stereo sound. Normally, conversation voice and content voice are output simultaneously. The left speaker outputs a mix of the L channel of the conversation voice and the L channel of the content voice. The right speaker outputs a mix of the R channel of the conversation voice and the R channel of the content voice. Because both the conversation voice and the content voice are reproduced in two-channel stereo, the listener hears a realistic sound. However, each speaker outputs a mix of conversation and content, which makes the two hard to tell apart. Therefore, when the two should be easy to distinguish, that is, when a mode change operation is received from the information processing terminal 101b, the channels are divided. For example, conversation voice goes to the left channel and content voice to the right channel, and each is output independently from its own speaker. If an original audio signal is stereo, its two channels are first combined into one-channel (monaural) audio before it is routed to its assigned channel. With this control, conversation voice and content voice come from different speakers, which can be expected to make them easier to distinguish.
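The two routings in FIG. 13 can be sketched on buffers of `(left, right)` sample pairs. Averaging the two channels is one common way to down-mix to mono; using it here is an assumption, as are the function names. The summing in `mix_stereo` can exceed full scale, so a real implementation would also need gain control.

```python
from typing import List, Tuple

Stereo = List[Tuple[float, float]]  # (left, right) sample pairs


def to_mono(stereo: Stereo) -> List[float]:
    # Down-mix by averaging the two channels.
    return [(left + right) / 2 for left, right in stereo]


def mix_stereo(conversation: Stereo, content: Stereo) -> Stereo:
    # Normal case: each speaker plays conversation and content of the same channel, summed.
    return [(cl + tl, cr + tr) for (cl, cr), (tl, tr) in zip(conversation, content)]


def separate_channels(conversation: Stereo, content: Stereo) -> Stereo:
    # Separated case: mono conversation on the left speaker, mono content on the right.
    conv, cont = to_mono(conversation), to_mono(content)
    if len(conv) != len(cont):
        raise ValueError("buffers must have the same length")
    return list(zip(conv, cont))
```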

(Function for having the other side output content sound preferentially)
Also in actual use, when sharing content, a user may want to emphasize a particular scene, such as an important highlight, so that the other party hears it. For such cases, there may be a mode that gives priority to content sound, analogous to the conversation priority sound output mode that gives priority to conversation voice. One possible method is for the other side to output content sound preferentially only while a button is held down.

(Function for switching echo canceller settings with the mode)
When the information processing system of this embodiment is used hands-free, an echo canceller is essential. In the conversation priority sound output mode, half-duplex exchanges, in which the parties speak in turn, tend to dominate. In the normal mode, full-duplex exchanges, in which voices flow in both directions at once, tend to dominate. Because the character of the call thus differs by mode, the appropriate echo canceller setting also differs by mode.

Taking this into account, there may be a function that configures the echo canceller for each side and mode as follows. The side that set the conversation priority sound output mode (the information processing terminal 101a side) actively sends its own conversation voice 11a and receives little conversation voice 12a from the other party. Its echo canceller is set for that situation. The side that received the conversation priority sound output mode (the information processing terminal 101b side) actively receives the other party's conversation voice 11b and sends little of its own conversation voice 12b. Its echo canceller is likewise set for that situation. In the normal mode, the echo canceller is set for voice being sent and received in both directions.
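The selection logic described above can be sketched as a lookup from mode and role to an echo canceller profile. The profile names are hypothetical labels. The patent does not specify how each profile maps to concrete parameters (for example, double-talk detection sensitivity or residual echo suppression strength); that depends on the echo canceller actually used.

```python
from enum import Enum


class AecProfile(Enum):
    MOSTLY_NEAR_END = "mostly sending local speech"    # side that set conversation priority (101a)
    MOSTLY_FAR_END = "mostly receiving remote speech"  # side that received it (101b)
    DOUBLE_TALK = "both directions at once"            # normal mode


def select_aec_profile(conversation_priority: bool, set_locally: bool) -> AecProfile:
    """Pick a (hypothetical) echo canceller profile for the current mode and role."""
    if not conversation_priority:
        return AecProfile.DOUBLE_TALK
    return AecProfile.MOSTLY_NEAR_END if set_locally else AecProfile.MOSTLY_FAR_END
```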

(Function for automatically disabling conversation priority sound output mode when content is not shared)
When content is not being shared, the other party has presumably already chosen a sound output level setting suited to conversation, so the conversation priority sound output mode serves little purpose. Therefore, while content is not shared, the terminal may always be kept in the normal mode.

(Function for changing the mode according to the volume level of conversation voice)
There may be a function that switches between the conversation priority sound output mode and the normal mode according to the conversation voice 11a. For example, the terminal may enter the conversation priority sound output mode when waveform analysis recognizes that Mr. A's conversation voice is being input. Alternatively, it may change the mode according to the volume level of the conversation voice 11a.

An example of changing the mode according to the volume level is described with reference to FIG. 14. The graph (characteristic curve) in FIG. 14 shows an example of how the volume level of the conversation voice changes over time. The terminal switches to the conversation priority sound output mode when the volume level of the conversation voice stays at or above the set level for a predetermined time or longer, and switches to the normal mode when the volume level stays at or below the release level for a predetermined time or longer. To prevent the mode from switching too frequently, once the mode has switched it is kept for a fixed period regardless of the switching algorithm.

In FIG. 14, the volume level is at or above the set level at time t₁, but because that interval A is shorter than the predetermined time, the mode does not change to the conversation priority sound output mode. At time t₂, the interval B during which the volume level is at or above the set level lasts for the predetermined time or longer, so the mode switches to the conversation priority sound output mode. At time t₃, the volume level is at or below the release level and interval C lasts for the predetermined time or longer, which satisfies the condition for switching to the normal mode; however, the fixed period since switching to the conversation priority sound output mode has not yet elapsed, so the mode does not change. Finally, at time t₄, interval D, during which the volume level is at or below the release level, lasts for the predetermined time or longer, and the fixed period since switching to the conversation priority sound output mode has elapsed, so the mode switches to the normal mode.
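A minimal sketch of the switching rule illustrated by FIG. 14, assuming time is counted in level-meter frames rather than seconds; `LevelModeSwitcher`, its parameter names, and the threshold values in the usage note are hypothetical. The set and release conditions are evaluated only once the hold period since the last switch has elapsed.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    CONVERSATION_PRIORITY = "conversation_priority"


class LevelModeSwitcher:
    """Switch modes from the conversation-voice level with duration and hold checks."""

    def __init__(self, set_level: float, release_level: float,
                 min_duration: int, hold_period: int) -> None:
        self.set_level = set_level          # enter priority mode at/above this level
        self.release_level = release_level  # return to normal at/below this level
        self.min_duration = min_duration    # frames a condition must persist
        self.hold_period = hold_period      # frames a new mode is kept regardless
        self.mode = Mode.NORMAL
        self._above = 0                     # consecutive frames at/above set level
        self._below = 0                     # consecutive frames at/below release level
        self._since_switch = hold_period    # allow the very first switch

    def update(self, level: float) -> Mode:
        # Track how long the level has continuously met each threshold.
        self._above = self._above + 1 if level >= self.set_level else 0
        self._below = self._below + 1 if level <= self.release_level else 0
        self._since_switch += 1
        if self._since_switch < self.hold_period:
            return self.mode  # still inside the hold period: never switch
        if self.mode is Mode.NORMAL and self._above >= self.min_duration:
            self._switch(Mode.CONVERSATION_PRIORITY)
        elif self.mode is Mode.CONVERSATION_PRIORITY and self._below >= self.min_duration:
            self._switch(Mode.NORMAL)
        return self.mode

    def _switch(self, mode: Mode) -> None:
        self.mode = mode
        self._since_switch = 0
```

With a set level of 0.6, a release level of 0.2, a 3-frame minimum duration, and a 10-frame hold, a 2-frame burst (like interval A) is ignored, a 4-frame burst (interval B) enters the priority mode, a quiet stretch right after it (interval C) is ignored because of the hold, and a later quiet stretch (interval D) returns the terminal to the normal mode.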

The functions described above (preventing entry into the conversation priority sound output mode, allowing limited operation even in the conversation priority sound output mode, putting both sides into the conversation priority sound output mode, separating output audio by channel using stereo speakers, having the other side output content audio with priority, switching the echo canceller settings in conjunction with the mode, automatically disabling the conversation priority sound output mode when content is not shared, and changing the mode according to the volume level of the conversation voice) can be applied not only to Embodiment 1 but equally to Embodiments 2 and 3 described below.

(Embodiment 2)
FIG. 15 illustrates a schematic configuration example of the information processing system according to Embodiment 2. In this system, Mr. A and Mr. B can enjoy a conversation while sharing content owned by Mr. A. Mr. A's content can take various forms: for example, it may be recorded on the information processing terminal, read from a video deck or digital video camera connected to the terminal's external input terminal, or read from a recording medium such as an SD (Secure Digital) memory card inserted into the terminal's memory card slot.

The information processing system of this embodiment consists of an information processing terminal 201a, an information processing terminal 201b, and a content holding unit 203, connected via a network 202. The main difference from Embodiment 1 is that the content holding unit 203 is connected to the information processing terminal 201a, and the content is sent from Mr. A's terminal 201a to Mr. B's terminal 201b. The content holding unit 203 may be anything that can hold and provide content, such as a storage medium built into the terminal 201a or an external storage medium connectable to it; as in Embodiment 1, it may also be a content server on the network.

The information processing terminal 201a sends Mr. A's conversation voice 21a to the information processing terminal 201b, and mixes the content audio 23a obtained from the content holding unit 203 with Mr. B's conversation voice 22a received from the terminal 201b, outputting the result as mixed audio 24a. Likewise, the terminal 201b sends Mr. B's conversation voice 22b to the terminal 201a, and mixes the content audio 23b and Mr. A's conversation voice 21b, both received from the terminal 201a, outputting the result as mixed audio 24b.

FIG. 16 is a block diagram showing a configuration example of the information processing terminal 201a of Embodiment 2. The terminal 201a comprises a communication unit 210, a conversation voice input unit 211, an audio output unit 212, a conversation voice transmission unit 213, a conversation voice reception unit 214, an output sound mixing unit 215, a mixed volume change input unit 216, a mode change input unit 217, a mode change transmission unit 218, a mode change reception unit 219, a content audio extraction unit 220, a content audio transmission unit 221, a sound output control unit 222, and the content holding unit 203. The information processing terminal 201b in this example has the same configuration as the information processing terminal 101a of Embodiment 1 shown in FIG. 3.

In FIG. 16, all units other than the content audio extraction unit 220, the content audio transmission unit 221, and the content holding unit 203 (that is, the communication unit 210, the conversation voice input unit 211, the audio output unit 212, the conversation voice transmission unit 213, the conversation voice reception unit 214, the output sound mixing unit 215, the mixed volume change input unit 216, the mode change input unit 217, the mode change transmission unit 218, the mode change reception unit 219, and the sound output control unit 222) have the same functions as the corresponding units of Embodiment 1 shown in FIG. 3, so their description is omitted here. Like the sound output control unit 121 shown in FIG. 4, the sound output control unit 222 consists of a mixed volume control unit, a mixed volume information holding unit, a mode information holding unit, an original-setting mixed volume information holding unit, and a conversation priority mixed volume information holding unit.

The content audio extraction unit 220 extracts content audio from the content holding unit 203. The content audio transmission unit 221 sends the content audio passed from the content audio extraction unit 220 to the network 202 via the communication unit 210.

The following description focuses on the operation of the information processing terminal 201a, based on the flow shown in FIG. 17. The operation of the information processing terminal 201b follows the same flows as in Embodiment 1 shown in FIGS. 5, 6, 8, and 10, so its description is omitted.

(Voice call)
FIG. 17 is a flowchart illustrating an example of the voice call processing of the information processing terminal 201a of Embodiment 2, based on the configuration shown in FIG. 16. Although the transmission process and the reception process actually run concurrently, they are shown separately for clarity.

In the transmission process shown in FIG. 17(A), the information processing terminal 201a first captures Mr. A's conversation voice with the conversation voice input unit 211 and, at the same time, captures the content audio with the content audio extraction unit 220 (step S41). The terminal 201a then sends the conversation voice through the conversation voice transmission unit 213, and the content audio through the content audio transmission unit 221, to the network 202 via the communication unit 210 (step S42).

In the reception process shown in FIG. 17(B), the information processing terminal 201a first receives Mr. B's conversation voice from the network 202 via the communication unit 210 and, at the same time, captures the content audio with the content audio extraction unit 220 (step S43). Then, as in Embodiment 1, the output sound mixing unit 215 reads the mixed volume information from the mixed volume information holding unit of the sound output control unit 222 in order to mix these two audio signals (step S44), mixes the conversation voice and the content audio according to that ratio to generate mixed audio (step S45), and outputs the mixed audio from the audio output unit 212 (step S46). The transmission process of steps S41 to S42 and the reception process of steps S43 to S46 are assumed to run concurrently.
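Steps S44 and S45 can be sketched as a per-sample weighted sum. This assumes, hypothetically, that the mixed volume information is one linear gain per stream and that audio is handled as equally long blocks of float samples in [-1, 1]; `MixedVolumeInfo` and `mix` are illustrative names, and hard clipping is a simplification of whatever limiting a real terminal would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MixedVolumeInfo:
    # Hypothetical form of the "mixed volume information": one gain per stream.
    conversation_gain: float
    content_gain: float


def mix(conversation: List[float], content: List[float],
        info: MixedVolumeInfo) -> List[float]:
    """Mix two equally long sample blocks by the stored gains (steps S44-S45)."""
    if len(conversation) != len(content):
        raise ValueError("blocks must have the same length")
    out = []
    for c, k in zip(conversation, content):
        s = info.conversation_gain * c + info.content_gain * k
        out.append(max(-1.0, min(1.0, s)))  # hard-clip to the valid sample range
    return out
```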

The remaining processes, namely mixed volume change, conversation priority transmission, and conversation priority reception, follow the same flows as in Embodiment 1 shown in FIGS. 6, 8, and 10, so their description is omitted.

(Function to determine that content is not shared when nothing is connected to the external input terminal)
The information processing terminal of the present invention may determine that content is not being shared when nothing is connected to the external input terminal used to input data from the content server. After making this determination, the terminal may apply the function described in Embodiment 1 that automatically disables the conversation priority sound output mode when content is not shared.

The function of determining that content is not shared when nothing is connected to the external input terminal applies not only to Embodiment 2 but equally to Embodiment 3 described below.

(Embodiment 3)
FIG. 18 illustrates a schematic configuration example of the information processing system according to Embodiment 3. As in Embodiment 2, Mr. A and Mr. B can enjoy a conversation while sharing content owned by Mr. A. The difference from Embodiment 2 is that, instead of sending the conversation voice and the content audio separately from Mr. A to Mr. B, Mr. A's information processing terminal mixes them before sending. Because the audio is sent as a single signal, an existing videophone system can be used, and the communication bandwidth is halved compared with sending two separate streams. In this embodiment, however, the output level setting in the conversation priority sound output mode is limited to the volume ratio rather than the absolute volume.

The information processing system of this embodiment consists of an information processing terminal 301a, an information processing terminal 301b, and a content holding unit 303, connected via a network 302. The main difference from Embodiment 2 is that the terminal 301a sends the terminal 301b a single stream of already-mixed audio rather than separate conversation voice and content audio streams.

The information processing terminal 301a mixes the content audio 33a obtained from the content holding unit 303 with Mr. A's conversation voice 31a and sends the result as mixed audio 35a to Mr. B's information processing terminal 301b. It also mixes the content audio 33a with Mr. B's conversation voice 32a received from the terminal 301b and outputs the result as mixed audio 34a. Likewise, the terminal 301b sends Mr. B's conversation voice 32b to the terminal 301a and outputs the mixed audio 31b received from the terminal 301a.

FIG. 19 is a block diagram showing a configuration example of the information processing terminal 301a of Embodiment 3. The terminal 301a comprises a communication unit 310, a conversation voice input unit 311, an audio output unit 312, a conversation voice reception unit 313, output sound mixing units 314a and 314b, a mixed volume change input unit 315, a mode change input unit 316, a mode change reception unit 317, a content audio extraction unit 318, a mixed audio transmission unit 319, a mixed volume change reception unit 320, sound output control units 321a and 321b, and the content holding unit 303.

In FIG. 19, all units other than the output sound mixing unit 314b, the mixed volume change reception unit 320, the mixed audio transmission unit 319, and the sound output control unit 321b (that is, the content holding unit 303, the communication unit 310, the conversation voice input unit 311, the audio output unit 312, the conversation voice reception unit 313, the output sound mixing unit 314a, the mixed volume change input unit 315, the mode change input unit 316, the mode change reception unit 317, the content audio extraction unit 318, and the sound output control unit 321a) have the same functions as the corresponding units of Embodiment 2 shown in FIG. 16, so their description is omitted here. Like the sound output control unit 121 shown in FIG. 4, each of the sound output control units 321a and 321b consists of a mixed volume control unit, a mixed volume information holding unit, a mode information holding unit, an original-setting mixed volume information holding unit, and a conversation priority mixed volume information holding unit.

The output sound mixing unit 314b mixes the conversation voice from the conversation voice input unit 311 with the content audio from the content audio extraction unit 318 and passes the mixed audio to the mixed audio transmission unit 319. The mixed audio transmission unit 319 sends the mixed audio passed from the output sound mixing unit 314b to the network 302 via the communication unit 310. The mixed volume change reception unit 320 receives mixed volume change information (information for changing the audio ratio) sent by the information processing terminal 301b over the network 302 via the communication unit 310, and passes it to the sound output control unit 321b. The output sound mixing unit 314b reads the mixed volume information from the mixed volume information holding unit of the sound output control unit 321b and generates the mixed audio according to that ratio, and the mixed audio transmission unit 319 sends it to the network 302.

(Voice call)
FIG. 20 is a flowchart illustrating an example of the voice call processing of the information processing terminal 301a of Embodiment 3, based on the configuration shown in FIG. 19. Although the transmission process and the reception process actually run concurrently, they are shown separately for clarity.

In the transmission process shown in FIG. 20(A), the information processing terminal 301a first captures Mr. A's conversation voice with the conversation voice input unit 311 and, at the same time, captures the content audio with the content audio extraction unit 318 (step S51). Next, to mix these two audio signals, the output sound mixing unit 314b reads the mixed volume information from the mixed volume information holding unit of the sound output control unit 321b (step S52) and generates mixed audio according to that ratio (step S53). The terminal 301a then sends the generated mixed audio to the network 302 via the communication unit 310 (step S54).
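The transmit side of Embodiment 3 (steps S51 to S54) can be sketched as a ratio blend followed by a single send. The sketch assumes, hypothetically, that the stored mixed volume information is reduced to one number, the conversation share in [0, 1], matching the ratio-only control described for this embodiment; `mix_by_ratio`, `transmit_step`, and the `send` callback are illustrative, not the patent's interfaces.

```python
from typing import Callable, List


def mix_by_ratio(conversation: List[float], content: List[float],
                 ratio: float) -> List[float]:
    """Blend two sample blocks; `ratio` is the conversation share in [0, 1]."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within [0, 1]")
    if len(conversation) != len(content):
        raise ValueError("blocks must have the same length")
    return [ratio * c + (1.0 - ratio) * k for c, k in zip(conversation, content)]


def transmit_step(conversation: List[float], content: List[float], ratio: float,
                  send: Callable[[List[float]], None]) -> None:
    # S51: both blocks captured; S52: ratio read from the 321b-side store;
    # S53: mix; S54: send one stream instead of two separate ones.
    send(mix_by_ratio(conversation, content, ratio))
```

Because only the blend ratio is controllable, the receiving terminal can change the balance between voice and content but not the absolute loudness of each stream.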

In the reception process shown in FIG. 20(B), the terminal 301a first receives Mr. B's conversation voice from the network 302 via the communication unit 310 and, at the same time, captures the content audio with the content audio extraction unit 318 (step S55). Then, as in Embodiment 2, the output sound mixing unit 314a reads the mixed volume information from the mixed volume information holding unit of the sound output control unit 321a in order to mix these two audio signals (step S56), mixes the conversation voice and the content audio according to that ratio to generate mixed audio (step S57), and outputs the mixed audio from the audio output unit 312 (step S58). The transmission process of steps S51 to S54 and the reception process of steps S55 to S58 are assumed to run concurrently.

The mixed volume change process in the terminal 301a is the same as in Embodiment 1 shown in FIG. 6, so its description is omitted.

(Conversation priority transmission)
FIG. 21 is a flowchart illustrating an example of the operation when, in Embodiment 3, the information processing terminal 301a performs a mode change operation directed at the information processing terminal 301b, based on the configuration shown in FIG. 19. First, the terminal 301a converts the signal input by the operation Mr. A performed to switch the mode of the terminal 301a into a mode change signal through the mode change input unit 316 (step S61). The mode change signal is then passed to the mixed volume control unit in the sound output control unit 321b, which checks whether the signal indicates the conversation priority sound output mode or the normal mode (step S62). If it indicates the conversation priority sound output mode, the mixed volume control unit writes the conversation priority sound output mode into the mode information holding unit as the mode information (step S63).

Next, to preserve the original setting, the mixed volume control unit reads the current mixed volume information from the mixed volume information holding unit and writes it into the original-setting mixed volume information holding unit (step S64). It then reads the output level setting for the conversation priority sound output mode from the conversation priority mixed volume information holding unit and writes it into the mixed volume information holding unit (step S65). Finally, information indicating which mode was sent is displayed on the screen (step S66).

If, in step S62, the mode change signal indicates the normal mode, the mixed volume control unit writes the normal mode into the mode information holding unit as the mode information (step S67). Next, to return to the original mixed volume, it reads the mixed volume information from the original-setting mixed volume information holding unit and writes it into the mixed volume information holding unit (step S68). Finally, information indicating which mode was sent is displayed on the screen (step S69). FIG. 9 shows a display example of the results of steps S66 and S69. In this way, information indicating whether the transmitted mode change signal was for the conversation priority sound output mode or the normal mode is displayed on the screen of the terminal 301a to inform the user (Mr. A).
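The save-and-restore logic of steps S61 to S69 can be sketched as follows. The dataclass is a hypothetical model of the holding units inside the sound output control unit 321b, with each mixed volume setting reduced to a single conversation-share ratio, and a list of strings standing in for the on-screen display. The guard that skips step S64 when the terminal is already in the conversation priority mode is an assumption added so that a repeated request does not overwrite the saved original setting.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    NORMAL = "normal"
    CONVERSATION_PRIORITY = "conversation_priority"


@dataclass
class RemoteOutputControl:
    """Hypothetical model of the holding units of sound output control unit 321b."""
    mixed_ratio: float                       # mixed volume information
    priority_ratio: float                    # conversation priority mixed volume information
    mode: Mode = Mode.NORMAL                 # mode information
    original_ratio: Optional[float] = None   # original-setting mixed volume information
    log: List[str] = field(default_factory=list)  # stands in for the screen display

    def apply_mode_change(self, requested: Mode) -> None:
        previous = self.mode
        self.mode = requested                            # S63 / S67
        if requested is Mode.CONVERSATION_PRIORITY:
            if previous is not Mode.CONVERSATION_PRIORITY:
                self.original_ratio = self.mixed_ratio   # S64 (guard is an assumption)
            self.mixed_ratio = self.priority_ratio       # S65
        elif self.original_ratio is not None:
            self.mixed_ratio = self.original_ratio       # S68
        self.log.append(f"sent: {requested.value}")      # S66 / S69
```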

The conversation priority reception process in the terminal 301a is the same as in Embodiment 1 shown in FIG. 10, so its description is omitted.

(Mixed volume change: setting the other party's output level)
FIG. 22 is a flowchart illustrating an example of the operation when the information processing terminal 301a of Embodiment 3 changes the output level setting for the information processing terminal 301b in response to a mixed volume change signal received from the terminal 301b, based on the configuration shown in FIG. 19. First, the terminal 301a receives the mixed volume change signal from the terminal 301b with the mixed volume change reception unit 320 and passes it to the mixed volume control unit of the sound output control unit 321b (step S71). The mixed volume control unit then reads the current mode information from the mode information holding unit and checks whether the current mode is the conversation priority sound output mode or the normal mode (step S72). In the normal mode, it writes mixed volume information based on the received mixed volume change signal into the mixed volume information holding unit, thereby changing the mixed volume for the terminal 301b (step S73). In the conversation priority sound output mode, the process ends without changing the mixed volume information.
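The decision in steps S71 to S73 reduces to a mode check before accepting the remote request. This sketch assumes, hypothetically, that the mixed volume setting is a single conversation-share ratio and that out-of-range requests are clamped to [0, 1]; `RemoteMixState` and `on_mixed_volume_change` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    CONVERSATION_PRIORITY = "conversation_priority"


@dataclass
class RemoteMixState:
    mode: Mode
    mixed_ratio: float  # conversation share of the stream sent to terminal 301b


def on_mixed_volume_change(state: RemoteMixState, requested_ratio: float) -> bool:
    """Apply a ratio change requested by terminal 301b; return True if applied."""
    if state.mode is Mode.CONVERSATION_PRIORITY:  # S72: setting is locked by 301a
        return False                              # end without changing anything
    state.mixed_ratio = max(0.0, min(1.0, requested_ratio))  # S73 (clamp is assumed)
    return True
```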

FIG. 23 is a block diagram showing a configuration example of the information processing terminal 301b of Embodiment 3. The terminal 301b comprises a communication unit 410, a conversation voice input unit 411, an audio output unit 412, a conversation voice transmission unit 413, a mixed volume change input unit 414, a mode change input unit 415, a mode change transmission unit 416, a mixed audio reception unit 417, and a mixed volume change transmission unit 418.

In FIG. 23, all units other than the mixed audio reception unit 417 and the mixed volume change transmission unit 418 (that is, the communication unit 410, the conversation voice input unit 411, the audio output unit 412, the conversation voice transmission unit 413, the mixed volume change input unit 414, the mode change input unit 415, and the mode change transmission unit 416) have the same functions as the corresponding units of Embodiment 2 shown in FIG. 6, so their description is omitted here.

The mixed audio reception unit 417 receives the mixed audio from the network 302 via the communication unit 410. The mixed volume change transmission unit 418 sends the mixed volume change signal passed from the mixed volume change input unit 414 to the network 302 via the communication unit 410.

(Voice call)
FIG. 24 is a flowchart illustrating an example of the voice call processing of the information processing terminal 301b of Embodiment 3, based on the configuration shown in FIG. 23. Although the transmission process and the reception process actually run concurrently, they are shown separately for clarity.

In the transmission process shown in FIG. 24(A), the terminal 301b first captures Mr. B's conversation voice with the conversation voice input unit 411 (step S81) and sends it to the network 302 via the communication unit 410 (step S82). In the reception process shown in FIG. 24(B), the terminal 301b receives the mixed audio from the network 302 via the communication unit 410 (step S83) and outputs it from the audio output unit 412 (step S84). The transmission process of steps S81 to S82 and the reception process of steps S83 to S84 are assumed to run concurrently.

(Mixed volume change)
FIG. 25 is a flowchart illustrating an example of the process of changing the mixed volume of the sound output from the information processing terminal 301b according to the third embodiment. This example is described based on the configuration shown in FIG. 23. First, the information processing terminal 301b uses the mixed volume change input unit 414 to convert the signal entered by Mr. B's operation to change the mixed volume into a mixed volume change signal (step S91). Next, the mixed volume change transmission unit 418 transmits the mixed volume change signal to the network 302 via the communication unit 410, delivering it to the information processing terminal 301a (step S92). The actual mixed volume change is then executed by the information processing terminal 301a according to the flow shown in FIG. 22.
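The split of work in steps S91 to S92 (terminal 301b only encodes and sends the request, terminal 301a applies it to the mix it produces) can be sketched as follows. This is a minimal illustration under assumptions: the JSON message layout, the field names, the gain range 0.0 to 2.0, and float samples in [-1, 1] are all hypothetical choices, not the format defined in the patent.

```python
import json
from dataclasses import dataclass


@dataclass
class MixGains:
    """Gains terminal 301a uses when mixing the sound it sends to terminal 301b."""
    conversation: float = 1.0
    content: float = 1.0


def make_change_signal(target: str, delta: float) -> bytes:
    """Terminal 301b (steps S91-S92): encode a user operation as a mixed volume change signal."""
    if target not in ("conversation", "content"):
        raise ValueError(f"unknown target: {target}")
    return json.dumps({"type": "mixed_volume_change", "target": target, "delta": delta}).encode()


def apply_change_signal(gains: MixGains, payload: bytes) -> MixGains:
    """Terminal 301a: update the gain named in the signal, clamped to a hypothetical 0.0-2.0 range."""
    msg = json.loads(payload)
    current = getattr(gains, msg["target"])
    setattr(gains, msg["target"], min(2.0, max(0.0, current + msg["delta"])))
    return gains


def mix(conversation: list[float], content: list[float], gains: MixGains) -> list[float]:
    """Mix two sample sequences with the current gains and clip to [-1, 1]."""
    return [max(-1.0, min(1.0, gains.conversation * a + gains.content * b))
            for a, b in zip(conversation, content)]
```

For example, a "content down" press on 301b becomes `make_change_signal("content", -0.5)`; once 301a applies it, the next frames it mixes for 301b carry the content sound at half gain.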

The conversation priority transmission process in the information processing terminal 301b is the same as in the first embodiment shown in FIG. 8, so its description is omitted.

The information processing terminal 301b performs no processing for the conversation priority reception process. In the information processing system of the third embodiment, the conversation priority reception and mixed volume change processes for the information processing terminal 301b are carried out on the information processing terminal 301a. Therefore, displaying the result on the information processing terminal 301b side requires an additional notification of control information, and the information processing terminal may be provided with a control information notification function for this purpose.

(Program implementation)
The information processing terminal of the present invention can also be realized as a program. The program is stored on a computer-readable recording medium, and each process is realized by the program. Examples of recording media include tape media such as magnetic tape and cassette tape; disk media, including magnetic disks such as floppy (registered trademark) disks and hard disks and optical discs such as CD-ROM, MO, MD, and DVD; card media such as IC cards and optical cards; and media that hold a program in fixed form, including semiconductor memory such as mask ROM, FPROM, EEPROM, and flash ROM.

(Example of housing layout)
FIG. 26 is a connection diagram illustrating an example of the connection configuration between the blocks of the information processing terminal according to the present invention. The information processing terminal 101a shown in FIG. 26 includes at least a television receiver 1011, a set-top unit 1012 with a built-in videophone camera and microphone, a remote control unit 1013 operated by the user, and an adapter unit 1010 that transmits and receives video and audio to and from another information processing apparatus 101b via a network 102 using a wired or wireless line.

The set-top unit 1012 is connected to the adapter unit 1010 by wire or wirelessly; it issues instructions that select the information to be received via the network 102 and instructions to transmit image data captured by its built-in camera. The adapter unit 1010 is connected by wire or wirelessly to the network 102 and is also connected to the television receiver 1011. It passes video, audio, and other information received via the network 102 to the television receiver 1011, and transmits the image data captured by the built-in camera of the set-top unit 1012 via the network 102 to another designated information processing apparatus. The remote control unit 1013 conveys user instructions by infrared or radio signal to the set-top unit 1012, or to the adapter unit 1010 via the set-top unit 1012. The television receiver 1011 displays and outputs video, audio, or text information from the adapter unit 1010 and the set-top unit 1012.

FIG. 27 is a connection diagram illustrating another example of the connection configuration between the blocks of the information processing apparatus according to the present invention, and shows a modification of the information processing apparatus 101a shown in FIG. 26. Unlike the configuration of FIG. 26, the information processing apparatus 101a′ shown in FIG. 27 is an integrated apparatus in which the adapter unit 1010 of the information processing apparatus 101a is replaced by a communication unit 1010′ that is built into and integrated with the television receiver 1011′. Although not shown, the television receiver 1011′ may additionally be integrated with the set-top unit 1012′. The communication unit 1010′ also has external input terminals for inputting video and audio from external devices. In the example of FIG. 27, three RCA pin jacks (one for video, two for audio) are provided as external input terminals, but an input format other than RCA pins may be used.

(Example of remote control layout)
FIG. 28 is a diagram illustrating an example of the layout of the remote control unit 1013. The remote control unit 1013 in this example includes at least a power button 1014, a menu button 1015 for displaying various setting items, a mode button 1016 for switching between the conversation priority sound output mode and the normal mode, up/down and enter buttons 1017 for selecting and confirming the items displayed with the menu button 1015, a first volume adjustment button 1018, and a second volume adjustment button 1019. The setting items displayed with the menu button 1015 include whether the sound output level is adjusted by volume ratio or by volume level, whether the conversation priority sound output mode is prohibited, and so on.

When adjusting by volume ratio, the first volume adjustment button 1018 adjusts the overall volume level and the second volume adjustment button 1019 adjusts the volume ratio between the conversation voice and the content sound. When adjusting by volume level, the first volume adjustment button 1018 adjusts the volume level of the conversation voice and the second volume adjustment button 1019 adjusts the volume level of the content sound.
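The two adjustment styles differ only in what the two buttons control, so both can be reduced to the same pair of output levels. The sketch below is an illustration under assumptions: a 0 to 20 level scale, a ratio stored as the conversation share in tenths, and the rule that in ratio mode the overall level is split between conversation and content in that proportion are all hypothetical choices, not values given in the patent.

```python
from dataclasses import dataclass

MAX_LEVEL = 20    # hypothetical number of volume steps
RATIO_STEPS = 10  # hypothetical: ratio stored as the conversation share in tenths


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


@dataclass
class RemoteVolumeState:
    by_ratio: bool = True   # menu setting: adjust by volume ratio (True) or by volume level (False)
    overall: int = 10       # ratio mode: overall volume level
    ratio: int = 5          # ratio mode: conversation share, 0..RATIO_STEPS
    conversation: int = 10  # level mode: conversation voice level
    content: int = 10       # level mode: content sound level

    def press(self, button: int, delta: int) -> None:
        """button 1 = first volume adjustment button 1018, button 2 = second button 1019."""
        if self.by_ratio:
            if button == 1:
                self.overall = clamp(self.overall + delta, 0, MAX_LEVEL)
            else:
                self.ratio = clamp(self.ratio + delta, 0, RATIO_STEPS)
        elif button == 1:
            self.conversation = clamp(self.conversation + delta, 0, MAX_LEVEL)
        else:
            self.content = clamp(self.content + delta, 0, MAX_LEVEL)

    def output_levels(self) -> tuple[float, float]:
        """Return the (conversation, content) levels that would be applied to the mixer."""
        if self.by_ratio:
            conversation = self.overall * self.ratio / RATIO_STEPS
            return conversation, self.overall - conversation
        return float(self.conversation), float(self.content)
```

Keeping both sets of fields lets the menu switch styles without losing the user's last setting in the other style; whether the actual device does this is not stated in the source.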

(Example of screen display during volume adjustment)
FIG. 29 is a diagram showing an example of the screen display during volume adjustment; it is a different example from FIG. 7 described above. The display screen in this example consists of a conversation voice icon 1020, a content sound icon 1021, a conversation voice level bar 1022, and a content sound level bar 1023. The conversation voice icon 1020 makes it easy to see that the conversation voice level bar 1022 represents the volume level of the conversation voice, and the content sound icon 1021 makes it easy to see that the content sound level bar 1023 represents the volume level of the content sound.

The conversation voice level bar 1022 represents the volume level of the conversation voice as a vertical bar: the higher the bar extends, the louder the volume. The content sound level bar 1023 likewise represents the volume level of the content sound as a vertical bar that extends higher as the volume increases. For example, the conversation voice level bar 1022 and the content sound level bar 1023 may be displayed in green in the normal mode and in red in the conversation priority sound output mode, to emphasize that the levels cannot be changed in the conversation priority sound output mode.
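The display rules above (bars grow upward, green in the normal mode, red and not adjustable in the conversation priority sound output mode) can be sketched as a text rendering. This is a hypothetical illustration only: the `LevelBar` class, the 10-step default height, and the `[G]`/`[R]` cell markers are assumptions standing in for the graphical bars 1022 and 1023.

```python
from dataclasses import dataclass


@dataclass
class LevelBar:
    label: str          # stands in for icon 1020 or 1021
    level: int          # current volume level
    max_level: int = 10

    def rows(self, color: str) -> list[str]:
        """Render the bar as text rows, top row first; filled cells grow upward from the bottom."""
        mark = f"[{color[0].upper()}]"
        return [mark if row < self.level else "[ ]" for row in reversed(range(self.max_level))]


def bar_color(priority_mode: bool) -> str:
    # Green in the normal mode, red in the conversation priority sound output mode.
    return "red" if priority_mode else "green"


def try_adjust(bar: LevelBar, delta: int, priority_mode: bool) -> bool:
    """Apply a user adjustment; refused while the conversation priority mode is active."""
    if priority_mode:
        return False
    bar.level = max(0, min(bar.max_level, bar.level + delta))
    return True
```

Tying the color and the refusal to the same `priority_mode` flag keeps what the user sees consistent with what the controls accept.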

Brief Description of Drawings
FIG. 1 is a conceptual diagram illustrating an example of a situation in which voice communication is carried out while multiple sounds, such as conversation voice and content sound, are output simultaneously in an information processing system to which the present invention is applied.
FIG. 2 is a diagram illustrating an example of the schematic configuration of the information processing system according to the first embodiment.
FIG. 3 is a block diagram showing a configuration example of the information processing terminal according to the first embodiment.
FIG. 4 is a block diagram showing a detailed configuration example of the sound output control unit.
FIG. 5 is a flowchart illustrating an example of the voice call process of the information processing terminal in the first embodiment.
FIG. 6 is a flowchart illustrating an example of the process of changing the mixed volume of the sound output by the information processing terminal in the first embodiment.
FIG. 7 is a diagram showing an example of the screen display when the mixed volume is changed in the present invention.
FIG. 8 is a flowchart illustrating an example of the operation when the information processing terminal performs a mode change operation on the counterpart information processing terminal in the first embodiment.
FIG. 9 is a diagram showing an example of the screen display when conversation priority is instructed in the present invention.
FIG. 10 is a flowchart illustrating an example of the operation when the information processing terminal receives a mode change operation from the counterpart information processing terminal in the first embodiment.
FIG. 11 is a diagram showing an example of the screen display when conversation priority is instructed by the other party in the present invention.
FIG. 12 is a diagram showing an example of the data structure of the mixed volume information and the mode information.
FIG. 13 is a conceptual diagram illustrating an example of a function that separates the output sound by channel using stereo speakers.
FIG. 14 is a diagram illustrating an example of a function that changes the mode according to the volume level of the conversation voice.
FIG. 15 is a diagram illustrating an example of the schematic configuration of the information processing system according to the second embodiment.
FIG. 16 is a block diagram showing a configuration example of the information processing terminal according to the second embodiment.
FIG. 17 is a flowchart illustrating an example of the voice call process of the information processing terminal according to the second embodiment.
FIG. 18 is a diagram illustrating an example of the schematic configuration of the information processing system according to the third embodiment.
FIG. 19 is a block diagram showing a configuration example of the information processing terminal according to the third embodiment.
FIG. 20 is a flowchart illustrating an example of the voice call process of the information processing terminal according to the third embodiment.
FIG. 21 is a flowchart illustrating an example of the operation when the information processing terminal performs a mode change operation on the counterpart information processing terminal in the third embodiment.
FIG. 22 is a flowchart illustrating an example of the operation in which the information processing terminal according to the third embodiment changes the sound output level setting of the counterpart information processing terminal in accordance with a mixed volume change signal received from that counterpart terminal.
FIG. 23 is a block diagram showing a configuration example of the information processing terminal according to the third embodiment.
FIG. 24 is a flowchart illustrating an example of the voice call process of the information processing terminal according to the third embodiment.
FIG. 25 is a flowchart illustrating an example of the process of changing the mixed volume of the sound output from the information processing terminal according to the third embodiment.
FIG. 26 is a connection diagram illustrating an example of the connection configuration between the blocks of the information processing terminal according to the present invention.
FIG. 27 is a connection diagram illustrating another example of the connection configuration between the blocks of the information processing terminal according to the present invention.
FIG. 28 is a diagram showing an example of the layout of the remote control unit used with the information processing terminal according to the present invention.
FIG. 29 is a diagram showing an example of the screen display during volume adjustment.
FIG. 30 is a conceptual diagram illustrating a situation in which voice communication is carried out while multiple sounds, such as conversation voice and content sound, are output simultaneously in a conventional system.
FIG. 31 is a diagram illustrating the problems that arise in the conventional system shown in FIG. 30.

Explanation of symbols

11a, 11b, 21a, 21b, 31a ... conversation voice of A
12a, 12b, 22a, 22b, 32a, 32b ... conversation voice of B
13a, 13b, 23a, 23b, 33a ... content sound
14a, 14b, 24a, 24b, 31b, 34a ... mixed sound
101a, 101b, 201a, 201b, 301a, 301b, 501a, 501b ... information processing terminal (call device)
102, 102′, 202, 302, 502 ... network
103, 103′, 503 ... content server
110, 210, 310, 410 ... communication unit
111, 211, 311, 411 ... conversation voice input unit
112, 212, 312, 412 ... audio output unit
113, 213, 413 ... conversation voice transmission unit
114, 214, 313 ... conversation voice receiving unit
115 ... content sound receiving unit
116, 215, 314a, 314b ... output sound mixing unit
117, 216, 315, 414 ... mixed volume change input unit
118, 217, 316, 415 ... mode change input unit
119, 218, 416 ... mode change transmission unit
120, 219, 317 ... mode change receiving unit
121, 222, 321a, 321b ... sound output control unit
121a ... mixed volume control unit
121b ... mixed volume information holding unit
121c ... mode information holding unit
121d ... original setting mixed volume information holding unit
121e ... conversation priority mixed volume information holding unit
122 ... mixed volume information
123 ... mixed volume change signal
124 ... mode change signal
203, 303 ... content holding unit
220, 318 ... content sound extraction unit
221 ... content sound transmission unit
319 ... mixed sound transmission unit
320 ... mixed volume change receiving unit
417 ... mixed sound receiving unit
418 ... mixed volume change transmission unit
1010 ... adapter unit
1011, 1011′ ... television receiver
1012, 1012′ ... set-top unit
1013, 1013′ ... remote control unit
1010′ ... communication unit
1014 ... power button
1015 ... menu button
1016 ... mode button
1017 ... up/down and enter buttons
1018 ... first volume adjustment button
1019 ... second volume adjustment button
1020 ... conversation voice icon
1021 ... content sound icon
1022 ... conversation voice level bar
1023 ... content sound level bar

Claims

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for transmitting, to the other call device, sound output level setting change instruction information for setting the other call device to output sound with priority given to either the conversation voice or the content sound,
wherein, when the speech loudness level at the call device exceeds a predetermined amount, the call device transmits the sound output level setting change instruction information to the other call device.
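The trigger condition of this claim (speech loudness above a predetermined amount causes the instruction to be sent) can be sketched as follows. This is a minimal illustration under assumptions: loudness is measured as the RMS level of float samples in dBFS, the threshold of -20 dBFS and the message dictionary are hypothetical, and sending only once per loud episode (rather than on every frame) is a design choice of this sketch, not something the claim specifies.

```python
import math
from typing import Callable

THRESHOLD_DBFS = -20.0  # hypothetical value for the "predetermined amount"


def rms_dbfs(samples: list[float]) -> float:
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


class PriorityRequester:
    """Sends a sound output level setting change instruction when local speech gets loud."""

    def __init__(self, send: Callable[[dict], None], threshold_dbfs: float = THRESHOLD_DBFS):
        self.send = send
        self.threshold_dbfs = threshold_dbfs
        self.loud = False  # whether the previous frame was above the threshold

    def on_frame(self, samples: list[float]) -> None:
        loud = rms_dbfs(samples) > self.threshold_dbfs
        if loud and not self.loud:  # send once per loud episode, not on every frame
            self.send({"type": "sound_output_level_setting_change", "priority": "conversation"})
        self.loud = loud
```

A real device would measure the microphone signal after echo cancellation so that content sound played locally does not trip the threshold; the source does not say how loudness is measured.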

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for transmitting, to the other call device, sound output level setting change instruction information for setting the other call device to output sound with priority given to either the conversation voice or the content sound,
wherein, as the sound output setting of the other call device, the setting of an echo canceller is changed.

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for transmitting, to the other call device, sound output level setting change instruction information for setting the other call device to output sound with priority given to either the conversation voice or the content sound,
wherein, as the sound output setting of the other call device, the volume of the conversation voice is increased and the volume of the content sound is decreased.
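On the receiving side, raising the conversation voice and lowering the content sound also raises the question of how to return to the user's own setting afterwards; the symbol list's separate holding units for the original and conversation-priority settings (121d, 121e) suggest both are kept. The sketch below assumes integer levels from 0 to 20 and hypothetical boost and cut amounts; it is an illustration, not the patent's control logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MixSetting:
    conversation: int  # conversation voice level, 0..max_level
    content: int       # content sound level, 0..max_level


class SoundOutputControl:
    def __init__(self, setting: MixSetting, boost: int = 4, cut: int = 8, max_level: int = 20):
        self.current = setting
        self.saved: Optional[MixSetting] = None  # user's original setting, kept for restoring
        self.boost, self.cut, self.max_level = boost, cut, max_level  # hypothetical amounts

    def enter_conversation_priority(self) -> None:
        """Raise the conversation voice and lower the content sound."""
        if self.saved is not None:  # already in conversation priority mode
            return
        self.saved = self.current
        self.current = MixSetting(
            conversation=min(self.max_level, self.current.conversation + self.boost),
            content=max(0, self.current.content - self.cut),
        )

    def leave_conversation_priority(self) -> None:
        """Restore the setting saved when the priority mode was entered."""
        if self.saved is not None:
            self.current, self.saved = self.saved, None
```

Guarding against a second `enter_conversation_priority` call prevents repeated instructions from boosting the conversation voice again and again.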

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for transmitting, to the other call device, sound output level setting change instruction information for setting the other call device to output sound with priority given to either the conversation voice or the content sound,
wherein, as the sound output setting of the other call device, sound is output from a different speaker.

The call device according to any one of claims 1 to 4,
wherein, after the sound output setting of the other call device has been changed, changes to the sound output setting made by operating the other call device are prohibited or restricted for a predetermined time.
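The lock of this claim has to be enforced on the device whose setting was changed, so a sketch naturally sits on the receiving side. Below is a minimal Python illustration under assumptions: the class name, the 10-second default for the "predetermined time", and the injectable clock (used so the behavior can be checked without waiting) are all hypothetical. It implements only the "prohibited" variant; a "restricted" variant might instead limit how far a local change may move the level.

```python
import time
from typing import Callable


class LockableSoundSetting:
    """Sound output level that local operations cannot change for a while after a remote change."""

    def __init__(self, level: int, lock_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.level = level
        self.lock_seconds = lock_seconds  # hypothetical "predetermined time"
        self._clock = clock
        self._locked_until = float("-inf")

    def apply_remote_change(self, level: int) -> None:
        """Change requested by the other call device; starts the lock period."""
        self.level = level
        self._locked_until = self._clock() + self.lock_seconds

    def apply_local_change(self, level: int) -> bool:
        """Change from this device's own controls; rejected while the lock is active."""
        if self._clock() < self._locked_until:
            return False
        self.level = level
        return True
```

`time.monotonic` is used as the default clock so that wall-clock adjustments cannot shorten or extend the lock.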

The call device according to any one of claims 1 to 5,
wherein, when the sound output setting of the other call device is changed, the call device changes its own sound output setting in conjunction with that change.

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for receiving, from the other call device, sound output level setting change instruction information for setting the call device to output sound with priority given to either the conversation voice or the content sound, wherein, in accordance with the sound output level setting change instruction information, the call device changes the setting of an echo canceller as its sound output setting.

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for receiving, from the other call device, sound output level setting change instruction information for setting the call device to output sound with priority given to either the conversation voice or the content sound, wherein, in accordance with the sound output level setting change instruction information, the call device, as its sound output setting, increases the volume of the conversation voice and decreases the volume of the content sound.

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for receiving, from the other call device, sound output level setting change instruction information for setting the call device to output sound with priority given to either the conversation voice or the content sound, wherein, in accordance with the sound output level setting change instruction information, the call device, as its sound output setting, outputs sound from a different speaker.

The call device according to any one of claims 7 to 9,
wherein, after the sound output setting of the call device has been changed, changes to the sound output setting made by operating the call device are prohibited or restricted for a predetermined time.

The call device according to any one of claims 7 to 9,
wherein, upon receiving the sound output level setting change instruction information from the other call device, the call device invalidates the sound output setting instruction given by that information and does not change its own sound output setting.

A call device capable of conducting a call including conversation voice with another call device while content including sound is viewed at the same time,
the call device comprising means for transmitting, to the other call device, mixed sound obtained by mixing the conversation voice and the content sound in accordance with mixed volume ratio information received from the other call device, and
means for changing the mixed volume ratio of the mixed sound transmitted to the other call device so that either the conversation voice or the content sound is given priority at the other call device,
wherein, when the speech loudness level at the call device exceeds a predetermined amount, the call device changes the mixed volume ratio of the mixed sound transmitted to the other call device.
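Here the mixing happens on the sending device: it mixes its own speaker's voice with the content sound using the ratio requested by the other device, and overrides that ratio toward conversation when local speech is loud. The sketch below is illustrative only: the ratio is taken as the conversation share in [0, 1], and the RMS threshold of 0.1 and priority ratio of 0.8 are hypothetical stand-ins for the "predetermined amount" and the changed ratio.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of float samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


class MixedSoundSender:
    def __init__(self, ratio: float = 0.5, loud_rms: float = 0.1, priority_ratio: float = 0.8):
        self.requested_ratio = ratio          # conversation share requested by the other device
        self.loud_rms = loud_rms              # hypothetical "predetermined amount"
        self.priority_ratio = priority_ratio  # hypothetical conversation share when speech is loud

    def on_ratio_info(self, ratio: float) -> None:
        """Store mixed volume ratio information received from the other call device."""
        self.requested_ratio = min(1.0, max(0.0, ratio))

    def mix_frame(self, local_speech: list[float], content: list[float]) -> list[float]:
        """Mix one frame for transmission, favoring conversation while local speech is loud."""
        ratio = self.requested_ratio
        if rms(local_speech) > self.loud_rms:
            ratio = max(ratio, self.priority_ratio)
        return [ratio * v + (1.0 - ratio) * c for v, c in zip(local_speech, content)]
```

Using `max(ratio, priority_ratio)` leaves the other party's request untouched if it already favors conversation more strongly than the override would.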

The call device according to claim 12,
wherein, as the mixed volume ratio of the mixed sound transmitted to the other call device, the volume of the conversation voice is increased and the volume of the content sound is decreased.

The call device according to claim 12 or 13,
wherein, after changing the mixed volume ratio of the mixed sound transmitted to the other call device, the call device prohibits or restricts changes to the mixed volume ratio made by operating the other call device for a predetermined time.

The call device according to any one of claims 12 to 14,
wherein, when the mixed volume ratio of the mixed sound transmitted to the other call device is changed, the call device, in conjunction with this change, changes its own sound output setting so that either the conversation voice or the content sound is given priority at the call device.

A call system comprising the call device according to any one of claims 1 to 15 and another call device.

A program for providing the functions of the call device according to any one of claims 1 to 15.