JP2018139397A

JP2018139397A - Voice display device and voice display program

Info

Publication number: JP2018139397A
Application number: JP2017033961A
Authority: JP
Inventors: Kenji Yamanaka (山中 健司)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-02-24
Filing date: 2017-02-24
Publication date: 2018-09-06

Abstract

PROBLEM TO BE SOLVED: To provide a voice display device capable of ensuring communication between speaking persons in the case of conversation using a system for bi-directional communication of voice with another base.

SOLUTION: A voice display device 10 to be used for the system for bi-directional communication of voice or the like with another base comprises: a voice input part 22 which generates a voice signal out of an input voice; a voice processing part 36 which generates character data by using the voice signal generated by the voice input part; a communication processing part which is supplied with the voice signal or the character data generated on the basis of a voice generated at another base; an image processing part 60 which generates an image signal on the basis of character data corresponding to the voice generated at each base; an image output part 28 which outputs and displays an image on the basis of the image signal; and a control part 30 which determines a voice communication situation and controls the respective processing parts. The display mode of the image is determined in accordance with information recognized from the character data and the communication situation of the voice.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a voice display device and a voice display program, and more specifically to a voice display device and a voice display program used in a system that performs two-way communication of at least voice with other sites.

When people at remote sites hold a conversation, and in particular a conference, using an Internet Protocol network (hereinafter also called an IP network) makes it possible to display images showing the other site. In other words, an IP network can convey not only the voice information of the other site but also visual information. Because a system that can transmit visual information to a remote location in this way is particularly suited to conferences between remote locations, it is sometimes called, for example, a video conference system.

Such a system uses the IP network to send and receive voice and images in packet form. However, for various reasons such as communication failures or concentrated traffic, smooth packet transmission may be hindered, and on the receiving side the output voice and image may no longer correspond in real time. In such a case, a speaker can judge whether what was said reached the person at the other site correctly only indirectly, for example from that person's facial expression, and sometimes this indirect information is not enough to judge. Furthermore, poor reception of voice packets may make the other party's words inaudible. Failing to catch or missing what the other party said can also happen frequently even when nothing is wrong with the system.

When mutual understanding of the conversation fails in this way, the speaker must repeat the same words and confirm that they reached the other party, or ask the other party to repeat what was said.

JP-A-2015-41885 (Japanese Unexamined Patent Application Publication No. 2015-41885)

However, each such exchange interrupts the conference and lowers its efficiency. Even so, insufficient or mistaken understanding of the conversation must not be left uncorrected.

There has therefore been a demand for a system that achieves an accurate shared understanding of the conversation without hindering the progress of the conference. In particular, there has been a demand for a technique that uses visual presentation as efficiently as possible to achieve an accurate shared understanding of the conversation and to stimulate discussion.

In view of these problems, an object of the present invention is to provide a voice display device that makes communication between speakers more reliable when they converse using a system that performs two-way communication of voice and the like with other sites.

To solve the above problem, the present invention is used in a system that performs two-way communication of at least voice with other sites, and includes: a voice input unit that receives voice and converts it into a voice signal; a voice processing unit that generates character data based on at least the voice signal generated by the voice input unit; a communication processing unit that receives, from another site, a voice signal or character data generated from voice uttered at that site; an image processing unit that generates an image signal based on character data corresponding to at least the voice input to the voice input unit and the voice uttered at the other site; an image output unit that outputs and displays an image based on the image signal; and a control unit that determines the status of voice communication with the other site and controls the operations of the voice processing unit, the communication processing unit, and the image processing unit. The display mode of the image is determined according to at least one of information recognized from the character data and the voice communication status.

The present invention also provides a program used in a system that performs two-way communication of at least voice with other sites. The program causes a computer, which is connected to a voice input unit that receives voice and converts it into a voice signal and to an image output unit that outputs and displays an image based on an image signal, to function as a voice display device that displays voice through the image output unit. Specifically, the program causes the computer to function as: voice processing means for generating character data based on at least the voice signal generated by the voice input unit; communication processing means for receiving, from another site, a voice signal or character data generated from voice uttered at that site; image processing means for generating the image signal to be output from the image output unit based on character data corresponding to at least the voice input to the voice input unit and the voice uttered at the other site; and control means for determining the status of voice communication with the other site and controlling the operations of the voice processing means, the communication processing means, and the image processing means. The display mode of the image is determined according to at least one of information recognized from the character data and the voice communication status.

According to the present invention, communication between speakers can be made more reliable when they converse using a system that performs two-way communication of voice and the like with other sites.

FIG. 1 is a schematic diagram showing the configuration of a system that uses an embodiment of the voice display device according to the present invention.
FIG. 2 is a schematic diagram showing the configuration of an embodiment of the voice display device according to the present invention.
Several consecutive figures show display examples of images output from the device illustrated in FIG. 2.
FIG. 8 is a schematic diagram showing the configuration of another embodiment of the voice display device according to the present invention.
Two further figures show display examples of images output from the device illustrated in FIG. 8.
A further figure schematically shows the configuration when a computer is operated as an embodiment of the present invention.

Next, embodiments of the voice display device according to the present invention will be described in detail with reference to the accompanying drawings. Referring to FIG. 1, an embodiment of the voice display device 10 according to the present invention is provided in a video conference system 12 that enables two-way communication of voice and images between remote sites A and B. The voice display device 10 is a data communication device that enables image data and voice data to be communicated between remote locations, and it also converts the voice input at each site into character data, applies predetermined processing, and outputs the result as part of an image.

In the example shown in FIG. 1, the video conference system 12 is built on a communication network 14 such as an IP network, and thereby sends and receives voice and images between remote sites. The communication network 14 may be any communication network, including an IP network, as long as it connects multiple computers and communication devices and enables communication among them.

The voice display device 10 is connected to the communication network 14 via a router 16. The router 16 is a device that selects and controls data transfer paths. Of the communication networks connected to the router 16, FIG. 1 shows only the communication network 14, which carries the communication between the voice display devices 10a and 10b; the others are omitted.

The connection between the voice display device 10 and the communication network 14 is described more specifically. The voice display device 10 and the router 16 are connected by a line 18. The line 18 refers to communication lines in general that are used to send and receive voice data and image data, and it may be wired or wireless. The router 16 and the communication network 14 are likewise connected by a line 20, in the broad sense, used to send and receive data.

The voice display device 10 has a voice input unit 22 that receives voice, converts it into a signal in a format the device 10 can process, and sends the signal to the device 10. The voice display device 10 can convert the voice signal corresponding to the voice received by the voice input unit 22 into character data. The voice display device 10 further has an image input unit 24 that receives image information of a subject, converts it into a signal in a format the voice display device 10 can process, and sends the signal to the device 10.

The voice display device 10 has a voice output unit 26 that receives, via the communication network 14, the voice signal obtained by the voice input unit 22 at the other site B and outputs it as voice. The voice display device 10 further has an image output unit 28 that receives, via the communication network 14, the signal obtained by the image input unit 24 at the other site B and outputs it as an image. The image output from the image output unit 28 also includes characters obtained by processing the signals containing voice data from the voice input units 22 at the local site A and the other site B.

The voice display device 10 may be configured to convert into character data not only the voice signal of the local site A but also the voice data contained in a signal received from the voice display device at the other site B.

Next, the configuration of the embodiment of the voice display device 10 according to the present invention is described in more detail with reference to FIG. 2. The voice display device 10 contains various components, but only those that are particularly important for understanding the embodiment are shown in the drawing and described in detail.

The voice display device 10 has a control unit 30 that controls the operation of the various components of the device 10. In particular, the control unit 30 controls which operation each component performs and at what timing. The control unit 30 also executes the predetermined computations needed to control each component and to process the various data received through them.

The voice display device 10 has a storage unit 32 that stores programs defining, among other things, the procedures for controlling the components in the device 10 and for processing received data. The storage unit 32 is connected to the control unit 30 via a signal line 34 that serves as a transmission path for data and control signals. The control unit 30 can therefore read the desired program data from the storage unit 32 and execute the program, thereby carrying out control processing, data processing, and the like in the predetermined procedure described in the program. The storage unit 32 can also store, temporarily or semi-permanently, data that is to be processed in the voice display device 10 and data that has been processed.

The voice input unit 22 of the voice display device 10 is a device, such as a microphone, that generates a signal by converting voice into another form such as an electrical signal; by definition it includes any means that performs this signal conversion.

As an interface connecting the voice input unit 22 and the control unit 30, the voice display device 10 has an input voice processing unit 36 that processes the voice signal containing the voice data obtained by the voice input unit 22. The voice input unit 22 and the input voice processing unit 36 are connected via a wired or wireless line 38, so that the signal obtained by the voice input unit 22 is supplied to the input voice processing unit 36.

The input voice processing unit 36 and the control unit 30 are connected via a communication line 40. With this configuration, the input voice processing unit 36 can receive control signals from the control unit 30 and execute signal processing according to the procedure described in the program stored in the storage unit 32. Because the communication line 40 also serves as a signal transmission path, the input voice processing unit 36 can send to the control unit 30 a signal containing the voice data it has processed under the control of the control unit 30.

The input voice processing unit 36 also converts the voice data contained in the signal supplied from the voice input unit 22 into character data. This conversion is likewise performed under the control of the control unit 30. The character data generated by the conversion is carried on a signal and sent to the control unit 30 in the same way as the processed voice data. Any known method can be used to convert voice data into character data.

Preferably, the input voice processing unit 36 recognizes the strength of the voice detected by the voice input unit 22. The recognized strength is handled as accompanying information of the character data generated in the processing unit 36 and is supplied to the control unit 30 together with the character data.
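
As a minimal sketch of how the strength of a voice could travel with the character data, the following Python fragment computes a root-mean-square level for a block of samples and attaches it to the recognized text. The names `Caption`, `rms_level`, and `make_caption`, the normalized sample range, and the choice of RMS as the strength measure are all assumptions made for illustration; the description does not specify how strength is measured or how the accompanying information is structured.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Caption:
    """Character data plus accompanying information (hypothetical structure)."""
    site: str        # site where the voice was picked up, e.g. "A"
    text: str        # character data produced by speech recognition
    strength: float  # voice strength attached as accompanying information


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples assumed to lie in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def make_caption(site: str, text: str, samples: Sequence[float]) -> Caption:
    """Bundle recognized text with the strength of the voice that produced it."""
    return Caption(site=site, text=text, strength=rms_level(samples))
```

Keeping the strength in the same record as the text means a later display stage could, for instance, vary the character size with loudness without needing the original samples.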

The image input unit 24 of the voice display device 10 is a device, such as a video camera, that converts captured images of a scene, in particular moving images, into a signal of another form such as an electrical signal; by definition it includes any means that performs this signal conversion.

As an interface connecting the image input unit 24 and the control unit 30, the voice display device 10 has an input image processing unit 42 that processes the signal containing the image data obtained by the image input unit 24. The image input unit 24 and the input image processing unit 42 are connected via a wired or wireless line 44, so that the signal obtained by the image input unit 24 is supplied to the input image processing unit 42.

The input image processing unit 42 and the control unit 30 are connected via a communication line 46. With this configuration, the input image processing unit 42 can receive control signals from the control unit 30 and execute signal processing according to the procedure described in the program stored in the storage unit 32. Because the communication line 46 also serves as a signal transmission path, the input image processing unit 42 can send to the control unit 30 a signal containing the image data it has processed under the control of the control unit 30.

The voice display device 10 has a communication processing unit 50 that is connected to the router 16 by the line 18 and handles the sending and receiving of data and signals, including voice signals that contain character data and image signals that contain image data, to and from the other site reached through the IP network 14 connected to the router 16. The communication processing unit 50 is also connected to the control unit 30 via a communication line 52. The communication processing unit 50 can therefore receive control signals from the control unit 30 and be controlled according to the procedure described in the program stored in the storage unit 32.

The control unit 30 transfers the voice data received from the input voice processing unit 36 and the image data received from the input image processing unit 42 to the communication processing unit 50. The control unit 30 further instructs the communication processing unit 50 to send these data to a data communication device at the other site that can communicate image data and voice data, preferably the voice display device 10 at the other site. The data transferred from the communication processing unit 50 to the voice display device 10 at the other site under the control of the control unit 30 also includes the character data converted from the voice data by the input voice processing unit 36.

Image data and voice data sent from the data communication device at the other site and received by the communication processing unit 50 are in turn transferred to the control unit 30. When the data communication device at the other site is a voice display device 10, the data received by the communication processing unit 50 and transferred to the control unit 30 also includes the character data converted from voice data by the input voice processing unit 36 at the other site.

The control unit 30 can determine the state of the conversation based on the voice data received from the input voice processing unit 36 and the voice data of the other site received from the communication processing unit 50. In making this determination, the control unit 30 may additionally take into account at least one of the character data and the image data generated at the local site and the other site. Alternatively, the control unit 30 may use the character data as the main basis for determining the state of the conversation.

For example, if for a predetermined period such as 30 seconds or one minute no voice at all is detected from any site, or only an amount of voice data or character data at or below a predetermined lower threshold is obtained, the control unit 30 determines that the conference participants are silent and the conference has stalled.

Conversely, if an amount of voice data or character data at or above a predetermined upper threshold is obtained within the predetermined period, the control unit 30 determines that the participants are speaking actively and the discussion is heated. This is of course not the only way for the control unit 30 to determine that a discussion is heated. For example, the control unit 30 may judge how heated the discussion is from the detected loudness of the voices. Alternatively, the control unit 30 may determine that the discussion is heated when the interval between one participant's remark and the following remark by another participant is repeatedly shorter than a predetermined time.
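
The two criteria described above, the amount of character data within a period and the gaps between successive speakers, can be sketched as follows. This is only an illustration under stated assumptions: the `Utterance` record, the `speaker` label, the function names, and every threshold value (`lower_chars`, `upper_chars`, `max_gap`, `min_run`) are hypothetical and are not given in the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class ConversationState(Enum):
    SILENT = "silent"   # the conference has stalled
    NORMAL = "normal"
    HEATED = "heated"   # participants are speaking actively


@dataclass(frozen=True)
class Utterance:
    speaker: str  # hypothetical label identifying the participant
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # character data for the utterance


def classify_by_amount(utterances: Sequence[Utterance],
                       window_start: float, window_end: float,
                       lower_chars: int = 5,
                       upper_chars: int = 200) -> ConversationState:
    """Classify the period by the amount of character data that starts in it."""
    chars = sum(len(u.text) for u in utterances
                if window_start <= u.start < window_end)
    if chars <= lower_chars:   # at or below the lower threshold
        return ConversationState.SILENT
    if chars >= upper_chars:   # at or above the upper threshold
        return ConversationState.HEATED
    return ConversationState.NORMAL


def is_heated_by_turn_gaps(utterances: Sequence[Utterance],
                           max_gap: float = 0.5, min_run: int = 3) -> bool:
    """True when min_run consecutive speaker changes each follow within max_gap seconds."""
    ordered = sorted(utterances, key=lambda u: u.start)
    run = 0
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.speaker != prev.speaker and (cur.start - prev.end) < max_gap:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # a long pause or a same-speaker continuation breaks the run
    return False
```

Counting characters rather than audio bytes follows the option of using character data as the main basis for the determination; the same structure would work with a voice data amount instead.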

The voice output unit 26 of the voice display device 10 is a device, such as a speaker, that converts a signal such as an electrical signal carrying voice data into sound; by definition it includes any means that outputs voice.

As an interface connecting the voice output unit 26 and the control unit 30, the voice display device 10 has an output voice processing unit 54 that processes the voice signal containing the voice data obtained from the data communication device at the other site. The output voice processing unit 54 converts the received signal into a signal in a format that the voice output unit 26 can process.

The output voice processing unit 54 and the control unit 30 are connected via a communication line 56. The communication line 56 serves as a signal transmission path, so the output voice processing unit 54 can receive from the control unit 30 a signal containing the voice data of the other site. Through the communication line 56, the output voice processing unit 54 can also receive control signals from the control unit 30 and execute signal processing according to the procedure described in the program stored in the storage unit 32.

The output voice processing unit 54 and the voice output unit 26 are connected via a wired or wireless line 58. With this configuration, the signal carrying the voice data processed by the output voice processing unit 54 is supplied to the voice output unit 26.

The output voice processing unit 54 may convert the voice data contained in the supplied signal, which was obtained by the voice display device 10 at the other site, into character data. For example, if the data communication device at the other site is a conventional communication device rather than the voice display device 10 according to the embodiment, that device cannot convert voice data into character data. A signal input from such a site to the control unit 30 via the communication unit 50 therefore contains voice data but no character data. In this case, the control unit 30 transfers the voice data of the other site to the output voice processing unit 54 and instructs it to generate character data from the transferred voice data. The character data generated by the output voice processing unit 54, corresponding to the voice of the other site, is returned to the control unit 30 under the control of the control unit 30. With this configuration, voice uttered at the other site can be displayed as text regardless of the equipment at that site.
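
The fallback described above can be sketched as a small decision function: use the character data sent by the other site when it is present, and otherwise run recognition locally on the received voice data. The `RemotePayload` structure, the function name `resolve_character_data`, and the `recognize` callable are hypothetical; the recognizer is passed in as a parameter because the description leaves the conversion method open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RemotePayload:
    """Data received from the other site (hypothetical structure)."""
    voice_data: bytes              # voice data from the other site
    character_data: Optional[str]  # None when the sender is a conventional device


def resolve_character_data(payload: RemotePayload,
                           recognize: Callable[[bytes], str]) -> str:
    """Return character data for the remote voice, generating it locally if needed."""
    if payload.character_data is not None:
        # The other site already converted its voice into character data.
        return payload.character_data
    # Conventional sender: generate the character data from the received voice.
    return recognize(payload.voice_data)
```

Because the recognizer runs only when no character data arrived, two voice display devices talking to each other would not duplicate the conversion work.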

The image output unit 28 of the voice display device 10 is a device, such as a display that shows an image on the device itself or a projector that projects an image onto a screen, that converts a signal such as an electrical signal carrying image data into visible light; by definition it includes any means that outputs images.

As an interface connecting the image output unit 28 and the control unit 30, the voice display device 10 has an output image processing unit 60 that processes the signal containing the image data obtained from the data communication device at the other site. The output image processing unit 60 converts the received signal into an image signal in a format that the image output unit 28 can process.

The output image processing unit 60 and the control unit 30 are connected via a communication line 62. The communication line 62 serves as a signal transmission path, so the output image processing unit 60 can receive from the control unit 30 a signal containing the image data of the other site. Through the communication line 62, the output image processing unit 60 can also receive control signals from the control unit 30 and execute signal processing according to the procedure described in the program stored in the storage unit 32.

The output image processing unit 60 and the image output unit 28 are connected via a wired or wireless line 64. With this configuration, the image signal carrying the image data processed by the output image processing unit 60 is supplied to the image output unit 28.

The output image processing unit 60 receives, via the control unit 30, the character data generated by the input voice processing unit 36, and performs processing to display the corresponding characters through the image output unit 28 together with the image obtained from the image data. The character data to be displayed includes not only character data generated by the voice display device 10 at the other site and sent over the IP network 14 to the voice display device 10 at the local site, but also character data obtained by converting speech at the local site.

If the data communication device at the other site is a conventional device and no character data converted from speech is sent, the character data generated by the output voice processing unit 54 from the voice data received from the other site is transferred via the control unit 30, and this transferred character data becomes the target of display processing.

Display processing of character data is not limited to merely visualizing it. For example, the control unit 30, or the output image processing unit 60 under its control, can identify from each item of character data being processed the site at which the underlying speech was uttered. Based on this identification, the output image processing unit 60 can determine where on the screen the characters are displayed according to the source of the speech. It can likewise display the characters in a different color for each source. Varying the display mode of the characters by speech source in this way makes it easier to recognize visually which remark came from a participant at which site.

The output image processing unit 60 can also determine how loudly the speech corresponding to the character data was uttered, based on, for example, information on voice volume that accompanies the character data. Using the loudness information so obtained, the output image processing unit 60 can determine the size of the characters output from the image output unit 28. This makes it possible, for example, to display a remark the speaker wishes to emphasize in larger characters so that it is easier to recognize visually. Besides character size, other aspects of the display, such as the font or color of the characters, may also be varied according to loudness.

The output image processing unit 60 can also display the state of the conversation between the sites, that is, the communication status, as characters on the screen. For example, when the control unit 30 determines that the conference participants have remained silent and the conference has stalled, it passes that determination to the output image processing unit 60. The output image processing unit 60 can then display on the screen characters indicating the determination, for example a mimetic word such as "シーン…" (a Japanese expression for dead silence) or a phrase describing the current state such as "沈黙中" ("silent").

Such a display conveys visually to the participants that the conference has stalled and can, for example, have the psychological effect of prompting them to speak.

Conversely, when the control unit 30 determines that the discussion has become heated, it likewise passes that determination to the output image processing unit 60. The output image processing unit 60 can then display on the screen characters indicating the determination, for example a phrase describing the current state such as "激論中!!" ("heated debate in progress!!").

Such a display conveys visually to the participants that the discussion is heated and can, for example, have the psychological effect of encouraging those who are not actively speaking to contribute, or of helping agitated participants regain their composure.

The conversation states that the control unit 30 can distinguish are not limited to the silence and heated discussion described above. A more preferable configuration lets the control unit 30 distinguish various other conversation states and stores in the storage unit 32 the mimetic words and state-describing phrases that correspond to each of them. Using an embodiment of the voice display device 10 with such a configuration further improves the efficiency of conversation between remote locations over the video conference system 12.
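The idea of keeping state-specific phrases in the storage unit 32 can be sketched as a simple lookup table. This is a minimal illustration only: the names `ConversationStatus`, `STATUS_PHRASES`, and `phrase_for` are hypothetical, and the patent does not specify how the phrases are stored or selected.

```python
from enum import Enum


class ConversationStatus(Enum):
    """Hypothetical labels for states the control unit might distinguish."""
    NORMAL = "normal"
    SILENT = "silent"
    HEATED = "heated"


# Assumed stand-in for the storage unit: phrases registered per state.
# The phrases themselves are the examples given in the description.
STATUS_PHRASES = {
    ConversationStatus.SILENT: ["シーン…", "沈黙中"],
    ConversationStatus.HEATED: ["激論中!!"],
}


def phrase_for(status, index=0):
    """Return a phrase to overlay for `status`, or None if none is registered."""
    phrases = STATUS_PHRASES.get(status)
    if not phrases:
        return None
    # Wrap around so any index selects one of the registered phrases.
    return phrases[index % len(phrases)]
```

Adding a new conversation state then only requires a new enum member and a new table entry, which matches the description's suggestion that further states can be registered in advance.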

In the configuration examples of the voice display device 10 described so far, the control unit 30 and the processing units 36, 42, 54, and 60 have been treated as separate components, but this is only for convenience of explanation. The control unit 30 may be designed to include some or all of the processing units 36, 42, 54, and 60, so that the control unit 30 itself performs some or all of their roles.

Next, the operation of the video conference system 12, and in particular of the voice display device 10, is described for a conference held between remote sites using a video conference system 12 that includes an embodiment of the voice display device 10 according to the present invention. In this description, conference participants are assumed to gather at two locations, site A and site B. The basic operation of the voice display device 10 and the video conference system 12 is the same, however, when a conference is held among three or more sites. Unless otherwise noted, the data communication devices (10) installed at site A and site B are both assumed to be embodiments of the voice display device 10 according to the present invention. Where necessary, components of the video conference system 12 installed at site A are shown and described with a lowercase "a" appended to the reference numeral, and those installed at site B with a lowercase "b".

First, the conference participants gathered at site A and site B perform the operations needed for smooth two-way communication of voice and images between the two sites using the video conference system 12. For example, they adjust the settings of the voice display device 10 and of its components, the input units 22 and 24 and the output units 26 and 28.

Once the voice display device 10 is operational following these operations, the voice input unit 22 at each site captures the speech uttered by the conference participants and converts it into a signal in a format the input voice processing unit 36 can process, for example an electrical signal. In parallel with the voice input, the image input unit 24 at each site captures the scene at that site and converts the resulting image into a signal in a format the input image processing unit 42 can process.

The voice input unit 22 supplies the converted signal to the input voice processing unit 36, and the image input unit 24 likewise supplies its converted signal to the input image processing unit 42. The input voice processing unit 36 processes the signal containing voice data under the direction of the control unit 30. This signal processing includes converting the voice data in the signal into character data and preferably also includes recognizing the loudness of the input speech. Similarly, the input image processing unit 42 processes the signal containing image data under the direction of the control unit 30.

The input voice processing unit 36 supplies the processed signal containing the voice data to the control unit 30. The character data obtained by the conversion in the processing unit 36 is attached to this signal. Similarly, the input image processing unit 42 supplies the processed signal containing the image data to the control unit 30.

The operations described so far are carried out in both voice display devices 10a and 10b. At site A, the control unit 30a of the voice display device 10a processes the signal containing voice data and character data supplied from the input voice processing unit 36a, and the signal containing image data supplied from the input image processing unit 42a, into a format suitable for transmission over the IP network 14. The control unit 30a then supplies the signal containing the data it has processed to the communication unit 50a and instructs the communication unit 50a to transfer the supplied signal to the voice display device 10b.

At site B, the control unit 30b of the voice display device 10b likewise processes the signals supplied from the input voice processing unit 36b and the input image processing unit 42b into a format suitable for transmission over the IP network 14. The control unit 30b then supplies the signal containing the data it has processed to the communication unit 50b and instructs the communication unit 50b to transfer the supplied signal to the voice display device 10a.

Through these operations, the control unit 30a of the voice display device 10a receives, via the communication unit 50a, a signal containing the data processed by the voice display device 10b, namely voice data, character data, and image data. Similarly, the control unit 30b of the voice display device 10b receives, via the communication unit 50b, a signal containing the data processed by the voice display device 10a.

The control unit 30 processes the signal obtained by the voice display device 10 at the local site and the signal received from the voice display device 10 at the other site. More specifically, it processes the signals so that the other site's voice data can be supplied to the output voice processing unit 54, and so that the other site's image data and the character data of both sites can be supplied to the output image processing unit 60.

The control unit 30 further determines the state of the conversation based on the voice, character, and image data generated by the voice display device 10 at the local site and on the voice, character, and image data contained in the signal sent from the voice display device 10 at the other site. The control unit 30 can determine, for example, whether the conference participants have remained silent and the conference has stalled, or whether remarks from both sites are lively and the discussion has become heated.
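The description does not state how the control unit 30 decides that a conversation is silent or heated. The following is one possible sketch, assuming the decision is made from utterance timestamps alone. The function name `classify_conversation` and all thresholds (`silence_after`, `window`, `heated_min`) are illustrative assumptions, not values from the patent.

```python
def classify_conversation(utterances, now, silence_after=10.0, window=30.0, heated_min=8):
    """Classify the conversation as "silent", "heated", or "normal".

    utterances: list of (timestamp_seconds, site_id) tuples, in any order.
    now:        current time in the same units.
    The thresholds are hypothetical defaults chosen for illustration.
    """
    # Time of the most recent utterance at or before `now`, if any.
    last = max((t for t, _ in utterances if t <= now), default=None)
    if last is None or now - last >= silence_after:
        return "silent"

    # Utterances inside the sliding window that ends at `now`.
    recent = [(t, s) for t, s in utterances if now - window <= t <= now]
    sites = {s for _, s in recent}

    # "Heated" here means many recent remarks coming from at least two sites.
    if len(recent) >= heated_min and len(sites) >= 2:
        return "heated"
    return "normal"
```

A real implementation could also weigh loudness or image data, which the description lists among the inputs available to the control unit 30.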

The control unit 30a supplies to the output voice processing unit 54a a signal containing the voice data generated from the speech input through the voice input unit 22b at the other site B. Under the direction of the control unit 30a, the output voice processing unit 54a converts the received signal into a signal in a format the voice output unit 26a can process, and then outputs the converted signal to the voice output unit 26a.

The voice output unit 26a converts the received signal into sound that the conference participants can hear and outputs it. In this way, the conference participants at site A can hear the remarks of the participants at the remote site B. The same processing is performed by the output voice processing unit 54b and the voice output unit 26b at the other site.

In this embodiment, the data communication devices at both site A and site B are described as voice display devices 10 capable of converting speech into character data. However, if, for example, the data communication device at site A is the voice display device 10a according to an embodiment of the present invention while the device at site B is a conventional device, site B sends voice data but no character data. Alternatively, even when voice display devices 10 according to an embodiment of the present invention are used at both sites, character data may fail to reach one voice display device 10a normally if a communication failure occurs or if the other voice display device 10b malfunctions.

When the control unit 30a detects reception of voice data but cannot detect reception of the corresponding character data, the control unit 30a may output the voice data supplied from site B to the output voice processing unit 54a and send the output voice processing unit 54a a command instructing it to convert the site B voice data into character data. In this case, the output voice processing unit 54a executes the command received from the control unit 30a and returns the generated character data to the control unit 30a.
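The fallback described here, in which received character data is used when present and the voice data is converted locally otherwise, can be sketched as follows. `ReceivedSignal` and `resolve_remote_text` are hypothetical names, and the `transcribe` callable stands in for whatever speech recognition the output voice processing unit performs; none of these are specified in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReceivedSignal:
    """Assumed shape of the data received from the other site."""
    voice: Optional[bytes]       # voice data, if any arrived
    text: Optional[str] = None   # character data, if the sender supplied it


def resolve_remote_text(
    signal: ReceivedSignal,
    transcribe: Callable[[bytes], str],
) -> Optional[str]:
    """Return character data for the remote speech, transcribing locally if needed."""
    if signal.text is not None:
        # The other site already converted its speech; use that result as is.
        return signal.text
    if signal.voice is not None:
        # Voice arrived without text (conventional device, failure, etc.).
        return transcribe(signal.voice)
    # Nothing to display.
    return None
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech recognition engine.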

The control unit 30a supplies a signal containing image data and character data to the output image processing unit 60a. The image data supplied to the output image processing unit 60a is the image data generated by the image input unit 24b at the other site B. The character data supplied to it is the character data generated by the input voice processing unit 36a and the character data generated by the input voice processing unit 36b and sent to the voice display device 10a. If, however, no character data is sent from the voice display device 10b, the control unit 30a can transfer to the output image processing unit 60a the site B character data that was generated by the output voice processing unit 54a and returned to the control unit 30a.

Under the direction of the control unit 30a, the output image processing unit 60a converts the image data and character data components of the received signal into an image signal in a format the image output unit 28a can process. The output image processing unit 60a then outputs the converted image signal to the image output unit 28a.

The image output unit 28a converts the received signal into an image that the conference participants can see and outputs it. At this time, the control unit 30a, acting through the output image processing unit 60a and the output voice processing unit 54a, controls the timing of image output by the image output unit 28a so that it corresponds to the timing of voice output by the voice output unit 26a. In this way, the conference participants at site A can visually recognize not only the scene at the remote site B but also what the participants at both sites are saying, because remarks made in the conference are displayed as characters by the image output unit 28a. The same processing is performed by the output image processing unit 60b and the image output unit 28b at the other site.

Examples of character data processing by the output image processing unit 60 and of character display by the image output unit 28 are described in more detail below with reference to FIGS. 3 to 7. FIGS. 3 to 7 show an image 70 output from the image output unit 28, specifically from the image output unit 28a installed at site A. The image 70 shows the scene captured by the image input unit 24b installed at the other site B.

The output image processing unit 60 can determine from which site the speech underlying the supplied character data was input. In the voice display device 10a, the output image processing unit 60a determines the color of the characters displayed on the screen 70 based on this determination of the speech input source and performs the corresponding processing.

For example, when the output image processing unit 60a determines that the character data was converted from speech input through the voice input unit 22a installed at the local site A, it performs processing so that the image output unit 28a displays that data as white characters. When it determines that the character data was converted from speech input through the voice input unit 22b installed at the other site B, it performs processing so that the image output unit 28a displays that data as green characters. Needless to say, any color that the image output unit 28a can output may be chosen for displaying the characters.

In addition, the output image processing unit 60a determines the display position of the characters on the screen 70 based on the determination of the speech input source. For example, as shown in FIG. 3, the characters 72a obtained by converting speech input through the voice input unit 22a at the local site A may be displayed on the left side of the screen 70, and the characters 72b obtained by converting speech input through the voice input unit 22b at the other site B may be displayed on the right side. The displayed characters 72 may be erased from the screen at any time so that the limited area of the screen 70 is used effectively.

More preferably, the output image processing unit 60a causes the image output unit 28a to display marks 74 indicating on which side of the screen 70, left or right, the speech input from each site is displayed.
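The per-source color and position rules in this example can be sketched as a small mapping. The white/left and green/right pairings follow the example in the description; the names `TextStyle`, `style_for`, and `legend` are hypothetical, and treating the mark 74 as a per-site legend entry is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStyle:
    color: str
    side: str  # "left" or "right" half of the screen


def style_for(source_site: str, local_site: str) -> TextStyle:
    """Pick the display style for text based on where the speech originated."""
    if source_site == local_site:
        # Speech from the local site: white characters on the left.
        return TextStyle(color="white", side="left")
    # Speech from the other site: green characters on the right.
    return TextStyle(color="green", side="right")


def legend(local_site: str, remote_site: str) -> dict:
    """Build legend entries (one per site) so viewers can tell the styles apart."""
    return {site: style_for(site, local_site) for site in (local_site, remote_site)}
```

For a conference among three or more sites, the two-way branch would need to become a table keyed by site, since the description gives colors and positions only for the two-site case.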

FIG. 4 shows another example of character display on the screen 70. In this example, no particular distinction is made between the display position of the characters 76a converted from speech at the local site A and that of the characters 76b converted from speech at the other site B. As a result, the characters 76a and 76b are displayed intermixed. The characters 76a and 76b move across the screen 70, flowing from its right side to its left side. In such a display, if the characters 76a and 76b are shown in different colors on the screen 70, the conference participants can tell visually from which site a displayed remark 76 originated.

In this display example as well, it is more preferable for the output image processing unit 60a to cause the image output unit 28a to display marks 74 indicating the color in which the speech input from each site is rendered as characters. For example, if the characters 76a converted from speech at the local site A are displayed in white, the mark 74a is also displayed in white, and if the characters 76b converted from speech at the other site B are displayed in green, the mark 74b is also displayed in green.

When the output image processing unit 60a is able to determine the loudness of the input speech corresponding to the character data, it may determine the size of the characters displayed on the screen 70 based on that determination. FIG. 5 shows a display example of the screen 70 when this processing is performed. For example, when a conference participant at the local site A makes a remark in a louder voice than usual, the output image processing unit 60a displays the characters 78a obtained by converting that remark on the screen 70 at a larger size than the characters 72a and 72b obtained from remarks made at normal volume. Conversely, when a conference participant at the local site A makes a remark in a quieter voice than usual, the output image processing unit 60a displays the characters 80a obtained by converting that remark at a smaller size than the characters 72a and 72b.

The way loudness is represented on the screen 70 is not limited to the size of the displayed characters. For example, the font or color of the characters may be varied according to the loudness of the speech.
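Mapping loudness to character size can be sketched as a three-level rule: larger than normal, normal, or smaller than normal. The decibel scale, the reference level, and the point sizes below are all illustrative assumptions; the description says only that louder remarks are shown larger and quieter remarks smaller.

```python
def font_size_for(level_db, normal_db=60.0, base_pt=24, step_pt=8, margin_db=6.0):
    """Return a font size in points for speech measured at `level_db`.

    normal_db and margin_db define the band treated as normal volume;
    all default values are hypothetical.
    """
    if level_db >= normal_db + margin_db:
        return base_pt + step_pt   # louder than usual: larger characters
    if level_db <= normal_db - margin_db:
        return base_pt - step_pt   # quieter than usual: smaller characters
    return base_pt                 # normal volume
```

The same thresholds could select a font or color instead of a size, as the description allows.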

FIGS. 6 and 7 show display examples in which the output image processing unit 60a displays the state of the conversation between sites A and B as characters on the screen 70. When the control unit 30a's determination that there has been no conversation for a certain period is passed to the output image processing unit 60a, the output image processing unit 60a displays characters 82 indicating that determination on the screen 70, as shown in FIG. 6. The characters 82 are, for example, a mimetic word such as "シーン…" (a Japanese expression for dead silence) or a phrase describing the current conversation state such as "沈黙中" ("silent").

When the control unit 30a's determination that lively remarks from both sites are continuing and the discussion has become heated is passed to the output image processing unit 60a, the output image processing unit 60a displays characters 84 indicating that determination on the screen 70, as shown in FIG. 7. The characters 84 are, for example, a phrase describing the current conversation state such as "激論中!!" ("heated debate in progress!!").

The characters 82 and 84 referred to here may include not only characters in the narrow sense, used to write the language spoken by the speakers, but also characters in the broad sense as defined by character code standards, such as symbols, signs, and pictograms. Moreover, the way the output image is processed based on the determination of the control unit 30 is not limited to displaying the characters 82 and 84. For example, a predetermined image may be stored in the storage unit 32 in advance, and the output image processing unit 60 may display the stored image based on the determination of the control unit 30. Alternatively, a predetermined program that applies a special visual effect to the screen 70 may be stored in the storage unit 32 in advance, and the output image processing unit 60 may apply that visual effect to the output screen 70 based on the determination of the control unit 30.

In other words, according to the embodiment of the voice display device 10 of the present invention, the control unit 30 determines the state of the conversation between remote sites, and the current conversation state based on that determination can be represented visually on the screen 70. Any means of visual representation may be used as long as it can be displayed on the screen 70. Based on this on-screen representation, the participants in the conversation can recognize the current conversation state visually.

続いて、本発明に係る音声表示装置10の別の実施例に関する説明を行う。この実施例によれば、会話の中の重要なキーワードを、付箋のように画面上に残しておくことができる。また、会話の中で登場した回数の多い語句をランキングとして画面に表示することができる。 Subsequently, another embodiment of the voice display device 10 according to the present invention will be described. According to this embodiment, important keywords in the conversation can be left on the screen like a sticky note. In addition, words that appear frequently in conversations can be displayed on the screen as rankings.

音声表示装置10の別の実施例の構成を図８に示す。図８では、先に述べた図１で示す音声表示装置10の実施例と同様の構成要素に関しては同一の参照符号を付して図示するとともに、以下においても重複した説明を避ける。 The configuration of another embodiment of the voice display device 10 is shown in FIG. In FIG. 8, the same components as those of the embodiment of the voice display device 10 shown in FIG. 1 described above are denoted by the same reference numerals, and redundant description is avoided in the following.

図８に示す音声表示装置10の実施例は、先に述べた音声表示装置10の実施例に設けられている構成要素に加えて、制御部30と接続されている表示指示入力部102を有する。制御部30と表示指示入力部102の間は、例えば有線または無線の回線104を介して接続される。表示指示入力部102は、例えばボタン、レバー、タッチパネル等のような公知の入力装置で構成されて構わない。 The embodiment of the voice display device 10 shown in FIG. 8 has, in addition to the components provided in the embodiment of the voice display device 10 described above, a display instruction input unit 102 connected to the control unit 30. The control unit 30 and the display instruction input unit 102 are connected via, for example, a wired or wireless line 104. The display instruction input unit 102 may be a known input device such as a button, lever, or touch panel.

表示指示入力部102では、例えば音声を変換した文字を画面70に貼り付けるように見せる画像表示処理を実行するよう求める指示信号が生成される。かかる画像表示処理の具体的な実行工程は記憶部32に記憶されている。利用者が表示指示入力部102を操作すると、表示指示入力部102は上述の指示信号を生成し、生成された指示信号を音声表示装置10の制御部30に伝送する。 The display instruction input unit 102 generates an instruction signal requesting execution of image display processing that, for example, makes characters converted from voice appear to be pasted onto the screen 70. The specific steps for executing this image display processing are stored in the storage unit 32. When the user operates the display instruction input unit 102, the display instruction input unit 102 generates the above-described instruction signal and transmits it to the control unit 30 of the voice display device 10.

かかる指示信号の供給を受けた制御部30は、出力画像処理部60と協働して指示に対応する画像表示処理の動作を実行する。その結果、出力画像処理部60では、画面70上に貼付されているかのように表示され続ける貼付文字106のデータを含む信号が生成される。出力画像処理部60が生成した信号を画像出力部28に出力すると、画像出力部28は、貼付文字106のデータを含む信号を、画像データや通常の文字データと同様に、会議参加者が視覚的に認識可能な画像に変換して出力する。 Upon receiving the instruction signal, the control unit 30 executes the image display processing corresponding to the instruction in cooperation with the output image processing unit 60. As a result, the output image processing unit 60 generates a signal including the data of pasted characters 106, which continue to be displayed as if pasted on the screen 70. When the output image processing unit 60 outputs the generated signal to the image output unit 28, the image output unit 28 converts the signal including the data of the pasted characters 106 into an image that conference participants can visually recognize, in the same way as image data and normal character data, and outputs it.

図９は、図８で示す音声表示装置10の実施例において、表示指示入力部102を操作することによって出力された画像70の一例を示す。例えばボタンを押し続けている間など、参加者が指示入力部102を操作している間に音声入力部22に入力された音声から変換された文字データは、制御部30および出力画像処理部60での処理を経て、貼付文字106として画面70に出力される。貼付文字106は、時間の経過によってもその表示が画面70から消えることはなく、会議参加者が所定の操作を行う等するまで画面70上に表示され続ける。そのため、会議における重要なキーワードが明確となり、会議参加者全体に共有される。好ましくは図９に示すように、付箋上に貼付文字106が記載されているような表示上の演出を施すと、より視覚的効果が高まる。言うまでもなく、表示上の演出は、表示指示入力部102を操作中に音声入力部22に入力された音声を、文字72や文字76とは異なる態様、特に通常の文字の表示態様よりも視覚的に目立つ態様で画面70に表示するものであれば上述の例に限定されない。 FIG. 9 shows an example of an image 70 output by operating the display instruction input unit 102 in the embodiment of the voice display device 10 shown in FIG. 8. Character data converted from voice that is input to the voice input unit 22 while a participant is operating the display instruction input unit 102, for example while a button is held down, is processed by the control unit 30 and the output image processing unit 60 and output to the screen 70 as pasted characters 106. The pasted characters 106 do not disappear from the screen 70 with the passage of time; they remain displayed on the screen 70 until, for example, a conference participant performs a predetermined operation. As a result, the important keywords of the conference become clear and are shared among all conference participants. Preferably, as shown in FIG. 9, a presentation effect in which the pasted characters 106 appear to be written on a sticky note further enhances the visual effect. Needless to say, the presentation effect is not limited to the above example, as long as the voice input to the voice input unit 22 during operation of the display instruction input unit 102 is displayed on the screen 70 in a mode different from the characters 72 and 76, in particular in a mode that stands out visually more than the normal character display.
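
The hold-to-pin behaviour described above can be sketched in a few lines. This is an illustrative model only, assuming that normal captions are dropped once a fixed number is exceeded (the specification says only that pasted characters do not disappear with time); the class `CaptionScreen`, its fields, and the `dismiss` method are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionScreen:
    """Hypothetical model of screen 70: transient captions plus pinned notes."""
    max_captions: int = 3                       # assumed limit, not from the source
    captions: list = field(default_factory=list)
    pinned: list = field(default_factory=list)
    button_held: bool = False                   # state of display instruction input unit 102

    def on_text(self, text: str) -> None:
        """Handle one piece of character data converted from speech."""
        if self.button_held:
            # Spoken while the input unit is operated: keep as pasted characters 106.
            self.pinned.append(text)
        else:
            # Normal captions: drop the oldest once the assumed limit is exceeded.
            self.captions.append(text)
            del self.captions[:-self.max_captions]

    def dismiss(self, text: str) -> None:
        """Stand-in for the 'predetermined operation' that removes a pinned note."""
        if text in self.pinned:
            self.pinned.remove(text)
```

In use, setting `button_held = True` before calling `on_text` routes the recognized text to `pinned`, where it stays until `dismiss` is called, while ordinary captions keep scrolling independently.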

さらに、図８に示す音声表示装置10の実施例は、制御部30と接続されている集計指示入力部108を有する。制御部30と集計指示入力部108の間は、例えば有線または無線の回線110を介して接続される。集計指示入力部108は、例えばボタン、レバー、タッチパネル等のような公知の入力装置で構成されて構わない。 Further, the embodiment of the voice display device 10 shown in FIG. 8 has a totaling instruction input unit 108 connected to the control unit 30. The control unit 30 and the totalization instruction input unit 108 are connected via a wired or wireless line 110, for example. The aggregation instruction input unit 108 may be configured by a known input device such as a button, lever, touch panel, or the like.

また、本実施例においては、音声表示装置10で音声から文字データに変換された語句は、画像出力部28から出力されるとともに記憶部32に記憶される。記憶部32に記憶される文字データには、IP網14を介して他の音声表示装置10から供給を受けた文字データも含まれる。 In the present embodiment, the phrase converted from voice to character data by the voice display device 10 is output from the image output unit 28 and stored in the storage unit 32. The character data stored in the storage unit 32 includes character data supplied from another voice display device 10 via the IP network 14.

集計指示入力部108では、記憶部32に記憶されている文字データに含まれている語句ごとの数を集計するよう求める指示信号が生成される。かかる集計処理の具体的な実行工程もまた記憶部32に記憶されている。利用者が集計指示入力部108を操作すると、集計指示入力部108は上述の指示信号を生成し、生成された指示信号を音声表示装置10の制御部30に伝送する。 The aggregation instruction input unit 108 generates an instruction signal requesting that the occurrences of each word included in the character data stored in the storage unit 32 be counted. The specific steps for executing this aggregation processing are also stored in the storage unit 32. When the user operates the aggregation instruction input unit 108, the aggregation instruction input unit 108 generates the above-described instruction signal and transmits it to the control unit 30 of the voice display device 10.

かかる指示信号の供給を受けた制御部30は、記憶部32に記憶されている文字データに基づいて、会議参加者の発言に登場する各語句の集計を行う。集計が終了したら、制御部30は、出力画像処理部60と協働して集計結果112を画面70に表示させる処理を実行する。その結果、出力画像処理部60では、画面70上に表示すべき集計結果112のデータを含む信号が生成される。 Upon receiving the instruction signal, the control unit 30 aggregates each word / phrase appearing in the speech of the conference participant based on the character data stored in the storage unit 32. When the counting is completed, the control unit 30 executes a process of displaying the counting result 112 on the screen 70 in cooperation with the output image processing unit 60. As a result, the output image processing unit 60 generates a signal including the data of the aggregation result 112 to be displayed on the screen 70.

出力画像処理部60が生成した集計結果112を含む信号を画像出力部28に出力すると、画像出力部28は、集計結果112を含む信号を、画像データや通常の文字データと同様に、会議参加者が視覚的に認識可能な画像に変換して出力する。 When the output image processing unit 60 outputs the generated signal including the aggregation result 112 to the image output unit 28, the image output unit 28 converts that signal into an image that conference participants can visually recognize, in the same way as image data and normal character data, and outputs it.

図10は、図８で示す音声表示装置10の実施例において、集計指示入力部108を操作することによって出力された画像70の一例を示す。例えばボタンを押す等、参加者が集計指示入力部108を操作すると、記憶部32に記憶されている文字データに基づいて制御部30が語句の集計を行う。さらに、制御部30および出力画像処理部60での画面表示処理を経て、集計結果112として画面70に出力される。 FIG. 10 shows an example of an image 70 output by operating the aggregation instruction input unit 108 in the embodiment of the voice display device 10 shown in FIG. For example, when the participant operates the totaling instruction input unit 108 such as pressing a button, the control unit 30 totals words based on the character data stored in the storage unit 32. Further, after the screen display processing in the control unit 30 and the output image processing unit 60, the total result 112 is output to the screen 70.

どのような集計結果を画面70に表示するかは任意であり、集計処理の実行工程の一部として予め記憶部32に記憶されている。図10の表示例においては、画面70に表示されている集計結果112は、会議の開始時点から集計指示入力部108を操作する時点までに発言された語句を集計した上位３語をランキング形式で示すものである。なお、本発明に係る音声表示装置10は、画面70に表示される集計結果112の内容を、利用者の操作または制御部30による自動的な判断などによって任意に調整できるような構成であるとより好ましい。 What kind of aggregation result is displayed on the screen 70 is arbitrary, and is stored in the storage unit 32 in advance as part of the steps for executing the aggregation processing. In the display example of FIG. 10, the aggregation result 112 displayed on the screen 70 shows, in ranking form, the three most frequent words among those spoken from the start of the conference until the aggregation instruction input unit 108 was operated. More preferably, the voice display device 10 according to the present invention is configured so that the content of the aggregation result 112 displayed on the screen 70 can be adjusted freely by a user operation, by automatic determination by the control unit 30, or the like.
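
The ranking shown in FIG. 10 amounts to counting word occurrences over the stored character data and taking the top entries. The sketch below illustrates this under stated assumptions: the function name `top_words` is hypothetical, and each utterance is assumed to be already segmented into words (for Japanese text this would require a morphological analysis step, which the specification does not describe and which is omitted here).

```python
from collections import Counter


def top_words(utterances, n=3):
    """Return the n most frequent words as (word, count) pairs.

    `utterances` is a list of word lists, one per utterance, covering both
    locally generated character data and character data received from the
    other site (assumed to be stored together, as in storage unit 32).
    """
    counts = Counter()
    for words in utterances:
        counts.update(words)
    return counts.most_common(n)
```

Making `n` a parameter reflects the remark that the displayed content should preferably be adjustable; a user setting or an automatic rule could supply a different value.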

このように集計結果112を画面70に表示することによって、会議で出現頻度が高い語句を重要なキーワードとして参加者の視覚を通じて認識させることが可能となる。 By displaying the total result 112 on the screen 70 in this way, it is possible to recognize a phrase having a high appearance frequency in a meeting as an important keyword through the visual sense of the participant.

ところで、上述してきた本発明の実施例は、コンピュータに音声表示装置10としての役割を実行させるプログラムを所定のコンピュータにインストールさせることによっても具現化され得る。この場合の実施例を、図11を参照しながら簡潔に説明する。 By the way, the embodiment of the present invention described above can also be realized by installing a program for causing a computer to perform the role as the voice display device 10 to a predetermined computer. An embodiment in this case will be briefly described with reference to FIG.

記憶媒体130に、コンピュータ132を本発明に係る音声表示装置10の実施例として機能させるプログラムを記憶しておく。ここで、記憶媒体130とは、光学ディスクや磁気ディスク、フラッシュメモリなど、プログラムを記憶することが可能ないかなる装置や部品も含まれる。 The storage medium 130 stores a program that causes the computer 132 to function as an embodiment of the audio display device 10 according to the present invention. Here, the storage medium 130 includes any device or component capable of storing a program, such as an optical disk, a magnetic disk, or a flash memory.

コンピュータ132は、記憶媒体130の記憶内容を読取り可能なドライブ134を有する。ドライブ134はコンピュータ132に固定的に内蔵されていても、または、コンピュータ132の筐体からは独立した外付け型でコンピュータ132と接続可能な機器であってもよい。また、コンピュータ132は、演算などの情報処理やコンピュータ自身の制御を行う中央処理装置(Central Processing Unit: CPU)136およびプログラムやデータなどを記憶する記憶装置138を有する。本図で示す記憶装置138は便宜上、データを一時的に記憶する装置および恒常的に記憶する装置の双方を含むものとする。CPU 136はドライブ134と接続線140を介して接続され、記憶装置138とも接続線142を介して接続されている。 The computer 132 has a drive 134 that can read the storage contents of the storage medium 130. The drive 134 may be fixedly incorporated in the computer 132, or may be an external device that can be connected to the computer 132 independent of the housing of the computer 132. The computer 132 includes a central processing unit (CPU) 136 that performs information processing such as computation and control of the computer itself, and a storage device 138 that stores programs, data, and the like. For convenience, the storage device 138 shown in this figure includes both a device for temporarily storing data and a device for permanently storing data. The CPU 136 is connected to the drive 134 via the connection line 140 and is also connected to the storage device 138 via the connection line 142.

記憶媒体130に記憶されたプログラムは、ドライブ134を介してコンピュータ132に読み取られ、読み取られたプログラムは、CPU 136による制御の下、コンピュータ132の記憶装置138に記憶される。このようにしてプログラムが組み込まれたコンピュータ132は、プログラムを実施させることにより、音声表示装置10として働くことが可能となる。このプログラムは、プログラムの内容に応じて、コンピュータ132内のCPU 136を制御部30および制御部30と協働して動作する各処理部として働かせ、記憶装置138を記憶部32として働かせるものであるともいえる。その他図示しないコンピュータ132内の様々な装置もまた、かかるプログラムの実行によって、音声表示装置10の構成部品として働くことになる。 The program stored in the storage medium 130 is read by the computer 132 via the drive 134, and the read program is stored in the storage device 138 of the computer 132 under the control of the CPU 136. The computer 132 into which the program has been installed in this way can operate as the voice display device 10 by executing the program. It can also be said that, depending on its content, this program causes the CPU 136 in the computer 132 to act as the control unit 30 and as each processing unit that operates in cooperation with the control unit 30, and causes the storage device 138 to act as the storage unit 32. Various other devices in the computer 132 (not shown) likewise act as components of the voice display device 10 when the program is executed.

また、各種の入力部22、24、102、108に相当する構成部品および各種の出力部26、28に相当する構成部品は、コンピュータ132に当初から内蔵されている構成要素を用いても、またはコンピュータ132の筐体からは独立して構成されている構成要素をコンピュータ132と接続して用いても構わない。 In addition, components corresponding to the various input units 22, 24, 102, and 108 and components corresponding to the various output units 26 and 28 may be components that are built in the computer 132 from the beginning, or Components that are configured independently from the housing of the computer 132 may be used by being connected to the computer 132.

なお、コンピュータ132へのプログラムのインストールは、ドライブ134を介して記憶媒体130に記憶されたプログラムを読み取る方式に限らず、ネットワークを介してプログラムを読み取る方式などと採用しても構わない。 Note that the installation of the program in the computer 132 is not limited to a method of reading the program stored in the storage medium 130 via the drive 134, and may be a method of reading the program via a network.

以上、ここまで本発明のいくつかの実施例を述べてきたが、本発明を実施する具体的手法は上述の実施例に制限されるものではない。本発明の実施が可能である限りにおいて適宜に設計や動作手順等の変更をなし得る。例えば、本発明に用いられる構成要素の機能発揮を補助する用に供する回路その他の機器については、適宜に付加および省略可能である。 Although several embodiments of the present invention have been described so far, specific methods for implementing the present invention are not limited to the above-described embodiments. As long as the present invention can be implemented, the design, operation procedure, and the like can be changed as appropriate. For example, circuits and other devices used for assisting the function of the components used in the present invention can be added and omitted as appropriate.

10 音声表示装置
12 テレビ会議システム
22 音声入力部
24 画像入力部
26 音声出力部
28 画像出力部
30 制御部
32 記憶部
36 入力音声処理部
42 入力画像処理部
50 通信処理部
54 出力音声処理部
60 出力画像処理部
102 表示指示入力部
108 集計指示入力部
10 Voice display device
12 Video conference system
22 Audio input section
24 Image input section
26 Audio output section
28 Image output section
30 Control unit
32 Memory unit
36 Input audio processor
42 Input image processor
50 Communication processor
54 Output audio processor
60 Output image processor
102 Display instruction input section
108 Aggregation instruction input section

Claims

1. A voice display device used in a system that performs at least two-way voice communication with another site, the voice display device comprising:
a voice input unit that receives voice and converts the voice to generate a voice signal;
a voice processing unit that generates character data based on at least the voice signal generated by the voice input unit;
a communication processing unit that receives, from the other site, a voice signal or character data generated based on voice uttered at the other site;
an image processing unit that generates an image signal based on character data corresponding to at least the voice input to the voice input unit and the voice uttered at the other site;
an image output unit that outputs and displays an image based on the image signal; and
a control unit that determines the communication status of the voice with the other site and controls the operation of the voice processing unit, the communication processing unit, and the image processing unit,
wherein the display mode of the image is determined according to at least one of information recognized from the character data and the communication status of the voice.

2. The voice display device according to claim 1, wherein the voice display device determines the source of the voice corresponding to the character data, and the image processing unit changes the display mode of the characters displayed in the image according to the determined source.

3. The voice display device according to claim 1, wherein the voice display device determines the volume of the voice input to the voice input unit and of the voice uttered at the other site, and the image processing unit changes the display mode of the characters displayed in the image according to the determined volume.

4. The voice display device according to claim 1, wherein the voice display device determines the communication status of the voice signal or character data with the other site, and the image processing unit causes the image to include a predetermined display indicating the communication status according to the determined communication status.

5. The voice display device according to claim 1, further comprising a display instruction input unit for inputting a display instruction signal that requests image processing for displaying predetermined characters on the screen in a mode different from the normal character display,
wherein, when the voice display device detects the display instruction signal, the image processing unit executes image processing that treats the characters corresponding to the voice input to the voice input unit during operation of the display instruction input unit as the predetermined characters and displays them on the screen in a mode different from the normal character display.

6. The voice display device according to any one of claims 1 to 5, further comprising:
a storage unit that stores the character data generated by the voice processing unit and the character data supplied from the communication processing unit; and
an aggregation instruction input unit for inputting an aggregation instruction signal requesting processing that counts the occurrences of each word included in the character data stored in the storage unit,
wherein, when the voice display device detects the aggregation instruction signal, the voice display device counts the words, and after the counting is complete, the image processing unit executes image processing that displays the counting result on the screen.

7. The voice display device according to claim 1, wherein, when the voice signal supplied from the other site to the communication processing unit does not include character data, the voice processing unit generates character data corresponding to the voice uttered at the other site based on the supplied voice signal.

8. A voice display program used in a system that performs at least two-way voice communication with another site, the program causing a computer, which is connected to a voice input unit that receives voice and converts the voice to generate a voice signal and to an image output unit that outputs and displays an image based on an image signal, to function as a voice display device that displays the voice on the image output unit, the program causing the computer to function as:
voice processing means for generating character data based on at least the voice signal generated by the voice input unit;
communication processing means for receiving, from the other site, a voice signal or character data generated based on voice uttered at the other site;
image processing means for generating the image signal to be output from the image output unit based on character data corresponding to at least the voice input to the voice input unit and the voice uttered at the other site; and
control means for determining the communication status of the voice with the other site and controlling the operation of the voice processing means, the communication processing means, and the image processing means,
wherein the display mode of the image is determined according to at least one of information recognized from the character data and the communication status of the voice.

9. The voice display program according to claim 8, wherein the voice display program causes the computer to determine the source of the voice corresponding to the character data, and causes the image processing means to change the display mode of the characters displayed in the image according to the determined source.

10. The voice display program according to claim 8, wherein the voice display program causes the computer to determine the volume of the voice input to the voice input unit and of the voice uttered at the other site, and causes the image processing means to change the display mode of the characters displayed in the image according to the determined volume.

11. The voice display program according to claim 8, wherein the voice display program causes the computer to determine the communication status of the voice signal or character data with the other site, and causes the image processing means to include in the image a predetermined display indicating the communication status according to the determined communication status.

12. The voice display program according to claim 8, wherein the voice display program further causes a computer that has, or is connected to, an input unit for inputting a predetermined signal to function as means for detecting a display instruction signal that requests image processing for displaying predetermined characters on the screen in a mode different from the normal character display,
and, when the computer detects the display instruction signal, causes the image processing means to execute image processing that treats the characters corresponding to the voice input to the voice input unit during operation of the input unit as the predetermined characters and displays them on the screen in a mode different from the normal character display.

13. The voice display program according to claim 8, wherein the voice display program further causes a computer that has, or is connected to, an input unit for inputting a predetermined signal, and that has a storage unit for storing the character data generated by the voice processing means and the character data supplied from the communication processing means, to function as means for detecting an aggregation instruction signal requesting processing that counts the occurrences of each word included in the character data stored in the storage unit,
and, when the computer detects the aggregation instruction signal, causes the computer to count the words and, after the counting is complete, causes the image processing means to execute image processing that displays the counting result on the screen.

14. The voice display program according to claim 8, wherein, when the voice signal supplied from the other site to the communication processing means does not include character data, the voice display program causes the voice processing means to generate character data corresponding to the voice uttered at the other site based on the supplied voice signal.