JP4850690B2

JP4850690B2 - Teleconferencing equipment

Info

Publication number: JP4850690B2
Application number: JP2006349444A
Authority: JP
Inventors: 芳美渡辺; 孝幸橋本; 和人田崎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2006-12-26
Filing date: 2006-12-26
Publication date: 2012-01-11
Anticipated expiration: 2026-12-26
Also published as: JP2008160667A

Description

The present invention relates to a teleconferencing apparatus, for example a video conference apparatus or an audio conference apparatus, that lets a listener follow everything a speaker says without omission and makes it easy to create minutes.

The present invention relates to a teleconferencing system (for example, an audio conference system or a video conference system) that allows a conference to be held in real time between geographically separated sites, and in particular to a teleconferencing system (video conference apparatus, audio conference apparatus) designed so that a listener can easily tell speakers apart and does not miss what is said.

In recent years, teleconferencing systems have become increasingly important. With a video conference system, the listener can watch the other party's face during the conference, so identifying the speaker is easy. In a multipoint conference, however, there are multiple speakers, and when the discussion becomes heated, for example, the conference may move on before participants have caught what every speaker said.

In an audio conference, the other party cannot be seen, so it is difficult for the listener to identify the speaker, and the conference is even more likely than a video conference to move on before participants have caught what every speaker said.

As a technique for addressing this problem, an audio conference system has been proposed that makes speakers easy to identify and records the audio signal as minutes (see, for example, Patent Document 1).
Patent Document 1: JP-A-2005-080110

With the conventional technology, a participant who failed to catch what a speaker said during a conference could understand it only by asking another attendee or by reading the minutes afterwards. In addition, when preparing the minutes of a conference lasting several hours, converting the saved audio data into a document does not make the main points of the conference clear; the result cannot be used as minutes as it stands and has to be reviewed in its entirety.

In view of the above circumstances, an object of the present invention is to provide a teleconferencing apparatus (video conference apparatus, audio conference apparatus) constituting a teleconferencing system that lets a participant find out, during a teleconference, what was said in a missed utterance and who said it, and that makes it easy to keep only the important content in the minutes data by selecting the utterances to be recorded in the minutes.

To solve the above problem, the present invention is characterized in that, in a teleconferencing system, a follow-up display is performed in which the content of a speaker's utterance is shown on the monitor screen as speaker identification information and text data after a desired time delay, and in that the data to be kept in the minutes is selected by settings made from a setting means such as a personal computer.

Because the present invention displays the other party's video on the monitor screen and also displays the text data of each utterance on the monitor screen shortly after it is spoken, conference participants can always follow what the speaker is saying even when the speaker's voice is hard to hear.

Furthermore, according to the present invention, displaying the utterance content as text data on part of the monitor screen during a teleconference, following the speech, lets a participant find out what was said in an utterance that was missed. In addition, the speaker identification function can show whose utterance is displayed on the monitor screen, and selecting the utterances to keep in the minutes by settings made from a personal computer connected to the teleconferencing apparatus makes it easy to keep only the important content in the minutes data.

In a teleconferencing system such as a video conference system or an audio conference system, the present invention is characterized by comprising a text data conversion function that converts spoken content into text data, a speaker identification function that identifies the speaker, a function that displays the text data on a monitor screen, and a delayed display function that displays the text data on the monitor screen after a desired time delay. Furthermore, the video conference apparatus of the present invention is configured to display, on a single monitor screen, the image of the other party together with the speaker identification information and the text data of the utterance content, following the speech. The present invention can therefore be realized by adding a text data conversion function, a text data display function, and a delayed display function to a conventional video conference system.

That is, the present invention is a teleconferencing apparatus comprising: audio A/D conversion means for converting an analog audio signal from a microphone that picks up a speaker's voice into a digital audio signal; speaker identification processing means for acquiring, from the audio signal, speaker identification information that identifies and specifies the speaker; speaker identification information adding means for adding the speaker information to the digital audio signal to convert it into an information-added audio signal; an audio recording memory that records the information-added audio signal output by the speaker identification information adding means; transmitting means for compressing the information-added audio signal and transmitting the compressed audio information to a communication network; receiving means for receiving the compressed audio information from the communication network; audio data processing means for decompressing the received compressed audio signal back into an information-added audio signal, extracting the added speaker identification information, and restoring a digital audio signal; and audio D/A conversion means for converting the digital audio signal back into an analog audio signal and outputting it from a speaker. The apparatus further comprises: a screen on which text data is displayed; speech-to-text data conversion means for converting the digital audio signal into text data; a text data memory that stores the converted text data; video D/A conversion means for converting the converted text data into video to be displayed on the screen; text data display means for displaying the converted text data on the screen following, with a delay, the utterance of the corresponding speaker; marking signal input means for inputting a marking signal for the audio data being played back or the text data being displayed, while audio data contained in the information-added audio signal recorded in the audio recording memory is being played back or while text data in the text data memory is being displayed on the screen; flag assigning means for assigning, in response to input from the marking signal input means, a flag to the head of the audio data contained in the information-added audio signal recorded in the audio recording memory or of the text data recorded in the text data memory; and first minutes data creating means for creating minutes data using the audio data or text data to which the flag has been assigned by the flag assigning means.

Furthermore, in the above teleconferencing apparatus, the present invention comprises: a camera that captures the speaker; video A/D conversion means for converting the analog video signal of the speaker captured by the camera into a digital video signal; video signal transmitting means for compressing the digital video signal into a compressed video signal and transmitting it to the communication network; video signal decompressing means for decompressing a compressed video signal received from the communication network into a digital video signal; a monitor that displays video; and an OSD function unit that superimposes the speaker identification information and the text data of the utterance content on the digital video signal and displays the result on the monitor.

In the above teleconferencing apparatus, the present invention comprises, in place of the first minutes data creating means, second minutes data creating means for creating minutes data from the identification-information-added audio signal by designating the speaker identification information. The present invention is also characterized in that the teleconferencing apparatus is a video conference apparatus or an audio conference apparatus.

The hardware configuration of a video conference apparatus 1 constituting a video conference system according to the present invention, which is one type of teleconferencing system, is described with reference to FIG. 1, and the functional configuration of the video conference apparatus 1 is described with reference to FIG. 2. FIG. 1 is a diagram illustrating the hardware configuration of the video conference system according to the present invention. FIG. 2 is a diagram illustrating the functional configuration of the video conference system shown in FIG. 1. FIG. 3 is a diagram showing an example in which a video conference system is configured by connecting a plurality of the video conference apparatuses shown in FIG. 1 to one another over a communication network.

The video conference apparatus 1 comprises an audio processing unit 11, a video processing unit 13, a storage unit 14, a communication network interface 15, a CPU 16, and a bus 17. In addition, a communication network 3 (the Internet, a local area network, or the like), a personal computer 18, a plurality of microphones 191, a speaker 192, a camera 194, and a monitor 193 are connected to the video conference apparatus 1.

The audio processing unit 11 has a plurality of audio A/D conversion circuits 111, an audio D/A conversion circuit 112, and an audio CODEC 115.

The video processing unit 13 has a video D/A conversion circuit 131, a video A/D conversion circuit 132, an OSD function circuit 133, and a video CODEC 134.

The storage unit 14 has a main memory 141, a temporary audio recording memory 142, an audio recording memory 143, and a text data memory 144.

The audio CODEC 115, the video CODEC 134, the OSD function unit 133, the CPU 16, the main memory 141, the temporary audio recording memory 142, the audio recording memory 143, and the text data memory 144 are each connected to the bus 17 and exchange data and signals with one another over the bus 17.

The audio A/D conversion circuits 111 are means for A/D converting the analog audio signals from the plurality of microphones 191 into digital audio signals. These digital audio signals are output to the audio CODEC 115.

The audio D/A conversion circuit 112 D/A converts the digital audio signal from the audio CODEC 115 into an analog audio signal and outputs it to the speaker 192.

The audio CODEC 115 treats the audio signal with the largest output among the plurality of analog audio signals as indicating the position of the main microphone, and uses information about the speaker associated with that microphone as the speaker identification information. It adds this speaker identification information and the current time to the digital audio signal from the audio A/D conversion circuit 111 to form an information-added audio signal, compresses the information-added audio signal, and outputs it to the bus 17 as a compressed audio signal. The audio CODEC 115 also decompresses a compressed audio signal received from the communication network 3 via the communication network interface 15 to obtain an information-added audio signal, and outputs the digital audio signal that remains after the speaker identification information and the current time have been extracted to the audio D/A conversion circuit 112. The audio D/A conversion circuit 112 performs D/A conversion on the digital audio signal and outputs it to the speaker 192 as an analog audio signal. At the same time, the audio CODEC 115 records the information-added audio signal from the CPU 16 in the temporary audio recording memory 142 of the storage unit 14.
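The microphone-based speaker identification described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the description compares the output levels of the analog signals, whereas this sketch assumes the level is estimated as the RMS of one frame of digital samples per microphone. The names `TaggedFrame`, `rms`, and `tag_loudest`, and the microphone-to-speaker table, are hypothetical.

```python
import math
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaggedFrame:
    """An 'information-added' audio frame: samples plus speaker ID and time."""
    speaker_id: str
    timestamp: float
    samples: List[int]


def rms(samples: List[int]) -> float:
    """Root-mean-square level of one frame (assumed loudness measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tag_loudest(
    frames_by_mic: Dict[int, List[int]],
    mic_to_speaker: Dict[int, str],
    now: Optional[float] = None,
) -> TaggedFrame:
    """Pick the microphone with the highest level and tag its frame."""
    # The loudest channel is treated as the main microphone.
    main_mic = max(frames_by_mic, key=lambda mic: rms(frames_by_mic[mic]))
    timestamp = time.time() if now is None else now
    return TaggedFrame(mic_to_speaker[main_mic], timestamp, frames_by_mic[main_mic])
```

Passing `now` explicitly keeps the function deterministic for testing; a real device would more likely read a hardware clock and smooth the level over several frames to avoid switching speakers on short noise bursts.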

The video D/A conversion circuit 131 performs D/A conversion on the digital video signal to convert it into an analog video signal for monitor output, and outputs the analog video signal to the monitor 193.

The video A/D conversion circuit 132 A/D converts the analog video signal from the camera 194 into a digital video signal.

The OSD function unit 133 has a speech-to-text data conversion function that recognizes the content of an audio signal and converts it into text data. On an instruction from the CPU 16, it reads the audio signal and the speaker identification information from the temporary audio recording memory 142, converts the audio signal into text data for monitor display, and outputs it together with the speaker identification information to the video CODEC 134. The OSD function unit 133 also outputs the converted text data and the identification information to the text data memory 144.

The video CODEC 134 compresses the digital video signal from the video A/D conversion circuit 132 and outputs it to the bus 17 as a compressed video signal. The video CODEC 134 also decompresses a compressed video signal received from the bus 17 and converts it into a digital video signal. Furthermore, the video CODEC 134 superimposes the text data for monitor display and the speaker identification information from the OSD function unit 133 on the decompressed digital video signal to form an information-added digital video signal. The video D/A conversion circuit 131 then D/A converts this into an analog video signal carrying the text data and the speaker identification information, which is displayed on the screen of the monitor 193. In this way, the speaker identification information and the utterance content are displayed in sequence on part of the camera video shown on the screen of the monitor 193.

The main memory 141 is the main memory of the apparatus. It stores the program that runs the CPU 16 and serves as temporary working memory during operation.

The temporary audio recording memory 142 is means for temporarily recording the information-added audio signal to which the speaker identification information and the current time have been added. In response to a designation from the personal computer 18, the information-added audio signal recorded in the temporary audio recording memory 142 is accessed via the path communication network interface 15 → CPU 16 → temporary audio recording memory 142, and only the designated audio data is stored sequentially, as minutes data, from the temporary audio recording memory 142 into the audio recording memory 143.

The audio recording memory 143 sequentially stores, as minutes data, the audio data designated from the personal computer 18 among the audio data from the temporary audio recording memory 142, together with the speaker identification information.

The text data memory 144 stores the text data produced by the speech-to-text data conversion processing in the OSD function unit 133, together with the speaker identification information.

The communication network interface 15 is means for converting the electrical interface of the compressed video signals and compressed audio signals exchanged with the communication network 3: it sends such signals out to the communication network 3 and takes signals received from the communication network 3 into the apparatus. That is, the communication network interface 15 converts the electrical interface of a compressed video signal or compressed audio signal that has been converted into a data format matching the interface of the communication network 3 and transmits the data to the communication network 3; it also receives a compressed video signal or compressed audio signal from the communication network 3 and converts the electrical interface to the data format used inside the apparatus.

The CPU 16 instructs the audio CODEC 115 to convert the information-added audio signal into a compressed audio signal in a data format matching the interface of the communication network 3. The CPU 16 also instructs the audio CODEC 115 to convert a received compressed audio signal from the data format of the communication network 3 into the information-added audio signal used inside the apparatus. In addition, the CPU 16 instructs the video CODEC 134 to convert the compressed video signal into a data format matching the interface of the communication network 3. The CPU 16 also converts the data format of a received compressed video signal, whose electrical interface has been converted from that of the communication network 3, into the digital video signal used inside the apparatus.

To read out required information from the data recorded in the audio recording memory 143, keywords such as an identification signal, a recording time, or part of the utterance content are specified from the personal computer 18 and the audio recording memory 143 is searched. The search results are displayed on the monitor 193. When there are multiple results, the speaker identification information and the utterance content are displayed in order from the earliest recording time.
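The search described above can be illustrated with a small filter over recorded entries. This is a sketch under assumptions: the description does not specify the record layout, so the `MinutesEntry` fields and the `search` function are hypothetical, and a transcribed text is assumed to be available for keyword matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MinutesEntry:
    speaker_id: str      # speaker identification information
    recorded_at: float   # recording time, in seconds
    text: str            # utterance content (assumed already transcribed)


def search(
    entries: List[MinutesEntry],
    speaker_id: Optional[str] = None,
    start: Optional[float] = None,
    end: Optional[float] = None,
    keyword: Optional[str] = None,
) -> List[MinutesEntry]:
    """Return entries matching every given criterion, earliest first."""
    hits = [
        e for e in entries
        if (speaker_id is None or e.speaker_id == speaker_id)
        and (start is None or e.recorded_at >= start)
        and (end is None or e.recorded_at <= end)
        and (keyword is None or keyword in e.text)
    ]
    # Multiple results are ordered from the earliest recording time.
    return sorted(hits, key=lambda e: e.recorded_at)
```

Criteria left as `None` are ignored, so the same function covers a search by speaker only, by time window only, or by any combination.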

After the conference ends, the saved minutes audio data can be read out to the personal computer 18 by an operation from the personal computer 18, which accesses it via the path communication network interface 15 → CPU 16 → audio recording memory 143. The minutes data read out is converted into a document, for example with speech-to-text software on the personal computer, to create the minutes.

Instead of reading out the minutes audio data as described above, an operation from the personal computer 18 can access the text data via the path communication network interface 15 → CPU 16 → text data memory 144, read the minutes text data out to the personal computer 18, and create the minutes from that text data.

One embodiment of the functional configuration of the video conference apparatus constituting the video conference system according to the present invention is described with reference to FIG. 2.

First, an overview of the functional configuration of the video conference apparatus is given with reference to FIG. 2. The video conference apparatus 1 has an audio processing unit 11, a minutes creation processing unit 12, a video processing unit 13, a storage unit 14, and a communication processing unit 15. In addition, a plurality of microphones 191, a speaker 192, a monitor 193, a camera 194, and a personal computer 18 are connected to the video conference apparatus 1. The video conference apparatus 1 is connected to other video conference apparatuses via the communication network 3.

The audio processing unit 11 has: an audio A/D conversion processing function 111M that converts the analog audio signal input from a microphone 191 into a digital audio signal; an audio D/A conversion processing function 112M that converts a digital audio signal into an analog audio signal and outputs it to the speaker 192; a speaker identification function 113M that identifies the speaker from the position of the microphone 191 at which the audio signal was input; a speaker identification and current time adding processing function 114M that adds the speaker identification information and the current time to the digital audio signal; and an audio data transmission/reception processing function 115M that converts the digital audio signal into an audio data format suitable for sending to the communication network 3 and converts an audio signal received from the communication network 3 into an audio data format suitable for internal processing.

The audio processing unit 11 converts the analog audio signal input from a microphone 191 into a digital audio signal, adds the speaker identification information acquired by the speaker identification processing function 113M and the current time information to the digital audio signal to form an information-added audio signal, converts this information-added audio signal into a compressed audio signal in an audio data format suitable for sending to the communication network 3 (the Internet, a local area network, or the like), and sends it to the communication processing unit 15. The audio processing unit 11 also decompresses a compressed audio signal received from the communication network 3 via the communication processing unit 15 into an information-added audio signal, removes the speaker identification information and the current time information to obtain a digital audio signal, converts it into an analog audio signal with the audio D/A conversion processing function 112M, and outputs it from the speaker 192.
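The send and receive paths above amount to attaching a small header to the audio, compressing, and reversing both steps on reception. The sketch below is illustrative only: the description does not define the header layout or the compression scheme, so the 16-bit speaker ID, the 64-bit timestamp, and the use of `zlib` as a stand-in for the audio codec are all assumptions, as are the names `encode_packet` and `decode_packet`.

```python
import struct
import zlib
from typing import Tuple

# Assumed header layout: speaker ID (uint16) and time (float64), network byte order.
HEADER = struct.Struct("!Hd")


def encode_packet(speaker_id: int, timestamp: float, pcm: bytes) -> bytes:
    """Build an information-added audio signal and compress it for sending."""
    info_added = HEADER.pack(speaker_id, timestamp) + pcm
    return zlib.compress(info_added)  # stand-in for the real audio codec


def decode_packet(packet: bytes) -> Tuple[int, float, bytes]:
    """Decompress a received packet and separate the header from the audio."""
    info_added = zlib.decompress(packet)
    speaker_id, timestamp = HEADER.unpack_from(info_added)
    pcm = info_added[HEADER.size:]  # plain digital audio for D/A conversion
    return speaker_id, timestamp, pcm
```

On the receiving side the extracted speaker ID and timestamp would be passed on for display and recording, while only the remaining audio bytes go to D/A conversion.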

The minutes creation processing unit 12 has a retroactive marking processing function 121M. In response to an instruction from the personal computer 18, it goes back through the audio data stored in the audio recording memory 143 or the text data stored in the text data memory 144 and applies a marking indicating that the data is to be adopted in the minutes.

The video processing unit 13 has: a video D/A conversion processing function 131M that converts an information-added video signal, obtained by adding the speaker identification information and the text data to the digital video signal converted from the compressed video signal received from the communication network 3, into an analog signal for display on the monitor 193; an OSD (On-Screen Display) processing function 133M that performs the processing for displaying text data within the display screen; and a video data transmission/reception processing function 134M that converts the digital video signal from the video A/D conversion processing function 132M into a compressed video signal in an image data format suitable for sending to the communication network 3 and converts a compressed video signal received from the communication network 3 into a video data format suitable for internal processing.

The video A/D conversion processing function 132M, the OSD processing function 133M, and the video data transmission/reception processing function 134M are functions realized by the video CODEC unit 134 in cooperation with the CPU 16. The OSD processing function 133M consists of an OSD processing function 331M and a speech-to-text data conversion processing function 332M. The speech-to-text data conversion processing function 332M reads the audio data stored in the temporary audio recording memory 142, recognizes its content, and converts it into text data. Because this speech-to-text conversion takes a considerable amount of processing time, the text shown on the monitor screen by the OSD function appears slightly after the speaker's utterance, which is a convenient timing for checking something that was missed. When the time required for the speech-to-text conversion is short, a delay is inserted into the OSD display of the text data so that it is displayed after the desired delay. The OSD processing function 331M superimposes the text data converted by the speech-to-text data conversion processing function 332M, together with the speaker identification information, on the digital video signal to form the information-added video signal.
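The display timing described above can be sketched as a small scheduler: a caption becomes due either when its conversion finishes or when the desired delay after the utterance has elapsed, whichever is later. This is a hypothetical illustration; the class `ChaseDisplay`, its methods, and the caption string format are not taken from the description.

```python
import heapq
from typing import List, Tuple


class ChaseDisplay:
    """Queues captions so they appear a desired delay after the utterance."""

    def __init__(self, desired_delay: float) -> None:
        self.desired_delay = desired_delay
        self._queue: List[Tuple[float, int, str, str]] = []
        self._seq = 0  # tie-breaker that keeps submission order stable

    def submit(self, spoken_at: float, conversion_done_at: float,
               speaker_id: str, text: str) -> float:
        """Schedule one caption and return the time at which it is due."""
        # Slow conversion already provides the lag; fast conversion is padded.
        due = max(conversion_done_at, spoken_at + self.desired_delay)
        heapq.heappush(self._queue, (due, self._seq, speaker_id, text))
        self._seq += 1
        return due

    def pop_due(self, now: float) -> List[str]:
        """Return the captions that should be on screen by time `now`."""
        shown = []
        while self._queue and self._queue[0][0] <= now:
            _, _, speaker_id, text = heapq.heappop(self._queue)
            shown.append(f"{speaker_id}: {text}")
        return shown
```

A display loop would call `pop_due` with the current time on every refresh and hand the returned lines to the OSD overlay.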

The information-added video signal from the OSD processing function 133M is output to the video D/A conversion processing function 131M.

The storage unit 14 has a main memory 141, a temporary voice recording memory 142, a voice recording memory 143, and a character data memory 144. Each memory works as described for FIG. 1, so the description is not repeated here.

In this video conference apparatus 1, the minutes creator listens to the utterances in the video conference, or reads the transcribed utterances, and decides whether an utterance should be adopted in the minutes. When it should, the creator inputs a marking signal indicating adoption, for example from the personal computer 18. In response, the apparatus traces back to the beginning of the voice data (utterance) stored in the voice recording memory 143, or of the transcribed utterance stored in the character data memory 144, and sets a flag indicating adoption in the voice minutes. This flag is attached to the head of the data recorded in the voice memory 142 or the character data memory 144 of the storage unit 14.
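The trace-back marking can be illustrated with a small sketch. It assumes, hypothetically, that each stored utterance carries the time of its head and that a marking signal applies to the most recent utterance that started at or before the marking time; `Utterance`, `adopted`, and `mark_for_minutes` are names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    speaker_id: str
    start: float          # time of the head of the utterance, in seconds
    text: str
    adopted: bool = False  # the "adopt in minutes" flag


def mark_for_minutes(utterances: List[Utterance],
                     marked_at: float) -> Optional[Utterance]:
    """Flag the utterance that a marking signal at `marked_at` refers to.

    The signal usually arrives partway through (or just after) an utterance,
    so the lookup goes back to the latest utterance head not after it.
    """
    candidates = [u for u in utterances if u.start <= marked_at]
    if not candidates:
        return None
    target = max(candidates, key=lambda u: u.start)
    target.adopted = True
    return target
```

Minutes data can then be assembled from the records whose `adopted` flag is set.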

When the minutes are created, the voice recording memory 143 and/or the character data memory 144 can be accessed from the personal computer 18, and the minutes can be written while checking their contents.

The communication processing unit (communication network interface) 15 performs processing for transmitting and receiving voice data, image data, and character data, each in a different band, to and from at least one video conference apparatus located opposite the local video conference apparatus 1 across the communication network 3. The communication processing unit 15 also performs processing for communication with the personal computer 18 provided as part of the installation.

The operation of the video conference apparatus 1 is described below, beginning with audio signal processing. Analog voice signals from the plurality of microphones 191 are A/D converted into digital voice signals by the voice A/D conversion function 111M. The speaker identification processing function 113M identifies the speaker from the position of the main microphone 191 of the analog voice signal and obtains speaker identification information. The speaker identification/current time addition processing function 114M adds the speaker identification information and the current time to the digital voice signal to form an information-added voice signal, and sends this signal to the voice data transmission/reception processing function 115M and to the temporary voice recording memory 142.
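As a rough sketch of this tagging step: the text does not say how the main microphone is chosen, so the example below assumes it is the microphone with the highest RMS level in the current frame, and that a fixed table maps microphone positions to speakers. `TaggedAudio`, `rms`, and `tag_frame` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class TaggedAudio:
    speaker_id: str       # speaker identification information
    timestamp: float      # current time attached to the frame
    samples: List[float]  # digital voice samples from the main microphone


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tag_frame(frames_by_mic: Dict[str, Sequence[float]],
              mic_to_speaker: Dict[str, str],
              now: float) -> TaggedAudio:
    """Pick the main microphone and attach speaker ID and time to its frame."""
    # Assumption: the loudest microphone is the one the speaker is using.
    main_mic = max(frames_by_mic, key=lambda mic: rms(frames_by_mic[mic]))
    return TaggedAudio(mic_to_speaker[main_mic], now,
                       list(frames_by_mic[main_mic]))
```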

On receiving the information-added voice signal, the voice data transmission/reception processing function 115M compresses and converts it into a data format matched to the interface of the communication network 3 and sends it to the communication processing unit 15 as a compressed voice signal. The communication processing unit 15 converts the electrical interface to suit the communication network 3 and transmits the data to the communication network 3. At the same time, the voice data transmission/reception processing function 115M records the digital voice signal, with the speaker identification information and current time added, in the temporary voice recording memory 142 of the storage unit 14.

The communication processing unit 15 also receives digital voice signals from the communication network 3, converts the electrical interface to obtain a compressed voice signal, and outputs it to the voice data transmission/reception processing function 115M.

The voice data transmission/reception processing function 115M converts the received signal from the data format of the communication network into the data format used inside the apparatus, yielding an information-added voice signal that carries the speaker identification information and the current time. It then sends the digital voice signal, with the speaker identification information and current time removed, to the voice D/A conversion processing function 112M. At the same time, it records the digital voice signal with the speaker identification information and current time still attached in the temporary voice recording memory 142 of the storage unit 14.
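The send and receive paths can be pictured with a toy packet format. Everything about the format is an assumption made for illustration: a 16-byte speaker ID field and an 8-byte timestamp precede the PCM bytes, and `zlib` stands in for the actual voice codec, which the text does not specify. Speaker IDs are assumed to be short ASCII strings.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical header: 16-byte speaker ID, then a double-precision timestamp,
# in network byte order (no padding, 24 bytes in total).
HEADER = struct.Struct("!16sd")


def pack_tagged(speaker_id: str, timestamp: float, pcm: bytes) -> bytes:
    """Sender side: attach speaker ID and time, then compress for the network."""
    header = HEADER.pack(speaker_id.encode("ascii"), timestamp)
    return zlib.compress(header + pcm)


def unpack_tagged(packet: bytes) -> Tuple[str, float, bytes]:
    """Receiver side: decompress, then split the metadata from the audio."""
    raw = zlib.decompress(packet)
    raw_id, timestamp = HEADER.unpack(raw[:HEADER.size])
    speaker_id = raw_id.rstrip(b"\x00").decode("ascii")
    return speaker_id, timestamp, raw[HEADER.size:]
```

On the receiving side, the bare PCM bytes would go on to D/A conversion, while the full tuple (speaker ID, time, audio) is what gets recorded for later transcription and minutes.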

The voice D/A conversion processing function 112M D/A converts the digital voice signal into an analog voice signal and outputs it to the speaker 192.

Video signal processing is described next. The video A/D conversion processing function 132M A/D converts the analog video signal from the camera 194 into a digital video signal and sends it to the video data transmission/reception processing function 134M. The video data transmission/reception processing function 134M compresses and converts this digital video signal into a compressed video signal in a data format matched to the interface of the communication network 3, and sends it to the communication processing unit 15. The communication processing unit 15 converts the electrical interface of the compressed video signal to obtain an in-network compressed video signal and transmits it to the communication network 3.

Conversely, on receiving an in-network compressed video signal from the communication network 3, the communication processing unit 15 converts the electrical interface, converts the data format of the communication network 3 into the data format used inside the apparatus to obtain a compressed video signal, and sends it to the video data transmission/reception processing function 134M. The video data transmission/reception processing function 134M decompresses the compressed video signal into a digital video signal and sends it to the OSD processing function 133M. The OSD processing function 133M adds the character data and speaker identification information and outputs the result to the video D/A conversion processing function 131M as an information-added video signal. The video D/A conversion processing function 131M converts the information-added video signal into an analog video signal and outputs it to the monitor 193.

On an order from the CPU 16, the OSD processing function 133M reads the voice signal and speaker identification information from the temporary voice recording memory 142 of the storage unit 14 and converts them into character data for monitor display. It then superimposes this character data and the speaker identification information on the digital video signal, so that the speaker identification information and the text of the utterances are displayed successively over part of the camera video shown on the monitor screen.

The monitor 193 can display the video captured by the local apparatus's camera together with character data and the like, and can also display video, character data, and the like from other video conference apparatuses obtained via the communication network 3. The video from the local apparatus and the video from other apparatuses can also be shown together on one screen using the split display method described later, with the speaker's utterances displayed on the screen as character data.

The temporary voice recording memory 142 of the storage unit 14 records voice data temporarily. On designation from the personal computer 18, the memory is accessed along the path communication processing unit 15 → CPU 16 → temporary voice recording memory 142 of the storage unit 14, and only the designated voice data is stored, as minutes data, from the temporary voice recording memory 142 into the voice recording memory 143.

To read out required information from the data recorded in the voice recording memory 143, a search can be run from the personal computer 18 by specifying keys such as the identification signal, the recording time, or part of the utterance. The search results are displayed on the monitor of the personal computer 18. When there are multiple results, the speaker identification information and utterance content are displayed in order of recording time, earliest first.
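A search of this kind might look like the sketch below. The record layout and the function name `search_records` are assumptions for illustration; the three optional filters correspond to the identification signal, the recording time, and part of the utterance, and the hits are returned earliest first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    speaker_id: str     # identification signal of the speaker
    recorded_at: float  # recording time, in seconds
    text: str           # utterance content


def search_records(records: List[Record],
                   speaker_id: Optional[str] = None,
                   time_range: Optional[Tuple[float, float]] = None,
                   keyword: Optional[str] = None) -> List[Record]:
    """Return the records matching every given filter, earliest first."""
    hits = []
    for r in records:
        if speaker_id is not None and r.speaker_id != speaker_id:
            continue
        if time_range is not None and not (
                time_range[0] <= r.recorded_at <= time_range[1]):
            continue
        if keyword is not None and keyword not in r.text:
            continue
        hits.append(r)
    return sorted(hits, key=lambda r: r.recorded_at)
```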

After the conference ends, the saved minutes data can be read out to the personal computer 18 by operating the personal computer 18, with access following the path communication processing unit 15 → CPU 16 → voice recording memory 142. The minutes data read out is then turned into a document, for example with character conversion software on the personal computer 18, to create the minutes.

Next, the configuration of a video conference system in which the video conference apparatus 1 described above is used to hold a video conference among three sites, A, B, and C, is described with reference to FIG. 3. FIG. 3 shows one example of a video conference system. A system using the video conference apparatus 1 of the present invention can connect not only three sites but also two sites, or four or five sites. For a connection between two sites, the MCU (multipoint connection unit) 31 in the communication network 3 is unnecessary.

First, the video conference apparatuses 1A to 1C at the three sites are connected to one another via the communication network 3. Once each of the apparatuses 1A to 1C is connected to the communication network 3, what is spoken into the microphone 191A at site A is reproduced by the speakers 192B and 192C at sites B and C. The video captured by the camera 194A at site A is displayed on the monitors 193B and 193C at sites B and C, respectively. For confirmation, the site A video can also be displayed on the monitor 193A.

The utterances and images at sites B and C are handled in the same way: participants speak into the microphones 191B and 191C, the audio from the other sites is reproduced by the speakers 192A to 192C, and the video from the cameras 194B and 194C is displayed on the monitors 193A to 193C.

In this way, the three sites A, B, and C can proceed with the video conference, each listening to what is said while watching the others' video.

In the present invention, what is spoken into any one of the plurality of microphones 191A at site A is converted into character data and displayed as text, together with the speaker identification information and trailing the speech, in part of the video display on each of the monitors 193A to 193C at sites A, B, and C. Likewise, what is spoken into the plurality of microphones 192B and 192C at sites B and C is converted into character data and displayed as text, together with the speaker identification information and trailing the speech, in part of the video display on each of the monitors 193A to 193C at sites A, B, and C.

Even if a conference participant happens to miss an utterance, the speaker and the utterance are subsequently displayed as text on the monitors 193A to 193C, so the participant can still learn what was said.

Among the conference participants, the person in charge of the minutes listens to an utterance through the speaker 192, then looks at the text of the utterance displayed on the monitor 193, selects on the personal computer 18 the content to be kept in the minutes, and marks it in the storage unit 142 of the video conference apparatus 1, where it is saved together with the other voice data or character data. This selection can be made on the personal computer 18 during the conference, as described, but the minutes can also be created after the conference by replaying the conference and selecting from it. Instead of relying on the minutes creator's selection, what goes into the minutes can also be set in advance, for example by designating the speakers to be recorded by microphone position (the voice identification signal) or by voice level (volume).
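The preset selection mentioned at the end of this paragraph could be sketched as a simple filter. The field `level`, the parameter names, and the rule that either condition alone is enough are all assumptions for illustration; the text only says that speakers can be designated by microphone position or by voice level.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Optional


@dataclass
class Utterance:
    speaker_id: str  # derived from the microphone position
    level: float     # voice level (volume) of the utterance
    text: str


def auto_select(utterances: List[Utterance],
                preset_speakers: AbstractSet[str] = frozenset(),
                min_level: Optional[float] = None) -> List[Utterance]:
    """Pick utterances for the minutes without manual marking.

    An utterance is kept if its speaker was designated in advance, or if a
    volume threshold is set and the utterance reaches it.
    """
    selected = []
    for u in utterances:
        by_speaker = u.speaker_id in preset_speakers
        by_level = min_level is not None and u.level >= min_level
        if by_speaker or by_level:
            selected.append(u)
    return selected
```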

The configuration of an audio conference system according to the present invention is described with reference to FIG. 4 and FIG. 5. FIG. 4 shows an example hardware configuration of the audio conference apparatus, and FIG. 5 shows an example of an audio conference system formed by connecting audio conference apparatuses to one another over a communication network. The audio conference apparatus 1´ shown in FIG. 4 has the configuration of the video conference apparatus shown in FIG. 1 with the video A/D conversion processing circuit 132 of the video processing unit 13 and the camera 194 omitted. Apart from lacking the function of sending and receiving video signals over the communication network 3, it can perform the same functions and operations as the video conference apparatus 1. In the audio conference system shown in FIG. 5, the OSD function unit 133 converts utterances into character data based on the voice signal from the microphone and displays the result on the monitor 193. The monitor 193 at each site shows only the speaker identification information and the text of the utterance.

The functional configuration of the audio conference apparatus 1´ according to this embodiment is that of the video conference apparatus 1 shown in FIG. 2 with the video A/D conversion processing function 132M, which A/D converts the video from the camera, removed. Minutes can be created by the same processing as in the video conference apparatus.

FIG. 6 shows a configuration example in which, in the connection configuration of the audio conference system linking three sites with the audio conference apparatus 1´ shown in FIG. 5, the monitor 18 is not connected externally. In this configuration, the monitor of the personal computer 18 is used in place of the external monitor 194. The display operation, referring to FIG. 4, is as follows. On an order from the CPU 16, the OSD function unit 13 reads the voice signal from the temporary voice recording memory 142, converts it into character data for monitor display, and outputs it to the video CODEC 134. The CPU 16 converts only the character data for monitor display into a data format matched to the monitor display interface of the personal computer 18, and the communication network interface 15 converts the electrical interface and transmits the data to the personal computer 18. As a result, the speaker identification information and utterance content are displayed successively on the monitor of the personal computer 18.

A display example in which the video is shown in split-screen form on the monitor at site A during a three-site video conference is described with reference to FIG. 7. On the display section 194D of the monitor 194, for example, the video 194A of site A, the video 194B of site B, and the video 194C of site C are displayed in separate areas, and the lower part of the screen shows the speaker name derived from the speaker identification information and the character data produced by the voice-character data conversion processing function 132M. The character data trails the video shown on the screen, appearing after a delay.

Thus, the communication conference system according to the present invention lets participants confirm utterances on screen after the fact, and it can be connected and operated in the same way as a conventional video conference or audio conference system, which makes the system easy to introduce.

A diagram explaining the hardware configuration of the video conference apparatus.
A diagram explaining the functional configuration of the video conference apparatus of FIG. 1.
A diagram explaining the configuration of a video conference system in which video conference apparatuses are connected to one another.
A diagram explaining the hardware configuration of the audio conference apparatus.
A diagram explaining the configuration of an audio conference system in which the audio conference apparatuses of FIG. 4 are connected to one another.
A diagram explaining another configuration of the audio conference system.
A diagram explaining the split screen and an example of the character display shown on the monitor.

Explanation of symbols

1: video conference apparatus, 1´: audio conference apparatus, 11: voice processing unit, 111: voice A/D conversion circuit, 111M: voice A/D conversion processing function, 112: voice D/A conversion circuit, 112M: voice D/A conversion processing function, 113M: speaker identification processing function, 114M: speaker identification/current time addition processing function, 115: voice CODEC unit, 115M: voice data transmission/reception processing function, 12: minutes creation processing unit, 121M: marking trace-back processing function, 13: video processing unit, 131: video D/A conversion circuit, 131M: video D/A conversion processing function, 132: video A/D conversion circuit, 132M: video A/D conversion processing function, 133: OSD function unit, 134: video CODEC unit, 134M: video data transmission/reception processing function, 14: storage unit, 141: main memory, 142: temporary voice recording memory, 143: voice recording memory, 144: character data memory, 15: communication network interface (communication processing unit), 16: central processing unit (CPU), 17: bus, 18: personal computer, 191: microphone, 192: speaker, 194: camera, 193: monitor, 3: communication network, 31: multipoint connection unit (MCU), 331M: OSD processing function, 332M: voice-character data conversion processing function.

Claims

A teleconferencing apparatus having:
voice A/D conversion means for converting an analog audio signal from a microphone that collects the voice of a speaker into a digital audio signal;
speaker identification processing means for identifying the speaker from the audio signal and acquiring speaker identification information;
speaker identification information adding means for adding the speaker identification information to the digital audio signal to convert it into an information-added audio signal;
a voice recording memory for recording the information-added audio signal output by the speaker identification information adding means;
transmitting means for compressing the information-added audio signal and transmitting the compressed audio information to a communication network;
receiving means for receiving the compressed audio information from the communication network;
audio data processing means for decompressing the received compressed audio signal to restore the information-added audio signal, extracting the added speaker identification information, and converting the result back into a digital audio signal; and
voice D/A conversion means for converting the digital audio signal into an analog audio signal and outputting it from a speaker,
the teleconferencing apparatus being characterized by comprising:
a screen for displaying character data;
voice-character data conversion means for converting the digital audio signal into character data;
a character data memory for storing the converted character data;
video D/A conversion means for converting the converted character data into video to be displayed on the screen, and character data display means for displaying the converted character data on the screen after a delay from the corresponding speaker's utterance;
marking signal input means for inputting a marking signal for the voice data being reproduced or the character data being displayed, while the voice data included in the information-added audio signal recorded in the voice recording memory is being reproduced or while the character data in the character data memory is being displayed on the screen;
flag adding means for attaching, in response to input from the marking signal input means, a flag to the head of the voice data included in the information-added audio signal recorded in the voice recording memory or of the character data recorded in the character data memory; and
first minutes data creating means for creating minutes data using the voice data or character data to which the flag has been attached by the flag adding means.

The teleconferencing apparatus according to claim 1, further comprising:
a camera that captures the speaker;
video A/D conversion means for converting an analog video signal of the speaker captured by the camera into a digital video signal;
video signal transmitting means for compressing the digital video signal into a compressed video signal and transmitting it to the communication network, and video signal expansion means for expanding a compressed video signal received from the communication network and converting it into a digital video signal;
a monitor that displays video; and
an OSD function unit that superimposes the character data of the utterance content and the speaker identification information on the digital video signal for display on the monitor.

The teleconferencing apparatus according to claim 1, comprising,
in place of the first minutes data creating means,
second minutes data creating means for creating minutes data from the information-added audio signal by designating the speaker identification information.

The teleconferencing apparatus according to claim 1, wherein the teleconferencing apparatus is a video conference apparatus or an audio conference apparatus.