JP4507905B2

JP4507905B2 - Communication control device, communication control method, program and recording medium for audio conference

Info

Publication number: JP4507905B2
Application number: JP2005038246A
Authority: JP
Inventors: 彰増田; 英春藤山; 雅文永易; 竜一田中
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-02-15
Filing date: 2005-02-15
Publication date: 2010-07-21
Anticipated expiration: 2025-02-15
Also published as: JP2006229356A

Description

The present invention relates to an apparatus, a method, and the like for starting communication with the other party when holding an audio conference.

One way for people at two or more separate locations to hold a meeting is the so-called audio conference. Microphones, speakers, and communication equipment are provided at each location; audio picked up by the microphone is transmitted from the communication equipment over a communication line to the other location, and the audio signal received from the other location is sent to the speaker and output as sound. A public telephone line is sometimes used as the communication line, in which case the meeting is also called a telephone conference.

To start such an audio conference, one of the locations must carry out processing to start communication with the other location, based on the identification information of the communication equipment at that location (in a telephone conference, for example, the telephone number of the other location must be dialed).

Conventionally, terminal devices for telephone conferences (terminal devices that have a microphone and a speaker and also provide telephone functions) have been proposed that include a terminal number generating unit storing identification number data unique to each terminal device, and keys for various operations such as setting the party to be connected (see, for example, Patent Document 1).
Patent Document 1: JP 11-215240 A (paragraph 0009, FIG. 1)

However, with such conventional terminal devices for telephone conferences, when the conference partner is the same but the partner's location varies from one conference to the next (for example, when whichever of several conference rooms is free is reserved and used), the attendees must look up the partner's current location each time and operate the device to select the terminal at that location, which is very inconvenient.

In view of the above, an object of the present invention is to make it easy, in an audio conference such as a telephone conference, to start communication with the partner's location and hold the conference even when the partner is the same but the partner's location varies from one conference to the next.

To solve this problem, a communication control device for audio conferences according to the present invention comprises: speech recognition means for recognizing speech in an audio signal from a microphone; storage means storing identification information of the communication means at a plurality of locations used for conferences; processing means that refers to schedule data associating, among the plurality of locations, the location each conference attendee is scheduled to use with that attendee's name, determines whether the speech recognized by the speech recognition means contains the name of at least one conference attendee, and, if it does, confirms from the schedule data the location corresponding to the attendee with that name and acquires from the storage means the identification information of the communication means at the confirmed location; and control means that controls communication means for communicating over a communication line so as to start communication addressed to the identification information acquired by the processing means.

A communication control method for audio conferences according to the present invention comprises: a first step of recognizing speech in an audio signal from a microphone; a second step of referring to schedule data associating, among a plurality of locations used for conferences, the location each conference attendee is scheduled to use with that attendee's name, determining whether the recognized speech contains the name of at least one conference attendee, and, if it does, confirming from the schedule data the location corresponding to the attendee with that name and acquiring the identification information of the communication means at the confirmed location from storage means storing the identification information of the communication means at the plurality of locations; and a third step of controlling communication means for communicating over a communication line so as to start communication addressed to the identification information acquired in the second step.

A program according to the present invention causes a computer to execute: a first procedure of recognizing speech in an audio signal from a microphone; a second procedure of referring to schedule data associating, among a plurality of locations used for conferences, the location each conference attendee is scheduled to use with that attendee's name, determining whether the recognized speech contains the name of at least one conference attendee, and, if it does, confirming from the schedule data the location corresponding to the attendee with that name and acquiring the identification information of the communication means at the confirmed location from storage means storing the identification information of the communication means at the plurality of locations; and a third procedure of controlling communication means for communicating over a communication line so as to start communication addressed to the identification information acquired in the second procedure.

A computer-readable recording medium according to the present invention records a program that causes a computer to execute: a first procedure of recognizing speech in an audio signal from a microphone; a second procedure of referring to schedule data associating, among a plurality of locations used for conferences, the location each conference attendee is scheduled to use with that attendee's name, determining whether the recognized speech contains the name of at least one conference attendee, and, if it does, confirming from the schedule data the location corresponding to the attendee with that name and acquiring the identification information of the communication means at the confirmed location from storage means storing the identification information of the communication means at the plurality of locations; and a third procedure of controlling communication means for communicating over a communication line so as to start communication addressed to the identification information acquired in the second procedure.

In these inventions, the audio signal from the microphone is subjected to speech recognition, and whether the recognized speech contains the name of at least one conference attendee is determined by referring to schedule data that associates, among the plurality of locations used for conferences, the location each attendee is scheduled to use with that attendee's name.

If a name is contained, the location corresponding to the attendee with that name is confirmed from the schedule data, and the identification information of the communication means at the confirmed location is acquired from the storage means that stores the identification information of the communication means at the plurality of locations.

The communication means for communicating over the communication line is then controlled so that communication addressed to the acquired identification information is started.
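The three stages just described (name detection, location lookup, identifier lookup) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the attendee names, room names, telephone numbers, and the function names `find_attendee` and `resolve_number` are all hypothetical, and speech recognition is assumed to have already produced a text string.

```python
from typing import Dict, Optional

# Hypothetical schedule data: attendee name -> room reserved for the conference.
SCHEDULE: Dict[str, str] = {"Tanaka": "Room B", "Suzuki": "Room D"}

# Hypothetical identification information: room -> telephone number of its line device.
PHONE_TABLE: Dict[str, str] = {
    "Room A": "03-0000-0001",
    "Room B": "03-0000-0002",
    "Room D": "03-0000-0004",
}


def find_attendee(recognized_text: str, schedule: Dict[str, str]) -> Optional[str]:
    """Return the first scheduled attendee whose name occurs in the recognized text."""
    for name in schedule:
        if name in recognized_text:
            return name
    return None


def resolve_number(recognized_text: str,
                   schedule: Dict[str, str],
                   phone_table: Dict[str, str]) -> Optional[str]:
    """Map recognized speech to the number to dial, or None if no attendee is named."""
    name = find_attendee(recognized_text, schedule)
    if name is None:
        return None                    # no attendee name in the utterance
    room = schedule[name]              # location confirmed from the schedule data
    return phone_table.get(room)       # identification information for that location


if __name__ == "__main__":
    print(resolve_number("Please call Tanaka", SCHEDULE, PHONE_TABLE))
```

The returned number would then be handed to whatever component controls the line device; that step is outside this sketch.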

Therefore, even when the conference partner is the same but the partner's location varies from one conference to the next (for example, when whichever of several conference rooms is free is reserved and used), simply speaking the partner's name into the microphone automatically starts communication with the partner's location.

As a result, even when the partner is the same but the partner's location varies from one conference to the next, communication with the partner's location can be started easily and the conference held.

As one example, it is preferable to further provide voiceprint authentication means that extracts voiceprint data from the audio signal from the microphone and identifies the speaker by comparing the extracted voiceprint data against voiceprint data registered in advance; to determine, by referring to the schedule data, whether the speaker identified by the voiceprint authentication means is a conference attendee scheduled to use the location on the device's own side; and, if the speaker is not such an attendee, not to acquire the identification information from the storage means.

In that case, communication with the partner's location is started only when a person scheduled to attend the conference at the local location speaks the partner's name, so an outsider can be prevented from starting communication with the partner's location.
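The gating condition can be illustrated with a short sketch. It assumes that voiceprint authentication has already produced a speaker identifier (or `None` when nobody matched) and that the schedule maps attendees to rooms; the data and the name `may_place_call` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical schedule data: attendee -> room the attendee is scheduled to use.
SCHEDULE: Dict[str, str] = {"Masuda": "Room A", "Tanaka": "Room B"}


def may_place_call(speaker_id: Optional[str],
                   own_room: str,
                   schedule: Dict[str, str]) -> bool:
    """Allow the lookup only if the identified speaker is scheduled to use own_room."""
    if speaker_id is None:             # voiceprint matched nobody registered
        return False
    return schedule.get(speaker_id) == own_room


if __name__ == "__main__":
    print(may_place_call("Masuda", "Room A", SCHEDULE))    # scheduled local attendee
    print(may_place_call("Tanaka", "Room A", SCHEDULE))    # scheduled elsewhere
```

Only when this check passes would the identification information be fetched from storage; otherwise the utterance is ignored.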

As another example, when the conference attendee whose name is contained in the recognized speech is associated with a plurality of locations in the schedule data, it is preferable to synthesize and output, by speech synthesis, a voice prompt asking the user to select a location, then determine the selection from the speech subsequently recognized, acquire from the storage means the identification information of the communication means at the selected location, and control the communication means so as to start communication addressed to the acquired identification information.

In that case, when the schedule data contains a plurality of locations corresponding to a partner with the same name (for example, when several people with the same name are scheduled to attend conferences at different locations), a synthesized voice prompt asking for a location is output. When the user answers the prompt by speaking the selection into the microphone, communication with the selected location starts automatically. Therefore, even when the schedule data contains several locations for the same name, the location that the intended partner is using this time can be selected correctly and communication with it started.
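A minimal sketch of this disambiguation step follows. It assumes the schedule is a list of (name, room) entries, that the prompt text is passed to a speech synthesizer elsewhere, and that the user's reply has already been recognized as text; all names and data are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical schedule entries; two different people share the name "Tanaka".
ENTRIES: List[Tuple[str, str]] = [
    ("Tanaka", "Room B"),
    ("Tanaka", "Room E"),
    ("Suzuki", "Room D"),
]


def rooms_for(name: str, entries: List[Tuple[str, str]]) -> List[str]:
    """Collect every room associated with the given name."""
    return [room for attendee, room in entries if attendee == name]


def build_prompt(name: str, rooms: List[str]) -> str:
    """Text to hand to the speech synthesizer when more than one room matches."""
    return f"{name} is scheduled in {' and '.join(rooms)}. Which room do you want?"


def pick_room(reply_text: str, rooms: List[str]) -> Optional[str]:
    """Return the room named in the reply, or None if the reply is ambiguous."""
    matches = [room for room in rooms if room in reply_text]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    candidates = rooms_for("Tanaka", ENTRIES)
    if len(candidates) > 1:
        print(build_prompt("Tanaka", candidates))
        print(pick_room("Room E, please", candidates))
```

Returning `None` for an ambiguous reply lets the caller repeat the prompt rather than dial a guessed location.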

According to the present invention, even when the conference partner is the same but the partner's location varies from one conference to the next, simply speaking the partner's name into the microphone automatically starts communication with the partner's location, so that communication with that location can be started easily and the conference held.

The present invention is described below in detail with reference to the drawings. FIG. 1 shows an example of the overall configuration of a telephone conference system within a company X to which the present invention is applied. The head office, branch offices, and business sites of company X contain a plurality of conference rooms (conference room A, conference room B, conference room C, conference room D, conference room E, ...). Each conference room is equipped with a line connection device 1, an omnidirectional audio input device 2, and a personal computer 3 (these devices are not shown for conference room D and the subsequent rooms).

The line connection device 1 is a device provided with circuitry that handles outgoing and incoming calls and signal transmission and reception over the public telephone line 4 (that is, circuitry with the same calling, call-receiving, and signal transmission and reception functions as a telephone), and is connected to the public telephone line 4.

The line connection device 1 has an input terminal 1a for analog audio signals, output terminals 1b and 1c for analog audio signals, and an input terminal 1d for control signals. An analog audio signal input to the input terminal 1a is always output unchanged from the output terminal 1c. The device places a call based on the control signal input to the input terminal 1d; while the call is connected, it transmits the analog audio signal input to the input terminal 1a over the public telephone line 4 and outputs the audio signal received over the public telephone line 4 from the output terminal 1b.

The input terminal 1a and the output terminal 1b of the line connection device 1 are connected to the omnidirectional audio input device 2, and the output terminal 1c and the input terminal 1d are connected to the personal computer 3.

The omnidirectional audio input device 2 is a device that integrates microphones and a speaker for audio conferences. FIG. 2 shows an example of the external configuration of the omnidirectional audio input device 2: FIG. 2(a) is a perspective view, and FIG. 2(b) is a top view of the interior of the microphone housing 11 in FIG. 2(a).

As shown in FIG. 2(a), the omnidirectional audio input device 2 includes a microphone housing 11 that houses the microphones, a speaker housing 12 that houses the speaker reproducing the audio of the conference partner, and an operation unit 13.

As shown in FIG. 2(b), six microphones MC1 to MC6 are arranged inside the microphone housing 11 at equal intervals so as to cover all directions. Each microphone is unidirectional.

The speaker housed in the speaker housing 12 is located at the center of the enclosure and is arranged so that its sound reaches each of the microphones MC1 to MC6 at substantially the same volume and phase.

The conference attendees in each conference room sit around the omnidirectional audio input device 2 and speak toward one of the microphones MC1 to MC6.

FIG. 3 is a block diagram showing an example of the circuit configuration of the omnidirectional audio input device 2. The omnidirectional audio input device 2 includes an A/D converter block 51, an A/D converter 517, a DSP 52, a DSP 53, a CPU 54, D/A converters 551 and 552, amplifiers 561 and 562, and a speaker 57. In FIG. 3, six A/D converters 511 to 516 corresponding to the microphones MC1 to MC6 are shown as an example of the A/D converter block 51.

The omnidirectional audio input device 2 is built by installing a board carrying these circuits inside, for example, the microphone housing 11 shown in FIG. 2(a).

The CPU 54 performs overall control of the omnidirectional audio input device 2.
The DSP 52 performs various kinds of signal processing, such as selecting the audio signal of one microphone (microphone selection processing) based on the audio signals from the six microphones MC1 to MC6 that have been converted into digital signals by the A/D converters 511 to 516. The internal processing of the DSP 52 is described later.

The DSP 53 functions as an echo canceller. That is, it calculates the magnitude and delay of the audio signal from the partner conference room input via the A/D converter 517, and cancels echo by subtracting a signal corresponding to the calculated magnitude and delay from the audio signal input via the DSP 52.
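The idea of estimating a magnitude and a delay and then subtracting can be sketched with a single-tap echo model. This is a simplified illustration under stated assumptions, not the processing actually performed by the DSP 53: it assumes the echo is one delayed, scaled copy of the far-end signal, that both signals have the same length, and it uses a brute-force correlation search. Practical echo cancellers generally use adaptive multi-tap filters.

```python
from typing import List, Tuple


def estimate_echo(far: List[float], mic: List[float], max_delay: int) -> Tuple[int, float]:
    """Estimate (delay, gain) so that mic[n] is approximately gain * far[n - delay]."""
    best_delay, best_corr = 0, 0.0
    for delay in range(max_delay + 1):
        # Correlation between the microphone signal and the delayed far-end signal.
        corr = sum(mic[n] * far[n - delay] for n in range(delay, len(mic)))
        if corr > best_corr:
            best_delay, best_corr = delay, corr
    energy = sum(far[n - best_delay] ** 2 for n in range(best_delay, len(mic)))
    gain = best_corr / energy if energy > 0.0 else 0.0   # least-squares scale factor
    return best_delay, gain


def cancel_echo(far: List[float], mic: List[float], delay: int, gain: float) -> List[float]:
    """Subtract the estimated echo from the microphone signal."""
    return [m - gain * far[n - delay] if n >= delay else m
            for n, m in enumerate(mic)]


if __name__ == "__main__":
    far_end = [0.0, 1.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.0]
    # Synthetic microphone signal: pure echo, delayed by 2 samples and halved.
    mic_in = [0.0, 0.0] + [0.5 * x for x in far_end[:-2]]
    d, g = estimate_echo(far_end, mic_in, max_delay=3)
    print(d, g, cancel_echo(far_end, mic_in, d, g))
```

In the example the microphone signal contains only echo, so the residual after subtraction is zero; with local speech present, the residual would be the local speech plus any echo that the single-tap model fails to capture.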

The processing results of the DSP 53 are converted into analog signals by the D/A converters 551 and 552. The analog audio signal from the D/A converter 552 is amplified by the amplifier 562, output from the output terminal 571, and sent to the input terminal 1a of the line connection device 1 in FIG. 1.

The analog audio signal output from the output terminal 1b of the line connection device 1 in FIG. 1 is input to the input terminal 572 of the omnidirectional audio input device 2 and converted into a digital signal by the A/D converter 517. It is then input to the DSP 53, where it is used for echo cancellation, and is output as sound from the speaker 57 via the D/A converter 551 and the amplifier 561.

The audio signal of the microphone selected by the DSP 52 is also output as sound from the speaker 57 via the DSP 53. In other words, the attendees in each conference room can hear through the speaker 57 not only the voice of the speaker selected by the omnidirectional audio input device 2 in the partner's conference room, but also the voice of the person speaking in their own conference room.

Each of the six microphones MC1 to MC6 built into the omnidirectional audio input device 2 is a directional microphone.
An omnidirectional microphone picks up all sound around it, so the speaker's voice is mixed with ambient noise, the S/N ratio is poor, and good sound quality cannot be obtained. To avoid this, the omnidirectional audio input device 2 picks up sound with directional microphones and thereby improves the S/N ratio relative to ambient noise.

Next, the processing performed by the DSP 52 is described.
The main processing performed by the DSP 52 is the selection and switching of microphones (hereinafter sometimes referred to simply as "mics").
That is, based on the audio from each microphone, the DSP 52 identifies one microphone and selects and outputs the audio from that microphone. If several conference participants using the omnidirectional audio input device 2 speak at the same time, their voices overlap and become hard for the other party to follow, so only the audio signal from the selected microphone is output.

To perform this processing accurately, the DSP 52 carries out the various kinds of signal processing listed below.
(a) Band separation of the microphone signals and peak hold processing
(b) Speech start and end determination processing
(c) Detection of the microphone in the speaker's direction
(d) Selection and switching of the microphone signal

FIG. 4 is a functional block diagram showing the processing executed in the DSP 52.
As shown in FIG. 4, the DSP 52 includes a BPF block 521 consisting of BPFs 5211 to 5216, which apply band-pass filter (BPF) processing to each microphone signal to generate sound pressure level data; a PH block 522 consisting of PHs 5221 to 5226, which apply peak hold (PH) processing to the band-passed sound pressure level data of each microphone to generate the peak values described later; and a determination processing unit 523, which performs processing such as the speech start determination and microphone switching described later on the peak values of each microphone.

The signal processing operations (a) to (d) performed by the DSP 52 configured as above are described below.
(a) Band separation of the microphone signals and peak hold processing
This processing is performed by the BPF block 521 and the PH block 522 shown in FIG. 4.
Each BPF in the BPF block 521 operates with a predetermined band-pass characteristic (for example, 100 to 600 Hz) needed for the speech start and end determination described later.
The PH processing holds the maximum value of the band-passed sound pressure level data (microphone signal) and outputs the held value as the peak value.
The subsequent processing, that is, operations (b) to (d) above, is executed by the determination processing unit 523, which receives the peak values calculated from each microphone signal.
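A rough sketch of operation (a) for one microphone channel is given below. The text specifies only the 100 to 600 Hz pass band and the holding of a maximum value; the filter type, the 8 kHz sampling rate, and the per-frame holding scheme used here are assumptions made for illustration. The band-pass stage is a standard second-order (biquad) section with the commonly published "Audio EQ Cookbook" band-pass coefficients.

```python
import math
from typing import List


def bandpass(samples: List[float], fs: float,
             f_lo: float = 100.0, f_hi: float = 600.0) -> List[float]:
    """Second-order band-pass filter (biquad, 0 dB peak gain); assumed filter type."""
    f0 = math.sqrt(f_lo * f_hi)               # geometric center frequency
    q = f0 / (f_hi - f_lo)                    # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def peak_hold(samples: List[float], frame: int) -> List[float]:
    """Hold the maximum absolute level within each frame of `frame` samples."""
    return [max(abs(s) for s in samples[i:i + frame])
            for i in range(0, len(samples), frame)]


if __name__ == "__main__":
    fs = 8000.0                               # assumed sampling rate
    tone = [math.sin(2.0 * math.pi * 300.0 * n / fs) for n in range(800)]
    print(peak_hold(bandpass(tone, fs), frame=80))
```

One such chain would run per microphone, and the resulting peak values would feed the determination stages (b) to (d).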

(b) Speech start and end determination processing
The start and end of speech are determined independently for each microphone, for example by comparing the sound pressure level with a predetermined threshold. The steady background noise level may also be measured continuously and the threshold varied accordingly.
When the DSP 52 determines, for example, that speech has started at the microphone MC1, it increases the output gain set for the microphone MC1. Conversely, when it determines that speech at the microphone MC1 has ended, it decreases the output gain set for the microphone MC1.
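The threshold comparison in (b) can be sketched for a single microphone as follows. The event representation, the function names, and the way the threshold is derived from a noise measurement (a margin times the mean noise peak) are assumptions for illustration only.

```python
from typing import List, Tuple


def detect_speech(peaks: List[float], threshold: float) -> List[Tuple[str, int]]:
    """Return ("start", i) / ("end", i) events for one microphone's peak values."""
    events = []
    speaking = False
    for i, level in enumerate(peaks):
        if not speaking and level > threshold:
            speaking = True
            events.append(("start", i))       # output gain would be raised here
        elif speaking and level <= threshold:
            speaking = False
            events.append(("end", i))         # output gain would be lowered here
    return events


def threshold_from_noise(noise_peaks: List[float], margin: float = 3.0) -> float:
    """Assumed rule: set the threshold to a margin times the mean noise level."""
    return margin * sum(noise_peaks) / len(noise_peaks)


if __name__ == "__main__":
    thr = threshold_from_noise([0.1, 0.1, 0.1, 0.1])
    print(thr, detect_speech([0.1, 0.2, 0.8, 0.9, 0.3, 0.1], 0.5))
```

A practical detector would likely add hysteresis or a hangover time so that short pauses between words are not reported as the end of speech; the text does not say whether the DSP 52 does so.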

(c) Detection of the microphone in the speaker's direction
This processing selects the one microphone with the highest sound pressure level when speakers facing different microphones speak at the same time. When a single speaker starts speaking, the processing described above can simply be applied to the sound pressure level data from that one microphone; however, several speakers may speak simultaneously, and in that case the microphone of the main speaker is identified.
As shown in FIG. 4, the sound pressure level data of each microphone used to detect the microphone in the speaker's direction are the peak values obtained by applying band-pass filter (BPF) processing and peak hold (PH) processing to the sound pressure level data input through each microphone.

(d) Selection and switching of the microphone signal
This processing switches the output of the DSP 52 to the microphone selected in (c), the detection of the microphone in the speaker's direction.
Specifically, this is done by changing the output gain set for each microphone. For example, as shown in FIG. 5, the selection and switching stage consists of six multipliers and a six-input adder. The channel gain (CH Gain) of the multiplier connected to the selected microphone signal is set to "1" and the channel gains of the other multipliers are set to "0", so that the adder sums [selected microphone signal × 1] and [other microphone signals × 0]. The selected microphone signal is thereby sent to the DSP 53 (FIG. 3) in the following stage.
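Operations (c) and (d) together amount to choosing the channel with the largest peak value and applying one-hot gains before summing. The sketch below illustrates this for six channels; the function names and sample data are hypothetical, and the gains switch instantly here, whereas the text elsewhere mentions gains being increased and decreased, which suggests the real device may change them gradually.

```python
from typing import List


def select_microphone(peaks: List[float]) -> int:
    """(c) Pick the channel whose peak value is largest."""
    return max(range(len(peaks)), key=lambda ch: peaks[ch])


def channel_gains(selected: int, n_channels: int = 6) -> List[float]:
    """(d) Gain 1 for the selected channel, 0 for all others."""
    return [1.0 if ch == selected else 0.0 for ch in range(n_channels)]


def mix(frames: List[List[float]], gains: List[float]) -> List[float]:
    """Multiply each channel by its gain and add the channels sample by sample."""
    n_samples = len(frames[0])
    return [sum(g * frames[ch][n] for ch, g in enumerate(gains))
            for n in range(n_samples)]


if __name__ == "__main__":
    peaks = [0.1, 0.7, 0.2, 0.05, 0.3, 0.0]              # one peak value per microphone
    frames = [[ch + 0.0, ch + 0.5] for ch in range(6)]   # two samples per channel
    chosen = select_microphone(peaks)
    print(chosen, mix(frames, channel_gains(chosen)))
```

Expressing the switch as multiply-and-add rather than as a plain selector makes it straightforward to replace the hard 0/1 gains with a short cross-fade if clicks at the switching instant need to be avoided.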

The personal computer 3 in each conference room in FIG. 1 is connected to a dedicated line (not shown) within company X. Web-based groupware has been introduced at company X, and the Web browser on the personal computer 3 in each conference room can download employees' schedule data from a server (not shown) within company X.

The personal computer 3 also stores in advance the data of a telephone number table, such as that shown in FIG. 6, which associates the name of each conference room with the telephone number of the line connection device 1 installed in that room.

A speech recognition program, a speech synthesis program, a voiceprint authentication program, and a communication control program are installed on the personal computer 3. These programs may be provided on a recording medium such as a CD-ROM, or may be made available for download from a Web site.

The speech recognition program performs speaker-independent speech recognition. It extracts acoustic features from the audio data (here, audio data sent from the omnidirectional audio input device 2 to the personal computer 3 via the line connection device 1 and digitized by the sound board in the personal computer 3), matches the extracted features against speech models registered in advance, and outputs the closest candidate as the recognition result. Known speech recognition techniques may be used for this program.

The voice synthesis program converts character data into voice data. Any known voice synthesis technique may be applied to this program.

The voiceprint authentication program performs voiceprint authentication on voice data (here, voice data sent from the omnidirectional voice input device 2 to the personal computer 3 via the line connection device 1 and digitized by the sound board in the personal computer 3) every unit time (for example, every 3 seconds) in order to identify the speaker. Any known voiceprint authentication technique may be applied to this program.
Considering cases in which there are several speakers and the speaker changes, this unit time is preferably as short as the processing capability of the CPU of the personal computer 3 allows, from the viewpoint of speaker identification accuracy.

The voiceprint authentication process of the voiceprint authentication program consists of the following processes (1) to (3).
(1) Generation of a voiceprint model
The voice data is spectrum-analyzed for each unit time and the voiceprint features are extracted to create a voiceprint model. That is, the voiceprint model represents the collection of various sounds contained in the voice as a three-dimensional pattern of time, frequency, and sound intensity.

(2) Collation of the voiceprint model
The voiceprint model generated in process (1) is compared with the voiceprint models registered in advance in the voiceprint register (a storage area in the personal computer 3) together with the IDs of the persons subject to voiceprint authentication, and a collation score SCR is calculated according to how closely the model features match (the closer the features, the larger the score). Then, among the voiceprint models in the voiceprint register, the ID corresponding to the model closest to the one generated in process (1) is identified.

(3) Comparison of the collation score with a threshold
The collation score SCR calculated in process (2) is compared with a predetermined threshold THD, and if SCR exceeds THD, the ID identified in process (2) is judged to be valid.
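Processes (2) and (3) can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the patent leaves the authentication technique open, so cosine similarity over plain feature vectors stands in here for the collation score SCR, and the names `cosine_similarity`, `identify_speaker`, and `voiceprint_register` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Assumed stand-in for the collation score SCR (larger means closer)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_speaker(features, voiceprint_register, thd):
    """Return the registered ID closest to `features`, or None if SCR <= THD."""
    best_id, best_scr = None, float("-inf")
    # Process (2): score every registered model and keep the closest one.
    for speaker_id, model in voiceprint_register.items():
        scr = cosine_similarity(features, model)
        if scr > best_scr:
            best_id, best_scr = speaker_id, scr
    # Process (3): the ID is valid only if the score exceeds the threshold.
    if best_id is not None and best_scr > thd:
        return best_id
    return None
```

With a register keyed by employee name, a call such as `identify_speaker(features, voiceprint_register, 0.8)` yields either a name or `None`; the threshold value 0.8 is likewise only an example.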

Voiceprint models of employees of company X (at least those employees who may attend conferences) are registered in advance, as persons subject to voiceprint authentication, in the voiceprint register of the personal computer 3 installed in each conference room, and each employee's full name is registered as the ID.

Before a conference starts, the communication control program controls the line connection device 1 using the voice recognition, voice synthesis, and voiceprint authentication programs described above, the Web browser, and the telephone number table shown in FIG. 6.

FIG. 7 is a flowchart showing the processing of the communication control program. First, the program invokes the Web browser to download the schedule data, created with the groupware described above, on employees' planned use of conference rooms for the current day, and acquires that schedule data (step S1).

FIG. 8 illustrates this schedule data. In the 10:00 to 12:00 time slot, Ichiro Tanaka, Jiro Honda, and Saburo Suzuki are scheduled to use conference room A; Rokuro Ito, Hanako Sakata, and Shichiro Sasaki are scheduled to use conference room C; and Taro Sakata, Shiro Tsurumaki, and Goro Toda are scheduled to use conference room E.

In the 13:00 to 15:00 time slot, Ichiro Tanaka and Jiro Honda are scheduled to use conference room B, and Hachiro Kagawa, Kuro Sato, and Yoshiko Shinnai are scheduled to use conference room E.

In the 15:00 to 17:00 time slot, Hanako Sakata and Shichiro Sasaki are scheduled to use conference room A, and Taro Sakata and Goro Toda are scheduled to use conference room D.
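The schedule data of FIG. 8, as described in the text, can be pictured as a small table of time slots, conference rooms, and attendees. The following Python sketch is only an illustration: the tuple layout, the names `SCHEDULE` and `bookings_at`, and the treatment of each slot as including its start time but not its end time are assumptions, not features of the groupware described here.

```python
from datetime import time

# (start, end, conference room, attendees), transcribed from the description of FIG. 8.
SCHEDULE = [
    (time(10), time(12), "A", ["Ichiro Tanaka", "Jiro Honda", "Saburo Suzuki"]),
    (time(10), time(12), "C", ["Rokuro Ito", "Hanako Sakata", "Shichiro Sasaki"]),
    (time(10), time(12), "E", ["Taro Sakata", "Shiro Tsurumaki", "Goro Toda"]),
    (time(13), time(15), "B", ["Ichiro Tanaka", "Jiro Honda"]),
    (time(13), time(15), "E", ["Hachiro Kagawa", "Kuro Sato", "Yoshiko Shinnai"]),
    (time(15), time(17), "A", ["Hanako Sakata", "Shichiro Sasaki"]),
    (time(15), time(17), "D", ["Taro Sakata", "Goro Toda"]),
]


def bookings_at(now):
    """Return (room, attendees) pairs for the slot containing `now`.

    Assumption: a slot covers start <= now < end.
    """
    return [
        (room, attendees)
        for start, end, room, attendees in SCHEDULE
        if start <= now < end
    ]
```

For example, `bookings_at(time(10, 30))` returns the entries for conference rooms A, C, and E, which is the "current time slot" the flowchart steps consult.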

As shown in FIG. 7, following step S1, the program invokes the voiceprint authentication program to execute the voiceprint authentication process described above, and invokes the voice recognition program to perform voice recognition (step S2). It then waits until the voiceprint authentication program identifies a speaker (that is, until an identified ID is judged to be valid) (step S3).

When a speaker is identified, the program refers to the schedule data downloaded in step S1 and determines whether the identified speaker is an employee scheduled to use, in the current time slot, the conference room in which this personal computer 3 is installed (that is, whether the speaker is a conference attendee) (step S4).

If not, the process returns to step S3. If so, the program acquires the voice recognition program's recognition result for the speaker's voice (step S5). Then, referring to the schedule data downloaded in step S1, it determines whether the recognized voice contains both the surname of an employee scheduled to use any conference room in the current time slot and the word “connect” (step S6).

If not, the program refers to the schedule data and determines whether the recognized voice contains both the name of any conference room scheduled to be used in the current time slot and the word “connect” (step S7). If the answer is again no, the process returns to step S3.

If the answer in step S6 is yes, the program determines whether the recognized surname appears more than once in the current time slot of the schedule data (step S8).

If not (that is, if the surname appears only once), the program confirms from the schedule data the conference room used by the employee with that surname (step S9). It then acquires the telephone number of the line connection device 1 in the confirmed conference room from the telephone number table shown in FIG. 6 (step S10).

The program then sends the line connection device 1 (FIG. 1) a control signal that causes it to call the acquired telephone number (step S11), and the process ends.

If the answer in step S8 is yes, the program confirms from the schedule data the multiple conference rooms used by employees with that surname (step S12). It then invokes the voice synthesis program to synthesize a voice prompting selection of a conference room, and outputs the synthesized voice from the built-in speaker of the personal computer 3 (or from an external speaker connected to the personal computer 3) (step S13).

Next, the program acquires the voice recognition program's recognition result for the speaker's voice after the synthesized voice has been output (step S14), and determines the selected conference room from that result (step S15).

In step S13, for example, a synthesized inquiry of the form “〇〇 □□-san (〇〇 is the surname recognized in step S5 and □□ is the given name) is in conference room × from ×× o'clock to ×× o'clock. Shall I connect?” is output for each conference room used by an employee with that surname, with a fixed interval (for example, several seconds) between inquiries.

In step S15, if the words recognized by the voice recognition program immediately after the inquiry for one conference room is output include the word “yes”, that conference room is judged to have been selected.
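The inquiry of step S13 and the selection rule of step S15 can be sketched in Python as follows. This is an illustrative sketch only: the function names `build_inquiry` and `select_room`, the English wording of the inquiry, and the representation of silence as an empty string are assumptions made for the example.

```python
import re


def build_inquiry(full_name, room, start, end):
    """Text to be synthesized for one candidate conference room (step S13)."""
    return (
        f"{full_name}-san is in conference room {room} "
        f"from {start} to {end}. Shall I connect?"
    )


def select_room(candidates, replies):
    """Pick the room whose inquiry was answered with "yes" (step S15).

    candidates: list of (full_name, room) in the order the inquiries are output.
    replies: recognized text heard right after each inquiry ("" for silence).
    """
    for (_full_name, room), reply in zip(candidates, replies):
        if re.search(r"\byes\b", reply, re.IGNORECASE):
            return room
    return None  # nobody confirmed any of the candidates
```

A reply of “no”, or no reply at all, simply moves on to the next candidate, which matches the behavior described for the inquiries.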

Following step S15, the program acquires the telephone number of the line connection device 1 in the selected conference room from the telephone number table shown in FIG. 6 (step S16), and proceeds to step S11 described above.

If the answer in step S7 is yes, the program acquires the telephone number of the line connection device 1 in the conference room with the recognized name from the telephone number table shown in FIG. 6 (step S17), and proceeds to step S11 described above.
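The branching of steps S4 to S17 can be summarized in a compact Python sketch. It is a simplification under stated assumptions: names are written "Given Surname", matching is plain substring search on already-recognized English text, the telephone numbers are placeholders, `confirm` is a hypothetical callback standing in for the inquiry and reply of steps S13 to S15, and returning `None` stands in for going back to step S3. What happens when every inquiry is declined is not specified in the description; this sketch returns `None` in that case.

```python
def surname(full_name):
    # Assumption: names are written "Given Surname" in this sketch.
    return full_name.split()[-1]


def resolve_call_target(speaker, utterance, local_room, bookings, phone_table, confirm):
    """Return the telephone number to call, or None if no call should be placed.

    bookings: {room: [attendee full names]} for the current time slot.
    phone_table: {room: telephone number}, as in the table of FIG. 6.
    confirm: callable(full_name, room) -> bool, hypothetical stand-in for S13 to S15.
    """
    # S4: only an attendee booked into the local room may start a call.
    if speaker not in bookings.get(local_room, []):
        return None
    # S6 and S7 both require the word "connect".
    if "connect" not in utterance.lower():
        return None
    # S6: look for the surname of anyone booked in the current slot.
    matches = [
        (room, name)
        for room, names in bookings.items()
        for name in names
        if surname(name) in utterance
    ]
    if len(matches) == 1:
        # S8 "no", then S9 and S10: the surname is unique.
        return phone_table[matches[0][0]]
    if len(matches) > 1:
        # S8 "yes", then S12 to S16: ask about each candidate room in turn.
        for room, name in matches:
            if confirm(name, room):
                return phone_table[room]
        return None
    # S7 and S17: fall back to a conference room name.
    for room in bookings:
        if f"conference room {room}".lower() in utterance.lower():
            return phone_table[room]
    return None
```

The returned number corresponds to what the control signal of step S11 would ask the line connection device 1 to call.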

Next, how a conference is started in this telephone conference system (how one conference room calls the telephone number of the other party's conference room) is described, taking conference room A in the 10:00 to 12:00 slot of the schedule data shown in FIG. 8 as an example.

In the schedule data of FIG. 8, from 10:00 to 12:00, Ichiro Tanaka, Jiro Honda, and Saburo Suzuki are scheduled to use conference room A, and Taro Sakata, Shiro Tsurumaki, and Goro Toda are scheduled to use conference room E.

Here, it is assumed that Ichiro Tanaka, Jiro Honda, and Saburo Suzuki of one department and Taro Sakata, Shiro Tsurumaki, and Goro Toda of another department are scheduled to hold a telephone conference using conference rooms A and E.

FIG. 9 illustrates how a conference is started from the conference room A side by uttering the surname of the other party. The three people using conference room A know that at least someone named Sakata will attend on the other side, but do not know which conference room the other party will use.

So, at 10:00, one of the attendees using conference room A, for example Ichiro Tanaka, starts the voice control program on the personal computer 3 and then says “Sakata-san, connect” toward any one of the microphones of the omnidirectional voice input device 2.

The voice signal from the microphone that picked up the utterance is then sent from the omnidirectional voice input device 2 to the personal computer 3 via the line connection device 1.

On the personal computer 3, voiceprint authentication identifies the speaker as Ichiro Tanaka (steps S2 and S3 in FIG. 7). It is then confirmed from the schedule data that Ichiro Tanaka is scheduled to use conference room A in the current time slot (10:00 to 12:00) (step S4 in FIG. 7).

Furthermore, it is confirmed that the recognized utterance “Sakata-san, connect” contains the employee surname “Sakata” and the word “connect” (steps S2, S5, and S6 in FIG. 7).

However, as shown in FIG. 8, it is confirmed that the surname Sakata appears twice in the current time slot (10:00 to 12:00) of the schedule data: Hanako Sakata (scheduled to use conference room C) and Taro Sakata (scheduled to use conference room E) (step S8 in FIG. 7).

Therefore, as shown in FIG. 10, the personal computer 3 first outputs the synthesized inquiry “Hanako Sakata-san is in conference room C from 10:00 to 12:00. Shall I connect?” (steps S12 and S13 in FIG. 7).

Because the other party of the conference is not Hanako Sakata, Ichiro Tanaka, on hearing this synthesized voice, answers “no” (or stays silent), as shown in FIG. 10. Next, as shown in FIG. 10, the synthesized inquiry “Taro Sakata-san is in conference room E from 10:00 to 12:00. Shall I connect?” is output (step S13 in FIG. 7).

Because the other party of the conference is Taro Sakata, Ichiro Tanaka, on hearing this synthesized voice, answers “yes”, as shown in FIG. 10. Conference room E is then judged to have been selected (steps S14 and S15 in FIG. 7), and the telephone number of the line connection device 1 in conference room E is acquired from the telephone number table shown in FIG. 6 (step S16 in FIG. 7).

A control signal that causes the telephone number of the line connection device 1 in conference room E to be called is then sent from the personal computer 3 to the line connection device 1 (step S11 in FIG. 7).

Based on this control signal, the line connection device 1 in conference room A calls the telephone number of the line connection device 1 in conference room E. A telephone connection is thereby established between the line connection devices 1 in conference rooms A and E, so a telephone conference can be started between the two rooms.

In the example of FIG. 9, the surname of Taro Sakata is uttered. If the surname of Shiro Tsurumaki or Goro Toda is uttered instead, each of those surnames appears only once in the current time slot (10:00 to 12:00) of the schedule data (step S8 in FIG. 7). The telephone number of the line connection device 1 in conference room E is therefore acquired immediately, without any synthesized inquiry being output (steps S9 and S10 in FIG. 7), and a control signal that causes that telephone number to be called is sent to the line connection device 1 (step S11 in FIG. 7).

In the example of FIG. 9, the surname of the other party is uttered. If instead the other party's location is known to be conference room E but its attendees are not known (for example, the other department is decided but not who will attend), saying “Conference room E, connect” likewise causes the telephone number of the line connection device 1 in conference room E to be acquired immediately (steps S7 and S17 in FIG. 7), and a control signal that causes that telephone number to be called is sent to the line connection device 1 (step S11 in FIG. 7).

In the example of FIG. 9, an attendee using conference room A speaks. If someone other than these attendees speaks, it cannot be confirmed that the speaker is scheduled to use conference room A in the current time slot (10:00 to 12:00) (step S4 in FIG. 7), so the line connection device 1 in conference room A does not call the telephone number of the line connection device 1 in conference room E.

Likewise, although an attendee using conference room A speaks in the example of FIG. 9, if an attendee using conference room E speaks instead, the line connection device 1 in conference room E calls the telephone number of the line connection device 1 in conference room A in exactly the same way, and a telephone connection is established between the two line connection devices 1.

As described above, with this telephone conference system, even when the other party is the same but its conference room differs from conference to conference (for example, when whichever of several conference rooms is free is reserved and used), communication with the other party's conference room starts automatically simply by uttering the other party's name toward a microphone, without looking up which conference room the other party is using this time.

Consequently, even when the other party is the same but its conference room differs from conference to conference, communication with the other party's conference room can easily be started and the conference held.

Also, when the other party's conference room is known but its attendees are not, communication with that conference room starts automatically simply by uttering the name of that conference room toward a microphone, so here too communication with the other party's conference room can easily be started and the conference held.

Furthermore, communication with the other party's conference room starts only when a person scheduled to attend the conference in the local conference room utters the other party's surname (or the name of the other party's conference room), which prevents an outsider from starting communication with the other party's conference room. Security in conference operation can therefore be improved.

In addition, even when the schedule data contains several conference rooms corresponding to parties with the same name, a synthesized voice prompting selection of a conference room is output, and communication with the selected conference room starts automatically when the selection is spoken in response. The conference room actually being used by the intended party can therefore be selected accurately and communication with it started.

In the above example, communication with the other party's conference room is started by uttering an individual's surname as the name of a conference attendee. However, when the schedule data lists the name of the other party's department, or when a conference is held with an outside party and the schedule data lists the other party's company name, communication with the other party's conference room may be started by uttering that department name or company name as the name of the conference attendee.

In the above example, the present invention is applied to a telephone conference, but it may also be applied to audio conferences other than telephone conferences (for example, a conference in which audio is transmitted and received between conference rooms via a LAN or a dedicated line). In that case as well, the personal computer 3 need only control the communication device used in the audio conference in the same way as the line connection device 1 in FIG. 1.

In the above example, the voice recognition program, voice synthesis program, voiceprint authentication program, and communication control program are separate pieces of software, but a single piece of software having all of their functions may be created and installed on the personal computer 3.

In the above example, a personal computer 3 is provided on which the voice recognition, voice synthesis, voiceprint authentication, and communication control programs are installed as application software. As another example, a device equipped with a dedicated processor that executes firmware with the same processing content as this application software and that has a Web browser function may be provided in place of the personal computer 3.

In the above example, three devices (the line connection device 1, the omnidirectional voice input device 2, and the personal computer 3) are installed in each conference room. However, the line connection device 1 and the omnidirectional voice input device 2 may be integrated into a single device, and the whole may further be made a single device by mounting a dedicated processor such as the one described above in that device.

In the above example, the present invention is applied to a conference system within a single company, but it may also be applied to other conference systems (for example, a conference system spanning several companies).

FIG. 1 is a diagram showing an example of the overall configuration of a telephone conference system to which the present invention is applied.
FIG. 2 is a diagram showing an example of the external configuration of the omnidirectional voice input device.
FIG. 3 is a block diagram showing an example of the circuit configuration of the omnidirectional voice input device.
FIG. 4 is a functional block diagram of the DSP 52 in FIG. 3.
FIG. 5 is a functional block diagram showing the microphone signal selection and switching process of the DSP 52.
FIG. 6 is a diagram showing the telephone number table in the personal computer.
FIG. 7 is a flowchart showing the processing of the communication control program.
FIG. 8 is a diagram illustrating the schedule data.
FIG. 9 is a diagram illustrating how a conference is started by uttering the surname of the other party.
FIG. 10 is a diagram illustrating the synthesized voice prompting selection of a conference room and related exchanges.

Explanation of symbols

1: line connection device; 2: omnidirectional voice input device; 3: personal computer; 4: public telephone line; 11: microphone housing; 12: speaker housing; 13: operation unit; MC1 to MC6: microphones; 511 to 516: A/D converters; 52, 53: DSPs; 54: CPU; 57: speaker

Claims

A communication control device for audio conferencing, comprising:
voice recognition means for recognizing a voice signal from a microphone;
storage means for storing identification information of communication means at a plurality of locations used for meetings;
processing means for referring to schedule data that associates, among the plurality of locations, the location each meeting attendee is scheduled to use with the name of that meeting attendee, determining whether the voice recognized by the voice recognition means includes the name of at least one meeting attendee, and, if so, confirming from the schedule data the location corresponding to the meeting attendee with that name and acquiring the identification information of the communication means at the confirmed location from the storage means;
control means for controlling communication means that performs communication via a communication line so as to start communication addressed to the identification information acquired by the processing means; and
voiceprint authentication means for extracting voiceprint data from the voice signal from the microphone and comparing the extracted voiceprint data with pre-registered voiceprint data to identify a speaker,
wherein the processing means refers to the schedule data to determine whether the speaker identified by the voiceprint authentication means is a meeting attendee scheduled to use the location where the communication control device is installed, and does not acquire the identification information from the storage means if the speaker is not a meeting attendee scheduled to use that location.

A communication control device for audio conferencing, comprising:
voice recognition means for recognizing a voice signal from a microphone;
storage means for storing identification information of communication means at a plurality of locations used for meetings;
processing means for referring to schedule data that associates, among the plurality of locations, the location each meeting attendee is scheduled to use with the name of that meeting attendee, determining whether the voice recognized by the voice recognition means includes the name of at least one meeting attendee, and, if so, confirming from the schedule data the location corresponding to the meeting attendee with that name and acquiring the identification information of the communication means at the confirmed location from the storage means;
control means for controlling communication means that performs communication via a communication line so as to start communication addressed to the identification information acquired by the processing means; and
voice synthesis means,
wherein, when a meeting attendee whose name is included in the voice recognized by the voice recognition means corresponds to a plurality of locations in the schedule data, the processing means causes the voice synthesis means to synthesize and output a voice prompting selection of a location, then determines the selection result from the voice recognized by the voice recognition means, and acquires the identification information of the communication means at the selected location from the storage means.

A communication control method for audio conferencing, comprising:
a first step of recognizing a voice signal from a microphone by voice recognition means included in a communication control device;
a second step of referring to schedule data that associates, among a plurality of locations used for meetings, the location each meeting attendee is scheduled to use with the name of that meeting attendee, determining whether the voice recognized by the voice recognition means includes the name of at least one meeting attendee, and, if so, confirming from the schedule data the location corresponding to the meeting attendee with that name and acquiring the identification information of the communication means at the confirmed location from storage means that stores identification information of the communication means at the plurality of locations;
a third step of controlling communication means that performs communication via a communication line so as to start communication addressed to the identification information acquired in the second step; and
a fourth step of extracting voiceprint data from the voice signal from the microphone and comparing the extracted voiceprint data with pre-registered voiceprint data to identify a speaker,
wherein, in the second step, the schedule data is referred to in order to determine whether the speaker identified in the fourth step is a meeting attendee scheduled to use the location where the communication control device is installed, and the identification information is not acquired from the storage means if the speaker is not a meeting attendee scheduled to use that location.

A communication control method for an audio conference, comprising:
A first step of recognizing a voice signal from a microphone by voice recognition means included in a communication control device;
A second step of referring to schedule data that associates the name of each meeting attendee with the location, among a plurality of locations used for the meeting, that the attendee plans to use, determining whether the voice recognized by the voice recognition means includes at least the name of a meeting attendee, and, if it does, identifying from the schedule data the location corresponding to the named meeting attendee and acquiring the identification information of the communication means at the identified location from storage means that stores the identification information of the communication means at the plurality of locations; and
A third step of controlling communication means that communicates via a communication line, and starting communication addressed to the identification information acquired in the second step;
wherein, in the second step, when the meeting attendee whose name is included in the voice recognized by the voice recognition means corresponds to a plurality of locations in the schedule data, a voice prompt for selecting a location is synthesized by speech synthesis means included in the communication control device and output, the selection result is then determined from the voice recognized by the voice recognition means, and the identification information of the communication means at the selected location is acquired from the storage means.
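The location selection described for the second step can be sketched roughly as below. This is a hypothetical outline, not the claimed implementation: `speak` and `listen` are assumed callbacks standing in for the speech synthesis means and the voice recognition means, the schedule is assumed to be a list of (attendee, location) pairs, and names are matched by simple substring search.

```python
def select_location(candidates, speak, listen):
    """Return the chosen location, prompting by voice only when there are several."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Several locations match: output a synthesized prompt, then read the reply.
    speak("Which location: " + ", ".join(candidates) + "?")
    reply = listen()
    for location in candidates:
        if location in reply:
            return location
    return None


def acquire_identification_with_choice(schedule, directory, recognized_text, speak, listen):
    """Look up the identification information for the named attendee's location."""
    candidates = []
    for attendee, location in schedule:
        if attendee in recognized_text and location not in candidates:
            candidates.append(location)
    chosen = select_location(candidates, speak, listen)
    # `directory` stands in for the storage means (location -> identification).
    return directory.get(chosen) if chosen is not None else None
```

Passing the synthesizer and recognizer as callbacks keeps the sketch independent of any particular speech engine; a reply that names none of the candidate locations results in no identification information being acquired.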

A program for causing a computer to execute:
A first procedure for recognizing a voice signal from a microphone by voice recognition means included in a communication control device;
A second procedure for referring to schedule data that associates the name of each meeting attendee with the location, among a plurality of locations used for the meeting, that the attendee plans to use, determining whether the voice recognized by the voice recognition means includes at least the name of a meeting attendee, and, if it does, identifying from the schedule data the location corresponding to the named meeting attendee and acquiring the identification information of the communication means at the identified location from storage means that stores the identification information of the communication means at the plurality of locations;
A third procedure for controlling communication means that communicates via a communication line, and starting communication addressed to the identification information acquired in the second procedure; and
A fourth procedure for extracting voice print data from the voice signal from the microphone, and comparing the extracted voice print data with pre-registered voice print data to identify a speaker;
wherein, in the second procedure, it is determined with reference to the schedule data whether the speaker identified in the fourth procedure is a meeting attendee who plans to use the location where the communication control device is installed, and the identification information is not acquired from the storage means if the speaker is not a meeting attendee who plans to use that location.

A program for causing a computer to execute:
A first procedure for recognizing a voice signal from a microphone by voice recognition means included in a communication control device;
A second procedure for referring to schedule data that associates the name of each meeting attendee with the location, among a plurality of locations used for the meeting, that the attendee plans to use, determining whether the voice recognized by the voice recognition means includes at least the name of a meeting attendee, and, if it does, identifying from the schedule data the location corresponding to the named meeting attendee and acquiring the identification information of the communication means at the identified location from storage means that stores the identification information of the communication means at the plurality of locations; and
A third procedure for controlling communication means that communicates via a communication line, and starting communication addressed to the identification information acquired in the second procedure;
wherein, in the second procedure, when the meeting attendee whose name is included in the voice recognized by the voice recognition means corresponds to a plurality of locations in the schedule data, a voice prompt for selecting a location is synthesized by speech synthesis means included in the communication control device and output, the selection result is then determined from the voice recognized by the voice recognition means, and the identification information of the communication means at the selected location is acquired from the storage means.

A computer-readable recording medium on which is recorded a program for causing a computer to execute:
A first procedure for recognizing a voice signal from a microphone by voice recognition means included in a communication control device;
A second procedure for referring to schedule data that associates the name of each meeting attendee with the location, among a plurality of locations used for the meeting, that the attendee plans to use, determining whether the voice recognized by the voice recognition means includes at least the name of a meeting attendee, and, if it does, identifying from the schedule data the location corresponding to the named meeting attendee and acquiring the identification information of the communication means at the identified location from storage means that stores the identification information of the communication means at the plurality of locations;
A third procedure for controlling communication means that communicates via a communication line, and starting communication addressed to the identification information acquired in the second procedure; and
A fourth procedure for extracting voice print data from the voice signal from the microphone, and comparing the extracted voice print data with pre-registered voice print data to identify a speaker;
wherein, in the second procedure, it is determined with reference to the schedule data whether the speaker identified in the fourth procedure is a meeting attendee who plans to use the location where the communication control device is installed, and the identification information is not acquired from the storage means if the speaker is not a meeting attendee who plans to use that location.

A computer-readable recording medium on which is recorded a program for causing a computer to execute:
A first procedure for recognizing a voice signal from a microphone by voice recognition means included in a communication control device;
A second procedure for referring to schedule data that associates the name of each meeting attendee with the location, among a plurality of locations used for the meeting, that the attendee plans to use, determining whether the voice recognized by the voice recognition means includes at least the name of a meeting attendee, and, if it does, identifying from the schedule data the location corresponding to the named meeting attendee and acquiring the identification information of the communication means at the identified location from storage means that stores the identification information of the communication means at the plurality of locations; and
A third procedure for controlling communication means that communicates via a communication line, and starting communication addressed to the identification information acquired in the second procedure;
wherein, in the second procedure, when the meeting attendee whose name is included in the voice recognized by the voice recognition means corresponds to a plurality of locations in the schedule data, a voice prompt for selecting a location is synthesized by speech synthesis means included in the communication control device and output, the selection result is then determined from the voice recognized by the voice recognition means, and the identification information of the communication means at the selected location is acquired from the storage means.