JPH06292198A

JPH06292198A - Talker deciding system

Info

Publication number: JPH06292198A
Application number: JP5078143A
Authority: JP
Inventors: Kazuo Mogi; 一男茂木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1993-04-05
Filing date: 1993-04-05
Publication date: 1994-10-18
Anticipated expiration: 2017-08-12
Also published as: JP3313446B2

Abstract

PURPOSE: To obtain a talker deciding system that avoids unnatural video switching. When video is switched according to talker decision, the preceding talker is selected preferentially if its terminal is one of the two video conference terminals with the highest voice detection counts. CONSTITUTION: A preceding talker storing memory 11, which stores the preceding talker, and a high-order two voice count decision part 10 are added to a talker decision part 1. The decision part 10 finds the two video conference terminals with the highest voice detection counts. Based on the difference between those counts, it decides that either the preceding talker's terminal or the terminal with the largest voice detection count is the talker. As a result, no video switching takes place when the second-ranked of those two terminals is the preceding talker.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention. The present invention relates to a speaker determination system for use in a multipoint communication control device or the like. The system determines the speaker from the audio of a plurality of video conference terminals and distributes that speaker's video to the other terminals.

[0002]

2. Description of the Related Art. FIG. 18 is an overall structural diagram of a conventional multipoint communication control device, such as the one disclosed in JP-A-1-286689. In the figure:
- 110 is the multipoint communication control device.
- 111 is a video camera.
- 112 is a CRT device.
- 113 is a video conference terminal equipped with the video camera 111, the CRT device 112, a microphone, a speaker and the like.
- 115 is a line connecting the video conference terminal and the multipoint communication control device.

[0003] FIG. 19 is a block diagram showing the inside of the communication control device. In the figure, 110 is the multipoint communication control device, and 102 is a system control unit that controls and manages the whole device. 103 is a line interface unit (hereinafter, line I/F unit) that handles the interface with the lines connecting to the video conference terminals. 104-1 to 104-n are multiplexing units that multiplex the audio data, video data and general-purpose data transmitted to the respective video conference terminals. 105-1 to 105-n are separation units that separate the multiplexed data received from the respective video conference terminals into audio data, video data and general-purpose data.

[0004] 2 is an audio addition/distribution unit that combines the audio data separated by the separation units 105-1 to 105-n and sends the result to the multiplexing units 104-1 to 104-n. 3 is a video switching/distribution unit that selects one of the video data streams separated by the separation units 105-1 to 105-n and sends it to the multiplexing units 104-1 to 104-n. 1 is a speaker determination unit that determines the speaker from the audio of the plurality of video conference terminals.

[0005] FIG. 20 is a block diagram showing the configuration of the audio addition/distribution unit 2. In the figure, 20 is a voice management unit that manages the whole audio addition/distribution unit 2. 21 is a voice synthesis (mixing) unit that adds the voice data sent from each of the separation units 105-1 to 105-n. 22-1 to 22-n are voice encoding/decoding units, and each one works in both directions:
- It converts the nonlinear voice data from its separation unit 105-1 to 105-n into linear voice data and passes it to the voice synthesis unit 21.
- It converts the linear voice data synthesized by the voice synthesis unit 21 back into nonlinear voice data and passes it to its multiplexing unit 104-1 to 104-n.

23 is an n-bit voice detection flag provided in the voice management unit 20. Each bit corresponds to one video conference terminal.
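The text does not name the format of the "nonlinear" voice data. The sketch below assumes it is 8-bit G.711 μ-law, a typical choice for ISDN video conferencing, and shows the conversion the voice encoding/decoding units 22-1 to 22-n would then perform. The function names are hypothetical.

```python
# Sketch of G.711 mu-law expansion/compression.
# Assumption: the "nonlinear" voice data is 8-bit mu-law; the text does not name the codec.
BIAS = 0x84   # 132, the standard mu-law bias
CLIP = 32635  # largest magnitude that still fits after adding the bias


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code into a 16-bit linear sample."""
    code = ~code & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07       # segment number
    mantissa = code & 0x0F              # step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit linear sample into an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment = position of the highest set bit among bits 7..14, minus 7.
    exponent = 7
    while exponent > 0 and not (magnitude & (1 << (exponent + 7))):
        exponent -= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Mixing has to happen on the linear samples, because adding companded codes directly does not give the sum of the signals. This is why the units expand the data before it reaches the voice synthesis unit 21 and compress it again afterwards.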

[0006] Next, the operation will be described. Several video conference terminals are connected to the line I/F unit 103 via lines. The terminal that intends to hold a conference (the parent terminal) tells the multipoint communication control device 110 which video conference terminals it wants to hold a multipoint video conference with.

[0007] On receiving this request, the system control unit 102 instructs the line I/F unit 103 to call the designated video conference terminals. Once the line between each called terminal and the multipoint communication control device 110 is connected, the two sides exchange multiplexed data combining voice data, video data and general-purpose data, and the multipoint video conference begins.

[0008] The line I/F unit 103 receives the multiplexed data transmitted from each video conference terminal and passes it to the corresponding separation unit 105-1 to 105-n. There it is separated into audio data, which goes to the audio addition/distribution unit 2, and video data, which goes to the video switching/distribution unit 3. In the audio addition/distribution unit 2, the corresponding voice encoding/decoding unit 22-1 to 22-n receives the voice data from each separation unit 105-1 to 105-n, converts it from nonlinear to linear data, and sends it to the voice synthesis unit 21. The voice synthesis unit 21 adds together the linear voice data from the voice encoding/decoding units 22-1 to 22-n.

[0009] For each terminal, the voice synthesis unit 21 then subtracts that terminal's own voice from the summed voice data and passes the result to the corresponding voice encoding/decoding unit 22-1 to 22-n. Each voice encoding/decoding unit converts the received voice data from linear to nonlinear data and sends it to the corresponding multiplexing unit 104-1 to 104-n. The voice synthesis unit 21 also turns on the voice detection flag bit for each video conference terminal whose voice is detected. It clears all n bits at the end of every fixed time unit (hereinafter, the voice detection period), and repeats this cycle.
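The per-terminal mix of [0009] (the sum of all voices minus the terminal's own) and the n-bit voice detection flag can be sketched as below. The text does not say how the voice synthesis unit 21 decides that voice is present, so the peak-level threshold used here is an assumption. The function names are hypothetical.

```python
from typing import List


def _clip16(x: int) -> int:
    """Saturate a value to the signed 16-bit sample range."""
    return max(-32768, min(32767, x))


def mix_minus(frames: List[List[int]]) -> List[List[int]]:
    """For each terminal, return the sum of all terminals' samples minus its own.

    frames[k][i] is linear sample i from terminal k; frames are assumed equal length.
    """
    total = [sum(column) for column in zip(*frames)]
    return [[_clip16(t - own) for t, own in zip(total, frame)] for frame in frames]


def voice_detection_flags(frames: List[List[int]], threshold: int) -> int:
    """Build the n-bit flag: bit k is set if any sample of terminal k reaches the threshold.

    The peak-level test is a hypothetical stand-in for the unspecified voice detector.
    """
    flags = 0
    for k, frame in enumerate(frames):
        if any(abs(sample) >= threshold for sample in frame):
            flags |= 1 << k
    return flags
```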

[0010] The video data separated by the separation units 105-1 to 105-n is sent to the video switching/distribution unit 3. Following the instruction of the speaker determination unit 1, unit 3 selects one of the received video data streams and switches it so that it is distributed to each of the multiplexing units 104-1 to 104-n.

[0011] FIG. 21 is a block diagram of a conventional speaker determination unit, such as the one disclosed in JP-B-3-65708. In the figure:
- 1 is a speaker determination unit. It is provided in a multipoint communication control device used in a multipoint video conference system or the like, and performs speaker determination.
- 2 is the audio addition/distribution unit, which adds the audio from the connected video conference terminals and distributes it to each terminal.
- 3 is the video switching/distribution unit, which distributes the video from the connected terminals to the other terminals.

Within the speaker determination unit 1:
- 4 is a control processing unit that manages the execution of each processing unit and performs speaker determination.
- 5 is the set of speaker determination parameters used for speaker determination.
- 6 is a voice counting unit. On instruction from the control processing unit 4, it counts voice detections from the voice detection flag 23 of the audio addition/distribution unit 2 to detect the presence or absence of voice.
- 7 is a voice count memory that stores the voice detection results (count values) of the voice counting unit 6.
- 8 is a valid voice determination unit. On instruction from the control processing unit 4, it clears any voice detection result (count value) in the voice count memory 7 that is below the threshold recorded in advance in the valid voice counter (described later).
- 9 is a maximum voice count determination unit. On instruction from the control processing unit 4, it finds the largest count value among the voice detection results stored in the voice count memory 7 and determines that terminal to be the speaker. It then instructs the video switching/distribution unit 3 to distribute that speaker's video to the other terminals.

[0012] FIG. 22 shows an example configuration of the speaker determination parameters 5 and the voice count memory 7:
- 5a is a speaker determination counter that specifies the period over which speaker determination is performed.
- 5b is a voice detection time that specifies the voice detection interval.
- 5c is a valid voice counter that specifies the threshold above which a voice detection result is judged valid.
- 7a to 7n are voice counters, one per connected video conference terminal. Each one checks the voice detection flag 23 of the audio addition/distribution unit 2 and adds 1 whenever voice was detected for its terminal.

In this example, the speaker determination counter 5a is set to an integer multiple of the voice detection time 5b, taking 5b as one unit.

[0013] In the example of FIG. 22, the voice detection time 5b is set to 40 ms. Taking 40 ms as one unit, the speaker determination counter is set to 50. The speaker determination counter thus expresses the speaker determination period as a count: here, one speaker determination period is 50 intervals of 40 ms, i.e. 2 seconds. The speaker is determined once per speaker determination period (2 seconds), so even when the speaker changes rapidly, the same speaker is displayed for at least 2 seconds. FIG. 23 shows the relationship between the speaker determination period and the voice detection time. In the figure, C1, C2, C3 and C4 denote successive speaker determination periods (2 seconds each). The numbers 1 to 50 are the 50 voice detection times (40 ms each) within one speaker determination period.

[0014] The valid voice counter 5c of FIG. 22 holds a threshold: out of the 50 voice detections in a speaker determination period, how many must detect voice for a terminal to become a speaker candidate in that period. In the example shown, it is set to 30. A terminal at which voice is detected in 30 or more of the 50 detections therefore becomes a candidate for screen switching. Given the meanings of the two counters, their values are assumed to satisfy "speaker determination counter 5a ≥ valid voice counter 5c > 0".
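The parameters of FIG. 22 and the relationships stated in [0012] to [0014] can be captured in a small structure. The period is 5a × 5b = 50 × 40 ms = 2000 ms, and the constraint is 5a ≥ 5c > 0. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerDecisionParams:
    """Hypothetical container for the speaker determination parameters 5."""
    decision_counter: int = 50     # 5a: detection intervals per speaker determination period
    detection_time_ms: int = 40    # 5b: length of one voice detection interval
    valid_voice_counter: int = 30  # 5c: minimum voiced intervals for a candidate

    def __post_init__(self) -> None:
        # Constraint stated in the text: 5a >= 5c > 0
        if not (self.decision_counter >= self.valid_voice_counter > 0):
            raise ValueError("require decision_counter >= valid_voice_counter > 0")

    @property
    def decision_period_ms(self) -> int:
        """Length of one speaker determination period (5a x 5b)."""
        return self.decision_counter * self.detection_time_ms
```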

[0015] Next, the operation will be described. Before performing speaker determination, the control processing unit 4 of the speaker determination unit 1 instructs the voice counting unit 6 to clear the voice counters 7a to 7n of the voice count memory 7 to 0.

[0016] Next, the control processing unit 4 waits until the voice detection time (40 ms) set in the speaker determination parameters 5 has elapsed, and then instructs the voice counting unit 6 to perform voice detection. During this voice detection time, the voice synthesis unit of the audio addition/distribution unit 2 receives the voice from each video conference terminal, adds it and distributes it. The voice synthesis unit also turns on the bit of the voice detection flag 23 for each terminal whose voice was detected during the voice detection time. For example, when voice is input from the first and second video conference terminals, the first and second bits of the voice detection flag 23 are turned on.

[0017] The voice counting unit 6 reads the voice detection flag 23 of the audio addition/distribution unit 2 to see whether voice was detected for each video conference terminal. For each terminal where it was, the unit increments the corresponding voice counter 7a to 7n in the voice count memory 7 by 1. For example, when the first and second bits of the voice detection flag 23 are on, 1 is added to the voice counter 7a of terminal 1 and to the voice counter 7b of terminal 2. After the voice detection flag 23 of the audio addition/distribution unit 2 has output its value to the speaker determination unit 1, all n bits are cleared. In the next voice detection time, the corresponding bits are turned on again for the terminals whose voice is newly input.

[0018] The control processing unit 4 repeats the above processing until the value of the speaker determination counter 5a is reached. In this example the count is 50, so the processing is repeated 50 times. Meanwhile, the voice detection flag 23 of the audio addition/distribution unit 2 likewise goes through 50 detections and 50 clears. In this way, the voice counters 7a to 7n record how many times voice was detected for each terminal during the speaker determination period (2 seconds). The voice counter values shown in the voice count memory in the figure are those at the end of a speaker determination period. In this example, voice was detected 45 times for terminal 1, 15 times for terminal 2 and 40 times for terminal n.

[0019] Next, the control processing unit 4 notifies the valid voice determination unit 8 of the value (30) of the valid voice counter 5c and instructs it to determine valid voice. The valid voice determination unit 8 clears to 0 every voice counter 7a to 7n whose value is less than the valid voice counter value (30). At this point, terminal 2 in the figure is cleared to 0 because its voice counter value is less than 30.

[0020] Next, the control processing unit 4 instructs the maximum voice count determination unit 9 to determine the speaker. Unit 9 takes the terminal whose voice counter 7a to 7n in the voice count memory 7 holds the largest count to be the speaker. It then instructs the video switching/distribution unit 3 to distribute that terminal's video to the other video conference terminals. In this example, terminal 1 has a larger count than terminal n, so terminal 1 is determined to be the speaker.

[0021] When two or more terminals share the largest voice count, the lowest-numbered of the connected terminals is determined to be the speaker. When the voice counters 7a to 7n are all 0, there is deemed to be no speaker and the video switching/distribution unit 3 is not notified. In that case, the video switching/distribution unit 3 continues to display the speaker it was already displaying.
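One speaker determination period of the conventional method ([0015] to [0021]) can be sketched as a pure function. The detection history is represented as one list of booleans per terminal, which is an assumption made for illustration. Terminal 1 corresponds to index 0, and the function name is hypothetical.

```python
from typing import List, Optional


def conventional_decision(history: List[List[bool]], valid_voice_counter: int) -> Optional[int]:
    """Return the 0-based index of the speaker for one period, or None if there is none.

    history[k][t] is True if terminal k was voiced in detection interval t.
    """
    counts = [sum(row) for row in history]                              # voice counters 7a..7n
    counts = [c if c >= valid_voice_counter else 0 for c in counts]     # valid voice unit 8
    best = max(counts, default=0)                                       # max count unit 9
    if best == 0:
        return None              # no speaker: the video switching unit keeps its current video
    return counts.index(best)    # ties go to the lowest-numbered terminal


def period(voiced: int, length: int = 50) -> List[bool]:
    """Helper: a period with `voiced` voiced intervals at the start."""
    return [True] * voiced + [False] * (length - voiced)


# FIG. 22 example: terminals 1, 2 and n with 45, 15 and 40 detections
print(conventional_decision([period(45), period(15), period(40)], 30))  # 0 (terminal 1)
```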

[0022] Repeating the above operation implements the speaker determination method, with the value of the speaker determination counter 5a in the speaker determination parameters 5 serving as the video switching interval.

[0023]

Problems to Be Solved by the Invention. In the conventional speaker determination method described above, video is switched to the speaker with the most voice detections once in every speaker determination period. A video switch can therefore occur at the end of every speaker determination period, which makes the switching look unnatural.

[0024] The present invention was made to solve the above problem. Its object is to obtain a speaker determination method that makes video switching as natural as possible. It does so by, among other things, giving priority to the previous speaker when switching video, giving priority to voice detected in the latter part of the speaker determination period, and making the speaker determination period changeable.

[0025]

Means for Solving the Problems. A speaker determination method according to a first invention comprises:
- previous speaker storage means for storing, as the previous terminal, the terminal determined to be the speaker in the previous speaker determination period; and
- speaker determination means for selecting, in the current speaker determination period, a plurality of candidate terminals for the speaker terminal according to a predetermined criterion, and for making the previous terminal the current speaker terminal when the previous terminal stored by the previous speaker storage means is among the selected candidates.
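The core rule of the first invention does not depend on how the candidates are chosen, and can be sketched on its own as follows. The function name and the best-first ordering of the candidate list are assumptions.

```python
from typing import List, Optional


def decide_with_previous(candidates: List[int], previous: Optional[int]) -> Optional[int]:
    """Pick the current speaker from candidate terminals ordered best-first.

    If the previous speaker is still a candidate it is kept, so the video is not
    switched; otherwise the best candidate is chosen. None means no speaker.
    """
    if not candidates:
        return None
    if previous in candidates:
        return previous
    return candidates[0]
```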

[0026] A speaker determination method according to a second invention selects candidate terminals by voiced frequency. This is the number of voice detection periods judged voiced among the plurality of voice detection periods in the speaker determination period.

[0027] A speaker determination method according to a third invention selects candidate terminals by voiced position. This is the position, among the plurality of voice detection periods in the speaker determination period, of the voice detection periods judged voiced.
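The text does not define how the voiced position is scored. The sketch below follows the reading in the third invention's operation, under these assumptions:
- A terminal becomes a candidate if it was voiced in the last `tail` detection intervals of the period.
- Candidates are ranked by their latest voiced interval.
- The previous-speaker priority is applied last.

`tail` and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def position_candidates(history: List[List[bool]], tail: int) -> List[int]:
    """Terminals voiced within the last `tail` intervals, latest voice first."""
    latest: Dict[int, int] = {}
    for k, row in enumerate(history):
        start = max(len(row) - tail, 0)
        voiced = [t for t in range(start, len(row)) if row[t]]
        if voiced:
            latest[k] = voiced[-1]
    # Later last-voiced interval first; ties go to the lower terminal number.
    return sorted(latest, key=lambda k: (-latest[k], k))


def decide_by_position(history: List[List[bool]], previous: Optional[int],
                       tail: int) -> Optional[int]:
    """Previous-speaker priority applied to position-based candidates."""
    candidates = position_candidates(history, tail)
    if not candidates:
        return None
    return previous if previous in candidates else candidates[0]
```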

[0028] A speaker determination method according to a fourth invention selects candidate terminals using both criteria:
- the voiced frequency, i.e. the number of voice detection periods judged voiced among the plurality of voice detection periods in the speaker determination period; and
- the voiced position of those voice detection periods within the speaker determination period.

[0029] A speaker determination method according to a fifth invention compares the criterion value (the parameter used for the predetermined criterion) of the previous terminal with those of the other candidate terminals. When the difference is less than or equal to a predetermined threshold, the previous terminal is made the speaker terminal.
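The fifth invention can be sketched as a margin test on whatever score the criterion produces. Here the score is the voice count, and the margin value and function name are hypothetical. Take the FIG. 22 counts (45 for terminal 1, 40 for terminal n), with terminal n as the previous speaker. A margin of 5 keeps terminal n on screen, whereas the conventional method would switch to terminal 1.

```python
from typing import Dict, Optional


def decide_with_margin(scores: Dict[int, int], previous: Optional[int],
                       margin: int) -> Optional[int]:
    """scores maps each candidate terminal to its criterion value (e.g. voiced frequency)."""
    if not scores:
        return None
    # Best candidate: highest score, ties to the lower terminal number.
    best = min(scores, key=lambda k: (-scores[k], k))
    if previous in scores and scores[best] - scores[previous] <= margin:
        return previous   # difference within the threshold: keep the previous speaker
    return best
```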

[0030] A speaker determination method according to a sixth invention divides the speaker determination period into zones, each containing a plurality of voice detection periods. It then determines the speaker terminal zone by zone according to a predetermined criterion.
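The sixth invention leaves the per-zone criterion open. The sketch below assumes one possible rule:
- For each terminal, count the zones in which it was voiced in at least `zone_threshold` intervals.
- Pick the terminal with the most such zones.

This rule favours sustained speech over scattered noise. In the example, the `scattered` terminal has more voiced intervals overall but no voiced zone. `zone_len`, `zone_threshold` and the scoring rule are hypothetical.

```python
from typing import List, Optional


def voiced_zone_count(row: List[bool], zone_len: int, zone_threshold: int) -> int:
    """Number of zones (runs of zone_len intervals) with at least zone_threshold voiced intervals."""
    return sum(
        1
        for start in range(0, len(row), zone_len)
        if sum(row[start:start + zone_len]) >= zone_threshold
    )


def zone_decision(history: List[List[bool]], zone_len: int,
                  zone_threshold: int) -> Optional[int]:
    """Terminal with the most voiced zones; ties to the lower index; None if no voiced zone."""
    zones = [voiced_zone_count(row, zone_len, zone_threshold) for row in history]
    best = max(zones, default=0)
    return None if best == 0 else zones.index(best)


# 50 intervals split into five zones of 10
scattered = [t % 2 == 0 for t in range(50)]   # 25 voiced intervals, 5 in every zone
sustained = [t < 20 for t in range(50)]       # 20 voiced intervals filling the first two zones
print(zone_decision([scattered, sustained], 10, 6))  # 1
```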

[0031] A speaker determination method according to a seventh invention determines the speaker terminal from the positions, within the speaker determination period, of the voice detection periods judged voiced.
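For the seventh invention, one assumed way to base the decision on position is to weight each voiced interval by its index, so that voice near the end of the period counts for more. The linear weights t + 1 and the function name are hypothetical. With this weighting, a terminal voiced only in the last three of ten intervals (score 8 + 9 + 10 = 27) beats one voiced in the first six (score 1 + 2 + ... + 6 = 21), although a plain count would favour the latter.

```python
from typing import List, Optional


def position_weighted_decision(history: List[List[bool]]) -> Optional[int]:
    """Score each terminal by the sum of (t + 1) over its voiced intervals t."""
    scores = [sum(t + 1 for t, voiced in enumerate(row) if voiced) for row in history]
    best = max(scores, default=0)
    return None if best == 0 else scores.index(best)
```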

[0032] A speaker determination method according to an eighth invention comprises interval changing means for changing the interval at which the speaker is determined. The change depends on whether the terminal determined to be the speaker in the previous speaker determination period is the same as the one determined in the current period.

[0033] A speaker determination method according to a ninth invention comprises speaker fixing means. When the terminal determined to be the speaker in the previous speaker determination period differs from the one determined in the current period, this means delays the start of the next speaker determination period by a predetermined time.
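The speaker fixing means of the ninth invention can be sketched as a rule for when the next period starts. Two choices here are assumptions: the hold time `hold_ms`, and treating "no speaker" (None) as not being a change. The function name is hypothetical.

```python
from typing import Optional


def next_period_start(now_ms: int, previous: Optional[int], current: Optional[int],
                      hold_ms: int) -> int:
    """Start time (ms) of the next speaker determination period.

    When the speaker has just changed, the start is delayed by hold_ms, so another
    switch can happen no sooner than hold_ms plus one full period later.
    """
    if current is not None and current != previous:
        return now_ms + hold_ms
    return now_ms
```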

[0034] A speaker determination method according to a tenth invention comprises speaker determination period storage means that stores the speaker determination period. Its interval changing means comprises speaker determination period changing means for changing the speaker determination period stored in that storage means.

[0035] A speaker determination method according to an eleventh invention multiplies the speaker determination period by a predetermined factor and uses the result as the next speaker determination period.

[0036] In a speaker determination method according to a twelfth invention, the previous and current speaker determination periods may pick the same terminal as the speaker. In that case, part of the speaker determination period is removed, and the shortened period is used as the next speaker determination period.
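The tenth to twelfth inventions change the stored speaker determination period. The text does not say when the eleventh invention's factor is applied, so the sketch below assumes the following:
- After a speaker change, the period is lengthened by the factor.
- While the speaker stays the same, the period is shortened by a fixed number of detection intervals (twelfth invention).

The period is expressed as the speaker determination counter, in 40 ms detection intervals. The factor, trim amount, bounds and function name are hypothetical.

```python
from typing import Optional


def next_decision_counter(counter: int, previous: Optional[int], current: Optional[int],
                          factor: int = 2, trim: int = 10,
                          minimum: int = 25, maximum: int = 100) -> int:
    """Speaker determination counter (detection intervals) for the next period."""
    if current is None:
        return counter                          # no speaker decided: leave the period unchanged
    if current != previous:
        return min(counter * factor, maximum)   # speaker changed: lengthen (assumed use of the factor)
    return max(counter - trim, minimum)         # same speaker: remove part of the period
```

With 40 ms intervals, a switch turns a counter of 50 (2 s) into 100 (4 s), while an unchanged speaker shortens it to 40 (1.6 s).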

[0037] A speaker determination method according to a thirteenth invention comprises:
- a speaker determination parameter storage unit storing the various parameters used for speaker determination;
- a default parameter storage unit storing parameters whose values differ from those in the speaker determination parameter storage unit; and
- parameter changing means that overwrites the parameter values in the speaker determination parameter storage unit with the values from the default parameter storage unit. Whether it does so depends on whether the terminal determined to be the speaker in the previous speaker determination period is the same as the one determined in the current period.
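The thirteenth invention does not say which outcome triggers the change, so the sketch below makes two assumptions. First, the default-store values are loaded when the speaker changes. Second, a normal parameter set is kept so that it can be restored when the speaker stays the same. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionParams:
    decision_counter: int      # corresponds to 5a
    valid_voice_counter: int   # corresponds to 5c


def select_params(normal: DecisionParams, default: DecisionParams,
                  previous: Optional[int], current: Optional[int]) -> DecisionParams:
    """Parameters to load into the speaker determination parameter store for the next period."""
    if current is not None and current != previous:
        return default   # speaker changed: use the default-store values (assumption)
    return normal
```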

[0038]

Operation. In the speaker determination method according to the first invention, the previous speaker storage means stores, as the previous terminal, the terminal determined to be the speaker in the previous speaker determination period. The speaker determination means then selects a plurality of candidate terminals for the current period according to the predetermined criterion. If the previous terminal is among them, it becomes the current speaker terminal, so the terminal selected last time is given priority. As a result, the same terminal is more likely to remain the speaker from one speaker determination period to the next, and the speaker no longer switches frequently at every speaker determination period.

[0039] The speaker determination method according to the second invention selects candidate terminals by voiced frequency. Several candidate terminals are selected in descending order of voiced frequency, and if the previous terminal is among them, it becomes the speaker terminal.
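The second invention can be sketched with the FIG. 22 counts and two candidates, following the "high-order two" terminals of the abstract. Applying the valid voice threshold before ranking is an assumption, and the function names are hypothetical.

```python
from typing import List, Optional


def frequency_candidates(counts: List[int], valid_voice_counter: int, k: int = 2) -> List[int]:
    """Top-k terminals by voiced frequency, ignoring counts below the threshold."""
    eligible = [i for i, c in enumerate(counts) if c >= valid_voice_counter]
    eligible.sort(key=lambda i: (-counts[i], i))   # highest first, ties to the lower number
    return eligible[:k]


def decide_by_frequency(counts: List[int], previous: Optional[int],
                        valid_voice_counter: int = 30, k: int = 2) -> Optional[int]:
    """Keep the previous speaker if it is among the top-k candidates."""
    candidates = frequency_candidates(counts, valid_voice_counter, k)
    if not candidates:
        return None
    return previous if previous in candidates else candidates[0]


counts = [45, 15, 40]   # terminals 1, 2 and n from FIG. 22
print(decide_by_frequency(counts, previous=2))  # 2: terminal n stays on screen
print(decide_by_frequency(counts, previous=1))  # 0: terminal 2 is not a candidate
```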

【００４０】[0040]

The speaker determination system according to the third invention is characterized in that it uses a particular criterion for selecting candidate terminals: the position, within the speaker determination period, of the voice detection periods judged to be voiced. Terminals detected as voiced in the latter part of the speaker determination period are made candidate terminals. If the previous terminal is among the selected candidates, the previous terminal is made the speaker terminal. A terminal with voiced detection periods in the latter part of the speaker determination period is highly likely to go on speaking into the next speaker determination period. This criterion therefore gives priority to the previous speaker while also choosing, as the speaker terminal, a terminal likely to remain the speaker next time. The same terminal becomes more likely to continue as the speaker, and the speaker no longer switches frequently at every speaker determination period.
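The third invention does not say how late in the period a voiced detection must fall to count as "in the latter part". The following Python sketch fills that gap with an assumption: a hypothetical `tail` window covering the last `tail` voice detections of the period. It also assumes a hypothetical fallback to the previous speaker when no terminal qualifies. Neither detail is specified in the text.

```python
from typing import Dict, List, Optional

def last_voiced(flags: List[bool]) -> int:
    """Index of the last voiced detection in the period, or -1 if none."""
    for i in range(len(flags) - 1, -1, -1):
        if flags[i]:
            return i
    return -1

def decide_by_position(voiced: Dict[int, List[bool]], previous: Optional[int],
                       tail: int) -> Optional[int]:
    """Candidates are terminals voiced within the last `tail` detections (hypothetical window)."""
    candidates = []
    for t, flags in voiced.items():
        lv = last_voiced(flags)
        if lv >= 0 and lv >= len(flags) - tail:
            candidates.append(t)
    if not candidates:
        return previous                      # assumed fallback, not from the text
    if previous in candidates:
        return previous                      # previous speaker takes priority
    # Otherwise pick the latest-voiced candidate; ties go to the lower terminal number.
    return min(candidates, key=lambda t: (-last_voiced(voiced[t]), t))

flags = {
    1: [True] * 6 + [False] * 4,   # talked early, then stopped
    2: [False] * 7 + [True] * 3,   # talking at the end of the period
    3: [False] * 9 + [True],       # a single detection at the very end
}
print(decide_by_position(flags, previous=3, tail=3))  # 3
print(decide_by_position(flags, previous=1, tail=3))  # 2
```

Terminal 1 is excluded because its last voiced detection (index 5) falls outside the window.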

【００４１】[0041]

The speaker determination system according to the fourth invention is characterized in that the speaker is determined using both the voice frequency and the voiced position. The two criteria can be applied in either order. One option gives the voice frequency precedence: several candidate terminals are selected by voice frequency, and the speaker terminal is then chosen from them by voiced position. The other option selects candidate terminals by voiced position and then chooses the speaker terminal by comparing their voice frequencies.

【００４２】[0042]

In the speaker determination system according to the fifth invention, the previous terminal may be among the candidate terminals. In that case, the parameter value used for the predetermined criterion is compared between two terminals: the candidate terminal with the largest value and the previous terminal. If the difference is equal to or less than a predetermined threshold, the previous terminal is made the speaker terminal. If the difference is large, the most likely candidate terminal is made the speaker terminal instead, even though the previous terminal is among the candidates. In short, the previous terminal is given priority as the speaker terminal only when it trails the top candidate by no more than the threshold.

【００４３】[0043]

The speaker determination system according to the sixth invention is characterized in that the speaker determination period is divided into a plurality of zones and the voiced position is determined zone by zone. The voiced position could also be taken per voice detection period. Using zones that each contain a plurality of voice detection periods, however, makes it possible to ignore noise or a very brief burst of voice instead of treating it as a voiced position. In other words, judging whether sound is present over a fixed span called a zone detects more reliably whether the speaker is continuously outputting a voice signal.
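One way to picture the zone idea is to bucket each terminal's per-detection results into zones, as in the sketch below. The threshold `min_hits` is a hypothetical knob, not a parameter named in the text. With `min_hits` above 1, a single noise blip inside a zone does not make the zone voiced.

```python
from typing import List

def zone_counts(flags: List[bool], per_zone: int) -> List[int]:
    """Number of voiced detections in each zone of one speaker determination period."""
    if per_zone <= 0 or len(flags) % per_zone != 0:
        raise ValueError("the period must consist of whole zones")
    return [sum(flags[i:i + per_zone]) for i in range(0, len(flags), per_zone)]

def voiced_zones(flags: List[bool], per_zone: int, min_hits: int) -> List[bool]:
    """Treat a zone as voiced only if it holds at least `min_hits` detections (hypothetical rule)."""
    return [count >= min_hits for count in zone_counts(flags, per_zone)]

# 20 detections in two zones of 10: a lone blip in zone 1, sustained speech in zone 2.
flags = [True] + [False] * 9 + [True] * 6 + [False] * 4
print(zone_counts(flags, 10))      # [1, 6]
print(voiced_zones(flags, 10, 2))  # [False, True]
```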

【００４４】[0044]

The speaker determination system according to the seventh invention does not have the previous-speaker storage means of the inventions described above. It simply determines the speaker terminal from the voiced position. Choosing the terminal whose voiced detection periods lie latest in the speaker determination period amounts to choosing the terminal most likely to remain the speaker in the next period. Choosing such a terminal prevents the speaker terminal from switching at every speaker determination period.

【００４５】[0045]

In the speaker determination system according to the eighth invention, the interval at which the speaker is determined, such as the speaker determination period, is changed according to one comparison. That comparison is whether the terminal determined to be the speaker in the previous speaker determination period is the same as the one determined in the current period. If the previous and current speaker terminals differ, the current speaker terminal might switch again right away in the next period, so the interval is lengthened. If they are the same, the current terminal has been the speaker terminal continuously since the previous period, so the speaker determination period is left unchanged or shortened. Adjusting the interval in this way prevents the determined terminal from switching frequently at every speaker determination period.

【００４６】[0046]

In the speaker determination system according to the ninth invention, the interval is changed by delaying the start of the next speaker determination period, that is, by inserting a predetermined fixed time. The speaker terminal determined this time therefore stays selected longer than usual, by the length of the inserted fixed time.

【００４７】[0047]

In the speaker determination system according to the tenth invention, the stored speaker determination period itself is rewritten. The next speaker determination then uses the new speaker determination period.

【００４８】[0048]

In the speaker determination system according to the eleventh invention, the interval is changed by multiplying the speaker determination period by a predetermined factor to calculate a new speaker determination period.

【００４９】[0049]

In the speaker determination system according to the twelfth invention, the next speaker determination period can safely be shortened when the previous terminal and the current terminal are the same. Part of the next speaker determination period is therefore deleted.
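The interval rules of the eighth to twelfth inventions can be sketched as one function. It returns the time until the next speaker decision, and the caller would store that value as the new period, in the spirit of the tenth invention. The function name, the mode strings and every default value are hypothetical, because the text gives no numbers.

```python
def next_interval_ms(base_ms: int, previous: int, current: int, mode: str,
                     fixed_ms: int = 1000, factor: float = 2.0,
                     cut_ms: int = 0) -> int:
    """Interval until the next speaker decision, in milliseconds.

    mode "insert":   ninth invention, add a fixed time after a speaker switch
    mode "multiply": eleventh invention, scale the period after a switch
    cut_ms:          twelfth invention, part of the next period deleted when the
                     speaker is unchanged (0 leaves the period as it is)
    All defaults are hypothetical placeholders.
    """
    if mode not in ("insert", "multiply"):
        raise ValueError(f"unknown mode: {mode}")
    if previous != current:                  # speaker changed: lengthen the interval
        if mode == "insert":
            return base_ms + fixed_ms
        return int(base_ms * factor)
    return max(1, base_ms - cut_ms)          # same speaker: keep or shorten

print(next_interval_ms(2000, 1, 2, "insert"))              # 3000
print(next_interval_ms(2000, 1, 2, "multiply"))            # 4000
print(next_interval_ms(2000, 1, 1, "insert", cut_ms=500))  # 1500
```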

【００５０】[0050]

In the speaker determination system according to the thirteenth invention, the various parameters used for speaker determination are held in the speaker determination parameter storage unit. These parameters are rewritten according to whether the previous and current speaker terminals differ. Rewriting them makes it possible to change the speaker determination interval described above, and also other parameters such as the threshold or the sampling value.

【００５１】[0051]

EXAMPLES

Example 1. Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, the components are as follows:
1 is a speaker determination unit;
2 is a voice addition/distribution unit;
3 is a video switching/distribution unit;
4 is a control processing unit;
5 is the speaker determination parameters;
6 is a voice counting unit;
7 is a voice count memory;
8 is a valid voice determination unit.
These are the same as or equivalent to those of the conventional example, so their detailed description is omitted.

【００５２】[0052]

Reference numeral 10 denotes a top-two voice count determination unit. On instruction from the control processing unit 4, it finds the two terminals with the largest voice detection results (count values) among those stored in the voice count memory 7. Suppose the difference between the voice counts of these two terminals is less than the speaker determination threshold and either of them is the previous speaker. In that case, the unit makes that terminal the current speaker. Otherwise, it makes the first-ranked terminal the current speaker. Reference numeral 11 denotes a previous-speaker storage memory that stores the terminal determined to be the speaker by the top-two voice count determination unit 10.

【００５３】[0053]

FIG. 2 shows a configuration example of the speaker determination parameters 5 and the voice count memory 7. The parameters are as follows:
5a is a speaker determination counter that specifies the period over which speaker determination is performed.
5b is a voice detection time that specifies the time interval of voice detection.
5c is a valid voice counter that specifies the threshold for judging a voice detection result to be valid.
5d is a zone voice counter that specifies the length of one zone when the speaker determination period is divided into a plurality of zones.
5e is a speaker identification counter that specifies a threshold: if the difference between the two terminals with the largest voice detection amounts is equal to or greater than this value, the first-ranked terminal is unconditionally determined to be the speaker.
5f is a speaker fixed time that specifies how long the speaker determination processing is delayed after the video has been switched as a result of speaker determination.

【００５４】[0054]

Reference numerals 77a to 77n denote memories that store the voice detection results for each connected video conference terminal. 77a0 to 77n0 are total voice counters that store the result of adding 1 each time voice is detected. 77a1 to 77am, ..., 77n1 to 77nm are zone voice counters that store, for each zone, the result of adding 1 each time voice is detected in that zone.

【００５５】[0055]

The configuration example of the speaker determination parameters 5 and the voice count memory 7 shown in FIG. 2 includes everything used in the embodiments described later, so the details are explained in each embodiment.

【００５６】[0056]

In this example of the speaker determination system, the counters are assumed to satisfy two conditions. First, the speaker determination counter 5a and the zone voice counter 5d satisfy "speaker determination counter 5a = zone voice counter 5d × integer". Second, the speaker determination counter 5a, the valid voice counter 5c and the speaker identification counter 5e satisfy "speaker determination counter 5a ≥ valid voice counter 5c ≥ speaker identification counter 5e > 0".
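As a rough illustration, the parameters of FIG. 2 and the two conditions above can be held and checked in a small Python structure. The class and field names are hypothetical. The values 5a = 50, 5c = 30 and 5e = 10 follow the examples in this description. The values 5d = 10 and 5b = 40 ms are inferred from the FIG. 6 example, in which a 2-second period is split into five zones of 10 detections each. The 5f value of 0 is only a placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeakerParams:
    """Hypothetical container for the speaker determination parameters 5 (FIG. 2)."""
    decision_count: int        # 5a: detections per speaker determination period
    detection_ms: int          # 5b: voice detection interval in milliseconds
    valid_count: int           # 5c: minimum total count for valid voice
    zone_count: int            # 5d: detections per zone
    identify_threshold: int    # 5e: top-two difference threshold
    fixed_ms: int = 0          # 5f: speaker fixed time (placeholder value)

    def validate(self) -> None:
        # Condition 1: 5a = 5d x integer
        if self.zone_count <= 0 or self.decision_count % self.zone_count != 0:
            raise ValueError("5a must be an integer multiple of 5d")
        # Condition 2: 5a >= 5c >= 5e > 0
        if not (self.decision_count >= self.valid_count >= self.identify_threshold > 0):
            raise ValueError("require 5a >= 5c >= 5e > 0")

    @property
    def zones(self) -> int:
        return self.decision_count // self.zone_count

params = SpeakerParams(decision_count=50, detection_ms=40, valid_count=30,
                       zone_count=10, identify_threshold=10)
params.validate()
print(params.zones)                                 # 5
print(params.decision_count * params.detection_ms)  # 2000 (ms, i.e. 2 seconds)
```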

【００５７】[0057]

Next, the operation will be described. FIG. 3 is a flowchart showing the flow of the operation.

In step ST1, before performing speaker determination, the control processing unit 4 instructs the voice counting unit 6 to clear the total voice counters 77a0 to 77n0 of the voice count memory 7 to 0.

Next, in step ST2, the control processing unit 4 waits until the voice detection time 5b of the speaker determination parameters 5 has elapsed.

In step ST3, the control processing unit 4 instructs the voice counting unit 6 to perform voice counting. The voice counting unit 6 checks, via the voice addition/distribution unit 2, whether voice is detected for each video conference terminal. When voice is detected, it increments by 1 the total voice counter 77a0 to 77n0 of the corresponding terminal in the voice count memory 7.

In step ST4, the control processing unit 4 repeats steps ST2 to ST4 until the count value of the speaker determination counter 5a of the speaker determination parameters 5 is reached.

FIG. 4 shows an example of the total voice counter values when this repetition has finished. The total voice counter of terminal 1 shows 40. Similarly, it shows 43 for terminal 2, 31 for terminal 3, and 20 for terminal n.
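Steps ST1 to ST4 amount to a counting loop. The sketch below makes two simplifications. It uses a hypothetical `voiced(tick, terminal)` callback in place of the detection reported by the voice addition/distribution unit 2, and it omits the real-time wait for the voice detection time 5b. Terminal 4 stands in for terminal n.

```python
from typing import Callable, Dict, List

def count_period(terminals: List[int],
                 voiced: Callable[[int, int], bool],
                 decision_count: int) -> Dict[int, int]:
    """Count voiced detections per terminal over one speaker determination period."""
    totals = {t: 0 for t in terminals}       # ST1: clear the total voice counters
    for tick in range(decision_count):       # ST4: repeat until counter 5a is reached
        # ST2 would wait for the voice detection time 5b here.
        for t in terminals:                  # ST3: check each terminal for voice
            if voiced(tick, t):
                totals[t] += 1
    return totals

# Hypothetical input: each terminal is voiced for its first N detections,
# which reproduces the FIG. 4 totals (terminal 4 standing in for terminal n).
fig4 = {1: 40, 2: 43, 3: 31, 4: 20}
totals = count_period(list(fig4), lambda tick, t: tick < fig4[t], 50)
print(totals)  # {1: 40, 2: 43, 3: 31, 4: 20}
```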

【００５８】[0058]

Next, in step ST5, the control processing unit 4 notifies the valid voice determination unit 8 of the value of the valid voice counter 5c of the speaker determination parameters 5. It then instructs the valid voice determination unit 8 to perform valid voice determination. If any of the total voice counters 77a0 to 77n0 of the voice count memory 7 is less than the value of the valid voice counter 5c, the valid voice determination unit 8 clears that counter to 0. In this example the valid voice counter 5c is set to 30, so every value in FIG. 4 smaller than 30 is cleared to 0. The total voice counter 77n0 of terminal n is therefore cleared to 0. Terminals whose counts fall below the valid voice counter are removed so that a terminal must be detected as voiced at least a certain number of times to be judged the speaker. The larger the valid voice counter value, the more often a terminal is treated as silent. The smaller the value, the more often it is treated as voiced.
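The following is a minimal sketch of step ST5. It assumes the totals are held in a Python dict keyed by terminal number, which is a hypothetical representation of the counters 77a0 to 77n0. A count exactly equal to 5c survives, since only counts less than 5c are cleared.

```python
from typing import Dict

def apply_valid_threshold(totals: Dict[int, int], valid_count: int) -> Dict[int, int]:
    """Step ST5: clear any total voice count below the valid voice counter 5c."""
    return {t: (c if c >= valid_count else 0) for t, c in totals.items()}

fig4 = {1: 40, 2: 43, 3: 31, 4: 20}       # FIG. 4 totals, terminal 4 = terminal n
print(apply_valid_threshold(fig4, 30))    # {1: 40, 2: 43, 3: 31, 4: 0}
```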

【００５９】[0059]

Next, in step ST6, the control processing unit 4 notifies the top-two voice count determination unit 10 of the threshold of the speaker identification counter 5e of the speaker determination parameters 5. It then instructs the unit to perform top-two voice count determination. The top-two voice count determination unit 10 finds, from the total voice counters 77a0 to 77n0 of the voice count memory 7, the two video conference terminals with the largest voice counts. Two conditions are then checked: whether the difference between these counts is less than the threshold of the speaker identification counter 5e, and whether the previous speaker's terminal stored in the previous-speaker storage memory 11 is the second-ranked terminal. If both hold, the unit determines the second-ranked terminal to be the speaker. In all other cases, it determines the first-ranked terminal to be the current speaker. It then stores the speaker's video conference terminal in the previous-speaker storage memory 11.

【００６０】[0060]

In the example shown in FIG. 4, the top two video conference terminals are terminal 1 and terminal 2. Their difference is 43 - 40 = 3, which is smaller than the threshold (10) of the speaker identification counter 5e shown in FIG. 2. Suppose terminal 1 was selected in the previous speaker determination period and is the terminal stored in the previous-speaker storage memory 11. Terminal 1 is then determined to be the speaker again in the current speaker determination period.

【００６１】[0061]

Two special cases are handled as follows. First, the first- or second-ranked terminal may have the same total voice counter value (77a0 to 77n0) as the third-ranked or a lower terminal. In that case, the lowest-numbered connected video conference terminals are taken as the first- and second-ranked terminals. Second, if the total voice counters 77a0 to 77n0 are all 0, the previous speaker's video conference terminal is determined to be the speaker.
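Step ST6, together with the two special cases above, can be sketched as one function. This is an illustration under stated assumptions, not the device's implementation. The counts are a dict keyed by terminal number. "Lower terminal number" stands for the connection order used to break ties. The caller is assumed to write the returned terminal into the previous-speaker storage memory 11.

```python
from typing import Dict, Optional

def decide_top_two(totals: Dict[int, int], previous: Optional[int],
                   threshold: int) -> Optional[int]:
    """Top-two voice count determination (step ST6).

    totals:    terminal number -> total voice count after step ST5
    previous:  terminal held in the previous-speaker storage memory 11
    threshold: speaker identification counter 5e
    """
    if not totals or all(c == 0 for c in totals.values()):
        return previous                       # no valid voice: keep the previous speaker
    # Rank by count, highest first; equal counts go to the lower terminal number.
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    first = ranked[0]
    if len(ranked) > 1:
        second = ranked[1]
        # Close race and the runner-up spoke last time: keep the previous speaker.
        if totals[first] - totals[second] < threshold and previous == second:
            return second
    return first

after_st5 = {1: 40, 2: 43, 3: 31, 4: 0}       # FIG. 4 after step ST5
print(decide_top_two(after_st5, previous=1, threshold=10))  # 1 (43 - 40 = 3 < 10)
print(decide_top_two(after_st5, previous=3, threshold=10))  # 2
```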

【００６２】[0062]

Next, in step ST7, the top-two voice count determination unit 10 checks whether the video conference terminal determined to be the current speaker is the same as the previous speaker's terminal. If it is not, the unit instructs the video switching/distribution unit 3 to distribute the video of the terminal determined to be the speaker to the other video conference terminals.

【００６３】[0063]

Steps ST1 to ST7 are repeated. This realizes a speaker determination system in which the speaker determination counter 5a of the speaker determination parameters 5 serves as the interval for video switching by speaker determination.

【００６４】[0064]

As described above, this embodiment provides a speaker determination system that switches among the video of a plurality of participating video conference terminals in a multipoint communication control device or the like. The system comprises the following elements, configured on the speaker determination unit:
- speaker determination parameters that store the parameters for speaker determination;
- a control processing unit that issues speaker determination instructions based on those parameters;
- a voice counting unit that counts voice detection results from the video conference terminals;
- a voice count memory that stores the voice count results;
- a valid voice determination unit that, using the valid voice counter (voice threshold), clears voice count results below the threshold from the voice count memory;
- a previous-speaker storage memory that stores the previous speaker;
- a top-two voice count determination unit that finds, from the voice count memory, the two video conference terminals with the largest voice count values. When the difference between their voice counts is less than the speaker identification counter (speaker identification threshold) and either of them is the previous speaker, it makes that terminal the speaker. Otherwise, it makes the first-ranked video conference terminal the speaker.

【００６５】[0065]

The speaker determination system according to this embodiment is characterized in particular by two elements. The first is the previous-speaker storage memory that stores the previous speaker. The second is the top-two voice count determination unit. This unit finds the two video conference terminals with the highest voice detection frequencies and, based on the difference between those frequencies, determines the speaker. The speaker is either the previous speaker's terminal or the terminal with the largest voice detection amount. The unit computes the difference in voice detection amount between the top two terminals. When that difference is less than the speaker identification threshold, it compares them with the previous speaker stored in the previous-speaker storage memory. Consequently, when the first- and second-ranked terminals differ only slightly and the second-ranked terminal is the previous speaker, that terminal is determined to be the speaker. The result is a speaker determination system that does not switch the video in that situation.

【００６６】[0066]

Example 2. In Example 1, the top-two voice count determination unit 10 takes the two video conference terminals with the largest voice count values as candidate terminals. The selection is not limited to the top two terminals, however. Any predetermined number of terminals may be used as candidate terminals, for example the top three or the top four.

【００６７】[0067]

For example, when the top three video conference terminals in FIG. 4 are used as candidate terminals, terminals 1, 2 and 3 are selected. Whether each difference is within the threshold is then judged in the same way, using the value 10 of the speaker identification counter 5e. In this example, terminal 3 shows 31 and terminal 2 shows 43, a difference of 12, which is larger than 10. Terminal 3 is therefore not selected as the current speaker terminal, even if it is the previous speaker terminal stored in the previous-speaker storage memory 11. A terminal that trails the largest total voice count of the current speaker determination period by the threshold set in the speaker identification counter 5e or more is excluded in this way. This applies even when that terminal is stored in the previous-speaker storage memory. Removing such terminals selects a terminal better suited to be the speaker.

【００６８】[0068]

Example 3. In Examples 1 and 2, the speaker identification counter 5e limits when the previous speaker is kept. If the voice counter values of the candidate terminals differ by the predetermined threshold or more, the terminal stored in the previous-speaker storage memory is not made the current speaker terminal, even if it is among the candidates. Alternatively, the threshold of the speaker identification counter 5e may be left unused. In that variant, whenever the terminal stored in the previous-speaker storage memory is among the terminals selected as candidates, that terminal is simply made the current speaker terminal.
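Examples 2 and 3, like the fifth invention, generalize the rule to k candidates with an optional threshold. The sketch below is hypothetical, as are the function name and the dict representation. The strict "less than" comparison is an assumption taken from Example 1; the fifth invention states "equal to or less than". Passing `threshold=None` models Example 3.

```python
from typing import Dict, Optional

def decide_top_k(totals: Dict[int, int], previous: Optional[int], k: int,
                 threshold: Optional[int] = None) -> Optional[int]:
    """Choose the speaker among the top-k candidates, favouring the previous one."""
    ranked = sorted(totals, key=lambda t: (-totals[t], t))  # ties: lower number first
    if not ranked:
        return previous
    leader, candidates = ranked[0], ranked[:k]
    if previous in candidates:
        gap = totals[leader] - totals[previous]
        # Example 3 skips the threshold test; Examples 1 and 2 require a small gap.
        if threshold is None or gap < threshold:
            return previous
    return leader

fig4 = {1: 40, 2: 43, 3: 31, 4: 0}                         # FIG. 4 after step ST5
print(decide_top_k(fig4, previous=3, k=3, threshold=10))  # 2 (43 - 31 = 12)
print(decide_top_k(fig4, previous=3, k=3))                # 3 (Example 3)
```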

【００６９】[0069]

Example 4. FIG. 5 is a block diagram showing another embodiment of the present invention. In the figure, components with the same reference numerals as in FIG. 1 are the same or corresponding parts, and their description is omitted.

【００７０】[0070]

Reference numeral 6 denotes a voice counting unit. On instruction from the control processing unit 4, it detects the presence or absence of voice from the voice addition/distribution unit 2. When voice is present, it increments by 1 the total voice counters 77a0 to 77n0 of the voice count memory 7. It also increments by 1 the zone voice counters 77a1 to 77am, ..., 77n1 to 77nm for the zone specified by the control processing unit 4.

Reference numeral 12 denotes a rear zone voice determination unit. On instruction from the control processing unit 4, it searches the zone voice counters 77a1 to 77am, ..., 77n1 to 77nm of the voice count memory 7 for video conference terminals whose counters are not 0. The search starts from the last zone voice counters, 77am to 77nm. If two or more video conference terminals are found in the same zone, the unit instructs the top-two voice count determination unit 10 to perform speaker determination. If only one video conference terminal is found, the unit makes that terminal the speaker.

【００７１】[0071]

Next, FIG. 6 is a diagram for explaining the zones in this embodiment. In FIG. 6, C1, C2, C3 and C4 are speaker determination periods. Z1, Z2, Z3, Z4 and Z5 within each speaker determination period are zones. In this example, the speaker determination period (2 seconds) is divided into five zones. One zone is therefore 400 ms and contains 10 voice detection times. A zone is thus created by dividing the speaker determination period and contains a plurality of voice detection times.
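The zone arithmetic of FIG. 6 can be checked directly. The 40 ms detection interval is derived here from the stated figures rather than quoted from the text. The resulting 50 detections per period agree with the 50 repetitions mentioned for step ST4.

$$
T_{\mathrm{zone}} = \frac{2000\ \mathrm{ms}}{5} = 400\ \mathrm{ms}, \qquad
T_{\mathrm{det}} = \frac{400\ \mathrm{ms}}{10} = 40\ \mathrm{ms}, \qquad
5 \times 10 = 50\ \text{detections per period}
$$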

【００７２】[0072]

Next, the operation will be described. FIG. 7 is a flowchart showing the flow of the operation. In step ST1, before performing speaker determination, the control processing unit 4 instructs the voice counting unit 6 to clear all of the following counters in the voice count memory 7 to 0: the total voice counters 77a0 to 77n0 and the zone voice counters 77a1 to 77am, ..., 77n1 to 77nm.

【００７３】[0073]

Next, in step ST2, the control processing unit 4 waits until the voice detection time 5b of the speaker determination parameters 5 has elapsed. In step ST3, it notifies the voice counting unit 6 of the zone position and instructs it to perform voice counting.

【００７４】[0074]

The voice counting unit 6 checks, via the voice addition/distribution unit 2, whether voice is detected for each video conference terminal. When voice is detected, it increments by 1 two counters of the corresponding terminal in the voice count memory 7: the total voice counter (77a0 to 77n0), and the zone voice counter for the current zone position (77a1 to 77am, ..., 77n1 to 77nm).

【００７５】[0075]

For example, if voice is detected five times in zone Z1 of speaker determination period C1 shown in FIG. 6, the zone 1 voice counter 77a1 of terminal 1 shown in FIG. 2 shows the value 5. Similarly, if voice is detected four times in zone Z2, the zone 2 voice counter 77a2 becomes 4, as shown in FIG. 2.

[0076] In step ST4, the control processing unit 4 repeats steps ST2 to ST4 until the number of voice count operations reaches the speaker determination counter 5a of the speaker determination parameter 5. The zone position starts at 1; in step ST4, the control processing unit 4 advances it by 1 each time another 5d voice count operations have been performed, where 5d is the zone voice counter of the speaker determination parameter 5. By the time step ST4 completes, the loop has been repeated 50 times. The value of the total voice counter 77a0 shown in FIG. 2 equals the sum of the zone voice counters 77a1 to 77am for zones 1 to m.

[0077] FIG. 8 shows an example of the state after step ST4 completes: the total voice counter values of terminals 1 to 4 and their voice counter values for zones 1 to 5.

[0078] Next, in step ST5, the control processing unit 4 notifies the effective voice determination unit 8 of the value of the effective voice counter 5c of the speaker determination parameter 5 and instructs it to perform effective voice determination. For each terminal whose total voice counter (77a0 to 77n0) in the voice count memory 7 is less than the effective voice counter 5c, the effective voice determination unit 8 clears that terminal's total voice counter and zone voice counters (77a1 to 77am, ..., 77n1 to 77nm) to 0. In the example of FIG. 8, the total voice counter of terminal 4 is smaller than the effective voice counter value (30), so its total voice counter and zone voice counters are cleared to 0.

[0079] Next, in step ST8, the control processing unit 4 notifies the rear zone voice determination unit 12 of the threshold of the speaker identification counter 5e of the speaker determination parameter 5 and instructs it to perform rear zone voice determination. Starting from the rearmost zone voice counters (77am to 77nm) in the voice count memory 7 and working forward one zone at a time, the rear zone voice determination unit 12 searches until it finds at least one video conference terminal whose zone voice counter (77a1 to 77am, ..., 77n1 to 77nm) is not 0. In the example of FIG. 8, terminals 1 and 3 have nonzero values in zone 5, so terminals 1 and 3 become candidate speaker terminals.

[0080] Next, in step ST9, if two or more video conference terminals were found, the rear zone voice determination unit 12 treats the speaker as undecided, notifies the top-two voice count determination unit 10 of the threshold of the speaker identification counter 5e of the speaker determination parameter 5, and instructs it to perform top-two voice count determination. Steps ST6 and ST7 operate in the same way as in the first embodiment. If exactly one video conference terminal was found, that terminal is determined to be the speaker, and it is stored in the previous speaker storage memory 11. If no video conference terminal was found, the terminal stored in the previous speaker storage memory 11 is determined to be the speaker.

[0081] In the example of FIG. 8, terminals 1 and 3 are the candidates, and the difference between their total voice counter values is smaller than the threshold (10) set in the speaker identification counter 5e. Therefore, if either terminal 1 or terminal 3 is the terminal stored in the previous speaker storage memory 11, that stored terminal is determined to be the current speaker terminal.

[0082] Operations such as steps ST1 to ST7 above are repeated, and the speaker determination system is realized with the speaker determination counter 5a of the speaker determination parameter 5 serving as the video switching interval for speaker determination.

[0083] As described above, this embodiment is characterized by a voice counting unit that counts the voice detection results from the video conference terminals separately for each of the zones into which the speaker determination period is divided, and a rear zone determination unit that searches for the speaker's video conference terminal starting from the rearmost zone.

[0084] The voice counting unit counts and stores the voice detection results from the video conference terminals for each zone, and the rear zone voice determination unit can search for the speaker's video conference terminal starting from the rearmost zone. This realizes a speaker determination system in which the video conference terminal whose voice was detected in the latter part of the speaker determination period is determined to be the speaker, and the video is switched to it.

[0085] According to this embodiment, because the voice counting unit, which counts the voice detection results from the video conference terminals for each zone, and the rear zone voice determination unit are added to the speaker determination unit, a speaker determination system is obtained in which the video conference terminal detected in the latter part of the speaker determination period is determined to be the speaker and the video is switched accordingly.

[0086] Embodiment 5. The above embodiment describes the case in which the previous speaker storage memory 11 stores the previous speaker terminal, but the previous speaker storage memory may be omitted, as shown in FIG. 9. The example of FIG. 9 is FIG. 5, described in the fourth embodiment, with the previous speaker storage memory 11 removed. In the example of FIG. 9, therefore, the rear zone voice determination unit 12 first searches for the speaker's video conference terminal starting from the rearmost zone, and when it lists multiple terminals as candidates, the top-two voice count determination unit 10 selects the one with the higher voice detection amount as the current speaker terminal. In this case, the top-two voice count determination unit does not use the threshold of the speaker identification counter 5e described above; it simply selects the terminal with the larger voice detection amount as the speaker terminal.

[0087] Embodiment 6. The above embodiments describe the case in which the top-two voice count determination unit 10 makes the final choice of speaker terminal, but the rear zone voice determination unit 12 may make the final choice instead, as shown in FIG. 10. In FIG. 10, the rear zone voice determination unit 12 searches for the speaker terminal starting from the rearmost zone and selects the terminal it finds as the speaker terminal. If multiple terminals are present in the same zone, the one with the lowest number is selected as the speaker terminal.

[0088] Embodiment 7. The sixth embodiment describes the case in which the rear zone voice determination unit 12 determines the speaker terminal from the rearmost zone. Alternatively, as shown in FIG. 11, the previous speaker terminal may be stored in the previous speaker storage memory 11; then, when the rear zone voice determination unit finds multiple terminals in the rearmost zone and they include the previous speaker terminal stored in the previous speaker storage memory 11, the previous speaker terminal may be made the current speaker terminal.

[0089] Embodiment 8. The sixth and seventh embodiments describe the case in which the rear zone voice determination unit 12 determines the speaker terminal from the rearmost zone. The search is not limited to the rearmost zone, however; the speaker terminal may be determined from several zones that include the rearmost zone. For example, the terminals determined to have voice in the rearmost zone and in the zone immediately before it, that is, in the last two zones, may be made candidate terminals.

[0090] Embodiment 9. The fourth embodiment describes the case in which, when the rear zone voice determination unit 12 detects multiple terminals in the rearmost zone, the top-two voice count determination unit 10 selects one terminal. In this embodiment, as shown in FIG. 12, the top-two voice count determination unit 10 first selects multiple candidate terminals, and the rear zone voice determination unit 12 then selects as the speaker terminal the candidate determined to have voice in the zone located furthest back. As described above, the top-two voice count determination unit 10 determines the two terminals with the highest voice detection amounts. The rear zone voice determination unit 12 then selects, as the speaker terminal, whichever of those two terminals was determined to have voice in the zone located furthest back. FIG. 12 shows the case without the previous speaker storage memory 11; however, the configuration of FIG. 12 may also include the previous speaker storage memory 11, with the rear zone voice determination unit 12 preferentially selecting the previous speaker terminal stored there as the current speaker terminal.

[0091] Embodiment 10. FIG. 13 is a block diagram showing an embodiment of the present invention. In the figure, components with the same reference numerals as in FIG. 1 are identical or equivalent, and their description is omitted.

[0092] Reference numeral 9 denotes a maximum voice count determination unit that operates in the same way as the conventional one; 13 denotes a video switching information memory in which the maximum voice count determination unit 9 stores the speaker and whether the video was switched; and 14 denotes a speaker fixed determination unit that, on instruction from the control processing unit 4, delays speaker determination by the speaker fixed time when the video switching information memory 13 indicates that the video was switched.

[0093] Next, the operation is described. FIG. 14 is a flowchart showing the flow of the operation. Steps ST1 to ST5 and step ST7 operate in the same way as in the first embodiment shown in FIG. 3, so their description is omitted.

[0094] In step ST11, the control processing unit 4 instructs the maximum voice count determination unit 9 to determine the speaker. The maximum voice count determination unit 9 determines the terminal with the largest count among the total voice counters 77a0 to 77n0 of the voice count memory 7 to be the speaker, and instructs the video switching distribution unit 3 to distribute the video of that video conference terminal to the other video conference terminals. At this time, the maximum voice count determination unit 9 looks up the previous speaker in the video switching information memory 13: if the current speaker differs from it, video switching is recorded as present; if they are the same, it is recorded as absent. The speaker and the video switching flag are then stored in the video switching information memory 13.

[0095] In step ST12, the control processing unit 4 notifies the speaker fixed determination unit 14 of the speaker fixed time 5f of the speaker determination parameter 5. The speaker fixed determination unit 14 checks the video switching flag in the video switching information memory 13, and if the video was switched, it waits in step ST13 until the speaker fixed time 5f has elapsed.

[0096] FIG. 15 shows an example in which the video fixing process ST13 is performed: as a result of speaker determination in speaker determination period C1, the speaker terminal switches from terminal 1 to terminal 2. FIG. 15(a) shows conventional terminal switching, and FIG. 15(b) shows terminal switching in this embodiment. When the terminal switches every speaker determination period (2 seconds), as shown in FIG. 15(a), this embodiment inserts the speaker fixed time 5f shown in FIG. 2 each time the terminal switches. Therefore, when the display switches from terminal 1 to terminal 2, the next interval is the normal period (2 seconds) plus the speaker fixed time (1 second), that is, 3 seconds, which eliminates the case of a terminal being displayed for only 2 seconds. Frequent switching between terminals at short intervals is unpleasant for viewers; extending the display of a newly shown video by a predetermined time removes this discomfort.

[0097] Operations such as steps ST1 to ST13 above are repeated, and the speaker determination system is realized with the video switching time for speaker determination set to the speaker determination counter 5a plus the speaker fixed time 5f of the speaker determination parameter 5 when the video has been switched, and to the speaker determination counter 5a alone when it has not.

[0098] That is, depending on the video switching flag stored in the video switching information memory, the speaker fixed determination unit creates a wait of fixed length when the video has been switched, which delays the start of the next speaker determination. This realizes a speaker determination system in which speaker determination does not switch the video again immediately after a switch.

[0099] According to this embodiment, because the video switching information memory and the speaker fixed determination unit are added to the speaker determination unit, speaker determination does not switch the video again immediately after a switch. This yields a speaker determination system in which viewers can clearly see the video from the speaker's video conference terminal.

[0100] Embodiment 11. FIG. 16 is a block diagram showing an embodiment of the present invention. In the figure, components with the same reference numerals as in FIG. 1 or FIG. 13 are identical or equivalent, and their description is omitted.

[0101] Reference numeral 15 denotes a speaker determination parameter change processing unit that, on instruction from the control processing unit 4, changes the speaker determination parameter 5 according to the video switching flag stored in the video switching information memory 13; 16 denotes the speaker determination default parameters, which are referenced by the speaker determination parameter change processing unit 15 and hold different speaker determination parameters as defaults depending on whether the video was switched. The speaker determination default parameters have the same format as the speaker determination parameter of FIG. 2 and differ from the speaker determination parameter shown in FIG. 2 in at least one value.

[0102] Next, the operation is described. FIG. 17 is a flowchart showing the flow of the operation. Steps ST1 to ST7 operate in the same way as those shown in FIG. 14, so their description is omitted.

[0103] In step ST12, the control processing unit 4 instructs the speaker determination parameter change processing unit 15 to change the default parameters. The speaker determination parameter change processing unit 15 checks the video switching flag in the video switching information memory 13. If the video was switched, in step ST14 it copies the "video switched" parameters of the speaker determination default parameters 16 into the speaker determination parameter 5; if not, in step ST15 it copies the "video not switched" parameters of the speaker determination default parameters 16 into the speaker determination parameter 5. As mentioned above, the parameters stored in the speaker determination default parameters 16 have the same structure as the speaker determination parameter 5, but the "video switched" parameters set at least the speaker determination counter 5a and the voice detection time 5b longer than the "video not switched" ones, for example in order to lengthen the video switching interval after a switch.

[0104] Operations such as steps ST1 to ST15 above are repeated, realizing a speaker determination system in which the video switching time for speaker determination is longer after a video switch than when no switch occurred.

[0105] As described above, the speaker determination system of this embodiment provides a video switching information memory that stores whether speaker determination produced a speaker change, speaker determination default parameters that store default values of the speaker determination parameter, and a speaker determination parameter change processing unit that, depending on whether there was a speaker change, copies default parameters with different speaker determination periods into the speaker determination parameter.

[0106] When the video switching information memory indicates a speaker change, the speaker determination parameter change processing unit copies the default parameters with the longer speaker determination period into the speaker determination parameter, which slows video switching by speaker determination; when there is no change, it copies the default parameters with the shorter speaker determination period, which speeds up video switching by speaker determination. This realizes a speaker determination system that can do both.

[0107] According to this embodiment, because the video switching information memory, the speaker determination default parameters, and the speaker determination parameter change processing unit are added to the speaker determination unit, video switching by speaker determination is slow after a video switch and fast when there was no switch. This yields a speaker determination system that switches smoothly between the videos of the speakers' video conference terminals.

[0108] Embodiment 12. The tenth embodiment describes shifting the speaker determination period by a fixed time, but the next speaker determination period may instead be lengthened by multiplying the speaker determination period by a factor of 1 or more. For example, with a factor of 2, the next speaker determination period is 4 seconds; in this case, each of the speaker determination parameters shown in FIG. 2 is doubled. Alternatively, the factor may be 1 or less, for example 0.5. In this case, the speaker determination period is halved to 1 second, and the other speaker determination parameter values shown in FIG. 2 are likewise multiplied by 0.5.

[0109] Embodiment 13. The tenth embodiment describes inserting a fixed time, but the first 1 second of the next speaker determination period may instead be removed. If the same terminal has been displayed continuously, it is acceptable for video switching to happen sooner, so shortening the speaker determination period makes it possible to detect a new speaker immediately. That is, whereas in the worst case one would otherwise have to wait 2 seconds before the next speaker is displayed, this embodiment shortens the speaker determination period so that, for example, the next speaker can be displayed within 1 second even in the worst case. Even when the next speaker is displayed within 1 second, the speaker shown until then has already been displayed continuously for some time, so frequent image switching does not make the picture hard to watch.

[0110] Embodiment 14. The eleventh embodiment describes the case in which the speaker determination default parameters have the same structure as the speaker determination parameter shown in FIG. 2. The speaker determination default parameters need not have the same structure, however; they may contain only the few parameters that are to be changed. For example, to change the value of the effective voice counter 5c, it suffices to prepare several default values to substitute for it. Increasing the effective voice counter value increases the number of terminals judged to be silent; conversely, decreasing it increases the number of terminals judged to have voice. Also, the embodiments above include cases that use thresholds and cases that do not; setting thresholds such as the zone voice counter 5d and the speaker identification counter 5e to 0 gives exactly the same behavior as not using them. Parameters with these thresholds set to 0 may therefore be held as default parameters, and these values may be rewritten depending on whether the video was switched.

[0111] Embodiment 15. The above embodiments describe video switching for video conference terminals, but the speaker determination of this invention is not limited to video conference terminals; it can be applied to any case in which voice signals are input from multiple terminals and the speaker is determined from those input signals. Also, the above embodiments describe determining only one speaker terminal, but two or four speaker terminals may be selected when, for example, two or four videos can be displayed.

【０１１２】[0112]

[Effects of the Invention]
As described above, the present invention eliminates the problem of the speaker switching frequently when voice signals are input from multiple terminals and the speaker is determined among them. Therefore, when the determined speaker is used, for example, to switch video, a speaker determination system with natural video switching can be obtained.

[Brief description of drawings]

【図１】この発明の一実施例による話者判定方式を示す
ブロック図である。FIG. 1 is a block diagram showing a speaker determination method according to an embodiment of the present invention.

【図２】この話者判定パラメータ及び音声カウントメモ
リの構成例を示す図である。FIG. 2 is a diagram showing a configuration example of a speaker determination parameter and a voice count memory.

【図３】この発明の一実施例の動作の流れを示すフロー
チャート図である。FIG. 3 is a flowchart showing a flow of operation of the embodiment of the present invention.

【図４】この発明の一実施例のトータル音声カウンタの
値を示す図である。FIG. 4 is a diagram showing values of a total voice counter according to an embodiment of the present invention.

【図５】この発明の一実施例を示すブロック図である。FIG. 5 is a block diagram showing an embodiment of the present invention.

【図６】この発明の一実施例を示すゾーンの概念を示す
図である。FIG. 6 is a diagram showing a concept of zones showing an embodiment of the present invention.

【図７】この発明の一実施例の動作の流れを示すフロー
チャート図である。FIG. 7 is a flow chart diagram showing the flow of operation of an embodiment of the present invention.

【図８】この発明の一実施例のトータル音声カウンタと
ゾーン毎の音声カウンタの値を示す図である。FIG. 8 is a diagram showing values of a total voice counter and a voice counter for each zone according to an embodiment of the present invention.

【図９】この発明の一実施例による話者判定方式を示す
ブロック図である。FIG. 9 is a block diagram showing a speaker determination method according to an embodiment of the present invention.

【図１０】この発明の一実施例による話者判定方式を示
すブロック図である。FIG. 10 is a block diagram showing a speaker determination method according to an embodiment of the present invention.

【図１１】この発明の一実施例による話者判定方式を示
すブロック図である。FIG. 11 is a block diagram showing a speaker determination method according to an embodiment of the present invention.

FIG. 12 is a block diagram showing a speaker determination method according to an embodiment of the present invention.

FIG. 13 is a block diagram showing an embodiment of the present invention.

FIG. 14 is a flowchart showing the flow of operation of an embodiment of the present invention.

FIG. 15 is a diagram showing the operation when a fixed time is inserted, according to an embodiment of the present invention.

FIG. 16 is a block diagram showing an embodiment of the present invention.

FIG. 17 is a flowchart showing the flow of operation of an embodiment of the present invention.

FIG. 18 is a diagram showing the system configuration of a video conference system.

FIG. 19 is a configuration diagram of a multipoint communication control device.

FIG. 20 is a configuration diagram of a voice addition and distribution unit.

FIG. 21 is a block diagram showing a conventional speaker determination method.

FIG. 22 is a configuration diagram of the speaker determination parameters and the voice count memory in a conventional speaker determination method.

FIG. 23 is a diagram showing the speaker determination period and the voice detection time in a conventional speaker determination method.

[Explanation of symbols]

1 Speaker determination unit
4 Control processing unit
5 Speaker determination parameters
6 Voice counting unit
7 Voice count memory
8 Effective voice determination unit
10 Top-two voice count determination unit
11 Previous speaker storage memory
12 Rear-zone voice determination unit
13 Video switching information memory
14 Speaker fixing determination unit
15 Speaker determination parameter change processing unit
16 Speaker determination default parameters

Claims

1. A speaker determination method in which voice signals from a plurality of terminals are input, the presence or absence of voice is detected in each voice detection period of a predetermined length based on the input voice signals, and the speaker terminal, that is, the terminal that becomes the speaker, is determined for each speaker determination period made up of a plurality of voice detection periods, the speaker determination method comprising: previous speaker storage means for storing, as a previous terminal, the terminal determined to be the speaker in the previous speaker determination period; and speaker determination means for selecting, in the current speaker determination period, a plurality of candidate terminals for the speaker terminal by comparing predetermined parameters according to a predetermined criterion, and, when the previous terminal stored by the previous speaker storage means is among the selected candidate terminals, determining the previous terminal to be the current speaker terminal.

2. The speaker determination method according to claim 1, wherein the speaker determination means uses, as the predetermined criterion for selecting the candidate terminals, a voice frequency obtained from the number of voice detection periods, among the plurality of voice detection periods in the speaker determination period, in which voice is detected.

3. The speaker determination method according to claim 1, wherein the speaker determination means uses, as the predetermined criterion for selecting the candidate terminals, a voiced position obtained from the positions, within the speaker determination period, of the voice detection periods in which voice is detected among the plurality of voice detection periods in that speaker determination period.

4. The speaker determination method according to claim 1, wherein the speaker determination means uses, as the predetermined criterion for selecting the candidate terminals, both a voice frequency obtained from the number of voice detection periods in which voice is detected among the plurality of voice detection periods in the speaker determination period, and a voiced position obtained from the positions of those voice detection periods within the speaker determination period.

5. The speaker determination method according to claim 1, 2, 3, or 4, wherein the speaker determination means determines the previous terminal to be the speaker terminal when the difference between the parameters of the previous terminal and of the candidate terminal, under the predetermined criterion, is less than a predetermined threshold value.

6. The speaker determination method according to claim 1, 3, 4, or 5, wherein the speaker determination period is divided into zones, and the speaker terminal is determined zone by zone according to a predetermined criterion.

7. A speaker determination method in which voice signals from a plurality of terminals are input, the presence or absence of voice is detected in each voice detection period of a predetermined length based on the input voice signals, and the speaker terminal is determined for each speaker determination period made up of a plurality of voice detection periods, wherein the speaker terminal is determined based on the positions, within the speaker determination period, of the voice detection periods in which voice is detected among the plurality of voice detection periods in that speaker determination period.

8. A speaker determination method in which voice signals from a plurality of terminals are input, the presence or absence of voice is detected in each voice detection period of a predetermined length based on the input voice signals, and the speaker terminal is determined for each speaker determination period made up of a plurality of voice detection periods, the speaker determination method comprising interval changing means for changing the interval at which the speaker is determined, depending on whether the terminal determined to be the speaker in the previous speaker determination period differs from the terminal determined to be the speaker in the current speaker determination period.

9. The speaker determination method according to claim 8, wherein the interval changing means comprises speaker fixing means for delaying the start time of the next speaker determination period by a predetermined time when the terminal determined to be the speaker in the previous speaker determination period differs from the terminal determined to be the speaker in the current speaker determination period.

10. The speaker determination method according to claim 8, further comprising speaker determination period storage means for storing the speaker determination period, wherein the interval changing means comprises speaker determination period changing means for changing the speaker determination period stored in the speaker determination period storage means.

11. The speaker determination method according to claim 8, wherein the interval changing means sets, as the next speaker determination period, a period obtained by multiplying the speaker determination period by a predetermined factor.

12. The speaker determination method according to claim 8, wherein, when the terminal determined to be the speaker in the previous speaker determination period is the same as the terminal determined to be the speaker in the current speaker determination period, a period obtained by deleting part of the speaker determination period is set as the next speaker determination period.

13. A speaker determination method in which voice signals from a plurality of terminals are input, the presence or absence of voice is detected in each voice detection period of a predetermined length based on the input voice signals, and the speaker terminal is determined for each speaker determination period made up of a plurality of voice detection periods, the speaker determination method comprising: speaker determination parameter storage means for storing various parameters used for speaker determination; default parameter storage means for storing parameters whose values differ from those of the parameters stored in the speaker determination parameter storage means; and parameter changing means for changing the values of the parameters stored in the speaker determination parameter storage means, based on the values of the parameters stored in the default parameter storage means, depending on whether the terminal determined to be the speaker in the previous speaker determination period differs from the terminal determined to be the speaker in the current speaker determination period.
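Claims 9 and 12 change when, and over how long, the next determination is made, depending on whether the speaker changed. The following is a minimal Python sketch under stated assumptions, not the patent's implementation. Time is measured in voice detection periods. The names `next_schedule`, `hold_delay`, `trim` and `min_len` are hypothetical. Reading "deleting part of the speaker determination period" as shortening it by `trim` periods is also an assumption.

```python
from typing import Optional, Tuple


def next_schedule(
    previous: Optional[str],
    current: Optional[str],
    period_len: int,
    hold_delay: int,
    trim: int,
    min_len: int = 1,
) -> Tuple[int, int]:
    """Return (delay before the next period starts, length of the next period).

    All quantities are counted in voice detection periods (an assumption).
    """
    if current != previous:
        # Claim 9: the speaker changed, so delay the start of the next
        # speaker determination period by a fixed time; the newly selected
        # speaker therefore cannot be replaced during that time.
        return hold_delay, period_len
    # Claim 12: same speaker, so the next period is the current one with
    # part of it removed (assumed here: `trim` periods, never below min_len).
    return 0, max(min_len, period_len - trim)
```

For example, `next_schedule("A", "B", 10, 5, 3)` inserts a 5-period hold after a switch, while `next_schedule("A", "A", 10, 5, 3)` shortens the next period to 7. Claim 11's variant, multiplying the period by a predetermined factor, could replace either branch. The claim does not state when it applies, so it is not modeled here.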