JPH024095A - Speaker deciding system for inter-multispot video conference

Info

Publication number: JPH024095A
Application number: JP15238588A
Authority: JP
Inventors: Michiaki Matsuura; 松浦　道明
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1988-06-22
Filing date: 1988-06-22
Publication date: 1990-01-09
Anticipated expiration: 2013-06-04
Also published as: JP2760804B2

Abstract

PURPOSE: To switch the picture to the talker with a simple circuit by feeding a pulse to a forward protection circuit whenever voice is detected in the talker's speech and to a backward protection circuit whenever silence is detected, and by controlling a flip-flop (FF) with the outputs of the two protection circuits. CONSTITUTION: A voiced/unvoiced determiner 2 classifies the talker's voice input 1 as voiced or unvoiced at every prescribed period. For a voiced decision, voiced decision output 3 sends a "1" pulse to an N1-ary counter 5 serving as the forward protection circuit; for an unvoiced decision, unvoiced decision output 4 sends a "1" pulse to an N2-ary counter 6 serving as the backward protection circuit. During speech the voiced decision output 3 continues, and when N1 pulses have arrived in succession the counter 5 counts over, drives the set input of the FF 13, and produces a talker decision output 14. When the input becomes unvoiced after a site has once become the talker and N2 "1" pulses arrive in succession, the counter 6 counts over, resets the FF 13, and produces a non-talker decision output 15. The talker/non-talker decision is thus made a prescribed time after voice or silence is detected.

Description

[Detailed Description of the Invention] (Technical Field to Which the Invention Pertains) The present invention relates to a speaker determination method for multipoint video conferences that detects audio levels and automatically switches the picture to the speaker's site from among multiple sites.

(Prior Art) Conventionally, speaker determination in this type of multipoint video conference system has worked as follows. A voice detector is provided for the audio input of each site and detects whether the input is voiced or silent. When voice is detected, a hangover time corresponding to the duration of the voiced interval is appended to the voiced decision, and the result is output as voice presence information. This hangover-extended voice presence information is fed to a speaker determiner provided for each audio input, which, at fixed intervals, compares it against a speaker determination threshold time to decide speaker or non-speaker.

In this method, speaker/non-speaker determination is performed at fixed time intervals, so the start time of speaker/non-speaker determination is not synchronized with the start time of the voiced decision, and the time required for speaker determination varies. In addition, four time parameters must be controlled (the voiced duration, the hangover time, the speaker determination threshold time, and the fixed speaker/non-speaker determination interval), which makes the circuit complex.

(Object of the Invention) The object of the present invention is to eliminate the drawbacks of the prior art described above by halving the number of parameters to be controlled, thereby simplifying the control circuit and making control easier.

(Structure of the Invention) (Features of the Invention and Differences from the Prior Art) For speaker identification screen switching control, the present invention comprises a voiced/unvoiced determiner, forward and backward protection circuits, and a set-reset flip-flop. It is characterized in that speaker/non-speaker determination is performed with the time required for the determination by the forward and backward protection circuits held at a constant value.

In the prior art, speaker/non-speaker determination is performed at fixed time intervals and the time required for the determination varies. The present invention differs in that it stabilizes the determination by making the time required for speaker/non-speaker determination a constant value.
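As a rough illustration of why the decision time becomes constant, suppose the voiced/unvoiced decision is made once every period T (a symbol introduced here for illustration, not used in the source), and let N_1 and N_2 denote the stage counts of the forward and backward protection counters. Under these assumptions the delays from the onset of voice or silence to the corresponding decision are approximately:

```latex
t_{\text{speaker}} \approx N_1 \, T, \qquad t_{\text{non-speaker}} \approx N_2 \, T
```

Both delays depend only on the two counter lengths and the decision period, which is consistent with the reduction from four control parameters to two.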

(Embodiment) The figure is a block diagram of one embodiment of the method of the present invention. It shows an example configuration of speaker identification screen switching control in a system that connects multiple video conference rooms by video lines and audio lines to hold a multipoint video conference.

In the figure, 1 is the voice input, 2 is the voiced/unvoiced determiner, 3 is the voiced decision output, 4 is the unvoiced decision output, 5 is an N1-ary counter serving as the forward protection circuit, 6 is an N2-ary counter serving as the backward protection circuit, 7 is the output of the N1-ary counter, 8 is the output of the N2-ary counter, 9 is the reset input of the N1-ary counter 5, 10 is the reset input of the N2-ary counter 6, 11 is the set input of the set-reset flip-flop 13, 12 is the reset input of the same flip-flop, 14 is the speaker decision output, and 15 is the non-speaker decision output.

With this configuration, a site is not judged to be a speaker as soon as voice is detected; it is judged to be a speaker only when the number of voiced decisions exceeds a fixed value (the number of forward protection stages). Likewise, once silence is detected, the site is not immediately judged to be a non-speaker; it is judged to be a non-speaker only when the number of silent decisions exceeds a fixed value (the number of backward protection stages). This is explained below.

Suppose now that a speaker's voice is applied to the voice input 1.

In the voiced/unvoiced determiner 2, the voice power of the speaker's voice is calculated at a fixed period and compared with a voice threshold level set in the amplitude direction to decide voiced or unvoiced. If the input is voiced, a "1" pulse is output on the voiced decision output 3; if it is unvoiced, a "1" pulse is output on the unvoiced decision output 4.
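The voiced/unvoiced decision described above can be sketched as follows. This is a minimal illustration only: the source does not say how the voice power is computed, so mean-square power over a fixed-length frame is assumed here, and the names `frame_power`, `voiced_flags`, `frame_len`, and `threshold` are hypothetical.

```python
from typing import List, Sequence


def frame_power(samples: Sequence[float]) -> float:
    """Mean-square power of one frame (an assumed definition of 'voice power')."""
    return sum(s * s for s in samples) / len(samples)


def voiced_flags(samples: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Return one voiced (True) / unvoiced (False) decision per complete frame.

    frame_len stands in for the fixed decision period and threshold for the
    voice threshold level; both are illustrative parameters.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Assumption: a frame exactly at the threshold counts as voiced.
        flags.append(frame_power(frame) >= threshold)
    return flags
```

A True entry corresponds to a "1" pulse on voiced decision output 3 and a False entry to a "1" pulse on unvoiced decision output 4.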

While the speaker is talking, the voiced decision output 3 is a succession of "1" pulses, and the unvoiced decision output 4 remains at "0" throughout this period. Accordingly, the N1-ary counter 5 counts over only after N1 "1" pulses have been input. At that point a "1" pulse is applied to the set input terminal 11 of the set-reset flip-flop 13, and at the same time the N2-ary counter 6 is reset.

Meanwhile, because the unvoiced decision output 4 remains at "0", the N2-ary counter 6 does not count up. The reset input terminal 12 of the set-reset flip-flop 13 therefore stays at "0", and the speaker decision output 14 becomes "1", that is, the speaker output. If a short sound such as noise is applied to the voice input 1 during this process, the N1-ary counter 5 does not reach its number of stages and cannot count over. The set input 11 therefore cannot become "1", and no speaker decision is made.

Once the speaker state has been entered, if the input becomes silent, "1" pulses are input continuously to the N2-ary counter 6, which counts up. When the count reaches N2, the counter counts over; only then does the reset input 12 become "1", the set-reset flip-flop 13 is reset, and the non-speaker decision output 15 results. Accordingly, if the silent period is short, the N2-ary counter 6 does not count over and no non-speaker decision is made.
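The behavior of the two counters and the flip-flop can be modeled in software as a rough sketch. The class name `SpeakerDecider` and its attributes are hypothetical. The source states explicitly only that the overflow of counter 5 resets counter 6; the sketch additionally assumes that each counter is cleared whenever the opposite decision pulse arrives, so that only consecutive pulses are counted. That is a simplification made here, not a detail confirmed by the source.

```python
class SpeakerDecider:
    """Software model of counters 5 and 6 driving set-reset flip-flop 13."""

    def __init__(self, n1: int, n2: int) -> None:
        if n1 < 1 or n2 < 1:
            raise ValueError("n1 and n2 must be positive")
        self.n1 = n1             # forward protection stages (N1)
        self.n2 = n2             # backward protection stages (N2)
        self.voiced_run = 0      # state of N1-ary counter 5
        self.silent_run = 0      # state of N2-ary counter 6
        self.is_speaker = False  # state of flip-flop 13

    def step(self, voiced: bool) -> bool:
        """Feed one voiced/unvoiced decision and return the speaker state."""
        if voiced:
            self.silent_run = 0  # assumption: opposite pulse clears counter 6
            self.voiced_run += 1
            if self.voiced_run >= self.n1:
                # Counter 5 counts over: set the flip-flop and wrap to zero.
                self.voiced_run = 0
                self.is_speaker = True
        else:
            self.voiced_run = 0  # assumption: opposite pulse clears counter 5
            self.silent_run += 1
            if self.silent_run >= self.n2:
                # Counter 6 counts over: reset the flip-flop and wrap to zero.
                self.silent_run = 0
                self.is_speaker = False
        return self.is_speaker
```

With illustrative values N1 = 3 and N2 = 4, a two-frame noise burst leaves the state at non-speaker, and a pause of two silent frames during speech leaves the state at speaker; the state returns to non-speaker only after four consecutive silent frames.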

(Effects of the Invention) As explained above, the present invention applies to a speaker screen switching method for multiple sites in which, based on the speaker/non-speaker decision outputs described above, the screen is switched to a site when that site alone produces a speaker, and the screen of the site shown until then continues to be displayed as the speaker site screen when speakers arise at several sites or at none. By choosing a suitable value for the forward protection time (N1), short noise is not treated as a speaker, so short noise from other sites does not cause erroneous switching. By choosing a suitable value for the backward protection time (N2), a circuit is obtained that does not judge pauses in a speaker's talk as non-speaker while that speaker's site is selected as the speaker screen. These effects are the same as those obtained with the prior art. However, the number of control parameters is reduced from the conventional four to two, namely those associated with the voiced and unvoiced decision outputs, so the circuit configuration is simpler.
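The screen switching rule stated in the effects section (switch when exactly one site has a speaker, otherwise keep the current picture) can be sketched as below. The function name `select_site` and the dictionary representation of the per-site decision outputs are assumptions made for illustration.

```python
from typing import Dict, Hashable


def select_site(current: Hashable, speaker_flags: Dict[Hashable, bool]) -> Hashable:
    """Choose which site's picture to show.

    speaker_flags maps each site to its speaker decision output (True means
    the site is currently judged to be a speaker).
    """
    talkers = [site for site, is_speaker in speaker_flags.items() if is_speaker]
    if len(talkers) == 1:
        # Exactly one site has a speaker: switch to that site.
        return talkers[0]
    # Several sites or no site has a speaker: keep showing the current site.
    return current
```

For example, with sites "A", "B", and "C" and "A" currently displayed, a lone speaker at "B" moves the picture to "B", while simultaneous speakers at "B" and "C" leave it on "A".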

[Brief explanation of the drawing]

The figure is a block diagram showing the configuration of one embodiment of the present invention.
1 ... voice input
2 ... voiced/unvoiced determiner
3 ... voiced decision output
4 ... unvoiced decision output
5 ... N1-ary counter (forward protection circuit)
6 ... N2-ary counter (backward protection circuit)
7 ... N1-ary counter output
8 ... N2-ary counter output
9 ... N1-ary counter reset input
10 ... N2-ary counter reset input
11 ... set-reset flip-flop set input
12 ... set-reset flip-flop reset input
13 ... set-reset flip-flop
14 ... speaker decision output
15 ... non-speaker decision output
Patent applicant: Nippon Telegraph and Telephone Corporation

Claims

[Claims]

A speaker determination method for a multipoint video conference system in which multiple video conference rooms are connected by video lines and audio lines to hold a multipoint video conference, characterized in that: a voice detector is provided for the audio input of each site; after voice or silence is detected for each audio input, the voice presence information is input to a forward protection circuit of the kind used in the frame synchronization circuit of a multiplex converter, and the silence information is likewise input to a backward protection circuit; and a flip-flop is controlled by the outputs of these two protection circuits, whereby, even when voice is detected, the site is not immediately judged to be a speaker but is judged to be a speaker when the voice presence information exceeds a fixed number of forward protection stages, and, once silence is detected, the site is not immediately judged to be a non-speaker but is judged to be a non-speaker only when the number of silent detections exceeds a fixed number of backward protection stages.