JPH024095A - Speaker deciding system for inter-multispot video conference - Google Patents
Speaker deciding system for inter-multispot video conference
- Publication number
- JPH024095A (application JP15238588A)
- Authority
- JP
- Japan
- Prior art keywords
- speaker
- voice
- talker
- input
- silence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
Description
[Detailed Description of the Invention]
(Technical Field to Which the Invention Pertains)
The present invention relates to a speaker determination method for multipoint video conferencing that detects audio levels and automatically switches to the site of the current speaker from among multiple sites.
(Prior Art)
Conventionally, speaker determination in this kind of multipoint video conference system has worked as follows. A voice detector is provided for each site's audio input and detects whether the input is voiced or silent. When voice is detected, a hangover time corresponding to the duration of the voiced interval is appended to the voiced decision, and the result is output as voiced information. This voiced information, with the hangover time attached, is fed to a speaker determiner provided for each audio input. At fixed intervals, the determiner classifies the input as speaker or non-speaker by comparing the voiced information against a speaker determination threshold time.
Because this method performs speaker/non-speaker determination at fixed time intervals, the start time of the speaker/non-speaker determination is asynchronous with the start time of the voiced decision, and the time required for speaker determination varies. In addition, there are many time parameters to control, four in all (the voiced duration, the hangover time, the speaker determination threshold time, and the fixed speaker/non-speaker determination interval), which makes the circuit complex.
(Object of the Invention)
The object of the present invention is to eliminate the drawbacks of the conventional technique described above by halving the number of parameters to be controlled, thereby simplifying the control circuit and making control easier.
(Structure of the Invention)
(Features of the Invention and Differences from the Prior Art)
For speaker-identification screen switching control, the present invention comprises a voiced/silent determiner, forward and backward protection circuits, and a set-reset flip-flop. It is characterized in that speaker/non-speaker determination is performed with the time required for the determination by the forward and backward protection circuits held at a constant value.
In the prior art, speaker/non-speaker determination is performed at fixed time intervals and the time required for a determination varies. The present invention differs in that it stabilizes the determination by making the time required for speaker/non-speaker determination a constant value.
(Embodiment)
The figure is a block diagram of one embodiment of the method of the present invention. It shows an example configuration for speaker-identification screen switching control in a system that links multiple video conference rooms by video lines and audio lines to hold a multipoint video conference.
In the figure, 1 is the audio input, 2 is the voiced/silent determiner, 3 is the voiced decision output, 4 is the silent decision output, 5 is an N1-ary counter serving as the forward protection circuit, 6 is an N2-ary counter serving as the backward protection circuit, 7 is the output of the N1-ary counter, 8 is the output of the N2-ary counter, 9 is the reset input of N1-ary counter 5, 10 is the reset input of N2-ary counter 6, 11 is the set input of set-reset flip-flop 13, 12 is the reset input of the same flip-flop, 14 is the speaker determination output, and 15 is the non-speaker determination output.
With this configuration, a site is not judged to be a speaker immediately when voice is detected; it is judged to be a speaker only once the voiced count exceeds a fixed value (the number of forward protection stages). Likewise, once silence is detected, the site is not immediately judged to be a non-speaker; it is judged to be a non-speaker only once the silent count exceeds a fixed value (the number of backward protection stages). This is explained below.
Suppose that a participant's voice arrives at audio input 1. The voiced/silent determiner 2 computes the audio power at a fixed period and decides voiced or silent by comparing it against a voice threshold level set in the amplitude direction. If the input is voiced, it outputs a "1" pulse on voiced decision output 3; if silent, it outputs a "1" pulse on silent decision output 4.
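The voiced/silent decision can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the circuit of the embodiment: the function name `voiced_silent_decisions`, the use of mean squared amplitude as the "audio power", the frame length, and the `>=` comparison are all hypothetical, since the text says only that power is computed at a fixed period and compared against a threshold level.

```python
def voiced_silent_decisions(samples, frame_len, threshold):
    """Return one boolean per complete frame: True = voiced, False = silent.

    Assumption: "audio power" is taken as the mean squared amplitude of
    each frame; the source does not specify how power is computed.
    """
    decisions = []
    # Walk over the input in non-overlapping frames (the "fixed period").
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Assumption: a frame exactly at the threshold counts as voiced.
        decisions.append(power >= threshold)
    return decisions
```

In terms of the figure, each `True` would correspond to a "1" pulse on voiced decision output 3 and each `False` to a "1" pulse on silent decision output 4.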
While the participant is speaking, voiced decision output 3 carries a continuous train of "1" pulses, and silent decision output 4 stays at "0" throughout this period. Consequently, N1-ary counter 5 overflows only after N1 "1" pulses have been input, at which point a "1" pulse is applied to set input terminal 11 of set-reset flip-flop 13 and N2-ary counter 6 is reset.
Meanwhile, because silent decision output 4 stays at "0", N2-ary counter 6 does not count up, so reset input terminal 12 of set-reset flip-flop 13 remains at "0" and speaker determination output 14 becomes "1", that is, the speaker output. During this process, if only a short sound such as noise arrives at audio input 1, N1-ary counter 5 does not reach its number of counter stages and therefore cannot overflow. Set input 11 thus cannot become "1", and no speaker determination is made.
Once in the speaker state, if the input becomes silent, "1" pulses are fed continuously to N2-ary counter 6, which counts up. When the count reaches N2 the counter overflows, and only then does reset input 12 become "1"; set-reset flip-flop 13 is reset, giving non-speaker determination output 15. Hence, if the silent period is short, N2-ary counter 6 does not overflow and no non-speaker determination is made.
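The behaviour of the two counters and the flip-flop can be sketched as a small state machine. This is a hedged software analogue, not the circuit itself. The class name `SpeakerDeterminer` and its attribute names are hypothetical. The text states that overflow of the N1-ary counter resets the N2-ary counter, but it does not spell out what drives reset inputs 9 and 10 otherwise; the sketch assumes that each counter is also cleared by a pulse of the opposite kind, so that only consecutive voiced (or silent) decisions accumulate.

```python
class SpeakerDeterminer:
    """Forward/backward protection modelled with two counters and one flag."""

    def __init__(self, n1, n2):
        self.n1 = n1              # forward protection stages (voiced count)
        self.n2 = n2              # backward protection stages (silent count)
        self.voiced_count = 0     # stands in for N1-ary counter 5
        self.silent_count = 0     # stands in for N2-ary counter 6
        self.is_speaker = False   # stands in for set-reset flip-flop 13

    def step(self, voiced):
        """Consume one voiced/silent decision and return the speaker flag."""
        if voiced:
            self.silent_count = 0          # assumption: voiced pulse clears counter 6
            self.voiced_count += 1
            if self.voiced_count >= self.n1:
                # Counter 5 overflows: set the flip-flop, reset counter 6.
                self.voiced_count = 0
                self.silent_count = 0
                self.is_speaker = True
        else:
            self.voiced_count = 0          # assumption: silent pulse clears counter 5
            self.silent_count += 1
            if self.silent_count >= self.n2:
                # Counter 6 overflows: reset the flip-flop.
                self.silent_count = 0
                self.is_speaker = False
        return self.is_speaker
```

With `n1=3` and `n2=4`, for example, a two-frame burst of sound does not set the speaker flag, and a three-frame pause after the flag has been set does not clear it, which matches the noise rejection and pause bridging described above.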
(Effects of the Invention)
As explained above, the present invention applies to a speaker screen switching method for multiple sites in which, based on the speaker/non-speaker determination outputs described above, the screen is switched to the relevant site when a speaker arises at a single site, and the site shown up to that point continues to be displayed as the speaker-site screen when speakers arise at multiple sites or at no site. By setting the forward protection time (N1) to a predetermined value, short bursts of noise are not treated as a speaker, so brief noise from other sites does not cause erroneous switching. By setting the backward protection time (N2) to a predetermined value, a circuit can be realized that does not judge pauses in a participant's speech as non-speaker while that participant's site is selected as the speaker screen. These effects are the same as those obtained with the prior art. However, because the number of control parameters is reduced from the conventional four to two, namely the control parameters for the voiced and silent decision outputs, the circuit configuration becomes simpler.
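The screen switching rule stated in this section (switch when exactly one site has a speaker, otherwise keep the current screen) can be sketched as follows. The function name `select_screen`, the dictionary representation of the per-site speaker flags, and the site identifiers are hypothetical choices made for illustration only.

```python
def select_screen(current_site, speaker_flags):
    """Pick the site to display from per-site speaker determinations.

    speaker_flags maps a site identifier to True (speaker) or False
    (non-speaker). If exactly one site is a speaker, switch to it;
    if several sites or no site is a speaker, keep the current screen.
    """
    active = [site for site, flag in speaker_flags.items() if flag]
    if len(active) == 1:
        return active[0]
    return current_site
```

Under this sketch, simultaneous speech at two sites leaves the display unchanged rather than alternating between them, which is the hold behaviour the text describes.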
The figure is a block diagram showing the configuration of one embodiment of the present invention.
1: audio input
2: voiced/silent determiner
3: voiced decision output
4: silent decision output
5: N1-ary counter (forward protection circuit)
6: N2-ary counter (backward protection circuit)
7: N1-ary counter output
8: N2-ary counter output
9: N1-ary counter reset input
10: N2-ary counter reset input
11: set-reset flip-flop set input
12: set-reset flip-flop reset input
13: set-reset flip-flop
14: speaker determination output
15: non-speaker determination output
Patent applicant: Nippon Telegraph and Telephone Corporation
Claims (1)
In a system that links multiple video conference rooms by video lines and audio lines to hold a multipoint video conference, a speaker determination method for a multipoint video conference system characterized in that: a voice detector is provided for each site's audio input and detects voiced or silent for that input; the voiced information is input to the forward protection circuit of a frame synchronization circuit of the kind used in multiplex conversion equipment, and the silent information is likewise input to the backward protection circuit; and a flip-flop is controlled by the outputs of these two protection circuits, whereby a site is not judged to be a speaker immediately when voice is detected but is judged to be a speaker when the voiced information exceeds a fixed value, the number of forward protection stages, and, once judged to be a speaker, is not judged to be a non-speaker immediately when silence is detected but is judged to be a non-speaker only when the silent count exceeds a fixed value, the number of backward protection stages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63152385A JP2760804B2 (en) | 1988-06-22 | 1988-06-22 | Speaker determination device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63152385A JP2760804B2 (en) | 1988-06-22 | 1988-06-22 | Speaker determination device |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH024095A (en) | 1990-01-09 |
JP2760804B2 (en) | 1998-06-04 |
Family
ID=15539366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP63152385A Expired - Fee Related JP2760804B2 (en) | 1988-06-22 | 1988-06-22 | Speaker determination device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2760804B2 (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS4857522A (en) * | 1971-11-19 | 1973-08-13 | ||
JPS4924021A (en) * | 1972-06-23 | 1974-03-04 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0423588A (en) * | 1990-05-17 | 1992-01-27 | Fujitsu Ltd | Multi-point conference system |
JP2008005028A (en) * | 2006-06-20 | 2008-01-10 | Nippon Telegr & Teleph Corp <Ntt> | Video voice conference system and terminal |
JP4531013B2 (en) * | 2006-06-20 | 2010-08-25 | 日本電信電話株式会社 | Audiovisual conference system and terminal device |
JP2008056204A (en) * | 2006-09-04 | 2008-03-13 | Advics:Kk | Brake device for vehicle |
JP2008141505A (en) * | 2006-12-01 | 2008-06-19 | Nippon Telegr & Teleph Corp <Ntt> | Speaker selector, speaker selection method, speaker selection program, and recording medium which records the same |
Also Published As
Publication number | Publication date |
---|---|
JP2760804B2 (en) | 1998-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8463600B2 (en) | System and method for adjusting floor controls based on conversational characteristics of participants | |
US7698141B2 (en) | Methods, apparatus, and products for automatically managing conversational floors in computer-mediated communications | |
JP2835483B2 (en) | Voice discrimination device and sound reproduction device | |
JP2586827B2 (en) | Receiver | |
JPH024095A (en) | Speaker deciding system for inter-multispot video conference | |
JP2910417B2 (en) | Voice music discrimination device | |
JP2822897B2 (en) | Videoconferencing system speaker identification device | |
EP1453287B1 (en) | Automatic management of conversational groups | |
JP2666317B2 (en) | Video Screen Switching Method for Multipoint Video Conference System | |
JP2990051B2 (en) | Voice recognition device | |
JP2005086363A (en) | Calling device | |
JP2007096555A (en) | Voice conference system, terminal, talker priority level control method used therefor, and program thereof | |
JP3047259B2 (en) | Speaker automatic selection device of electronic conference system | |
JPH1188513A (en) | Voice processing unit for inter-multi-point communication controller | |
JP2003060792A (en) | Device for recording and reproducing a plurality of voices | |
JPH07226930A (en) | Communication conference system | |
JP2962343B2 (en) | Conference call system with audio signal level control function | |
JP2000049948A5 (en) | Voice call device, voice call system, and voice call method | |
JP2001024800A (en) | Voice conference system | |
JP2010050512A (en) | Voice mixing device, and program | |
Sakamoto et al. | Effect of speed difference between time-expanded speech and talker's moving image on word or sentence intelligibility. | |
KR100329145B1 (en) | Method of automatically changing call sound time | |
JPH0527795A (en) | Speech recognition device | |
WO1996007177A1 (en) | Apparatus and method for detecting speech in the presence of other sounds | |
JPH05153082A (en) | Detecting equipment for background noise power |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
LAPS | Cancellation because of no payment of annual fees |