JPH03252258A

JPH03252258A - Directivity reproducing device

Info

Publication number: JPH03252258A
Application number: JP2047320A
Authority: JP
Inventors: Miwako Doi (土井美和子)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-03-01
Filing date: 1990-03-01
Publication date: 1991-11-11

Abstract

PURPOSE: To enable a listener to perceive a sound image at the correct position even when the sound source moves. A directivity reproducing part reproduces the directivity of the input voice based on the relative positional relationship between the speaker and the listener, which is calculated from the position information of both, and the voice is then output in simulated form. CONSTITUTION: Position information of the speaker and the listener in a conference room or the like is read out from a position storage part 103 via a control part 106, and a phase calculating part 104 calculates the relative positional relationship between the speaker and the listener. Based on the calculation result, a directivity reproducing part 105 processes the speaker's voice from an input part 101 to reproduce its directivity. A simulated voice from which the speaker's position can be accurately recognized is then output from a loudspeaker 102. Because the listener perceives an accurate sound image regardless of movement of the sound source, the conference or the like proceeds smoothly.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Field of Application) The present invention relates to a directivity reproducing device that reproduces sound toward a specific direction. More particularly, it relates to a directivity reproducing device that allows a listener to recognize the position of a speaker even when there are multiple participants, as in a conference.

(Prior Art) At international conferences and similar events, participants hear the speaker's voice through loudspeakers, headphones, and the like. The audio heard through these loudspeakers and headphones is not limited to the speaker's own voice. When the speaker and the participants use different languages, an interpreter renders the speaker's remarks into a language the participants understand, and the participants hear the interpreter's voice. In addition, recent video conferencing systems use wired or wireless communication so that participants can hold a conference without gathering in one place, confirming the other parties through a TV screen and audio.

In such video conferencing systems as well, the speaker's voice reaches the participants' ears through audio output devices such as loudspeakers or headphones. This is especially a problem when the listener hears the speaker's voice through headphones. The reproduced voice ignores the distance and direction between the speaker and the listener entirely, so the listener cannot immediately determine the speaker's position. The reason is that, with headphones, the sound image formed by the voice generated at the speaker's (that is, the sound source's) position is localized at the back of the listener's head.

FIG. 5 shows a circuit according to Japanese Patent Application Laid-Open No. 63-176100, which uses the sound image to reproduce the normal sound-source position so that the listener can recognize it. As shown in the figure, the left signal 502 and the right signal 503 output from the amplifier output circuit 501 are delayed by about 600 ± 100 μs in delay circuits 541 and 542, respectively. Attenuation circuits 551 and 552 then attenuate these outputs by about 20 ± 20%. The resulting signals 561 and 562 are mixed with the left signal 502 and the right signal 503 in mixing circuits 571 and 572, and the mixed signals are input to the headphones 508. The delay time and the amount of attenuation can be varied to suit the listener's preference. With this circuit, each of the left and right signals is mixed with a delayed and attenuated version of the opposite signal. This allows the sound image to be reproduced at its normal position, where the listener can recognize it.
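The prior-art crossfeed principle described above can be sketched in Python: each channel is delayed, attenuated, and mixed into the opposite channel. This is a minimal illustration under stated assumptions, not the circuit of JP-A-63-176100. The signals are plain lists of samples, and the 48 kHz sample rate, 600 μs delay, and gain of 0.8 are hypothetical defaults.

```python
def crossfeed(left, right, sample_rate=48_000, delay_s=600e-6, gain=0.8):
    """Mix into each channel a delayed, attenuated copy of the opposite channel.

    All parameter defaults are illustrative assumptions, not values from the source.
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    d = round(delay_s * sample_rate)  # delay in whole samples (29 at 48 kHz)
    n = len(left)

    def delayed(x):
        # Shift right by d samples, padding the start with silence; length stays n.
        return [0.0] * min(d, n) + list(x[: max(n - d, 0)])

    dl, dr = delayed(left), delayed(right)
    out_left = [l + gain * r for l, r in zip(left, dr)]    # left + delayed right
    out_right = [r + gain * l for r, l in zip(right, dl)]  # right + delayed left
    return out_left, out_right
```

The delay and gain are fixed for each call, so nothing in this sketch follows a moving source. This is the limitation the following paragraph points out.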

However, this circuit is designed for a single pair of headphones and cannot be used for multiple sound sources. In addition, the delay time and attenuation are fixed within a certain range, so sound reproduction that reflects the sound-source position is insufficient. The delay time and attenuation can also be set outside this range, but the listener must make this adjustment. As a result, the sound image cannot be fully reproduced when the sound source moves.

(Problems to be Solved by the Invention) As described above, when the sound source moves, the conventional circuit cannot reproduce the sound image at its correct position for the listener to recognize.

The present invention has been made to solve the above problem. Its object is to provide a directivity reproducing device that allows a listener to always recognize a sound image at its correct position, even when the sound source moves.

[Structure of the Invention]

(Means for Solving the Problems) To achieve the above object, the directivity reproducing device of the present invention comprises the following units:
- an input unit for inputting the speaker's voice;
- a position storage unit for storing position information of the speaker and the listeners;
- a phase calculation unit that calculates the relative positional relationship between the speaker and each listener based on the position information in the position storage unit;
- a directivity reproduction unit that reproduces the directivity of the voice so that the listener can recognize the speaker's position from the voice; and
- an output unit for outputting the voice.

Based on the calculation result of the phase calculation unit, the directivity reproduction unit causes the output unit to reproduce the input voice in simulated form. In a conference or the like, one input unit and one output unit are provided for each attendee. An attendee who wishes to speak uses the input unit to indicate that intention, and instructions such as ending the conference are also given through the input unit. The phase calculation unit detects the speaking instruction from the input unit and calculates the relative positional relationship between the speaker and each listener.

For this calculation, the position storage unit stores the position information of the speaker and the listeners. This position information consists of the number assigned to each attendee, the attendee's position, and the direction the attendee's face points. Each participant's position is expressed by the distance and phase (angle) from the origin of virtual coordinate axes. Each participant's direction is stored relative to a freely settable reference direction. Based on the calculation result of the phase calculation unit, the directivity reproduction unit produces the voice for each listener by delaying, amplifying, attenuating, or otherwise processing the speaker's voice from the input unit. The control unit sends this result to each listener's output unit.

(Operation) In the directivity reproducing device of the present invention configured as described above, the phase calculation unit detects the speaker.

The position storage unit holds the position information of the speaker and the listeners. From this information, the phase calculation unit calculates the relative positional relationship between the speaker and each listener.

The directivity reproduction unit processes the speaker's voice based on this calculation result, and the control unit sends the processed voice to the listeners through their respective output units. As a result, the listeners can always accurately recognize the speaker's position (that is, the position of the sound source). In international conferences and video conferences, for example, accurate recognition of the speaker helps the conference proceed smoothly.

(Embodiment) The present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of the present invention.

The following description uses a video conference as an example. Several people are at separate locations, headphones serve as the output units, and each person's voice is input through an input unit (for example, a microphone).

In FIG. 1, the input unit 101 is used to input the speaker's voice and instructions such as ending the conference. A microphone, keyboard, or similar device is normally used.

The output unit 102 outputs audio. It may be a wall-mounted loudspeaker or the like, but this embodiment uses headphones. Because this embodiment considers multiple listeners, there are multiple input units 101 and output units 102.

The position storage unit 103 stores the positions of the speaker and the listeners. In a conference, when one participant is the speaker (that is, the person speaking), the other participants are listeners. Once that speaker has finished and another participant speaks, the former speaker becomes a listener. In other words, any conference participant can be either a speaker or a listener. For this reason, the position storage unit 103 stores the position information of all conference participants.

Consider the virtual conference room shown in FIG. 2(a). It is used to define the coordinate axes for the information stored in the position storage unit 103. These coordinate axes are virtual. They may correspond to an actual conference room, or, when participants are in different places as in a video conference, a particular conference room may serve as a model. The origin can therefore be set at any position. The coordinate axes need only cover a range in which at least the positions of the speaker and the listeners can be set.

In FIG. 2(a), conference participants 1 to 4 and 5 to 8 are seated at desks 204 in a conference room 201, on either side of a partition 202. The participants can see one another's faces through their respective display units 203, such as CRT displays. FIG. 2(b) shows an example of this display on display unit 203-2: participants 1 to 4 appear, as seen by participants 5 to 8. The image is captured, for example, by a CCD (Charge Coupled Device) camera (not shown) mounted on top of display unit 203-1 and sent sequentially to display unit 203-2. The participants hear the speaker's voice through headphones.

Participant position information is explained next, using participant 1 as an example. The position information of participant 1 is expressed by the distance $\gamma_1$ from the origin of the coordinate axes to participant 1 and the phase $\theta_1$. It also includes the direction participant 1 faces when seated. FIG. 2(c) shows part of the correspondence between FIG. 2(b) and FIG. 2(a). The direction indicated by 251 in FIG. 2(c) is the direction of participant 1, and 252 in FIG. 2(c) denotes the headphones mentioned above.

The position information also records the direction perpendicular to the straight line connecting the participant's two ears. The participant position $(\gamma_1, \theta_1)$ only tells where each participant is in the conference room. Without the facing direction, it would be impossible to determine which way each participant's ears point and in which direction the voice is emitted. This information is also what allows the origin of the coordinate axes to be set freely. For example, instead of placing the conference room in the (+, +) quadrant of the coordinate axes as in FIG. 2(a), a circular conference room may be defined with the origin at its center. In that case, too, the position alone does not reveal which way each participant's sound input (the ears) and sound output (the mouth) face; the participant's direction is also needed. Therefore, the direction perpendicular to the line connecting the participant's ears must be known, and from it the positions of the listener's ears and mouth can be approximately determined. Participant 1 has been used as the example here, but the same information is recorded for every conference participant.

FIG. 3 shows an example of the format in which the position information is stored in the position storage unit 103. The participant number 301 identifies a conference participant and is assigned to each participant individually. Therefore, if there are n participants in total, there are n items of position information. Each item consists of the participant number 301, the participant position 302, and the participant direction 303, described below. The position storage unit 103 stores this position information. The participant position 302 consists of the distance $\gamma$ from the origin of the coordinate axes described above and the phase $\theta$. Thus the participant position 302 of participant 1 is $(\gamma_1, \theta_1)$.

Likewise, the participant position 302 of participant $i$ is $(\gamma_i, \theta_i)$. The participant direction 303 is the direction perpendicular to the straight line connecting the participant's two ears, as indicated by 251 in FIG. 2(c). This direction uses the same notation as the phase $\theta$ of the participant position 302: counterclockwise angles about the participant are positive (+), and clockwise angles are negative (−). The "+" sign may be omitted and is not written in this embodiment.
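The FIG. 3 record can be modeled as a small data class. This is a sketch under assumptions. The field names, the use of radians, and the helper `build_table` are hypothetical, chosen only to mirror the participant number 301, the participant position 302 (distance and phase), and the participant direction 303 with its counterclockwise-positive convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionInfo:
    """One row of the FIG. 3 storage format (field names are hypothetical)."""
    participant_number: int  # 301: number assigned to each participant
    distance: float          # 302: gamma, distance from the origin
    phase: float             # 302: theta, radians, counterclockwise positive
    direction: float         # 303: normal to the ear-to-ear line,
                             #      radians, counterclockwise positive

    def cartesian(self):
        """Convert the polar participant position (gamma, theta) to (x, y)."""
        return (self.distance * math.cos(self.phase),
                self.distance * math.sin(self.phase))


def build_table(rows):
    """Index rows by participant number: n participants give n entries."""
    table = {row.participant_number: row for row in rows}
    if len(table) != len(rows):
        raise ValueError("participant numbers must be unique")
    return table
```

Keying the table by participant number matches how the speaker is later identified: the input unit that issued the speaking instruction maps to a participant number, and that number maps to a position row.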

That is, $\theta_{ii}$ in the participant direction 303 indicates the direction perpendicular to the straight line connecting the ears of participant $i$. The other entries, up to $\theta_{nn}$, follow the same notation. In this embodiment, the position information is stored in the position storage unit 103 before the conference starts, but the participant direction 303 can be updated as the conference proceeds. This accommodates changes of speaker. Such updates are permitted only if the storage processing in the position storage unit 103 and the processing in the phase calculation unit 104 and the directivity reproduction unit 105 (described later) are fast enough that the updates do not disturb that processing.

Here, "not disturbing the processing" means that the processing keeps pace with the speaker's speech. Whether it does depends on the time needed to extract or input the participant's direction and the time needed to store this information in the position storage unit 103. Each participant enters the position information for the position storage unit 103 manually through the input unit 101. Alternatively, participant positions can be extracted by image recognition and stored in the position storage unit 103. When a hypothetical conference room is assumed, participants can also be placed at arbitrary positions in that room, and those positions used as the position information.

The phase calculation unit 104 detects the speaker among the conference participants. A participant who wishes to speak enters an instruction to that effect through the input unit 101, and the phase calculation unit 104 detects this input. Alternatively, with a microphone that supports VOX (Voice eXchange, i.e., voice-activated) operation, the speaker can be detected from the microphone's operating state. Once a speaker is detected, the phase calculation unit uses the position information in the position storage unit 103 to calculate, for each listener, the relative distance and phase with respect to the speaker.
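VOX-style speaker detection can be sketched as a level threshold applied to each participant's microphone frame. This is an illustrative assumption, not the detection method of the embodiment. The RMS measure, the 0.05 threshold, and the rule of picking the loudest active microphone are all hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_speaker(frames, threshold=0.05):
    """Return the participant number whose microphone frame most strongly
    exceeds the VOX threshold, or None if nobody is speaking.

    frames: dict mapping participant number -> list of samples in [-1, 1].
    threshold: hypothetical RMS trigger level.
    """
    levels = {pid: rms(f) for pid, f in frames.items()}
    active = {pid: lvl for pid, lvl in levels.items() if lvl >= threshold}
    if not active:
        return None
    return max(active, key=active.get)
```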

The directivity reproduction unit 105 reproduces the directivity of the voice so that listeners can recognize the speaker's position from the voice. It uses the relative distance and phase between the speaker and each listener, as calculated by the phase calculation unit 104, and reproduces the directivity of the voice based on this result.

The control unit 106 controls the entire device. It is a conventional control unit with one added function: sending the voice that the directivity reproduction unit 105 has reproduced for each listener to that listener's output unit 102.

FIG. 4 is a flowchart explaining the operation of the control unit 106. The control unit 106 starts operating once two conditions are met: the position information of the conference participants has been stored in the position storage unit 103, and an instruction to start the conference has been entered from the input unit 101.

First, the position information of all conference participants is sent from the position storage unit 103 to the phase calculation unit 104 (step S401). Next, the phase calculation unit 104 detects the speaker (step S402). When a participant who wishes to speak enters an instruction to that effect through the input unit 101, the voice input device (such as a microphone) serving as that input unit 101 starts operating, and start instructions from the other microphones are no longer accepted. The phase calculation unit 104 determines which input unit 101 issued this start instruction and looks up the corresponding position information, thereby identifying the speaker.

The phase calculation unit 104 then retrieves the position information of listener $i$ from the stored listener position information (step S403). It calculates the relative distance and phase between the speaker identified in step S402 and listener $i$ (step S404); this calculation is described later. The calculation result is sent to the directivity reproduction unit 105 (step S405). The directivity reproduction unit 105 uses the result to reproduce the directivity of the speaker's voice for listener $i$ (step S406); this reproduction processing is also described later. Finally, the reproduced result is sent to the output unit 102 of the listener (step S407).

The above processing (steps S404 to S407) is performed for all listeners (steps S408 and S409). When the reproduced voice has been sent to all listeners, the control unit determines whether the conference has ended (step S410). If a participant has entered an instruction to end the conference through the input unit 101, processing ends. Otherwise, the process returns to step S402 and detects the speaker again.
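The FIG. 4 loop can be sketched with each unit passed in as a plain function. The function names and signatures are hypothetical stand-ins for the phase calculation unit 104, the directivity reproduction unit 105, and the output units 102. The step numbers in the comments show where each line falls in the flowchart.

```python
def run_conference(positions, detect_speaker, calc, reproduce, send,
                   conference_ended):
    """Sketch of the FIG. 4 control flow (all callables are hypothetical).

    positions:        participant number -> position record (handed over in S401)
    detect_speaker:   () -> (speaker number, voice)                   S402
    calc:             (speaker position, listener position) -> result S404
    reproduce:        (voice, result) -> processed voice              S406
    send:             (listener number, processed voice) -> None      S407
    conference_ended: () -> bool                                      S410
    """
    while True:
        speaker, voice = detect_speaker()                 # S402
        for listener, pos in positions.items():           # S408/S409: all listeners
            if listener == speaker:
                continue                                  # the speaker is not a listener
            result = calc(positions[speaker], pos)        # S403/S404, passed on in S405
            send(listener, reproduce(voice, result))      # S406, S407
        if conference_ended():                            # S410
            return
```

Passing the units in as functions keeps the loop independent of how distance, phase, and sound pressure are actually computed. Any of the calculations described below can be plugged in.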

Next, step S404 in FIG. 4 is described: the processing by which the phase calculation unit 104 calculates the distance and phase difference between the speaker and listener $i$. Suppose, for example, that the position information of the speaker, listener A, and listener B is as follows.

Speaker: participant number 0, participant position $(\gamma_0, \theta_0)$, participant direction $\theta_{00}$
Listener A: participant number A, participant position $(\gamma_A, \theta_A)$, participant direction $\theta_{AA}$
Listener B: participant number B, participant position $(\gamma_B, \theta_B)$, participant direction $\theta_{BB}$

This position information is assumed to be stored in the position storage unit 103. The control unit 106 sends it to the phase calculation unit 104, which calculates the distance and phase difference between the speaker and listener A, and between the speaker and listener B, as follows.

Speaker and listener A: the distance $\gamma_{0A}$ between the speaker and listener A is

$$\gamma_{0A} = \sqrt{(\gamma_A \sin\theta_A - \gamma_0 \sin\theta_0)^2 + (\gamma_A \cos\theta_A - \gamma_0 \cos\theta_0)^2}$$

and the phase difference $\theta_{0A}$ is

$$\theta_{0A} = \theta_{00} - \theta_{AA}$$

Speaker and listener B: the distance $\gamma_{0B}$ between the speaker and listener B is

$$\gamma_{0B} = \sqrt{(\gamma_B \sin\theta_B - \gamma_0 \sin\theta_0)^2 + (\gamma_B \cos\theta_B - \gamma_0 \cos\theta_0)^2}$$

and the phase difference $\theta_{0B}$ is

$$\theta_{0B} = \theta_{00} - \theta_{BB}$$

Step S406 in FIG. 4, the reproduction processing of the directivity reproduction unit 105, is described next in detail. This processing starts when the phase calculation unit 104 sends its calculation result in step S405. The directivity reproduction unit 105 conveys the distance and direction between the speaker and each listener by attenuating, amplifying, delaying, or otherwise processing the speaker's voice. The case described here conveys distance and direction by changing the sound pressure. First, to convey distance, the volume $V_{ij}$ is reduced, for example, in inverse proportion to the square of the distance:

$$V_{ij} = \frac{\alpha}{\gamma_{ij}^{2}} \qquad (1)$$

Here, $\gamma_{ij}$ in equation (1) is the distance between speaker $i$ and listener $j$, and $\alpha$ is a proportionality constant. It can be determined from the size of the conference room or similar factors, but it can also be set freely. Listeners cannot hear the voice if the volume is too low and find it unpleasant if it is too high, so the range of sound pressure can also be adjusted.
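The speaker-listener distance can be computed from the stored polar positions, and a volume that falls with the square of that distance can then be derived from it. The sketch below shows both steps. The default `alpha` and the clamping limits `lo` and `hi` are hypothetical values. They stand in for the freely settable proportionality constant and the adjustable sound-pressure range.

```python
import math


def relative_distance(r_s, th_s, r_l, th_l):
    """Distance between speaker (r_s, th_s) and listener (r_l, th_l).

    Positions are polar (distance, phase in radians) about the common origin.
    """
    return math.hypot(r_l * math.sin(th_l) - r_s * math.sin(th_s),
                      r_l * math.cos(th_l) - r_s * math.cos(th_s))


def distance_gain(distance, alpha=1.0, lo=0.05, hi=1.0):
    """Volume inversely proportional to distance squared, clamped to [lo, hi].

    alpha, lo and hi are hypothetical settings.
    """
    if distance <= 0:
        return hi  # co-located: use the loudest allowed level
    return min(hi, max(lo, alpha / distance ** 2))
```

The clamp keeps a very close speaker from becoming uncomfortably loud and a distant speaker from becoming inaudible, which is the purpose of the adjustable sound-pressure range.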

To convey direction, the left and right sound pressures of the output unit 102 are varied according to the phase difference, for example by means of equation (2).

Equation (2) makes the left and right sound pressures equal when the speaker is directly in front of the listener. It raises the right-ear sound pressure when the speaker is to the listener's right and raises the left-ear sound pressure when the speaker is to the listener's left. This processing is performed based on the position information of the speaker and the listener.
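A constant-power (sine/cosine) pan has the properties just described. It is shown here as an assumed stand-in, not as the formula of the embodiment. The function takes the speaker's bearing relative to the listener's facing direction, counterclockwise positive, matching the participant-direction convention. How that bearing is derived from the phase difference is likewise an assumption.

```python
import math


def left_right_gains(azimuth):
    """Constant-power pan: equal gains straight ahead, a louder right ear for
    a speaker on the right, a louder left ear for a speaker on the left.

    azimuth: speaker bearing relative to the listener's facing direction, in
    radians, counterclockwise positive (positive means to the listener's left).
    """
    p = -math.sin(azimuth)                   # -1 = fully left, +1 = fully right
    angle = (p + 1.0) * math.pi / 4          # maps p onto 0 .. pi/2
    return math.cos(angle), math.sin(angle)  # (left gain, right gain)
```

The squared gains always sum to 1, so overall loudness stays constant as the bearing changes and distance remains controlled by the volume law alone. A sine-based bearing cannot distinguish a speaker in front from one directly behind; the stated properties cover only front, left, and right.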

As described above, in this embodiment participants hear the speaker's voice altered according to their distance and direction from the speaker. This is equivalent to listening normally without headphones, so a sense of presence is maintained. At international conferences, a sense of presence can likewise be maintained by varying the sound pressure of the simultaneous interpreter's voice according to the distance and direction from the speaker. Furthermore, if individual participants can be identified with a personal authentication device such as an IC card, audio can be sent only to specific participants. Private messages or simultaneous interpretation can therefore be delivered without other participants hearing them.

The output unit 102 of this embodiment is not limited to headphones. It may also be a miniature in-ear loudspeaker, a wall-mounted loudspeaker, or the like.

In short, the invention can be implemented with various modifications without departing from its gist.

[Effect of the Invention]

As described above, according to the present invention, the listener hears a voice based on the position information of the speaker and the listener and can therefore accurately recognize the speaker's position.

This helps the conference proceed smoothly.

It can also create a sense of presence in video conferences and the like.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the present invention. FIG. 2(a) is a conceptual diagram showing an example of a video conference in the embodiment. FIG. 2(b) is an example of the screen display in FIG. 2(a). FIG. 2(c) shows part of the correspondence between FIG. 2(a) and FIG. 2(b). FIG. 3 shows an example of the storage format of the position information stored in the position storage unit of FIG. 1. FIG. 4 is a flowchart showing an example of the operation of the control unit used in the embodiment. FIG. 5 is a conceptual diagram of a conventional circuit that uses a sound image to reproduce, and allow recognition of, a sound source that does not move.

101: input unit; 102: output unit; 103: position storage unit; 104: phase calculation unit; 105: directivity reproduction unit

Claims

[Claims]
(1) A directivity reproducing device comprising: an input section for inputting a speaker's voice; a position storage section for storing position information of the speaker and a listener; a phase calculation section for calculating the relative positional relationship between the speaker and the listener based on the position information in the position storage section; a directivity reproduction section for reproducing the directivity of the voice so that the listener can recognize the speaker's position from the voice; and an output section for outputting the voice; wherein the directivity reproduction section causes the output section to reproduce the input voice in simulated form based on the calculation result of the phase calculation section.
(2) The directivity reproducing device according to claim 1, wherein input sections and output sections are provided in a number corresponding to the speakers and listeners present.
(3) The directivity reproducing device according to claim 2, wherein the directivity reproduction section reproduces the directivity of the voice for the output section of each listener.
(4) The directivity reproducing device according to claim 3, wherein a control section sequentially sends the results of the directivity reproduction section to the output section of each listener.
(5) The directivity reproducing device according to claim 1, wherein the position storage section stores the positions of the speaker and the listener as a distance and a phase from the origin of coordinate axes.
(6) The directivity reproducing device according to claim 1, wherein the position storage section further stores the direction perpendicular to the straight line connecting both ears of the listener.
(7) The directivity reproducing device according to claim 2, wherein the output section uses headphones that output audio from the left and right.
(8) The directivity reproducing device according to claim 2, wherein the position storage section stores only position information of the speaker.
(9) The directivity reproducing device according to claim 2, wherein the position storage section stores position information of all attendees at a meeting or the like.
(10) The directivity reproducing device according to claim 2, wherein the input section includes a keyboard or the like for giving instructions such as requesting to speak and ending the meeting.
(11) The directivity reproducing device according to claim 10, wherein the phase calculation section detects a speaker based on a speaking instruction from the input section.