JPH10243494A - Method and device for recognizing direction of face - Google Patents

Method and device for recognizing direction of face

Info

Publication number
JPH10243494A
JPH10243494A (application JP9047866A)
Authority
JP
Japan
Prior art keywords
microphones
difference
pair
face
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP9047866A
Other languages
Japanese (ja)
Inventor
Shinji Tamoto
真詞 田本
Takeshi Kawabata
豪 川端
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP9047866A priority Critical patent/JPH10243494A/en
Publication of JPH10243494A publication Critical patent/JPH10243494A/en
Pending legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of recognition processing by arranging a pair of microphones so that their projections onto a vertical plane facing the speaker are offset from each other, and determining, from the difference in the microphones' output powers, toward which position on the line segment connecting the projected microphone positions on the vertical plane the speaker's face is directed.

SOLUTION: Voice signals from a pair of microphones 202, 203 are fed to feature analyzers 210, 211, respectively. The analyzers 210, 211 detect acoustic changes in the voice that depend on the difference in direction from the lips 209 to the microphones 202, 203, caused by the shape of the vocal organs and the head. A difference extractor 212 calculates the difference between the two features detected by the analyzers 210, 211. A discriminant analyzer 213 determines the direction of the face 201 with respect to the vertical plane from the output of the difference extractor 212. The direction of the face with respect to the horizontal plane is determined from the difference between features derived from voice signals picked up by another pair of microphones 204, 205. In a concrete implementation, the face direction is determined from the difference between the voice powers extracted by the analyzers 210, 211.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and an apparatus for efficiently recognizing, in an interactive speech understanding system or the like, whether a user's attention is directed toward the system or toward some other target.

[0002]

2. Description of the Related Art

In human-to-human communication, speech used for calling out, signaling, or attending to an object is uttered toward the partner or object being addressed. A speech understanding system can therefore identify whether a given utterance is directed at the system, reject irrelevant utterances, and control shifts of focus in speech understanding. As a way of identifying whether an utterance is directed at the system, recognition of the speaker's gaze direction from facial images has been proposed.

[0003] A conventional face direction recognition method is described here, taking image-based pattern matching as an example. To recognize the face direction from an image, images of the face turned in multiple directions are registered in advance as reference patterns, and the face direction is recognized by matching an input image against them [1]. FIG. 4 shows an example of pattern matching of image information. Whole or partial images of the face turned in multiple directions are registered in advance as a reference pattern group 101 whose face directions are known. When an evaluation pattern 110 with unknown face direction is given, the most similar of the individual reference patterns 102-106 is determined, and the face direction of that matching pattern is taken as the face direction of the input pattern.

[0004] To match image patterns, the image is first divided into small blocks, and a feature value is computed for each block from the distribution of color and brightness within it. The feature values of all the blocks making up the image are then collected. This procedure is carried out for both the reference patterns and the evaluation pattern, and similarity is computed from the distance between the two feature distributions. As FIG. 4 makes clear, this method requires registering a large number of reference patterns in advance, computing feature values for every pattern, and evaluating similarity. Dividing an image into blocks and computing the feature value of each one also requires a large amount of computation.
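The conventional block-feature matching described above can be sketched as follows. The block size, the brightness-only feature, and the squared Euclidean distance are illustrative assumptions for this sketch, not details taken from the patent.

```python
# Sketch of conventional image-based face direction recognition:
# divide each grayscale image into small blocks, use the mean
# brightness of each block as its feature, and pick the reference
# pattern whose feature vector is closest to the evaluation pattern's.

def block_features(image, block=2):
    """Mean brightness of each block x block tile of a 2-D grayscale image."""
    rows, cols = len(image), len(image[0])
    feats = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [image[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            feats.append(sum(tile) / len(tile))
    return feats

def match_direction(evaluation, references, block=2):
    """Return the direction label of the closest reference pattern."""
    ev = block_features(evaluation, block)
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        rf = block_features(ref, block)
        dist = sum((a - b) ** 2 for a, b in zip(ev, rf))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Even this toy version shows the cost structure the patent criticizes: every reference pattern must be stored, and feature extraction and distance evaluation are repeated for each one.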

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

The conventional face direction recognition method performs pattern matching on image information and therefore incurs a high computational cost. An object of the present invention is to provide a method and an apparatus capable of performing the pattern matching for face direction recognition more efficiently.

[0006]

MEANS FOR SOLVING THE PROBLEMS

According to the present invention, a pair of microphones is arranged so that their projections onto a vertical plane facing the speaker are offset from each other. The difference between the output powers of these microphones is detected, and from that difference it is determined toward which position on the straight line connecting the projected microphone positions on the vertical plane the speaker's face is directed. The power difference may be detected for a specific frequency band of the microphone outputs. By providing one pair of microphones in the left-right direction and another in the up-down direction, it can be detected whether the face is turned to the left or right and whether it is turned up or down.

[0007]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows an embodiment of a face direction recognition apparatus according to the present invention. As shown in FIG. 1A, in front of a speaker 200 there are provided microphones 202 and 203, arranged one above the other to determine the vertical direction of the face 201 of the speaker 200, and microphones 204 and 205, arranged side by side to determine the horizontal direction of the face 201. In this example, the microphones 202, 203, 204, and 205 lie in a single, substantially vertical plane 206 facing the front of the speaker; the midpoint of microphones 202 and 203 coincides with the midpoint of microphones 204 and 205, and a line 208 passing through this midpoint 207 perpendicular to the vertical plane 206 passes through the lips 209 of the face 201 when the face 201 faces the vertical plane 206. That is, microphones 202 and 203 are placed on the vertical plane passing through the midline of the head of the speaker 200 (the vertical plane containing the straight line 208 in FIG. 1A), and microphones 204 and 205 are placed to the left and right of that midline vertical plane. Each of the microphones 202 to 205 is unidirectional, with its directivity aimed at the lips 209, and the microphones of each pair, 202 and 203, and 204 and 205, have identical characteristics.

[0008] As shown in FIG. 1B, the speech picked up by one pair of microphones 202 and 203 is input to feature analyzers 210 and 211. The feature analyzers 210 and 211 detect acoustic changes in the speech caused by the difference in direction from the lips 209 to the microphones 202 and 203, which arises from the shape of the vocal organs and the head. A difference extractor 212 computes the difference between the two features detected by the feature analyzers 210 and 211. A discriminant analyzer 213 determines, from the output of the difference extractor 212, the direction of the face 201 in the vertical plane (the up-down direction).
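The analyzer / difference-extractor / discriminant-analyzer chain of FIG. 1B can be sketched minimally as below, assuming frame RMS power as the feature and a fixed dB threshold as the discriminant. Both choices are illustrative; the patent does not fix the feature beyond speech power or specify the discriminant rule.

```python
import math

def frame_power(samples):
    """Feature analyzer (210/211): RMS power of one analysis frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def power_difference_db(frame_a, frame_b):
    """Difference extractor (212): difference of the two features, in dB."""
    return 20.0 * math.log10(frame_power(frame_a) / frame_power(frame_b))

def discriminate(diff_db, threshold_db=3.0):
    """Discriminant analyzer (213): decide which microphone the face favors.

    threshold_db is an illustrative value; the text reports several dB to
    more than ten dB of attenuation behind the head relative to the front.
    """
    if diff_db > threshold_db:
        return "toward microphone A"
    if diff_db < -threshold_db:
        return "toward microphone B"
    return "between the microphones"
```

A frame that is ten times louder at microphone A than at microphone B yields a 20 dB difference and is classified as facing toward A.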

[0009] Although not shown in the figure, features are likewise detected from the speech captured by the other pair of microphones 204 and 205, and the direction of the face 201 in the horizontal plane (the left-right direction) is determined from the difference between these features. FIG. 1C shows a concrete implementation of the apparatus of the invention. The feature analyzers 210 and 211 each extract speech power, and the face direction is determined from the difference between these speech powers. The sound pressure around the head, in both its horizontal-plane distribution and its distribution in the vertical plane through the midline of the head, is attenuated by several dB to more than ten dB behind the head relative to the front of the face, so the face direction can be determined from this speech power difference.

[0010] In this case, if the microphones whose speech powers are compared have different frequency characteristics, a power difference arises from the resulting change in the spectral envelope. To eliminate this, the frequency range over which the powers are compared is restricted, suppressing power variation due to changes in the spectral envelope. Frequency spectrum analyzers 301 and 302 are used as the feature analyzers 210 and 211; they extract the strength of the speech signal at each frequency at regular intervals. Instead of taking the difference of the outputs of the pair of frequency spectrum analyzers 301 and 302 with the difference extractor 212, a transfer function is computed by a transfer function calculator 303. With the analysis spectra of the frequency spectrum analyzers 301 and 302 denoted Px(f) and Py(f), the transfer function is obtained by the following equation.

[0011] Txy(f) = Pxy(f)/Pxx(f), where Pxx(f) is the square of the frequency spectrum Px(f) and Pxy(f) is the product of the frequency spectra Px(f) and Py(f). The output of the difference extractor, that is, the output of the transfer function calculator 303, is converted into a face direction by a discriminant analyzer 304 and output. The transfer function treats the output of one microphone, for example 202, as the input of a transmission path and the output of the other microphone 203 as the output of that path.
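The transfer function of paragraph [0011] can be computed from the two channels' spectra as sketched below. A plain O(N^2) DFT keeps the sketch self-contained, the conventional conjugate form of the cross-spectrum is used (the patent simply says "product"), and the frame averaging usual in cross-spectrum estimation is omitted for brevity.

```python
import cmath

def dft(samples):
    """Discrete Fourier transform, written out directly for self-containment."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(n)]

def transfer_function(x, y):
    """Txy(f) = Pxy(f) / Pxx(f) for one analysis frame.

    Pxx(f) is the squared magnitude of X(f) (auto-spectrum) and Pxy(f)
    is conj(X(f)) * Y(f) (cross-spectrum, conventional conjugate form).
    Bins with negligible auto-spectrum are returned as 0.
    """
    X, Y = dft(x), dft(y)
    txy = []
    for Xf, Yf in zip(X, Y):
        pxx = (Xf.conjugate() * Xf).real
        pxy = Xf.conjugate() * Yf
        txy.append(pxy / pxx if pxx > 1e-12 else 0j)
    return txy
```

If the second channel is simply the first scaled by 2, the transfer function is 2 at every bin with signal energy, which matches the interpretation of one microphone output as the input of a transmission path and the other as its output.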

[0012] An experimental example is described next. As shown in FIG. 2, the invention was applied to detecting the face direction of a speaker 200 seated at a desk 401 in a soundproof room. An omnidirectional electret condenser microphone 403 was attached to the chest of the speaker 200 on the vertical plane 402 through the midline of the head, and unidirectional dynamic microphones 204 and 205 were placed on the desk 401 at equal distances to the left and right of the vertical plane 402, with their directivity aimed at the face of the speaker 200. The speaker 200 read prepared sentences aloud with the face turned in specified directions. For one male and one female speaker 200, and for four directions (the front direction 404, the left microphone direction 405, the right microphone direction 406, and a downward direction 407 in the plane of symmetry of the left and right microphones), the outputs of the microphones 204, 205, and 403 were captured as digital data at a sampling frequency of 48 kHz, each output was subjected to frequency spectrum analysis, and the power spectrum ratios of the paired microphones were computed.

[0013] When the speaker uttered facing the front direction 404, the power spectrum ratio of the left and right microphones 204 and 205 was nearly flat up to 20 kHz, as shown in FIG. 3A. When the speaker 200 turned to the left, the power spectrum ratio of the left and right microphones 204 and 205 showed, as in FIG. 3B, attenuation near 13 kHz and 17 kHz in addition to the attenuation above 20 kHz. This attenuation does not occur when the speaker makes the same utterance facing the front. Accordingly, whether the speaker 200 is facing front or facing left can be determined from the magnitude of the ratio of the output speech powers of the microphones 204 and 205. In this case, if only the band in which FIGS. 3A and 3B differ markedly is extracted, in this example the 10 kHz to 20 kHz components, the difference in the magnitude of the speech power ratio between face directions becomes still larger, and the direction can be determined correctly. Although not shown in the figures, facing front and facing right can likewise be distinguished by the magnitude of the output speech power ratio of the microphones 204 and 205.
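Restricting the power comparison to the band where the directions differ most, 10 to 20 kHz in this experiment, might look as follows. The per-bin power spectra and their bin frequencies are assumed to come from whatever spectrum analysis precedes this step.

```python
import math

def band_power(spectrum, freqs, lo_hz, hi_hz):
    """Sum of spectral power over the bins falling in [lo_hz, hi_hz]."""
    return sum(p for p, f in zip(spectrum, freqs) if lo_hz <= f <= hi_hz)

def band_power_ratio_db(spec_left, spec_right, freqs,
                        lo_hz=10_000, hi_hz=20_000):
    """Power ratio of two microphone spectra restricted to one band, in dB.

    The 10-20 kHz default follows the experiment in the text; the bin
    layout (freqs) is whatever the spectrum analyzer produced.
    """
    return 10.0 * math.log10(band_power(spec_left, freqs, lo_hz, hi_hz)
                             / band_power(spec_right, freqs, lo_hz, hi_hz))
```

Bins outside the selected band, where the microphones' own frequency characteristics may differ, simply do not contribute to the ratio.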

[0014] When the speaker 200 faced the front, the ratio of the power spectrum of the left microphone 204 to that of the center microphone 403 was as shown in FIG. 3C. In this case, because the frequency characteristics of the microphones 204 and 403 differ, a large continuous drop appears in the high range above 15 kHz. When the speaker 200 faced downward, the ratio of the power spectrum of the microphone 204 to that of the microphone 403 showed a drop near 13 kHz, as in FIG. 3D, just as in the comparison of the left and right microphone outputs; that is, relative to FIG. 3C, a slight difference appears in the power spectrum ratio in the region where the frequency characteristics are flat. Detecting this difference therefore makes it possible to determine whether the speaker is facing front or downward. In this case, to avoid effects due to the difference in frequency characteristics between the microphones 204 and 403, extracting only the 10 to 15 kHz components, as in this example, and detecting the magnitude of the power spectrum ratio allows the determination to be made with fewer errors.

[0015] As will be understood from the above description, it suffices to provide at least two microphones arranged left and right in a horizontal plane and at least two microphones arranged one above the other in a vertical plane; in this arrangement, one of the horizontally arranged microphones and one of the vertically arranged microphones can be shared.

[0016] By using two pairs of microphones in this way, it can be determined whether the face is turned in the left-right direction or the up-down direction, and diagonal directions such as upper right and lower right can therefore also be determined. Beyond detecting these four directions, using a single pair of microphones makes it possible to detect toward which position the face is directed on the straight line connecting the positions at which the microphones are projected onto the vertical plane facing the speaker. The invention can thus also be applied, for example, to detecting only whether the face is turned to the left or to the right.
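Combining the decisions of the horizontal and vertical pairs into a single direction label, including the diagonal directions mentioned above, can be sketched as follows. The sign conventions, labels, and threshold are illustrative assumptions.

```python
def face_direction(horiz_diff_db, vert_diff_db, threshold_db=3.0):
    """Combine the left/right and up/down pair decisions into one label.

    Positive horiz_diff_db is taken to mean the left microphone received
    more power, positive vert_diff_db the upper one; threshold_db is an
    illustrative value.
    """
    horiz = ("left" if horiz_diff_db > threshold_db
             else "right" if horiz_diff_db < -threshold_db
             else "")
    vert = ("up" if vert_diff_db > threshold_db
            else "down" if vert_diff_db < -threshold_db
            else "")
    if vert and horiz:
        return vert + "-" + horiz          # diagonal, e.g. "up-left"
    return vert or horiz or "front"
```

With both pair differences large and positive, the combined label is a diagonal such as "up-left"; with both near zero, the face is taken to be toward the front.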

[0017] The microphones used are preferably aimed at the speaker's face, in order to avoid the influence of noise and to raise the received level of the signal sound, but this is not strictly necessary. The microphones forming a pair, that is, those whose speech power ratio or difference is taken, preferably have identical frequency characteristics, but these too need not be identical.

[0018] When band limiting is applied, the required components were extracted above after frequency spectrum analysis, but the microphone outputs may instead be passed through a filter whose passband is the required band, and the speech power of the filter outputs taken. Also, while the above distinguishes whether the face is turned to the front, left, right, or downward, the farther the face is turned away from the front within a given orientation, the larger the difference in the output ratio relative to facing front becomes. Accordingly, by storing the magnitude of the output ratio for each of several angle ranges in advance and checking in which range the measured power ratio falls, the degree to which the face is turned away from the front can also be determined.
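The angle estimation by stored ranges suggested in paragraph [0018] might look like the sketch below. The table values are invented for illustration; real entries would come from calibration measurements of the power ratio at known face angles.

```python
# Map a measured power ratio (dB) to a pre-stored angle range, as
# suggested in [0018]. The boundary values below are invented for
# illustration, not measured.
ANGLE_RANGES = [
    (0.0, 3.0, "0-15 degrees from front"),
    (3.0, 8.0, "15-45 degrees from front"),
    (8.0, float("inf"), "more than 45 degrees from front"),
]

def angle_range(power_ratio_db):
    """Return the stored angle range in which the measured ratio falls."""
    magnitude = abs(power_ratio_db)
    for lo, hi, label in ANGLE_RANGES:
        if lo <= magnitude < hi:
            return label
    return "unknown"
```

The absolute value is used because the sign of the ratio only indicates which side the face is turned toward, while its magnitude indicates how far.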

[0019]

EFFECTS OF THE INVENTION

Conventional face direction recognition methods require a large amount of computation and a prepared set of reference patterns in order to perform image pattern recognition. With the present invention, by extracting features from the speech signal and performing discriminant analysis, the reference patterns and the amount of computation required for recognition can be reduced, and the recognition processing can be performed efficiently.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows an example of the arrangement of the target face 201 and the microphones in an embodiment of the invention; FIG. 1B is a block diagram showing an example of an apparatus for processing the outputs of a pair of microphones; and FIG. 1C is a block diagram showing a concrete implementation of that apparatus.

FIG. 2 is a perspective view showing the arrangement of the speaker and the microphones and the face directions in an experimental example of the invention.

FIG. 3 shows various examples of power spectrum ratios from the experimental results of the invention.

FIG. 4 is a conceptual diagram showing a conventional face direction recognition method.

References

[1] James L. Flanagan. Speech Analysis, Synthesis and Perception. Springer-Verlag, 1972.

Claims (7)

[Claims]

1. A method for recognizing the direction of a speaker's face, wherein a pair of microphones for capturing the speaker's voice is used, the microphones being arranged so that their positions projected onto a vertical plane facing the speaker are offset from each other; the difference between the speech powers of the outputs of the pair of microphones is obtained; and it is determined from the obtained speech power difference toward which position on the straight line connecting the projected positions of the pair of microphones on the vertical plane the speaker's face is directed.
2. The face direction recognition method according to claim 1, wherein the straight line connecting the projected positions of the pair of microphones on the vertical plane is substantially horizontal; another pair of microphones capturing the speaker's voice is arranged so that the straight line connecting their positions projected onto the vertical plane facing the speaker is substantially vertical; the difference between the speech powers of the outputs of the other pair of microphones is obtained; and it is determined from the obtained speech power difference toward which position on the substantially vertical line the speaker's face is directed.
3. The face direction recognition method according to claim 2, wherein one microphone of the pair and one microphone of the other pair are one and the same.
4. The face direction recognition method according to any one of claims 1 to 3, wherein the determination of the face direction from the speech power difference is performed with attention to attenuation at specific frequencies.
5. The face direction recognition method according to any one of claims 1 to 4, wherein the speech power difference is the value obtained by dividing the product of the frequency spectrum Px(f) of the output of one microphone of the pair and the frequency spectrum Py(f) of the output of the other microphone by the square of Px(f).
6. An apparatus for recognizing the direction of a speaker's face, comprising: a pair of microphones that capture the speaker's voice; a pair of feature analysis means for detecting, as feature values from the outputs of the pair of microphones, acoustic changes in the voice due to differences in the face direction; difference extraction means for obtaining the difference between the feature values detected by the pair of feature analysis means; and discriminant analysis means for determining the face direction based on the difference extracted by the difference extraction means.
7. The face direction recognition apparatus according to claim 6, further comprising means for limiting the detection of the pair of feature values to a predetermined frequency band.
JP9047866A 1997-03-03 1997-03-03 Method and device for recognizing direction of face Pending JPH10243494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9047866A JPH10243494A (en) 1997-03-03 1997-03-03 Method and device for recognizing direction of face


Publications (1)

Publication Number Publication Date
JPH10243494A true JPH10243494A (en) 1998-09-11

Family

ID=12787307

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9047866A Pending JPH10243494A (en) 1997-03-03 1997-03-03 Method and device for recognizing direction of face

Country Status (1)

Country Link
JP (1) JPH10243494A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010103617A (en) * 2008-10-21 2010-05-06 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
JP2010124447A (en) * 2008-10-21 2010-06-03 Nippon Telegr & Teleph Corp <Ntt> Frontal utterance/lateral utterance presumption device, method and program
JP2010206392A (en) * 2009-03-02 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
JP2010206449A (en) * 2009-03-03 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
JP2010206393A (en) * 2009-03-02 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
US7995768B2 (en) 2005-01-27 2011-08-09 Yamaha Corporation Sound reinforcement system
JP2017021812A (en) * 2011-06-10 2017-01-26 アマゾン・テクノロジーズ、インコーポレイテッド Enhanced face recognition in video
US10531189B2 (en) 2018-05-11 2020-01-07 Fujitsu Limited Method for utterance direction determination, apparatus for utterance direction determination, non-transitory computer-readable storage medium for storing program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6473272A (en) * 1987-09-16 1989-03-17 Toshiba Corp Sound source locator
JPS6455793U (en) * 1987-10-02 1989-04-06
JPH05288598A (en) * 1992-04-10 1993-11-02 Ono Sokki Co Ltd Three-dimensional acoustic intensity measuring device



Similar Documents

Publication Publication Date Title
US9595259B2 (en) Sound source-separating device and sound source-separating method
CN108305615B (en) Object identification method and device, storage medium and terminal thereof
Jin et al. Speaker segmentation and clustering in meetings.
CN102421050B (en) Apparatus and method for enhancing audio quality using non-uniform configuration of microphones
EP2800402B1 (en) Sound field analysis system
JP6467736B2 (en) Sound source position estimating apparatus, sound source position estimating method, and sound source position estimating program
EP2881948A1 (en) Spectral comb voice activity detection
CN103180900A (en) Systems, methods, and apparatus for voice activity detection
KR100822880B1 (en) User identification system through sound localization based audio-visual under robot environments and method thereof
CN109147787A (en) A kind of smart television acoustic control identifying system and its recognition methods
CN109997186B (en) Apparatus and method for classifying acoustic environments
JPH10243494A (en) Method and device for recognizing direction of face
Remaggi et al. Modeling the comb filter effect and interaural coherence for binaural source separation
US20150039314A1 (en) Speech recognition method and apparatus based on sound mapping
Nakadai et al. Exploiting auditory fovea in humanoid-human interaction
Nguyen et al. Selection of the closest sound source for robot auditory attention in multi-source scenarios
JP2005181391A (en) Device and method for speech processing
EP1266538B1 (en) Spatial sound steering system
JP2006304125A (en) Apparatus and method for correcting sound signal
JP4240878B2 (en) Speech recognition method and speech recognition apparatus
CN112530452A (en) Post-filtering compensation method, device and system
Moon et al. Multi-channel audio source separation using azimuth-frequency analysis and convolutional neural network
JPH04324499A (en) Speech recognition device
Maraboina et al. Multi-speaker voice activity detection using ICA and beampattern analysis
Okuno et al. Separating three simultaneous speeches with two microphones by integrating auditory and visual processing