JPH10243494A - Method and device for recognizing direction of face - Google Patents

Method and device for recognizing direction of face

Info

Publication number
JPH10243494A
JPH10243494A (application JP9047866A)
Authority
JP
Japan
Prior art keywords
microphones
difference
pair
face
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP9047866A
Other languages
Japanese (ja)
Inventor
Shinji Tamoto
真詞 田本
Takeshi Kawabata
豪 川端
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP9047866A priority Critical patent/JPH10243494A/en
Publication of JPH10243494A publication Critical patent/JPH10243494A/en
Pending legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of recognition processing by arranging a pair of microphones so that their projections onto a vertical plane facing the speaker are offset from each other, and determining, from the difference in the microphones' output powers, toward which position on the line segment connecting the projected microphone positions on the vertical plane the speaker's face is directed.

SOLUTION: Voice signals from a pair of microphones 202, 203 are fed to feature analyzers 210, 211, respectively. The analyzers 210, 211 detect acoustic changes in the voice that depend on the difference in direction from the lips 209 to the microphones 202, 203, caused by the shape of the vocal organs and the head. A difference extractor 212 calculates the difference between the two features detected by the analyzers 210, 211. A discriminant analyzer 213 determines the direction of the face 201 with respect to the vertical plane from the output of the difference extractor 212. The direction of the face with respect to the horizontal plane is determined from the difference between features derived from voice signals picked up by another pair of microphones 204, 205. In a concrete implementation, the face direction is determined from the difference between the voice powers extracted by the analyzers 210, 211.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and an apparatus for efficiently recognizing, in an interactive speech understanding system or the like, whether a user's attention is directed toward the system or toward some other target.

[0002]

2. Description of the Related Art

In human-to-human communication, speech used for calling out, signaling, or attending to an object is uttered toward the partner or object being addressed. A speech understanding system can therefore identify whether a given utterance is directed at the system, reject irrelevant utterances, and control shifts of focus in speech understanding. As a way of identifying whether an utterance is directed at the system, recognition of the speaker's gaze direction from facial images has been proposed.

[0003] A conventional face direction recognition method is described here, taking image-based pattern matching as an example. To recognize the face direction from an image, images of the face turned in multiple directions are registered in advance as reference patterns, and the face direction is recognized by matching an input image against them [1]. FIG. 4 shows an example of pattern matching of image information. Whole or partial images of the face turned in multiple directions are registered in advance as a reference pattern group 101 whose face directions are known. When an evaluation pattern 110 with unknown face direction is given, the most similar of the individual reference patterns 102-106 is determined, and the face direction of that matching pattern is taken as the face direction of the input pattern.

[0004] To match image patterns, the image is first divided into small blocks, and a feature value is computed for each block from the distribution of color and brightness within it. The feature values of all the blocks making up the image are then collected. This procedure is carried out for both the reference patterns and the evaluation pattern, and similarity is computed from the distance between the two feature distributions. As FIG. 4 makes clear, this method requires registering a large number of reference patterns in advance, computing feature values for every pattern, and evaluating similarity. Dividing an image into blocks and computing the feature value of each one also requires a large amount of computation.
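The conventional block-feature matching described above can be sketched as follows. The block size, the brightness-only feature, and the squared Euclidean distance are illustrative assumptions for this sketch, not details taken from the patent.

```python
# Sketch of conventional image-based face direction recognition:
# divide each grayscale image into small blocks, use the mean
# brightness of each block as its feature, and pick the reference
# pattern whose feature vector is closest to the evaluation pattern's.

def block_features(image, block=2):
    """Mean brightness of each block x block tile of a 2-D grayscale image."""
    rows, cols = len(image), len(image[0])
    feats = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [image[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            feats.append(sum(tile) / len(tile))
    return feats

def match_direction(evaluation, references, block=2):
    """Return the direction label of the closest reference pattern."""
    ev = block_features(evaluation, block)
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        rf = block_features(ref, block)
        dist = sum((a - b) ** 2 for a, b in zip(ev, rf))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Even this toy version shows the cost structure the patent criticizes: every reference pattern must be stored, and feature extraction and distance evaluation are repeated for each one.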

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

The conventional face direction recognition method performs pattern matching on image information and therefore incurs a high computational cost. An object of the present invention is to provide a method and an apparatus capable of performing the pattern matching for face direction recognition more efficiently.

[0006]

MEANS FOR SOLVING THE PROBLEMS

According to the present invention, a pair of microphones is arranged so that their projections onto a vertical plane facing the speaker are offset from each other. The difference between the output powers of these microphones is detected, and from that difference it is determined toward which position on the straight line connecting the projected microphone positions on the vertical plane the speaker's face is directed. The power difference may be detected for a specific frequency band of the microphone outputs. By providing one pair of microphones in the left-right direction and another in the up-down direction, it can be detected whether the face is turned to the left or right and whether it is turned up or down.

[0007]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows an embodiment of a face direction recognition apparatus according to the present invention. As shown in FIG. 1A, in front of a speaker 200 there are provided microphones 202 and 203, arranged one above the other to determine the vertical direction of the face 201 of the speaker 200, and microphones 204 and 205, arranged side by side to determine the horizontal direction of the face 201. In this example, the microphones 202, 203, 204, and 205 lie in a single, substantially vertical plane 206 facing the front of the speaker; the midpoint of microphones 202 and 203 coincides with the midpoint of microphones 204 and 205, and a line 208 passing through this midpoint 207 perpendicular to the vertical plane 206 passes through the lips 209 of the face 201 when the face 201 faces the vertical plane 206. That is, microphones 202 and 203 are placed on the vertical plane passing through the midline of the head of the speaker 200 (the vertical plane containing the straight line 208 in FIG. 1A), and microphones 204 and 205 are placed to the left and right of that midline vertical plane. Each of the microphones 202 to 205 is unidirectional, with its directivity aimed at the lips 209, and the microphones of each pair, 202 and 203, and 204 and 205, have identical characteristics.

[0008] As shown in FIG. 1B, the speech picked up by one pair of microphones 202 and 203 is input to feature analyzers 210 and 211. The feature analyzers 210 and 211 detect acoustic changes in the speech caused by the difference in direction from the lips 209 to the microphones 202 and 203, which arises from the shape of the vocal organs and the head. A difference extractor 212 computes the difference between the two features detected by the feature analyzers 210 and 211. A discriminant analyzer 213 determines, from the output of the difference extractor 212, the direction of the face 201 in the vertical plane (the up-down direction).
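The analyzer / difference-extractor / discriminant-analyzer chain of FIG. 1B can be sketched minimally as below, assuming frame RMS power as the feature and a fixed dB threshold as the discriminant. Both choices are illustrative; the patent does not fix the feature beyond speech power or specify the discriminant rule.

```python
import math

def frame_power(samples):
    """Feature analyzer (210/211): RMS power of one analysis frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def power_difference_db(frame_a, frame_b):
    """Difference extractor (212): difference of the two features, in dB."""
    return 20.0 * math.log10(frame_power(frame_a) / frame_power(frame_b))

def discriminate(diff_db, threshold_db=3.0):
    """Discriminant analyzer (213): decide which microphone the face favors.

    threshold_db is an illustrative value; the text reports several dB to
    more than ten dB of attenuation behind the head relative to the front.
    """
    if diff_db > threshold_db:
        return "toward microphone A"
    if diff_db < -threshold_db:
        return "toward microphone B"
    return "between the microphones"
```

A frame that is ten times louder at microphone A than at microphone B yields a 20 dB difference and is classified as facing toward A.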

[0009] Although not shown in the figure, features are likewise detected from the speech captured by the other pair of microphones 204 and 205, and the direction of the face 201 in the horizontal plane (the left-right direction) is determined from the difference between these features. FIG. 1C shows a concrete implementation of the apparatus of the invention. The feature analyzers 210 and 211 each extract speech power, and the face direction is determined from the difference between these speech powers. The sound pressure around the head, in both its horizontal-plane distribution and its distribution in the vertical plane through the midline of the head, is attenuated by several dB to more than ten dB behind the head relative to the front of the face, so the face direction can be determined from this speech power difference.

[0010] In this case, if the microphones whose speech powers are compared have different frequency characteristics, a power difference arises from the resulting change in the spectral envelope. To eliminate this, the frequency range over which the powers are compared is restricted, suppressing power variation due to changes in the spectral envelope. Frequency spectrum analyzers 301 and 302 are used as the feature analyzers 210 and 211; they extract the strength of the speech signal at each frequency at regular intervals. Instead of taking the difference of the outputs of the pair of frequency spectrum analyzers 301 and 302 with the difference extractor 212, a transfer function is computed by a transfer function calculator 303. With the analysis spectra of the frequency spectrum analyzers 301 and 302 denoted Px(f) and Py(f), the transfer function is obtained by the following equation.

[0011] Txy(f) = Pxy(f)/Pxx(f), where Pxx(f) is the square of the frequency spectrum Px(f) and Pxy(f) is the product of the frequency spectra Px(f) and Py(f). The output of the difference extractor, that is, the output of the transfer function calculator 303, is converted into a face direction by a discriminant analyzer 304 and output. The transfer function treats the output of one microphone, for example 202, as the input of a transmission path and the output of the other microphone 203 as the output of that path.
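The transfer function of paragraph [0011] can be computed from the two channels' spectra as sketched below. A plain O(N^2) DFT keeps the sketch self-contained, the conventional conjugate form of the cross-spectrum is used (the patent simply says "product"), and the frame averaging usual in cross-spectrum estimation is omitted for brevity.

```python
import cmath

def dft(samples):
    """Discrete Fourier transform, written out directly for self-containment."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(n)]

def transfer_function(x, y):
    """Txy(f) = Pxy(f) / Pxx(f) for one analysis frame.

    Pxx(f) is the squared magnitude of X(f) (auto-spectrum) and Pxy(f)
    is conj(X(f)) * Y(f) (cross-spectrum, conventional conjugate form).
    Bins with negligible auto-spectrum are returned as 0.
    """
    X, Y = dft(x), dft(y)
    txy = []
    for Xf, Yf in zip(X, Y):
        pxx = (Xf.conjugate() * Xf).real
        pxy = Xf.conjugate() * Yf
        txy.append(pxy / pxx if pxx > 1e-12 else 0j)
    return txy
```

If the second channel is simply the first scaled by 2, the transfer function is 2 at every bin with signal energy, which matches the interpretation of one microphone output as the input of a transmission path and the other as its output.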

[0012] An experimental example is described next. As shown in FIG. 2, the invention was applied to detecting the face direction of a speaker 200 seated at a desk 401 in a soundproof room. An omnidirectional electret condenser microphone 403 was attached to the chest of the speaker 200 on the vertical plane 402 through the midline of the head, and unidirectional dynamic microphones 204 and 205 were placed on the desk 401 at equal distances to the left and right of the vertical plane 402, with their directivity aimed at the face of the speaker 200. The speaker 200 read prepared sentences aloud with the face turned in specified directions. For one male and one female speaker 200, and for four directions (the front direction 404, the left microphone direction 405, the right microphone direction 406, and a downward direction 407 in the plane of symmetry of the left and right microphones), the outputs of the microphones 204, 205, and 403 were captured as digital data at a sampling frequency of 48 kHz, each output was subjected to frequency spectrum analysis, and the power spectrum ratios of the paired microphones were computed.

[0013] When the speaker uttered facing the front direction 404, the power spectrum ratio of the left and right microphones 204 and 205 was nearly flat up to 20 kHz, as shown in FIG. 3A. When the speaker 200 turned to the left, the power spectrum ratio of the left and right microphones 204 and 205 showed, as in FIG. 3B, attenuation near 13 kHz and 17 kHz in addition to the attenuation above 20 kHz. This attenuation does not occur when the speaker makes the same utterance facing the front. Accordingly, whether the speaker 200 is facing front or facing left can be determined from the magnitude of the ratio of the output speech powers of the microphones 204 and 205. In this case, if only the band in which FIGS. 3A and 3B differ markedly is extracted, in this example the 10 kHz to 20 kHz components, the difference in the magnitude of the speech power ratio between face directions becomes still larger, and the direction can be determined correctly. Although not shown in the figures, facing front and facing right can likewise be distinguished by the magnitude of the output speech power ratio of the microphones 204 and 205.
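Restricting the power comparison to the band where the directions differ most, 10 to 20 kHz in this experiment, might look as follows. The per-bin power spectra and their bin frequencies are assumed to come from whatever spectrum analysis precedes this step.

```python
import math

def band_power(spectrum, freqs, lo_hz, hi_hz):
    """Sum of spectral power over the bins falling in [lo_hz, hi_hz]."""
    return sum(p for p, f in zip(spectrum, freqs) if lo_hz <= f <= hi_hz)

def band_power_ratio_db(spec_left, spec_right, freqs,
                        lo_hz=10_000, hi_hz=20_000):
    """Power ratio of two microphone spectra restricted to one band, in dB.

    The 10-20 kHz default follows the experiment in the text; the bin
    layout (freqs) is whatever the spectrum analyzer produced.
    """
    return 10.0 * math.log10(band_power(spec_left, freqs, lo_hz, hi_hz)
                             / band_power(spec_right, freqs, lo_hz, hi_hz))
```

Bins outside the selected band, where the microphones' own frequency characteristics may differ, simply do not contribute to the ratio.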

[0014] When the speaker 200 faced the front, the ratio of the power spectrum of the left microphone 204 to that of the center microphone 403 was as shown in FIG. 3C. In this case, because the frequency characteristics of the microphones 204 and 403 differ, a large continuous drop appears in the high range above 15 kHz. When the speaker 200 faced downward, the ratio of the power spectrum of the microphone 204 to that of the microphone 403 showed a drop near 13 kHz, as in FIG. 3D, just as in the comparison of the left and right microphone outputs; that is, relative to FIG. 3C, a slight difference appears in the power spectrum ratio in the region where the frequency characteristics are flat. Detecting this difference therefore makes it possible to determine whether the speaker is facing front or downward. In this case, to avoid effects due to the difference in frequency characteristics between the microphones 204 and 403, extracting only the 10 to 15 kHz components, as in this example, and detecting the magnitude of the power spectrum ratio allows the determination to be made with fewer errors.

[0015] As will be understood from the above description, it suffices to provide at least two microphones arranged left and right in a horizontal plane and at least two microphones arranged one above the other in a vertical plane; in this arrangement, one of the horizontally arranged microphones and one of the vertically arranged microphones can be shared.

[0016] By using two pairs of microphones in this way, it can be determined whether the face is turned in the left-right direction or the up-down direction, and diagonal directions such as upper right and lower right can therefore also be determined. Beyond detecting these four directions, using a single pair of microphones makes it possible to detect toward which position the face is directed on the straight line connecting the positions at which the microphones are projected onto the vertical plane facing the speaker. The invention can thus also be applied, for example, to detecting only whether the face is turned to the left or to the right.
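Combining the decisions of the horizontal and vertical pairs into a single direction label, including the diagonal directions mentioned above, can be sketched as follows. The sign conventions, labels, and threshold are illustrative assumptions.

```python
def face_direction(horiz_diff_db, vert_diff_db, threshold_db=3.0):
    """Combine the left/right and up/down pair decisions into one label.

    Positive horiz_diff_db is taken to mean the left microphone received
    more power, positive vert_diff_db the upper one; threshold_db is an
    illustrative value.
    """
    horiz = ("left" if horiz_diff_db > threshold_db
             else "right" if horiz_diff_db < -threshold_db
             else "")
    vert = ("up" if vert_diff_db > threshold_db
            else "down" if vert_diff_db < -threshold_db
            else "")
    if vert and horiz:
        return vert + "-" + horiz          # diagonal, e.g. "up-left"
    return vert or horiz or "front"
```

With both pair differences large and positive, the combined label is a diagonal such as "up-left"; with both near zero, the face is taken to be toward the front.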

[0017] The microphones used are preferably aimed at the speaker's face, in order to avoid the influence of noise and to raise the received level of the signal sound, but this is not strictly necessary. The microphones forming a pair, that is, those whose speech power ratio or difference is taken, preferably have identical frequency characteristics, but these too need not be identical.

[0018] When band limiting is applied, the required components were extracted above after frequency spectrum analysis, but the microphone outputs may instead be passed through a filter whose passband is the required band, and the speech power of the filter outputs taken. Also, while the above distinguishes whether the face is turned to the front, left, right, or downward, the farther the face is turned away from the front within a given orientation, the larger the difference in the output ratio relative to facing front becomes. Accordingly, by storing the magnitude of the output ratio for each of several angle ranges in advance and checking in which range the measured power ratio falls, the degree to which the face is turned away from the front can also be determined.
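The angle estimation by stored ranges suggested in paragraph [0018] might look like the sketch below. The table values are invented for illustration; real entries would come from calibration measurements of the power ratio at known face angles.

```python
# Map a measured power ratio (dB) to a pre-stored angle range, as
# suggested in [0018]. The boundary values below are invented for
# illustration, not measured.
ANGLE_RANGES = [
    (0.0, 3.0, "0-15 degrees from front"),
    (3.0, 8.0, "15-45 degrees from front"),
    (8.0, float("inf"), "more than 45 degrees from front"),
]

def angle_range(power_ratio_db):
    """Return the stored angle range in which the measured ratio falls."""
    magnitude = abs(power_ratio_db)
    for lo, hi, label in ANGLE_RANGES:
        if lo <= magnitude < hi:
            return label
    return "unknown"
```

The absolute value is used because the sign of the ratio only indicates which side the face is turned toward, while its magnitude indicates how far.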

[0019]

EFFECTS OF THE INVENTION

Conventional face direction recognition methods require a large amount of computation and a prepared set of reference patterns in order to perform image pattern recognition. With the present invention, by extracting features from the speech signal and performing discriminant analysis, the reference patterns and the amount of computation required for recognition can be reduced, and the recognition processing can be performed efficiently.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows an example of the arrangement of the target face 201 and the microphones in an embodiment of the invention; FIG. 1B is a block diagram showing an example of an apparatus for processing the outputs of a pair of microphones; and FIG. 1C is a block diagram showing a concrete implementation of that apparatus.

FIG. 2 is a perspective view showing the arrangement of the speaker and the microphones and the face directions in an experimental example of the invention.

FIG. 3 shows various examples of power spectrum ratios from the experimental results of the invention.

FIG. 4 is a conceptual diagram showing a conventional face direction recognition method.

References

[1] James L. Flanagan. Speech Analysis, Synthesis and Perception. Springer-Verlag, 1972.

Claims (7)

[Claims]

1. A method for recognizing the direction of a speaker's face, wherein a pair of microphones for capturing the speaker's voice is used, the microphones being arranged so that their positions projected onto a vertical plane facing the speaker are offset from each other; the difference between the speech powers of the outputs of the pair of microphones is obtained; and it is determined from the obtained speech power difference toward which position on the straight line connecting the projected positions of the pair of microphones on the vertical plane the speaker's face is directed.
2. The face direction recognition method according to claim 1, wherein the straight line connecting the projected positions of the pair of microphones on the vertical plane is substantially horizontal; another pair of microphones capturing the speaker's voice is arranged so that the straight line connecting their positions projected onto the vertical plane facing the speaker is substantially vertical; the difference between the speech powers of the outputs of the other pair of microphones is obtained; and it is determined from the obtained speech power difference toward which position on the substantially vertical line the speaker's face is directed.
3. The face direction recognition method according to claim 2, wherein one microphone of the pair and one microphone of the other pair are one and the same.
4. The face direction recognition method according to any one of claims 1 to 3, wherein the determination of the face direction from the speech power difference is performed with attention to attenuation at specific frequencies.
5. The face direction recognition method according to any one of claims 1 to 4, wherein the speech power difference is the value obtained by dividing the product of the frequency spectrum Px(f) of the output of one microphone of the pair and the frequency spectrum Py(f) of the output of the other microphone by the square of Px(f).
6. An apparatus for recognizing the direction of a speaker's face, comprising: a pair of microphones that capture the speaker's voice; a pair of feature analysis means for detecting, as feature values from the outputs of the pair of microphones, acoustic changes in the voice due to differences in the face direction; difference extraction means for obtaining the difference between the feature values detected by the pair of feature analysis means; and discriminant analysis means for determining the face direction based on the difference extracted by the difference extraction means.
7. The face direction recognition apparatus according to claim 6, further comprising means for limiting the detection of the pair of feature values to a predetermined frequency band.
JP9047866A 1997-03-03 1997-03-03 Method and device for recognizing direction of face Pending JPH10243494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9047866A JPH10243494A (en) 1997-03-03 1997-03-03 Method and device for recognizing direction of face


Publications (1)

Publication Number Publication Date
JPH10243494A true JPH10243494A (en) 1998-09-11

Family

ID=12787307

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9047866A Pending JPH10243494A (en) 1997-03-03 1997-03-03 Method and device for recognizing direction of face

Country Status (1)

Country Link
JP (1) JPH10243494A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010103617A (en) * 2008-10-21 2010-05-06 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
JP2010124447A (en) * 2008-10-21 2010-06-03 Nippon Telegr & Teleph Corp <Ntt> Frontal utterance/lateral utterance presumption device, method and program
JP2010206392A (en) * 2009-03-02 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
JP2010206449A (en) * 2009-03-03 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
JP2010206393A (en) * 2009-03-02 2010-09-16 Nippon Telegr & Teleph Corp <Ntt> Speech direction estimation device and method, and program
US7995768B2 (en) 2005-01-27 2011-08-09 Yamaha Corporation Sound reinforcement system
JP2017021812A (en) * 2011-06-10 2017-01-26 アマゾン・テクノロジーズ、インコーポレイテッド Enhanced face recognition in video
US10531189B2 (en) 2018-05-11 2020-01-07 Fujitsu Limited Method for utterance direction determination, apparatus for utterance direction determination, non-transitory computer-readable storage medium for storing program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6473272A (en) * 1987-09-16 1989-03-17 Toshiba Corp Sound source locator
JPS6455793U (en) * 1987-10-02 1989-04-06
JPH05288598A (en) * 1992-04-10 1993-11-02 Ono Sokki Co Ltd Three-dimensional acoustic intensity measuring device



Similar Documents

Publication Publication Date Title
US9595259B2 (en) Sound source-separating device and sound source-separating method
CN108305615B (en) Object identification method and device, storage medium and terminal thereof
Jin et al. Speaker segmentation and clustering in meetings.
CN102421050B (en) Apparatus and method for enhancing audio quality using non-uniform configuration of microphones
EP2800402B1 (en) Sound field analysis system
JP6467736B2 (en) Sound source position estimating apparatus, sound source position estimating method, and sound source position estimating program
EP2881948A1 (en) Spectral comb voice activity detection
CN103180900A (en) Systems, methods, and apparatus for voice activity detection
KR100822880B1 (en) User identification system through sound localization based audio-visual under robot environments and method thereof
CN109147787A (en) A kind of smart television acoustic control identifying system and its recognition methods
CN109997186B (en) Apparatus and method for classifying acoustic environments
JPH10243494A (en) Method and device for recognizing direction of face
Remaggi et al. Modeling the comb filter effect and interaural coherence for binaural source separation
US20150039314A1 (en) Speech recognition method and apparatus based on sound mapping
Nakadai et al. Exploiting auditory fovea in humanoid-human interaction
Nguyen et al. Selection of the closest sound source for robot auditory attention in multi-source scenarios
JP2005181391A (en) Device and method for speech processing
EP1266538B1 (en) Spatial sound steering system
JP2006304125A (en) Apparatus and method for correcting sound signal
JP4240878B2 (en) Speech recognition method and speech recognition apparatus
CN112530452A (en) Post-filtering compensation method, device and system
Moon et al. Multi-channel audio source separation using azimuth-frequency analysis and convolutional neural network
JPH04324499A (en) Speech recognition device
Maraboina et al. Multi-speaker voice activity detection using ICA and beampattern analysis
Okuno et al. Separating three simultaneous speeches with two microphones by integrating auditory and visual processing