JP2007228135A

JP2007228135A - Utterance position estimate method and utterance position estimate apparatus using same, and electric wheelchair

Info

Publication number: JP2007228135A
Application number: JP2006045096A
Authority: JP
Inventors: Akira Saso; 晃佐宗
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2006-02-22
Filing date: 2006-02-22
Publication date: 2007-09-06
Anticipated expiration: 2026-02-22
Also published as: JP4682344B2

Abstract

PROBLEM TO BE SOLVED: To provide an electric wheelchair that accurately picks up operating sounds uttered by a user even in noise and controls itself on the basis of those sounds.
SOLUTION: The electric wheelchair includes: a voice input means provided with a plurality of sound receiving means arranged apart from each other so as to receive the user's voice over multiple channels; an utterance position estimation means that estimates the utterance position of the user from the multi-channel voice data received by the sound receiving means and outputs an utterance position estimate signal; and a control means that controls a drive source of the wheels based on the utterance position estimate signal.
COPYRIGHT: (C)2007,JPO&INPIT

Description

Independent mobility, that is, going where one wants to go of one's own will, is an important function of daily life. For people whose physical functions are limited by severe disability, achieving independent mobility has a profound effect on both daily living and mental well-being. To support such mobility, the present invention targets persons with disabilities who can produce some sound, such as voice, a fricative sound such as "suu" (a sustained "s" sound), or a whistle, and who can move the sound source position, such as the mouth, in an intended direction using the neck or upper body. The invention relates to an utterance position estimation method that estimates the utterance position and operates an electric wheelchair based on that information, to an utterance position estimation apparatus using the method, and to an electric wheelchair.

Prior art on voice-controllable electric wheelchairs includes Patent Document 1 and Patent Document 2, but both assume a single microphone as the voice input device. Such prior art therefore cannot detect the position from which the user's voice is uttered, and must be combined with speech recognition in order to control the electric wheelchair.
Patent documents: JP 2003-310665 A; JP-A-6-225910

In real environments where various ambient noises exist, robustness against noise is indispensable when an electric wheelchair is operated through a sound or voice interface. Conventional electric wheelchairs controlled by voice from a single microphone widely use close-talking microphones, such as headsets, to suppress noise pickup. However, a headset microphone must be put on every time the wheelchair is used, and if it shifts during use, the user must correct its position personally. This is not necessarily practical for, say, a person with a disability who has difficulty moving the hands freely. In addition, conventional techniques that use speech recognition for wheelchair control presuppose that the user can utter recognizable speech, yet some people with severe disabilities have difficulty producing clear speech.
In view of these drawbacks of the conventional examples, an object of the present invention is to provide an utterance position estimation method that accurately picks up the operating sounds produced by the user even in noise and performs control based on them, together with an utterance position estimation apparatus and an electric wheelchair using the method.

The present invention adopts the following means to solve the above problems.
(a) The electric wheelchair has: voice input means that receives the user's voice with a microphone array; utterance position estimation means that estimates the position of the user's utterance from the multi-channel voice data robustly against ambient noise; auxiliary operation means consisting of a joystick and an emergency stop button; display means that visually shows the estimated utterance position and the state of the wheelchair; drive means that drives the wheels of the wheelchair; and control means that controls the drive means based on the estimated utterance position.
(b) The utterance position estimation method and the utterance position estimation apparatus for the user's voice use a parallel microphone array.
(c) The electric wheelchair estimates the utterance position of the user's voice with the microphone array, and is operated and controlled based on that estimate.
Specifically, the following means are adopted.

(1) An utterance position estimation method characterized by:
obtaining the noise subspace correlation matrix Rn(ω) of a correlation matrix R(ω), where R(ω) is computed from observation vectors whose elements are the short-time Fourier transform values of consecutive frames of the microphone inputs, and, using the position vector a(ω, P) of a sound source at coordinates P = (Px, Py, Pz), calculating the following function F(P):

finding the coordinates P0 = (Px0, Py0, Pz0) that maximize the function F(P), obtaining the maximum value F(P0) at those coordinates, and obtaining the power of the sound source arriving from those coordinates by the following function P(P0):

and comparing the obtained values of F(P0) and P(P0) with predetermined thresholds, and determining that a sound was uttered at the coordinates P0 when both are equal to or greater than their thresholds.
(2) The utterance position estimation method according to (1) above, characterized in that, when the position vector a(ω, P) of the sound source is obtained, the gain is simplified to a function g(r) that depends only on the source-to-microphone distance r, using the following equation:

(3) The utterance position estimation method according to (1) above, characterized in that the number of sound sources S needed to obtain the noise subspace correlation matrix Rn(ω) is obtained by the following equation:

(4) The utterance position estimation method according to (1) above, characterized in that the coordinates P0 = (Px0, Py0, Pz0) are displayed as an image.
(5) An utterance position estimation apparatus characterized by:
obtaining the noise subspace correlation matrix Rn(ω) of a correlation matrix R(ω), where R(ω) is computed from observation vectors whose elements are the short-time Fourier transform values of consecutive frames of the microphone inputs, and, using the position vector a(ω, P) of a sound source at coordinates P = (Px, Py, Pz), calculating the following function F(P):

and comparing the obtained values of F(P0) and P(P0) with predetermined thresholds, and determining that a sound was uttered at the coordinates P0 when both are equal to or greater than their thresholds.
(6) The utterance position estimation apparatus according to (5) above, characterized in that, when the position vector a(ω, P) of the sound source is obtained, the gain is simplified to a function g(r) that depends only on the source-to-microphone distance r, using the following equation:

(7) The utterance position estimation apparatus according to (5) above, characterized in that the number of sound sources S needed to obtain the noise subspace correlation matrix Rn(ω) is obtained by the following equation:

(8) The utterance position estimation apparatus according to (5) above, characterized in that the coordinates P0 = (Px0, Py0, Pz0) are displayed as an image.
(9) An electric wheelchair characterized by comprising: voice input means having a plurality of sound receiving means arranged apart from each other so as to receive the user's voice over multiple channels; utterance position estimation means that estimates the user's utterance position from the multi-channel voice data received by the sound receiving means and outputs an utterance position estimate signal; and control means that controls a drive source of the wheels based on the utterance position estimate signal.
(10) The electric wheelchair according to (9) above, characterized in that the voice input means includes microphones placed on each side of the user when the user is seated on the seat.
(11) The electric wheelchair according to (10) above, characterized in that the microphones form microphone arrays arranged in parallel.
(12) The electric wheelchair according to any one of (9) to (11) above, characterized by comprising: voice input means having sound receiving means made up of a plurality of microphone arrays arranged apart from each other to receive the user's voice; utterance position estimation means that estimates the user's utterance position based on the multi-channel voice data received by the sound receiving means and outputs an utterance position estimate signal; auxiliary operation means that outputs an auxiliary operation signal from coordinate position designation means and a stop button; image display means that visually shows the utterance position estimate signal and the state of the wheelchair; drive means that drives and controls the drive source of the wheelchair wheels; and control means that controls the drive means based on the utterance position estimate signal and the auxiliary operation signal.

Using a microphone array fixed to the wheelchair frees the user from wearing any cords or devices, which makes the electric wheelchair easier to use.
As a result, even a person with a disability who has difficulty moving the hands freely can use a practical electric wheelchair that requires no steps such as putting on a microphone or correcting its position.
Using the parallel microphone array voice input device makes it possible to estimate the user's utterance position even when several interfering noise sources are present nearby.
The electric wheelchair can then be controlled using that utterance position.
Consequently, the user need not utter clear speech. The electric wheelchair of the present invention can be used by anyone who can produce some sound, whether unclear speech, a fricative such as "suu", or a whistle, and who can move the sound source position, such as the mouth, in the intended direction using the neck or upper body.

Embodiments of the present invention are described in detail with reference to the drawings.
An embodiment of the electric wheelchair of the present invention is described below. The electric wheelchair shown here is one embodiment of the present invention, and the invention is not limited to it.
FIG. 1 is an external view of the electric wheelchair of this embodiment, and FIG. 2 is a functional block diagram of the electric wheelchair shown in FIG. 1. As shown in FIG. 1, the electric wheelchair has, for example, two rear wheels 36a (not shown) and 36b, two front wheels 35a and 35b, a seat 37 and a backrest 40 installed above the rear wheels 36a and 36b, armrests 33a and 33b installed on both sides of the backrest 40, and footrests 41a and 41b installed in front of the front wheels 35a and 35b. A display 31 is fixed to the armrest 33a, and a joystick 32 and an emergency stop button 34 are fixed to the armrest 33b. An adjustment bar 39 provided with mounting brackets is attached to the backrest 40 via support posts 42a and 42b. The support posts 42a and 42b and the adjustment bar 39 with its mounting brackets constitute a stand. Microphone mounting bodies 30c and 30d, which carry the microphone arrays 30a and 30b at their tip portions, are slidably fitted to the mounting brackets on the adjustment bar 39.
The pair of microphone mounting bodies 30c and 30d are arranged in parallel and are long enough to extend from behind the user, over both shoulders, to beyond the user's mouth, so that microphones can be placed on them.
As shown in FIG. 2, the components of the electric wheelchair correspond to the means of the present invention as follows: the parallel microphone array formed by, for example, the two microphone arrays 30a and 30b, together with the microphone amplifier and ADC (analog-to-digital converter) 61, corresponds to the parallel microphone array voice input means; the display 31 to the display means; the CPU (central processing unit) board 63 and the storage device 64 to the control means; the drive control 65 and the drive motor 67 to the drive means; and the operation switches 66, such as the joystick 32 and the emergency stop button 34, to the operation means. The CPU 63 and the drive control 65 are connected by a serial cable 69.

(Parallel microphone array voice input device)
The voice input means includes sound receiving means made up of a plurality of microphone arrays arranged apart from each other to receive the user's voice.
The configuration of the parallel microphone array voice input device shown in FIGS. 1 and 2 is described below. As shown in FIG. 1, the two metal fittings 30a and 30b that carry the microphones are each fixed at one end to the fitting 39 of the stand. They run in parallel at an arbitrary spacing, for example 37 cm, and are long enough to extend from behind the user, over both shoulders, to beyond the user's mouth. An arbitrary number of microphones, for example four on each of the left and right fittings (eight in total), are placed on them at an arbitrary interval, for example 3 cm. Against vibration during travel, a vibration damping mechanism is used: for example, shock absorbers are placed under the two support posts 42a and 42b, a vertically movable fitting is attached at the center of the fitting that joins the two posts, and a spring is inserted there to absorb vertical vibration. If necessary, springs are also inserted on the left and right of the vertically movable fitting to absorb lateral vibration. The height and width of the microphone stand and the positions of the microphones can be adjusted for each user.
As shown in FIG. 2, the voice input means has the parallel microphone arrays 30a and 30b, and the microphone amplifier and ADC (analog-to-digital converter) 61.
The sound receiving means has at least a plurality of microphones, and is preferably a microphone array in which many microphones are arranged in an array. The microphones are placed at least apart from each other so that the vectors from the sound source to them differ. More preferably, microphones are placed on both sides of the user, which makes the user's voice input easier and clearer.
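
The example geometry in this section (two parallel arrays 37 cm apart, four microphones per array at 3 cm spacing) can be written down as coordinates. The sketch below is illustrative only: the choice of origin and axes (x across the user, y along the arrays, first microphone of each array at y = 0), the sign convention for right and left, and the function name `parallel_array_positions` are assumptions, not taken from the patent.

```python
def parallel_array_positions(array_gap_cm=37.0, mic_pitch_cm=3.0, mics_per_array=4):
    """Return {(m, i): (x, y)} in cm for two parallel microphone arrays.

    m is the array number (1 = right, 2 = left), i the microphone number.
    Assumed frame (not from the patent): x across the user, y along the
    arrays, with the first microphone of each array at y = 0.
    """
    positions = {}
    for m, x in ((1, +array_gap_cm / 2.0), (2, -array_gap_cm / 2.0)):
        for i in range(1, mics_per_array + 1):
            positions[(m, i)] = (x, (i - 1) * mic_pitch_cm)
    return positions


mics = parallel_array_positions()
```

Because the stand is adjustable per user, a real system would presumably take the gap and pitch as measured values rather than these defaults.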

(Utterance position estimation means and control means)
The CPU (central processing unit) board 63 is a board carrying a CPU and includes the utterance position estimation means and the control means. The utterance position estimation means and the control means have the storage device 64, which is connected to the CPU board 63.
The utterance position estimation means estimates the user's utterance position based on the multi-channel voice data received by the sound receiving means and outputs an utterance position estimate signal.
The control means controls the drive means based on the utterance position estimate signal and the auxiliary operation signal.
The ADC 61 and the CPU board 63 are connected via a USB cable 68, and power for the microphone amplifier and the ADC 61 is supplied from the CPU board 63. The sampling rate can be set arbitrarily, for example to 8 kHz, and the number of quantization bits can be set arbitrarily, for example to 16 bits. To increase processing accuracy, the sampling rate and the number of quantization bits are raised.

(Auxiliary input means)
The auxiliary operation means is represented by the operation switches 66 and outputs an auxiliary operation signal from coordinate position designation means, for example a joystick (not shown), and from an emergency stop button (not shown).

(Image display means)
The image display means has the display 31 and visually shows the utterance position estimate signal, the state of the wheelchair, and the like.

(Drive means)
The drive means has the drive control device 65 and drives and controls the drive motor 67, which is the drive source of the wheelchair wheels.

(Utterance position estimation)
The utterance position estimation process performed by the utterance position estimation means, using input signals from the voice input device with its plurality of sound receiving means, is described below.
An acoustic signal emitted by a point sound source placed at an arbitrary position in three-dimensional space

is received by Q microphones placed at arbitrary positions in three-dimensional space.

The distance Rq between the point sound source and each microphone is obtained by the following equation.

The propagation time τq from the point sound source to each microphone is obtained by the following equation, where v is the speed of sound.
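
The equation images for the distance and the propagation time are not reproduced in the source text. A minimal sketch of the natural reading, Euclidean distance and time equal to distance divided by the speed of sound, is given below. The function names, the value 343 m/s, and the sample positions are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def distance(source, mic):
    """Euclidean distance R_q between a source position and a microphone position."""
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(source, mic)))


def propagation_times(source, mics, v=SPEED_OF_SOUND_M_S):
    """Propagation time tau_q = R_q / v from the source to each microphone."""
    return [distance(source, mic) / v for mic in mics]


# Hypothetical positions in metres: one source and two microphones.
source = (0.0, 0.0, 0.0)
mics = [(0.3, 0.4, 0.0), (0.0, 0.0, 0.343)]
taus = propagation_times(source, mics)
```

With these positions the first microphone is 0.5 m away and the second is reached after about 1 ms.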

The gain gq of the narrowband signal with center frequency ω received at each microphone, relative to that at the point sound source, is generally defined as a function of the distance Rq between the point sound source and the microphone and of the center frequency ω.

The transfer characteristic between the point sound source and each microphone for the narrowband signal with center frequency ω is expressed as

The position vector a(ω, P0) representing a sound source at position P0 is then defined, as in the following equation, as a complex vector whose elements are the transfer characteristics between the point sound source and each microphone for the narrowband signal.
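
The transfer characteristic and position vector equations are missing from the source text. The sketch below uses a common free-field form, gain times a phase delay, as a stand-in. It is an assumption, not the patent's formula: the 1/r gain, the speed of sound, and the name `position_vector` are all illustrative, and the patent states that its own gain function was determined experimentally.

```python
import cmath
import math


def position_vector(omega, source, mics, v=343.0, gain=lambda r: 1.0 / r):
    """Complex vector with one element g(R_q) * exp(-j * omega * tau_q) per microphone.

    Assumptions (not from the patent): free-field propagation, speed of
    sound v = 343 m/s, and a 1/r gain. The patent's own gain function is
    not reproduced in the source text.
    """
    vec = []
    for mic in mics:
        r = math.dist(source, mic)      # distance R_q
        tau = r / v                     # propagation time tau_q
        vec.append(gain(r) * cmath.exp(-1j * omega * tau))
    return vec


omega = 2.0 * math.pi * 3000.0          # angular frequency for 3 kHz
a = position_vector(omega, (0.0, 0.0, 0.0), [(0.5, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

Passing the gain as a parameter keeps the sketch usable with whatever distance-dependent function g(r) is actually measured.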

Utterance position estimation uses the MUSIC method (a technique that obtains the signal subspace and the noise subspace by eigenvalue decomposition of the correlation matrix, and finds the direction of arrival or position of a sound source from the reciprocal of the inner product of an arbitrary source position vector and the noise subspace) and proceeds as follows. The short-time Fourier transform of the q-th microphone input is denoted by

and the observation vector is defined with these as its elements, as follows.

Here, n is the frame time index. The correlation matrix is obtained from N consecutive observation vectors by the following equation.
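
The correlation matrix equation is not reproduced in the source text. A minimal sketch of the usual construction, the average of the outer products x x^H over N observation vectors, follows. The 1/N normalisation and the function name are assumptions.

```python
def correlation_matrix(observations):
    """Average of x x^H over the given observation vectors.

    `observations` is a list of N vectors, each a list of Q complex STFT
    values (one per microphone) for a single frequency bin. The 1/N
    normalisation is an assumption; the patent's equation is not
    reproduced in the source text.
    """
    n_frames = len(observations)
    q = len(observations[0])
    r = [[0j] * q for _ in range(q)]
    for x in observations:
        for i in range(q):
            for k in range(q):
                r[i][k] += x[i] * x[k].conjugate()
    return [[value / n_frames for value in row] for row in r]


# Two hypothetical frames from two microphones.
frames = [[1 + 0j, 1j], [1 + 0j, -1j]]
R = correlation_matrix(frames)
```

The result is Hermitian by construction, which is what makes the eigenvalue decomposition in the next step yield real eigenvalues.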

Let the eigenvalues of this correlation matrix, arranged in descending order, be

and let the corresponding eigenvectors be

The number of sound sources S is then estimated by the following equation.

The noise subspace correlation matrix Rn(ω) is defined as follows,

and, with the frequency band

and the utterance region U

the following is calculated,

and the coordinates that maximize the function F(P) are found.
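
The equation images for this passage are not reproduced in the source text. For orientation only, a common textbook form of broadband MUSIC that is consistent with the surrounding definitions is sketched below. It is an assumption, not the patent's own formulas: the band limits \omega_l and \omega_h are names introduced here, and the patent's rule for estimating S and its exact way of combining frequencies may differ.

```latex
% Textbook MUSIC, assumed form; Q microphones, S sources, eigenpairs (\lambda_i, e_i)
R(\omega)\,e_i(\omega) = \lambda_i(\omega)\,e_i(\omega), \qquad
\lambda_1(\omega) \ge \lambda_2(\omega) \ge \dots \ge \lambda_Q(\omega)

R_n(\omega) = \sum_{i=S+1}^{Q} e_i(\omega)\,e_i^{H}(\omega)

F(P) = \sum_{\omega=\omega_l}^{\omega_h}
       \frac{a^{H}(\omega,P)\,a(\omega,P)}{a^{H}(\omega,P)\,R_n(\omega)\,a(\omega,P)},
\qquad P_0 = \arg\max_{P \in U} F(P)
```

In this form the denominator becomes small when a(\omega, P) is nearly orthogonal to the noise subspace, so F(P) peaks at candidate source positions.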

Next, the power of the sound source arriving from those coordinates is estimated by

Two thresholds, Fthr and Pthr, are prepared, and when the following condition is satisfied,

it is determined that an utterance occurred at the coordinates P = (Px, Py, Pz) within the time span of the N consecutive frames.
The utterance position estimation process handles N consecutive frames as one block. To make the estimation more stable, the number of frames N is increased, and/or an utterance is declared only when the condition of Expression 20 is satisfied in all of Nb consecutive blocks. The number of blocks is set arbitrarily. In general, the more blocks are used, the more the accuracy tends to improve.
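
The decision rule described here, two thresholds per block plus confirmation over Nb consecutive blocks, can be sketched as follows. The function names and all numeric values are hypothetical; only the "both values at or above their thresholds" test and the consecutive-block requirement come from the text.

```python
def block_has_utterance(f_max, power, f_thr, p_thr):
    """Two-threshold test: F(P0) and P(P0) must both reach their thresholds."""
    return f_max >= f_thr and power >= p_thr


def confirmed_utterances(blocks, f_thr, p_thr, n_b):
    """Yield True for each block that completes n_b consecutive passing blocks.

    `blocks` is an iterable of (f_max, power) pairs, one per block of N frames.
    """
    run = 0
    for f_max, power in blocks:
        run = run + 1 if block_has_utterance(f_max, power, f_thr, p_thr) else 0
        yield run >= n_b


# Hypothetical values: thresholds 10.0 and 0.5, confirmation over 2 blocks.
blocks = [(12.0, 0.9), (11.0, 0.2), (15.0, 0.8), (14.0, 0.7), (13.0, 0.6)]
decisions = list(confirmed_utterances(blocks, f_thr=10.0, p_thr=0.5, n_b=2))
```

Here the second block fails the power threshold and resets the run, so an utterance is first confirmed at the fourth block.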

As a specific example, the case where eight microphones are arranged in parallel on a plane, as shown in FIG. 3, is described below. The position vector a(ω, Px, Py) of a sound source at coordinates (Px, Py) is expressed by the following equation.

Here, m is the microphone array number (right = 1, left = 2) and i is the microphone number. The gain is simplified here to a function g(r) that depends only on the source-to-microphone distance r; for example, an experimentally determined function such as the following is used.

The above function can also be used when the microphone arrays are not parallel.
Utterance position estimation proceeds as follows. The short-time Fourier transform of the (m, i)-th microphone input is denoted by

and the observation vector is defined with these as its elements, as follows.

Here, n is the frame time index. The correlation matrix is obtained from N consecutive observation vectors by the following equation.

Let the eigenvalues of this correlation matrix, arranged in descending order, be

and let the corresponding eigenvectors be

The number of sound sources S is then estimated by the following equation.

The matrix Rn(ω) is defined as follows,

and, with the frequency band

and the utterance region

the following is calculated,

and the coordinates that maximize the function F(Px, Py) are found.
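
Finding the coordinates that maximize F(Px, Py) over a bounded utterance region can be done by exhaustive search on a grid. The sketch below is illustrative: `grid_argmax`, the stand-in function `toy_f`, and the region bounds are assumptions, since the patent's bounds and its F are not reproduced in the source text.

```python
def grid_argmax(f, x_range_cm, y_range_cm, step_cm=1):
    """Return ((x, y), f(x, y)) for the grid point in the region that maximises f.

    x_range_cm and y_range_cm are inclusive (min, max) integer bounds in cm.
    """
    best_point, best_value = None, float("-inf")
    for x in range(x_range_cm[0], x_range_cm[1] + 1, step_cm):
        for y in range(y_range_cm[0], y_range_cm[1] + 1, step_cm):
            value = f(x, y)
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


# Stand-in for F(Px, Py): a single peak at (3, 7). The real F comes from MUSIC.
def toy_f(x, y):
    return 1.0 / (1.0 + (x - 3) ** 2 + (y - 7) ** 2)


# Hypothetical region bounds in cm.
p0, f_p0 = grid_argmax(toy_f, (-10, 10), (0, 20))
```

Exhaustive search is affordable here because the region is small: a 21 by 21 grid has only 441 candidate points.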

Next, the power of the sound source arriving from those coordinates is estimated by

Two thresholds, Fthr and Pthr, are prepared, and when the following condition is satisfied,

it is determined that an utterance occurred at the coordinates (Px0, Py0) within the time span of the N consecutive frames.
The utterance position estimation process handles N consecutive frames as one block. To make the estimation more stable, the number of frames N is increased, and/or an utterance is declared only when the condition of Expression 35 is satisfied in all of Nb consecutive blocks. The number of blocks is set arbitrarily. In general, the more blocks are used, the more the accuracy tends to improve.

(Control method of the electric wheelchair)
A method of controlling the electric wheelchair using the utterance position estimation means with the parallel microphone array voice input device is described below.
FIG. 4 is a flowchart explaining this operation example. FIG. 5 is an example layout of the electric wheelchair control interface shown on the display 31.
In this example, the utterance region expressed by Expression 12 above is divided into four regions as shown in FIG. 5, and four wheelchair actions, forward 100, right turn 101, left turn 102, and stop 103, are assigned to them. The user uses the upper body, such as the neck, to take a posture that puts the sound source position, such as the mouth, inside the region for the desired wheelchair action. The user then instructs the wheelchair by producing a sound, which need not be voice and may be a whistle, a fricative, or some other sound.

In response to the user's utterance, the apparatus of the present invention controls the electric wheelchair by the following procedure.

Step 1: 80
Data for N frames (one block) of the sound produced by the user is input from the parallel microphone array voice input device.

Step 2: 81
The user's utterance position (Px0, Py0) is obtained by the utterance position estimation means described above.

Step 3: 82
If the condition of Expression 16 is satisfied, it is determined that an utterance occurred and the process proceeds to Step 4. Otherwise it is determined that there was no utterance and the process returns to Step 1.

Step 4: 83
Which region in FIG. 5 the utterance position (Px0, Py0) falls in is checked as follows.
First, two functions are defined by the following equations.

If the following condition is satisfied, the action is determined to be forward 100.

If the following condition is satisfied, the action is determined to be right turn 101.

If the following condition is satisfied, the action is determined to be left turn 102.

If the following condition is satisfied, the action is determined to be stop 103.

Step 5: 84
In the FIG. 5 layout shown on the display 31, the identification result is displayed by inverting the color of the area corresponding to the wheelchair action identified from the utterance position. A control signal is then sent from the CPU 63 to the drive control 65 so that the electric wheelchair performs the intended action. The process then returns to Step 1.

In the electric wheelchair of this embodiment, in addition to control signals being sent to the drive control 65 from the control means of the present invention, which consists of the CPU 63 and the storage device 64, the drive control 65 can also be controlled directly from the operation switch 66, which consists of the joystick 34 and the emergency stop button 32.
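The five-step procedure above can be sketched as a simple loop. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the two functions and the area conditions of Step 4 are not reproduced in this text, so `classify_region` uses hypothetical boundaries; the utterance test follows the wording of claim 1 (two values compared with thresholds) rather than Expression 16 itself; and the names `run_blocks`, `estimate`, and `send_command` are invented for illustration. The display update of Step 5 is omitted.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (px, py, f_value, p_value): estimated position plus the two values that are
# compared with thresholds. This tuple layout is an assumption of the sketch.
Estimate = Tuple[float, float, float, float]


def utterance_present(f_value: float, p_value: float,
                      f_threshold: float, p_threshold: float) -> bool:
    """Step 3 stand-in: both values must reach their thresholds."""
    return f_value >= f_threshold and p_value >= p_threshold


def classify_region(px: float, py: float) -> str:
    """Step 4 stand-in with HYPOTHETICAL boundaries (not those of FIG. 5)."""
    if py >= 20.0:      # far band mapped to forward (assumed)
        return "forward"
    if py < 10.0:       # near band mapped to stop (assumed)
        return "stop"
    return "left turn" if px < 0.0 else "right turn"


def run_blocks(blocks: Iterable[object],
               estimate: Callable[[object], Estimate],
               send_command: Callable[[str], None],
               f_threshold: float = 1.0,
               p_threshold: float = 1.0) -> List[Optional[str]]:
    """Apply Steps 1 to 5 to each block; None marks a block with no utterance."""
    results: List[Optional[str]] = []
    for block in blocks:                                # Step 1: next block
        px, py, f_value, p_value = estimate(block)      # Step 2: position
        if not utterance_present(f_value, p_value, f_threshold, p_threshold):
            results.append(None)                        # Step 3: no utterance
            continue
        action = classify_region(px, py)                # Step 4: area lookup
        send_command(action)                            # Step 5: drive control
        results.append(action)
    return results
```

Passing the estimator and the command sender as callables keeps the loop independent of the microphone hardware and the drive control, so it can be exercised with recorded or synthetic estimates.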

An experiment was conducted in which a user sat on the seat of the electric wheelchair of this embodiment, equipped with the parallel microphone array, and emitted a fricative sound while moving the mouth so as to trace two circles within the microphone array, and the movement of the mouth was estimated by the device of the present invention. The utterance position was estimated with the frequency band limited to 2 to 4 kHz, and the utterance area was

[cm], searched on a grid with 1 cm spacing. The FFT frame width was 64 ms, the frame period was 12.5 ms, and the correlation matrix was obtained from N = 15 frames of data.
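The frame settings above imply a block length and a set of usable frequency bins, which the following short calculation makes explicit. It is a sketch under two assumptions that the text does not state: the FFT length equals the 64 ms frame width (no zero padding), and the sampling rate is high enough to cover 4 kHz. The function names are illustrative only.

```python
import math
from typing import Tuple


def block_span_ms(frame_width_ms: float, frame_period_ms: float,
                  n_frames: int) -> float:
    """Time covered by n_frames overlapping frames, first start to last end."""
    return frame_width_ms + (n_frames - 1) * frame_period_ms


def bin_spacing_hz(frame_width_ms: float) -> float:
    """Frequency spacing of FFT bins when the FFT length equals the frame."""
    return 1000.0 / frame_width_ms


def band_bins(band_hz: Tuple[float, float],
              frame_width_ms: float) -> Tuple[int, int]:
    """Indices of the first and last FFT bins inside the band (inclusive)."""
    spacing = bin_spacing_hz(frame_width_ms)
    return math.ceil(band_hz[0] / spacing), math.floor(band_hz[1] / spacing)


if __name__ == "__main__":
    print(block_span_ms(64.0, 12.5, 15))          # one block of N = 15 frames
    print(bin_spacing_hz(64.0))                   # bin spacing in Hz
    print(band_bins((2000.0, 4000.0), 64.0))      # bins in the 2 to 4 kHz band
```

Under these assumptions one block of 15 frames spans 64 + 14 × 12.5 = 239 ms, so a wheelchair command can be updated roughly four times per second if blocks do not overlap.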
FIG. 6 shows the result of utterance position estimation with no interfering noise in the surroundings. The region shown in the figure is the utterance area of the experimental conditions, and the intersections of the horizontal and vertical grid lines are the grid points used for utterance position estimation. The dots placed on the grid indicate the detected utterance positions.
Next, a loudspeaker was placed 1.2 m from the center of the microphone array, in the direction 60 degrees to the right of the direction of travel, facing the center of the microphone array. An interfering sound (television audio) was played, and only that interfering sound was recorded. The interfering sound, level-adjusted so that the SNR of the signal received by the microphone was approximately 0 dB, was then added on a computer to the fricative sound previously recorded without interference, artificially generating data equivalent to a recording in a noisy environment.
FIG. 7 shows the result of estimating the utterance position from the data for the noisy environment. Comparing the result of FIG. 6, obtained without interfering noise, with the result of FIG. 7, obtained with interfering noise, shows that in both cases the trajectory of the mouth is circular, as the user intended, regardless of the presence of ambient interfering noise. This indicates that the device of the present invention can estimate the utterance position with sufficient accuracy to control the electric wheelchair.
The vertical and horizontal axes in FIGS. 6 and 7 represent the distance (unit: cm) from the tip of the microphone array.
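The artificial mixing used for the noisy-environment data, in which the interfering sound is level-adjusted to a target SNR and added to the clean recording, can be sketched as follows. This is a single-channel simplification written for illustration: the experiment used multi-channel microphone array recordings, and the function names `mean_power` and `mix_at_snr` are not from the source.

```python
import math
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average squared amplitude of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Scale `noise` so that signal power / noise power equals snr_db, then add."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power and cannot be scaled")
    # Target noise power is p_signal / 10^(snr_db / 10); the amplitude gain is
    # the square root of the ratio between target and current noise power.
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    interference = [0.5, 0.5, -0.5, -0.5]
    print(mix_at_snr(clean, interference, 0.0))   # 0 dB: equal powers
```

At 0 dB the scaled interference has the same mean power as the clean recording, which corresponds to the condition described for FIG. 7.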

Industrial applicability

The invention is applicable not only to electric wheelchairs but also to heavy machinery, such as crane trucks and excavators, where noise is loud and the hands are occupied by complex operations. With a microphone array built into, for example, a living-room sofa, it can also serve as an interface for operating various household appliances such as televisions and video recorders. More generally, it can be used in situations that require operation by sound or voice in environments with various kinds of noise and, in addition, vibration.

FIG. 1 is an overview of an electric wheelchair equipped with a parallel microphone array voice input device.
FIG. 2 is a functional block diagram of the electric wheelchair shown in FIG. 1.
FIG. 3 is a layout view of the parallel microphone array voice input device.
FIG. 4 is a flowchart for explaining an operation example.
FIG. 5 is a layout example of the electric wheelchair control interface.
FIG. 6 shows the result of utterance position detection with no interfering noise in the surroundings.
FIG. 7 shows the result of utterance position detection with interfering noise in the surroundings.

Explanation of symbols

30a, 30b Microphone array
30c, 30d Microphone mounting body
31 Display
32 Joystick
33a, 33b Armrest
34 Emergency stop button
35a, 35b Front wheel
36a, 36b Rear wheel
39 Adjustment bar
40 Backrest
41a, 41b Footrest
42a, 42b Post
61 Microphone and ADC
63 CPU
65 Drive control means
66 Operation switch
67 Drive motor

Claims

An utterance position estimation method in which, from the position vector a(ω, P) of a sound source at coordinates P = (Px, Py, Pz) and the noise subspace correlation matrix Rn(ω) of the correlation matrix R(ω) obtained from observation vectors whose elements are the short-time Fourier transform values of successive frames of the microphone input, the following function F(P) is calculated:

and the function F(P0) and the function P(P0) at the calculated position P0 are compared with predetermined threshold values, and if both are equal to or greater than the predetermined threshold values, it is determined that there is an utterance at the coordinates P0.

The utterance position estimation method according to claim 1, wherein, when determining the position vector a(ω, P) of the sound source, the following equation is used as the function g(r), which depends simply on the sound source-to-microphone distance r:

The utterance position estimation method according to claim 1, wherein the number S of sound sources necessary for obtaining the noise subspace correlation matrix Rn (ω) is obtained by the following equation.

The utterance position estimation method according to claim 1, wherein the coordinate P0 = (Px0, Py0, Pz0) is displayed as an image.

An utterance position estimation apparatus in which the function F(P0) and the function P(P0) at the calculated position P0 are compared with predetermined threshold values, and if both are equal to or greater than the predetermined threshold values, it is determined that there is an utterance at the coordinates P0.

The utterance position estimation apparatus according to claim 5, wherein, when determining the position vector a(ω, P) of the sound source, the following equation is used as the function g(r), which depends simply on the sound source-to-microphone distance r:

6. The utterance position estimation apparatus according to claim 5, wherein the number S of sound sources necessary for obtaining the noise subspace correlation matrix Rn (ω) is obtained by the following equation.

6. The utterance position estimating apparatus according to claim 5, wherein the coordinates P0 = (Px0, Py0, Pz0) are displayed as an image.

An electric wheelchair comprising: a voice input means having a plurality of sound receiving means arranged apart from each other in order to receive a plurality of user voices through multiple channels; an utterance position estimation means that estimates the user's utterance position from the multi-channel voice data received by the sound receiving means and outputs an utterance position estimation signal; and a control means that controls a wheel drive source based on the utterance position estimation signal.

The electric wheelchair according to claim 9, wherein the voice input unit includes microphones disposed on both sides of the user when the user sits on a seat.

The electric wheelchair according to claim 10, wherein the microphones are microphone arrays arranged in parallel.

The electric wheelchair according to any one of claims 9 to 11, comprising: voice input means comprising sound receiving means composed of a plurality of microphone arrays arranged apart from each other for receiving the user's voice; utterance position estimation means for estimating the user's utterance position based on multi-channel voice data received by the sound receiving means and outputting an utterance position estimation signal; auxiliary operation means for outputting an auxiliary operation signal by means of a coordinate position designation means and a stop button; image display means for visually showing the utterance position estimation signal and the state of the wheelchair; drive means for driving and controlling a wheel drive source; and control means for controlling the drive means based on the utterance position estimation signal and the auxiliary operation signal.