JPH0942961A

JPH0942961A - Position measuring device for voice input device

Info

Publication number: JPH0942961A
Application number: JP19201695A
Authority: JP
Inventors: Minoru Yagi; 稔八木; Kyoichi Okamoto; 恭一岡本; Yosuke Tajika; 陽介多鹿; Hisashi Kazama; 久風間
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-07-27
Filing date: 1995-07-27
Publication date: 1997-02-14

Abstract

PROBLEM TO BE SOLVED: To provide a position measuring device for a voice input device that can obtain information on the position and direction of a microphone from an image. SOLUTION: The position measuring device comprises an acoustic-electrical signal converting device 1 that is held, so that it can swivel freely, by a supporting member whose one end is installed at a fixed position, and that carries a mark on its side peripheral surface near its tip; an image pickup device 2 that photographs the mark of the acoustic-electrical signal converting device 1 from a fixed position; and a computing device 3 that obtains the position of the mark from the image photographed by the image pickup device 2 and obtains the direction of the acoustic-electrical signal converting device 1 from that mark position.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice input device, and more specifically to a position measuring device for a voice input device that can acquire information on the position and direction of, for example, a microphone from an image.

[0002]

[Description of the Related Art] In recent years, multimedia has attracted attention and various applications have been attempted. Multimedia can handle many kinds of information, such as image information (still and moving images), data, and audio, and there is growing demand to input images and audio into a computer as multimedia information and make use of them. In particular, instruction input means that let a person in a moving image give instructions by body or hand gestures, and instruction input means that let a person give instructions by voice based on captured audio, are man-machine interface technologies required for next-generation information devices.

[0003] Voice-based instruction input has been researched and developed toward practical use as an application of speech recognition technology, and has proved effective in recognizing the speech of both specific and unspecified speakers. Next-generation information devices are expected to use man-machine interface technology incorporating such speech recognition to gain functions such as recording remarks and accepting instructions, and thereby to become more advanced, more automated machines that are easier for people to work with.

[0004] Another envisaged use is control in which, when someone speaks at a conference or makes a purchase at a vending machine, the direction of the microphone is controlled so that it automatically points toward the speaker or purchaser and captures that person's voice signal efficiently.

[0005] Conventionally, microphones built into the device or connected to it by cable have mostly been used as voice input means. However, when trying to extract a speaker's voice, if the microphone also picks up background sound such as the voices of other people nearby or environmental noise, that sound becomes noise, and it becomes difficult to extract the voice of the speaker to be recorded or of the speaker giving an instruction.

[0006] In such cases the noise must be suppressed. The simplest way to do so is to use a highly directional microphone as the voice input means and point it at the speaker's mouth. Then, by extracting from the microphone's output only the audio captured during the speaker's utterance period, the required speech can be selected and extracted relatively easily.

[0007] Meanwhile, for situations such as conferences, where there are many microphones and many participants, techniques that use microphone position and direction information are being studied. If the approximate positions of the speakers are known, the utterance period of the speaker located in the direction a microphone faces can be matched to that microphone's input and the audio obtained within that period selectively extracted; alternatively, the audio data can be extracted by watching the mouth of the speaker in that direction and synchronizing with the input data. However, this work relies on manual labor. Measuring microphone positions and directions by hand is tedious and places a burden on the user.

[0008]

[Problems to Be Solved by the Invention] When instructions are given by voice or a speaker's voice is recorded, a microphone is generally used to input the voice. To suppress ambient noise and obtain the target speech, a highly directional microphone is used as the voice input means.

[0009] However, even a highly directional microphone does not remove background sound sufficiently, and collecting voice data has required means of suppressing background sound, such as a soundproof room. Since it is unrealistic to assume that information devices will be used in a soundproof room, a simpler way of removing background sound and extracting the speaker's voice data is needed.

[0010] As one example, in a setting such as a conference where the approximate positions of the speakers are known and there are many microphones, attempts have been made to use microphone position and direction information together with the speakers' behavior: the utterance period of the speaker located in the direction a microphone faces is matched to that microphone's input and the audio obtained within that period is selectively extracted, or the audio data is extracted by watching the mouth of the speaker in that direction and synchronizing with the input data. However, this work relies on manual labor, and doing it by hand is tedious and burdens the user.

[0011] As a countermeasure, if video of the conference participants could be captured, the movement of each speaker's mouth analyzed from that video to detect whether the person is speaking, and, when the person is speaking, the output of the microphone facing that person extracted, then the speaker's utterances could be selected and extracted automatically.

[0012] Alternatively, in a conference or the like where microphones are used, speakers naturally bring their mouths close to the microphone or turn the microphone toward themselves. By associating the movement of a speaker's mouth with the output of the microphone pointing in that direction, the speaker's utterances can therefore be selected and extracted automatically.

[0013] This requires a technique for automatically measuring the position and direction of a microphone. Such a technique is also needed for control that automatically points a microphone toward the speaker.

[0014] An object of the present invention is therefore to provide a position measuring device for a voice input device that can automatically measure the position of a microphone and the direction in which it faces.

[0015]

[Means for Solving the Problems] To achieve the above object, the present invention is configured as follows. It comprises: an acoustic-electrical signal converting device that is held at one end by a supporting member so that it can swivel freely and that carries a mark on at least its side peripheral surface near its tip; an image pickup device that photographs at least the mark of the acoustic-electrical signal converting device; and an arithmetic unit that obtains the position of the mark from the image captured by the image pickup device and obtains the orientation of the acoustic-electrical signal converting device from the mark position thus obtained.

[0016]

[Embodiments of the Invention] The present invention comprises: a microphone that is held at one end, so that it can swivel freely, by a supporting member installed at a fixed position, and that carries a mark on at least its side peripheral surface near its tip; an image pickup device that photographs at least the mark of the microphone from a fixed position; and an arithmetic unit that obtains the region of the mark from the image captured by the image pickup device, obtains the centroid of that region, takes the centroid position as the position of the mark within the captured image, and obtains the orientation of the microphone from this position information and the installation position of the supporting member.

[0017] The image pickup device photographs the mark of the microphone, which is the acoustic-electrical signal converting device. The arithmetic unit obtains the region of the mark from the captured image, obtains the centroid of that region, takes the centroid position as the position of the mark within the captured image, and obtains the orientation of the microphone from this position information and the installation position of the supporting member.

[0018] Because the position and direction of the acoustic-electrical signal converting device can thus be obtained from an image, no mechanical measuring elements are needed, and a simple, inexpensive position measuring device for a voice input device can be realized. Moreover, because the orientation and position of the acoustic-electrical signal converting device can be measured, the invention can serve as the measuring device for automatic control that points the acoustic-electrical signal converting device (that is, a microphone or the like) at the speaker's mouth.

[0019] Furthermore, if the device is configured to image both the speaker and the acoustic-electrical signal converting device, then even when several people appear in the image it is possible to tell whose mouth the converting device is pointing at. This makes it easy to identify the speaker, for example when voice extraction is controlled by linking the movement of the speaker's mouth with the output of the converting device, and so makes such voice extraction control easy to realize. In particular, a person about to speak will spontaneously move the tip of the microphone so that it points at his or her own mouth. Exploiting this behavior, the ability to measure the position and orientation of the microphone makes it possible to identify who is speaking even when many people appear in the picture, opening the way to new multimedia applications.

[0020] In the present invention, if only the position of the microphone is needed, the device may be configured to obtain only the position; if only the direction of the microphone is needed, it may be configured to output only the direction information; and if both are needed, it may be configured to output all of this information.

[0021] [More Specific Embodiment 1] A more specific embodiment of the present invention is described below with reference to the drawings.

[0022] FIG. 1 is a block diagram showing the main configuration of the device. The device of the present invention comprises an acoustic-electrical signal converting device 1, for example a microphone, that picks up sound and produces an audio signal; an image pickup device 2 that images the acoustic-electrical signal converting device 1 and outputs the result as an image signal; and an arithmetic processing unit 3 that calculates at least one of the position and direction of the microphone from the image captured by the image pickup device 2.

[0023] In this example the acoustic-electrical signal converting device 1 is built around a microphone, which captures sound as an audio signal. Here it is a rod-shaped or cylindrical microphone, and in what follows the acoustic-electrical signal converting device 1 is referred to as the microphone body 32.

[0024] As shown in FIG. 3, a rod-shaped microphone body 32 is held on a base 31 by a swivel mechanism 33 so that it can swivel freely, and a mark 34 is attached at a suitable position on the side peripheral surface of the microphone body 32, as shown in FIG. 4. Because the microphone body 32 is mounted on the base 31, which serves as a support stand, placing the base 31 on a desk or podium allows the microphone to be positioned pointing at the speaker's mouth.

[0025] The image pickup device 2 images the microphone and outputs the result sequentially as an image signal. It is, for example, a TV camera or an electronic still camera using a CCD (solid-state image pickup device).

[0026] The image pickup device 2 is the element labeled 35 in FIG. 3. As shown in FIG. 3, the image pickup device 35 is mounted on the base 31, images the microphone body 32 on the base 31, and outputs the image as an image signal.

[0027] The arithmetic processing unit 3 calculates the position and direction of the microphone from the image signal captured and output by the image pickup device 2 (the image pickup device 35 in FIG. 3). Specifically, from the pixel data of the image carried by the image signal, it finds the region of the mark 34 on the basis of the luminance and color information of each pixel, computes the centroid of that region, and takes it as the mark position in the image. Then, treating the installation position of the microphone as known, it obtains the direction from that position and the mark position.

[0028] FIG. 2 shows the overall processing flow of the device. The image pickup device 35 on the base 31 (corresponding to the image pickup device 2 in FIG. 1) photographs the microphone body 32 on the base 31 (step a-1 in FIG. 2). The image signal from the image pickup device 35 is input to the arithmetic processing unit 3 (step a-2 in FIG. 2), which performs image processing on the input image to detect the image of the mark 34 on the side peripheral surface of the microphone body 32 (step a-3 in FIG. 2). The arithmetic processing unit 3 then calculates the direction and position of the microphone body 32 from the detected image of the mark 34 (step a-4 in FIG. 2, microphone attribute calculation).

[0029] For the case shown in FIG. 3, in which a rod-shaped microphone is mounted on the device, or on an information device incorporating it, so that its direction can be changed, the following describes how the mark 34 is applied to the microphone body 32 and how the arithmetic processing unit 3 detects the image of the mark 34 in the captured image.

[0030] In the present invention, a light emitter such as an LED (light-emitting element) is used as the mark 34, for example. It is mounted on the side peripheral surface near the tip of the microphone body 32, at a position where the image pickup device 35 can image it together with the tip of the microphone body 32.

[0031] A mark 34 consisting of a light emitter such as an LED can thus be provided on the side peripheral surface near the tip of the microphone body 32, as shown in FIG. 4; there may be one such mark or several. In use, the mark 34 on the microphone body 32 is lit, so it appears in the captured image as a point or region of high luminance.

[0032] The mark 34 can therefore be detected, and its position in the image recognized, by extracting the pixels of the high-luminance point or region from the captured image.

[0033] (Detecting the mark position in a monochrome image) FIG. 5 shows an example of a method for extracting the pixels of a high-luminance point or region when the image is a grayscale image.

[0034] If the image pickup device 35 is a monochrome device, it supplies a monochrome image to the arithmetic processing unit 3. When the captured image is input (step b-1 in FIG. 5), the arithmetic processing unit 3 obtains the luminance value of each pixel and computes a histogram of those values (step b-2 in FIG. 5). FIG. 6 is a schematic of the resulting histogram; the horizontal axis is the pixel (luminance) value and the vertical axis is the total number of pixels having each luminance value.

[0035] Once a histogram like that of FIG. 6 has been obtained, the arithmetic processing unit 3 smooths it (step b-3), giving a histogram like the schematic in FIG. 7. From the smoothed histogram it then finds the luminance values at which the histogram has local minima (step b-4).

[0036] Next, the arithmetic processing unit 3 divides the histogram into regions using each local-minimum luminance value as a boundary and labels each region (step b-5). It then extracts from the image the pixels belonging to the highest-luminance region (step b-6), groups extracted pixels that are adjacent to one another into small regions of the same label in the image (step b-7), and computes the area of each small region (step b-8).

[0037] Having obtained the area of each small region, the arithmetic processing unit 3 selects the small region whose area falls within a predetermined range and takes it to be the mark (step b-9). With the mark thus determined, it computes the centroid of that small region and takes the centroid position as the mark position (step b-10).
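The monochrome procedure of steps b-2 to b-10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the image is taken to be a list of rows of 8-bit luminance values, smoothing is a simple moving average, adjacency means 4-connectivity, and the names `find_mark`, `smooth`, `local_minima`, `connected_regions` and the `min_area`/`max_area` parameters are all hypothetical.

```python
from collections import deque

def luminance_histogram(image, levels=256):
    # Step b-2: count how many pixels have each luminance value.
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    return hist

def smooth(hist, radius=2):
    # Step b-3: moving average (the smoothing filter is an assumption).
    n = len(hist)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out

def local_minima(s):
    # Step b-4: one index per valley; a flat valley yields its middle index.
    minima, i = [], 1
    while i < len(s) - 1:
        if s[i] < s[i - 1]:
            j = i
            while j + 1 < len(s) and s[j + 1] == s[j]:
                j += 1
            if j + 1 < len(s) and s[j + 1] > s[j]:
                minima.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return minima

def connected_regions(mask):
    # Step b-7: group adjacent selected pixels (4-connectivity assumed).
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, region = deque([(x, y)]), []
                while queue:
                    cx, cy = queue.popleft()
                    region.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                regions.append(region)
    return regions

def find_mark(image, min_area, max_area):
    minima = local_minima(smooth(luminance_histogram(image)))
    # Steps b-5/b-6: the brightest band lies above the last minimum.
    threshold = minima[-1] if minima else 0
    mask = [[v > threshold for v in row] for row in image]
    for region in connected_regions(mask):
        # Steps b-8/b-9: the region's area is its pixel count.
        if min_area <= len(region) <= max_area:
            n = len(region)
            # Step b-10: centroid (x, y) of the selected region.
            return (sum(x for x, _ in region) / n, sum(y for _, y in region) / n)
    return None
```

The area filter is what rejects isolated bright pixels or large reflections; a caller would tune `min_area` and `max_area` to the apparent size of the mark in the camera image.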

[0038] Through this processing the arithmetic processing unit 3 can find the position of the mark on the microphone body 32 in the image. The above is the method for a monochrome image; for a color image, the procedure is as follows.

[0039] (Detecting the mark position in a color image) For the case where the captured image is a color image and the light emitter is colored, the method of detecting the mark is described following the processing flow of FIG. 8. In this embodiment, hue, saturation, and lightness are used to represent the color of each pixel.

[0040] If the image pickup device 35 is a color device, it supplies a color image to the arithmetic processing unit 3. When the color image is input (step c-1 in FIG. 8), the arithmetic processing unit 3 computes a histogram of the hue of each pixel (step c-2 in FIG. 8) and then smooths the histogram (step c-3 in FIG. 8).

[0041] From the smoothed histogram, the arithmetic processing unit 3 finds the values at which the histogram has local minima (step c-4 in FIG. 8). Using each of these values as a boundary, it divides the histogram into regions and selects the region that contains the hue of the light emitter (step c-5 in FIG. 8). This region is referred to below as the selected region.

[0042] Once the selected region has been obtained, the arithmetic processing unit 3 extracts from the image the pixels belonging to it (step c-6 in FIG. 8) and groups extracted pixels that are adjacent to one another into small regions (step c-7 in FIG. 8). It then computes the area of each small region (step c-8 in FIG. 8), selects the small region whose area falls within a predetermined range and takes it to be the mark (step c-9 in FIG. 8), and computes the centroid of that small region, which it takes as the mark position (step c-10 in FIG. 8).
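In the color procedure, step c-5 differs from the monochrome case in that the band chosen is the one containing the emitter's hue rather than the brightest one. A minimal Python sketch of that selection follows, assuming the histogram minima are already available as a sorted list of bin indices. The names `band_containing` and `select_pixels` are hypothetical, and the sketch ignores the fact that hue wraps around at red.

```python
from bisect import bisect_left

def band_containing(minima, value, levels):
    """Return the inclusive (lo, hi) bin range of the histogram band holding value.

    Bands are delimited by the sorted minima; each minimum closes the band
    to its left, so the bands are [0, m0], (m0, m1], ..., (mk, levels - 1].
    """
    idx = bisect_left(minima, value)  # number of boundaries strictly below value
    lo = minima[idx - 1] + 1 if idx > 0 else 0
    hi = minima[idx] if idx < len(minima) else levels - 1
    return lo, hi

def select_pixels(hue_bins, band):
    # Step c-6: mark the pixels whose hue bin falls inside the selected band.
    lo, hi = band
    return [[lo <= h <= hi for h in row] for row in hue_bins]
```

The resulting mask can then be grouped into small regions, filtered by area, and reduced to a centroid exactly as in the monochrome case.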

[0043] In the above procedure, a step may be added after step c-5 of FIG. 8 that keeps, from the pixels belonging to the selected region, only those at or above a preset threshold. When the captured image is expressed in RGB values, the hue may be calculated from the RGB values and used. Also, when color images are handled, the mark need not be a light emitter: a point or area in a color different from that of the microphone body may be formed and used as the mark.
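Computing hue from RGB values and building the hue histogram of step c-2 can be sketched with Python's standard `colorsys` module. The bin count, the `min_saturation` cutoff that skips grey pixels (whose hue is undefined), and the name `hue_histogram` are assumptions made for illustration; the source does not specify them.

```python
import colorsys

def hue_histogram(rgb_image, bins=36, min_saturation=0.0):
    """Histogram of per-pixel hue for an image given as rows of (R, G, B) in 0..255."""
    hist = [0] * bins
    for row in rgb_image:
        for r, g, b in row:
            # colorsys works on floats in [0, 1] and returns hue in [0, 1).
            h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if s <= min_saturation:
                continue  # grey, black and white pixels carry no usable hue
            hist[min(int(h * bins), bins - 1)] += 1
    return hist
```

With 36 bins each bin spans 10 degrees of hue, so a pure red pixel lands in bin 0 and a pure cyan pixel in bin 18.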

[0044] An example of the processing when a point or area in a color different from that of the microphone body is used as the mark is described next. When the color image from the image pickup device 35 is input to the arithmetic processing unit 3, the unit generates three single-color images (an R (red) image, a G (green) image, and a B (blue) image) in which the luminance of each pixel is the R, G, or B value, respectively, of the pixel at the same position in the captured image (step d-1 in FIG. 9).

[0045] Next, from these three images the arithmetic processing unit 3 selects the one whose component is closest to the color of the mark 34 (referred to below as the target image) and computes a histogram of the luminance values of the target image (step d-2 in FIG. 9). It then smooths the histogram (step d-3 in FIG. 9).
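Step d-1 and the choice of target image can be sketched as follows. The source does not say how "closest to the color of the mark" is measured; this sketch assumes it means the channel in which the mark color has its largest component, and the names `split_channels` and `target_channel` are hypothetical.

```python
def split_channels(rgb_image):
    """Step d-1: return the (R, G, B) single-channel images of an RGB image."""
    return tuple(
        [[pixel[c] for pixel in row] for row in rgb_image]
        for c in range(3)
    )

def target_channel(mark_rgb):
    # Assumed criterion: pick the channel where the mark color is strongest.
    return max(range(3), key=lambda c: mark_rgb[c])
```

For a reddish mark such as `(200, 30, 10)`, `split_channels(image)[target_channel((200, 30, 10))]` is the R image, which can then go through the same histogram, smoothing, and region steps as a monochrome image.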

[0046] After smoothing, the arithmetic processing unit 3 finds the luminance values at which the histogram has local minima (step d-4 in FIG. 9), divides the histogram into regions using each of these values as a boundary, and labels each region (step d-5 in FIG. 9).

[0047] Next, the arithmetic processing unit 3 extracts from the image the pixels that belong to the region of highest brightness (step d-6 in FIG. 9). It then gathers extracted pixels that are adjacent to one another and extracts small regions having the same label in the image (step d-7 in FIG. 9). It then computes the area of each small region (step d-8 in FIG. 9), selects from these the small region whose area lies within a predetermined range, and takes it as the mark (step d-9 in FIG. 9). Finally, it computes the centroid of that small region and takes it as the position of the mark (step d-10 in FIG. 9).
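Steps d-2 through d-10 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes an 8-bit single-channel target image, a box filter for smoothing, and 4-connected grouping of pixels, and the names `find_mark`, `min_area`, and `max_area` are hypothetical. It also assumes that a minimum lying above the brightest occupied bin is ignored, so that the highest-brightness region is never empty.

```python
from collections import deque

import numpy as np


def smooth_histogram(hist, width=5):
    """Step d-3: box-filter smoothing (the filter choice is an assumption)."""
    return np.convolve(hist, np.ones(width) / width, mode="same")


def local_minima(hist):
    """Step d-4: bins lower than their left neighbour and not above the right."""
    return [i for i in range(1, len(hist) - 1)
            if hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]]


def connected_regions(mask):
    """Step d-7: group 4-connected True pixels; returns arrays of (row, col)."""
    height, width = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    regions = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, pixels = deque([start]), []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < height and 0 <= nc < width
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        regions.append(np.array(pixels))
    return regions


def find_mark(target, min_area, max_area):
    """Return the (x, y) centroid of the mark, or None if no region qualifies."""
    target = np.asarray(target).astype(np.int64)
    hist = np.bincount(target.ravel(), minlength=256).astype(float)  # d-2
    smoothed = smooth_histogram(hist)                                 # d-3
    # d-4/d-5: the minima split the brightness axis into labelled regions;
    # only the boundary below the brightest occupied region is needed here.
    bounds = [m for m in local_minima(smoothed) if m < target.max()]
    threshold = bounds[-1] if bounds else -1
    mask = target > threshold                                         # d-6
    for pixels in connected_regions(mask):                            # d-7
        if min_area <= len(pixels) <= max_area:                       # d-8, d-9
            row, col = pixels.mean(axis=0)                            # d-10
            return float(col), float(row)
    return None
```

The area range passed as `min_area` and `max_area` plays the role of the predetermined area range in step d-9; suitable values depend on the mark size and camera distance and would have to be chosen for the actual setup.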

[0048] In this way the position of the mark can be obtained from a color image. Although RGB values are used in the description above, colors can also be represented by HSV values, YIQ values, and the like, and these may be used instead. Also, instead of a light emitter, a region or a point painted in a color different from that of the microphone may be provided as the mark.

[0049] In this way, an image of the microphone on the table is obtained from the imaging device on the same table, and the position of the microphone is found from this image. Because the position of the connecting portion at the base of the microphone on the table is known, the arithmetic processing unit 3, after detecting the mark position, uses the position of that connecting portion and the detected mark to find the direction of the microphone in the image plane. The position and direction of the microphone on the table are thereby obtained.
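As a small illustration of this step, the direction in the image plane can be taken as the vector from the known base position to the detected mark centroid. The function name `microphone_direction` and the use of pixel coordinates with y increasing downward are assumptions made for this sketch.

```python
import math


def microphone_direction(base_xy, mark_xy):
    """Unit vector and angle (degrees) from the microphone base to the mark.

    Both points are image-plane pixel coordinates (x, y); the angle is
    measured from the +x axis of the image.
    """
    dx = mark_xy[0] - base_xy[0]
    dy = mark_xy[1] - base_xy[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("mark coincides with the base; direction undefined")
    return (dx / length, dy / length), math.degrees(math.atan2(dy, dx))
```

For a base at (100, 200) and a mark at (100, 100), the mark lies straight up in the image, which corresponds to a negative y direction in pixel coordinates.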

[0050] In this device the position and direction of the microphone are obtained from an image, so no mechanical elements for measurement are needed at all. A position measuring device for a voice input device that is simple in construction and inexpensive can therefore be realized.

[0051] The imaging device 35 on the table may also be configured so that its focus can be varied automatically. In that case the microphone on the same table is photographed while the focus of the imaging device 35 is varied, the distance to the mark is found from the focus position at which the area of the small region corresponding to the detected mark is smallest, and the position of the mark in three-dimensional space is obtained.
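The focus-sweep idea can be sketched as below. The text only states that the focus position giving the smallest mark area yields the distance; the conversion to a three-dimensional point via a pinhole camera model, and the names `best_focus_distance`, `mark_position_3d`, `focal_px`, and `principal_xy`, are assumptions added for illustration.

```python
def best_focus_distance(sweep):
    """Pick the focus distance at which the detected mark area is smallest.

    sweep: iterable of (focus_distance, mark_area_in_pixels) pairs, one per
    image taken while the focus is varied.
    """
    distance, _area = min(sweep, key=lambda sample: sample[1])
    return distance


def mark_position_3d(pixel_xy, distance, focal_px, principal_xy):
    """Back-project the mark centroid with a pinhole model (an assumption).

    The focus distance is treated as depth along the optical axis, and
    focal_px is the focal length expressed in pixels.
    """
    x = (pixel_xy[0] - principal_xy[0]) * distance / focal_px
    y = (pixel_xy[1] - principal_xy[1]) * distance / focal_px
    return (x, y, distance)
```

A sharply focused point-like mark occupies the fewest pixels, which is why the minimum of the area over the sweep is used as the in-focus criterion.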

[0052] [More Specific Embodiment 2] The example above showed how to obtain the direction and position of the microphone when a rod-shaped (or columnar) microphone is installed on a table. Microphones are not limited to rod-shaped (or columnar) ones, however; some have a small microphone body attached to the tip of a flexible arm. This case is described next.

[0053] (Obtaining the direction and position of a microphone supported at the tip of a flexible arm) As shown in FIG. 10, a method of obtaining the direction and position of the microphone is described for an example in which a microphone body 53, corresponding to the acoustic-electrical signal converter 1 of FIG. 1, is attached to the free end of a flexible arm 52 whose other end is fixed on a table 51.

[0054] The flexible arm 52 is a pipe-shaped arm made of metal or the like that is both rigid and flexible. It can be bent freely by hand and has long been used as a support member in microphone stands, desk lamps, and the like.

[0055] In the present invention, marks are used even when the microphone body 53 is supported by the flexible arm 52. Specifically, as shown in FIG. 11, two or more light emitters such as LEDs are attached to the microphone body 53, on the peripheral surface near its tip and on the peripheral surface at its base side, and these serve as the marks 54.

[0056] The marks 54 are all placed on the same face of the microphone body 53. An imaging device 35, corresponding to the imaging device 2 of FIG. 1, is mounted on the table 51 as shown in FIG. 10, positioned so that it can image the microphone body 53 including each of the marks 54.

[0057] In this configuration, the marks 54 are made to emit light, for example by energizing them, and in this state the imaging device 35 photographs the microphone body 53 on the table 51. The image signal from the imaging device 35 is input to the arithmetic processing unit 3, which performs image processing on the captured image to detect the images of the marks 54 provided on the side peripheral surface of the microphone body 53.

[0058] The processing up to this point is the same as that used in the rod-shaped microphone example (the processing in [More Specific Embodiment 1]). Once the marks 54 have been detected in the captured image, the arithmetic processing unit 3 finds the centroid of each mark 54 and, by connecting the centroids of the two or more marks 54 detected in the image in order, obtains the direction and position of the microphone body 53 in the image plane.

[0059] As in the rod-shaped microphone example, the focus of the imaging device (camera) may be varied during imaging so as to obtain the position of each mark and the direction of the microphone in three-dimensional space. The marks may also be given mutually different brightness values or colors, and their order used to determine the orientation of the microphone.
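A minimal sketch of connecting the mark centroids in order, using differing brightness values to establish that order. The convention that brightness increases from the base-side mark to the tip-side mark is an assumption for this example, as are the names `order_marks` and `body_direction`.

```python
import math


def order_marks(marks):
    """Sort detected marks from base side to tip side.

    marks: list of ((x, y), brightness) pairs. The assumed convention is
    that marks were assigned increasing brightness from base to tip.
    """
    return [centroid for centroid, _ in sorted(marks, key=lambda m: m[1])]


def body_direction(ordered_centroids):
    """Position and direction of the microphone body in the image plane.

    The position is taken as the tip-side centroid and the direction as
    the unit vector of the last segment of the connected centroids.
    """
    if len(ordered_centroids) < 2:
        raise ValueError("at least two marks are required")
    (x0, y0), (x1, y1) = ordered_centroids[-2], ordered_centroids[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0.0:
        raise ValueError("the last two marks coincide")
    return (x1, y1), ((x1 - x0) / length, (y1 - y0) / length)
```

Using the last segment rather than the first-to-last vector follows the local bend of the flexible arm near the microphone body.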

[0060] In the above, marks are attached to the microphone body supported by the flexible arm, an image of the microphone including the marks is captured by a single imaging device, and the position and orientation of the microphone are obtained from that image.

[0061] However, when the position and direction are derived from an image captured by a single imaging device, their accuracy may be insufficient. In such a case, obtaining the position and orientation of the microphone from images captured by two imaging devices allows the position and direction to be determined from a stereoscopic (three-dimensional) image, so that accuracy can be secured. Such an example is described next as More Specific Embodiment 3.

[0062] [More Specific Embodiment 3] (Example of obtaining the position and direction of the microphone from stereoscopic images) As shown in FIG. 12, a method of obtaining the position and direction of the microphone in three-dimensional space is described for an example in which a plurality of cameras are arranged.

[0063] As in the examples using the rod-shaped microphone and the flexible arm, the microphone bodies 32, 53 bearing the marks 34, 54 are mounted on the tables 31, 51. A plurality of imaging devices 35a, 35b whose mutual positions and orientations are known are also installed on the tables 31, 51, and the microphone bodies 32, 53 are photographed by these imaging devices 35a, 35b.

[0064] After the marks 34, 54 on the microphone bodies 32, 53 have been detected in the image from each of the imaging devices 35a, 35b, marks with similar brightness values or colors are matched between the images captured by the imaging devices 35a, 35b, and the position of each mark in three-dimensional space is obtained by the stereo method. Using the mark positions thus obtained, the direction and position of the microphone body are determined as follows.

【００６５】[0065]

【数１】 [Equation 1]
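Equation 1 is not reproduced in this text, so the following is not a transcription of it. It is a hedged sketch of one common way to carry out the stereo step: midpoint triangulation of two viewing rays, a technique swapped in here for illustration. It assumes each matched mark has already been converted, using the known camera positions and orientations, into a ray (camera center plus direction) in a common coordinate system. The names `triangulate_midpoint` and `microphone_pose_3d` are hypothetical.

```python
import numpy as np


def triangulate_midpoint(c1, d1, c2, d2):
    """3D point closest to two viewing rays c1 + s*d1 and c2 + t*d2.

    Solves the 2x2 least-squares system for s and t and returns the
    midpoint of the shortest segment joining the rays. Parallel rays make
    the system singular and raise numpy.linalg.LinAlgError.
    """
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    a = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s, t = np.linalg.solve(a, b)
    return ((c1 + s * d1) + (c2 + t * d2)) / 2.0


def microphone_pose_3d(base_mark, tip_mark):
    """Position (tip-side mark) and unit direction from two 3D mark positions."""
    base_mark = np.asarray(base_mark, dtype=float)
    tip_mark = np.asarray(tip_mark, dtype=float)
    axis = tip_mark - base_mark
    return tip_mark, axis / np.linalg.norm(axis)
```

With noisy detections the two rays generally do not intersect, which is why the midpoint of their closest approach is used instead of an exact intersection.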

[0066] This makes it possible to specify the direction and position of the microphone body in three-dimensional space. The marks 34, 54 (light emitters, colored points, or colored regions) attached to the microphone bodies 32, 53, which correspond to the acoustic-electrical signal converter 1 of FIG. 1, need not have the shapes shown in FIGS. 4 and 11. They may be elongated as in the configuration of FIG. 13, or given a shape that indicates direction as in the configurations of FIGS. 14 and 15.

[0067] Likewise, the arrangement of the imaging device 35, which corresponds to the imaging device 2 of FIG. 1, is not limited to the examples shown in FIGS. 3 and 10. It may be arranged to photograph the microphone bodies 32, 53 from above as shown in FIG. 16, or arranged as in FIG. 17 so that a speaker near the microphone is captured together with the microphone bodies 32, 53.

[0068] When a plurality of imaging devices 35a, 35b, ... are used, the arrangement is likewise not limited to that of FIG. 12. As shown in FIG. 18, the devices may be arranged to photograph the microphone bodies 32, 53 from the side and from above.

[0069] In the examples above, the position of the microphone is obtained from the centroid of the mark region. When the mark is elongated, however, the mark may instead be thinned by progressively reducing its width until it becomes a line, and the position and direction obtained from the end points of that line.

[0070] As described above, the present system uses an acoustic-electrical signal converter, such as a microphone, that is installed at a fixed position so that it can swivel freely and that receives voice input. A mark is attached to the side peripheral surface of the acoustic-electrical signal converter, and an imaging device is provided to photograph the converter together with the mark. The system further comprises a computing device that detects the region of the mark in the image captured by the imaging device, determines the position of the mark in the image (for example, by taking the centroid of the region, or by thinning the region to a line and finding its end points), and obtains the orientation of the acoustic-electrical signal converter from the known installation position of the converter and the position of the mark in the image.

[0071] The imaging device photographs the acoustic-electrical signal converter together with the mark. The computing device detects the region of the mark in the captured image, determines from it the position of the mark in the image, and obtains the orientation of the converter from the known installation position of the converter and the position of the mark in the image.

[0072] Because the position and direction of the acoustic-electrical signal converter can thus be obtained from an image, no mechanical elements for measurement are needed, and a position measuring device for a voice input device that is simple in construction and inexpensive can be realized. In addition, because the orientation and position of the converter can be measured, the device can serve as the measuring device in a system that automatically steers the converter (that is, a microphone or the like) toward the speaker's mouth.

[0073] Furthermore, if the speaker and the acoustic-electrical signal converter are imaged together, it can be determined which person's mouth the converter is pointed at even when several people appear in the image. This makes it easy to identify the speaker, for example when voice extraction is controlled by linking the movement of the speaker's mouth with the output of the converter, and so makes voice extraction control for the speaker easy to realize. In particular, a person who is about to speak naturally moves the tip of the microphone to point it at his or her own mouth. By exploiting this behavior, the ability to measure the position and orientation of the microphone makes it possible to identify who is speaking even when many people appear in the image, opening the way to new multimedia applications.

【００７４】[0074]

[Effects of the Invention] As described above, the present invention makes it possible to obtain the position and direction of a voice input means such as a microphone easily, and thereby to obtain information useful for automatically steering the microphone or the like toward the speaker and for implementing speaker identification.

[Brief description of drawings]

FIG. 1 illustrates an example embodiment of the present invention and is a block diagram showing an example of the overall system configuration.

FIG. 2 illustrates an example embodiment of the present invention and is a flowchart for explaining an example of Specific Embodiment 1.

FIG. 3 illustrates an example embodiment of the present invention and is a perspective view of the device configuration for explaining an example of Specific Embodiment 1.

FIG. 4 illustrates an example embodiment of the present invention and is a perspective view of the device configuration for explaining an example of Specific Embodiment 1.

FIG. 5 illustrates an example embodiment of the present invention and is a flowchart for explaining an example of the mark extraction processing in Specific Embodiment 1.

FIG. 6 illustrates an example embodiment of the present invention and is a schematic diagram showing an example brightness-value histogram in Specific Embodiment 1.

FIG. 7 illustrates an example embodiment of the present invention and is a schematic diagram showing an example result of smoothing the histogram of FIG. 6.

FIG. 8 illustrates an example embodiment of the present invention and is a flowchart for explaining an example of the mark extraction processing in Specific Embodiment 1.

FIG. 9 illustrates an example embodiment of the present invention and is a flowchart for explaining an example of the mark extraction processing in Specific Embodiment 1.

FIG. 10 illustrates an example embodiment of the present invention and is a perspective view of an example device configuration in Specific Embodiment 2.

FIG. 11 illustrates an example embodiment of the present invention and is a diagram for explaining an example of mark placement in Specific Embodiment 2.

FIG. 12 illustrates an example embodiment of the present invention and is a perspective view of an example device configuration in Specific Embodiment 3.

FIG. 13 illustrates an example embodiment of the present invention and is a diagram for explaining an example of mark placement in Specific Embodiment 3.

FIG. 14 illustrates an example embodiment of the present invention and is a diagram for explaining an example of mark placement in Specific Embodiment 3.

FIG. 15 illustrates an example embodiment of the present invention and is a diagram for explaining an example of mark placement in Specific Embodiment 3.

FIG. 16 illustrates an example embodiment of the present invention and is a diagram showing an example of the installation position of the imaging device relative to the microphone body.

FIG. 17 illustrates an example embodiment of the present invention and is a diagram showing an example of the installation position of the imaging device relative to the microphone body.

FIG. 18 illustrates an example embodiment of the present invention and is a diagram showing an example of the installation position of the imaging device relative to the microphone body.

[Explanation of symbols]

1: acoustic-electrical signal converter; 2, 35, 35a, 35b: imaging device; 3: arithmetic processing unit; 31: table; 32, 53: microphone body (acoustic-electrical signal converter 1); 33: swivel mechanism; 34, 54: mark; 52: flexible arm.

Continuation of the front page: (72) Inventor Hisashi Kazama, c/o Toshiba Corporation Kansai Branch Office, 1-1-30 Oyodonaka, Kita-ku, Osaka-shi, Osaka

Claims

[Claims]

1. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter that is held, so that it can swivel freely, by a support member whose one end is installed at a fixed position, and that bears a mark at least on its side peripheral surface near the tip; an imaging device that photographs, from a fixed position, at least the mark on the acoustic-electrical signal converter; and a computing device that obtains the position of the mark from the image captured by the imaging device and determines the orientation of the acoustic-electrical signal converter from the obtained position of the mark.

2. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter that is held, so that it can swivel freely, by a support member whose one end is installed at a fixed position, and that bears, at least on its side peripheral surface near the tip, a mark formed by a light emitter, a colored point, or a colored region; an imaging device that photographs, from a fixed position, at least the mark on the acoustic-electrical signal converter; and a computing device that obtains the region of the mark from the image captured by the imaging device, obtains from it the position of the mark in the captured image, and determines the orientation of the acoustic-electrical signal converter from this position information and the installation position of the support member.

3. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter that is held, so that it can swivel freely, by a support member whose one end is installed at a fixed position, and that bears, at least on its side peripheral surface near the tip, a mark formed by a light emitter, a colored point, or a colored region; a single imaging device that photographs, from a fixed position, at least the mark on the acoustic-electrical signal converter; and a computing device that obtains the region of the mark from the image captured by the imaging device, obtains from it the position of the mark in the captured image, and determines the orientation of the acoustic-electrical signal converter from this position information and the installation position of the support member.

4. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter that is held, so that it can swivel freely, by a support member whose one end is installed at a fixed position, and that bears a mark at least on its side peripheral surface near the tip; a plurality of imaging devices that photograph the mark on the acoustic-electrical signal converter from mutually different fixed positions; and a computing device that obtains the region of the mark in three-dimensional space from the images captured by these imaging devices, obtains from it the position of the mark in the captured images, and determines the orientation of the acoustic-electrical signal converter from this position information and the installation position of the support member.

5. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter that is held by a support member so that it can swivel freely, and that bears a mark at least on its side peripheral surface near the tip; an imaging device that photographs at least the mark on the acoustic-electrical signal converter; and a computing device that obtains the region of the mark from the image captured by the imaging device, obtains from it the position of the mark in the captured image, and thereby obtains position information of the acoustic-electrical signal converter.

6. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter for voice input that is held by a support member so that it can swivel freely, and that bears a mark at least on its side peripheral surface near the tip; a plurality of imaging devices that photograph at least the mark on the acoustic-electrical signal converter from mutually different positions; and a computing device that obtains the region of the mark in three-dimensional space from the images captured by these imaging devices, obtains from it the position of the mark in the captured images, and obtains position information of the acoustic-electrical signal converter in three-dimensional space.

7. A position measuring device for a voice input device, comprising: an acoustic-electrical signal converter that is held, so that it can swivel freely, by a support member whose one end is installed at a fixed position, and that bears, at least on its side peripheral surface near the tip, a mark formed by a light emitter, a colored point, or a colored region; a plurality of imaging devices that photograph at least the mark on the acoustic-electrical signal converter from mutually different fixed positions; and a computing device that obtains the region of the mark in three-dimensional space from the images captured by these imaging devices, obtains from it the position of the mark in the captured images, and determines the orientation of the acoustic-electrical signal converter in three-dimensional space from this position information and the installation position of the support member.

8. A position measuring device for a voice input device, comprising: a microphone that is held, so that it can swivel freely, by a support member whose one end is installed at a fixed position, and that bears a mark at least on its side peripheral surface near the tip; an imaging device that photographs, from a fixed position, at least the mark on the microphone; and a computing device that obtains the region of the mark from the image captured by the imaging device, obtains from it the position of the mark in the captured image, and determines the orientation of the microphone from this position information and the installation position of the support member.