JP2012029209A

JP2012029209A - Audio processing system

Info

Publication number: JP2012029209A
Application number: JP2010168203A
Authority: JP
Inventors: Keimei Nakada; 啓明中田; Junichi Miyakoshi; 純一宮越; Yusuke Sugano; 雄介菅野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-07-27
Filing date: 2010-07-27
Publication date: 2012-02-09

Abstract

PROBLEM TO BE SOLVED: To provide an audio processing system capable of automatically tracking positional changes of a sound source that needs attention and of continuing to selectively apply the necessary audio processing to that sound source. SOLUTION: An audio processing system includes: a face movement detector 3 that detects a person's face from images captured by external monitoring cameras 2₁ to 2₈; a binocular line-of-sight detector 7 that detects the wearer's line-of-sight direction from images captured by a left-eye monitoring camera 4 and a right-eye monitoring camera 5; a line-of-sight target detector 8 that detects the direction and distance of a target object from the line-of-sight direction; a position information updater 9 that obtains the position information of the face from the outputs of the face movement detector 3 and the line-of-sight target detector 8; directivity adjusters 11₁ to 11₈ that calculate parameters for enhancing, by beamforming, the sound at the position corresponding to the position information; and stereo-compatible directivity calculators 13₁ to 13₈ that use the results calculated by the directivity adjusters 11₁ to 11₈ to adjust the left and right gains of the input signals from a microphone array 12 according to the relative position of the target sound source, and output the adjusted signals as left and right audio.

Description

The present invention relates to a technique for processing sound source information, and more particularly to a technique that is effective for processing audio and the like by making use of video processing technology.

There are systems that perform beamforming with a plurality of microphones to emphasize and process sound from a specific range. Known audio processing systems of this type include, for example, one that extracts or adjusts only a sound source localized at a certain angle (see, for example, Patent Document 1) and one that stores and estimates the direction of a sound source and performs processing based on that estimate (see, for example, Patent Document 2).

In addition, as a device that emphasizes the sound source of the part of the image being watched, a video camera is known that has a line-of-sight position AF (Auto Focus) function, which detects the line-of-sight position and focuses on the subject at that position, together with a function of aligning the directivity of its sound collecting means with the direction of the line-of-sight position (see, for example, Patent Document 3).

Patent Document 1: JP 2009-10992 A
Patent Document 2: JP 2005-354223 A
Patent Document 3: JP-A-8-298609

However, the present inventors have found that the audio processing techniques described above have the following problems.

With the techniques described above, it is difficult to keep accurately tracking the position of a sound source that needs attention and to selectively perform audio processing on it. For example, the technique disclosed in Patent Document 3 requires a person to keep gazing at the image of interest at all times, and it becomes difficult to align the directivity once the subject leaves the angle of view of the video camera.

Further, because the technique disclosed in Patent Document 2 relies on estimation, it is difficult to track the sound source position accurately, and tracking becomes extremely difficult particularly when the object remains silent for a long time.

Furthermore, although the technique disclosed in Patent Document 1 is effective for selectively processing audio, it cannot track the sound source position.

An object of the present invention is to provide a technique that can automatically follow changes in the position of a sound source that needs attention and can continue to selectively apply the necessary audio processing to that sound source.

The above and other objects and novel features of the present invention will be apparent from the description of this specification and the accompanying drawings.

Of the inventions disclosed in the present application, an outline of a representative one is briefly described as follows.

The present invention includes: a sound source acquisition unit that acquires sound from the sound source of an object; a line-of-sight monitoring detection unit that monitors the user's line of sight to detect the object and detects the direction and distance of the object; a camera that monitors the surroundings; a movement detection unit that detects the object from video captured by the camera, detects movement of the object from the continuity of changes in its image position, and outputs the result as movement information; a position information detection unit that tracks the object and detects its position information from the detection results of the movement detection unit and the line-of-sight monitoring detection unit; a directional sound source adjustment unit that, based on the position information detected by the position information detection unit, adjusts the directivity of the sound source acquisition unit so that sound at the position corresponding to the position information is enhanced, and outputs a sound signal; and a sound output unit that outputs the sound signal output from the directional sound source adjustment unit.

Further, in the present invention, the line-of-sight monitoring detection unit includes: a monitoring camera unit that monitors the user's line of sight; a line-of-sight detection unit that detects the line-of-sight directions of the left eye and the right eye from the dark part of the eye and the corneal reflection image in the monitoring result of the monitoring camera unit; and a line-of-sight target detection unit that, from the line-of-sight information detected by the line-of-sight detection unit, obtains the midpoint coordinates of the positions at which the straight lines corresponding to the lines of sight come closest to each other in space, and detects the direction and distance of the object the user is looking at. The position information detection unit includes: a position information update unit that, based on the detection results of the movement detection unit and the line-of-sight detection unit, outputs the position information of the object as a position of interest when the line of sight has stayed on the object for an arbitrary time or longer, and updates the position information of the object based on the movement information detected by the movement detection unit; and a position information storage unit that stores the position information output from the position information update unit.

Furthermore, in the present invention, when the position information update unit detects position information and determines that the object is farther away than an arbitrary distance, it excludes that object from the targets.

Further, in the present invention, the movement detection unit detects a person's face from the video captured by the camera, detects feature values of the face, identifies the same person based on those feature values, and detects movement of the face from the continuity of changes in its image position, and the sound source acquisition unit acquires audio under the control of the directional sound source adjustment unit.

Furthermore, in the present invention, the sound source acquisition unit is composed of a plurality of microphones.

Further, in the present invention, the sound output unit includes a noise canceling unit that generates a signal that cancels noise and thereby performs noise cancellation.

Outlines of other inventions of the present application are briefly described below.

An image of the directions in which a sound source may exist is monitored, and parts that may become gaze targets are marked as candidate points. When an event occurs with respect to, or at, a part marked as a candidate point, its position is stored so that the mark is treated as a sound source of interest. Movement of the image corresponding to that position is monitored by image processing, and the position information is updated as appropriate. The beamforming characteristics of a plurality of microphones are adjusted according to this position information, and the necessary processing is applied to the audio collected and extracted by beamforming.

Further, the present invention includes: a display unit that displays video; a sound output unit that outputs sound; a camera that monitors the surroundings; a sound source acquisition unit that acquires sound from the sound source of an object; a face movement detection unit that extracts a person's face from the image input from the camera, detects the position and distance of the extracted face, tracks the person's face, and outputs the result as position information; a gaze state detection unit that, based on the image from the camera and the position information output by the face movement detection unit, extracts the eye region from the detected face image and determines whether the direction in which the person is looking is a specific display portion, that is, a location displayed on the display unit for which some processing should be performed; a position information update unit that acquires the position information of the corresponding person when the gaze state detection unit determines that the person is looking at the specific display portion; a directional sound source adjustment unit that, based on the position information acquired by the position information update unit, calculates parameters for enhancing by beamforming the audio corresponding to the person's position and, based on the calculation result, adjusts the directivity of the sound source acquisition unit so that the audio at the position corresponding to the position information is enhanced, and outputs a sound signal; a voice recognition display control unit that recognizes speech contained in the audio signal output from the directional sound source adjustment unit, determines from the recognition result whether the person is a target of two-way communication, and, when the person is determined to be such a target, displays information on the display unit based on the position information corresponding to that person; and a sound-emission directivity adjustment calculation unit that, based on the position information output from the voice recognition display control unit, performs directivity calculation processing that gives the output directivity so that sound is delivered only to the target person, and outputs audio from the sound output unit based on the calculation result.

Furthermore, in the present invention, when the gaze state detection unit determines that the person is gazing at the specific display portion, the position information update unit updates the position information each time the position information of the person's face changes, thereby tracking the person.

Further, in the present invention, the voice recognition display control unit displays video information in a small screen on the display unit based on the position information of the person who is the target of two-way communication.

Furthermore, in the present invention, the position information update unit acquires the position information of a plurality of persons and tracks and updates the position information of each of them.

The effects obtained by representative ones of the inventions disclosed in the present application are briefly described as follows.

(1) In an environment where the positional relationship between the user and the sound source changes from moment to moment, the voice of a person who needs to be heard can be selectively and accurately amplified.

(2) As a result of (1) above, optimal conversation assistance can be provided even in noisy conditions.

(3) In addition, by combining speakers capable of beamforming, a digital signage device capable of two-way communication can be realized.

(4) As a result of (3) above, in a situation where several people are conversing, the voice of the person of interest can be picked up even when zooming in on a specific person.

FIG. 1 is a block diagram showing an example of the configuration of an audio processing system according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing an example of the appearance of the audio processing system of FIG. 1.
FIG. 3 is an explanatory diagram showing an example of the monitoring areas of the external monitoring cameras shown in the audio processing system of FIG. 2.
FIG. 4 is a block diagram showing an example of the configuration of a digital signage system according to Embodiment 2 of the present invention.
FIG. 5 is an explanatory diagram showing an example of the appearance of the digital signage system of FIG. 4.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In all the drawings for describing the embodiments, the same members are in principle denoted by the same reference numerals, and repeated description thereof is omitted.

(Embodiment 1)
FIG. 1 is a block diagram showing an example of the configuration of an audio processing system according to Embodiment 1 of the present invention, FIG. 2 is an explanatory diagram showing an example of the appearance of the audio processing system of FIG. 1, and FIG. 3 is an explanatory diagram showing an example of the monitoring areas of the external monitoring cameras shown in the audio processing system of FIG. 2.

In Embodiment 1, the audio processing system 1 is a system that selectively emphasizes the voice of a conversation partner during a conversation and conveys it to the wearer of the audio processing system 1.

As shown in FIG. 1, the audio processing system 1, which is a sound processing system, is composed of external monitoring cameras 2₁ to 2₈, a face movement detector 3, a left-eye monitoring camera 4, a right-eye monitoring camera 5, near-infrared LEDs 6 (shown in FIG. 2), a binocular line-of-sight detector 7, a line-of-sight target detector 8, a position information updater 9, a position information storage unit 10, directivity adjusters 11₁ to 11₈, a microphone array 12 composed of a plurality of microphones, stereo-compatible directivity calculators 13₁ to 13₈, adders 14 and 15, a left-ear earphone 16, a right-ear earphone 17, a left-ear microphone 18, a right-ear microphone 19, a left noise canceling device 20, and a right noise canceling device 21.

The left-eye monitoring camera 4, the right-eye monitoring camera 5, the binocular line-of-sight detector 7, and the line-of-sight target detector 8 constitute the line-of-sight monitoring detection unit, and the position information updater 9 and the position information storage unit 10 constitute the position information detection unit.

The directivity adjusters 11₁ to 11₈ and the stereo-compatible directivity calculators 13₁ to 13₈ constitute the directional sound source adjustment unit, and the left-ear microphone 18, the right-ear microphone 19, the left noise canceling device 20, and the right noise canceling device 21 constitute the noise canceling unit.

The external monitoring cameras 2₁ to 2₈ are cameras that monitor the surroundings: a group of cameras for monitoring where in the surroundings there are objects that can be sound sources and where a sound source of interest is moving.

Video captured by the external monitoring cameras 2₁ to 2₈ is output to the face movement detector 3, which detects human faces. The face movement detector 3 has a function of detecting a person's face from the input video and a function of detecting feature values of the detected face; it identifies the same person based on the feature values and detects movement of the face from the continuity of changes in its image position.

The face detection algorithm is not described here because it is a technique already widely implemented in digital cameras and the like; for example, a method that uses Haar-type features with the AdaBoost algorithm is widely known.

Further, by capturing the same face with several of the external monitoring cameras 2₁ to 2₈ and using the feature values and the position of the face image within the image captured by each camera to associate the same face image across the images captured by the different cameras, the face movement detector 3 can detect not only the direction of the face but also the distance to it.

The left-eye monitoring camera 4, which is a monitoring camera unit, is a near-infrared camera that monitors the line of sight of the left eye of the wearer of the audio processing system 1, and the right-eye monitoring camera 5, which is also a monitoring camera unit, is a near-infrared camera that monitors the line of sight of the wearer's right eye.

Near-infrared LEDs 6 (shown in FIG. 2) are provided in the vicinity of the left-eye monitoring camera 4 and the right-eye monitoring camera 5. The images captured by the left-eye monitoring camera 4 and the right-eye monitoring camera 5 are binarized in the binocular line-of-sight detector 7, which is the line-of-sight detection unit, and the line-of-sight directions of the wearer's left and right eyes are then detected from the dark part of the eye and the corneal reflection image.

The line-of-sight target detector 8, which serves as the line-of-sight target detection unit, uses the line-of-sight information detected by the binocular line-of-sight detector 7 to obtain the midpoint coordinates of the positions at which the straight lines corresponding to the lines of sight of the two eyes come closest to each other in space, and thereby detects the direction and distance of the object the wearer is looking at.
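The midpoint computation described above can be sketched as follows. This is a minimal illustration, not the implementation of the line-of-sight target detector 8: the function name `gaze_target`, the coordinate convention (eye positions and gaze directions as 3D tuples in meters), and the choice to measure distance from the point midway between the eyes are all assumptions introduced here.

```python
import math

def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _scale(a, s):
    return tuple(x * s for x in a)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gaze_target(left_eye, left_dir, right_eye, right_dir, eps=1e-9):
    """Return (midpoint, distance) for two gaze lines, or None if they are nearly parallel.

    Each gaze line is eye + s * dir. The midpoint lies halfway between the two
    points at which the lines come closest to each other.
    """
    w0 = _sub(left_eye, right_eye)
    a = _dot(left_dir, left_dir)
    b = _dot(left_dir, right_dir)
    c = _dot(right_dir, right_dir)
    d = _dot(left_dir, w0)
    e = _dot(right_dir, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # parallel gaze lines: no finite convergence point
    s = (b * e - c * d) / denom  # parameter along the left gaze line
    t = (a * e - b * d) / denom  # parameter along the right gaze line
    p_left = _add(left_eye, _scale(left_dir, s))
    p_right = _add(right_eye, _scale(right_dir, t))
    midpoint = _scale(_add(p_left, p_right), 0.5)
    head = _scale(_add(left_eye, right_eye), 0.5)  # assumed reference point
    return midpoint, math.dist(midpoint, head)
```

The direction of the target follows from the vector between the reference point and the returned midpoint. Because the two measured gaze lines rarely intersect exactly, taking the midpoint of their closest approach gives a usable estimate even with small measurement errors.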

Based on the outputs of the face movement detector 3 and the line-of-sight target detector 8, the position information updater 9, which serves as the position information update unit, stores the position information of a face image in the position information storage unit 10, which serves as the position information storage unit, as a position of interest when the line of sight has stayed on the location of that face for an arbitrary time or longer.
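The dwell condition described above can be sketched as a small state machine. The class name `DwellDetector`, the one-second default threshold, and the idea of feeding it one face identifier per video frame are assumptions made for illustration; the source only says the line of sight must stay on a face for an arbitrary time or longer.

```python
class DwellDetector:
    """Report a face as a target of interest once the gaze has stayed on it long enough."""

    def __init__(self, dwell_seconds=1.0):
        self.dwell_seconds = dwell_seconds  # assumed threshold, not from the source
        self._current = None  # face currently under the gaze, or None
        self._since = None    # time at which the gaze arrived on that face

    def update(self, face_id, timestamp):
        """face_id is the face the gaze is on (None if no face).

        Returns face_id when the dwell threshold has been reached, otherwise None.
        """
        if face_id != self._current:
            # Gaze moved to a different face (or off all faces): restart the timer.
            self._current = face_id
            self._since = timestamp
            return None
        if face_id is not None and timestamp - self._since >= self.dwell_seconds:
            return face_id
        return None
```

A caller would invoke `update` once per frame and, whenever it returns a face identifier, store that face's position as a position of interest.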

If the number of targets of interest then exceeds the number that the position information storage unit 10 can store, the position information of the face image that was gazed at longest ago is erased and replaced with the position information of the newly attended face image.
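One way to picture this replacement rule is a small fixed-capacity store that evicts the entry whose last gaze is oldest. The class `AttentionStore` and its method names are hypothetical, the default capacity of 8 is an assumption chosen to match the eight directivity adjusters, and refreshing an entry when the same face is gazed at again is one possible reading of the text.

```python
from collections import OrderedDict

class AttentionStore:
    """Fixed-capacity store of attended face positions, oldest gaze evicted first."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._entries = OrderedDict()  # face_id -> position, oldest gaze first

    def attend(self, face_id, position):
        """Register a gaze on face_id; return the evicted face_id, or None."""
        evicted = None
        if face_id in self._entries:
            self._entries.move_to_end(face_id)  # refresh: now the newest gaze
        elif len(self._entries) >= self.capacity:
            evicted, _ = self._entries.popitem(last=False)  # drop the oldest gaze
        self._entries[face_id] = position
        return evicted

    def update_position(self, face_id, position):
        """Apply a movement update without changing the gaze order."""
        if face_id in self._entries:
            self._entries[face_id] = position

    def positions(self):
        return dict(self._entries)
```

Keeping gaze order separate from position updates means that a tracked face that merely moves does not count as freshly attended.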

When determining whether a face image is a position of interest, depending on the application, the distance to the face may also be used so that a face becomes a target of interest only if it is within a certain distance. For example, for an application centered on ordinary conversation, faces within about 2 to 3 meters might be targeted.

The position information updater 9 also has a function of updating the position information of the targets of interest stored in the position information storage unit 10 based on the face movement information detected by the face movement detector 3. With this function, the relative position of the wearer and the attended face image can be updated correctly not only when the attended face moves but also when the wearer moves or turns his or her head.

Each piece of position information stored in the position information storage unit 10 for an attended face image is assigned to one of the directivity adjusters 11₁ to 11₈. A directivity adjuster to which position information has been assigned calculates the digital filter parameters needed to enhance, by beamforming with the microphone array 12 serving as the sound source acquisition unit, the audio at the position corresponding to that position information.

One way to obtain these coefficients is to delay the input signal from each microphone so as to compensate for the differences in distance from the sound source position to the individual microphones of the microphone array 12, and then sum the delayed signals.
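The delay-and-sum approach mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions: delays are rounded to whole samples, the speed of sound is taken as 343 m/s, and the function names `steering_delays` and `delay_and_sum` are introduced here rather than taken from the source. A practical system would typically use fractional delays or frequency-domain filters.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def steering_delays(mic_positions, source_position, sample_rate):
    """Per-microphone delays, in whole samples, that align a wavefront from the source."""
    distances = [math.dist(m, source_position) for m in mic_positions]
    farthest = max(distances)
    # Microphones nearer the source hear the sound earlier, so they are delayed more.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]

def delay_and_sum(signals, delays):
    """Delay each channel by its sample count, then average the channels."""
    length = len(signals[0])
    out = [0.0] * length
    for channel, delay in zip(signals, delays):
        for n in range(delay, length):
            out[n] += channel[n - delay]
    return [v / len(signals) for v in out]
```

Sound arriving from the chosen position adds coherently after the delays, while sound from other positions arrives misaligned and is attenuated by the averaging.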

Of the stereo-compatible directivity calculators 13₁ to 13₈, the one corresponding to the directivity adjuster that calculated the digital filter parameters filters, based on that adjuster's calculation result, the digital audio signals input from the microphones of the microphone array 12, for example after A/D (Analog/Digital) conversion.

In addition, to reproduce for the wearer a sense of localization that matches the direction of the sound source, that is, of the face image, the stereo-compatible directivity calculators 13₁ to 13₈ adjust the left and right gains according to the relative position of the target sound source and output the results as left audio and right audio.
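The source does not specify how the left and right gains are derived from the relative position. As one possible illustration, the sketch below uses constant-power panning driven by the azimuth of the source; the panning law, the azimuth convention (0 straight ahead, negative to the left, in radians), and the function names are all assumptions.

```python
import math

def stereo_gains(azimuth_rad):
    """Return (left_gain, right_gain) for a source at the given azimuth.

    Azimuths beyond +/- 90 degrees are clamped; sources behind the wearer
    are not treated specially in this sketch.
    """
    clamped = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    pan = (clamped + math.pi / 2) / 2  # 0 (full left) .. pi/2 (full right)
    return math.cos(pan), math.sin(pan)

def to_stereo(mono, azimuth_rad):
    """Split a beamformed mono signal into left and right channels."""
    left_gain, right_gain = stereo_gains(azimuth_rad)
    return [left_gain * s for s in mono], [right_gain * s for s in mono]
```

With this law the sum of the squared gains stays at 1, so the perceived loudness remains roughly constant as the attended speaker moves from one side to the other.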

The audio output signals from the stereo-compatible directivity calculators 13₁ to 13₈ are summed separately for the left and right sides by the adders 14 and 15 and conveyed to the wearer's ears through the left-ear earphone 16 and the right-ear earphone 17.

The left-ear microphone 18 is a microphone for noise cancellation of the left audio output from the left-ear earphone 16, and the right-ear microphone 19 is a microphone for noise cancellation of the right audio output from the right-ear earphone 17.

The left-ear microphone 18 picks up the sound that reaches the left ear directly from outside; the left noise canceling device 20 generates a signal that cancels this sound (basically a signal of opposite phase) and adjusts its gain to suit the sound reproduced by the left-ear earphone 16, and the adder 14 adds it in. The same processing is performed on the right side. This processing suppresses noise and other sound that reaches the ear directly from outside, making the sound generated inside the device easier to hear even in noisy surroundings.

Note that, to prevent necessary external sounds from being blocked completely, the left noise canceling device 20 and the right noise canceling device 21 may refrain from generating a canceling signal for sounds in a specific frequency band. When there is no sound source of interest, it is also conceivable to stop the left noise canceling device 20 and the right noise canceling device 21, or to switch them to a mode that generates a signal in phase with the left-ear microphone 18 and the right-ear microphone 19, so as to reproduce a state close to not wearing the device.
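The three operating modes described for the noise canceling devices can be pictured with the idealized sketch below, which treats cancellation as adding a sign-flipped copy of the ear-microphone signal. The function name `earphone_output`, the mode names, and the single `cancel_gain` parameter are hypothetical; real noise cancellation also has to model the acoustic path and latency, and the per-band exclusion mentioned above is not shown.

```python
def earphone_output(enhanced, ambient, cancel_gain=1.0, mode="cancel"):
    """Mix the enhanced audio with a signal derived from the ear microphone.

    mode "cancel":      add the ambient signal in opposite phase
    mode "off":         add no cancellation signal
    mode "passthrough": add the ambient signal in phase (close to not wearing the device)
    """
    sign = {"cancel": -1.0, "off": 0.0, "passthrough": 1.0}[mode]
    return [e + sign * cancel_gain * a for e, a in zip(enhanced, ambient)]
```

In "cancel" mode the opposite-phase term is intended to offset the ambient sound that leaks past the earphone, so that mainly the enhanced audio remains at the ear.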

FIG. 2 is an explanatory diagram showing an example of the external configuration of the audio processing system 1.

As illustrated, the audio processing system 1 is, for example, a glasses-type device built into glasses 22. The microphone array 12 consists of microphones 12₁ to 12₁₀. The microphones 12₁ to 12₃ are arranged at equal intervals on the left temple of the glasses 22, from the earpiece side toward the lens side.

The microphones 12₄ and 12₅ are arranged at equal intervals above the left lens or along the top of the left lens rim, and the microphones 12₆ and 12₇ are arranged at equal intervals above the right lens or along the top of the right lens rim.

The microphones 12₈ to 12₁₀ are arranged at equal intervals on the right temple of the glasses 22, from the lens side toward the earpiece side. The microphones 12₁ to 12₃ mainly pick up sound from the left, the microphones 12₄ to 12₇ mainly pick up sound from the front, and the microphones 12₈ to 12₁₀ mainly pick up sound from the right. To pick up sound from the front left, the left and front microphones 12₁ to 12₅ are used; to pick up sound from the front right, the right and front microphones 12₆ to 12₁₀ are used.

However, microphones that lie in the wearer's shadow as seen from the sound source are not used; for example, the signals of the right-side microphones are not used to pick up a sound source on the left. This control is realized by having each of the directivity adjusters 11₁ to 11₈ generate parameters according to the position it is attending to and having the stereo-compatible directivity calculators 13₁ to 13₈ operate according to those parameters.

That is, input from unused microphones is masked by generating parameters that set the gain of those microphones to 0. By not using microphones in the wearer's shadow, audio inputs dominated by sound reflected from the surroundings, or by sound transmitted through the device itself and the user's body, rather than by direct sound from the source, are excluded from the filter processing, which suppresses adverse effects on the directivity computation.
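
As a minimal sketch of the masking described above, the following Python fragment builds a per-microphone gain vector in which unused microphones receive a gain of 0. The direction labels, the names `ACTIVE_MICS`, `mic_gains`, and `apply_mask`, and the reduction of the parameters to a simple 0/1 gain are assumptions made for illustration; the actual parameters produced by the directivity adjusters are not specified at this level of detail.

```python
NUM_MICS = 10  # microphones 12_1 to 12_10 on the glasses

# Microphones used for each pickup direction, following the description:
# front left uses 12_1 to 12_5, front right uses 12_6 to 12_10.
# The dictionary keys are hypothetical labels.
ACTIVE_MICS = {
    "left_front": range(1, 6),
    "right_front": range(6, 11),
}

def mic_gains(direction):
    """Return one gain per microphone: 1.0 if used, 0.0 if masked."""
    active = ACTIVE_MICS[direction]
    return [1.0 if mic in active else 0.0 for mic in range(1, NUM_MICS + 1)]

def apply_mask(samples, gains):
    """Scale one sample per microphone by its gain; masked inputs become 0."""
    return [s * g for s, g in zip(samples, gains)]

gains = mic_gains("left_front")
masked = apply_mask([0.3] * NUM_MICS, gains)
```

In a fuller implementation the same gain vector would simply be folded into the beamforming filter coefficients, so that masked microphones contribute nothing to the output.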

The external monitoring camera 2₁ is provided on the earpiece side of the left temple of the glasses 22, and the external monitoring camera 2₂ is provided on the lens side of the left temple.

The external monitoring camera 2₅ is provided on the earpiece side of the right temple of the glasses 22, and the external monitoring camera 2₆ is provided on the lens side of the right temple.

The external monitoring camera 2₇ is provided at the earpiece-side end of the left temple of the glasses 22, and the external monitoring camera 2₈ is provided at the earpiece-side end of the right temple.

The external monitoring cameras 2₇ and 2₈ are mounted on arms or the like that extend slightly beyond the earpieces, so that they sit somewhat behind the wearer. This placement minimizes occlusion by the back of the wearer's head and allows the rear to be monitored in the same way as the front.

The external monitoring camera 2₃ is provided at the upper left of the left lens or of the left lens rim, and the external monitoring camera 2₄ is provided at the upper right of the right lens or of the right lens rim.

The external monitoring cameras 2₁ and 2₃ monitor images in the left direction, cameras 2₂ and 2₅ in the forward direction, cameras 2₇ and 2₈ in the rear direction, and cameras 2₄ and 2₆ in the right direction. Because each direction is monitored by two cameras, the distance to an object can also be detected.
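
The description does not state how the distance is computed from the two camera images. As one common possibility, assuming two parallel, rectified pinhole cameras with a known baseline, the distance follows from the horizontal disparity of the same object in both images. The function name, parameter names, and numeric values below are hypothetical.

```python
def distance_from_disparity(x_left_px, x_right_px, focal_px, baseline_m):
    """Estimate the distance (in meters) to an object seen by two parallel cameras.

    x_left_px, x_right_px: horizontal image coordinate of the object in each camera.
    focal_px: focal length expressed in pixels (assumed equal for both cameras).
    baseline_m: distance between the two camera centers in meters.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("non-positive disparity: object too far or points mismatched")
    return focal_px * baseline_m / disparity

# Example with assumed values: 800 px focal length, 5 cm baseline, 20 px disparity.
d = distance_from_disparity(420.0, 400.0, focal_px=800.0, baseline_m=0.05)
```

Cameras that point in different directions, as in the glasses described here, would first need calibration and rectification before this simple relation applies.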

A left-ear earphone 16 is connected by a wire to the left temple of the glasses 22, and a right-ear earphone 17 is connected by a wire to the right temple. The left-ear earphone 16 includes a left-ear microphone 18 for noise canceling, and the right-ear earphone 17 includes a right-ear microphone 19 for noise canceling.

The left-eye monitoring camera 4 is provided below the left lens of the glasses 22 or below the left lens rim, and the right-eye monitoring camera 5 is provided below the right lens or below the right lens rim. A near-infrared LED 6 serving as the light source for imaging is provided near each of the left-eye monitoring camera 4 and the right-eye monitoring camera 5.

In the glasses 22, a left storage box 23 is provided on the lens side of the left temple, and a right storage box 24 is provided on the lens side of the right temple. The left storage box 23 and the right storage box 24 house circuit blocks such as the face movement detector 3, the binocular gaze detector 7, the gaze target detector 8, the position information updater 9, the position information storage 10, the directivity adjusters 11₁ to 11₈, the stereo-compatible directivity calculators 13₁ to 13₈, the adders 14 and 15, the left noise canceling device 20, and the right noise canceling device 21, as well as a battery that powers the audio processing system 1.

The left storage box 23 or the right storage box 24 may also carry, for example, a wireless communication circuit, so that part of the processing is performed on a remote device over a wireless connection.

FIG. 3 is an explanatory diagram showing an example of the regions that can be monitored by the external monitoring cameras 2₁ to 2₈ in the audio processing system 1 of FIG. 2. FIG. 3 mainly shows the front half; because the rear half is symmetrical with the front half, part of it is omitted from the drawing.

In FIG. 3, region A cannot be monitored by either of the left-direction external monitoring cameras 2₁ and 2₃, region B cannot be monitored by either of the right-direction cameras 2₄ and 2₆, region C cannot be monitored by either of the forward-direction cameras 2₂ and 2₅, and region D cannot be monitored by either of the rear-direction cameras 2₇ and 2₈.

Region E can be monitored only by the external monitoring camera 2₃, region F only by camera 2₂, region G only by camera 2₅, and region H only by camera 2₄; these are regions that cannot be monitored by two external monitoring cameras at once.

Region I can be monitored by the two external monitoring cameras 2₂ and 2₃, so information including distance can be obtained from the monitoring images using cameras that face different directions. Region J can be monitored by the two external monitoring cameras 2₄ and 2₅, and information including distance can likewise be obtained from the monitoring images. In other words, the direction and distance of an object may also be detected using signals from both the forward-direction external monitoring cameras 2₃ and 2₄ and the left- or right-direction external monitoring cameras 2₂ and 2₅.

Furthermore, region K can be monitored using two cameras, the forward-direction external monitoring camera 2₂ and the left-direction external monitoring camera 2₃, and region L can be monitored using two cameras, the forward-direction external monitoring camera 2₅ and the right-direction external monitoring camera 2₄.

That is, regions K and L serve as handover regions between the left/right directions and the forward direction. Face detection is performed in both directions simultaneously, the same person is matched across them from the positional relationship and the feature values, and the camera used for monitoring is switched as appropriate. In this way the wearer's entire surroundings can be monitored continuously, including while objects move.

Accordingly, in Embodiment 1, only the voice of the conversation partner on whom the wearer has focused is selectively amplified and delivered to the wearer with emphasis, so a conversation can be carried on without difficulty even in a noisy environment.

Embodiment 1 assumes human conversation as the target and therefore uses the face movement detector 3; depending on the purpose of the device, however, a detector that detects a different kind of object may be used instead.

For birdwatching, for example, a detector that detects images of birds can be used so that birdsong and the like is selectively emphasized. A detector capable of detecting both human faces and birds may also be used.

Embodiment 1 is configured to assist human hearing. As a variation, the video captured by the forward-direction cameras may be made recordable, and the output of the stereo-compatible directivity calculators may be recorded together with the sound captured by ordinary left and right microphones. In that case, even sound from a position outside the recorded video can be recorded at high volume, provided the position has once been looked at and stored as a target of attention.

In this case, so that stereoscopic video can be reproduced at playback, the image captured by the external monitoring camera 2₂ is recorded as the left-eye image and the image captured by the external monitoring camera 2₅ as the right-eye image. If only one of the images is to be recorded in order to reduce the recording volume, the user can select either the left-eye image or the right-eye image as the recording target. This function lets the wearer choose the image to record according to his or her dominant eye.

Furthermore, although Embodiment 1 provides eight each of the external monitoring cameras 2₁ to 2₈, the directivity adjusters 11₁ to 11₈, and the stereo-compatible directivity calculators 13₁ to 13₈, the number of these circuit blocks is not limited to eight.

(Embodiment 2)
FIG. 4 is a block diagram showing an example of the configuration of a digital signage system according to Embodiment 2 of the present invention, and FIG. 5 is an explanatory diagram showing an example of the appearance of the digital signage system of FIG. 4.

In Embodiment 2, the digital signage system 25 is an electronic signboard that displays video and information on a display installed in a public facility, a store, or the like.

As shown in FIG. 4, the digital signage system 25 includes the same circuit blocks as in FIG. 1 of Embodiment 1, namely external monitoring cameras 2a and 2b, the face movement detector 3, the position information updater 9, the position information storage 10, the directivity adjusters 11₁ to 11₈, and the microphone array 12 composed of microphones 12₁ to 12₁₄ (shown in FIG. 5). It additionally includes directivity calculators 26₁ to 26₈, speech recognizers 27₁ to 27₈, an overall control device 28, a display control device 29, a display device 30, sound-emission directivity adjustment calculators 31₁ to 31₈, speech synthesizers 32₁ to 32₈, a speaker array 33 composed of a plurality of speakers 33₁ to 33₁₈ (shown in FIG. 5), a gaze state detector 34, and an adder 35.

The external monitoring cameras 2a and 2b monitor the surroundings of the digital signage system 25. They are used to determine where around the system potential sound sources are located and to track persons of interest.

The face movement detector 3 extracts human faces from each of the images input from the external monitoring cameras 2a and 2b and detects their feature values. The face regions extracted from the images captured by the external monitoring cameras 2a and 2b are matched to each other based on the feature values and on their positions in each image.

Using the result of this matching, not only the position of a person's face but also the distance to the face is detected from the difference in its position between the images. Once a face has been detected, its movement can be tracked, based on image motion information and the feature values of the face image, until it leaves the monitorable area.

The monitoring image of the external monitoring camera 2a (or 2b), together with the position information and identification number of each face image that the face movement detector 3 extracted from that monitoring image, is sent to the gaze state detector 34.

The gaze state detector 34 extracts and analyzes the eye region of each detected face image and detects the direction in which the person is looking. If the person is looking at a specific part of the image presented on the display device 30 of the digital signage system 25, that is, a location in the image registered in advance as one that requires some processing when it is looked at, the identification number of that person's face is sent to the position information updater 9.

On receiving the identification number of a gazing face from the gaze state detector 34, the position information updater 9 stores the position information corresponding to that identification number in the position information storage 10 if it has not yet been stored.

Of the position information stored in the position information storage 10, entries for faces that the external monitoring cameras 2a and 2b can no longer track are erased, as are entries for faces whose gaze state was released and for which a certain time has since elapsed.

If there is more position information than the position information storage 10 can hold, the position information of the face whose gaze state was released earliest is erased and replaced with the position information of the newly attending person's face.

However, if no position information can be erased at all, the addition of new position information is suspended. This prevents the loss of position information for persons whom the digital signage system 25 is already serving.
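
The storage policy described above can be sketched as follows. This is only an illustrative model under stated assumptions: the class and field names (`FaceEntry`, `PositionStore`, `gaze_released_at`) are hypothetical, and an entry is treated as erasable only when its gaze state has been released, which is one reading of the case in which erasure is "not possible at all".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FaceEntry:
    face_id: int
    position: Tuple[float, float, float]
    # Time at which the person stopped gazing; None while still gazing.
    gaze_released_at: Optional[float] = None

class PositionStore:
    """Fixed-capacity store that evicts the entry whose gaze was released earliest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def add(self, entry):
        """Store a new entry; return False if the store is full and nothing is erasable."""
        if entry.face_id in self.entries:
            return True  # already stored, nothing to do
        if len(self.entries) >= self.capacity:
            released = [e for e in self.entries.values()
                        if e.gaze_released_at is not None]
            if not released:
                return False  # suspend addition rather than drop an active person
            victim = min(released, key=lambda e: e.gaze_released_at)
            del self.entries[victim.face_id]
        self.entries[entry.face_id] = entry
        return True

store = PositionStore(capacity=2)
store.add(FaceEntry(1, (0.0, 0.0, 1.5)))
store.add(FaceEntry(2, (1.0, 0.0, 2.0), gaze_released_at=5.0))
replaced = store.add(FaceEntry(3, (-1.0, 0.0, 1.8)))   # evicts face 2
rejected = store.add(FaceEntry(4, (0.5, 0.0, 2.5)))    # faces 1 and 3 still gazing
```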

Depending on the application of the digital signage, the distance to a face judged to be paying attention to the digital signage system 25 can be limited to a certain distance or less. Specifically, based on the distance to the face image obtained by the face movement detector 3, it is determined whether the face is within the range the digital signage system 25 should serve. Only if it is within that range is the corresponding position information sent from the face movement detector 3 to the gaze state detector 34, and persons paying attention to the digital signage system 25 are then extracted.

The position information updater 9 also updates the position information stored in the position information storage 10 based on the face movement information detected by the face movement detector 3. As a result, even when the face of a person paying attention to the digital signage system 25 moves, the position information of that face is updated correctly.

The position information stored in the position information storage 10 for each face image of interest is assigned to one of the directivity adjusters 11₁ to 11₈. A directivity adjuster that has received position information calculates, by the same method as in Embodiment 1, the digital filter parameters needed to enhance, by beamforming with the microphone array 12, the sound at the position corresponding to that position information.

However, because the arrangement and number of microphones in the microphone array 12 differ from those of Embodiment 1, the specific calculation must be modified accordingly.

The result calculated by a directivity adjuster is input to the corresponding downstream one of the directivity calculators 26₁ to 26₈. A directivity calculator that has received the calculated parameters uses them to filter the digital audio signals obtained by A/D conversion from each microphone of the microphone array 12.

The audio output signal from a directivity calculator that has performed the filtering is sent to the corresponding one of the speech recognizers 27₁ to 27₈. The speech recognizer that receives the filtered audio signal recognizes the speech contained in it.

Depending on the application, the speech recognizers 27₁ to 27₈ need not all have the same functions and performance. If the digital signage system 25 can conduct two-way communication with k persons simultaneously, only k of the n speech recognizers need be high-function, high-performance units, while the others may be low-function units capable only of detecting the first utterance addressed to the digital signage system.

Because the position information storage 10 manages which of its stored entries corresponds to the face of a person with whom the digital signage system is currently in two-way communication, it performs control so that this position information is sent to the directivity adjuster feeding the directivity calculator whose output goes to a high-function, high-performance speech recognizer.

The recognition results of the speech recognizers are sent to the overall control device 28. While the digital signage system 25 has capacity to start a new two-way communication, the overall control device 28 enters a two-way communication state with a person when it detects words corresponding to an initial address, such as "Excuse me" or "Please tell me", or when it detects a gaze at a location in the image shown on the display device 30 that has been registered in advance as one that requires some processing when it is looked at.

When the two-way communication state is entered, a flag indicating that the person is a two-way communication target is set on the position information for that person recorded in the position information storage 10.
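
The entry condition for two-way communication described above might be expressed as in the sketch below. The function name, the parameter names, the lower-cased English trigger phrases, and the use of simple substring matching are all assumptions for illustration; the description does not specify how the recognized text is matched.

```python
# Hypothetical trigger phrases, taken from the examples given in the description.
TRIGGER_PHRASES = ("excuse me", "please tell me")

def should_start_dialogue(recognized_text, gazing_at_hotspot,
                          active_sessions, max_sessions):
    """Decide whether to enter the two-way communication state with a person.

    recognized_text: text output by that person's speech recognizer.
    gazing_at_hotspot: True if the person is gazing at a pre-registered image location.
    active_sessions / max_sessions: current and maximum simultaneous dialogues.
    """
    if active_sessions >= max_sessions:
        return False  # no capacity for a new two-way communication
    text = recognized_text.lower()
    spoke_trigger = any(phrase in text for phrase in TRIGGER_PHRASES)
    return spoke_trigger or gazing_at_hotspot

start = should_start_dialogue("Excuse me, where is the exit?", False, 1, 3)
```

When the function returns True, the caller would set the two-way communication flag on the stored position information for that person.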

If required for two-way communication, the overall control device 28 configures the display control device 29 to allocate one of the sub-screens 30a, 30b, and 30c (shown in FIG. 5) as appropriate and displays it on the display device 30. This sub-screen is used to present information to the two-way communication target.

Position information flagged as a two-way communication target is sent to the sound-emission directivity adjustment calculators 31₁ to 31₈. These calculators perform the directivity adjustment computation for driving the speaker array 33 so that the speech synthesized by each of the speech synthesizers 32₁ to 32₈ can be heard clearly only at a specific position.

That is, from the positional relationship between each driven speaker of the speaker array 33 and the face of the two-way communication target, the delays are adjusted so that the emitted sound is in phase near the position of that person, and the gain is reduced to account for the sound being driven by multiple speakers.
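
A minimal sketch of this delay and gain adjustment is given below, assuming free-field propagation, point-source speakers, and a fixed speed of sound. The function name `focus_delays_and_gain`, the coordinate values, and the choice of scaling the gain by 1/N are assumptions; the description states only that delays are aligned at the listener and that gain is lowered for multi-speaker drive.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def focus_delays_and_gain(speaker_positions, target):
    """Compute per-speaker delays (s) that align arrival times at the target.

    Speakers closer to the target are delayed more, so that sound from every
    speaker arrives at the same instant and adds in phase near the target.
    The common gain is scaled by 1/N to offset the in-phase sum of N speakers.
    """
    distances = [math.dist(p, target) for p in speaker_positions]
    farthest = max(distances)
    delays = [(farthest - d) / SPEED_OF_SOUND for d in distances]
    gain = 1.0 / len(speaker_positions)
    return delays, gain

speakers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # hypothetical speaker coordinates (m)
listener = (0.0, 0.0, 2.0)                      # hypothetical face position (m)
delays, gain = focus_delays_and_gain(speakers, listener)
```

Away from the target position the arrival times no longer coincide, so the contributions add less coherently, which is what confines the clearly audible region to the vicinity of the listener.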

The directivity adjustment results computed by the sound-emission directivity adjustment calculators 31₁ to 31₈ are summed by the adder 35 and then output to the speaker array 33.

As long as sound that can be heard clearly only near the face of the two-way communication target can be generated, the speaker array 33 may be controlled by other methods. Alternatively, a superdirective speaker that uses ultrasound as a carrier wave may be used in place of the speaker array, with the orientation of the speaker adjusted mechanically.

FIG. 5 is an explanatory diagram showing an example of the appearance of the digital signage system 25.

The digital signage system 25 has the display device 30 at its center. On the left side of the display device 30, microphones 12₁ to 12₃ of the microphone array 12 are arranged from bottom to top.

Above the display device 30, microphones 12₄ to 12₁₁ are arranged from left to right, and on the right side of the display device 30, microphones 12₁₂ to 12₁₄ are arranged from top to bottom.

Below the display device 30, speakers 33₁ to 33₁₈ of the speaker array 33 are arranged in two rows, upper and lower. The external monitoring cameras 2a and 2b are provided between microphones 12₅ and 12₆ and between microphones 12₉ and 12₁₀, respectively.

In an actual installation, the positional relationship between the digital signage system 25 and a person varies vertically only by about the difference in people's heights, whereas horizontally it depends on where the two-way communication target is standing. For this reason, more microphones are arranged horizontally than vertically so that higher directivity can be achieved in the horizontal direction.

The external monitoring cameras 2a and 2b are placed at the top of the digital signage system 25 so that monitoring is performed from as high a position as possible. Even if another person passes in front of a two-way communication target, this placement keeps the image tracking needed for two-way communication from being disrupted.

However, if the target is in fact hidden behind another person and image tracking with both external monitoring cameras 2a and 2b becomes difficult, tracking continues with whichever camera can still track the image. If neither camera can track, movement prediction is performed temporarily. When the image is captured again, an attempt is made to match it to a position recorded in the position information storage 10, based on the predicted movement and the feature values of the image, and tracking resumes if the match succeeds. If the match fails, the two-way communication with that person is terminated.
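
The fallback behavior described above can be outlined as in the following sketch. The mode labels, the function names `tracking_mode` and `can_resume`, the use of Euclidean distance for both position and feature comparison, and the threshold values are hypothetical; the description does not specify the matching criterion.

```python
import math

def tracking_mode(seen_by_a, seen_by_b):
    """Select the tracking mode from the visibility of the face in cameras 2a and 2b."""
    if seen_by_a and seen_by_b:
        return "stereo"    # both cameras: position and distance available
    if seen_by_a or seen_by_b:
        return "single"    # continue with whichever camera still sees the face
    return "predict"       # neither camera: extrapolate movement temporarily

def can_resume(predicted_pos, candidate_pos, stored_feature, candidate_feature,
               max_pos_error=0.5, max_feature_error=0.3):
    """Match a re-detected face to a stored entry by predicted position and features.

    The thresholds are assumed values; returning False corresponds to
    terminating the two-way communication with that person.
    """
    pos_error = math.dist(predicted_pos, candidate_pos)
    feature_error = math.dist(stored_feature, candidate_feature)
    return pos_error <= max_pos_error and feature_error <= max_feature_error

mode = tracking_mode(False, True)
resume = can_resume((1.0, 0.0, 2.0), (1.1, 0.0, 2.0), (0.2, 0.8), (0.25, 0.75))
```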

Like the microphones 12₁ to 12₁₄, the speakers 33₁ to 33₁₈ of the speaker array 33 are arranged with the positional relationship between the digital signage system 25 and a person in mind: two rows vertically, with many speakers placed horizontally.

The display device 30 can also display sub-screens 30a, 30b, and 30c. These sub-screens are used to present information that requires individual handling to persons engaged in two-way communication.

The sub-screens 30a, 30b, and 30c may be removed when not needed, or may be displayed at all times and show other information when not in use for two-way communication. They may also prompt the start of two-way communication, for example by displaying "If you have a question, please look here and speak."

Accordingly, in Embodiment 2, the digital signage system 25 extracts sound from the vicinity of a two-way communication target and outputs sound that can be heard clearly only in that vicinity. Even in a noisy place such as a street, the system can therefore do more than simply display images: it can actively guide a person who has looked at an image and respond to that person's requests and questions. In addition, a single digital signage system 25 can carry on multiple two-way communications at once.

The invention made by the present inventors has been described above specifically on the basis of the embodiments. Needless to say, the present invention is not limited to those embodiments and can be modified in various ways without departing from its gist.

The present invention is suitable for audio processing techniques that selectively emphasize the voice of a conversation partner.

1 Audio processing system
2₁ to 2₈ External monitoring cameras
2a External monitoring camera
2b External monitoring camera
3 Face movement detector
4 Left-eye monitoring camera
5 Right-eye monitoring camera
6 Near-infrared LED
7 Binocular gaze detector
8 Gaze target detector
9 Position information updater
10 Position information storage
11₁ to 11₁₈ Directivity adjusters
12 Microphone array
12₁ to 12₁₄ Microphones
13₁ to 13₈ Stereo-compatible directivity calculators
14, 15 Adders
16 Left-ear earphone
17 Right-ear earphone
18 Left-ear microphone
19 Right-ear microphone
20 Left noise canceling device
21 Right noise canceling device
22 Glasses
23 Left storage box
24 Right storage box
25 Digital signage system
26₁ to 26₈ Directivity calculators
27₁ to 27₈ Speech recognizers
28 Overall control device
29 Display control device
30 Display device
30a Sub-screen
30b Sub-screen
30c Sub-screen
31₁ to 31₈ Sound-emission directivity adjustment calculators
32₁ to 32₈ Speech synthesizers
33 Speaker array
33₁ to 33₁₈ Speakers
34 Gaze state detector
35 Adder

Claims

A sound processing system comprising:
a sound source acquisition unit that acquires sound from a sound source of an object;
a line-of-sight monitoring detection unit that detects the object by monitoring a user's line of sight and detects the direction and distance of the object;
a camera that monitors the surroundings;
a movement detection unit that detects an object from video captured by the camera, detects movement of the object from the continuity of changes in its image position, and outputs movement information;
a position information detection unit that detects position information by tracking the object, based on the respective detection results of the movement detection unit and the line-of-sight monitoring detection unit;
a directional sound source adjustment unit that, based on the position information detected by the position information detection unit, adjusts the directivity of the sound source acquisition unit so that sound at the position corresponding to the position information is enhanced, and outputs a sound signal; and
a sound output unit that outputs the sound signal output from the directional sound source adjustment unit.

The sound processing system according to claim 1, wherein
the line-of-sight monitoring detection unit comprises:
a monitoring camera unit that monitors the user's line of sight;
a line-of-sight detection unit that detects the line-of-sight directions of the left eye and the right eye from the dark part of the eye (iris and pupil) and the corneal reflection image in the monitoring result of the monitoring camera unit; and
a line-of-sight target detection unit that detects the direction and distance of the object viewed by the user by obtaining, from the line-of-sight information detected by the line-of-sight detection unit, the coordinates of the midpoint of the positions at which the straight lines corresponding to the two lines of sight come closest to each other in space, and
the position information detection unit comprises:
a position information update unit that, based on the detection results of the movement detection unit and the line-of-sight detection unit, outputs the position information of the object as a position to be attended to when the line of sight stays on the object for an arbitrary time or longer, and updates the position information of the object based on the movement information detected by the movement detection unit; and
a position information storage unit that stores the position information output from the position information update unit.
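The midpoint computation recited for the line-of-sight target detection unit can be sketched as follows. This is an illustrative example only: the function name `closest_approach_midpoint`, the representation of each line of sight as an eye position plus a direction vector, and the parallel-line tolerance are assumptions, not details from the claims.

```python
def closest_approach_midpoint(p_left, d_left, p_right, d_right):
    """Midpoint of the shortest segment between two 3D gaze lines.

    Each line is given by an eye position p and a direction d.
    Returns None when the lines are (nearly) parallel.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [a - b for a, b in zip(p_left, p_right)]
    a = dot(d_left, d_left)
    b = dot(d_left, d_right)
    c = dot(d_right, d_right)
    d = dot(d_left, w)
    e = dot(d_right, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:  # assumed tolerance for parallel gaze lines
        return None
    s = (b * e - c * d) / denom  # parameter along the left-eye line
    t = (a * e - b * d) / denom  # parameter along the right-eye line
    q_left = [p + s * v for p, v in zip(p_left, d_left)]
    q_right = [p + t * v for p, v in zip(p_right, d_right)]
    return [(x + y) / 2 for x, y in zip(q_left, q_right)]
```

Because measured gaze lines rarely intersect exactly, taking the midpoint of their closest approach gives a usable estimate of the viewed point. Its direction and distance from the wearer then follow from the returned coordinates.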

The sound processing system according to claim 1 or 2, wherein
the position information update unit, when detecting the position information, excludes the object from the targets if it determines that the object is farther away than an arbitrary distance.

The sound processing system according to any one of claims 1 to 3, wherein
the movement detection unit detects the face of a person from the video captured by the camera, detects a feature amount of the face, identifies the same person based on the feature amount, and detects movement of the face from the continuity of changes in its image position, and
the sound source acquisition unit acquires sound under the control of the directional sound source adjustment unit.
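One way to picture the same-person determination and movement detection is the sketch below, which links each new face detection to an existing track when its feature vector is similar and its image position has not jumped. The names `match_face` and `update_tracks`, the Euclidean distance measures, and both thresholds are hypothetical choices for this example. The claims do not specify how the feature amount is compared.

```python
import math


def match_face(tracks, position, feature, max_jump=50.0, max_feature_gap=0.5):
    """Return the id of the track that this detection continues, or None.

    tracks maps track id -> (last_position, feature_vector).
    max_jump (pixels) and max_feature_gap are assumed thresholds.
    """
    best_id, best_gap = None, max_feature_gap
    for track_id, (last_position, last_feature) in tracks.items():
        if math.dist(position, last_position) > max_jump:
            continue  # position change is not continuous with this track
        gap = math.dist(feature, last_feature)
        if gap <= best_gap:
            best_id, best_gap = track_id, gap
    return best_id


def update_tracks(tracks, detections):
    """Assign each (position, feature) detection to a track and report movement."""
    movements = {}
    for position, feature in detections:
        track_id = match_face(tracks, position, feature)
        if track_id is None:
            track_id = max(tracks, default=0) + 1  # start a new track
            movements[track_id] = (0.0, 0.0)
        else:
            old = tracks[track_id][0]
            movements[track_id] = (position[0] - old[0], position[1] - old[1])
        tracks[track_id] = (position, feature)
    return movements
```

This sketch does not prevent two detections in one frame from claiming the same track. A fuller implementation would resolve such conflicts, for example by one-to-one assignment.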

The sound processing system according to any one of claims 1 to 4, wherein
the sound source acquisition unit comprises a plurality of microphones.

The sound processing system according to any one of claims 1 to 5, wherein
the sound output unit comprises a noise canceling unit that generates a signal that cancels noise and thereby cancels the noise.

A sound processing system comprising:
a display unit that displays images;
a sound output unit that outputs sound;
a camera that monitors the surroundings;
a sound source acquisition unit that acquires sound;
a face movement detection unit that extracts the face of a person from an image input from the camera, detects the position and distance of the extracted face, tracks the face of the person, and outputs its position information;
a gaze state detection unit that, based on the image from the camera and the position information output by the face movement detection unit, extracts the eye region from the detected face image and determines whether the direction in which the person is looking is a specific display portion that is displayed on the display unit and serves as a place for performing some processing;
a position information update unit that acquires the position information of the corresponding person when the gaze state detection unit determines that the person is looking at the specific display portion;
a directional sound source adjustment unit that, based on the position information acquired by the position information update unit, calculates a parameter for enhancing, by beamforming, the sound at the position of the person, and, based on the calculation result, adjusts the directivity of the sound source acquisition unit so that sound at the position corresponding to the position information is enhanced, and outputs a sound signal;
a voice recognition display control unit that recognizes the voice contained in the sound signal output from the directional sound source adjustment unit, determines from the recognition result whether the person is a target of two-way communication, and displays information on the display unit based on the position information; and
a pronunciation directivity adjustment calculation unit that, based on the position information output from the voice recognition display control unit, performs a directivity calculation that gives the sound directivity so that it is delivered only to the target person, and outputs sound from the sound output unit based on the calculation result.
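The gaze state determination can be pictured as a ray-rectangle test, sketched below under stated assumptions: the display is taken to lie in the plane z = 0, the specific display portion is modeled as an axis-aligned rectangle in that plane, and the person's eye position and gaze direction are assumed to be already estimated in the same coordinates. The function name `gaze_hits_region` and this geometry are illustrative and are not drawn from the claims.

```python
def gaze_hits_region(eye, direction, region):
    """Return True if the gaze ray meets the display plane z = 0 inside region.

    eye       -- (x, y, z) eye position, with z the distance from the display
    direction -- (dx, dy, dz) gaze direction (need not be normalized)
    region    -- (x_min, y_min, x_max, y_max) of the specific display portion
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz == 0:
        return False  # gaze runs parallel to the display plane
    t = -ez / dz
    if t <= 0:
        return False  # display plane is behind the viewer
    x = ex + t * dx
    y = ey + t * dy
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max
```

For instance, a person 2 m in front of the display who looks straight at it is judged to be gazing at a region that contains the point directly ahead of the eyes. In practice some margin around the region would probably be needed to absorb gaze estimation error.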

The sound processing system according to claim 7, wherein
the position information update unit, when the gaze state detection unit determines that the person is gazing at the specific display portion, updates and tracks the position information each time the position information of the person's face changes.

The sound processing system according to claim 7, wherein
the voice recognition display control unit displays video information in a sub-screen on the display unit based on the position information of the person who is the target of two-way communication.

The sound processing system according to claim 8, wherein
the position information update unit acquires position information for a plurality of persons, and tracks and updates the position information of each of the plurality of persons.