JP2008167319A

JP2008167319A - Headphone system, headphone drive controlling device, and headphone

Info

Publication number: JP2008167319A
Application number: JP2006356495A
Authority: JP
Inventors: Koji Tanitaka; 幸司谷高; Toshinao Suzuki; 利尚鈴木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-12-28
Filing date: 2006-12-28
Publication date: 2008-07-17

Abstract

PROBLEM TO BE SOLVED: To enable a user wearing headphones to hear a predetermined sound among surrounding sounds, and to improve the accuracy of determining whether a sound is a predetermined sound.

SOLUTION: A headphone drive control device, which supplies a left-channel sound signal to a left speaker of a headphone and a right-channel sound signal to a right speaker, has: a binaural signal processor that performs binaural processing on the output signals of a first microphone arranged in the vicinity of the left speaker and a second microphone arranged in the vicinity of the right speaker, and generates waveform data indicating the sound source waveform; a sound recognition processor that determines whether the waveform indicated by the output data of the binaural signal processor coincides with any of predetermined waveforms, and outputs an external sound detection signal when it does; and a mixer that, on receiving the external sound detection signal, mixes the output signal of the first microphone into the left-channel sound signal and the output signal of the second microphone into the right-channel sound signal before output.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a headphone drive control device, and more particularly to a technique that enables a user who is listening to musical sound or the like through headphones to also hear surrounding sounds.

In recent years, portable music playback devices such as MP3 players have become widespread. Such devices can be used outdoors or in vehicles such as trains, and to avoid the influence of ambient noise during such use it is common to use sound-muting headphones (hereinafter simply “headphones”).
However, when listening to musical sound or the like through headphones, the user may fail to hear a call or caution from people nearby, or a warning sound such as a car horn, which can be dangerous. For this reason, various techniques have been proposed that allow the user to hear specific surrounding sounds even while using headphones. For example, Patent Document 1 proposes a volume control device having an audio input unit, a volume adjustment unit that adjusts the volume of the audio input to the audio input unit, and an audio output unit that outputs the volume-adjusted audio. The device is further provided with a microphone, a storage unit that stores voice patterns, a voice pattern recognition unit that recognizes voice patterns in the sound detected by the microphone, and a volume control unit that instructs the volume adjustment unit to adjust the volume when the voice pattern recognition unit finds a stored voice pattern in the sound detected by the microphone.
Patent Document 1: JP 2004-13084 A

However, the technique disclosed in Patent Document 1 uses the output signal of a single microphone to determine whether an external sound is a sound that the user has designated in advance as one that needs to be heard, or is merely noise. Distinguishing such sounds from noise by pattern recognition is therefore difficult in practice.
The present invention has been made in view of the above problem. Its object is to provide a technique that lets a user hear sounds that the user has designated in advance as needing to be heard even while using headphones, and that makes it possible to determine accurately whether a sound is such a predetermined sound.

In order to solve the above problems, the present invention provides a headphone system comprising: a headphone having a left-ear speaker and a right-ear speaker, with a first microphone fixed in the vicinity of the left-ear speaker and a second microphone fixed in the vicinity of the right-ear speaker; and a headphone drive control device that receives left-channel and right-channel sound signals from a sound source device, supplies the left-channel sound signal to the left-ear speaker, and supplies the right-channel sound signal to the right-ear speaker. The headphone drive control device has: a binaural signal processing unit that performs noise analysis and noise reduction using at least one of the phase difference and the time difference between the output signals of the two microphones, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals; a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and a mixer that, when it receives the external sound detection signal from the speech recognition processing unit, mixes the output signal of the first microphone into the left-channel sound signal and supplies the result to the left-ear speaker, and mixes the output signal of the second microphone into the right-channel sound signal and supplies the result to the right-ear speaker.

In order to solve the above problems, the present invention also provides a headphone system comprising: a headphone having a left-ear speaker and a right-ear speaker, with a first microphone fixed in the vicinity of the left-ear speaker and a second microphone fixed in the vicinity of the right-ear speaker; and a headphone drive control device that receives left-channel and right-channel sound signals from a sound source device, supplies the left-channel sound signal to the left-ear speaker, and supplies the right-channel sound signal to the right-ear speaker. The headphone drive control device has: a binaural signal processing unit that performs noise analysis and noise reduction using at least one of the phase difference and the time difference between the output signals of the two microphones, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals; a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and a selection unit that, when it receives the external sound detection signal from the speech recognition processing unit, supplies the output signal of the first microphone to the left-ear speaker in place of the left-channel sound signal and supplies the output signal of the second microphone to the right-ear speaker in place of the right-channel sound signal.

In order to solve the above problems, the present invention also provides a headphone drive control device that supplies a left-channel sound signal received from a sound source device to a left-ear speaker of a headphone and supplies a right-channel sound signal received from the sound source device to a right-ear speaker of the headphone. The headphone drive control device has: a binaural signal processing unit that performs noise analysis and noise reduction using at least one of the phase difference and the time difference between the output signal of a first microphone arranged in the vicinity of the left-ear speaker and the output signal of a second microphone arranged in the vicinity of the right-ear speaker, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals; a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and a mixer that, when it receives the external sound detection signal from the speech recognition processing unit, mixes the output signal of the first microphone into the left-channel sound signal and supplies the result to the left-ear speaker, and mixes the output signal of the second microphone into the right-channel sound signal and supplies the result to the right-ear speaker.

In order to solve the above problems, the present invention also provides a headphone drive control device that supplies a left-channel sound signal received from a sound source device to a left-ear speaker of a headphone and supplies a right-channel sound signal received from the sound source device to a right-ear speaker of the headphone. The headphone drive control device has: a binaural signal processing unit that performs noise analysis and noise reduction using at least one of the phase difference and the time difference between the output signal of a first microphone arranged in the vicinity of the left-ear speaker and the output signal of a second microphone arranged in the vicinity of the right-ear speaker, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals; a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and a selection unit that, when it receives the external sound detection signal from the speech recognition processing unit, supplies the output signal of the first microphone to the left-ear speaker in place of the left-channel sound signal and supplies the output signal of the second microphone to the right-ear speaker in place of the right-channel sound signal.
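The mixer and selection-unit variants differ only in what reaches the speakers once the external sound detection signal arrives: the mixer adds the microphone signals to the source signals, whereas the selection unit substitutes them. The following is a minimal sketch of the selection-unit behavior only. The function name, the representation of audio as (left, right) tuples, and the per-frame boolean detection flag are illustrative assumptions and do not come from the disclosure.

```python
from typing import Iterable, Iterator, Tuple

Frame = Tuple[float, float]  # (left sample, right sample); assumed representation


def selection_unit(
    source_frames: Iterable[Frame],
    mic_frames: Iterable[Frame],
    detections: Iterable[bool],
) -> Iterator[Frame]:
    """Yield the frame sent to the speakers for each time step.

    While the (hypothetical) detection flag is set, the microphone frame
    replaces the source frame; otherwise the source frame passes through.
    """
    for source, mic, detected in zip(source_frames, mic_frames, detections):
        yield mic if detected else source
```

For instance, with two frames and detection only on the second, the first output frame is the source frame and the second is the microphone frame.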

In order to solve the above problems, the present invention also provides a headphone having a left-ear speaker and a right-ear speaker, in which a first microphone is fixed in the vicinity of the left-ear speaker and a second microphone is fixed in the vicinity of the right-ear speaker.

According to the present invention, a user can hear sounds that the user has designated in advance as needing to be heard even while using headphones, and whether a sound is such a predetermined sound can be determined accurately.

The best mode for carrying out the present invention will now be described with reference to the drawings.
(A: Configuration)
FIG. 1 is a block diagram showing a configuration example of a headphone system 10 according to an embodiment of the present invention. As shown in FIG. 1, the headphone system 10 includes a headphone 100 and a headphone drive control device 200.

The headphone 100 has a speaker 110L that emits sound toward the user's left ear and a speaker 110R that emits sound toward the user's right ear. The headphone drive control device 200 supplies the left-channel output signal ZL to the speaker 110L and the right-channel output signal ZR to the speaker 110R. As shown in FIG. 1, the speakers 110L and 110R are built into ear pads 130L and 130R, respectively. The ear pads 130L and 130R cover the ears of the user wearing the headphone 100 and mute or block external sounds so that they do not interfere with listening to the sound output from the speakers 110L and 110R. In the ear pad 130L, a microphone 120L is provided in the vicinity of the speaker 110L, facing outward with respect to the speaker 110L (that is, in a direction in which ambient sounds are easily picked up). Similarly, in the ear pad 130R, a microphone 120R is provided in the vicinity of the speaker 110R, facing outward with respect to the speaker 110R. The microphone 120L supplies a sound signal YL representing the sound it picks up (that is, the external sound; hereinafter, “external sound signal”) to the headphone drive control device 200, and the microphone 120R supplies an external sound signal YR representing the external sound it picks up to the headphone drive control device 200.
The sound source device 300 is, for example, an MP3 player. It generates left and right two-channel sound signals XL and XR (hereinafter, “sound source signals”) according to the musical sound data stored in the device and supplies them to the headphone drive control device 200.

The headphone drive control device 200 functions, for example, as a remote controller that controls musical sound reproduction in the sound source device 300. It applies signal processing such as level adjustment to the left-channel sound source signal XL received from the sound source device 300 to generate the output signal ZL, which it supplies to the speaker 110L, and applies the same signal processing to the sound source signal XR received from the sound source device 300 to generate the output signal ZR, which it supplies to the speaker 110R.
In addition, the headphone drive control device 200 determines whether the sound represented by the external sound signals YL and YR received from the microphones 120L and 120R is a sound that the user of the headphone system has designated in advance as one that needs to be heard. If so, it mixes the external sound signal YL with the sound source signal XL to generate the output signal ZL, and mixes the external sound signal YR with the sound source signal XR to generate the output signal ZR. This allows a user who is listening through the headphone 100 to the musical sound reproduced by the sound source device 300 to hear the external sounds that the user designated in advance.
The following description, given with reference to the drawings, focuses on the configuration of the headphone drive control device 200.

As shown in FIG. 1, the headphone drive control device 200 has an operation unit 210, a binaural signal processing unit 220, a speech recognition processing unit 230, a mixer 240, and external sound signal processing units 250L and 250R.
As shown in FIG. 1, the external sound signal YL supplied from the microphone 120L is split into two inside the headphone drive control device 200; one branch goes to the binaural signal processing unit 220 and the other to the external sound signal processing unit 250L. Similarly, the external sound signal YR supplied from the microphone 120R is split into two inside the headphone drive control device 200; one branch goes to the binaural signal processing unit 220 and the other to the external sound signal processing unit 250R. The sound source signals XL and XR supplied from the sound source device 300 are given to the mixer 240.

Each of the external sound signal processing units 250L and 250R includes an amplifier and an equalizer (neither is shown in FIG. 1). The amplifier amplifies the external sound signal received from the microphone 120L or 120R to a level suitable for signal processing by the equalizer and supplies it to the equalizer. The equalizer applies various kinds of filtering, such as band-pass filtering, to the sound signal passed from the amplifier and outputs the result to the mixer 240.

The binaural signal processing unit 220 is, for example, a DSP (Digital Signal Processor). Using at least one of the phase difference and the time difference between the external sound signals YL and YR supplied from the microphones 120L and 120R, it performs so-called binaural processing, for example removing noise components, to generate waveform data representing the original waveform of the sound represented by those external sound signals (that is, the waveform of the sound at its source), and supplies the waveform data to the speech recognition processing unit 230. Examples of the binaural processing include processing that analyzes noise components, for instance by cancelling the time difference and phase difference between the external sound signals YL and YR and computing the correlation between the two signals, and processing that estimates the direction of the sound source.
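The text does not specify a particular binaural algorithm. As one possible reading of "cancelling the time difference and computing the correlation", the sketch below estimates the inter-microphone time difference by cross-correlation, cancels it, and averages the two channels so that the common source is preserved while noise present in only one channel is attenuated. All function names, the tone-burst test signal, and the choice of plain averaging are assumptions made for illustration, not the patented method.

```python
import math


def estimate_lag(left, right, max_lag):
    """Return the shift (in samples) to apply to `right` so that it best
    lines up with `left`, found by maximizing the cross-correlation."""
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            left[i] * right[i + lag]
            for i in range(n)
            if 0 <= i + lag < n
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_and_average(left, right, lag):
    """Cancel the time difference, then average the two channels.

    Returns (index, value) pairs for the samples where both channels overlap.
    """
    n = min(len(left), len(right))
    return [
        (i, 0.5 * (left[i] + right[i + lag]))
        for i in range(n)
        if 0 <= i + lag < n
    ]


def tone_burst(n, period, center, width):
    """Gaussian-windowed sinusoid, used here as a stand-in external sound."""
    return [
        math.exp(-(((i - center) / width) ** 2)) * math.sin(2 * math.pi * i / period)
        for i in range(n)
    ]


def delayed(signal, delay):
    """Copy of `signal` delayed by `delay` samples (zero-padded at the start)."""
    return [0.0] * delay + signal[: len(signal) - delay]
```

A positive lag means the sound reached the right microphone later than the left one, which is also the kind of cue a source-direction estimate could be built on.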

The speech recognition processing unit 230 includes a nonvolatile memory such as an EEPROM (Electrically Erasable Programmable Read-Only Memory), a volatile memory such as a RAM (Random Access Memory), and a CPU (Central Processing Unit) (none of which is shown).
Three pieces of waveform data, namely waveform data a, waveform data b, and waveform data c, are stored in advance in the nonvolatile memory. These represent waveforms such as a specific person's voice or a car horn. Specifically, waveform data a represents the waveform shown in FIG. 2(a), waveform data b the waveform shown in FIG. 2(b), and waveform data c the waveform shown in FIG. 2(c). To simplify the explanation, FIG. 2 illustrates the case where each piece of waveform data represents a roughly sinusoidal waveform; the waveforms of actual human voices and horn sounds are, needless to say, more complicated.

The user of the headphone system shown in FIG. 1 can select one of the three pieces of waveform data in advance by operating the controls provided on the operation unit 210, and the operation unit 210 outputs a signal corresponding to the user's operation to the speech recognition processing unit 230. On receiving this signal, the speech recognition processing unit 230 analyzes it to identify which of the three pieces of waveform data the user selected, and stores an identifier of the selected waveform data (for example, an address indicating where that waveform data is stored) in a predetermined area of the volatile memory.
In addition to the three pieces of waveform data, the nonvolatile memory stores in advance a control program that causes the CPU to determine whether the waveform data received from the binaural signal processing unit 220 matches the waveform data selected by the user and, according to the result, to output a control signal CS (hereinafter, “external sound detection signal”) to the mixer 240. The CPU controls the output of the external sound detection signal CS by executing this control program with the volatile memory as a work area. More specifically, the CPU operating according to the control program outputs the external sound detection signal CS when it determines that the waveform data received from the binaural signal processing unit 220 matches the waveform data selected in advance by the user.
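The text does not say how a "match" between waveforms is decided. The sketch below assumes one simple criterion: the normalized correlation between the observed waveform and the selected stored waveform, maximized over a small range of shifts, must reach a threshold. The class name, the 0.9 threshold, the shift search, and the dictionary standing in for the nonvolatile memory are all hypothetical choices for illustration.

```python
import math


def sine(period, n, phase=0):
    """n samples of a sinusoid with the given period (in samples)."""
    return [math.sin(2 * math.pi * (i + phase) / period) for i in range(n)]


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length sample sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def best_match_score(observed, template, max_shift):
    """Highest |correlation| over shifts 0..max_shift, so that a phase
    offset between observed and stored waveforms does not defeat the match."""
    n = len(template)
    best = 0.0
    for shift in range(max_shift + 1):
        window = observed[shift:shift + n]
        if len(window) < n:
            break
        best = max(best, abs(normalized_correlation(window, template)))
    return best


class WaveformRecognizer:
    def __init__(self, templates, threshold=0.9):
        self.templates = templates   # stands in for the stored waveform data
        self.threshold = threshold   # assumed match criterion
        self.selected = None         # identifier of the user-selected waveform

    def select(self, name):
        if name not in self.templates:
            raise KeyError(name)
        self.selected = name

    def detect(self, observed, max_shift):
        """Return True when the detection signal CS would be output."""
        if self.selected is None:
            return False
        template = self.templates[self.selected]
        return best_match_score(observed, template, max_shift) >= self.threshold
```

Storing only the name of the selected template mirrors the description's use of an identifier kept in volatile memory rather than a copy of the waveform data itself.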

As shown in FIG. 1, the mixer 240 has a gain control unit 241, amplifiers 242L and 242R, adders 243L and 243R, and amplifiers 244L and 244R.
The gain control unit 241 sets the amplification factors of the amplifiers 242L and 242R and of the amplifiers 244L and 244R according to the contents of the gain management table shown in FIG. 3, depending on whether it has received the external sound detection signal CS from the speech recognition processing unit 230. More specifically, like the speech recognition processing unit 230, the gain control unit 241 includes a nonvolatile memory, a volatile memory, and a CPU (none of which is shown in FIG. 1). The nonvolatile memory stores the gain management table and a control program that causes the CPU to execute this amplification factor control. The CPU of the gain control unit 241 reads the control program from the nonvolatile memory and executes it with the volatile memory as a work area. As a result, when the speech recognition processing unit 230 does not output the external sound detection signal CS, the amplification factors of the amplifiers 242L and 242R are set to “0” and those of the amplifiers 244L and 244R are set to “1”. Conversely, when the speech recognition processing unit 230 outputs the external sound detection signal CS, the amplification factors of the amplifiers 242L, 242R, 244L, and 244R are all set to “1/2”.
The above is the configuration of the headphone drive control device 200.
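The gain switching can be sketched as a table lookup followed by a weighted sum. FIG. 3 is not reproduced here, so the table below is an assumption: with no detection signal CS the external-sound amplifiers 242L/242R get gain 0 and the source amplifiers 244L/244R get gain 1 (music only), and with CS present all four get 1/2. The assignment of 242L/242R to the external-sound path and 244L/244R to the source path, as well as all names in the code, are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gains:
    external: float  # amplifiers 242L / 242R (assumed: external sound path)
    source: float    # amplifiers 244L / 244R (assumed: sound source path)


# Assumed contents of the gain management table, keyed by "CS received?".
GAIN_TABLE = {
    False: Gains(external=0.0, source=1.0),  # no detection: music only
    True: Gains(external=0.5, source=0.5),   # detection: equal-weight mix
}


def mix_sample(x, y, cs_received):
    """Output sample Z for source sample X and external sample Y."""
    gains = GAIN_TABLE[cs_received]
    return gains.source * x + gains.external * y


def mix_stereo(source_lr, external_lr, cs_received):
    """Apply the same gains to the left and right channels."""
    (xl, xr), (yl, yr) = source_lr, external_lr
    return mix_sample(xl, yl, cs_received), mix_sample(xr, yr, cs_received)
```

Keeping the gains in a table rather than in branching code follows the description, where the gain control unit reads its settings from a stored gain management table.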

(B: Operation)
Next, the operation of the headphone drive control device 200 will be described with reference to FIG. 4.
FIGS. 4(a) to 4(c) show the waveforms of the sound source signal XL and the external sound signal YL input to the headphone drive control device 200 during the period from time T0 to time T3, and of the output signal ZL output from the headphone drive control device 200 during the same period. Of the left and right input and output waveforms, only those of the left channel are illustrated in FIGS. 4(a) to 4(c); the right-channel input and output waveforms are the same as those of the left channel except for the phase difference and time difference. In the operation example described below, the frequency of the sound source signal XL is f [Hz] and, as shown in FIG. 4(b), the frequency of the external sound signal YL is f/2 [Hz] in the period from time T0 to T1, 2f [Hz] in the period from time T1 to T2, and 4f [Hz] in the period from time T2 to T3. At the start of this operation, waveform data a, waveform data b, and waveform data c, representing waveforms with frequencies of f/2 [Hz], 2f [Hz], and 4f [Hz] respectively, are stored in advance in the nonvolatile memory of the speech recognition processing unit 230, and the user has selected waveform data b from among them in advance.

(B-1: Operation in the period from time T0 to T1)
In the period from time T0 to T1, the external sound signal YL input from the microphone 120L to the headphone drive control device 200 has a frequency of f/2 [Hz], as shown in FIG. 4(b). The binaural signal processing unit 220 therefore performs binaural processing on the external sound signals YL and YR to generate waveform data representing the waveform shown in FIG. 4(b), and passes that waveform data to the speech recognition processing unit 230. Because the binaural processing performed by the binaural signal processing unit 220 removes noise components, even if the external sound signals YL and YR contain noise, the signal waveform at the source of those external sound signals is restored, and waveform data indicating that waveform is passed to the speech recognition processing unit 230.

The CPU of the speech recognition processing unit 230 compares the waveform data received from the binaural signal processing unit 220 with the waveform data designated by the user, and outputs the external sound detection signal CS according to the comparison result. As described above, the waveform data delivered from the binaural signal processing unit 220 to the speech recognition processing unit 230 contains no noise components, so the speech recognition processing unit 230 can perform recognition with high accuracy.
In the present embodiment, the waveform data delivered from the binaural signal processing unit 220 to the speech recognition processing unit 230 during the period from time T0 to T1 represents a waveform with a frequency of f/2 [Hz], whereas the waveform data selected by the user represents a waveform with a frequency of 2f [Hz]. The CPU of the speech recognition processing unit 230 therefore determines that the two waveforms do not match and does not output the external sound detection signal CS.
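The comparison described above can be illustrated with a minimal sketch. The text does not say how a match is decided, so the sketch assumes that two waveforms match when their estimated frequencies agree within a tolerance. The zero-crossing estimator, the 8000 Hz sample rate, the 5% tolerance, and all function names are assumptions made for illustration.

```python
import math


def sine_wave(freq_hz, sample_rate=8000, duration_s=0.5):
    """Generate a test tone (stands in for delivered waveform data)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_frequency(samples, sample_rate=8000):
    """Estimate frequency by counting rising zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings / (len(samples) / sample_rate)


def waveforms_match(received, reference, sample_rate=8000, tolerance=0.05):
    """Return True when the two estimated frequencies agree within
    `tolerance` (relative to the reference)."""
    f_received = estimate_frequency(received, sample_rate)
    f_reference = estimate_frequency(reference, sample_rate)
    return abs(f_received - f_reference) <= tolerance * f_reference


# Usage with f = 200 Hz: the user selected the 2f waveform.
f = 200
selected = sine_wave(2 * f)
cs_t0_t1 = waveforms_match(sine_wave(f / 2), selected)  # f/2 vs 2f
cs_t1_t2 = waveforms_match(sine_wave(2 * f), selected)  # 2f vs 2f
print(cs_t0_t1, cs_t1_t2)
```

A real recognizer would compare more than a single frequency estimate; the point is only that the decision yields a boolean that drives the external sound detection signal CS.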

During the period from time T0 to T1, the gain control unit 241 of the mixer 240 does not receive the external sound detection signal CS. In accordance with the gain management table shown in FIG. 3, it therefore sets the amplification factors of the amplifiers 242L and 242R to "0" and those of the amplifiers 244L and 244R to "1". As a result, only the sound source signal XL is input to the adder 243L, and the sound source signal XL itself is supplied to the speaker 110L as the output signal ZL. Likewise, the sound source signal XR itself is supplied to the speaker 110R as the output signal ZR. Consequently, only the musical sound reproduced by the sound source device 300 is emitted from the speakers 110L and 110R. That is, during the period from time T0 to T1, the user hears only the musical sound reproduced by the sound source device 300, as shown in FIG. 4(c).
The above is the operation of the headphone drive control device 200 during the period from time T0 to T1.

(B-2: Operation in the period from time T1 to T2)
Next, the operation performed by the headphone drive control device 200 in the period from time T1 to T2 will be described.
During the period from time T1 to T2, the external sound signal YL input from the microphone 120L to the headphone drive control device 200 has a waveform with a frequency of 2f [Hz], as shown in FIG. 4(b), and waveform data representing this waveform is delivered from the binaural signal processing unit 220 to the speech recognition processing unit 230. As in the period from time T0 to T1 described above, the waveform data delivered from the binaural signal processing unit 220 to the speech recognition processing unit 230 during this period contains no noise components.

As in the period from time T0 to T1, the CPU of the speech recognition processing unit 230 compares the waveform data received from the binaural signal processing unit 220 with the waveform data designated by the user, and controls the output of the external sound detection signal CS according to the comparison result. During the period from time T1 to T2, the waveform data delivered from the binaural signal processing unit 220 to the speech recognition processing unit 230 and the waveform data selected by the user both represent a waveform with a frequency of 2f [Hz], so the speech recognition processing unit 230 determines that the two match and outputs the external sound detection signal CS to the mixer 240.

During the period from time T1 to T2, the gain control unit 241 of the mixer 240 receives the external sound detection signal CS from the speech recognition processing unit 230. In accordance with the gain management table shown in FIG. 3, it therefore sets the amplification factors of the amplifiers 242L, 242R, 244L and 244R to "1/2". As a result, both the sound source signal XL and the external sound signal YL are input to the adder 243L, and the output signal ZL obtained by mixing the two at a ratio of 1:1 is supplied to the speaker 110L. Likewise, the output signal ZR obtained by mixing the sound source signal XR and the external sound signal YR at a ratio of 1:1 is supplied to the speaker 110R. Thus, during the period from time T1 to T2, the user hears both the musical sound reproduced by the sound source device 300 and the external sound, as shown in FIG. 4(c).
The above is the operation of the headphone drive control device 200 during the period from time T1 to T2. In the subsequent period from time T2 to T3, the headphone drive control device 200 operates in the same way as in the period from time T0 to T1 described above, so the user hears only the musical sound reproduced by the sound source device 300 during that period.
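The gain settings described for the two periods can be summarized in a small sketch. The table values (0 and 1 without CS, 1/2 for every amplifier with CS) come from the description of FIG. 3 above; the dictionary layout and the names `GAIN_TABLE` and `mix_channel` are hypothetical, and signals are modeled as plain lists of samples.

```python
# Gains keyed by whether the external sound detection signal CS is received.
# "external" models amplifiers 242L/242R (microphone path, Y),
# "source" models amplifiers 244L/244R (sound source path, X).
GAIN_TABLE = {
    False: {"external": 0.0, "source": 1.0},
    True: {"external": 0.5, "source": 0.5},
}


def mix_channel(source_samples, external_samples, cs_received):
    """Model one channel of the mixer: scale each path by its gain,
    then sum the two paths (the role of adder 243L or 243R)."""
    gains = GAIN_TABLE[cs_received]
    return [
        gains["source"] * x + gains["external"] * y
        for x, y in zip(source_samples, external_samples)
    ]


# Usage: XL is the music, YL is the microphone signal.
xl = [1.0, -1.0]
yl = [0.5, 0.5]
print(mix_channel(xl, yl, cs_received=False))  # music only
print(mix_channel(xl, yl, cs_received=True))   # 1:1 mix
```

Keeping the gains in a table means a different mixing ratio only requires changing the table entries, not the mixing code.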

As described above, in the headphone drive control device 200 according to the present embodiment, speech recognition processing is performed after binaural processing has been applied to the external sound signals picked up by the two microphones arranged in correspondence with the left and right speakers of the headphones and the sound source waveform has been restored. Whether the external sound is a sound designated in advance by the user is therefore determined with high accuracy. Accordingly, the present embodiment makes it possible for the user to hear a sound that the user has designated in advance as one that must be heard even while the headphones are in use, and to determine accurately whether a sound is such a predetermined sound.

(C: Modifications)
Although one embodiment of the present invention has been described above, the following modifications may of course be made to it.
(1) In the embodiment described above, binaural processing is applied to the external sound signals picked up by the two microphones arranged in correspondence with the left and right speakers of the headphones to restore the sound source waveform, it is then determined whether that waveform matches a predetermined waveform, and, when a match is determined, the external sound signal and the sound source signal are mixed and supplied to each speaker. However, it is of course also possible to supply only the external sound signal to each speaker while the waveform restored by the binaural processing matches the predetermined waveform, and only the sound source signal while it does not. Specifically, this may be realized by using the gain management table shown in FIG. 5(a) instead of the gain management table shown in FIG. 3, or by providing the selection unit 260 shown in FIG. 5(b) in place of the mixer 240. In the selection unit 260 shown in FIG. 5, the selectors 261L and 261R output the external sound signal YL or YR supplied from the amplifier 242L or 242R when they receive the external sound detection signal CS from the speech recognition processing unit 230, and output the sound source signal XL or XR supplied from the amplifier 244L or 244R when they do not receive the external sound detection signal CS. In the mode that uses the gain management table shown in FIG. 5(a), and in the mode that uses the selection unit 260 shown in FIG. 5(b) in place of the mixer 240, the user misses the musical sound corresponding to the period during which the external sound signal is output. A buffer memory that accumulates and stores the sound source signal during the external sound signal output period may therefore be provided.
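The selection behavior intended for modification (1), external sound only while a match is detected and sound source only otherwise, together with the optional buffer memory, can be sketched as follows. The class name `SelectionUnit`, its method, and the use of a `deque` as the buffer are hypothetical; how buffered music would later be replayed is not stated in the text and is left out.

```python
from collections import deque


class SelectionUnit:
    """Model of one selector channel with a buffer for missed music."""

    def __init__(self):
        # Sound source samples that were not played because the
        # external sound was being output instead.
        self.missed_source = deque()

    def process(self, source_sample, external_sample, cs_received):
        """Return the sample to send to the speaker."""
        if cs_received:
            # External sound replaces the music; keep the music sample.
            self.missed_source.append(source_sample)
            return external_sample
        return source_sample


# Usage: CS is received only for the middle two samples.
unit = SelectionUnit()
music = [0.1, 0.2, 0.3, 0.4]
outside = [0.9, 0.8, 0.7, 0.6]
cs = [False, True, True, False]
output = [unit.process(x, y, c) for x, y, c in zip(music, outside, cs)]
print(output, list(unit.missed_source))
```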

(2) In the embodiment described above, three types of waveform data (waveform data a, waveform data b, and waveform data c) are stored in advance in the nonvolatile memory of the speech recognition processing unit 230. However, the number of waveform data items stored in advance in the nonvolatile memory of the speech recognition processing unit 230 is not limited to three; it may be two or fewer, or four or more. In short, it suffices that one or more waveform data items are stored in the nonvolatile memory of the speech recognition processing unit 230. Needless to say, when only one waveform data item is stored in the nonvolatile memory of the speech recognition processing unit 230, the user does not need to perform the selection process described above.

(3) In the embodiment described above, waveform data is stored in advance in the nonvolatile memory of the speech recognition processing unit 230. However, waveform data may of course be additionally stored in, or erased from, the nonvolatile memory in response to a user instruction. Specifically, the CPU of the speech recognition processing unit 230 may be made to write into the nonvolatile memory of the speech recognition processing unit 230 the waveform data obtained by applying binaural processing to the external sound signals YL and YR output from the microphones 120L and 120R during the period from when an operation instructing the start of recording is performed on the operation unit 210 until an operation instructing its end is performed. Alternatively, when an operation that lowers the playback volume to or below a predetermined threshold is performed while the headphone system is in use (that is, while the user is listening to the musical sound reproduced by the sound source device 300), the operation unit 210 may be made to determine that an operation instructing the start of recording has been performed. When an operation that raises the playback volume to or above the threshold is subsequently performed, the operation unit 210 may be made to determine that an operation instructing the end of recording has been performed, and to notify the CPU of these determination results.
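The volume-based recording trigger in modification (3) amounts to a small state machine, sketched below. The function name `recording_events` and the representation of user operations as a list of successive volume levels are assumptions for illustration; a lowering operation that lands at or below the threshold starts recording, and a raising operation that lands at or above it ends recording.

```python
def recording_events(volume_levels, threshold):
    """Return (index, "start" | "stop") events derived from a sequence
    of playback volume levels set by the user."""
    events = []
    recording = False
    previous = None
    for index, volume in enumerate(volume_levels):
        if previous is not None:
            lowered = volume < previous
            raised = volume > previous
            if not recording and lowered and volume <= threshold:
                # Volume pulled down to or below the threshold.
                recording = True
                events.append((index, "start"))
            elif recording and raised and volume >= threshold:
                # Volume pushed back up to or above the threshold.
                recording = False
                events.append((index, "stop"))
        previous = volume
    return events


# Usage: the user turns the volume down at step 2 and back up at step 4.
print(recording_events([8, 8, 2, 2, 9, 9], threshold=5))
```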

(4) In the embodiment described above, when the external sound picked up by the microphones 120L and 120R is a predetermined sound, the external sound signal and the sound source signal are mixed at a ratio of 1:1 and output to the speakers 110L and 110R. However, the ratio between the two is not limited to 1:1 and may be set as appropriate.
In addition, in the embodiment described above, the user of the headphone system selects in advance only one external sound that must be heard even while the headphone system is in use. However, the user may of course be allowed to select more than one.

FIG. 1 is a block diagram showing a configuration example of the headphone drive control device 200 according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the waveforms represented by the waveform data stored in advance in the speech recognition processing unit 230 of the headphone drive control device 200.
FIG. 3 is a diagram showing an example of the gain management table stored in the gain control unit 241 of the headphone drive control device 200.
FIG. 4 is a waveform diagram showing an operation example of the headphone drive control device 200.
FIG. 5 is a diagram for explaining the headphone drive control device according to modification (1).

Explanation of symbols

10 ... headphone system, 100 ... headphones, 110L, 110R ... speakers, 120L, 120R ... microphones, 130L, 130R ... ear pads, 200 ... headphone drive control device, 210 ... operation unit, 220 ... binaural signal processing unit, 230 ... speech recognition processing unit, 240 ... mixer, 241 ... gain control unit, 242L, 242R, 244L, 244R ... amplifiers, 243L, 243R ... adders, 250L, 250R ... external sound signal processing units.

Claims

A headphone system comprising:
headphones having a left-ear speaker and a right-ear speaker, with a first microphone fixed in the vicinity of the left-ear speaker and a second microphone fixed in the vicinity of the right-ear speaker; and
a headphone drive control device that receives left-channel and right-channel sound signals from a sound source device and supplies the left-channel sound signal to the left-ear speaker and the right-channel sound signal to the right-ear speaker,
wherein the headphone drive control device comprises:
a binaural signal processing unit that performs noise analysis and noise reduction using at least one of a phase difference and a time difference between the output signals of the two microphones, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals;
a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and
a mixer that, on receiving the external sound detection signal from the speech recognition processing unit, mixes the output signal of the first microphone with the left-channel sound signal and supplies the result to the left-ear speaker, and mixes the output signal of the second microphone with the right-channel sound signal and supplies the result to the right-ear speaker.

A headphone system comprising:
headphones having a left-ear speaker and a right-ear speaker, with a first microphone fixed in the vicinity of the left-ear speaker and a second microphone fixed in the vicinity of the right-ear speaker; and
a headphone drive control device that receives left-channel and right-channel sound signals from a sound source device and supplies the left-channel sound signal to the left-ear speaker and the right-channel sound signal to the right-ear speaker,
wherein the headphone drive control device comprises:
a binaural signal processing unit that performs noise analysis and noise reduction using at least one of a phase difference and a time difference between the output signals of the two microphones, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals;
a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and
a selection unit that, on receiving the external sound detection signal from the speech recognition processing unit, supplies the output signal of the first microphone to the left-ear speaker in place of the left-channel sound signal, and supplies the output signal of the second microphone to the right-ear speaker in place of the right-channel sound signal.

A headphone drive control device that supplies a left-channel sound signal received from a sound source device to a left-ear speaker of headphones and supplies a right-channel sound signal received from the sound source device to a right-ear speaker of the headphones, the headphone drive control device comprising:
a binaural signal processing unit that performs noise analysis and noise reduction using at least one of a phase difference and a time difference between the output signal of a first microphone arranged in the vicinity of the left-ear speaker and the output signal of a second microphone arranged in the vicinity of the right-ear speaker, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals;
a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and
a mixer that, on receiving the external sound detection signal from the speech recognition processing unit, mixes the output signal of the first microphone with the left-channel sound signal and supplies the result to the left-ear speaker, and mixes the output signal of the second microphone with the right-channel sound signal and supplies the result to the right-ear speaker.

A headphone drive control device that supplies a left-channel sound signal received from a sound source device to a left-ear speaker of headphones and supplies a right-channel sound signal received from the sound source device to a right-ear speaker of the headphones, the headphone drive control device comprising:
a binaural signal processing unit that performs noise analysis and noise reduction using at least one of a phase difference and a time difference between the output signal of a first microphone arranged in the vicinity of the left-ear speaker and the output signal of a second microphone arranged in the vicinity of the right-ear speaker, and generates and outputs waveform data indicating the sound source waveform of the sound represented by those output signals;
a speech recognition processing unit that determines whether the waveform represented by the waveform data output from the binaural signal processing unit matches any of one or more predetermined waveforms, and outputs an external sound detection signal when it determines that they match; and
a selection unit that, on receiving the external sound detection signal from the speech recognition processing unit, supplies the output signal of the first microphone to the left-ear speaker in place of the left-channel sound signal, and supplies the output signal of the second microphone to the right-ear speaker in place of the right-channel sound signal.

Headphones comprising a left-ear speaker and a right-ear speaker, wherein a first microphone is fixed in the vicinity of the left-ear speaker and a second microphone is fixed in the vicinity of the right-ear speaker.