JP2007334968A

JP2007334968A - Voice switching apparatus

Info

Publication number: JP2007334968A
Application number: JP2006164005A
Authority: JP
Inventors: Yuki Okawa; 友樹大川
Original assignee: Pioneer Electronic Corp; Pioneer Solutions Corp
Current assignee: Pioneer Corp; Pioneer Solutions Corp
Priority date: 2006-06-13
Filing date: 2006-06-13
Publication date: 2007-12-27

Abstract

PROBLEM TO BE SOLVED: To provide a voice switching apparatus that lets a user accurately recognize changes in the surroundings while listening to music.

SOLUTION: The apparatus comprises: a storage device 101 that stores external sound as recorded sound; an analysis device 103 that determines whether the external sound satisfies a determination condition; a reproducing device 102 that reproduces the recorded sound stored in the storage device 101; and a switching unit 105 that selects one of the music input from a music reproducing device 10, the external sound, and the recorded sound reproduced by the reproducing device 102, and outputs it to a loudspeaker 31. When the external sound is determined to satisfy the determination condition, the recorded sound starting a predetermined time before the condition was detected is reproduced and output from the loudspeaker 31.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a voice switching device, and more particularly to a voice switching device that lets a user accurately grasp relevant surrounding conditions while listening to music through headphones or the like.

Portable music players that play back music data stored on recording media such as CDs, MDs, or flash memory are conventionally known. Such a player is normally used with headphones so that only the user hears the music. When listening through headphones, external sounds are difficult to hear. This is an advantage for listening to music alone, but when using public transportation such as a train, for example, the user may fail to catch on-board announcements (such as those announcing the next stop) and miss the stop. In addition, the user must remove the headphones every time a conversation takes place, which is troublesome.

To solve these problems, Patent Document 1, for example, discloses a portable audio playback device that performs speech recognition on external sound input from a microphone and notifies the user when the recognized sound matches a predetermined phrase stored in advance.

However, in Patent Document 1, if the destination station is registered as "Urawa", for example, announcements of "Minami Urawa" or "Kita Urawa" are also judged to match by speech recognition, and the user is notified of "Urawa". The user therefore cannot tell whether the detected "Urawa" was in fact "Minami Urawa" or "Kita Urawa".

Also in Patent Document 1, when the user intends to get off at "Urawa" station, a nearby person saying "Urawa" somewhere other than the destination may be judged a match, so that the user is notified of "Urawa". The user cannot immediately tell whether the notification was triggered by the conductor's announcement or by a nearby person.

Furthermore, some users play music because they find the ambient noise unpleasant. Patent Document 1 cannot meet the need of such a user to stop music playback once the surroundings become quiet.

Patent Document 2 discloses a headset that performs speech recognition on external sound input from a microphone and, when a specific keyword, the voiceprint of a specific person, or sound at or above a reference volume is detected for a certain period, stops the music output from the music playback device so that the user can hear the external sound. According to Patent Document 2, the user can respond immediately to being called by a specific person or to a ringing telephone even while listening to music.

However, in Patent Document 2, when the music is muted as a result of a voiceprint match or the detection of sound at or above the reference volume, the user cannot tell what call or sound caused the muting.

For example, if a voiceprint match occurs when someone calls out "Let's go have dinner" to a user listening to music, the music is muted, but the user does not know why. Likewise, if a door buzzer sounds near the user, the music is muted, but the user cannot tell whether the interruption was caused by the door buzzer, a telephone ringing, or someone calling. Furthermore, as with Patent Document 1, the need to stop music playback once the surroundings become quiet cannot be met.

Patent Document 1: JP 2001-256771 A
Patent Document 2: JP 2005-192004 A

The present invention has been made in view of the above problems, and its main object is to provide a voice switching device that enables a user to accurately grasp changes in the surroundings while listening to music.

To solve the above problems and achieve the object, the present invention comprises: storage means for storing external sound input from a microphone as recorded sound; analysis means for determining whether the external sound input from the microphone matches a determination condition; reproduction means for reproducing the recorded sound stored in the storage means; and switching means for selectively switching among music input from a music playback device, the external sound input from the microphone, and the recorded sound reproduced by the reproduction means, and outputting the selected one to a speaker. When the analysis means determines that the external sound input from the microphone matches the determination condition, the reproduction means reproduces the recorded sound starting a predetermined time before the point at which the match was detected, and the switching means outputs the reproduced recorded sound to the speaker.

The best mode for carrying out the voice switching device according to the present invention is described in detail below with reference to the accompanying drawings. The invention is not limited by this embodiment. The constituent elements of the embodiment below include those that a person skilled in the art could easily conceive of and those that are substantially the same.

(Embodiment)
The voice switching device according to the present embodiment comprises: storage means for storing external sound input from a microphone as recorded sound; analysis means for determining whether the external sound input from the microphone matches a determination condition; reproduction means for reproducing the recorded sound stored in the storage means; and switching means for selectively switching among music input from a music playback device, the external sound input from the microphone, and the recorded sound reproduced by the reproduction means, and outputting the selected one to a speaker. When the analysis means determines that the external sound input from the microphone matches the determination condition, the reproduction means reproduces the recorded sound starting a predetermined time before the point at which the match was detected, and the switching means outputs the reproduced recorded sound to the speaker. The predetermined time is desirably long enough to include the sound that matched the determination condition plus the lead time the user needs to grasp the context of that sound (see ΔT = Δt1 + Δt3 in FIG. 6).

With the voice switching device of the present embodiment, when the external sound input from the microphone matches the determination condition, the recorded sound starting a predetermined time before the present can be reproduced and output to the speaker. This lets a user who is listening to music through headphones or the like accurately grasp changes in the surroundings.

The reproduction means desirably reproduces the recorded sound in shortened form, that is, in less time than it took to record. Specifically, it either plays back faster than normal playback speed (speed playback) or skips silent portions (SKIP playback).
The reproduction means also desirably continues reproducing the recorded sound until playback catches up with real time. This allows playback to catch up with the live external sound and makes the shift of the user's attention smooth.
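The catch-up behaviour described above can be put into numbers. If replay starts some lag behind the live sound and runs at a constant speed factor greater than 1, the lag shrinks by (speed - 1) seconds for every second of listening. The Python sketch below computes the resulting catch-up time; the function name and the constant-speed assumption (no silence skipping) are illustrative and are not taken from the publication.

```python
def catch_up_time(lag_s: float, speed: float) -> float:
    """Listening time (seconds) until sped-up replay reaches the live sound.

    Assumes a constant playback speed factor and no silence skipping.
    """
    if speed <= 1.0:
        raise ValueError("speed must exceed 1.0 to catch up with live sound")
    # Each second of listening consumes `speed` seconds of recording while
    # one new second is recorded, so the lag shrinks by (speed - 1) per second.
    return lag_s / (speed - 1.0)
```

For instance, under these assumptions a replay that starts 3 seconds behind and runs at 1.5x speed would take 6 seconds of listening to reach the live sound.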

It is also desirable to provide sound reproduction means that plays a predetermined sound before the recorded sound is reproduced. This lets the user know in advance that recorded sound is about to be played.

The analysis means desirably stores in advance one or more of a keyword, a voiceprint, a sound pattern, and a sound level as determination conditions, and determines whether the external sound input from the microphone matches them. When the external sound matches such a condition, the user can listen to the recorded sound starting a predetermined time before the present. For example, a train passenger listening to music through headphones can accurately identify external announcements such as the one for the destination station, and can also notice being called or changes in the surrounding environment.

The reproduction means desirably reproduces the recorded sound in shortened form (in less time than it took to record), specifically by playing back faster than normal playback speed (speed playback) or by skipping silent portions (SKIP playback). This allows playback to catch up with the live external sound and makes the shift of the user's attention smooth.

The analysis means desirably stores in advance one or more of a keyword, a voiceprint, a sound pattern, and a sound level as determination conditions, and determines whether the external sound input from the microphone matches them. When the external sound matches such a condition, the user can listen to the recorded sound starting the predetermined time (ΔT = Δt1 + Δt3 in FIG. 6) before the present. For example, a train passenger listening to music through headphones can accurately identify external announcements such as the one for the destination station, and can also notice being called or changes in the surrounding environment.

The speaker is desirably a speaker built into headphones. This lets the user accurately grasp changes in the surroundings while using headphones to listen to music played by the audio playback device.

(Example)
FIG. 1 shows the external configuration of a headphone system to which the voice switching device according to the present invention is applied. In the figure, 10 denotes a portable audio playback device that plays back music data stored on a recording medium such as a CD, MD, or flash memory; 20 denotes a switch unit on which an operation unit 21 and a microphone 22 are mounted; 21 denotes the operation unit for operating playback, stop, and so on of music on the audio playback device 10; 22 denotes the microphone for picking up external sound; 30 denotes headphones; and 31 denotes a speaker mounted in the headphones 30. The voice switching device 100 according to the present invention is mounted inside the switch unit 20.

FIG. 2 is a block diagram of the voice switching device 100 according to the present invention. As shown in FIG. 2, the voice switching device 100 includes a storage device 101, a playback device 102, an analysis device 103, a detector 104, and a switch 105.

While the voice switching device 100 is in use (powered on), the microphone 21 continuously picks up surrounding sound, and the sound is output as external sound to the storage device 101, the analysis device 103, and the switch 105.

In accordance with switching instructions from the detector 104 or the operation unit 22, the switch 105 selects one of the external sound input from the microphone 21, the recorded sound input from the playback device 102, and the music input from the music playback device 10, and outputs it to the speaker 31.

While powered on, the storage device 101 continuously records the external sound input from the microphone 21. The length of time for which recorded sound is retained is denoted ΔT.
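As a rough illustration of a store that keeps only the most recent ΔT of external sound, the following Python sketch uses a fixed-length buffer. The class name `RollingRecorder`, the sample rate parameter, and the representation of audio as a sequence of samples are assumptions made for illustration; the publication does not specify how the storage device 101 is implemented.

```python
from collections import deque


class RollingRecorder:
    """Keeps only the most recent `retention_s` seconds of samples (hypothetical helper)."""

    def __init__(self, retention_s: float, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        # A deque with maxlen silently discards the oldest samples once full.
        self._buf = deque(maxlen=int(retention_s * sample_rate))

    def write(self, samples) -> None:
        """Append newly captured microphone samples."""
        self._buf.extend(samples)

    def last(self, seconds: float) -> list:
        """Return the newest `seconds` of audio, or everything stored if less is available."""
        n = min(len(self._buf), int(seconds * self.sample_rate))
        return list(self._buf)[len(self._buf) - n:]
```

In this sketch `retention_s` plays the role of ΔT: writing never blocks, and anything older than the retention window is simply overwritten.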

The detector 104 detects the input or stopping of music from the music playback device 10 and the input of recorded sound from the playback device 102, and outputs switching instructions to the switch 105 and analysis instructions to the analysis device 103.

When a recorded-sound playback instruction (PLAY) is input from the analysis device 103, the playback device 102 reproduces the external sound accumulated in the storage device 101 as recorded sound and outputs it to the detector 104 and the switch 105.

When a determination start instruction is input from the detector 104, the analysis device 103 recognizes the external sound input from the microphone 21 and determines whether it matches a determination condition (predetermined keyword, voiceprint, sound pattern, or sound level) registered in a determination condition table stored in advance (see FIG. 4). If it matches, the analysis device 103 stops the analysis and outputs a recorded-sound playback instruction (PLAY) to the playback device 102.

Specifically, the analysis device 103 includes speech recognition means, voiceprint recognition means, sound pattern recognition means, sound level measurement means, and the like that operate on the input external sound, and detects whether the input external sound matches a predetermined keyword, voiceprint, sound pattern, or sound level registered in advance in the determination condition table.

FIG. 4 shows an example configuration of the determination condition table. As shown in FIG. 4, the table registers comparison elements (predetermined keyword, voiceprint, sound pattern, and sound level) together with the related phrase used for notification. The user can freely register data in the table using input means (not shown). In the table of FIG. 4, for example, "Urawa" is registered as the predetermined keyword, "voiceprint data of friend A" as the voiceprint, "20 dB or less" as the sound level, and "sound pattern data of a telephone bell" as the sound pattern, and the analysis device 103 determines whether the external sound matches at least one of these comparison elements (determination conditions). When a match is determined, the corresponding related phrase may be output as sound from the speaker 31 to tell the user which determination condition was matched. In the example of FIG. 4, when a match is detected on the predetermined keyword "Urawa", the user is notified that "Urawa" was detected, and when a match is detected on "voiceprint data of friend A", the user is notified that "friend A's voiceprint" was detected.

Although the determination conditions here are a predetermined keyword, a voiceprint, a sound pattern, and a sound level, not all of them need be used; any one or more of them can serve as determination conditions. Also, although a match is determined here as an OR of the conditions, an AND of the conditions may be used instead.
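The table lookup and the OR/AND choice described above can be sketched as follows. This is only an illustration under stated assumptions: the recognizers themselves are out of scope, so the sketch takes their results as a ready-made `Observation`; the field names, the labels such as `friend_a` and `telephone_bell`, and the function names are all hypothetical. The four entries mirror the example values given for FIG. 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Observation:
    """Features assumed to come from separate recognizers (not shown)."""
    text: str        # speech recognition result
    speaker: str     # voiceprint recognition result
    pattern: str     # sound pattern label
    level_db: float  # measured sound level


# Related phrase used for notification -> predicate on an observation.
CONDITIONS: Dict[str, Callable[[Observation], bool]] = {
    "Urawa": lambda o: "urawa" in o.text.lower(),
    "friend A's voiceprint": lambda o: o.speaker == "friend_a",
    "telephone bell": lambda o: o.pattern == "telephone_bell",
    "20 dB or less": lambda o: o.level_db <= 20.0,
}


def matched_conditions(obs: Observation) -> List[str]:
    """Return the related phrases of every condition the observation satisfies."""
    return [name for name, pred in CONDITIONS.items() if pred(obs)]


def is_match(obs: Observation, mode: str = "or") -> bool:
    """OR mode: any condition matches. AND mode: every registered condition matches."""
    hits = matched_conditions(obs)
    return len(hits) == len(CONDITIONS) if mode == "and" else bool(hits)
```

Note that the keyword predicate in this sketch is a plain substring test, so an announcement of "Minami Urawa" also matches "Urawa". The matcher alone cannot resolve that ambiguity, which is why the recorded sound is replayed for the user to judge.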

FIG. 3 is a flowchart explaining the operation of the voice switching device 100 of FIG. 1. The following description covers the case in which the user uses the headphones 30 to listen to music played by the music playback device 10.

In FIG. 3, when the voice switching device 100 is powered on, the microphone 21 first picks up surrounding sound and outputs it as external sound to the storage device 101, the analysis device 103, and the switch 105. The switch 105 sets SW to terminal a and outputs the external sound input from the microphone 21 from the speaker 31. A user wearing the headphones 30 can thus clearly hear, via the voice switching device 100, the surrounding sound that the headphones 30 would otherwise block. In other words, the user can check surrounding sounds and hold a conversation in real time without removing the headphones 30. The storage device 101 continuously records the external sound input from the microphone 21. The length of time retained by the recording is ΔT.

Next, the detector 104 detects whether music has been input from the music playback device 10 by detecting the music input signal from its voltage level or the like (step S11). If music input is detected ("Yes" in step S11), external sound analysis processing is executed (step S12). In this example, detecting the music input signal from its voltage level or the like allows the voice switching device 100 to operate autonomously, but the start of music playback may instead be determined by receiving a playback instruction from the music playback device 10 as a control signal. A beep or sound effect may also be played when the output is switched, which makes the user aware that a different sound source is about to be output. In addition, a display device such as an LED or a liquid crystal panel may be provided on the switch unit 20 to indicate the current switching state (which sound is being output).

The external sound analysis processing is now described in detail. On detecting music input, the detector 104 outputs to the switch 105 a switching instruction for music output (SW: b) and outputs to the analysis device 103 a determination start instruction (analysis instruction).

When the switching instruction for music output (SW: b) is input from the detector 104, the switch 105 sets SW to terminal b and the music is output to the speaker 31. The user can thus listen to the music reproduced from the speaker 31. On receiving the determination start instruction from the detector 104, the analysis device 103 starts determining whether the input external sound matches a determination condition (predetermined keyword, voiceprint, sound pattern, or sound level) registered in advance in the determination condition table (step S15). If there is no match, processing returns to step S13; if there is a match, the external sound analysis processing is stopped and processing proceeds to step S16.

While music playback is detected, the detector 104 also monitors for the music stopping (step S13). On detecting that the music has stopped, the detector 104 executes analysis stop processing (step S14) and then returns to step S11. The stopping of music may instead be determined by receiving a stop instruction from the external music playback device 10 as a control signal.

In this analysis stop processing, on detecting that the music has stopped, the detector 104 outputs to the switch 105 a switching instruction for external sound output (SW: a) and outputs to the analysis device 103 a determination stop instruction (analysis stop instruction). When the switching instruction for external sound output (SW: a) is input from the detector 104, the switch 105 sets SW to terminal a and the external sound is output to the speaker 31. While the music is stopped, the user can therefore check surrounding sounds and hold a conversation in real time without removing the headphones 30. On receiving the determination stop instruction from the detector 104, the analysis device 103 stops analyzing the external sound.

In step S16, stored sound playback processing is executed. In this processing, the analysis device 103 outputs a recorded-sound playback instruction (PLAY) to the playback device 102. When the instruction is input, the playback device 102 reproduces the external sound accumulated in the storage device 101 as recorded sound and outputs it to the detector 104.

Music playback on the music playback device 10 continues while the recorded sound is being reproduced. However, if a control line is provided between the music playback device 10 and the voice switching device 100, a music playback stop signal may be sent to the external music playback device 10 in synchronization with the recorded-sound playback. This avoids the situation in which music keeps playing when the user does not intend it to.

On detecting the recorded sound output from the playback device 102, the detector 104 outputs to the switch 105 a switching instruction (SW: c) for outputting the recorded sound to the speaker 31. When this switching instruction (SW: c) is input from the detector 104, the switch 105 sets SW to terminal c and the recorded sound is output to the speaker 31.
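The routing between terminals a, b, and c in steps S11 to S16 can be summarized as a small state machine. The Python sketch below is a simplified model, not the circuit described in the publication: the class and method names are hypothetical, and the PLAY instruction and the detection of recorded sound are collapsed into a single `on_condition_matched` event.

```python
from enum import Enum


class Terminal(Enum):
    A = "external sound"  # live microphone
    B = "music"           # music playback device 10
    C = "recorded sound"  # playback device 102


class SwitchController:
    """Hypothetical model of how the detector 104 drives the switch 105."""

    def __init__(self) -> None:
        self.terminal = Terminal.A  # after power-on the live microphone is routed
        self.analyzing = False

    def on_music_detected(self) -> None:
        """Step S11 'Yes' -> S12: route music and start analysis."""
        self.terminal = Terminal.B
        self.analyzing = True

    def on_music_stopped(self) -> None:
        """Steps S13 -> S14: route live sound and stop analysis."""
        self.terminal = Terminal.A
        self.analyzing = False

    def on_condition_matched(self) -> None:
        """Steps S15 -> S16: stop analysis and route the recorded sound."""
        if not self.analyzing:
            return  # matches are ignored unless analysis is running
        self.analyzing = False
        self.terminal = Terminal.C
```

Keeping `analyzing` separate from the selected terminal reflects the text: analysis runs only while music is being output, and a match ends it before the replay begins.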

The recorded sound reproduced by the playback device 102 is now described. Playback by the playback device 102 starts from a point the predetermined time ΔT before the recorded-sound playback start time (the time at which the match with the determination condition was detected). FIG. 5 is a diagram explaining a recorded-sound playback method (part 1). The predetermined time ΔT desirably satisfies the following conditional expression (1).

Predetermined time ΔT = time Δt1 during which the predetermined sound is output + fixed time Δt3 ... (1)
In FIG. 5, Δt1 is the time during which the predetermined sound is output; for convenience of explanation, it is assumed to equal the analysis time that the analysis device 103 needs to determine a match. For example, when "Urawa" is registered as the predetermined keyword, Δt1 is just long enough for "Urawa" to be spoken. Δt3 is the lead time the user needs to grasp the context of the matched sound. For example, if Δt2 is the time taken to speak "Minami", Δt3 is a time long enough for "Minami" to be fully recognized. Δt3 may be made freely settable by the operator.
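Expression (1) translates directly into the position at which replay should begin. The following Python sketch assumes the recording is addressed by sample index and that the match is detected at a known index; the function names, parameters, and the clamping to the start of the recording are illustrative assumptions rather than details from the publication.

```python
def lookback_seconds(dt1: float, dt3: float) -> float:
    """Expression (1): delta_T = delta_t1 + delta_t3, in seconds."""
    return dt1 + dt3


def playback_start_index(match_index: int, dt1: float, dt3: float, sample_rate: int) -> int:
    """Sample index at which replay begins, clamped to the start of the recording."""
    offset = int(round(lookback_seconds(dt1, dt3) * sample_rate))
    return max(0, match_index - offset)
```

With hypothetical values of 1.0 s for Δt1, 1.5 s for Δt3, and a 16 kHz sample rate, a match detected at sample 80000 would start the replay at sample 40000, that is, 2.5 s earlier.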

In this way, reproducing the recorded sound from the predetermined time ΔT in the past for the keyword "Urawa" lets the user grasp its context and determine whether the match was on "Urawa" alone or on "Minami Urawa" or "Kita Urawa". Also, because the external sound is reproduced as recorded, the user can determine whether the match on "Urawa" came from the conductor's announcement for the stop or from something a nearby person said that has nothing to do with the destination. The user can thus accurately grasp the relevant surrounding conditions while listening to music through the headphones 30.

In the case of [voiceprint data], for example, when friend A calls out "Let's go and have dinner", the condition [voiceprint of friend A] is matched, and playing back the recorded sound from ΔT in the past lets the user determine what friend A said when calling out.

In the case of [voice pattern], when a telephone rings, playing back the recorded sound from ΔT in the past lets the user determine that the telephone rang, even if it stopped ringing immediately afterward.

In the case of [voice level], the device can determine that the surroundings have become quiet. This suits users who worry about sound leaking to those around them, since they can decide to lower the music volume once the surroundings become quiet. Conversely, the device may determine that the surroundings have become noisy, so that the music volume can be raised.
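A level-based determination of this kind could be sketched as follows. The specification does not say how the level is measured; the use of RMS amplitude and the names `rms`, `classify_level`, `quiet_below`, and `loud_above` are assumptions made only for illustration.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def classify_level(samples, quiet_below, loud_above):
    """Classify the surroundings using two hypothetical thresholds."""
    level = rms(samples)
    if level < quiet_below:
        return "quiet"   # e.g. a cue to lower the music volume
    if level > loud_above:
        return "loud"    # e.g. a cue to raise the music volume
    return "normal"
```

Two thresholds are used rather than one so that the "quiet" and "loud" cases described above can both be detected from the same measurement.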

As described above, the device may be configured to notify the user through the speaker 31, immediately before playback of the recorded sound starts, of what caused the current determination. This notification gives the user more useful information for grasping the surrounding situation. Alternatively, a chime or similar sound may be used to draw the user's attention, and the two may be used together.

When the recorded sound is played back, shortened playback is used. Shortened playback means playing back the recorded sound in less time than it took to record, for example by playing it at a speed faster than normal (speed playback) or by skipping silent portions (SKIP playback).
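SKIP playback can be pictured as dropping frames that contain no meaningful sound. The sketch below is a simplified illustration, assuming the recording is a list of frames of integer samples and that a frame counts as silent when its peak amplitude is below a threshold; the function name `skip_silence` and the `threshold` parameter are hypothetical.

```python
def skip_silence(frames, threshold):
    """Keep only frames whose peak absolute amplitude reaches `threshold`.

    Frames below the threshold are treated as silence and dropped, so the
    remaining frames play back in less time than the original recording.
    """
    return [
        frame
        for frame in frames
        if max((abs(sample) for sample in frame), default=0) >= threshold
    ]
```

A practical player would probably keep a short pause between phrases so that speech stays intelligible; this sketch removes silent frames entirely.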

FIG. 6 is a diagram for explaining a method (part 2) of playing back the recorded sound according to this embodiment. In FIG. 6, if the recorded sound is played back at normal speed, the sound output from the speaker 31 always lags the live sound by ΔT. In this embodiment, the recorded sound is played back faster than normal speed, or with silent portions skipped, so that this lag approaches "0". When the shortened playback time (ΔS) has elapsed, that is, when the lag reaches "0", the playback device 102 stops playing the recorded sound and the switcher 105 switches the speaker output to the external sound. This lets the user recognize the target external sound while listening to music with the headphones 30, and makes the return to live sound afterward (the shift of attention) smooth.
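The relation between ΔT and ΔS can be worked out for the simplest case. Assume, purely for illustration, that playback runs at a constant speed factor r greater than 1 and that no silent portions are skipped; the specification does not give such a formula. During ΔS seconds of playback, r·ΔS seconds of recording are consumed, and this must cover the initial lag ΔT plus the ΔS seconds of sound that arrive in the meantime, so r·ΔS = ΔT + ΔS and therefore ΔS = ΔT / (r - 1). The function name `catch_up_time` and the parameter `speed` are hypothetical.

```python
def catch_up_time(delta_t: float, speed: float) -> float:
    """Shortened playback time needed for the lag delta_t to reach zero.

    Assumes constant-speed playback at `speed` times normal speed and no
    silence skipping, so the lag shrinks by (speed - 1) seconds per second.
    """
    if speed <= 1.0:
        raise ValueError("speed must exceed 1.0, or the lag never closes")
    return delta_t / (speed - 1.0)
```

For example, a lag of 3 seconds played back at 1.5 times normal speed closes after 6 seconds. Skipping silent portions would shorten this further.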

For example, suppose friend A speaks to the user while the user is listening to music. First, upon a voiceprint match, the user is notified of the condition match as "voiceprint, friend A". The user thereby recognizes that friend A called out while the music was playing. Next, shortened playback of the recorded sound lets the user grasp what friend A said. Furthermore, even if friend A keeps talking, the user can follow what is said without a break and without interrupting friend A.

After the stored-sound playback process of step S16 ends, the detector 104 outputs to the switcher 105 a switching instruction (SW: a) for outputting the external sound from the speaker 31. On receiving the switching instruction (SW: a) from the detector 104, the switcher 105 switches SW to terminal a, and the external sound is output from the speaker 31. If the user then wishes to resume listening to music, a switching instruction (SW: b) for music output is issued manually to the switcher 105 from the operation unit 22. Alternatively, the switching instruction (SW: b) for music output may be issued to the switcher 105 automatically after a fixed time has elapsed.
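The switching sequence can be summarized as a small state machine. This is a sketch under assumptions, not the circuit of the switcher 105: terminals a (external sound) and b (music) follow the text above, while the terminal label "c" for the recorded sound, the class name `Switcher`, and the method names are hypothetical.

```python
from enum import Enum


class Source(Enum):
    EXTERNAL = "a"   # SW: a, live external sound from the microphone
    MUSIC = "b"      # SW: b, music from the music playback device
    RECORDED = "c"   # hypothetical label for the recorded-sound terminal


class Switcher:
    """Selects exactly one source at a time for the speaker."""

    def __init__(self):
        self.source = Source.MUSIC

    def on_condition_match(self):
        # A determination condition matched: play the recorded sound.
        self.source = Source.RECORDED

    def on_playback_finished(self):
        # Recorded playback has caught up: hand over to live external sound.
        if self.source is Source.RECORDED:
            self.source = Source.EXTERNAL

    def on_music_request(self):
        # Manual operation, or a timer expiring after a fixed time.
        self.source = Source.MUSIC
```

A typical sequence is music, then recorded sound on a match, then external sound when playback finishes, then music again on request.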

As described above, the voice switching device 100 according to this embodiment comprises: the storage device 101, which stores external sound input from the microphone 21 as recorded sound; the analysis device 103, which determines whether the external sound input from the microphone 21 matches a determination condition; the playback device 102, which plays back the recorded sound stored in the storage device 101; and the switcher 105, which, in response to instructions from the detector 104, selectively switches among music input from the music playback device 10, the external sound input from the microphone 21, and the recorded sound played back by the playback device 102, and outputs the selected one to the speaker 31. When the analysis device 103 determines that the external sound input from the microphone 21 matches the determination condition, the recorded sound from a predetermined time before the point at which the match was detected is played back and output to the speaker 31. Consequently, a user listening to music with headphones or the like can accurately grasp changes in the surrounding situation.

Further, in the voice switching device 100 according to this embodiment, the playback device 102 plays back the recorded sound in shortened form (in less time than it took to record), specifically by playing it at a speed faster than normal (speed playback) or by skipping silent portions (SKIP playback). Playback can therefore catch up with the live external sound, and the user's shift of attention is made smooth.

Further, in the voice switching device 100 according to this embodiment, the analysis device 103 stores in advance, as determination conditions, one or more of a keyword, a voiceprint, a voice pattern, and a voice level, and determines whether the external sound input from the microphone 21 matches them. When the external sound matches such a condition, the user can listen to the recorded sound from the predetermined time before. For example, a train passenger listening to music with headphones can accurately identify external announcements such as the one for the stop where they get off, and can also notice people calling out to them and changes in the surrounding environment.

Further, in the voice switching device 100 according to this embodiment, the speaker 31 is built into the headphones 30, so the user can accurately grasp changes in the surrounding situation while using the headphones to listen to music played by the audio playback device.

(Modification 1)
FIG. 7 is a diagram for explaining Modification 1 of the voice switching device according to the present invention. In FIG. 7, Modification 1 differs from the above embodiment in that the voice switching device 100 is mounted inside the audio playback device 10.

(Modification 2)
FIG. 8 is a diagram for explaining Modification 2 of the voice switching device according to the present invention. In FIG. 8, Modification 2 is a configuration in which the voice switching device 100 is mounted inside the headphones.

(Modification 3)
The voice switching device 100 shown in FIG. 2 uses wired signal lines to carry input and output sound and control signals. Signal transmission is not limited to wired connections, however, and wireless communication such as Bluetooth may be used instead.

(Modification 4)
When a telephone conference or video conference is held using a headset, the user may fail to notice changes in the surrounding sound, for example someone calling out or a telephone ringing. When the headphone system according to this embodiment is used for a telephone or video conference, the conference audio is therefore input in place of the music from the music playback device 10. Moreover, in a telephone or video conference it is difficult to decide in advance whether the conference audio or the external sound should take priority, so the conference audio may be left unmuted and a sound effect or beep superimposed on it and output to the speaker 31. The user may also be informed by video, using a display device, that a relevant change in the situation has occurred. Both the audio notification and the video notification may be given, and either may include information on what kind of change occurred.

(Modification 5)
The voice switching device 100 according to the present invention is not limited to use with headphones and may output to indoor or in-vehicle speakers or the like. For example, a microphone may be placed at the front door, with the voice switching device 100, the music playback device, and the speaker placed inside the room, and the sound output from the speaker may be switched according to the external sound input from the door microphone. Thus, for example, when the user is listening to music indoors and the door microphone detects a visitor's voice, the speaker output can be switched to the visitor's voice.

The voice switching device according to the present invention is useful for accurately grasping changes in the external situation while listening to music, and is particularly useful when listening to music with headphones.

FIG. 1 is a diagram showing the external configuration of a headphone system to which the voice switching device according to the present invention is applied.
FIG. 2 is a diagram showing the block configuration of the voice switching device according to the present invention.
FIG. 3 is a flowchart for explaining the operation of the voice switching device.
FIG. 4 is a diagram showing a configuration example of the determination condition table.
FIG. 5 is a diagram for explaining a method (part 1) of playing back the recorded sound.
FIG. 6 is a diagram for explaining a method (part 2) of playing back the recorded sound.
FIG. 7 is a diagram for explaining Modification 1.
FIG. 8 is a diagram for explaining Modification 2.

Explanation of symbols

10 Audio playback device
20 Switch unit
21 Operation unit
22 Microphone
30 Headphones
31 Speaker
100 Voice switching device
101 Storage device
102 Playback device
103 Analysis device
104 Detector
105 Switcher

Claims

1. A voice switching device comprising:
storage means for storing external sound input from a microphone as recorded sound;
analysis means for determining whether the external sound input from the microphone matches a determination condition;
playback means for playing back the recorded sound stored in the storage means; and
switching means for selectively switching among music input from a music playback device, the external sound input from the microphone, and the recorded sound played back by the playback means, and outputting the selected one to a speaker,
wherein, when the analysis means determines that the external sound input from the microphone matches the determination condition, the playback means plays back the recorded sound from a predetermined time before the point at which the condition match was detected, and the switching means outputs the played-back recorded sound to the speaker.

2. The voice switching device according to claim 1, wherein the predetermined time is a length that includes the sound matching the determination condition and a preceding time required for a user to grasp the context of the matched sound.

3. The voice switching device according to claim 1 or 2, wherein the playback means plays back the recorded sound in shortened form.

4. The voice switching device according to claim 3, wherein the playback means plays back the recorded sound until it catches up with real time.

5. The voice switching device according to any one of claims 1 to 4, further comprising sound playback means for playing back a predetermined sound before the recorded sound is played back.

6. The voice switching device according to claim 1, wherein the determination condition is one or more of a keyword, a voiceprint, a voice pattern, and a voice level.

7. The voice switching device according to claim 1, wherein the speaker is a speaker built into headphones.