JP7512684B2 - Speech input system and input speech processing method

Info

Publication number: JP7512684B2
Application number: JP2020094797A
Authority: JP
Inventors: 孝之内田
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2024-07-09
Anticipated expiration: 2040-05-29
Also published as: JP2021190860A

Description

The present invention relates to a voice input system and an input voice processing method.

Patent Document 1 describes using a wireless headset equipped with a microphone and earphones as a voice input device that sends the user's captured speech to an AI assistant. Patent Document 2 describes wireless earphones with a microphone that serve as a voice input device.

JP 2020-030780 A
JP 2019-195179 A

When a speaker addresses an AI assistant through a voice input device such as a wireless earphone with a microphone while the ambient sound is loud, the microphone picks up the loud ambient sound together with the utterance, and both are sent to the AI assistant. As a result, the AI assistant may fail to recognize the user's speech and be unable to respond appropriately.

The problem to be solved by the present invention is therefore to provide a voice input system and an input voice processing method that achieve a high recognition rate of the uttered speech by an AI assistant even when the ambient sound is loud.

To solve the above problem, the present invention has the following configurations 1) to 4).
1) A voice input system having:
a first microphone that picks up sound at a first position outside the ear canal of a speaker and outputs a first input audio signal;
a second microphone that picks up sound at a second position outside the ear canal of the speaker, closer to the mouth of the speaker than the first position, and outputs a second input audio signal;
a third microphone that picks up sound inside the ear canal of the speaker and outputs a third input audio signal;
a control unit that detects the sound pressure of the first input audio signal, sets a reflection degree for each of the second input audio signal and the third input audio signal according to the detected sound pressure, and generates, based on the reflection degrees, an output audio signal including at least one of the second input audio signal and the third input audio signal; and
a communication unit that transmits the output audio signal to the outside,
the system including a first voice input device and a second voice input device that each have the above elements and can communicate with each other,
wherein the control unit of the first voice input device determines which is larger, the sound pressure of the first input audio signal in the first voice input device or the sound pressure of the first input audio signal in the second voice input device, and sets the output audio signal to be transmitted from the communication unit to the outside based on the determination result.
2) An input voice processing method using a first voice input device and a second voice input device, which are a pair of voice input devices each of which:
acquires sound picked up at a first position outside the ear canal of a speaker as a first input audio signal and detects the sound pressure of the first input audio signal;
acquires sound picked up at a second position outside the ear canal of the speaker, closer to the mouth of the speaker than the first position, as a second input audio signal;
acquires sound picked up inside the ear canal of the speaker as a third input audio signal; and
selectively sets one of the second input audio signal and the third input audio signal as an output audio signal according to the sound pressure of the first input audio signal and transmits it to the outside,
the method including:
wearing the first voice input device and the second voice input device on the left ear and the right ear, respectively; and
determining which is larger, the sound pressure of the first input audio signal in the first voice input device or the sound pressure of the first input audio signal in the second voice input device, and setting the output audio signal to be transmitted from a communication unit to the outside based on the determination result.
3) An input voice processing method using a first voice input device and a second voice input device, which are a pair of voice input devices each of which:
acquires sound picked up at a first position outside the ear canal of a speaker as a first input audio signal and detects the sound pressure of the first input audio signal;
acquires sound picked up at a second position outside the ear canal of the speaker, closer to the mouth of the speaker than the first position, as a second input audio signal;
acquires sound picked up inside the ear canal of the speaker as a third input audio signal; and
mixes the second input audio signal and the third input audio signal at a sound pressure ratio corresponding to the sound pressure of the first input audio signal, sets the result as an output audio signal, and transmits it to the outside,
the method including:
wearing the first voice input device and the second voice input device on the left ear and the right ear, respectively; and
determining which is larger, the sound pressure of the first input audio signal in the first voice input device or the sound pressure of the first input audio signal in the second voice input device, and setting the output audio signal to be transmitted from a communication unit to the outside based on the determination result.

According to the present invention, the AI assistant recognizes the uttered speech at a high rate even when the ambient sound is loud.

FIG. 1 is a schematic cross-sectional view showing an earphone 91, which is Example 1 of a voice input device according to an embodiment of the present invention.
FIG. 2 is a block diagram of the earphone 91.
FIG. 3 is a diagram showing the operation of the earphone 91.
FIG. 4 is a block diagram showing an earphone 91A, which is Variation 1 of the earphone 91.
FIG. 5 is a diagram showing the operation of the earphone 91A.
FIG. 6 is a diagram showing the operation of Variation 2 of the earphone 91.
FIG. 7 is a diagram showing the operation of Variation 3 of the earphone 91.
FIG. 8 is a diagram showing the operation of Variation 4 of the earphone 91.
FIG. 9 is a block diagram showing an earphone system 91ST, which is Example 2 of a voice input system according to an embodiment of the present invention.
FIG. 10 is a table showing the operation of the earphone system 91ST.
FIG. 11 is a schematic cross-sectional view showing an example placement of the third microphone M3 when it is a bone conduction microphone.

(Example 1)
A voice input device according to an embodiment of the present invention will be described using the earphone 91 of Example 1 with reference to FIGS. 1 and 2.

FIG. 1 is a vertical cross-sectional view showing the earphone 91 of Example 1. FIG. 1 shows the earphone 91 in use, attached to the auricle E of a speaker H.
FIG. 2 is a block diagram of the earphone 91.

The earphone 91 has a main body 1 and an insertion portion 2 that protrudes from the main body 1 and is inserted into the ear canal E1.
The main body 1 has a first microphone M1 and a second microphone M2, a control unit 3 and a communication unit 4, and a drive unit 5 and a speaker unit 6.
The insertion portion 2 has a third microphone M3.
The control unit 3 has a sound pressure detection unit 3a and an input selection unit 3b.

The main body 1 has an air chamber 1a on the sound output side of the speaker unit 6.
The insertion portion 2 has a sound emission path 2a that opens at its tip and communicates with the air chamber 1a.
When the earphone 91 is in use, sound output from the speaker unit 6 through the operation of the drive unit 5 passes through the air chamber 1a and the sound emission path 2a and is emitted into the ear canal E1.
The earphone 91 can thus receive, through the communication unit 4, an audio signal wirelessly transmitted from an external audio playback device and reproduce it on the speaker unit 6 via the control unit 3 and the drive unit 5.

The first microphone M1 is disposed at a first position, which is the part of the main body 1 farther from the mouth of the speaker H, and picks up sound around the main body 1.
The second microphone M2 is disposed at a second position, which is the part of the main body 1 closer to the mouth of the speaker H, and mainly picks up the voice uttered by the speaker H as air-conducted sound. That is, when the earphone 91 is in use, the second microphone M2 is closer to the mouth of the speaker H than the first microphone M1.
Hereinafter, the sound around the main body 1 is also simply referred to as ambient sound.
The third microphone M3 is an air-conduction microphone disposed at a third position facing the sound emission path 2a of the insertion portion 2. It picks up the air-conducted sound produced when the voice uttered by the speaker H reaches the ear canal E1 as bone-conducted sound and reverberates in the internal space Ev of the ear canal E1 and the sound emission path 2a.
In summary, the first position of the first microphone M1 is outside the ear canal of the speaker H. The second position of the second microphone M2 is outside the ear canal of the speaker H and closer to the mouth of the speaker H than the first position. The third microphone M3 is inside the ear canal of the speaker H.

The sound pressure detection unit 3a of the control unit 3 detects the sound pressure of the input audio signal SN1, which is the first input audio signal from the first microphone M1, and outputs it as a detected first audio signal SN1a.
The sound pressure is detected, for example, as an equivalent continuous A-weighted sound pressure level (LAeq). The sound pressure of the detected first audio signal SN1a, detected as LAeq by the sound pressure detection unit 3a, is hereinafter referred to as the sound pressure Va.
Because the first microphone M1 mainly picks up ambient sound as described above, the sound pressure Va can be regarded as the sound pressure of the ambient sound.
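As an illustration of how a level such as the sound pressure Va might be computed, the following is a minimal sketch, not the patented implementation. It assumes the samples of SN1 are already calibrated in pascals and already A-weighted (the weighting filter itself is omitted), and it uses the standard 20 µPa reference pressure for airborne sound. The function name `equivalent_level_db` is hypothetical.

```python
import math


def equivalent_level_db(samples, p_ref=20e-6):
    """Equivalent continuous level (dB) of one block of pressure samples.

    Assumption: `samples` are sound pressures in pascals that have already
    passed through an A-weighting filter, so the result approximates LAeq
    over the block. The patent does not specify this computation.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Mean of the squared pressure over the block (energy average).
    mean_square = sum(p * p for p in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (p_ref * p_ref))
```

A block of constant 1 Pa samples yields roughly 94 dB, the level commonly used for microphone calibration.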

As shown in FIG. 2, the input selection unit 3b receives the input audio signal SN2, which is the second input audio signal from the second microphone M2, the input audio signal SN3, which is the third input audio signal from the third microphone M3, and the detected first audio signal SN1a from the sound pressure detection unit 3a.
The input selection unit 3b of the control unit 3 generates an output audio signal SNt and outputs it to the communication unit 4.
In doing so, the input selection unit 3b sets the reflection degree RF1 of the input audio signal SN2 and the reflection degree RF2 of the input audio signal SN3 in the output audio signal SNt based on the sound pressure Va of the detected first audio signal SN1a.
The reflection degrees RF1 and RF2 are indices of how strongly the input audio signals SN2 and SN3, respectively, are reflected in the output audio signal SNt.
The index is, for example, the magnitude of sound pressure. In this example, the input selection unit 3b sets the reflection degrees RF1 and RF2 as a binary choice: one signal is reflected and the other is not.
That is, the input selection unit 3b generates the output audio signal SNt by selecting exactly one of the input audio signals SN2 and SN3, switching between them based on the sound pressure Va of the detected first audio signal SN1a. As a result, the output audio signal SNt includes at least one of the input audio signals SN2 and SN3.
The communication unit 4 wirelessly transmits the output audio signal SNt output from the input selection unit 3b to the outside of the earphone 91. The wireless transmission is performed, for example, by Bluetooth (registered trademark).

Next, the input voice processing method performed by the input selection unit 3b will be described in detail with reference to FIG. 3.
In FIG. 3, the horizontal axis represents the sound pressure Va, and the vertical axis represents the input audio signal SN2 of the second microphone M2 and the input audio signal SN3 of the third microphone M3, one of which is selected as the output audio signal SNt.
Two arbitrary values of the sound pressure Va are set in advance: a lower switching sound pressure Va1 as a first sound pressure, and an upper switching sound pressure Va2 as a second sound pressure higher than the lower switching sound pressure Va1.

When the sound pressure Va is below the lower switching sound pressure Va1, the input selection unit 3b selects and sets the input audio signal SN2 as the output audio signal SNt to be generated. When the sound pressure Va exceeds the upper switching sound pressure Va2, it selects and sets the input audio signal SN3 as the output audio signal SNt to be generated.

While the input audio signal SN2 is selected as the output audio signal SNt, if the sound pressure Va rises and exceeds the upper switching sound pressure Va2, the input selection unit 3b switches from the input audio signal SN2 to the input audio signal SN3.
While the input audio signal SN3 is selected as the output audio signal SNt, if the sound pressure Va falls below the lower switching sound pressure Va1, the input selection unit 3b switches from the input audio signal SN3 to the input audio signal SN2.
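The switching rule above is a two-threshold hysteresis, which can be sketched as follows. This is an illustrative sketch only: the class name `InputSelector`, the string labels, and the choice of SN2 as the initial selection are assumptions, and the threshold values in the usage note are arbitrary rather than values from the patent.

```python
class InputSelector:
    """Selects SN2 or SN3 from the ambient sound pressure Va with hysteresis."""

    def __init__(self, va1, va2, initial="SN2"):
        # va1: lower switching sound pressure, va2: upper switching sound pressure.
        if not va1 < va2:
            raise ValueError("Va1 must be lower than Va2")
        self.va1 = va1
        self.va2 = va2
        # Assumption: the text does not say which signal is selected at start-up.
        self.selected = initial

    def update(self, va):
        """Return the signal to use as SNt for the current sound pressure Va."""
        if va > self.va2:
            self.selected = "SN3"  # loud surroundings: in-ear microphone M3
        elif va < self.va1:
            self.selected = "SN2"  # quiet surroundings: outer microphone M2
        # Between Va1 and Va2 the previous selection is kept unchanged.
        return self.selected
```

With illustrative thresholds of 60 and 70, feeding the sequence 50, 65, 71, 65, 61, 59, 65 selects SN2, SN2, SN3, SN3, SN3, SN2, SN2: the selection changes only when Va leaves the band between the two thresholds.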

That is, when the ambient sound is quiet, the earphone 91 sends out, as the output audio signal SNt, the voice of the speaker H picked up by the second microphone M2 as air-conducted sound outside the ear canal E1.
When the ambient sound is loud, the earphone 91 sends out, as the output audio signal SNt, the voice of the speaker H picked up by the third microphone M3 inside the ear canal E1 as air-conducted sound derived from bone-conducted sound.

The voice of the speaker H picked up inside the ear canal E1, either as air-conducted sound derived from bone-conducted sound or as bone-conducted sound itself, is less clear than the voice picked up outside the ear canal E1 as air-conducted sound, but it is obtained at a stable sound pressure with almost no influence from ambient sound.
Therefore, even when the ambient sound is loud, the earphone 91 can send out an output audio signal SNt in which the voice of the speaker H has a high sound pressure and is not buried in the ambient sound.
When the ambient sound is quiet, the earphone 91 picks up the voice of the speaker H outside the ear canal E1 as air-conducted sound and can send out a clearer output audio signal SNt in which the voice of the speaker H has a relatively high sound pressure.

As shown in FIG. 3, in the earphone 91, the upper switching sound pressure Va2, at which the input selection unit 3b switches the output audio signal SNt from the input audio signal SN2 to the input audio signal SN3, and the lower switching sound pressure Va1, at which it switches from the input audio signal SN3 to the input audio signal SN2, are set to different values. Specifically, the upper switching sound pressure Va2 is set higher than the lower switching sound pressure Va1.

Making the upper switching sound pressure Va2 differ from the lower switching sound pressure Va1 avoids the following problem: if the ambient sound picked up by the first microphone M1 frequently rises and falls across the lower switching sound pressure Va1 or the upper switching sound pressure Va2, the input audio signals SN2 and SN3 would otherwise switch frequently, making the sound pressure or sound quality of the output audio signal SNt unstable. This prevents the voice recognition of the AI assistant 81 from degrading as the ambient sound pressure around the earphone 91 fluctuates.

Setting the upper switching sound pressure Va2 higher than the lower switching sound pressure Va1 also prevents a failure to switch to the input audio signal that should be selected when the sound pressure Va reverses direction between the lower switching sound pressure Va1 and the upper switching sound pressure Va2.

The lower switching sound pressure Va1 and the upper switching sound pressure Va2 are set appropriately by the manufacturer, according to the usage environment of the earphone 91 and other factors, so that the recognition rate of the AI assistant 81 stays high. Alternatively, the speaker H may be allowed to adjust the lower switching sound pressure Va1 and the upper switching sound pressure Va2 according to the usage environment of the earphone 91.

As described above, in the earphone 91, the voice uttered by the speaker H is maintained at a high sound pressure in the output audio signal SNt, which is generated by the control unit 3 and sent from the communication unit 4, regardless of how loud the sound around the main body 1 is.
This improves the rate at which the AI assistant 81, which receives the output audio signal SNt, recognizes the voice uttered by the speaker H.

Example 1 described in detail above is not limited to the configuration and procedure described, and may be modified without departing from the gist of the present invention.

(Variation 1)
FIG. 4 is a block diagram showing an earphone 91A, which is a variation of the earphone 91, and FIG. 5 is a diagram showing the operation of the earphone 91A.
As shown in FIG. 4, the earphone 91A is the same as the earphone 91 except that the input selection unit 3b is replaced with an input mixing unit 3c.

The input mixing unit 3c receives the input audio signal SN2 from the second microphone M2, the input audio signal SN3 from the third microphone M3, and the detected first audio signal SN1a from the sound pressure detection unit 3a.
The input mixing unit 3c mixes the input audio signals SN2 and SN3 at a sound pressure ratio that depends on the sound pressure Va of the detected first audio signal SN1a, and outputs the mixed signal to the communication unit 4 as the output audio signal SNt.
The input mixing unit 3c of the control unit 3 sets the reflection degree RF2 of the input audio signal SN2 and the reflection degree RF3 of the input audio signal SN3 in the output audio signal SNt as a ratio of their sound pressures. The sound pressure ratio is the ratio between the sound pressures of the input audio signals SN2 and SN3 contained in the output audio signal SNt.

The input voice processing method performed by the input mixing unit 3c will be described with reference to FIG. 5.
In FIG. 5, the horizontal axis is a linear axis of the sound pressure Va, the left vertical axis is a linear axis of the mixing sound pressure V at which each of the input audio signals SN2 and SN3 is mixed, and the right vertical axis is a linear axis of the total sound pressure Vt of the output audio signal SNt.
The total sound pressure Vt is the sound pressure of the audio signal obtained by mixing the input audio signals SN2 and SN3, including the case where either of them is 0 (zero).

As shown in FIG. 5, two arbitrary values are set in advance for the sound pressure Va: a mixing lower-limit sound pressure Va3 and a mixing upper-limit sound pressure Va4 higher than the mixing lower-limit sound pressure Va3. Hereinafter, the range of the sound pressure Va from the mixing lower-limit sound pressure Va3 to the mixing upper-limit sound pressure Va4, inclusive, is also referred to as the mixing range R of the sound pressure Va.
In addition, a minimum mixing sound pressure Vmin, which is the lowest sound pressure at which a signal is mixed, and a maximum mixing sound pressure Vmax, which is the highest sound pressure at which a signal is mixed, are set in advance for the input audio signals SN2 and SN3. The minimum mixing sound pressure Vmin may be 0 (zero).

When the sound pressure Va is below the mixing lower-limit sound pressure Va3, the input mixing unit 3c mixes the input audio signal SN2 at the maximum mixing sound pressure Vmax and the input audio signal SN3 at the minimum mixing sound pressure Vmin.
When the sound pressure Va exceeds the mixing upper-limit sound pressure Va4, the input mixing unit 3c mixes the input audio signal SN2 at the minimum mixing sound pressure Vmin and the input audio signal SN3 at the maximum mixing sound pressure Vmax.
Within the mixing range R of the sound pressure Va, the input mixing unit 3c decreases the mixing sound pressure V of the input audio signal SN2 as the sound pressure Va increases, and increases the mixing sound pressure V of the input audio signal SN3 as the sound pressure Va increases.
That is, as the sound pressure Va increases, the input mixing unit 3c decreases the reflection degree FR2 of the input audio signal SN2 and increases the reflection degree FR3 of the input audio signal SN3.
Within the mixing range R of the sound pressure Va, the input mixing unit 3c increases or decreases the mixing sound pressure V with respect to the sound pressure Va, for example, linearly.

Thus, at any sound pressure Vax within the mixing range R of the sound pressure Va, the input mixing unit 3c mixes the input audio signals SN2 and SN3 at the mixing sound pressures V2x and V3x, respectively, that correspond to the sound pressure Vax, generates the output audio signal SNt, and outputs it to the communication unit 4.

Through the operation of the input mixing unit 3c described above, the total sound pressure Vt of the output audio signal SNt is a constant total sound pressure Vtc regardless of the magnitude of the sound pressure Va.
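The linear mixing rule of Variation 1 can be sketched as below. This is a hedged illustration under stated assumptions, not the patent's implementation: the function name `mixing_levels` is hypothetical, the interpolation is the linear case the text gives as an example, and the total is modeled as the simple sum of the two mixing sound pressures, which is how a constant total arises in this sketch.

```python
def mixing_levels(va, va3, va4, v_min, v_max):
    """Return (V2, V3): mixing sound pressures for SN2 and SN3 at ambient level Va.

    va3, va4: mixing lower-limit and upper-limit sound pressures (va3 < va4).
    v_min, v_max: minimum and maximum mixing sound pressures.
    """
    if not va3 < va4:
        raise ValueError("Va3 must be lower than Va4")
    # Position of Va within the mixing range R, clamped to [0, 1].
    if va <= va3:
        t = 0.0
    elif va >= va4:
        t = 1.0
    else:
        t = (va - va3) / (va4 - va3)
    v2 = v_max - (v_max - v_min) * t  # SN2 fades out as the surroundings get louder
    v3 = v_min + (v_max - v_min) * t  # SN3 fades in by the same amount
    return v2, v3
```

Because V2 falls by exactly the amount V3 rises, V2 + V3 equals Vmin + Vmax for every Va in this sketch, matching the constant total sound pressure Vtc described above.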

The mixing lower-limit sound pressure Va3, the mixing upper-limit sound pressure Va4, the minimum mixing sound pressure Vmin, and the maximum mixing sound pressure Vmax are set appropriately by the manufacturer, according to the usage environment of the earphone 91A and other factors, so that the voice recognition rate of the AI assistant 81 stays high.
The mixing lower-limit sound pressure Va3, the mixing upper-limit sound pressure Va4, the minimum mixing sound pressure Vmin, and the maximum mixing sound pressure Vmax may be adjustable by the speaker H.

変形例１のイヤホン９１Ａによれば、周囲音の音圧Ｖａが混合下限音圧Ｖａ３及び混合上限音圧Ｖａ４との間の混合範囲Ｒにあるときに、入力音声信号ＳＮ２と入力音声信号ＳＮ３とが、音圧Ｖａに応じた反映度合ＦＲ２，ＦＲ３の音圧音圧で混合されると共に、混合される音圧の比率が、本体部１の周囲音の音圧の増減に応じて線形で徐々に変化する。
例えば、出力音声信号ＳＮｔにおける反映度合ＦＲ２は、音圧Ｖａが音圧Ｖａ３のときＶｍａｘ／Ｖｍｉｎ、音圧ＶａｘのときＶ２ｘ／Ｖ３ｘ、音圧Ｖａ４のときＶｍｉｎ／Ｖｍａｘで示される。
また、出力音声信号ＳＮｔにおける反映度合ＦＲ３は、音圧Ｖａが音圧Ｖａ３のときＶｍｉｎ／Ｖｍａｘ、音圧ＶａｘでＶ３ｘ／Ｖ２ｘ、音圧Ｖａ４でＶｍａｘ／Ｖｍｉｎで示される。
そのため、周囲音の増減に応じた出力音声信号ＳＮｔの音質の変化が緩やかで滑らかになり、本体部１の周囲音の音圧によらず、話者Ｈが発した音声のＡＩアシスタント８１による認識率が高く維持される。
また、イヤホン９１Ａは、周囲音の増減によらず出力音声信号ＳＮｔの総音圧Ｖｔが一定で急変しないので、話者Ｈが発した音声のＡＩアシスタント８１による認識率がより高く維持される。 According to the earphone 91A of the first modified example, when the sound pressure Va of the ambient sound is within the mixing range R between the lower mixing limit sound pressure Va3 and the upper mixing limit sound pressure Va4, the input audio signal SN2 and the input audio signal SN3 are mixed at sound pressures with reflection degrees FR2, FR3 corresponding to the sound pressure Va, and the ratio of the mixed sound pressures gradually changes linearly in accordance with the increase or decrease in the sound pressure of the ambient sound of the main body 1.
For example, the reflection degree FR2 in the output sound signal SNt is represented by Vmax/Vmin when the sound pressure Va is Va3, V2x/V3x when the sound pressure Vax, and Vmin/Vmax when the sound pressure Va4.
The reflection degree FR3 in the output audio signal SNt is expressed as Vmin/Vmax when the sound pressure Va is the sound pressure Va3, as V3x/V2x when the sound pressure Vax, and as Vmax/Vmin when the sound pressure Va4.
As a result, the change in sound quality of the output audio signal SNt in response to an increase or decrease in the ambient sound becomes gradual and smooth, and the recognition rate of the voice uttered by the speaker H by the AI assistant 81 is maintained high regardless of the sound pressure of the ambient sound of the main body 1.
In addition, since the total sound pressure Vt of the output audio signal SNt of the earphone 91A remains constant and does not change suddenly regardless of whether the ambient sound increases or decreases, the recognition rate of the voice uttered by the speaker H by the AI assistant 81 is maintained at a higher level.
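
The linear mixing of Modification 1 can be illustrated with a minimal sketch. It assumes that the sound pressure of SN2 falls linearly from Vmax to Vmin while that of SN3 rises linearly from Vmin to Vmax across the mixing range R, so that their sum stays constant. The function name, the clamping outside the range R, and the numeric units are hypothetical and are not taken from the specification.

```python
def linear_mix_levels(va, va3, va4, vmin, vmax):
    """Return (v2, v3): sound pressures of SN2 and SN3 in the output SNt.

    va   : ambient sound pressure Va
    va3  : lower mixing limit sound pressure Va3
    va4  : upper mixing limit sound pressure Va4
    vmin : minimum mixed sound pressure Vmin
    vmax : maximum mixed sound pressure Vmax
    """
    if va4 <= va3:
        raise ValueError("va4 must be greater than va3")
    # Position of Va inside the mixing range R, clamped to [0, 1]
    # (the clamping outside R is an assumption of this sketch).
    t = min(1.0, max(0.0, (va - va3) / (va4 - va3)))
    v2 = vmax - (vmax - vmin) * t  # SN2 falls from Vmax to Vmin
    v3 = vmin + (vmax - vmin) * t  # SN3 rises from Vmin to Vmax
    return v2, v3
```

With this shape the ratio v2/v3 is Vmax/Vmin at Va3 and Vmin/Vmax at Va4, and v2 + v3 equals Vmax + Vmin everywhere, which corresponds to the constant total sound pressure Vt described above.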

（変形例２）
イヤホン９１Ａは、出力音声信号ＳＮｔの総音圧Ｖｔを音圧Ｖａによらず一定とする入力混合部３ｃに換えて、図６に示されるように、総音圧Ｖｔを音圧Ｖａに応じて変える入力混合部３ｃＢ（図４参照）とした、変形例２のイヤホン９１Ｂ（図４参照）であってもよい。 (Variation 2)
The earphone 91A may instead be an earphone 91B of Modification 2 (see FIG. 4), in which the input mixing unit 3c, which keeps the total sound pressure Vt of the output audio signal SNt constant regardless of the sound pressure Va, is replaced with an input mixing unit 3cB (see FIG. 4) that changes the total sound pressure Vt according to the sound pressure Va, as shown in FIG. 6.

入力混合部３ｃＢは、例えば、音圧Ｖａの混合範囲Ｒにおいて、総音圧Ｖｔを、音圧Ｖａが大きくなるに従って大きくする。
詳しくは、入力混合部３ｃＢは、図６に示されるように、入力音声信号ＳＮ２の混合最大音圧Ｖ２ｍａｘと入力音声信号ＳＮ３の混合最大音圧Ｖ３ｍａｘとを、異なる値として混合動作を実行する。例えば、混合最大音圧Ｖ３ｍａｘを、混合最大音圧Ｖ２ｍａｘより大きいものとする。
これにより出力音声信号ＳＮｔは、混合下限音圧Ｖａ３における総音圧Ｖｔ１と、混合上限音圧Ｖａ４における総音圧Ｖｔ１よりも大きい総音圧Ｖｔ２との間で音圧が増減する。 For example, in the mixing range R of the sound pressure Va, the input mixer 3cB increases the total sound pressure Vt as the sound pressure Va increases.
Specifically, the input mixer 3cB performs a mixing operation with the maximum mixed sound pressure V2max of the input audio signal SN2 and the maximum mixed sound pressure V3max of the input audio signal SN3 set to different values, as shown in Fig. 6. For example, the maximum mixed sound pressure V3max is set to be greater than the maximum mixed sound pressure V2max.
As a result, the sound pressure of the output audio signal SNt varies between a total sound pressure Vt1 at the lower mixing limit sound pressure Va3 and a larger total sound pressure Vt2 at the upper mixing limit sound pressure Va4.

総音圧Ｖｔを一定とした場合、音圧Ｖａが大きい、すなわち周囲音が大きい場合に、入力音声信号ＳＮ２に背景騒音としてある程度含まれる周囲音の音圧比率が高くなる。従って、出力音声信号ＳＮｔの総音圧Ｖｔにおける周囲音の音圧比率が相対的に高くなる。
これに対し、イヤホン９１Ｂは、音圧Ｖａにおける混合範囲Ｒにおいて、音圧Ｖａが大きくなるに従って、入力音声信号ＳＮ２に対する入力音声信号ＳＮ３の音圧の混合比率が大きくなる。そのため、出力音声信号ＳＮｔの総音圧Ｖｔにおける周囲音の音圧比率の上昇が抑制される。
これにより、出力音声信号ＳＮｔを受信したＡＩアシスタント８１による音声認識率が安定的に維持される。 When the total sound pressure Vt is constant, if the sound pressure Va is large, that is, the ambient sound is large, the sound pressure ratio of the ambient sound contained to a certain extent as background noise in the input audio signal SN2 becomes high. Therefore, the sound pressure ratio of the ambient sound in the total sound pressure Vt of the output audio signal SNt becomes relatively high.
In contrast, in the earphone 91B, the mixing ratio of the sound pressure of the input audio signal SN3 to the input audio signal SN2 increases as the sound pressure Va increases in the mixing range R of the sound pressure Va. This suppresses an increase in the sound pressure ratio of the ambient sound in the total sound pressure Vt of the output audio signal SNt.
This allows the voice recognition rate by the AI assistant 81 that receives the output voice signal SNt to be stably maintained.
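
The behavior of the input mixing unit 3cB can be sketched as follows. This is only an illustration under stated assumptions: both signals are assumed to vary linearly across the mixing range R and to share the same minimum mixed sound pressure Vmin, neither of which is confirmed by the text. The function name and the clamping outside R are likewise hypothetical.

```python
def rising_total_mix(va, va3, va4, vmin, v2max, v3max):
    """Return (v2, v3, vt) for a mix whose total grows with Va.

    v2max : maximum mixed sound pressure V2max of SN2
    v3max : maximum mixed sound pressure V3max of SN3 (set larger than V2max)
    """
    if va4 <= va3:
        raise ValueError("va4 must be greater than va3")
    # Position of Va inside the mixing range R, clamped (assumption).
    t = min(1.0, max(0.0, (va - va3) / (va4 - va3)))
    v2 = v2max - (v2max - vmin) * t  # SN2 falls from V2max to Vmin
    v3 = vmin + (v3max - vmin) * t   # SN3 rises from Vmin to V3max
    return v2, v3, v2 + v3           # total Vt runs from Vt1 up to Vt2
```

Under these assumptions Vt1 = V2max + Vmin at Va3 and Vt2 = Vmin + V3max at Va4, so choosing V3max greater than V2max makes Vt2 greater than Vt1.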

（変形例３）
変形例１のイヤホン９１Ａは、入力混合部３ｃを、非線形に減少及び増加させる入力混合部３ｃＣ（図４参照）に置き換えた変形例３のイヤホン９１Ｃ(図４参照)に変形してもよい。 (Variation 3)
The earphone 91A of Modification 1 may be modified into an earphone 91C of Modification 3 (see FIG. 4), in which the input mixing unit 3c is replaced with an input mixing unit 3cC (see FIG. 4) that decreases and increases the mixed sound pressures nonlinearly.

入力混合部３ｃＣは、図７に示されるように、音圧Ｖａにおける混合範囲Ｒにおいて、音圧Ｖａが時間と共に減少する場合の入力音声信号ＳＮ２と入力音声信号ＳＮ３とを等しい音圧で混合する音圧Ｖａ５が、混合下限音圧Ｖａ３と混合上限音圧Ｖａ４との中点よりも混合下限音圧Ｖａ３側に寄って設定されている。
すなわち、入力混合部３ｃＣは、音圧Ｖａが減少する場合の入力音声信号ＳＮ２と入力音声信号ＳＮ３との混合を、非線形の特性線ＬＮ２ｂ及び特性線ＬＮ３ｂに基づいて実行する。 As shown in Figure 7, in the input mixing unit 3cC, in the mixing range R at the sound pressure Va, the sound pressure Va5 that mixes the input audio signal SN2 and the input audio signal SN3 at an equal sound pressure when the sound pressure Va decreases over time is set closer to the mixing lower limit sound pressure Va3 than the midpoint between the mixing lower limit sound pressure Va3 and the mixing upper limit sound pressure Va4.
That is, the input mixing unit 3cC mixes the input audio signal SN2 and the input audio signal SN3 when the sound pressure Va decreases, based on the nonlinear characteristic lines LN2b and LN3b.

一方、入力混合部３ｃＣは、音圧Ｖａが時間と共に増加する場合の入力音声信号ＳＮ２と入力音声信号ＳＮ３とを等しい音圧で混合する音圧Ｖａ６が、混合下限音圧Ｖａ３と混合上限音圧Ｖａ４との中点よりも混合上限音圧Ｖａ４側に寄って設定されている。
すなわち、入力混合部３ｃＣは、音圧Ｖａが増加する場合の入力音声信号ＳＮ２と入力音声信号ＳＮ３との混合を、非線形の特性線ＬＮ２ａ及び特性線ＬＮ３ａに基づいて実行する。 On the other hand, in the input mixing unit 3cC, the sound pressure Va6 at which the input audio signal SN2 and the input audio signal SN3 are mixed at equal sound pressure while the sound pressure Va increases over time is set closer to the mixing upper limit sound pressure Va4 than the midpoint between the mixing lower limit sound pressure Va3 and the mixing upper limit sound pressure Va4.
That is, the input mixing unit 3cC mixes the input audio signal SN2 and the input audio signal SN3 when the sound pressure Va increases, based on the nonlinear characteristic lines LN2a and LN3a.

また、入力混合部３ｃＣは、音圧Ｖａが増加するも混合上限音圧Ｖａ４に至らずに減少に転じた場合は、特性線ＬＮ２ａ及び特性線ＬＮ３ａ上で混合比を変化させる。また、入力混合部３ｃＣは、音圧Ｖａが減少するも混合下限音圧Ｖａ３に至らずに増加に転じた場合は、特性線ＬＮ３ｂ及び特性線ＬＮ２ｂ上で混合比を変化させる。 In addition, when the sound pressure Va increases but turns to decrease before reaching the mixing upper limit sound pressure Va4, the input mixing unit 3cC changes the mixing ratio along the characteristic lines LN2a and LN3a. Likewise, when the sound pressure Va decreases but turns to increase before reaching the mixing lower limit sound pressure Va3, the input mixing unit 3cC changes the mixing ratio along the characteristic lines LN3b and LN2b.

また、入力混合部３ｃＣは、出力音声信号ＳＮｔの総音圧Ｖｔが、音圧Ｖａの大小によらず、一定の総音圧Ｖｔｃとなるように入力音声信号ＳＮ２と入力音声信号ＳＮ３との混合比を制御する。
図７における入力音声信号ＳＮ２及びＳＮ３の非線形特性は、イヤホン９１の製造者によって予め設定されるか、又は話者Ｈの調整によって設定される。 Moreover, the input mixer 3cC controls the mixing ratio of the input audio signal SN2 and the input audio signal SN3 so that the total sound pressure Vt of the output audio signal SNt becomes a constant total sound pressure Vtc regardless of the magnitude of the sound pressure Va.
The nonlinear characteristics of the input audio signals SN2 and SN3 in FIG. 7 are preset by the manufacturer of the earphone 91 or are set by adjustment of the speaker H.

変形例３のイヤホン９１Ｃは、周囲音の音圧Ｖａが、比較的小さく維持されて混合範囲Ｒの混合下限音圧Ｖａ３に近い側にある場合に、出力音声信号ＳＮｔを、入力音声信号ＳＮ３に対する入力音声信号ＳＮ２の比率がより高くなるように混合して生成し、音声の明瞭化を優先的に図る。
また、イヤホン９１Ｃは、周囲音の音圧Ｖａが、比較的大きく維持されて混合範囲Ｒの混合上限音圧Ｖａ４に近い側にある場合に、出力音声信号ＳＮｔを、入力音声信号ＳＮ２に対する入力音声信号ＳＮ３の比率がより高くなるように混合して生成し、音声の高音圧化を優先的に図る。 When the sound pressure Va of the ambient sound is maintained relatively small and is close to the lower limit sound pressure Va3 of the mixing range R, the earphone 91C of variant example 3 generates an output audio signal SNt by mixing so that the ratio of the input audio signal SN2 to the input audio signal SN3 is higher, thereby prioritizing clarity of the sound.
In addition, when the sound pressure Va of the ambient sound is maintained relatively high and is close to the upper mixed sound pressure Va4 of the mixing range R, the earphone 91C generates the output audio signal SNt by mixing so that the ratio of the input audio signal SN3 to the input audio signal SN2 is higher, thereby prioritizing the increase in sound pressure of the audio.

このように、変形例３のイヤホン９１Ｃは、周囲音の音圧Ｖａの大小傾向に応じた、音声認識により適した出力音声信号ＳＮｔを生成する。そのため、話者Ｈが発した音声のＡＩアシスタント８１による認識率がより高く維持される。 In this way, the earphone 91C of the third modification generates an output audio signal SNt that is more suitable for voice recognition according to the tendency of the sound pressure Va of the ambient sound. Therefore, the recognition rate of the voice uttered by the speaker H by the AI assistant 81 is maintained at a higher level.
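
The direction-dependent mixing of Modification 3 can be illustrated with a small stateful sketch. The actual shapes of the characteristic lines LN2a, LN3a, LN2b, and LN3b in FIG. 7 are not given in the text, so power-law curves are assumed here purely for illustration. The class name, the `exponent` parameter, and the choice to start on the rising curves are hypothetical.

```python
class HysteresisMixer:
    """Sketch of input mixing unit 3cC with a constant total Vtc."""

    def __init__(self, va3, va4, vtc=1.0, exponent=2.0):
        if va4 <= va3:
            raise ValueError("va4 must be greater than va3")
        if exponent <= 1.0:
            raise ValueError("exponent must exceed 1 for the assumed curves")
        self.va3, self.va4 = va3, va4
        self.vtc = vtc
        self.exponent = exponent
        self.rising = True  # start on LN2a/LN3a (assumption)

    def mix(self, va):
        """Return (v2, v3) for the current ambient sound pressure Va."""
        t = min(1.0, max(0.0, (va - self.va3) / (self.va4 - self.va3)))
        # The curve pair is swapped only at the ends of range R, so a
        # reversal inside R stays on the curve pair already in use.
        if t >= 1.0:
            self.rising = False
        elif t <= 0.0:
            self.rising = True
        if self.rising:
            w3 = t ** self.exponent                  # equal mix near Va4 (Va6)
        else:
            w3 = 1.0 - (1.0 - t) ** self.exponent    # equal mix near Va3 (Va5)
        v3 = self.vtc * w3
        return self.vtc - v3, v3
```

With an exponent of 2, the equal-mix point lies at about 71% of the range while Va is rising and at about 29% of the range while Va is falling, which mirrors the placement of Va6 and Va5 described above.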

（変形例４）
イヤホン９１Ｃは、入力混合部３ｃＣを、図８に示されるように、総音圧Ｖｔを音圧Ｖａに応じて変える入力混合部３ｃＤ（図４参照）に置き換えた変形例４のイヤホン９１Ｄ（図４参照）に変形してもよい。 (Variation 4)
The earphone 91C may be modified to an earphone 91D (see FIG. 4) of variant 4 in which the input mixing unit 3cC is replaced with an input mixing unit 3cD (see FIG. 4) that changes the total sound pressure Vt in response to the sound pressure Va, as shown in FIG. 8.

入力混合部３ｃＤは、例えば、音圧Ｖａにおける混合範囲Ｒにおいて、総音圧Ｖｔを音圧Ｖａが,大きくなるに従って大きくする。
詳しくは、入力混合部３ｃＤは、図８に示されるように、入力音声信号ＳＮ２の混合最大音圧Ｖ２ｍａｘと入力音声信号ＳＮ３の混合最大音圧Ｖ３ｍａｘとを、異なる値として混合動作を実行する。例えば、混合最大音圧Ｖ３ｍａｘを、混合最大音圧Ｖ２ｍａｘより大きいものとする。
これにより出力音声信号ＳＮｔは、混合下限音圧Ｖａ３における総音圧Ｖｔ１と、混合上限音圧Ｖａ４における総音圧Ｖｔ１よりも大きい総音圧Ｖｔ２との間で音圧が増減する。 For example, in a mixing range R for a sound pressure Va, the input mixing unit 3cD increases the total sound pressure Vt as the sound pressure Va increases.
Specifically, as shown in FIG. 8, the input mixing unit 3cD performs the mixing operation with the maximum mixed sound pressure V2max of the input audio signal SN2 and the maximum mixed sound pressure V3max of the input audio signal SN3 set to different values. For example, the maximum mixed sound pressure V3max is set to be greater than the maximum mixed sound pressure V2max.
As a result, the sound pressure of the output audio signal SNt varies between a total sound pressure Vt1 at the lower mixing limit sound pressure Va3 and a larger total sound pressure Vt2 at the upper mixing limit sound pressure Va4.

これにより、変形例２と同様に、音圧Ｖａが大きくなるほど、入力音声信号ＳＮ２に対する入力音声信号ＳＮ３の音圧の混合比率が大きくなる。そのため、出力音声信号ＳＮｔの総音圧Ｖｔにおける周囲音の音圧比率の上昇が抑制され、出力音声信号ＳＮｔを受信したＡＩアシスタント８１の、音声認識率が安定して維持される。 As a result, as in variant example 2, the higher the sound pressure Va, the higher the mixing ratio of the sound pressure of the input sound signal SN3 to the input sound signal SN2. This suppresses an increase in the sound pressure ratio of the ambient sound in the total sound pressure Vt of the output sound signal SNt, and the voice recognition rate of the AI assistant 81 that receives the output sound signal SNt is stably maintained.

イヤホン９１，９１Ａ～９１Ｃを商品として販売する場合は、単品での販売に限らず、２個以上を組として販売してもよい。イヤホン９１，９１Ａ～９１Ｃを左右両方の耳に対し装着可能な仕様とすれば、左右の耳用として２個一組で販売してもよい。
また、イヤホン９１，９１Ａ～９１Ｃは、大規模店舗の複数の従業員が装着する片耳用のマイク付きイヤホンとして、任意の複数個を組として販売してもよい。 When the earphones 91, 91A to 91C are sold as products, they may be sold not only as single items but also as a set of two or more. If the earphones 91, 91A to 91C are designed to be wearable on both the left and right ears, they may be sold as a set of two for the left and right ears.
Furthermore, the earphones 91, 91A to 91C may be sold as a set of any number of earphones with a microphone for one ear to be worn by multiple employees of a large store.

（実施例２）
本発明の実施の形態に係る音声入力システムを、実施例２のイヤホンシステム９１ＳＴにより主に図１，図９，及び図１０を参照して説明する。
図９は、イヤホンシステム９１ＳＴのブロック図であり、図１０はイヤホンシステム９１ＳＴの動作を示す表である。図９に示されるように、イヤホンシステム９１ＳＴは、第１の音声入力装置であるイヤホン９１Ｌと、第２の音声入力装置であるイヤホン９１Ｒの一対の組として構成される。イヤホン９１Ｌは話者Ｈの左耳に装着され、イヤホン９１Ｒは話者Ｈの右耳に装着される。 Example 2
A voice input system according to an embodiment of the present invention will be described below using an earphone system 91ST of Example 2, with reference mainly to FIGS. 1, 9, and 10.
Fig. 9 is a block diagram of the earphone system 91ST, and Fig. 10 is a table showing the operation of the earphone system 91ST. As shown in Fig. 9, the earphone system 91ST is configured as a pair of an earphone 91L which is a first voice input device and an earphone 91R which is a second voice input device. The earphone 91L is worn on the left ear of a speaker H, and the earphone 91R is worn on the right ear of the speaker H.

図１に示されるように、イヤホン９１Ｌは、本体部１Ｌ及び挿入部２を有し、イヤホン９１Ｒは、本体部１Ｒ及び挿入部２を有する。
イヤホン９１Ｌ及びイヤホン９１Ｒにおける、第１～第３マイクＭ１～Ｍ３と駆動部５とスピーカユニット６との構成及び配置位置は、実施例１のイヤホン９１と同じである。
以下、イヤホン９１と同じ部材には同じ符号を付し、異なる部材について、符号末尾にそれぞれＬ，Ｒを付して区別する。 As shown in FIG. 1, the earphone 91L has a main body 1L and an insertion portion 2, and the earphone 91R has a main body 1R and an insertion portion 2.
In the earphones 91L and 91R, the configurations and positions of the first to third microphones M1 to M3, the driver 5, and the speaker unit 6 are the same as those of the earphones 91 of the first embodiment.
Hereinafter, the same reference numerals will be used to designate the same components as those in the earphone 91, and different components will be distinguished by adding L and R to the end of the reference numerals, respectively.

図１及び図９に示されるように、イヤホン９１Ｌ及びイヤホン９１Ｒは、イヤホン９１の制御部３の替わりにそれぞれ制御部３Ｌ及び制御部３Ｒを有し、イヤホン９１の通信部４の替わりにそれぞれ通信部４Ｌ及び通信部４Ｒを有する。 As shown in FIG. 1 and FIG. 9, earphone 91L and earphone 91R have control units 3L and 3R, respectively, instead of control unit 3 of earphone 91, and communication units 4L and 4R, respectively, instead of communication unit 4 of earphone 91.

イヤホン９１Ｌにおいて、本体部１Ｌは、第１マイクＭ１及び第２マイクＭ２と、制御部３Ｌ及び通信部４Ｌと、駆動部５及びスピーカユニット６とを有する。挿入部２は、第３マイクＭ３を有する。
イヤホン９１Ｒにおいて、本体部１Ｒは、第１マイクＭ１及び第２マイクＭ２と、制御部３Ｌ及び通信部４Ｌと、駆動部５及びスピーカユニット６とを有する。挿入部２は、第３マイクＭ３を有する。 In the earphone 91L, the main body 1L has a first microphone M1, a second microphone M2, a control unit 3L, a communication unit 4L, a drive unit 5, and a speaker unit 6. The insertion portion 2 has a third microphone M3.
In the earphone 91R, the main body 1R has a first microphone M1, a second microphone M2, a control unit 3R, a communication unit 4R, a drive unit 5, and a speaker unit 6. The insertion portion 2 has a third microphone M3.

図１に示されるように、本体部１Ｌ及び本体部１Ｒは、スピーカユニット６の放音側に気室１ａを有する。
挿入部２は、先端に開口し気室１ａに連通した放音路２ａを有する。
イヤホン９１Ｌ，９１Ｒの使用状態において、駆動部５の動作によってスピーカユニット６から出力された音は、気室１ａ及び放音路２ａを通り、左右の耳それぞれの外耳道Ｅ１内に放出される。
これにより、イヤホン９１Ｌ，９１Ｒは、それぞれ通信部４Ｌ，４Ｒが外部の音声再生装置から無線送信された音声信号を受信し、制御部３Ｌ，３Ｒ及び駆動部５を介してスピーカユニット６で再生することができる。
イヤホン９１Ｌとイヤホン９１Ｒとは、通信部４Ｌと通信部４Ｒとの間で相互に通信可能である。 As shown in FIG. 1, the main body 1L and the main body 1R have an air chamber 1a on the sound output side of the speaker unit 6.
The insertion portion 2 has a sound emission path 2a that opens at the tip and communicates with the air chamber 1a.
When the earphones 91L and 91R are in use, sound output from the speaker unit 6 by the operation of the drive unit 5 passes through the air chamber 1a and the sound output path 2a and is output into the external ear canals E1 of each of the left and right ears.
As a result, the earphones 91L, 91R can receive audio signals wirelessly transmitted from an external audio playback device via the communication units 4L, 4R, respectively, and play the signals in the speaker unit 6 via the control units 3L, 3R and drive unit 5.
The earphones 91L and 91R are capable of communicating with each other via the communication units 4L and 4R.

本体部１Ｌ，１Ｒそれぞれに備えられた第１マイクＭ１は、本体部１Ｌ，１Ｒにおける話者Ｈの口から遠い側の部位に配置され、本体部１Ｌ，１Ｒの周囲の音を収音する。
本体部１Ｌ，１Ｒそれぞれに備えられた第２マイクＭ２は、本体部１Ｌ，１Ｒにおける話者Ｈの口に近い側の部位に配置されている。すなわち、イヤホン９１Ｌ，９１Ｒの使用状態において、第２マイクＭ２は、第１マイクＭ１よりも話者Ｈの口に近い位置にある。
第３マイクＭ３は、気導音マイクであって、挿入部２の放音路２ａに面した第３の位置に配置され、話者Ｈが発し骨導音として外耳道Ｅ１に達した音声が、外耳道Ｅ１及び放音路２ａの内部空間Ｅｖに反響して生じた気導音を収音する。
すなわち、第１マイクＭ１の第１の位置は、話者Ｈの外耳道外にある。第２マイクＭ２の第２の位置は、話者Ｈの外耳道外における、第１の位置よりも話者Ｈの口に近い位置にある。第３マイクＭ３は話者Ｈの外耳道内にある。 The first microphone M1 provided on each of the main bodies 1L and 1R is disposed at a portion of the main body 1L or 1R farther from the mouth of the speaker H, and picks up sounds around the main body 1L or 1R.
The second microphone M2 provided in each of the main bodies 1L and 1R is disposed at a portion of the main bodies 1L and 1R that is closer to the mouth of the speaker H. In other words, when the earphones 91L and 91R are in use, the second microphone M2 is located closer to the mouth of the speaker H than the first microphone M1.
The third microphone M3 is an air conduction sound microphone and is arranged at a third position facing the sound emission path 2a of the insertion portion 2. The third microphone M3 picks up air conduction sound generated when the sound emitted by the speaker H reaches the external auditory canal E1 as bone conduction sound and reverberates in the internal space Ev of the external auditory canal E1 and the sound emission path 2a.
That is, the first position of the first microphone M1 is outside the ear canal of the speaker H. The second position of the second microphone M2 is outside the ear canal of the speaker H and closer to the mouth of the speaker H than the first position. The third microphone M3 is inside the ear canal of the speaker H.

図９に示されるように、イヤホン９１Ｌの制御部３Ｌは、音圧検出部３ａＬ，入力選択部３ｂＬ，及び音圧差評価部３ｄを有する。
イヤホン９１Ｒの制御部３Ｒは、音圧検出部３ａＲ，入力選択部３ｂＲ，及び出力制御部３ｅを有する。 As shown in FIG. 9, a control unit 3L of an earphone 91L has a sound pressure detection unit 3aL, an input selection unit 3bL, and a sound pressure difference evaluation unit 3d.
The control unit 3R of the earphone 91R has a sound pressure detection unit 3aR, an input selection unit 3bR, and an output control unit 3e.

イヤホン９１Ｌにおいて、音圧検出部３ａＬは、第１マイクＭ１から入来した音声信号ＳＮ１Ｌの音圧を検出し、検出第１音声信号ＳＮＬとして、入力選択部３ｂＬと音圧差評価部３ｄとの両方に向け出力する。
イヤホン９１Ｒにおいて、音圧検出部３ａＲは、第１マイクＭ１から入来した音声信号ＳＮ１Ｒの音圧を検出し、検出第１音声信号ＳＮＲとして、入力選択部３ｂＲと出力制御部３ｅとの両方に向け出力する。 In the earphone 91L, the sound pressure detector 3aL detects the sound pressure of the audio signal SN1L received from the first microphone M1, and outputs it as a detected first audio signal SNL to both the input selector 3bL and the sound pressure difference evaluator 3d.
In the earphone 91R, the sound pressure detector 3aR detects the sound pressure of the audio signal SN1R received from the first microphone M1, and outputs it as a detected first audio signal SNR to both the input selector 3bR and the output controller 3e.

音声信号ＳＮ１Ｌ，ＳＮ１Ｒの音圧は、例えば、等価騒音レベル（ＬＡｅｑ）として検出する。音圧検出部３ａＬ，３ａＲによって、等価騒音レベル（ＬＡｅｑ）としてそれぞれが検出した検出第１音声信号ＳＮＬ，ＳＮＲの音圧を、以下、それぞれ音圧ＶＬ，ＶＲと称する。 The sound pressure of the audio signals SN1L and SN1R is detected, for example, as an equivalent noise level (LAeq). The sound pressures of the detected first audio signals SNL and SNR detected by the sound pressure detectors 3aL and 3aR as equivalent noise levels (LAeq) are hereinafter referred to as sound pressures VL and VR, respectively.
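
As a reference for the detection step, the equivalent continuous level of a block of samples can be computed as in the minimal sketch below. It assumes that the samples are sound pressures in pascals that have already been A-weighted by an upstream filter, and that they are equally spaced in time. The function name and the block-based processing are hypothetical and do not describe the actual implementation of the sound pressure detection units 3aL and 3aR.

```python
import math

P_REF = 20e-6  # standard reference sound pressure in pascals


def equivalent_level_db(pressures_pa):
    """Equivalent continuous level (dB) of equally spaced pressure samples.

    The samples are assumed to be A-weighted already, so the result
    corresponds to LAeq over the duration of the block.
    """
    samples = list(pressures_pa)
    if not samples:
        raise ValueError("at least one sample is required")
    mean_square = sum(p * p for p in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (P_REF * P_REF))
```

Applying such a function to the signals SN1L and SN1R over the same time window would yield values that play the role of the sound pressures VL and VR.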

本体部１Ｌに備えられた第１マイクＭ１は、本体部１Ｌの周囲音を収音し、本体部１Ｒに備えられた第１マイクＭ１は、本体部１Ｒの周囲音を収音する。
従って、音圧ＶＬは、左耳用のイヤホン９１Ｌの周囲音の音圧とみなすことができる。また、音圧ＶＲは、右耳用のイヤホン９１Ｒの周囲音の音圧とみなすことができる。 The first microphone M1 provided in the main body 1L picks up ambient sound around the main body 1L, and the first microphone M1 provided in the main body 1R picks up ambient sound around the main body 1R.
Therefore, the sound pressure VL can be regarded as the sound pressure of the ambient sound of the left earphone 91L, and the sound pressure VR can be regarded as the sound pressure of the ambient sound of the right earphone 91R.

イヤホン９１Ｒの出力制御部３ｅは、入来した検出第１音声信号ＳＮＲの音圧ＶＲを含む音圧情報ＪＲ１と、通信制御情報ＪＲ２（詳細は後述する）とを、通信部４Ｒに向け送出する。 The output control unit 3e of the earphone 91R sends sound pressure information JR1, which includes the sound pressure VR of the incoming detected first audio signal SNR, and communication control information JR2 (details of which will be described later), to the communication unit 4R.

イヤホン９１Ｌ，９１Ｒそれぞれの入力選択部３ｂＬ，３ｂＲの動作による入力音声処理方法は、実施例１の入力選択部３ｂと同様である。
具体的には、図３に示されるように、イヤホン９１Ｌの入力選択部３ｂＬは、音圧検出部３ａＬから出力された検出第１音声信号ＳＮＬの音圧ＶＬが、予め設定された少なくとも切換下音圧Ｖａ１未満となる場合に、出力音声信号ＳＮｔＬとして入力音声信号ＳＮ２Ｌを選択し設定する。また、音圧ＶＲが少なくとも切換上音圧Ｖａ２を超える場合に、出力音声信号ＳＮｔＬとして入力音声信号ＳＮ３Ｌを選択し設定する。
入力選択部３ｂＬは、上述のように設定した出力音声信号ＳＮｔＬを、通信部４Ｌに向け出力する。
このように、入力選択部３ｂＬは、出力音声信号ＳＮｔＬにおける、入力音声信号ＳＮ２Ｌと入力音声信号ＳＮ３Ｌそれぞれの反映度合ＲＦ１Ｌと反映度合ＲＦ２Ｌとを、検出第１音声信号ＳＮ１ａの音圧Ｖａに基づいて設定する。
この例において、入力選択部３ｂＬは、反映度合ＲＦ１Ｌ及び反映度合ＲＦ２Ｌを、一方を反映有、他方を反映無し、の二者択一として設定する。 The input sound processing method by the operation of the input selection units 3bL and 3bR of the earphones 91L and 91R is similar to that of the input selection unit 3b of the first embodiment.
Specifically, as shown in FIG. 3, the input selection unit 3bL of the earphone 91L selects and sets the input audio signal SN2L as the output audio signal SNtL when the sound pressure VL of the detected first audio signal SNL output from the sound pressure detection unit 3aL is below at least a preset lower switching sound pressure Va1, and selects and sets the input audio signal SN3L as the output audio signal SNtL when the sound pressure VL exceeds at least an upper switching sound pressure Va2.
The input selection unit 3bL outputs the output audio signal SNtL set as described above to the communication unit 4L.
In this manner, the input selection unit 3bL sets the reflection degrees RF1L and RF2L of the input audio signals SN2L and SN3L, respectively, in the output audio signal SNtL based on the sound pressure VL of the detected first audio signal SNL.
In this example, the input selection unit 3bL sets the reflection degrees RF1L and RF2L as a binary choice: one signal is reflected and the other is not.

また、イヤホン９１Ｒの入力選択部３ｂＲは、音圧検出部３ａＲから出力された検出第１音声信号ＳＮＲの音圧ＶＲが、予め設定された少なくとも切換下音圧Ｖａ１未満となる場合に、出力音声信号ＳＮｔＲとして入力音声信号ＳＮ２Ｒを選択し設定する。また、音圧ＶＲが少なくとも切換上音圧Ｖａ２を超える場合に、出力音声信号ＳＮｔＲとして入力音声信号ＳＮ３Ｒを選択し設定する。
入力選択部３ｂＲは、上述のように設定した出力音声信号ＳＮｔＲを、通信部４Ｒに向け出力する。
このように、入力選択部３ｂＲは、出力音声信号ＳＮｔＲにおける、入力音声信号ＳＮ２Ｒと入力音声信号ＳＮ３Ｒそれぞれの反映度合ＲＦ１Ｒと反映度合ＲＦ２Ｒとを、検出第１音声信号ＳＮ１ａの音圧Ｖａに基づいて設定する。
この例において、入力選択部３ｂＲは、反映度合ＲＦ１Ｒ及び反映度合ＲＦ２Ｒを、一方を反映有、他方を反映無し、の二者択一として設定する。 In addition, the input selection unit 3bR of the earphone 91R selects and sets the input audio signal SN2R as the output audio signal SNtR when the sound pressure VR of the detected first audio signal SNR output from the sound pressure detection unit 3aR is less than at least the preset lower switching sound pressure Va1, and selects and sets the input audio signal SN3R as the output audio signal SNtR when the sound pressure VR exceeds at least the upper switching sound pressure Va2.
The input selection unit 3bR outputs the output audio signal SNtR set as described above to the communication unit 4R.
In this manner, the input selection unit 3bR sets the reflection degrees RF1R and RF2R of the input audio signals SN2R and SN3R, respectively, in the output audio signal SNtR based on the sound pressure VR of the detected first audio signal SNR.
In this example, the input selection unit 3bR sets the reflection degrees RF1R and RF2R as a binary choice: one signal is reflected and the other is not.
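
The two-threshold selection performed by the input selection units 3bL and 3bR can be sketched as follows. The behavior between Va1 and Va2 is not described in this passage, so this sketch assumes that the previous selection is simply held there. The function name, the string labels, and the `previous` parameter are hypothetical.

```python
def select_input(level, va1, va2, previous="SN2"):
    """Choose which input signal becomes the output audio signal.

    level    : detected ambient sound pressure (VL or VR)
    va1      : lower switching sound pressure Va1
    va2      : upper switching sound pressure Va2
    previous : selection made at the last evaluation
    """
    if va2 < va1:
        raise ValueError("va2 must not be smaller than va1")
    if level < va1:
        return "SN2"  # quiet surroundings: use the second microphone M2
    if level > va2:
        return "SN3"  # loud surroundings: use the in-ear third microphone M3
    return previous   # between the thresholds: hold (assumption)
```

Holding the previous selection between the two thresholds avoids rapid toggling when the ambient level hovers near a single switching point, which is one plausible reason for using two thresholds rather than one.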

通信部４Ｒは、出力制御部３ｅから入来した音圧情報ＪＲ１を、イヤホン９１Ｒの外部へ無線送信する。無線送信方法は、例えば、Ｂｌｕｅｔｏｏｔｈ（登録商標）である。
ここで、通信部４Ｒの、入力選択部３ｂＲから出力された出力音声信号ＳＮｔＲの無線送信の有無は、出力制御部３ｅから入来した通信制御情報ＪＲ２によって制御される。
すなわち、通信制御情報ＪＲ２は、出力音声信号ＳＮｔＲの無線送信の許可と禁止のいずれかの指令を含み、この指令に基づいて、通信部４Ｒにおける出力音声信号ＳＮｔＲの無線送信が制御される。 The communication unit 4R wirelessly transmits the sound pressure information JR1 received from the output control unit 3e to the outside of the earphone 91R. The wireless transmission method is, for example, Bluetooth (registered trademark).
Here, whether or not the communication unit 4R wirelessly transmits the output audio signal SNtR output from the input selection unit 3bR is controlled by the communication control information JR2 received from the output control unit 3e.
That is, the communication control information JR2 includes a command to either permit or prohibit wireless transmission of the output audio signal SNtR, and based on this command, the wireless transmission of the output audio signal SNtR in the communication unit 4R is controlled.

イヤホン９１Ｌの通信部４Ｌは、イヤホン９１Ｒの通信部４Ｒから無線送信された音圧情報ＪＲ１を受信し、音圧差評価部３ｄに送出する。
音圧差評価部３ｄは、通信部４Ｌから送出された音圧情報ＪＲ１から音圧ＶＲを取得し、音圧ＶＲと、音圧検出部３ａＬから取得した検出第１音声信号ＳＮＬの音圧ＶＬとの大小を比較する。 The communication unit 4L of the earphone 91L receives the sound pressure information JR1 wirelessly transmitted from the communication unit 4R of the earphone 91R, and sends it to the sound pressure difference evaluation unit 3d.
The sound pressure difference evaluation unit 3d acquires the sound pressure VR from the sound pressure information JR1 sent from the communication unit 4L, and compares the sound pressure VR with the sound pressure VL of the detected first sound signal SNL acquired from the sound pressure detection unit 3aL.

音圧差評価部３ｄは、音圧ＶＬと音圧ＶＲとの大小関係に対応して予め設定された出力音声信号ＳＮｔＬ及び出力音声信号ＳＮｔＲの一方を、イヤホンシステム９１ＳＴとして外部に無線送信する出力音声信号ＳＮｓｔとして設定する。
次いで音圧差評価部３ｄは、出力音声信号ＳＮｔＬに設定された信号を特定する通信制御情報ＪＬ２を通信部４Ｌに出力し、通信部４Ｌは、通信制御情報ＪＬ２をイヤホン９１Ｒの通信部４Ｒに向け無線送信する。
通信部４Ｒは、通信制御情報ＪＬ２を受信すると共に、受信した通信制御情報ＪＬ２を出力制御部３ｅに送出する。 The sound pressure difference evaluation unit 3d sets one of the output sound signal SNtL and the output sound signal SNtR, which are preset in accordance with the magnitude relationship between the sound pressures VL and VR, as an output sound signal SNst to be wirelessly transmitted to the outside as the earphone system 91ST.
Next, the sound pressure difference evaluation unit 3d outputs communication control information JL2, which identifies the signal set as the output audio signal SNst, to the communication unit 4L, and the communication unit 4L wirelessly transmits the communication control information JL2 to the communication unit 4R of the earphone 91R.
The communication unit 4R receives the communication control information JL2 and sends the received communication control information JL2 to the output control unit 3e.

音圧差評価部３ｄの動作について、図１０を参照して詳述する。
図１０は、音圧ＶＬと音圧ＶＲとの大小と、イヤホンシステム９１ＳＴとして外部に無線送信する出力音声信号ＳＮｓｔとの関係を示した表である。
図１０に示されるように、音圧差評価部３ｄは、音圧ＶＬよりも音圧ＶＲが大きいと判定した場合、イヤホンシステム９１ＳＴとして無線送信する出力音声信号ＳＮｓｔに、出力音声信号ＳＮｔＬを設定する。 The operation of the sound pressure difference evaluation unit 3d will be described in detail with reference to FIG. 10.
FIG. 10 is a table showing the relationship between the relative magnitudes of the sound pressures VL and VR and the output audio signal SNst that the earphone system 91ST wirelessly transmits to the outside.
As shown in FIG. 10, when the sound pressure difference evaluation unit 3d determines that the sound pressure VR is greater than the sound pressure VL, it sets the output audio signal SNtL as the output audio signal SNst to be wirelessly transmitted by the earphone system 91ST.

また、音圧差評価部３ｄは、通信制御情報ＪＬ２に、通信部４Ｌに対し出力音声信号ＳＮｔＬの無線送信を実行する指示を含め通信部４Ｌに向け送出する。
音圧差評価部３ｄは、音圧ＶＬよりも音圧ＶＲが小さいと判定した場合、通信制御情報ＪＬ２に、通信部４Ｌに対し出力音声信号ＳＮｔＬの無線送信を停止する指示を含め通信部４Ｌに向け送出する。 Moreover, the sound pressure difference evaluation unit 3d sends the communication control information JL2 to the communication unit 4L, including an instruction to the communication unit 4L to wirelessly transmit the output sound signal SNtL.
When the sound pressure difference evaluation unit 3d determines that the sound pressure VR is smaller than the sound pressure VL, it sends the communication control information JL2 to the communication unit 4L including an instruction to the communication unit 4L to stop wireless transmission of the output sound signal SNtL.

通信部４Ｌは、通信制御情報ＪＬ２を通信部４Ｒに向け送信すると共に、通信制御情報ＪＬ２に含まれる通信部４Ｌに対する指示に基づいて、出力音声信号ＳＮｔＬの無線送信を実行又は停止する。
一方、通信部４Ｒは、通信部４Ｌから送信された通信制御情報ＪＬ２を受信して出力制御部３ｅへ送出する。 The communication unit 4L transmits the communication control information JL2 to the communication unit 4R, and executes or stops wireless transmission of the output audio signal SNtL based on an instruction for the communication unit 4L contained in the communication control information JL2.
Meanwhile, the communication unit 4R receives the communication control information JL2 transmitted from the communication unit 4L and sends it to the output control unit 3e.

出力制御部３ｅは、通信制御情報ＪＬ２に基づき、通信部４Ｌに出力音声信号ＳＮｔＬを無線送信させる場合、通信制御情報ＪＲ２を通信部４Ｒにおける出力音声信号ＳＮｔＲの無線送信を停止する指示を含め生成する。
また、出力制御部３ｅは、通信部４Ｌに出力音声信号ＳＮｔＬを無線送信させない場合、通信制御情報ＪＲ２を通信部４Ｒにおける出力音声信号ＳＮｔＲの無線送信を実行させる指示を含め生成する。
出力制御部３ｅは、生成した通信制御情報ＪＲ２を通信部４Ｒに向け送出する。
通信部４Ｒは、出力制御部３ｅから送出された通信制御情報ＪＲ２に基づいて、出力音声信号ＳＮｔＲの無線送信を実行又は停止する。 When the output control unit 3e causes the communication unit 4L to wirelessly transmit the output audio signal SNtL based on the communication control information JL2, the output control unit 3e generates the communication control information JR2 including an instruction to stop wireless transmission of the output audio signal SNtR in the communication unit 4R.
Conversely, when the communication unit 4L is not to wirelessly transmit the output audio signal SNtL, the output control unit 3e generates the communication control information JR2 including an instruction to execute wireless transmission of the output audio signal SNtR in the communication unit 4R.
The output control unit 3e sends the generated communication control information JR2 to the communication unit 4R.
The communication unit 4R executes or stops wireless transmission of the output audio signal SNtR based on the communication control information JR2 sent from the output control unit 3e.

上述の音声入力装置及び入力音声処理方法から明らかなように、イヤホンシステム９１ＳＴは、二つのイヤホン９１Ｌ，９１Ｒの内、周囲音の小さい方の出力音声信号を択一選択して外部に無線送信する。
これにより、イヤホンシステム９１ＳＴは、話者Ｈの発した音声のＡＩアシスタント８１における認識率が向上する。 As is apparent from the voice input device and the input voice processing method described above, the earphone system 91ST selects, of the two earphones 91L and 91R, the output audio signal of the one with the lower ambient sound and wirelessly transmits it to the outside.
As a result, the earphone system 91ST improves the recognition rate of the AI assistant 81 for the voice uttered by the speaker H.
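
The left/right arbitration described above can be summarized in a short sketch. The two boolean results stand in for the transmit/stop instructions carried by the communication control information JL2 (for the communication unit 4L) and JR2 (for the communication unit 4R). The function name and return shape are hypothetical, and the handling of VL equal to VR is an assumption, since that case is not described in this passage.

```python
def choose_transmitter(vl, vr):
    """Return (snst, left_transmits, right_transmits).

    vl, vr : ambient sound pressures VL and VR detected by 91L and 91R
    snst   : label of the signal sent out as SNst
    """
    if vr > vl:
        # Left side is quieter: 4L transmits SNtL, 4R is stopped.
        return "SNtL", True, False
    if vr < vl:
        # Right side is quieter: 4L is stopped, 4R transmits SNtR.
        return "SNtR", False, True
    # Equal levels: defaulting to the left earphone is an assumption.
    return "SNtL", True, False
```

Exactly one of the two flags is true in every case, which reflects that only one of the communication units 4L and 4R transmits its output audio signal at a time.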

以上詳述した実施例２は、上述の構成及び手順に限定されるものではなく、本発明の要旨を逸脱しない範囲において変形した変形例としてもよい。 The second embodiment described above is not limited to the above-mentioned configuration and procedure, and may be modified without departing from the spirit and scope of the present invention.

イヤホン９１Ｌ及びイヤホン９１Ｒは、実施例１における変形例１で説明したように、入力選択部３ｂＬ及び入力選択部３ｂＲの替わりに、入力混合部３ｃと同じ動作を実行する入力混合部３ｃＬ及び入力混合部３ｃＲ（図９参照）を有していてもよい。
例えば、入力混合部３ｃＬにおいて、入力音声信号ＳＮ２Ｌと入力音声信号ＳＮ３とが、音圧Ｖａに応じたそれぞれ反映度合ＦＲ２Ｌ，ＦＲ３Ｌの音圧比率で混合され、混合される音圧の比率が、本体部１の周囲音の音圧の増減に応じて線形で徐々に変化する。
反映度合ＲＦ１Ｌ及び反映度合ＲＦ２Ｌは、出力音声信号ＳＮｔＬに対する入力音声信号ＳＮ２Ｌ及び入力音声信号ＳＮ３Ｌの反映の程度を、それぞれ示す指標である。指標は、例えば音圧の大きさとする。従って、音圧比率は、出力音声信号ＳＮｔＬに含まれる入力音声信号ＳＮ２Ｌと入力音声信号ＳＮ３Ｌとの音圧の比率である。
例えば、図５に示されるように、出力音声信号ＳＮｔＬにおける反映度合ＦＲ２Ｌは、音圧Ｖａが音圧Ｖａ３のときＶｍａｘ／Ｖｍｉｎ、音圧ＶａｘのときＶ２ｘ／Ｖ３ｘ、音圧Ｖａ４のときＶｍｉｎ／Ｖｍａｘで示される。
また、出力音声信号ＳＮｔＬにおける反映度合ＦＲ３は、音圧Ｖａが音圧Ｖａ３のときＶｍｉｎ／Ｖｍａｘ、音圧ＶａｘでＶ３ｘ／Ｖ２ｘ、音圧Ｖａ４でＶｍａｘ／Ｖｍｉｎで示される。
イヤホン９１Ｌ及びイヤホン９１Ｒが、入力選択部３ｂＬ及び入力選択部３ｂＲの替わりにそれぞれ入力混合部３ｃＬ及び入力混合部３ｃＲを有すると、周囲音の増減に応じた出力音声信号ＳＮｓｔの音質変化が緩やかでなめらかになる。
これにより、本体部１Ｌ又は本体部１Ｒの周囲音の音圧によらず、話者Ｈが発した音声のＡＩアシスタント８１による認識率が高く維持される。
また、イヤホン９１Ｌ及びイヤホン９１Ｒが、それぞれ入力混合部３ｃＬ，３ｃＲを有することで、周囲音の増減によらず出力音声信号ＳＮｓｔの総音圧が一定で急変しないので、話者Ｈが発した音声のＡＩアシスタント８１による認識率がより高く維持される。 As described in Modification 1 of the first embodiment, the earphones 91L and 91R may have, instead of the input selection units 3bL and 3bR, input mixing units 3cL and 3cR (see FIG. 9 ) that perform the same operation as the input mixing unit 3c.
For example, in the input mixing unit 3cL, the input audio signal SN2L and the input audio signal SN3L are mixed at a sound pressure ratio given by the reflection degrees FR2L and FR3L, respectively, which correspond to the sound pressure Va, and the ratio of the mixed sound pressures changes gradually and linearly as the sound pressure of the ambient sound around the main body 1L increases or decreases.
The reflection degree RF1L and the reflection degree RF2L are indexes indicating the degree of reflection of the input audio signal SN2L and the input audio signal SN3L to the output audio signal SNtL, respectively. The indexes are, for example, the magnitude of sound pressure. Therefore, the sound pressure ratio is the ratio of the sound pressure of the input audio signal SN2L and the input audio signal SN3L contained in the output audio signal SNtL.
For example, as shown in FIG. 5, the reflection degree FR2L in the output audio signal SNtL is expressed as Vmax/Vmin when the sound pressure Va equals Va3, as V2x/V3x when it equals Vax, and as Vmin/Vmax when it equals Va4.
Likewise, the reflection degree FR3L in the output audio signal SNtL is expressed as Vmin/Vmax when the sound pressure Va equals Va3, as V3x/V2x when it equals Vax, and as Vmax/Vmin when it equals Va4.
When the earphones 91L and 91R have the input mixing units 3cL and 3cR instead of the input selection units 3bL and 3bR, respectively, the change in sound quality of the output audio signal SNst in response to an increase or decrease in the ambient sound becomes gradual and smooth.
This maintains a high recognition rate by the AI assistant 81 of the voice uttered by the speaker H, regardless of the sound pressure of the surrounding sound of the main body unit 1L or the main body unit 1R.
In addition, since the earphone 91L and the earphone 91R have the input mixing units 3cL and 3cR, respectively, the total sound pressure of the output audio signal SNst remains constant and does not change suddenly regardless of an increase or decrease in the ambient sound, so that the recognition rate by the AI assistant 81 of the voice uttered by the speaker H is maintained at a higher level.

When the earphones 91L and 91R have the input mixing units 3cL and 3cR, respectively, Modifications 2 to 4 of the first embodiment can be applied to the earphone system 91ST to the extent possible.

The communication method used by the communication units 4, 4L, and 4R is not limited to Bluetooth (registered trademark) described above, and various methods can be applied. Furthermore, the communication units 4, 4L, and 4R are not limited to communicating with the outside wirelessly and may communicate by wire.

In the earphones 91, 91A to 91D, 91L, and 91R, which are the voice input devices of the first and second embodiments, the third microphone M3 is not limited to the air conduction microphone described above and may be a bone conduction microphone that picks up bone-conducted sound.
FIG. 11 is a diagram illustrating an example of where the third microphone M3 is placed when it is a bone conduction microphone. As shown in FIG. 11, the third microphone M3 is a bone conduction microphone located at a third position where it is in close contact with the inner surface of the ear canal E1 when the insertion portion 2 is inserted into the ear canal E1, and it picks up the bone-conducted sound of the voice uttered by the speaker H.

In the voice input system 91ST of the second embodiment, the manner of using the earphone 91L as the first voice input device and the earphone 91R as the second voice input device is not limited to wearing them on one ear and the other ear of the speaker H. For example, the earphone 91L may be worn on an ear of a first speaker, and the earphone 91R may be worn on an ear of a second speaker different from the first speaker.

1 Main body unit
1a Air chamber
2 Insertion portion
2a Sound emission path
3, 3L, 3R Control unit
3a, 3aL, 3aR Sound pressure detection unit
3b, 3bL, 3bR Input selection unit
3c, 3cB, 3cC, 3cD Input mixing unit
3d Sound pressure difference evaluation unit
3e Output control unit
4, 4L, 4R Communication unit
5 Drive unit
6 Speaker unit
81 AI assistant
91, 91A, 91B, 91C, 91D, 91L, 91R Earphone (voice input device)
91ST Earphone system (voice input system)
E Auricle
E1 Ear canal
Ev Internal space
FR2, FR3 Reflection degree
H Speaker
JR1 Sound pressure information
JL2, JR2 Communication control information
M1 First microphone
M2 Second microphone
M3 Third microphone
LN2b, LN3b, LN2a, LN3a Characteristic line
R Mixing range
SN1 to SN3, SN1L to SN3L, SN1R to SN3R Input audio signal
SN1a, SNL, SNR Detected first audio signal
SNt, SNtL, SNtR, SNst Output audio signal
V, V2x, V3x Mixed sound pressure
Va, Vax, Va5, Va6 Sound pressure
Va1 Switching lower sound pressure
Va2 Switching upper sound pressure
Va3 Mixing lower-limit sound pressure
Va4 Mixing upper-limit sound pressure
Vmax Mixing maximum sound pressure
Vmin Mixing minimum sound pressure
VL, VR Sound pressure
Vt, Vt1, Vt2, Vtc Total sound pressure
V2max, V3max Mixing maximum sound pressure

Claims

1. A voice input system comprising voice input devices, each of which includes:
a first microphone that picks up a voice at a first position outside the ear canal of a speaker and outputs a first input audio signal;
a second microphone that picks up a voice at a second position outside the ear canal of the speaker that is closer to the mouth of the speaker than the first position and outputs a second input audio signal;
a third microphone that picks up a voice in the ear canal of the speaker and outputs a third input audio signal;
a control unit that detects a sound pressure of the first input audio signal, sets a reflection degree of each of the second input audio signal and the third input audio signal in accordance with the detected sound pressure, and generates an output audio signal including at least one of the second input audio signal and the third input audio signal based on the reflection degree; and
a communication unit that transmits the output audio signal to an external device,
wherein the voice input devices include a first voice input device and a second voice input device that can communicate with each other, and
the control unit of the first voice input device determines whether the sound pressure of the first input audio signal in the first voice input device is larger than the sound pressure of the first input audio signal in the second voice input device, and sets the output audio signal to be transmitted from the communication unit to the outside based on the determination result.

2. The voice input system according to claim 1, wherein, in the first voice input device and the second voice input device,
the control unit sets the reflection degree to one of two values, "reflected" or "not reflected", and
selectively sets one of the second input audio signal and the third input audio signal as the output audio signal in accordance with the detected sound pressure.

3. The voice input system according to claim 2, wherein, in the first voice input device and the second voice input device,
the control unit
sets the second input audio signal as the output audio signal when the detected sound pressure of the first input audio signal is smaller than a preset first sound pressure, and
sets the third input audio signal as the output audio signal when the sound pressure of the first input audio signal exceeds a second sound pressure that is greater than the first sound pressure.

4. The voice input system according to claim 1, wherein, in the first voice input device and the second voice input device,
the control unit
uses a sound pressure ratio as the reflection degree, and
mixes the second input audio signal and the third input audio signal at the sound pressure ratio corresponding to the detected sound pressure and sets the result as the output audio signal.

5. The voice input system according to claim 4, wherein the control unit of the first voice input device
mixes the second input audio signal and the third input audio signal such that:
when the detected sound pressure of the first input audio signal is smaller than a preset first sound pressure, the sound pressure of the second input audio signal is greater than the sound pressure of the third input audio signal;
when the detected sound pressure of the first input audio signal exceeds a second sound pressure that is greater than the first sound pressure, the sound pressure of the third input audio signal is greater than the sound pressure of the second input audio signal; and
when the detected sound pressure of the first input audio signal is in a range equal to or greater than the first sound pressure and equal to or less than the second sound pressure, the ratio of the sound pressure of the third input audio signal to the sound pressure of the second input audio signal increases as the detected sound pressure increases.

6. An input voice processing method comprising:
acquiring a first input audio signal based on a sound picked up at a first position outside the ear canal of a speaker, and detecting a sound pressure of the first input audio signal;
acquiring, as a second input audio signal, a sound picked up at a second position outside the ear canal of the speaker that is closer to the mouth of the speaker than the first position; and
acquiring a third input audio signal based on a sound picked up in the ear canal of the speaker,
wherein a first voice input device and a second voice input device, which form a pair of voice input devices, each selectively set one of the second input audio signal and the third input audio signal as an output audio signal in accordance with the sound pressure of the first input audio signal and transmit the output audio signal to the outside,
the first voice input device and the second voice input device are worn on the left ear and the right ear, respectively, and
the method determines whether the sound pressure of the first input audio signal in the first voice input device is larger than the sound pressure of the first input audio signal in the second voice input device, and sets the output audio signal to be transmitted from a communication unit to the outside based on the determination result.

7. An input voice processing method comprising:
acquiring a first input audio signal based on a sound picked up at a first position outside the ear canal of a speaker, and detecting a sound pressure of the first input audio signal;
acquiring, as a second input audio signal, a sound picked up at a second position outside the ear canal of the speaker that is closer to the mouth of the speaker than the first position; and
acquiring a third input audio signal based on a sound picked up in the ear canal of the speaker,
wherein a first voice input device and a second voice input device, which form a pair of voice input devices, each mix the second input audio signal and the third input audio signal at a sound pressure ratio corresponding to the sound pressure of the first input audio signal, set the result as an output audio signal, and transmit the output audio signal to the outside,
the first voice input device and the second voice input device are worn on the left ear and the right ear, respectively, and
the method determines whether the sound pressure of the first input audio signal in the first voice input device is larger than the sound pressure of the first input audio signal in the second voice input device, and sets the output audio signal to be transmitted from a communication unit to the outside based on the determination result.