JP2017123554A - Speech apparatus and audio signal correction program

Info

Publication number: JP2017123554A
Application number: JP2016001367A
Authority: JP
Inventors: Kaori Endo (遠藤 香緒里)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-01-06
Filing date: 2016-01-06
Publication date: 2017-07-13

Abstract

PROBLEM TO BE SOLVED: To prevent the echo that arises when an erroneous transfer characteristic is learned during a call.
SOLUTION: The speech apparatus (call device) includes a receiver, a first microphone, a second microphone, an echo suppression unit, and a transfer characteristic learning unit. The receiver outputs an audio signal as air conduction sound. The first microphone picks up air conduction sound, and the second microphone picks up bone conduction sound. The echo suppression unit suppresses the echo component contained in the air conduction sound signal input from the first microphone, based on the transfer characteristic of sound from the receiver to the first microphone. When the input level of the bone conduction sound signal input from the second microphone is equal to or less than a predetermined threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal to be output from the receiver and the air conduction sound signal.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a call device and an audio signal correction program.

A call device such as a mobile phone terminal includes a receiver that outputs (radiates) an audio signal as air conduction sound, and a microphone that picks up air conduction sound propagating around the call device. While a call connection with another call device is established, the call device outputs from the receiver the audio signal received from the other call device (the received signal) and transmits to the other call device the audio signal of the air conduction sound picked up by the microphone (the transmission signal). The transmission signal of a call device may therefore contain sound output from that device's own receiver. When an audio signal containing sound output from the receiver is transmitted from one call device to the other, echo occurs and call quality deteriorates.

As a technique for preventing the deterioration of call quality caused by echo, it is known to suppress the echo component contained in the audio signal input from the microphone based on the transfer characteristic of air conduction sound from the receiver to the microphone. The echo component is the component that causes echo. To suppress it, a pseudo echo signal is generated from the received signal and a transfer coefficient representing the transfer characteristic of sound from the receiver to the microphone, and a residual signal is generated from the pseudo echo signal and the transmission signal (see, for example, Patent Document 1).

It is also known, when suppressing the echo component based on the transfer characteristic, to learn the transfer characteristic from the received signal to be output from the receiver and the transmission signal input from the microphone, so that the echo component contained in the transmission signal is suppressed appropriately.

Furthermore, a technique is known that does not learn the transfer characteristic when the transmission signal input from the microphone contains the voice of the call device's user, thereby preventing the learning of an erroneous transfer characteristic and the resulting drop in echo suppression performance (see, for example, Patent Document 2).

Patent Document 1: JP 2013-081163 A
Patent Document 2: Japanese Patent Laid-Open No. 04-342317

In a call device that picks up air conduction sound with a microphone, the user's voice contained in the audio signal input from the microphone may go undetected even though the user is speaking, for example because the voice (air conduction sound) uttered by the user is quiet. Consequently, even a call device designed not to learn the transfer characteristic when the audio signal input from the microphone contains the user's voice may learn an erroneous transfer characteristic from an audio signal that contains the user's voice, and its echo suppression performance may degrade.

In one aspect, an object of the present invention is to prevent the echo and the deterioration of transmitted sound quality caused by learning an erroneous transfer characteristic during a call.

A call device according to one aspect includes a receiver, a first microphone, a second microphone, an echo suppression unit, and a transfer characteristic learning unit. The receiver outputs an audio signal as air conduction sound. The first microphone picks up air conduction sound, and the second microphone picks up bone conduction sound. The echo suppression unit suppresses the echo component contained in the air conduction sound signal input from the first microphone, based on the transfer characteristic of sound from the receiver to the first microphone. The transfer characteristic learning unit learns the transfer characteristic based on the received signal to be output from the receiver and the air conduction sound signal when the input level of the bone conduction sound signal input from the second microphone is equal to or less than a predetermined threshold.

According to the above aspect, the echo and the deterioration of transmitted sound quality caused by learning an erroneous transfer characteristic during a call can be prevented.

A diagram showing the configuration of the call device according to the first embodiment.
A diagram showing the functional configuration of the transmission signal processing unit in the call device according to the first embodiment.
A flowchart explaining the echo suppression processing according to the first embodiment.
A flowchart explaining the user voice detection processing.
A flowchart explaining the processing for learning the transfer characteristic.
A flowchart explaining the processing for suppressing the echo component.
A diagram showing the functional configuration of the transmission signal processing unit in the call device according to the second embodiment.
A flowchart (part 1) explaining the echo suppression processing according to the second embodiment.
A flowchart (part 2) explaining the echo suppression processing according to the second embodiment.
A flowchart explaining the processing for correcting the bone conduction sound signal.
A flowchart explaining an example of a method for calculating the bone conduction sound correction characteristic.
A diagram showing the functional configuration of the transmission signal processing unit in the call device according to the third embodiment.
A flowchart (part 1) explaining the echo suppression processing according to the third embodiment.
A flowchart (part 2) explaining the echo suppression processing according to the third embodiment.
A graph explaining the method for calculating the reliability of the bone conduction sound signal.
A graph explaining the method for calculating the update coefficient of the transfer characteristic.
A diagram showing the functional configuration of the main part of the call device according to the fourth embodiment.
A graph explaining the method for calculating the reliability of the bone conduction sound signal in the fourth embodiment.
A diagram showing the hardware configuration of a computer.

[First Embodiment]
FIG. 1 is a diagram illustrating the configuration of the call device according to the first embodiment.

The call device of the present embodiment is a mobile communication device capable of voice calls, such as a mobile phone terminal. As shown in FIG. 1, the call device 1 of the present embodiment includes an RF transmission/reception unit 2, an antenna 3, a baseband processing unit 4, an audio signal processing unit 5, a receiver 8, a first microphone 9, and a second microphone 10. The call device 1 also includes a D/A converter 11, A/D converters 12A and 12B, and amplifiers 13A, 13B, and 13C. In addition, the call device 1 includes an input operation unit, a display unit, and the like (not shown).

The RF transmission/reception unit 2 demodulates signals received by the antenna 3 and modulates signals to be transmitted to other call devices.

The baseband processing unit 4 performs baseband processing on the signal demodulated by the RF transmission/reception unit 2 and on the signal to be modulated by the RF transmission/reception unit 2. The baseband processing unit 4 also includes an A/D converter that converts the analog signal demodulated by the RF transmission/reception unit 2 into a digital signal, and a D/A converter that converts the digital signal to be modulated by the RF transmission/reception unit 2 into an analog signal.

The audio signal processing unit 5 performs predetermined processing on audio signals. The audio signal processing unit 5 includes a reception signal processing unit 6 that processes audio signals received from other call devices, and a transmission signal processing unit 7 that processes audio signals to be transmitted to other call devices.

The receiver 8 outputs (radiates) the audio signal processed by the reception signal processing unit 6 to the outside of the call device 1 as air conduction sound. The audio signal processed by the reception signal processing unit 6 is converted from a digital signal to an analog signal by the D/A converter 11, amplified by the amplifier 13A, and then output from the receiver 8. Although this specification refers to the receiver 8, the component is not limited to a receiver; any component capable of outputting (radiating) an audio signal to the outside of the call device 1 as air conduction sound, such as a speaker, may be used.

The first microphone 9 picks up sound that propagates through the air and arrives at the call device 1 (air conduction sound). The audio signal of the air conduction sound picked up by the first microphone 9 is amplified by the amplifier 13B, converted from an analog signal to a digital signal by the A/D converter 12A, and then input to the transmission signal processing unit 7. Hereinafter, the audio signal input from the first microphone 9 to the transmission signal processing unit 7 is also referred to as the air conduction sound signal.

The second microphone 10 picks up sound transmitted to the call device 1 through a solid body in contact with the call device 1 (for example, bone conduction sound). The audio signal of the bone conduction sound picked up by the second microphone 10 is amplified by the amplifier 13C, converted from an analog signal to a digital signal by the A/D converter 12B, and then input to the transmission signal processing unit 7. Hereinafter, the audio signal input from the second microphone 10 to the transmission signal processing unit 7 is also referred to as the bone conduction sound signal.

When a call connection with another call device is established, the call device 1 of the present embodiment outputs the audio signal received from the other call device from the receiver 8 and transmits the air conduction sound signal input from the first microphone 9 to the other call device. The transmission of the air conduction sound signal input from the first microphone 9 to the other call device is handled by the transmission signal processing unit 7. As one of its operations on the air conduction sound signal, the transmission signal processing unit 7 suppresses the echo component contained in that signal. The echo component contained in the air conduction sound signal is the sound component that causes echo when listening to the sound output from the receiver of the call device 1 or of another call device. As shown in FIG. 1, echo arises when a part 14 of the sound output from the receiver 8 is picked up by the first microphone 9.

FIG. 2 is a diagram illustrating the functional configuration of the transmission signal processing unit in the call device according to the first embodiment.

As shown in FIG. 2, the transmission signal processing unit 7 includes a user voice detection unit 701, a transfer characteristic learning unit 702, an echo suppression unit 703, and a storage unit 711.

The user voice detection unit 701 detects the voice uttered by the user of the call device 1 that is contained in the bone conduction sound signal input from the second microphone 10. The user of the call device 1 is the person who conducts the call while listening to the sound (air conduction sound) output from the receiver 8 of the call device 1. Hereinafter, the voice uttered by the user is also referred to as the user voice. The user voice detection unit 701 according to the present embodiment determines that the user voice has been detected when the user voice is detected in the bone conduction sound signal.

The transfer characteristic learning unit 702 learns the transfer characteristic of sound propagating from the receiver 8 to the first microphone 9. The transfer characteristic learning unit 702 according to the present embodiment learns the transfer characteristic using the received signal and the air conduction sound signal input from the first microphone 9. In the transmission signal processing unit 7 according to the present embodiment, the transfer characteristic learning unit 702 learns the transfer characteristic only when the user voice detection unit 701 has not detected the user voice.

The echo suppression unit 703 suppresses the echo component contained in the air conduction sound signal based on the transfer characteristic. The echo suppression unit 703 applies the transfer characteristic to the received signal to estimate the echo component contained in the air conduction sound signal, and removes the estimated echo component from the air conduction sound signal. When the user voice detection unit 701 has not detected the user voice, the echo suppression unit 703 suppresses the echo component of the air conduction sound signal based on the transfer characteristic learned by the transfer characteristic learning unit 702. When the user voice detection unit 701 has detected the user voice, the echo suppression unit 703 suppresses the echo component of the air conduction sound signal based on the transfer characteristic stored in the storage unit 711.
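
The estimate-then-remove behaviour of the echo suppression unit 703 can be illustrated with a minimal Python sketch. It assumes that suppression operates per frequency band on amplitude spectra, that the estimated echo is the product of the transfer characteristic and the received-signal amplitude, and that the result is floored at zero. The text above does not state these details, and the function name `suppress_echo` is hypothetical.

```python
def suppress_echo(air_amp, ref_amp, transfer):
    """Return the air conduction amplitude spectrum with the estimated echo removed.

    air_amp:  amplitude spectrum of the air conduction sound signal, one value per band
    ref_amp:  amplitude spectrum of the received signal, one value per band
    transfer: transfer characteristic (receiver to first microphone), one value per band
    """
    suppressed = []
    for a, r, h in zip(air_amp, ref_amp, transfer):
        estimated_echo = h * r  # apply the transfer characteristic to the received signal
        # Assumption: plain subtraction, floored at zero so amplitudes stay non-negative.
        suppressed.append(max(a - estimated_echo, 0.0))
    return suppressed
```

For example, with `air_amp = [1.0, 0.5]`, `ref_amp = [2.0, 2.0]`, and `transfer = [0.25, 0.5]`, the estimated echo is `[0.5, 1.0]`, so the first band keeps `0.5` and the second band is floored at `0.0`.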

The storage unit 711 stores the initial value of the transfer characteristic and the transfer characteristic learned by the transfer characteristic learning unit 702.

When a call connection between the call device 1 of the present embodiment and another call device is established, the transmission signal processing unit 7 of the audio signal processing unit 5 performs the echo suppression processing shown in FIG. 3 based on the air conduction sound signal, the bone conduction sound signal, and the received signal, which are input sequentially.

FIG. 3 is a flowchart explaining the echo suppression processing according to the first embodiment.
As shown in FIG. 3, the transmission signal processing unit 7 first initializes to 1 a variable t that identifies a frame, the processing unit of the audio signal (step S1).

Next, the transmission signal processing unit 7 inputs the air conduction sound signal, the bone conduction sound signal, and the received signal of frame t (step S2). In step S2, the transmission signal processing unit 7 inputs, for example, the bone conduction sound signal and the received signal to the user voice detection unit 701, and inputs the air conduction sound signal to the transfer characteristic learning unit 702 and the echo suppression unit 703.

Next, the transmission signal processing unit 7 performs user voice detection processing using the bone conduction sound signal (step S3). The processing of step S3 is performed by the user voice detection unit 701, which detects voice in the bone conduction sound signal of frame t (the frame being processed) input in step S2. Based on the result of that detection, the user voice detection unit 701 then determines whether the user voice has been detected (step S4). When the user voice is detected in the bone conduction sound signal, the user voice detection unit 701 determines that the user voice has been detected (step S4; Yes). In that case, the user voice detection unit 701 causes the echo suppression unit 703 to suppress the echo component of the air conduction sound signal based on the transfer characteristic (step S6).

When the user voice is not detected (step S4; No), the user voice detection unit 701 causes the transfer characteristic learning unit 702 to learn the transfer characteristic using the received signal and the air conduction sound signal (step S5). In step S5, the transfer characteristic learning unit 702 acquires the air conduction sound signal and the received signal of frame t and, based on the current transfer characteristic and on the air conduction sound signal and received signal of frame t, learns (calculates) the transfer characteristic to be applied to the air conduction sound signal of frame t. Here, the current transfer characteristic is either the initial value of the transfer characteristic or the transfer characteristic used in the echo suppression processing for the air conduction sound signal one frame earlier (frame t-1). The transfer characteristic learning unit 702 reads the current transfer characteristic from the storage unit 711. After learning the transfer characteristic in step S5, the transfer characteristic learning unit 702 sends the learned transfer characteristic to the echo suppression unit 703 and stores it in the storage unit 711.

When the transfer characteristic learning unit 702 has finished step S5, the echo suppression unit 703 suppresses the echo component of the air conduction sound signal based on the transfer characteristic (step S6). If the transfer characteristic learning unit 702 learned the transfer characteristic while processing the air conduction sound signal of frame t, the echo suppression unit 703 suppresses the echo component of the air conduction sound signal of frame t based on the newly learned transfer characteristic. If the transfer characteristic learning unit 702 did not learn the transfer characteristic, the echo suppression unit 703 acquires the air conduction sound signal of frame t and suppresses its echo component based on the initial value of the transfer characteristic or on the transfer characteristic used in the echo suppression processing for the air conduction sound signal one frame earlier. In this case, the echo suppression unit 703 reads the transfer characteristic from the storage unit 711. The echo suppression unit 703 outputs the air conduction sound signal with the echo component suppressed to the baseband processing unit 4.

After suppressing the echo component of the air conduction sound signal of frame t, the transmission signal processing unit 7 determines whether frame t is the final frame (step S7). If frame t is not the final frame (step S7; No), the transmission signal processing unit 7 updates the variable t to t+1 (step S8) and performs steps S2 to S6 on the subsequent frame. If frame t is the final frame (step S7; Yes), the transmission signal processing unit 7 ends the echo suppression processing.
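
The control flow of FIG. 3 (steps S1 to S8) can be summarized in a short Python sketch. The function `run_echo_suppression` and the three callables passed to it are hypothetical stand-ins for the user voice detection unit 701, the transfer characteristic learning unit 702, and the echo suppression unit 703; a local variable plays the role of the storage unit 711. Only the gating logic is taken from the description; everything else is an assumption made for illustration.

```python
def run_echo_suppression(frames, initial_transfer, detect_user_voice, learn, suppress):
    """Process frames in order, learning only when no user voice is detected.

    frames: iterable of (air, bone, ref) tuples, one per frame t (step S2).
    detect_user_voice, learn, suppress: hypothetical stand-ins for units 701, 702, 703.
    """
    transfer = initial_transfer          # plays the role of storage unit 711
    outputs = []
    for air, bone, ref in frames:        # frame loop (steps S1, S7, S8)
        if not detect_user_voice(bone):  # steps S3 and S4
            transfer = learn(transfer, air, ref)      # step S5, result kept for later frames
        outputs.append(suppress(air, ref, transfer))  # step S6 always runs
    return outputs, transfer
```

Because the transfer characteristic is updated only in frames without user voice, a frame in which the user speaks reuses the most recently stored characteristic, which is the behaviour the description attributes to step S6.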

FIG. 4 is a flowchart explaining the user voice detection processing.
As described above, the user voice detection processing (step S3) in the echo suppression processing of the present embodiment is performed by the user voice detection unit 701 using the bone conduction sound signal. As shown in FIG. 4, the user voice detection unit 701 first calculates the frame power Pd (in dB) of the bone conduction sound signal of frame t (step S301). In step S301, the user voice detection unit 701 calculates the frame power Pd from the frequency spectrum (power spectrum) obtained by frequency analysis of the bone conduction sound signal of frame t. The frequency analysis of the bone conduction sound signal is performed by the user voice detection unit 701 or by a frequency analysis unit not shown in FIG. 2.

Next, the user voice detection unit 701 calculates the difference ΔP = Pd − Pf between the frame power Pd of the bone conduction sound signal calculated in step S301 and the frame power Pf (in dB) in the absence of user voice (step S302). Here, the frame power Pf is the average frame power of the audio signal (bone conduction sound signal) input from the second microphone 10 when that signal does not contain the user voice. The frame power Pf is determined in advance by picking up sound that does not contain the user voice with the second microphone 10.

Next, the user voice detection unit 701 determines whether the frame power difference ΔP calculated in step S302 is larger than the determination threshold THb (step S303). When the user speaks with the call device 1 in contact with the head, the bone conduction sound signal input from the second microphone 10 contains the user voice. The frame power Pd when the bone conduction sound signal contains the user voice is therefore larger than the frame power Pf in the absence of user voice. Accordingly, the determination threshold THb is set to an arbitrary positive value (for example, 6 dB).

If ΔP > THb (step S303; Yes), the user voice detection unit 701 sets the processing result to "user voice detected" (step S304) and ends the user voice detection processing (return). If ΔP ≤ THb (step S303; No), the user voice detection unit 701 sets the processing result to "user voice not detected" (step S305) and ends the user voice detection processing.
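
The decision of steps S301 to S305 can be sketched in Python as follows. The comparison of the frame power difference against a positive threshold (6 dB in the example given in the description) follows the text. How the frame power is obtained from the power spectrum is not spelled out, so summing the band powers and converting to dB is an assumption, as are the function names and the small floor that avoids taking the logarithm of zero.

```python
import math


def frame_power_db(power_spectrum):
    """Frame power in dB, assumed here to be the sum of the band powers (step S301)."""
    total = sum(power_spectrum)
    return 10.0 * math.log10(max(total, 1e-12))  # floor avoids log10(0) on silent frames


def user_voice_detected(pd_db, pf_db, threshold_db=6.0):
    """Return True when the frame power exceeds the no-voice power by more than the threshold.

    pd_db: frame power of the bone conduction sound signal for the current frame
    pf_db: average frame power measured in advance without user voice
    """
    delta_p = pd_db - pf_db        # step S302
    return delta_p > threshold_db  # step S303; equality counts as "not detected"
```

With a no-voice power of -40 dB, a frame at -30 dB gives a difference of 10 dB and is classified as containing user voice, while a frame at -34 dB gives exactly 6 dB and is not.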

The user voice detection processing of FIG. 4 is only an example. Any other method may be used to detect the user voice in the bone conduction sound signal, provided it can determine whether the input level of the bone conduction sound signal is a level that indicates the presence of user voice.

After the user voice detection processing (steps S301 to S305), the user voice detection unit 701 determines whether the user voice has been detected (step S4). The determination in step S4 decides whether the transfer characteristic is to be learned. In the call device 1 of the present embodiment, the processing for learning the transfer characteristic (step S5) is performed when the user voice has not been detected (step S4; No). That is, the call device 1 learns the transfer characteristic when no user voice is detected in the bone conduction sound signal. The learning is performed by the transfer characteristic learning unit 702. For this reason, when the user voice has not been detected (step S4; No), the user voice detection unit 701 sends the input received signal to the transfer characteristic learning unit 702. The transfer characteristic learning unit 702 according to the present embodiment performs the processing shown in FIG. 5.

FIG. 5 is a flowchart explaining the processing for learning the transfer characteristic.
As shown in FIG. 5, the transfer characteristic learning unit 702 first acquires the frequency spectra of the air conduction sound signal and the received signal of frame t (step S501). These frequency spectra are calculated, for example, by frequency analysis in the transfer characteristic learning unit 702. Alternatively, they may be calculated by a frequency analysis unit, not shown in FIG. 2, that analyzes the air conduction sound signal and the received signal.

Next, the transfer characteristic learning unit 702 initializes the variable i, which identifies a frequency band of the frequency spectrum, to i = 0 (step S502).

Next, the transfer characteristic learning unit 702 reads the current transfer characteristic EH(i, t-1) from the storage unit 711 (step S503). The transfer characteristic EH(i, t-1) is the one that was applied to the amplitude spectrum of frequency band i when suppressing the echo component of the air conduction sound signal of frame t-1. When the variable t has its initial value (for example, t = 1), the transfer characteristic learning unit 702 reads the initial value of the transfer characteristic as the current transfer characteristic EH(i, 0).

Next, the transfer characteristic learning unit 702 learns (calculates) the transfer characteristic EH(i, t) on the basis of the amplitude spectrum A(i, t) of the air conduction sound signal, the amplitude spectrum Ref(i, t) of the received signal, and the current transfer characteristic EH(i, t-1) (step S504). The transfer characteristic EH(i, t) is the one to be applied to the amplitude spectrum A(i, t) of the air conduction sound signal. In step S504, the transfer characteristic learning unit 702 calculates the transfer characteristic EH(i, t) using equation (1).

In equation (1), α is the update coefficient of the transfer characteristic and is a constant satisfying 0 < α < 1 (for example, α = 0.99).
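Equation (1) itself is not reproduced in this text. Purely as an illustration, the sketch below assumes an exponential-smoothing update of the form EH(i, t) = α × EH(i, t-1) + (1 - α) × A(i, t) / Ref(i, t). This form is consistent with the description of α as an update coefficient close to 1, but it is an assumption and is not confirmed by the source. The function name `learn_transfer_characteristic` and the `eps` guard are likewise hypothetical.

```python
def learn_transfer_characteristic(a, ref, eh_prev, alpha=0.99, eps=1e-12):
    """Return EH(i, t) for every frequency band i (steps S502 to S506).

    a, ref  : amplitude spectra A(i, t) and Ref(i, t), one value per band
    eh_prev : EH(i, t-1), one value per band

    ASSUMPTION: the update below stands in for equation (1), whose exact
    form is not given in the text.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    eh = []
    for i in range(len(eh_prev)):
        if ref[i] <= eps:
            # Hypothetical guard: with no received-signal energy in this
            # band there is nothing to learn from, so keep the old value.
            eh.append(eh_prev[i])
            continue
        # Instantaneous estimate of the receiver-to-microphone gain.
        ratio = a[i] / ref[i]
        eh.append(alpha * eh_prev[i] + (1.0 - alpha) * ratio)
    return eh
```

With α = 0.99, each frame moves the stored characteristic only 1% of the way toward the instantaneous ratio, so a single atypical frame has limited influence.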

Next, the transfer characteristic learning unit 702 determines whether all frequency bands have been processed (step S505). If an unprocessed frequency band remains (step S505; No), the transfer characteristic learning unit 702 updates the variable i to i + 1 (step S506) and repeats steps S503 and S504. When all frequency bands have been processed (step S505; Yes), the transfer characteristic learning unit 702 ends the process of learning the transfer characteristic to be applied to the air conduction sound signal of frame t. At this point, the transfer characteristic learning unit 702 stores, for example, the learned transfer characteristic EH(i, t) in the storage unit 711. The transfer characteristic learning unit 702 also sends, for example, the amplitude spectra of the air conduction sound signal and the received signal, together with the learned (calculated) transfer characteristic, to the echo suppression unit 703. In this case, the echo suppression unit 703 suppresses the echo component of the air conduction sound signal of frame t on the basis of the transfer characteristic EH(i, t) learned by the transfer characteristic learning unit 702 (step S6). That is, when the user voice detection unit 701 does not detect a user voice, the echo suppression unit 703 suppresses the echo component of the air conduction sound signal of frame t on the basis of the learned transfer characteristic EH(i, t). On the other hand, when the user voice detection unit 701 detects a user voice, the echo suppression unit 703 suppresses the echo component contained in the air conduction sound signal of frame t on the basis of the transfer characteristic EH(i, t-1). In other words, when a user voice is detected, the echo suppression unit 703 sets EH(i, t) = EH(i, t-1) and suppresses the echo component contained in the air conduction sound signal of frame t.
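The choice between learning a new transfer characteristic and reusing the previous one can be sketched as follows. The function name `select_transfer_characteristic` and the idea of passing the learning step as a callable are assumptions made for this example.

```python
def select_transfer_characteristic(user_voice_detected, eh_prev, learn):
    """Return the transfer characteristic EH(i, t) to use for frame t.

    eh_prev : EH(i, t-1), one value per frequency band
    learn   : hypothetical callable that takes EH(i, t-1) and returns the
              learned EH(i, t) (the step S5 processing)
    """
    if user_voice_detected:
        # Step S4; Yes: skip learning and carry over EH(i, t) = EH(i, t-1).
        return list(eh_prev)
    # Step S4; No: learn a new characteristic from the current frame.
    return learn(eh_prev)
```

Skipping the update while the user is speaking keeps the user's own voice, which is not an echo, from being absorbed into the learned characteristic.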

FIG. 6 is a flowchart illustrating the process of suppressing the echo component.
As shown in FIG. 6, the echo suppression unit 703 first acquires the frequency spectra of the air conduction sound signal and the received signal of frame t, as well as the transfer characteristic (step S601). When the transfer characteristic EH(i, t) to be applied to the air conduction sound signal of frame t has been learned, the echo suppression unit 703 acquires EH(i, t) from the transfer characteristic learning unit 702. When it has not been learned, the echo suppression unit 703 reads the transfer characteristic EH(i, t-1) from the storage unit 711 and uses it as EH(i, t).

Next, the echo suppression unit 703 initializes the variable i, which identifies a frequency band, to i = 0 (step S602).

Next, the echo suppression unit 703 calculates the echo estimation signal Eest(i, t) on the basis of the amplitude spectrum Ref(i, t) of the received signal and the transfer characteristic EH(i, t) (step S603). The echo estimation signal Eest(i, t) represents the estimated value of the echo component contained in the amplitude spectrum of frequency band i of the air conduction sound signal. The echo suppression unit 703 calculates Eest(i, t) using the following equation (2).

Eest(i, t) = Ref(i, t) × EH(i, t) ... (2)

In equation (2), Ref(i, t) is the amplitude spectrum of frequency band i of the received signal, and EH(i, t) is the transfer characteristic applied to frequency band i of the air conduction sound signal.

Next, the echo suppression unit 703 calculates the echo-suppressed amplitude spectrum on the basis of the amplitude spectrum A(i, t) of frequency band i of the air conduction sound signal and the echo estimation signal Eest(i, t) (step S604). The echo suppression unit 703 calculates the echo-suppressed amplitude spectrum Amod(i, t) using the following equation (3).

Amod(i, t) = A(i, t) - Eest(i, t) ... (3)

Next, the echo suppression unit 703 determines whether all frequency bands have been processed (step S605). If an unprocessed frequency band remains (step S605; No), the echo suppression unit 703 updates the variable i to i + 1 (step S606) and repeats steps S603 and S604. When all frequency bands have been processed (step S605; Yes), the echo suppression unit 703 converts the frequency spectrum of the echo-suppressed air conduction sound signal into a time-domain signal (step S607). The echo suppression unit 703 then outputs the air conduction sound signal converted into the time domain and ends the process of suppressing the echo component.
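Steps S602 to S606 apply equations (2) and (3) to each frequency band in turn, which can be sketched as follows. The function name `suppress_echo` and the list-based representation of the spectra are assumptions for this example, and the conversion back to the time domain (step S607) is omitted.

```python
def suppress_echo(a, ref, eh):
    """Apply equations (2) and (3) to every frequency band i.

    a, ref : amplitude spectra A(i, t) and Ref(i, t), one value per band
    eh     : transfer characteristic EH(i, t), one value per band
    Returns (eest, amod), the echo estimate and the suppressed spectrum.
    """
    if not (len(a) == len(ref) == len(eh)):
        raise ValueError("all inputs must cover the same frequency bands")
    # Equation (2): Eest(i, t) = Ref(i, t) x EH(i, t)
    eest = [r * h for r, h in zip(ref, eh)]
    # Equation (3): Amod(i, t) = A(i, t) - Eest(i, t)
    amod = [x - e for x, e in zip(a, eest)]
    return eest, amod
```

Note that equation (3) as written can yield a negative value when the echo estimate exceeds A(i, t). The text does not say how that case is handled, so the sketch leaves the result unclamped.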

The air conduction sound signal input from the first microphone 9 contains not only the voice uttered by the user holding the call device 1 against the head, but also noise and other sounds propagating around the user. Consequently, if the user voice were detected using the air conduction sound signal, the user voice might go undetected even though it is present, for example because the user voice is quiet or the noise is loud. If detection of the user voice contained in the audio signal (frame) being processed fails in this way, the transfer characteristic may be learned incorrectly, and the echo component may not be suppressed appropriately.

In contrast, the echo suppression process of this embodiment detects the user voice using the bone conduction sound signal input from the second microphone 10, as described above. The bone conduction sound picked up by the second microphone 10 is sound that propagates through the user's skull, skin tissue, and the like when the user (speaker) speaks; in other words, it is sound that propagates through a solid. The bone conduction sound signal input from the second microphone 10 therefore contains very little noise (air conduction sound) propagating around the user. Detecting the user voice from this bone conduction sound signal thus limits the loss of detection accuracy. That is, this embodiment reduces erroneous decisions about whether to learn the transfer characteristic based on the presence or absence of the user voice. As a result, this embodiment can prevent the degradation of echo suppression performance caused by learning an erroneous transfer characteristic, that is, the echo and the deterioration of transmitted sound quality caused by using an erroneous (inappropriate) transfer characteristic.

[Second Embodiment]
FIG. 7 is a diagram illustrating the functional configuration of the transmission signal processing unit in the call device according to the second embodiment.

The call device 1 according to this embodiment is a mobile communication device capable of voice calls, such as a mobile phone terminal, and its functional configuration is the same as that of the call device 1 according to the first embodiment.

As shown in FIG. 7, the transmission signal processing unit 7 in the call device 1 of this embodiment includes a user voice detection unit 701, a bone conduction sound correction unit 704, a transfer characteristic learning unit 702, an echo suppression unit 703, a first storage unit 711, and a second storage unit 712.

The user voice detection unit 701 detects the voice uttered by the user of the call device 1 that is contained in the bone conduction sound signal input from the second microphone 10. In this embodiment as well, the user voice is determined to have been detected only when it is detected from the bone conduction sound signal.

The bone conduction sound correction unit 704 corrects the bone conduction sound signal on the basis of the bone conduction sound correction characteristic stored in the second storage unit 712. In the transmission signal processing unit 7 of this embodiment, the bone conduction sound correction unit 704 corrects the bone conduction sound signal when the user voice detection unit 701 detects a user voice. The bone conduction sound correction characteristic represents the correspondence between the user voice in the bone conduction sound signal and the user voice in the air conduction sound signal. That is, by correcting the bone conduction sound signal in the bone conduction sound correction unit 704, the transmission signal processing unit 7 of this embodiment estimates the user voice contained in the air conduction sound signal.

The transfer characteristic learning unit 702 learns the transfer characteristic of sound propagating from the receiver 8 to the first microphone 9. When no user voice is detected, the transfer characteristic learning unit 702 according to this embodiment learns the transfer characteristic using the received signal and the air conduction sound signal. When a user voice is detected, it learns the transfer characteristic using the received signal, the air conduction sound signal, and the corrected bone conduction sound signal.

The echo suppression unit 703 suppresses the echo component contained in the air conduction sound signal on the basis of the transfer characteristic learned by the transfer characteristic learning unit 702. The echo suppression unit 703 applies the transfer characteristic to the received signal to estimate the echo component contained in the air conduction sound signal, and removes the estimated echo component from the air conduction sound signal.

The first storage unit 711 stores the initial value of the transfer characteristic and the transfer characteristic learned by the transfer characteristic learning unit 702. The second storage unit 712 stores the bone conduction sound correction characteristic.

When a call connection is established between the call device 1 of this embodiment and another call device, the transmission signal processing unit 7 of the audio signal processing unit 5 performs the echo suppression process shown in FIGS. 8A and 8B on the basis of the air conduction sound signal, the bone conduction sound signal, and the received signal, which are input sequentially.

FIG. 8A is a flowchart (part 1) illustrating the echo suppression process according to the second embodiment. FIG. 8B is a flowchart (part 2) illustrating the echo suppression process according to the second embodiment.

As shown in FIG. 8A, the transmission signal processing unit 7 first initializes the variable t, which identifies a frame (the processing unit of the audio signal), to 1 (step S1).

Next, the transmission signal processing unit 7 receives the air conduction sound signal, the bone conduction sound signal, and the received signal of frame t (step S2). In step S2, the transmission signal processing unit 7 inputs, for example, the bone conduction sound signal and the received signal to the user voice detection unit 701, and the air conduction sound signal to the transfer characteristic learning unit 702.

Next, the transmission signal processing unit 7 performs the user voice detection process using the bone conduction sound signal (step S3). Step S3 is performed by the user voice detection unit 701, which detects the user voice from the bone conduction sound signal of frame t (the frame being processed) input in step S2. As step S3, the user voice detection unit 701 performs, for example, steps S301 to S305 described in the first embodiment (see FIG. 4). The user voice detection unit 701 then determines, on the basis of the result of the user voice detection process, whether a user voice has been detected (step S4). It determines that a user voice has been detected when the user voice is detected from the bone conduction sound signal in step S3. When no user voice is detected (step S4; No), the user voice detection unit 701 causes the transfer characteristic learning unit 702 to learn the transfer characteristic using the received signal and the air conduction sound signal (step S5). In this case, as step S5, the transfer characteristic learning unit 702 performs, for example, steps S501 to S506 described in the first embodiment (see FIG. 5).

On the other hand, when a user voice is detected (step S4; Yes), the user voice detection unit 701 causes the bone conduction sound correction unit 704 to correct the bone conduction sound signal (step S11). The bone conduction sound correction unit 704 corrects the amplitude spectrum of each frequency band in the frequency spectrum of the bone conduction sound signal by applying the bone conduction sound correction characteristic read from the second storage unit 712. After correcting the bone conduction sound signal, the bone conduction sound correction unit 704 causes the transfer characteristic learning unit 702 to learn the transfer characteristic using the received signal, the air conduction sound signal, and the corrected bone conduction sound signal (step S12). In this case, as step S12, the transfer characteristic learning unit 702 performs, for example, processing similar to steps S501 to S506 described in the first embodiment, with two differences. First, in the processing corresponding to step S501, the transfer characteristic learning unit 702 acquires the frequency spectra (amplitude spectra) of the received signal, the air conduction sound signal, and the corrected bone conduction sound signal of frame t. Second, in the processing corresponding to step S504, it calculates the transfer characteristic EH(i, t) on the basis of the amplitude spectra of the received signal, the air conduction sound signal, and the corrected bone conduction sound signal of frame t, together with the transfer characteristic EH(i, t-1).

After the transfer characteristic has been learned in step S5 or S12, the transmission signal processing unit 7 suppresses the echo component of the air conduction sound signal on the basis of the transfer characteristic (step S6). Step S6 is performed by the echo suppression unit 703. As step S6, the echo suppression unit 703 performs, for example, steps S601 to S607 described in the first embodiment (see FIG. 6).

When the echo suppression unit 703 finishes step S6, the transmission signal processing unit 7 determines whether frame t, whose echo component has been suppressed, is the final frame (step S7). If frame t is not the final frame (step S7; No), the transmission signal processing unit 7 updates the variable t to t + 1 (step S8) and performs steps S2 to S6, S11, and S12 on the subsequent frame. If frame t is the final frame (step S7; Yes), the transmission signal processing unit 7 ends the echo suppression process.
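The per-frame control flow of FIGS. 8A and 8B can be sketched as the loop below. The function name `process_frames` and all of the callables passed to it are hypothetical stand-ins for the units 701 to 704; only the branching structure is taken from the description.

```python
def process_frames(frames, eh_init, detect_voice, correct_bone,
                   learn_s5, learn_s12, suppress):
    """Run the second-embodiment echo suppression over a list of frames.

    frames  : iterable of (air, bone, ref) tuples, one per frame t (step S2)
    eh_init : initial value of the transfer characteristic
    The remaining arguments are hypothetical callables standing in for the
    user voice detection, bone conduction correction, learning, and echo
    suppression units.
    """
    eh = list(eh_init)
    outputs = []
    for air, bone, ref in frames:                     # steps S1, S7, S8
        if detect_voice(bone):                        # steps S3, S4
            bone_mod = correct_bone(bone)             # step S11
            eh = learn_s12(air, ref, bone_mod, eh)    # step S12
        else:
            eh = learn_s5(air, ref, eh)               # step S5
        outputs.append(suppress(air, ref, eh))        # step S6
    return outputs, eh
```

Unlike the first embodiment, the transfer characteristic is updated on every frame; the detection result only selects which learning procedure is used.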

Thus, in the echo suppression process according to this embodiment, the transfer characteristic is learned using the air conduction sound signal when no user voice is detected, and using both the air conduction sound signal and the bone conduction sound signal when a user voice is detected. When the transfer characteristic is learned using both signals, the bone conduction sound correction unit 704 first corrects the bone conduction sound signal on the basis of the bone conduction sound correction characteristic (step S11). As step S11, the bone conduction sound correction unit 704 performs the process shown in FIG. 9.

FIG. 9 is a flowchart illustrating the process of correcting the bone conduction sound signal.
As shown in FIG. 9, the bone conduction sound correction unit 704 first acquires the frequency spectrum of the bone conduction sound signal of frame t (step S1101). The bone conduction sound correction unit 704 acquires this frequency spectrum from the user voice detection unit 701 or from a frequency analysis unit that is not shown in FIG. 7.

Next, the bone conduction sound correction unit 704 initializes the variable i, which identifies a frequency band, to i = 0 (step S1102).

Next, the bone conduction sound correction unit 704 corrects the amplitude spectrum of frequency band i of the bone conduction sound signal on the basis of the amplitude spectrum B(i, t) of frequency band i of the bone conduction sound signal and the bone conduction sound correction characteristic coef(i, t) (step S1103). In step S1103, the bone conduction sound correction unit 704 calculates the corrected amplitude spectrum Bmod(i, t) using equation (4).

Bmod(i, t) = B(i, t) × coef(i, t) ... (4)

Next, the bone conduction sound correction unit 704 determines whether all frequency bands have been processed (step S1104). If an unprocessed frequency band remains (step S1104; No), the bone conduction sound correction unit 704 updates the variable i to i + 1 (step S1105) and repeats step S1103. When all frequency bands have been processed (step S1104; Yes), the bone conduction sound correction unit 704 sends the corrected bone conduction sound signal (the amplitude spectrum Bmod(i, t) of each frequency band) to the transfer characteristic learning unit 702 and ends the process of correcting the bone conduction sound signal.
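The band-by-band correction of steps S1102 to S1105 amounts to applying equation (4) to each frequency band, as in the following sketch. The function name `correct_bone_conduction` and the list representation are assumptions for this example.

```python
def correct_bone_conduction(b, coef):
    """Apply equation (4), Bmod(i, t) = B(i, t) x coef, to every band i.

    b    : amplitude spectrum B(i, t) of the bone conduction sound signal
    coef : bone conduction sound correction characteristic, one per band
    """
    if len(b) != len(coef):
        raise ValueError("b and coef must cover the same frequency bands")
    return [bi * ci for bi, ci in zip(b, coef)]
```

For example, `correct_bone_conduction([1.0, 2.0], [2.0, 0.5])` scales the first band up and the second band down, giving `[2.0, 1.0]`.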

When the bone conduction sound correction unit 704 has corrected the bone conduction sound signal in this way, the transfer characteristic learning unit 702 learns the transfer characteristic using the received signal, the air conduction sound signal, and the corrected bone conduction sound signal. In this case, the transfer characteristic learning unit 702 learns (calculates) the transfer characteristic to be applied to the air conduction sound signal of frame t using the following equation (5).

In equation (5), A(i, t) is the amplitude spectrum of frequency band i of the air conduction sound signal of the current frame (frame t). Ref(i, t) is the amplitude spectrum of frequency band i of the received signal of the current frame. EH(i, t-1) is the transfer characteristic that was applied to the amplitude spectrum of frequency band i of the air conduction sound signal one frame earlier (frame t-1). α is the update coefficient of the transfer characteristic and, in this embodiment, is a constant satisfying 0 < α < 1 (for example, α = 0.99). In other words, the learning method using equation (5) learns the transfer characteristic on the basis of the received signal of frame t and the audio signal obtained by subtracting the corrected bone conduction sound signal from the air conduction sound signal of frame t.
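Equation (5) is not reproduced in this text. As an illustration only, the sketch below assumes the form EH(i, t) = α × EH(i, t-1) + (1 - α) × (A(i, t) - Bmod(i, t)) / Ref(i, t), which matches the statement that learning uses the air conduction sound signal minus the corrected bone conduction sound signal, but the exact expression is an assumption. The function name `learn_with_bone_conduction` and the `eps` guard are hypothetical.

```python
def learn_with_bone_conduction(a, ref, bmod, eh_prev, alpha=0.99, eps=1e-12):
    """Return EH(i, t) for every band i when a user voice is present.

    a, ref, bmod : A(i, t), Ref(i, t), and Bmod(i, t), one value per band
    eh_prev      : EH(i, t-1), one value per band

    ASSUMPTION: the update below stands in for equation (5), whose exact
    form is not given in the text.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    eh = []
    for i in range(len(eh_prev)):
        if ref[i] <= eps:
            # Hypothetical guard: nothing to learn without received energy.
            eh.append(eh_prev[i])
            continue
        # Remove the estimated user voice, leaving (ideally) only the echo.
        residual = a[i] - bmod[i]
        eh.append(alpha * eh_prev[i] + (1.0 - alpha) * residual / ref[i])
    return eh
```

When Bmod(i, t) is zero in every band, this reduces to learning from the air conduction sound signal and the received signal alone.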

The bone conduction sound correction characteristic used in the process of correcting the bone conduction sound signal (step S11) is calculated in advance, for example by the method shown in FIG. 10, using an information processing device separate from the call device 1.

FIG. 10 is a flowchart illustrating an example of a method for calculating the bone conduction sound correction characteristic.
As shown in FIG. 10, the information processing device that calculates the bone conduction sound correction characteristic first acquires samples of the frequency spectra of the air conduction sound signal and the bone conduction sound signal (step S21). In step S21, the information processing device acquires a plurality of frames each of the air conduction sound signal input from the first microphone 9 of the call device 1 and the bone conduction sound signal input from the second microphone 10.

Next, the information processing device initializes the variable i, which identifies a frequency band, to i = 0 (step S22).

Next, the information processing device calculates the average amplitude spectrum Fa(i) of the air conduction sound signal and the average amplitude spectrum Fb(i) of the bone conduction sound signal (step S23). Fa(i) is calculated from the amplitude spectra of frequency band i of the air conduction sound signal over the plurality of frames. Similarly, Fb(i) is calculated from the amplitude spectra of frequency band i of the bone conduction sound signal over the plurality of frames.

Next, the information processing device calculates the bone conduction sound correction characteristic coef(i) for the amplitude spectrum of frequency band i, using the average amplitude spectrum Fa(i) of the air conduction sound signal and the average amplitude spectrum Fb(i) of the bone conduction sound signal in frequency band i (step S24). The bone conduction sound correction characteristic coef(i) is calculated, for example, by the following equation (6).

Next, the information processing device determines whether all frequency bands have been processed (step S25). If an unprocessed frequency band remains (step S25; No), the information processing device updates the variable i to i + 1 (step S26) and repeats steps S23 and S24. When all frequency bands have been processed (step S25; Yes), the information processing device ends the process of calculating the bone conduction sound correction characteristic. The resulting set of bone conduction sound correction characteristics coef(i) for the frequency bands i is stored in the second storage unit 712 of the call device 1.
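Equation (6) is not reproduced in this text. The sketch below assumes coef(i) = Fa(i) / Fb(i), in line with the description of coef(i) as the average air conduction amplitude spectrum relative to the average bone conduction amplitude spectrum. The use of an arithmetic mean over frames, the function name `compute_correction_characteristic`, and the zero fallback for empty bands are assumptions made for this example.

```python
def compute_correction_characteristic(air_frames, bone_frames, eps=1e-12):
    """Return coef(i) for every frequency band i (steps S22 to S26).

    air_frames, bone_frames : lists of amplitude spectra, one list per
    frame, each with one value per frequency band.

    ASSUMPTION: coef(i) = Fa(i) / Fb(i) stands in for equation (6), and
    the averages are arithmetic means over the sampled frames.
    """
    if not air_frames or not bone_frames:
        raise ValueError("at least one frame of each signal is required")
    n_bands = len(air_frames[0])
    coef = []
    for i in range(n_bands):
        # Step S23: average amplitude spectra of band i over all frames.
        fa = sum(frame[i] for frame in air_frames) / len(air_frames)
        fb = sum(frame[i] for frame in bone_frames) / len(bone_frames)
        # Step S24: hypothetical fallback of 0.0 when the bone conduction
        # signal has no energy in this band.
        coef.append(fa / fb if fb > eps else 0.0)
    return coef
```

Multiplying a bone conduction amplitude spectrum by the resulting coef(i) rescales it, band by band, toward the level the same voice would have in the air conduction sound signal.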

式（６）により算出した骨導音補正特性coef（ｉ）は、骨導音信号における周波数帯域ｉの平均振幅スペクトルＦｂ（ｉ）に対する、気導音信号における周波数帯域ｉの平均振幅スペクトルＦａ（ｉ）である。そのため、骨導音補正特性coef（ｉ）を用いた式（４）により骨導音信号の補正することは、骨導音信号から気導音信号に含まれる利用者音声を推定することともいえる。すなわち、式（４）を用いて骨導音信号を補正することにより、送話信号処理部７は、気導音信号に含まれる利用者の音声を推定することが可能となる。そのため、伝達特性学習部７０２は、気導音信号及び補正した骨導音信号に基づいて、利用者音声を含まない気導音信号を推定することが可能となる。したがって、第１のマイク９から入力された気導音信号に利用者音声が含まれる場合でも、伝達特性学習部７０２は、推定した気導音信号に基づいて利用者音声を含まない場合と同等の信頼度で伝達特性を学習することが可能となる。すなわち、本実施形態に係るエコー抑圧処理によれば、第１のマイク９から入力された気導音信号に含まれる利用者音声が正しく検出されない場合に誤った伝達特性を学習してしまうことを防止でき、エコー成分の抑圧性能の低下を防止することが可能となる。 The bone conduction sound correction characteristic coef (i) calculated by equation (6) is the average amplitude spectrum Fa (i) of frequency band i in the air conduction sound signal relative to the average amplitude spectrum Fb (i) of frequency band i in the bone conduction sound signal. Therefore, correcting the bone conduction sound signal by equation (4) using the bone conduction sound correction characteristic coef (i) amounts to estimating, from the bone conduction sound signal, the user voice included in the air conduction sound signal. That is, by correcting the bone conduction sound signal using equation (4), the transmission signal processing unit 7 can estimate the user's voice included in the air conduction sound signal. The transfer characteristic learning unit 702 can therefore estimate an air conduction sound signal that does not include the user voice, based on the air conduction sound signal and the corrected bone conduction sound signal. Accordingly, even when the air conduction sound signal input from the first microphone 9 includes the user voice, the transfer characteristic learning unit 702 can learn the transfer characteristic from the estimated air conduction sound signal with the same reliability as when the user voice is not included. That is, the echo suppression processing according to the present embodiment prevents an erroneous transfer characteristic from being learned when the user voice included in the air conduction sound signal input from the first microphone 9 is not correctly detected, and thereby prevents a reduction in echo component suppression performance.

また、本実施形態に係るエコー抑圧処理では、第１のマイク９から入力された気導音信号に利用者音声が含まれる場合にも気導音信号から利用者音声を除去して伝達特性を学習（推定）することが可能となる。よって、本実施形態のエコー抑圧処理によれば、利用者の発話中に伝達特性が変化した場合にも、適切な伝達特性を学習してエコー成分を抑圧することが可能となる。したがって、利用者の発話中においても、誤った（不適切な）伝達特性を用いることによるエコーの発生や送話音質の劣化等を防止することが可能となる Further, in the echo suppression processing according to the present embodiment, even when the user's voice is included in the air conduction sound signal input from the first microphone 9, the user voice is removed from the air conduction sound signal to obtain the transfer characteristic. It is possible to learn (estimate). Therefore, according to the echo suppression processing of the present embodiment, even when the transfer characteristic changes during the user's utterance, it is possible to learn the appropriate transfer characteristic and suppress the echo component. Therefore, even during the user's utterance, it is possible to prevent the occurrence of echoes and the deterioration of the transmission sound quality due to using an incorrect (inappropriate) transfer characteristic.

［第３の実施形態］ [Third Embodiment]
図１１は、第３の実施形態に係る通話装置における送話信号処理部の機能的構成を示す図である。 FIG. 11 is a diagram illustrating a functional configuration of a transmission signal processing unit in the communication device according to the third embodiment.

本実施形態に係る通信装置１は、携帯電話端末等、通話が可能な移動体通信装置であり、その機能的構成は第１の実施形態に係る通話装置１と同様である。 The communication device 1 according to the present embodiment is a mobile communication device capable of making a call such as a mobile phone terminal, and the functional configuration thereof is the same as that of the call device 1 according to the first embodiment.

なお、図１１に示すように、本実施形態の通話装置１における送話信号処理部７は、利用者音声検出部７０１と、信頼度算出部７０５と、骨導音補正部７０４と、伝達特性学習部７０２と、エコー抑圧部７０３と、を含む。また、送話信号処理部７は、第１の記憶部７１１と、第２の記憶部７１２と、を含む。 As shown in FIG. 11, the transmission signal processing unit 7 in the communication device 1 of the present embodiment includes a user voice detection unit 701, a reliability calculation unit 705, a bone conduction sound correction unit 704, and a transfer characteristic. A learning unit 702 and an echo suppression unit 703 are included. The transmission signal processing unit 7 includes a first storage unit 711 and a second storage unit 712.

利用者音声検出部７０１は、第２のマイク１０から入力される骨導音信号に含まれる通話装置１の利用者が発した音声を検出する。なお、本実施形態においても、利用者音声検出部７０１は、骨導音信号から利用者音声が検出された場合に、利用者音声を検出したと判定する。 The user voice detection unit 701 detects voice uttered by the user of the communication device 1 included in the bone conduction signal input from the second microphone 10. Also in this embodiment, the user voice detection unit 701 determines that the user voice has been detected when the user voice is detected from the bone conduction sound signal.

信頼度算出部７０５は、第２のマイク１０から入力された骨導音信号の信頼度を算出する。ここで、骨導音信号の信頼度は、入力された骨導音信号が利用者の発した音声（骨導音）を反映した音声信号であることの信頼性を表す値である。本実施形態の信頼度算出部７０５は、骨導音信号及び気導音信号に基づいて骨導音信号の信頼度を算出する。本実施形態の送話信号処理部７では、利用者音声検出部７０１において利用者音声を検出した場合に、信頼度算出部７０５において骨導音信号の信頼度を算出する。信頼度算出部７０５で算出した骨導音信号の信頼度は、伝達特性の学習に補正した骨導音信号を用いるか否かの判定に用いる。 The reliability calculation unit 705 calculates the reliability of the bone conduction sound signal input from the second microphone 10. Here, the reliability of the bone conduction sound signal is a value representing the reliability that the input bone conduction signal is an audio signal reflecting the voice (bone conduction sound) uttered by the user. The reliability calculation unit 705 of the present embodiment calculates the reliability of the bone conduction sound signal based on the bone conduction sound signal and the air conduction sound signal. In the transmission signal processing unit 7 of the present embodiment, when the user voice is detected by the user voice detection unit 701, the reliability calculation unit 705 calculates the reliability of the bone conduction sound signal. The reliability of the bone conduction sound signal calculated by the reliability calculation unit 705 is used to determine whether or not to use the corrected bone conduction signal for learning of transfer characteristics.

骨導音補正部７０４は、第２の記憶部７１２に記憶させた骨導音補正特性に基づいて、骨導音信号を補正する。本実施形態の送話信号処理部７では、利用者音声検出部７０１において利用者音声を検出し、かつ骨導音信号の信頼度が閾値よりも大きい場合に、骨導音補正部７０４による骨導音信号の補正を行う。 The bone conduction sound correction unit 704 corrects the bone conduction sound signal based on the bone conduction sound correction characteristic stored in the second storage unit 712. In the transmission signal processing unit 7 of the present embodiment, the bone conduction sound correction unit 704 corrects the bone conduction sound signal when the user voice detection unit 701 detects the user voice and the reliability of the bone conduction sound signal is larger than the threshold value.

伝達特性学習部７０２は、レシーバ８から第１のマイク９に伝播する音の伝達特性を学習する。本実施形態に係る伝達特性学習部７０２は、利用者音声が検出されなかった場合、及び利用者音声を検出したが骨導音信号の信頼度が閾値以下である場合には、受話信号及び気導音信号を用いて伝達特性を学習する。また、伝達特性学習部７０２は、利用者音声を検出し、かつ骨導音信号の信頼度が閾値よりも大きい場合には、受話信号、気導音信号及び補正した骨導音信号を用いて伝達特性を学習する。 The transfer characteristic learning unit 702 learns the transfer characteristic of sound propagating from the receiver 8 to the first microphone 9. When the user voice is not detected, or when the user voice is detected but the reliability of the bone conduction sound signal is equal to or less than the threshold value, the transfer characteristic learning unit 702 according to the present embodiment learns the transfer characteristic using the received signal and the air conduction sound signal. When the user voice is detected and the reliability of the bone conduction sound signal is larger than the threshold value, the transfer characteristic learning unit 702 learns the transfer characteristic using the received signal, the air conduction sound signal, and the corrected bone conduction sound signal.

本実施形態の通話装置１と他の通話装置との呼接続が確立されると、音声信号処理部５の送話信号処理部７は、順次入力される気導音信号、骨導音信号、及び受話信号に基づいて、図１２Ａ及び図１２Ｂに示すようなエコー抑圧処理を行う。 When the call connection between the communication device 1 of the present embodiment and another communication device is established, the transmission signal processing unit 7 of the audio signal processing unit 5 performs echo suppression processing as shown in FIGS. 12A and 12B based on the sequentially input air conduction sound signal, bone conduction sound signal, and received signal.

図１２Ａは、第３の実施形態に係るエコー抑圧処理を説明するフローチャート（その１）である。図１２Ｂは、第３の実施形態に係るエコー抑圧処理を説明するフローチャート（その２）である。 FIG. 12A is a flowchart (part 1) for explaining echo suppression processing according to the third embodiment. FIG. 12B is a flowchart (part 2) illustrating the echo suppression processing according to the third embodiment.

送話信号処理部７は、図１２Ａに示すように、まず、音声信号の処理単位であるフレームを識別する変数ｔを１に初期化する（ステップＳ１）。 As shown in FIG. 12A, the transmission signal processing unit 7 first initializes a variable t for identifying a frame, which is a processing unit of the audio signal, to 1 (step S1).

次に、送話信号処理部７は、フレームｔの気導音信号、骨導音信号、及び受話信号を入力する（ステップＳ２）。ステップＳ２において、送話信号処理部７は、例えば、利用者音声検出部７０１に骨導音信号及び受話信号を入力する。また、送話信号処理部７は、例えば、伝達特性学習部７０２に気導音信号を入力する。更に、送話信号処理部７は、信頼度算出部７０５に骨導音信号及び気導音信号を入力する。 Next, the transmission signal processing unit 7 inputs the air conduction sound signal, the bone conduction sound signal, and the reception signal of the frame t (step S2). In step S 2, the transmission signal processing unit 7 inputs a bone conduction sound signal and a reception signal to the user voice detection unit 701, for example. In addition, the transmission signal processing unit 7 inputs an air conduction sound signal to the transfer characteristic learning unit 702, for example. Further, the transmission signal processing unit 7 inputs the bone conduction sound signal and the air conduction sound signal to the reliability calculation unit 705.

次に、送話信号処理部７は、骨導音信号を用いた利用者音声検出処理（ステップＳ３）を行う。ステップＳ３の処理は、利用者音声検出部７０１が行う。利用者音声検出部７０１は、ステップＳ２で入力されたフレームｔ（処理対象フレーム）の骨導音信号から利用者音声を検出する処理を行う。利用者音声検出部７０１は、ステップＳ３の処理として、例えば、第１の実施形態で説明したステップＳ３０１〜Ｓ３０５の処理を行う（図４参照）。また、利用者音声検出部７０１は、利用者音声検出処理の結果に基づいて、利用者音声を検出したか否かを判定する（ステップＳ４）。利用者音声検出部７０１は、骨導音信号から利用者音声を検出した場合に、利用者音声を検出したと判定する。利用者音声が検出されなかった場合（ステップＳ４；Ｎｏ）、利用者音声検出部７０１は、伝達特性学習部７０２に、受話信号及び気導音信号を用いて伝達特性を学習する処理（ステップＳ５）を行わせる。この場合、伝達特性学習部７０２は、ステップＳ５の処理として、例えば、第１の実施形態で説明したステップＳ５０１〜Ｓ５０６の処理を行う（図５参照）。 Next, the transmission signal processing unit 7 performs user voice detection processing (step S3) using the bone conduction sound signal. The process of step S3 is performed by the user voice detection unit 701. The user voice detection unit 701 performs a process of detecting user voice from the bone conduction sound signal of the frame t (processing target frame) input in step S2. The user voice detection unit 701 performs, for example, the processes of steps S301 to S305 described in the first embodiment as the process of step S3 (see FIG. 4). Further, the user voice detection unit 701 determines whether user voice has been detected based on the result of the user voice detection process (step S4). The user voice detection unit 701 determines that the user voice has been detected when the user voice is detected from the bone conduction sound signal. When the user voice is not detected (step S4; No), the user voice detection unit 701 causes the transfer characteristic learning unit 702 to learn the transfer characteristic using the received signal and the air conduction sound signal (step S5). ). In this case, the transfer characteristic learning unit 702 performs, for example, the processes of steps S501 to S506 described in the first embodiment as the process of step S5 (see FIG. 5).

一方、利用者音声を検出した場合（ステップＳ４；Ｙｅｓ）、利用者音声検出部７０１は、信頼度算出部７０５に骨導音信号の信頼度を算出させる（ステップＳ９）。信頼度算出部７０５は、例えば、骨導音信号と気導音信号との相関係数に基づいて骨導音信号の信頼度を算出する。また、信頼度算出部７０５は、算出した骨導音信号の信頼度を利用者音声検出部７０１に通知する。利用者音声検出部７０１は、骨導音信号の信頼度を受け取ると、受け取った信頼度が閾値ＴＨよりも大きいか否かを判定する（ステップＳ１０）。骨導音信号の信頼度が閾値ＴＨ以下の場合（ステップＳ１０；Ｎｏ）、利用者音声検出部７０１は、伝達特性学習部７０２に、受話信号及び気導音信号を用いて伝達特性を学習する処理（ステップＳ５）を行わせる。 On the other hand, when the user voice is detected (step S4; Yes), the user voice detection unit 701 causes the reliability calculation unit 705 to calculate the reliability of the bone conduction sound signal (step S9). For example, the reliability calculation unit 705 calculates the reliability of the bone conduction sound signal based on the correlation coefficient between the bone conduction sound signal and the air conduction sound signal. In addition, the reliability calculation unit 705 notifies the user voice detection unit 701 of the reliability of the calculated bone conduction sound signal. Upon receiving the reliability of the bone conduction sound signal, the user voice detection unit 701 determines whether the received reliability is greater than the threshold value TH (step S10). When the reliability of the bone conduction sound signal is equal to or less than the threshold value TH (step S10; No), the user voice detection unit 701 learns the transfer characteristic using the reception signal and the air conduction sound signal in the transfer characteristic learning unit 702. The process (step S5) is performed.

骨導音信号の信頼度が閾値ＴＨよりも大きい場合（ステップＳ１０；Ｙｅｓ）、利用者音声検出部７０１は、骨導音補正部７０４に骨導音信号を補正させる（ステップＳ１１）。骨導音補正部７０４は、ステップＳ１１の処理として、例えば、第２の実施形態で説明したステップＳ１１０１〜Ｓ１１０５の処理を行う（図９参照）。 When the reliability of the bone conduction sound signal is larger than the threshold value TH (step S10; Yes), the user sound detection unit 701 causes the bone conduction sound correction unit 704 to correct the bone conduction sound signal (step S11). The bone conduction sound correction unit 704 performs, for example, the processes of steps S1101 to S1105 described in the second embodiment as the process of step S11 (see FIG. 9).

骨導音信号を補正すると、骨導音補正部７０４は、伝達特性学習部７０２に、受話信号、気導音信号及び補正した骨導音信号を用いて伝達特性を学習する処理（ステップＳ１２）を行わせる。この場合、伝達特性学習部７０２は、ステップＳ１２の処理として、例えば、第１の実施形態で説明したステップＳ５０１〜Ｓ５０６と同様の処理を行う。ただし、ステップＳ１２では、伝達特性学習部７０２は、ステップＳ５０１と対応する処理として、フレームｔの受話信号、気導音信号及び補正後の骨導音信号の周波数スペクトル（振幅スペクトル）を取得する処理を行う。また、ステップＳ１２では、伝達特性学習部７０２は、ステップＳ５０４と対応する処理として、フレームｔの受話信号、気導音信号及び補正後の骨導音信号の振幅スペクトルと伝達特性ＥＨ（ｉ，ｔ−１）とに基づいて伝達特性ＥＨ（ｉ，ｔ）を算出する処理を行う。 After correcting the bone conduction sound signal, the bone conduction sound correction unit 704 causes the transfer characteristic learning unit 702 to perform a process of learning the transfer characteristic using the received signal, the air conduction sound signal, and the corrected bone conduction sound signal (step S12). In this case, the transfer characteristic learning unit 702 performs, for example, processes similar to steps S501 to S506 described in the first embodiment as the process of step S12. However, in step S12, as the process corresponding to step S501, the transfer characteristic learning unit 702 acquires the frequency spectra (amplitude spectra) of the received signal, the air conduction sound signal, and the corrected bone conduction sound signal in frame t. Also in step S12, as the process corresponding to step S504, the transfer characteristic learning unit 702 calculates the transfer characteristic EH (i, t) based on the amplitude spectra of the received signal, the air conduction sound signal, and the corrected bone conduction sound signal in frame t and on the transfer characteristic EH (i, t-1).

ステップＳ５又はＳ１２による伝達特性の学習を終えると、送話信号処理部７は、図１２Ｂに示すように、次に、伝達特性に基づいて気導音信号のエコー成分を抑圧する（ステップＳ６）。ステップＳ６の処理は、エコー抑圧部７０３が行う。エコー抑圧部７０３は、ステップＳ６の処理として、例えば、第１の実施形態で説明したステップＳ６０１〜Ｓ６０７の処理を行う（図６参照）。 When the transmission characteristic learning in step S5 or S12 is completed, the transmission signal processing unit 7 next suppresses the echo component of the air conduction sound signal based on the transmission characteristic as shown in FIG. 12B (step S6). . The echo suppression unit 703 performs the process in step S6. The echo suppression unit 703 performs, for example, the processes of steps S601 to S607 described in the first embodiment as the process of step S6 (see FIG. 6).

エコー抑圧部７０３がステップＳ６の処理を終えると、送話信号処理部７は、エコー成分を抑圧したフレームｔが最終フレームであるか否かを判定する（ステップＳ７）。エコー成分を抑圧したフレームｔが最終フレームではない場合（ステップＳ７；Ｎｏ）、送話信号処理部７は、変数ｔをｔ＋１に更新し（ステップＳ８）、後続のフレームに対するステップＳ２〜Ｓ６，及びＳ９〜Ｓ１２の処理を行う。一方、エコー成分を抑圧したフレームｔが最終フレームである場合（ステップＳ７；Ｙｅｓ）、送話信号処理部７は、エコー抑圧処理を終了する。 When the echo suppression unit 703 finishes the process of step S6, the transmission signal processing unit 7 determines whether or not the frame t in which the echo component has been suppressed is the final frame (step S7). When the frame t is not the final frame (step S7; No), the transmission signal processing unit 7 updates the variable t to t + 1 (step S8) and performs the processes of steps S2 to S6 and S9 to S12 on the subsequent frame. On the other hand, when the frame t is the final frame (step S7; Yes), the transmission signal processing unit 7 ends the echo suppression processing.

このように、本実施形態に係るエコー抑圧処理では、利用者音声を検出した場合に、骨導音信号の信頼度に基づいて骨導音信号を用いた伝達特性の学習を行うか否かを判定する。すなわち、骨導音信号から利用者音声を検出したとしても、骨導音信号の信頼度が低い場合には骨導音信号を用いずに伝達特性を学習する。そのため、信頼度の低い骨導音信号を用いて伝達特性を学習することによる伝達特性の信頼度の低下を防ぐことが可能となる。したがって、骨導音信号から利用者音声を検出した場合に、信頼度の高い伝達特性に基づいてより適切にエコー成分を抑圧することが可能となる。 As described above, in the echo suppression processing according to the present embodiment, when the user voice is detected, whether or not to learn the transfer characteristic using the bone conduction sound signal is determined based on the reliability of the bone conduction sound signal. That is, even if the user voice is detected from the bone conduction sound signal, the transfer characteristic is learned without using the bone conduction sound signal when the reliability of the bone conduction sound signal is low. This prevents the reliability of the transfer characteristic from being lowered by learning it from a bone conduction sound signal with low reliability. Therefore, when the user voice is detected from the bone conduction sound signal, the echo component can be suppressed more appropriately based on a highly reliable transfer characteristic.
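
The branch structure of steps S4 and S10 in FIG. 12A can be sketched as follows. This is only an illustration of which signals are handed to the transfer characteristic learning for one frame; the function name, the string labels, and the example threshold are hypothetical, since the source does not give a value for the threshold TH.

```python
def select_learning_inputs(voice_detected, reliability, threshold):
    """Return the signals used to learn the transfer characteristic for a frame.

    `reliability` is only meaningful when `voice_detected` is True, because
    step S9 runs only after user voice has been detected.
    """
    if not voice_detected:
        # Step S4; No -> step S5: received signal and air conduction signal.
        return ("received", "air")
    if reliability <= threshold:
        # Step S10; No -> step S5: the bone conduction signal is not trusted.
        return ("received", "air")
    # Step S10; Yes -> steps S11 and S12: add the corrected bone conduction signal.
    return ("received", "air", "corrected_bone")
```

For example, with a hypothetical threshold of 0.5, a frame with detected user voice and reliability 0.6 would use all three signals, while reliability 0.5 would fall back to the received and air conduction signals only.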

図１３は、骨導音信号の信頼度の算出方法を説明するグラフである。 FIG. 13 is a graph illustrating a method for calculating the reliability of the bone conduction sound signal.
本実施形態のエコー抑圧処理では、上記のように、骨導音信号の信頼度に基づいて骨導音信号を用いた伝達特性の学習を行うか否かを判定する。骨導音信号の信頼度は、骨導音信号と気導音信号との相関係数に基づいて算出する。骨導音信号と気導音信号との相関係数corrは、下記式（７）を用いて算出する。 In the echo suppression processing of this embodiment, as described above, it is determined whether or not to perform transfer characteristic learning using the bone conduction sound signal based on the reliability of the bone conduction sound signal. The reliability of the bone conduction sound signal is calculated based on the correlation coefficient between the bone conduction sound signal and the air conduction sound signal. The correlation coefficient corr between the bone conduction sound signal and the air conduction sound signal is calculated using the following equation (7).

式（７）のＮはフレームのサンプル数であり、８kHzサンプリングの場合、Ｎ＝１６０である。式（７）のｓａ_ｊ及びｓｂ_ｊは、それぞれ、気導音信号におけるｊ番目のサンプル、及び骨導音信号におけるｊ番目のサンプルである。 N in equation (7) is the number of samples in a frame, and N = 160 in the case of 8 kHz sampling. In equation (7), sa_j and sb_j are the j-th sample of the air conduction sound signal and the j-th sample of the bone conduction sound signal, respectively.
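
Equation (7) is not reproduced in this text, so its exact form is unknown here. As a rough sketch only, the following assumes the standard Pearson correlation coefficient over the N samples of one frame (N = 160 at 8 kHz corresponds to a 20 ms frame). The function name is hypothetical, and the patent's equation may differ, for example by omitting the mean subtraction.

```python
import math


def correlation_coefficient(sa, sb):
    """Pearson correlation between air (sa) and bone (sb) samples of one frame.

    Assumed form; returns 0.0 when either signal is constant so that the
    result stays defined for silent frames.
    """
    n = len(sa)
    if n == 0 or n != len(sb):
        raise ValueError("both frames must have the same non-zero length")
    mean_a = sum(sa) / n
    mean_b = sum(sb) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(sa, sb))
    var_a = sum((a - mean_a) ** 2 for a in sa)
    var_b = sum((b - mean_b) ** 2 for b in sb)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0
```

A bone conduction frame that is a scaled copy of the air conduction frame gives a value close to 1, which is the situation in which the bone conduction signal reflects the user's voice well.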

相関係数corrに基づいて骨導音信号の信頼度を算出する際には、例えば、図１３に示すような相関係数corrと信頼度Ｒとの対応関係に基づいて算出する。すなわち、骨導音信号の信頼度Ｒは、下記式（８）を用いて算出する。 When calculating the reliability of the bone conduction sound signal based on the correlation coefficient corr, for example, it is calculated based on the correspondence between the correlation coefficient corr and the reliability R as shown in FIG. That is, the reliability R of the bone conduction sound signal is calculated using the following formula (8).

式（８）における第１の相関閾値corrL及び第２の相関閾値corrHは、０＜corrL＜corrH＜１を満たす任意の値とし、例えば、corrL＝０．２、corrH＝０．７とする。 The first correlation threshold corrL and the second correlation threshold corrH in Expression (8) are arbitrary values that satisfy 0 <corrL <corrH <1, for example, corrL = 0.2 and corrH = 0.7.
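
Equation (8) and FIG. 13 are not reproduced in this text. The sketch below assumes that the correspondence is a linear ramp: R is 0 at or below corrL, 1 at or above corrH, and linearly interpolated in between. The ramp shape and the function name are assumptions; only the example thresholds corrL = 0.2 and corrH = 0.7 come from the description.

```python
def reliability_from_correlation(corr, corr_low=0.2, corr_high=0.7):
    """Map a correlation coefficient to a reliability R in [0, 1].

    Assumed piecewise-linear form of equation (8).
    """
    if not 0.0 < corr_low < corr_high < 1.0:
        raise ValueError("thresholds must satisfy 0 < corrL < corrH < 1")
    if corr <= corr_low:
        return 0.0
    if corr >= corr_high:
        return 1.0
    return (corr - corr_low) / (corr_high - corr_low)
```

Under these assumptions a correlation of 0.45, midway between the two thresholds, maps to a reliability of about 0.5.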

第２のマイク１０で骨導音を収音するには、利用者の頭部に通話装置１を接触させた状態で利用者が発話する必要がある。このとき、利用者の頭部と通話装置１との接触状態が不安定であると、利用者が発した音声と骨導音信号に含まれる利用者音声とに差異が生じ、骨導音信号から気導音信号に含まれる利用者音声を正しく推定することが困難となる。そのため、本実施形態に係るエコー抑圧処理では、骨導音信号の信頼度Ｒに基づいて伝達特性の学習に骨導音信号を用いるか否かを判定する。そして、骨導音信号の信頼度が低く気導音信号に含まれる利用者音声を正しく推定することが困難であると判定した場合、送話信号処理部７は、受話信号及び気導音信号のみを用いて伝達特性を学習する。すなわち、送話信号処理部７は、骨導音信号から利用者音声を検出したとしても、骨導音信号の信頼度Ｒが低い場合には骨導音信号を用いずに伝達特性を学習する。よって、本実施形態によれば、信頼度の低い骨導音信号（言い換えると利用者音声を適切に反映していない骨導音信号）を用いて誤った伝達特性を学習することを防ぐことが可能となる。 In order to collect the bone conduction sound with the second microphone 10, the user needs to speak while the communication device 1 is in contact with the user's head. If the contact between the user's head and the communication device 1 is unstable at this time, the user voice included in the bone conduction sound signal differs from the voice the user actually uttered, and it becomes difficult to correctly estimate, from the bone conduction sound signal, the user voice included in the air conduction sound signal. Therefore, in the echo suppression processing according to the present embodiment, whether or not to use the bone conduction sound signal for learning the transfer characteristic is determined based on the reliability R of the bone conduction sound signal. When it is determined that the reliability of the bone conduction sound signal is low and that it is difficult to correctly estimate the user voice included in the air conduction sound signal, the transmission signal processing unit 7 learns the transfer characteristic using only the received signal and the air conduction sound signal. That is, even if the transmission signal processing unit 7 detects the user voice from the bone conduction sound signal, it learns the transfer characteristic without using the bone conduction sound signal when the reliability R of the bone conduction sound signal is low. Therefore, according to the present embodiment, it is possible to prevent an erroneous transfer characteristic from being learned from a bone conduction sound signal with low reliability (in other words, a bone conduction sound signal that does not appropriately reflect the user voice).

なお、利用者音声を検出しかつ骨導音信号の信頼度Ｒが低い場合（ステップＳ１０；Ｎｏ）、例えば、第１の実施形態に係るエコー抑圧処理において利用者音声を検出したときのように、伝達特性を学習せずにエコー成分を抑圧してもよい。 When the user voice is detected and the reliability R of the bone conduction sound signal is low (step S10; No), for example, when the user voice is detected in the echo suppression processing according to the first embodiment. The echo component may be suppressed without learning the transfer characteristics.

また、本実施形態のように骨導音信号の信頼度Ｒを算出する場合、信頼度Ｒに基づいて伝達特性の学習に骨導音信号を用いるか否かを判定する代わりに、骨導音信号の信頼度Ｒに応じて式（５）における伝達特性の更新係数αの値を変更してもよい。 Further, when the reliability R of the bone conduction sound signal is calculated as in the present embodiment, the value of the transfer characteristic update coefficient α in equation (5) may be changed according to the reliability R of the bone conduction sound signal, instead of determining based on the reliability R whether or not to use the bone conduction sound signal for learning the transfer characteristic.

図１４は、伝達特性の更新係数の算出方法を説明するグラフである。 FIG. 14 is a graph illustrating a method for calculating the update coefficient of the transfer characteristic.
骨導音信号を用いた伝達特性の学習（算出）に用いる式（５）では、更新係数αの値が小さくなるほど、算出した伝達特性ＥＨ（ｉ，ｔ）における骨導音信号の振幅スペクトルＢmod（ｉ，ｔ）の寄与度が大きくなる。そのため、例えば、骨導音信号の信頼度Ｒが低い場合には算出した伝達特性ＥＨ（ｉ，ｔ）におけるフレームｔの骨導音信号の寄与度が小さくなるよう更新係数αを決定する。 In equation (5), which is used for learning (calculating) the transfer characteristic using the bone conduction sound signal, the smaller the value of the update coefficient α, the larger the contribution of the amplitude spectrum Bmod (i, t) of the bone conduction sound signal to the calculated transfer characteristic EH (i, t). Therefore, for example, when the reliability R of the bone conduction sound signal is low, the update coefficient α is determined so that the contribution of the bone conduction sound signal of frame t to the calculated transfer characteristic EH (i, t) becomes small.

骨導音信号の信頼度Ｒに応じて伝達特性の更新係数αを変更する場合、更新係数αの値は、例えば、図１４に示すような信頼度Ｒと更新係数αとの対応関係に基づいて変更する。すなわち、骨導音信号の信頼度Ｒに応じて伝達特性の更新係数αを変更する場合、更新係数αは、下記式（９）を用いて算出する。 When the update coefficient α of the transfer characteristic is changed according to the reliability R of the bone conduction sound signal, the value of the update coefficient α is changed based on, for example, the correspondence between the reliability R and the update coefficient α as shown in FIG. 14. That is, when the update coefficient α of the transfer characteristic is changed according to the reliability R of the bone conduction sound signal, the update coefficient α is calculated using the following formula (9).

更新係数αの最小値αminは、０＜αmin＜１の任意の値とし、例えば、αmin＝０．９５とする。また、第１の判定閾値RL及び第２の判定閾値RHは、０＜RL＜RH＜１を満たす任意の値とし、例えば、RL＝０．２、RH＝０．７とする。 The minimum value αmin of the update coefficient α is an arbitrary value of 0 <αmin <1, for example, αmin = 0.95. The first determination threshold RL and the second determination threshold RH are arbitrary values that satisfy 0 <RL <RH <1, for example, RL = 0.2 and RH = 0.7.
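
Formula (9) and FIG. 14 are not reproduced in this text. The description states that α is 1 when R < RL and that αmin is the minimum of α, so the sketch below assumes a linear descent from 1 at RL to αmin at RH. The linear shape and the function name are assumptions; the example values αmin = 0.95, RL = 0.2, and RH = 0.7 come from the description.

```python
def update_coefficient(reliability, alpha_min=0.95, r_low=0.2, r_high=0.7):
    """Map reliability R to the transfer characteristic update coefficient alpha.

    Assumed piecewise-linear form of formula (9): low reliability gives
    alpha = 1 (the bone conduction signal contributes nothing), high
    reliability gives alpha = alpha_min.
    """
    if not 0.0 < alpha_min < 1.0:
        raise ValueError("alpha_min must satisfy 0 < alpha_min < 1")
    if not 0.0 < r_low < r_high < 1.0:
        raise ValueError("thresholds must satisfy 0 < RL < RH < 1")
    if reliability <= r_low:
        return 1.0
    if reliability >= r_high:
        return alpha_min
    fraction = (reliability - r_low) / (r_high - r_low)
    return 1.0 - fraction * (1.0 - alpha_min)
```

With the example values, a reliability of 0.45 would give α of about 0.975 under this assumed ramp.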

骨導音信号の信頼度Ｒに応じて更新係数αを変更する場合、例えば、図１２Ａの骨導音信号を用いて伝達特性を学習する処理（ステップＳ１２）において、式（９）により更新係数αを算出（決定）する。 When the update coefficient α is changed according to the reliability R of the bone conduction sound signal, the update coefficient α is calculated (determined) by formula (9), for example, in the process of learning the transfer characteristic using the bone conduction sound signal in FIG. 12A (step S12).

このように、骨導音信号の信頼度Ｒが低くなると更新係数αが大きくなるように信頼度Ｒに応じて更新係数αを変更することで、骨導音信号の信頼度Ｒが低い場合に式（５）で算出される伝達特性における骨導音信号の寄与度を小さくすることが可能となる。そのため、信頼度の低い骨導音信号（言い換えると利用者の音声を適切に反映していない骨導音信号）を用いて誤った伝達特性を学習（算出）することを防ぐことが可能となる。 In this way, by changing the update coefficient α according to the reliability R so that α increases as the reliability R of the bone conduction sound signal decreases, the contribution of the bone conduction sound signal to the transfer characteristic calculated by equation (5) can be reduced when the reliability R is low. This makes it possible to prevent an erroneous transfer characteristic from being learned (calculated) from a bone conduction sound signal with low reliability (in other words, a bone conduction sound signal that does not appropriately reflect the user's voice).

また、図１４に示したような０≦Ｒ≦１の信頼度Ｒと更新係数αとの対応関係を参照して更新係数αを決定する場合、Ｒ＜ＲＬであると更新係数αが１となり、式（５）で算出した伝達特性に対する骨導音信号の寄与度は０となる。そのため、図１４に示したような０≦Ｒ≦１の信頼度Ｒと更新係数αとの対応関係を参照して更新係数αを決定する場合、例えば、図１２ＡにおけるステップＳ１０の判定を省略可能である。 Further, when the update coefficient α is determined with reference to the correspondence between the reliability R (0 ≦ R ≦ 1) and the update coefficient α as shown in FIG. 14, the update coefficient α becomes 1 when R < RL, and the contribution of the bone conduction sound signal to the transfer characteristic calculated by equation (5) becomes 0. Therefore, when the update coefficient α is determined in this way, the determination in step S10 in FIG. 12A, for example, can be omitted.

また、図１２Ａ及び図１２Ｂのエコー抑圧処理において骨導音信号の信頼度Ｒに応じて伝達特性の更新係数αを変更する場合、信頼度Ｒと更新係数αとの対応関係は、例えば、信頼度Ｒが閾値ＴＨよりも大きい範囲（ＴＨ＜Ｒ≦１）のみを用意してもよい。 Further, when the update coefficient α of the transfer characteristic is changed according to the reliability R of the bone conduction sound signal in the echo suppression processing of FIGS. 12A and 12B, the correspondence between the reliability R and the update coefficient α may be prepared, for example, only for the range in which the reliability R is larger than the threshold value TH (TH < R ≦ 1).

［第４の実施形態］ [Fourth Embodiment]
本実施形態では、通話装置１が利用者の頭部から受ける圧力（押圧荷重）に基づいて骨導音信号の信頼度Ｒを算出する通話装置について説明する。 In the present embodiment, a communication device that calculates the reliability R of the bone conduction sound signal based on the pressure (pressing load) that the communication device 1 receives from the user's head will be described.

図１５は、第４の実施形態に係る通話装置における要部の機能的構成を示す図である。 FIG. 15 is a diagram illustrating a functional configuration of a main part in the communication device according to the fourth embodiment.
図１５に示すように、本実施形態の通話装置１は、レシーバ８と、第１のマイク９と、第２のマイク１０と、送話信号処理部７と、圧力センサ１５と、を備える。なお、本実施形態の通話装置１は、図１５には示していないＲＦ送受信部２、アンテナ３、ベースバンド処理部４、受話信号処理部８等を備える。 As shown in FIG. 15, the communication device 1 of this embodiment includes a receiver 8, a first microphone 9, a second microphone 10, a transmission signal processing unit 7, and a pressure sensor 15. Note that the communication device 1 according to the present embodiment also includes an RF transmission / reception unit 2, an antenna 3, a baseband processing unit 4, a received signal processing unit 8, and the like, which are not shown in FIG. 15.

圧力センサ１５は、通話時に通話装置１が利用者の頭部から受ける圧力の検出に用いる。そのため、圧力センサ１５は、通話装置１において通話時に利用者の頭部と対向する面内の利用者の頭部が接触する領域に印加される圧力を検出可能な態様で通話装置１に搭載される。 The pressure sensor 15 is used to detect the pressure that the communication device 1 receives from the user's head during a call. Therefore, the pressure sensor 15 is mounted on the communication device 1 in a manner that allows it to detect the pressure applied to the region, within the surface of the communication device 1 that faces the user's head during a call, with which the user's head comes into contact.

送話信号処理部７は、利用者音声検出部７０１と、信頼度算出部７０６と、骨導音補正部７０４と、伝達特性学習部７０２と、エコー抑圧部７０３と、第１の記憶部７１１と、第２の記憶部７１２と、を含む。 The transmission signal processing unit 7 includes a user voice detection unit 701, a reliability calculation unit 706, a bone conduction sound correction unit 704, a transfer characteristic learning unit 702, an echo suppression unit 703, and a first storage unit 711. And a second storage unit 712.

信頼度算出部７０６は、第２のマイク１０から入力された骨導音信号の信頼度を算出する。本実施形態の信頼度算出部７０６は、圧力センサ１５による圧力の検出結果に基づいて骨導音信号の信頼度を算出する。本実施形態の送話信号処理部７では、利用者音声検出部７０１において利用者音声を検出した場合に、信頼度算出部７０６において骨導音信号の信頼度を算出する。信頼度算出部７０６で算出した骨導音信号の信頼度は、伝達特性の学習に補正した骨導音信号を用いるか否かの判定に用いる。 The reliability calculation unit 706 calculates the reliability of the bone conduction sound signal input from the second microphone 10. The reliability calculation unit 706 of the present embodiment calculates the reliability of the bone conduction sound signal based on the pressure detection result by the pressure sensor 15. In the transmission signal processing unit 7 of the present embodiment, when the user voice is detected by the user voice detection unit 701, the reliability calculation unit 706 calculates the reliability of the bone conduction sound signal. The reliability of the bone conduction sound signal calculated by the reliability calculation unit 706 is used to determine whether or not the corrected bone conduction signal is used for learning transfer characteristics.

伝達特性学習部７０２は、レシーバ８から第１のマイク９に伝播する音の伝達特性を学習する。本実施形態に係る伝達特性学習部７０２は、利用者音声が検出されなかった場合、及び利用者音声を検出したが骨導音信号の信頼度が閾値以下である場合には、受話信号及び気導音信号のみを用いて伝達特性を学習する。また、伝達特性学習部７０２は、利用者音声を検出し、かつ骨導音信号の信頼度が閾値よりも大きい場合には、受話信号、気導音信号、及び補正した骨導音信号を用いて伝達特性を学習する。 The transfer characteristic learning unit 702 learns the transfer characteristic of sound propagating from the receiver 8 to the first microphone 9. When the user voice is not detected, or when the user voice is detected but the reliability of the bone conduction sound signal is equal to or less than the threshold value, the transfer characteristic learning unit 702 according to the present embodiment learns the transfer characteristic using only the received signal and the air conduction sound signal. When the user voice is detected and the reliability of the bone conduction sound signal is larger than the threshold value, the transfer characteristic learning unit 702 learns the transfer characteristic using the received signal, the air conduction sound signal, and the corrected bone conduction sound signal.

本実施形態の通話装置１と他の通話装置との呼接続が確立されると、音声信号処理部５の送話信号処理部７は、図１２Ａ及び図１２Ｂに示したエコー抑圧処理を行う。なお、本実施形態に係るエコー抑圧処理では、骨導音信号の信頼度Ｒを算出する処理（ステップＳ９）を信頼度算出部７０６が行う。信頼度算出部７０６は、圧力センサ１５が検出した圧力（言い換えると通話装置１が利用者の頭部から受ける圧力）に基づいて骨導音信号の信頼度Ｒを算出する。 When the call connection between the call device 1 of the present embodiment and another call device is established, the transmission signal processing unit 7 of the audio signal processing unit 5 performs the echo suppression processing shown in FIGS. 12A and 12B. In the echo suppression process according to the present embodiment, the reliability calculation unit 706 performs a process (step S9) of calculating the reliability R of the bone conduction sound signal. The reliability calculation unit 706 calculates the reliability R of the bone conduction sound signal based on the pressure detected by the pressure sensor 15 (in other words, the pressure received by the communication device 1 from the user's head).

FIG. 16 is a graph illustrating the method of calculating the reliability of the bone conduction sound signal in the fourth embodiment.

The reliability of the bone conduction sound signal is calculated from the pressure detected by the pressure sensor 15 based on, for example, the correspondence between the pressure P and the reliability R shown in FIG. 16. That is, when the reliability R of the bone conduction sound signal is calculated from the pressure P detected by the pressure sensor 15, the reliability R is calculated using Equation (10).

The first pressure threshold PL and the second pressure threshold PH are arbitrary values satisfying 0 < PL < PH, for example, PL = 0.2 kPa and PH = 1.2 kPa.
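
Equation (10) is not reproduced in this text, so the following is only a sketch of one mapping that would be consistent with the two thresholds described above. It assumes that R is 0 at or below PL, 1 at or above PH, and rises linearly in between. The piecewise-linear shape and the function name are assumptions, not the formula of the specification.

```python
def pressure_reliability(p_kpa: float, p_low: float = 0.2, p_high: float = 1.2) -> float:
    """Map a pressure in kPa to a reliability in [0, 1].

    Assumed shape (not taken from Equation (10)): a linear ramp between
    the first threshold PL (p_low) and the second threshold PH (p_high).
    """
    if not 0.0 < p_low < p_high:
        raise ValueError("thresholds must satisfy 0 < PL < PH")
    if p_kpa <= p_low:
        return 0.0  # device barely touches the head
    if p_kpa >= p_high:
        return 1.0  # device is pressed firmly against the head
    return (p_kpa - p_low) / (p_high - p_low)
```

The default thresholds are the example values PL = 0.2 kPa and PH = 1.2 kPa given above.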

The greater the force with which the user presses the call device 1 against the head, the greater the pressure P detected by the pressure sensor 15. When the user presses the call device 1 against the head with a force greater than a predetermined pressing force, the bone conduction sound is correctly transmitted from the user's head to the call device 1 (second microphone 10). Conversely, when the force with which the user presses the call device 1 against the head is small, the bone conduction sound transmitted from the user's head to the call device 1 (second microphone 10) becomes unstable. Therefore, in the echo suppression processing according to the present embodiment, the corrected bone conduction sound signal is used for learning the transfer characteristic only when the reliability R of the bone conduction sound signal, calculated from the pressure P that the call device 1 receives from the user's head, is greater than the threshold TH. This makes it possible to prevent an erroneous transfer characteristic from being learned (calculated) using a bone conduction sound signal that does not appropriately reflect the user's voice.

The reliability R of the bone conduction sound signal may also be calculated by Equation (11) below, using a first reliability Rcorr calculated based on the correlation coefficient between the bone conduction sound signal and the air conduction sound signal, and a second reliability Rp calculated based on the pressure P detected by the pressure sensor 15.

R = β × Rcorr + (1 − β) × Rp ... (11)

The first reliability Rcorr and the second reliability Rp in Equation (11) are calculated using, for example, Equation (8) and Equation (10), respectively. β in Equation (11) is a weighting coefficient. The weighting coefficient β is an arbitrary value satisfying 0 ≤ β ≤ 1, for example, β = 0.5.

Determining (calculating) the reliability R of the bone conduction sound signal from a plurality of reliabilities calculated using different information, as in Equation (11), makes it possible to increase the accuracy of the reliability R. This in turn makes it possible to more effectively prevent an erroneous transfer characteristic from being learned (calculated) using a bone conduction sound signal that does not appropriately reflect the user's voice.
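
The weighted combination of Equation (11) can be written out as a short sketch. The function name is hypothetical. Rcorr and Rp are assumed to have been computed beforehand by Equation (8) and Equation (10), which are not reproduced here.

```python
def combined_reliability(r_corr: float, r_p: float, beta: float = 0.5) -> float:
    """Combine two reliabilities as R = beta * Rcorr + (1 - beta) * Rp.

    r_corr: reliability from the bone/air correlation coefficient.
    r_p:    reliability from the pressure sensor reading.
    beta:   weighting coefficient, which must satisfy 0 <= beta <= 1.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must satisfy 0 <= beta <= 1")
    return beta * r_corr + (1.0 - beta) * r_p
```

With β = 1 the result depends only on the correlation-based reliability, and with β = 0 only on the pressure-based reliability. The example value β = 0.5 weights both equally.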

Note that the functional configurations of the transmission signal processing unit 7 (call device 1) shown in the first to fourth embodiments are merely examples, and other configurations may be used as long as the echo suppression processing described in each embodiment can be executed.

The flowcharts shown in FIGS. 3 to 6, FIGS. 8A and 8B, FIG. 9, FIG. 10, and FIGS. 12A and 12B are also merely examples, and the processing contents and procedures can be changed as appropriate.

The call device 1 according to the first to fourth embodiments can be realized using, for example, a computer and a program executed by the computer. The call device 1 realized using a computer and a program is described below with reference to FIG. 17.

FIG. 17 is a diagram illustrating the hardware configuration of a computer.
As shown in FIG. 17, the computer 20 includes a processor 2001, a main storage device 2002, an auxiliary storage device 2003, an input device 2004, a display device 2005, an interface device 2006, a communication control device 2007, and a storage medium drive device 2008. These elements 2001 to 2008 of the computer 20 are connected to one another by a bus 2010 so that data can be exchanged between them.

The processor 2001 is an arithmetic processing unit such as a Central Processing Unit (CPU), and controls the overall operation of the computer 20 by executing various programs including an operating system.

The main storage device 2002 includes a Read Only Memory (ROM) and a Random Access Memory (RAM), neither of which is shown. The ROM of the main storage device 2002 stores in advance, for example, a predetermined basic control program that the processor 2001 reads when the computer 20 starts up. The RAM of the main storage device 2002 is used as a working storage area as needed when the processor 2001 executes various programs. The RAM of the main storage device 2002 can be used to store, for example, the transfer characteristic and the bone conduction sound correction characteristic.

The auxiliary storage device 2003 is a storage device, such as a Hard Disk Drive (HDD) or a Solid State Drive (SSD), that has a larger capacity than the main storage device 2002. The auxiliary storage device 2003 can store various programs executed by the processor 2001, various data, and the like. The auxiliary storage device 2003 can be used to store, for example, a call program including the processes shown in FIGS. 3 to 6. The auxiliary storage device 2003 can also be used to store, for example, the transfer characteristic and the bone conduction sound correction characteristic.

The input device 2004 is, for example, a keyboard device or a touch panel device. When an operator (user) of the computer 20 performs an operation on the input device 2004, such as pressing it, the input device 2004 transmits the input information associated with that operation to the processor 2001.

The display device 2005 is, for example, a liquid crystal display. The display device 2005 displays various text screens, images, and the like according to display data transmitted from the processor 2001 or the like.

The interface device 2006 connects the computer 20 to other electronic devices and includes, for example, a Universal Serial Bus (USB) standard connector. Devices that can be connected to the computer 20 through the interface device 2006 include the receiver 8, the first microphone 9, and the second microphone 10.

The communication control device 2007 controls various kinds of communication between the computer 20 and other communication equipment via a network 21 such as a telephone network or the Internet. The communication controlled by the communication control device 2007 includes calls (transmission and reception of audio signals) between the computer 20 and another call device 22 via the network 21.

The storage medium drive device 2008 reads programs and data recorded on a portable storage medium (not shown) and writes data stored in the auxiliary storage device 2003 and the like to the portable storage medium. A flash memory equipped with a USB standard connector, for example, can be used as the portable storage medium. Optical discs such as a Compact Disk (CD), a Digital Versatile Disc (DVD), and a Blu-ray Disc (Blu-ray is a registered trademark) can also be used as the portable storage medium.

In the computer 20, the processor 2001 reads a program including the processes of FIGS. 3 to 6 from the auxiliary storage device 2003 or the like, and the computer 20 transmits and receives audio signals to and from the other call device 22 while suppressing the echo component of the air conduction sound signal input from the first microphone 9.
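
The echo suppression performed by such a program can be sketched as follows. The supplementary notes describe it as estimating the receiver output component contained in the air conduction sound signal from the transfer characteristic and the received signal, and then subtracting that estimate. The sketch assumes that the learned transfer characteristic is held as a time-domain impulse response and that the echo estimate is a direct convolution. Both are assumptions made for illustration, and the function name is hypothetical.

```python
from typing import List, Sequence


def suppress_echo(
    air: Sequence[float],
    received: Sequence[float],
    impulse_response: Sequence[float],
) -> List[float]:
    """Subtract the estimated receiver echo from the air conduction signal.

    impulse_response is an assumed time-domain form of the learned
    receiver-to-first-microphone transfer characteristic.
    """
    output = []
    for n in range(len(air)):
        # Echo estimate at sample n: past received samples weighted by
        # the impulse response taps (direct-form convolution).
        echo = 0.0
        for k, tap in enumerate(impulse_response):
            if 0 <= n - k < len(received):
                echo += tap * received[n - k]
        output.append(air[n] - echo)
    return output
```

If the impulse response matches the actual acoustic path, only the near-end component (the user's voice and ambient sound) remains in the output.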

Note that the computer 20 used as the call device 1 need not include all of the components shown in FIG. 17, and some components may be omitted depending on the application and conditions. For example, the interface device 2006 may be omitted, and the receiver 8, the first microphone 9, the second microphone 10, and the like may be connected directly to a printed circuit board.

The following supplementary notes are further disclosed with respect to the embodiments, including the examples described above.
(Supplementary Note 1)
A call device comprising:
a receiver that outputs an audio signal as air conduction sound;
a first microphone that picks up air conduction sound;
a second microphone that picks up bone conduction sound;
an echo suppression unit that suppresses an echo component included in an air conduction sound signal input from the first microphone, based on a transfer characteristic of sound from the receiver to the first microphone; and
a transfer characteristic learning unit that learns the transfer characteristic based on a received signal to be output from the receiver and the air conduction sound signal when an input level of a bone conduction sound signal input from the second microphone is equal to or less than a predetermined threshold.
(Supplementary Note 2)
The call device according to Supplementary Note 1, wherein,
when the input level of the bone conduction sound signal exceeds the threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 3)
The call device according to Supplementary Note 2, further comprising
a bone conduction sound correction unit that corrects the bone conduction sound signal based on the bone conduction sound signal and a bone conduction sound correction characteristic calculated in advance from the air conduction sound signal and the bone conduction sound signal, wherein,
when the input level of the bone conduction sound signal exceeds the threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting, from the air conduction sound signal, the bone conduction sound signal corrected by the bone conduction sound correction unit.
(Supplementary Note 4)
The call device according to Supplementary Note 2, further comprising
a reliability calculation unit that, when the input level of the bone conduction sound signal exceeds the threshold, calculates a reliability that the bone conduction sound signal is an audio signal reflecting the voice emitted by the sound source of the bone conduction sound, wherein,
when the reliability of the bone conduction sound signal exceeds a predetermined threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 5)
The call device according to Supplementary Note 4, wherein
the reliability calculation unit calculates the reliability of the bone conduction sound signal based on a correlation coefficient between the bone conduction sound signal and the air conduction sound signal.
(Supplementary Note 6)
The call device according to Supplementary Note 4, further comprising
a pressure sensor provided in the call device so as to detect pressure received from the sound source of the bone conduction sound, wherein
the reliability calculation unit calculates the reliability of the bone conduction sound signal based on the pressure detected by the pressure sensor.
(Supplementary Note 7)
The call device according to Supplementary Note 2, wherein,
when the input level of the bone conduction sound signal exceeds the threshold and the received signal contains voice, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 8)
An audio signal correction program that causes a computer to execute a process comprising:
acquiring an air conduction sound signal from a first microphone that picks up air conduction sound, and acquiring a bone conduction sound signal from a second microphone that picks up bone conduction sound;
learning, when an input level of the bone conduction sound signal is equal to or less than a predetermined threshold, a transfer characteristic of sound from a receiver to the first microphone based on a received signal to be output from the receiver and the air conduction sound signal; and
correcting the air conduction sound signal input from the first microphone based on the transfer characteristic.
(Supplementary Note 9)
The audio signal correction program according to Supplementary Note 8, further causing the computer to execute a process of
learning, when the input level of the bone conduction sound signal exceeds the threshold, the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 10)
The audio signal correction program according to Supplementary Note 9, wherein
the process of learning the transfer characteristic when the input level of the bone conduction sound signal exceeds the threshold includes:
correcting the bone conduction sound signal based on the bone conduction sound signal and a bone conduction sound correction characteristic calculated in advance from the air conduction sound signal and the bone conduction sound signal; and
learning the transfer characteristic based on the received signal and an audio signal obtained by subtracting the corrected bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 11)
The audio signal correction program according to Supplementary Note 9, wherein
the process of learning the transfer characteristic when the input level of the bone conduction sound signal exceeds the threshold includes:
calculating a reliability that the bone conduction sound signal is an audio signal reflecting the voice emitted by the sound source of the bone conduction sound; and
learning, when the reliability of the bone conduction sound signal exceeds a predetermined threshold, the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 12)
The audio signal correction program according to Supplementary Note 10, causing the computer to execute a process of
calculating the reliability of the bone conduction sound signal based on a correlation coefficient between the bone conduction sound signal and the air conduction sound signal.
(Supplementary Note 13)
The audio signal correction program according to Supplementary Note 10, wherein
the process of calculating the reliability of the bone conduction sound signal includes:
acquiring the pressure that a housing in which the second microphone is installed receives from the sound source of the bone conduction sound; and
calculating the reliability of the bone conduction sound signal based on the acquired pressure.
(Supplementary Note 14)
The audio signal correction program according to Supplementary Note 9, further causing the computer to execute a process of
learning, when the input level of the bone conduction sound signal exceeds the threshold, the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.
(Supplementary Note 15)
The audio signal correction program according to Supplementary Note 8, wherein
the process of correcting the air conduction sound signal includes:
estimating, based on the transfer characteristic and the received signal, a component of the sound output from the receiver that is included in the air conduction sound signal; and
subtracting the estimated component from the air conduction sound signal.

Description of Reference Signs
1 Call device
2 RF transceiver unit
3 Antenna
4 Baseband processing unit
5 Audio signal processing unit
6 Received signal processing unit
7 Transmission signal processing unit
701 User voice detection unit
702 Transfer characteristic learning unit
703 Echo suppression unit
704 Bone conduction sound correction unit
705, 706 Reliability calculation unit
711 (First) storage unit
712 Second storage unit
8 Receiver
9 First microphone
10 Second microphone
15 Pressure sensor
20 Computer
2001 Processor
2002 Main storage device
2003 Auxiliary storage device
2004 Input device
2005 Display device
2006 Interface device
2007 Communication control device
2008 Storage medium drive device
21 Network
22 Call device

Claims

1. A call device comprising:
a receiver that outputs an audio signal as air conduction sound;
a first microphone that picks up air conduction sound;
a second microphone that picks up bone conduction sound;
an echo suppression unit that suppresses an echo component included in an air conduction sound signal input from the first microphone, based on a transfer characteristic of sound from the receiver to the first microphone; and
a transfer characteristic learning unit that learns the transfer characteristic based on a received signal to be output from the receiver and the air conduction sound signal when an input level of a bone conduction sound signal input from the second microphone is equal to or less than a predetermined threshold.

2. The call device according to claim 1, wherein,
when the input level of the bone conduction sound signal exceeds the threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.

3. The call device according to claim 2, further comprising
a bone conduction sound correction unit that corrects the bone conduction sound signal based on the bone conduction sound signal and a bone conduction sound correction characteristic calculated in advance from the air conduction sound signal and the bone conduction sound signal, wherein,
when the input level of the bone conduction sound signal exceeds the threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting, from the air conduction sound signal, the bone conduction sound signal corrected by the bone conduction sound correction unit.

4. The call device according to claim 2, further comprising
a reliability calculation unit that, when the input level of the bone conduction sound signal exceeds the threshold, calculates a reliability that the bone conduction sound signal is an audio signal reflecting the voice emitted by the sound source of the bone conduction sound, wherein,
when the reliability of the bone conduction sound signal exceeds a predetermined threshold, the transfer characteristic learning unit learns the transfer characteristic based on the received signal and an audio signal obtained by subtracting the bone conduction sound signal from the air conduction sound signal.

5. The call device according to claim 4, wherein
the reliability calculation unit calculates the reliability of the bone conduction sound signal based on a correlation coefficient between the bone conduction sound signal and the air conduction sound signal.

6. The call device according to claim 4, further comprising
a pressure sensor provided in the call device so as to detect pressure received from the sound source of the bone conduction sound, wherein
the reliability calculation unit calculates the reliability of the bone conduction sound signal based on the pressure detected by the pressure sensor.

7. An audio signal correction program that causes a computer to execute a process comprising:
acquiring an air conduction sound signal from a first microphone that picks up air conduction sound, and acquiring a bone conduction sound signal from a second microphone that picks up bone conduction sound;
learning, when an input level of the bone conduction sound signal is equal to or less than a predetermined threshold, a transfer characteristic of sound from a receiver to the first microphone based on a received signal to be output from the receiver and the air conduction sound signal; and
correcting the air conduction sound signal input from the first microphone based on the transfer characteristic.