JP6409378B2

JP6409378B2 - Voice communication apparatus and program

Info

Publication number: JP6409378B2
Application number: JP2014143053A
Authority: JP
Inventors: 祐弘向嶋
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2014-07-11
Filing date: 2014-07-11
Publication date: 2018-10-24
Anticipated expiration: 2034-07-11
Also published as: JP2016019263A

Description

The present invention relates to a voice communication technique for exchanging voice between users.

In voice communication devices that transmit and receive, between a plurality of users over a communication network, an acoustic signal representing speech such as conversation uttered by a user, techniques for suppressing noise components in the acoustic signal picked up by a sound collecting device have long been available. For example, Patent Document 1 discloses a technique that removes from an acoustic signal the spectrum of a noise component estimated from that signal (an estimated noise spectrum). In the technique of Patent Document 1, a final estimated noise spectrum is generated from an estimated noise spectrum that depends on whether a speech component is present in the acoustic signal and an estimated noise spectrum estimated irrespective of the presence or absence of the speech component, and this final estimate is removed from the acoustic signal.

Patent Document 1: JP 2010-102204 A

With an advanced noise suppression technique such as that of Patent Document 1, the noise component contained in an acoustic signal can be suppressed with very high accuracy before the signal is transmitted to the call partner's voice communication device. However, if noise components that include the sound present around the speaker (hereinafter "environmental sound") are removed too thoroughly, the environmental sound around the speaker cannot be conveyed to the call partner. In view of these circumstances, an object is to convey the environmental sound around the user appropriately to the call partner.

To solve the above problem, a voice communication device according to the present invention includes: a receiving unit that receives a received signal transmitted from a call partner's communication device; a sound emitting unit that emits sound according to the received signal received by the receiving unit; a sound collecting unit that collects sound including a target sound and an environmental sound and generates a collected sound signal; a first signal processing unit that generates a first acoustic signal in which the target sound component of the collected sound signal generated by the sound collecting unit is emphasized; a second signal processing unit that generates a second acoustic signal in which the environmental sound component of the collected sound signal generated by the sound collecting unit is emphasized; a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and a transmitting unit that transmits the first acoustic signal and the second acoustic signal.
In this configuration, the first acoustic signal, in which the target sound component of the collected sound signal is emphasized, and the second acoustic signal, in which the environmental sound component is emphasized, are both transmitted to the call partner's communication device. The environmental sound around the user can therefore be conveyed appropriately to the call partner. The target sound is the sound that sound collection is intended to capture, specifically the voice uttered by the user of the voice communication device. The environmental sound, on the other hand, is any sound other than the target sound; it includes both the sound present around the user of the voice communication device (such as crowd noise or the operating sound of air-conditioning equipment) and feedback sound that is radiated from the sound emitting unit and picked up by the sound collecting unit. The feedback sound is, for example, the call partner's voice, transmitted from the call partner's communication device and radiated from the sound emitting unit.

The call partner's voice contained in the received signal reaches the sound collecting unit from the sound emitting unit as feedback sound and is therefore included in the environmental sound. In a configuration that maintains the level of the second acoustic signal regardless of the level of the received signal, the call partner's voice would circulate between the user's voice communication device and the call partner's voice communication device, which can cause howling. In view of this, in a preferred aspect of the present invention the control unit controls the level of the second acoustic signal so that it decreases as the level of the received signal increases. Compared with a configuration in which the level of the second acoustic signal is maintained regardless of the level of the received signal, this aspect has the advantage of effectively preventing howling.
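The text requires only that the level of the second acoustic signal fall as the received level rises; it does not fix the shape of that curve. A minimal sketch, assuming a hypothetical law that is linear in dB between two knee levels (the function name and every parameter value are illustrative, not from the source):

```python
def monotone_gain(level_db: float,
                  level_low_db: float = -60.0,
                  level_high_db: float = 0.0,
                  g_max: float = 1.0,
                  g_min: float = 0.05) -> float:
    """Hypothetical gain law: G never increases as the received level rises.

    Below level_low_db the environmental sound passes at g_max; above
    level_high_db it is attenuated to g_min; in between, G is
    interpolated linearly with the received level in dB.
    """
    if level_db <= level_low_db:
        return g_max
    if level_db >= level_high_db:
        return g_min
    frac = (level_db - level_low_db) / (level_high_db - level_low_db)
    return g_max + (g_min - g_max) * frac
```

Because G is non-increasing in the received level, the environmental-sound path contributes least to the feedback loop exactly when the far-end voice is loudest.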

In a preferred aspect of the present invention, the control unit lowers the level of the second acoustic signal when the level of the received signal exceeds a threshold. Because the level of the second acoustic signal is lowered only when the received level exceeds the threshold, this aspect conveys the environmental sound more appropriately than a configuration in which the level of the second acoustic signal always tracks the level of the received signal, whatever that level is.
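A hypothetical step law illustrating the threshold aspect: G stays at its normal value while the received level is at or below a threshold and is lowered only once the level exceeds it. The threshold and both gain values are assumptions chosen for illustration.

```python
def threshold_gain(level_db: float,
                   threshold_db: float = -30.0,
                   g_normal: float = 1.0,
                   g_reduced: float = 0.1) -> float:
    """Hypothetical step law for the environmental-sound gain G.

    Quiet far-end signals (e.g. background noise in the received signal)
    leave G untouched; only a received level above the threshold lowers it.
    """
    return g_reduced if level_db > threshold_db else g_normal
```

Unlike a law that tracks every change in the received level, this leaves the transmitted environmental sound steady whenever the far end is quiet.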

In a preferred aspect of the present invention, the control unit divides the received signal into voice sections and insertion sections (sections other than voice sections), and controls the level of the second acoustic signal so that, in insertion sections, the variation of the level of the second acoustic signal with respect to the level of the received signal is smaller than in voice sections. This reduces the possibility that the call partner perceives something unnatural because of fluctuations in the environmental sound within an insertion section. A voice section is a section of the received signal in which the call partner's voice predominates; an insertion section is any other section (for example, a section between consecutive voice sections in which the speaker's utterance pauses).
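One way to obtain smaller variation in insertion sections is to let G track its level-dependent target quickly in voice sections and slowly in insertion sections. The sketch below assumes a crude level-threshold voice activity test and one-pole smoothing with two different coefficients; neither is specified in the source, and all names and values are hypothetical.

```python
from typing import Callable, List, Sequence


def control_gain(levels_db: Sequence[float],
                 target_gain: Callable[[float], float],
                 vad_threshold_db: float = -45.0,
                 alpha_voice: float = 0.5,
                 alpha_gap: float = 0.98,
                 g_init: float = 1.0) -> List[float]:
    """Per-frame gain G with slower tracking in insertion sections.

    A frame counts as part of a voice section when its level exceeds
    vad_threshold_db (a hypothetical, crude voice activity test).
    G moves toward target_gain(level) by one-pole smoothing; the larger
    alpha_gap makes G change less per frame in insertion sections.
    """
    g = g_init
    out = []
    for level in levels_db:
        alpha = alpha_voice if level > vad_threshold_db else alpha_gap
        g = alpha * g + (1.0 - alpha) * target_gain(level)
        out.append(g)
    return out
```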

In a preferred aspect of the present invention, when the duration of an insertion section is below a threshold, the control unit controls the level of the second acoustic signal so that, in that insertion section, the variation of the level of the second acoustic signal with respect to the level of the received signal is smaller than in voice sections; when the duration of the insertion section exceeds the threshold, the control unit controls the level of the second acoustic signal so that, in that insertion section, this variation is equivalent to that in voice sections. In other words, when the insertion section is longer than the threshold (a situation in which the call partner's series of utterances is presumed to have ended), the level of the second acoustic signal is controlled according to the level of the received signal in the same way as in voice sections; when it is shorter than the threshold, this level-dependent control is restrained compared with voice sections.
Therefore, when the insertion section is shorter than the threshold, the possibility that the user perceives something unnatural because of fluctuations in the environmental sound is reduced. When the insertion section is longer than the threshold, on the other hand, the environmental sound on the user's side can be conveyed appropriately to the call partner. This aspect thus has the advantage of conveying environmental sound to the call partner at a level suited to the call partner's speaking situation.
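Extending the same kind of hypothetical sketch to the gap-length aspect: the elapsed length of the current insertion section is counted in frames, and once it reaches a threshold the gain tracks the received level as quickly as in a voice section. Counting elapsed frames, rather than knowing the total gap length in advance, is an assumption made so the control stays causal; the voice activity test, smoothing coefficients, and frame threshold are all illustrative.

```python
from typing import Callable, List, Sequence


def control_gain_with_gap_length(levels_db: Sequence[float],
                                 target_gain: Callable[[float], float],
                                 vad_threshold_db: float = -45.0,
                                 alpha_voice: float = 0.5,
                                 alpha_gap: float = 0.98,
                                 max_gap_frames: int = 10,
                                 g_init: float = 1.0) -> List[float]:
    """Per-frame gain G whose smoothing depends on the current gap length."""
    g = g_init
    gap_len = 0  # frames elapsed in the current insertion section
    out = []
    for level in levels_db:
        if level > vad_threshold_db:
            gap_len = 0
            alpha = alpha_voice
        else:
            gap_len += 1
            # Short pause: restrained tracking. Long pause (far-end talk
            # presumed finished): track as quickly as in a voice section.
            alpha = alpha_gap if gap_len < max_gap_frames else alpha_voice
        g = alpha * g + (1.0 - alpha) * target_gain(level)
        out.append(g)
    return out
```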

The voice communication device according to the present invention may be realized by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to voice communication processing, or by cooperation between a general-purpose arithmetic processing unit such as a CPU (Central Processing Unit) and a program. A program according to the present invention causes a computer to function as: a receiving unit that receives a received signal transmitted from a call partner's communication device; a sound emitting unit that emits sound according to the received signal received by the receiving unit; a sound collecting unit that collects sound including a target sound and an environmental sound and generates a collected sound signal; a first signal processing unit that generates a first acoustic signal in which the target sound component of the collected sound signal is emphasized; a second signal processing unit that generates a second acoustic signal in which the environmental sound component of the collected sound signal is emphasized; a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and a transmitting unit that transmits the first acoustic signal and the second acoustic signal.
The program of the present invention can be provided stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a typical example, but any known type of recording medium, such as a semiconductor or magnetic recording medium, may be used. The program may also be provided by distribution over a communication network and installed on a computer. The present invention may also be specified as an operating method (voice communication method) of the voice communication device according to each of the above aspects.

FIG. 1 is a diagram showing the configuration of a communication system according to a first embodiment.
FIG. 2 is a block diagram of a voice communication device.
FIG. 3 is an explanatory diagram of a specific form of the voice communication device.
FIG. 4 is a block diagram of a first signal processing unit.
FIG. 5 is a block diagram of an articulation processing unit.
FIG. 6 is a block diagram of a control unit.
FIG. 7 is an explanatory diagram of the relationship between the level of a received signal and an adjustment value.
FIG. 8 is an explanatory diagram of the change in the adjustment value according to the level of the received signal.
FIG. 9 is a block diagram of a control unit according to a second embodiment.
FIG. 10 is an explanatory diagram of the relationship between the level and the adjustment value in a voice section and an insertion section.
FIG. 11 is an explanatory diagram of the relationship between the level of the received signal and the adjustment value in a third embodiment.
FIG. 12 is a block diagram of a voice communication device according to a fourth embodiment.
FIG. 13 is a block diagram of a voice communication device according to a modification.

<First Embodiment>
FIG. 1 is a diagram showing the configuration of a communication system that uses a voice communication device according to the first embodiment of the present invention. As illustrated in FIG. 1, the communication system 100 includes a communication network 200 and a plurality of voice communication devices D (D1, D2). Each voice communication device D is, for example, a communication terminal carried by a user, and conducts voice calls with other voice communication devices D via the communication network 200. The communication network 200 (for example, a mobile communication network) consists of many relay devices, including base stations and switching centers. For convenience, FIG. 1 shows only two voice communication devices D (D1, D2) that communicate with each other. The following description assumes that user U1 and user U2 talk using the voice communication device D1 used by user U1 and the voice communication device D2 used by user U2. The configuration and operation are described with a focus on the voice communication device D1 for convenience; the voice communication device D2 is configured and operates in the same way.

FIG. 2 is a block diagram of the voice communication device D1. The voice communication device D1 transmits to the communication network 200 an acoustic signal ST (hereinafter the "transmission signal") representing the surrounding sound, such as the voice uttered by user U1; receives from the communication network 200 an acoustic signal SR (hereinafter the "received signal") representing sound that includes the voice of user U2, the call partner; and emits sound according to the received signal SR. It includes an acoustic processing unit 10, a sound collecting unit 20, a communication unit 30, a control unit 40, and a sound emitting unit 50. Each element illustrated in FIG. 2 (for example, the acoustic processing unit 10 and the control unit 40) is realized, for example, by an arithmetic processing unit (CPU) executing a program stored in a recording medium. A configuration in which the functions of the acoustic processing unit 10 are distributed over multiple integrated circuits, or one in which a dedicated electronic circuit (DSP) realizes each function, may also be adopted. For convenience, the A/D converter that converts acoustic signals into digital signals and the D/A converter that converts acoustic signals into analog signals are not shown.

The sound collecting unit 20 is an acoustic device that collects surrounding sound and generates collected sound signals M (MA1, MA2, MB); it includes a plurality of sound collecting devices 22 (22A1, 22A2, 22B) placed apart from one another. The sound emitting unit 50 is an acoustic device (for example, a speaker or earphone) that emits sound according to the received signal SR received from user U2's voice communication device D2 via the communication network 200.

A mixture of target sound and environmental sound arrives at the sound collecting unit 20. The target sound is the sound that sound collection is intended to capture, specifically the voice of user U1. The environmental sound is any sound other than the target sound; it includes the sound present around user U1 (for example, crowd noise or the operating sound of air-conditioning equipment) and the feedback sound that is radiated from the sound emitting unit 50 and picked up by the sound collecting devices 22. The feedback sound is, for example, the voice of user U2 transmitted from the voice communication device D2.

FIG. 3 is an external view of the voice communication device D according to the first embodiment. FIG. 3 illustrates a glasses-type wearable terminal as the voice communication device D1. The voice communication device D1 is an electronic device including a main body 60 positioned in front of both eyes of user U1 and support portions 62 and 64 provided on either side of the main body 60. The sound collecting device 22A1 is installed at the base end (the main body 60 side) of the support portion 62, which rests on the left ear of user U1, and the sound collecting device 22A2 is installed at the base end of the support portion 64, which rests on the right ear of user U1. The sound collecting devices 22A1 and 22A2 are thus separated from each other by a distance d1. The sound collecting device 22B is installed at the distal end of the support portion 64 (the end opposite the sound collecting device 22A2, behind user U1 as seen from the user).

The sound collecting devices 22A (22A1, 22A2) are omnidirectional microphones placed to pick up the target sound. Each sound collecting device 22A generates a collected sound signal MA (MA1, MA2) representing the waveform of the surrounding sound (a mixture of target sound and environmental sound). The sound collecting device 22B, by contrast, is an omnidirectional microphone placed to pick up environmental sound, and generates a collected sound signal MB representing the waveform of sound in which environmental sound predominates over target sound. With the sound collecting unit 20 and the sound emitting unit 50 arranged on the support portions 62 and 64 as described above, user U1 can make loudspeaker (hands-free) calls.

The acoustic processing unit 10 in FIG. 2 generates the transmission signal ST according to the collected sound signals M (MA1, MA2, MB) generated by the sound collecting unit 20. The communication unit 30 is a communication device (antenna and modulation/demodulation circuit) that includes a transmitting unit 32 and a receiving unit 34 and communicates with the voice communication device D2 via the communication network 200. The transmitting unit 32 transmits the transmission signal ST to the communication network 200, addressed to the voice communication device D2. The receiving unit 34 receives from the communication network 200, as the received signal SR, the transmission signal ST transmitted by the voice communication device D2. As described above, sound according to the received signal SR received by the receiving unit 34 is radiated from the sound emitting unit 50.

As illustrated in FIG. 2, the acoustic processing unit 10 of the first embodiment includes a first signal processing unit 11, a second signal processing unit 12, and an adding unit 13. The first signal processing unit 11 generates a first acoustic signal S1 in which the target sound component of the collected sound signals MA generated by the sound collecting unit 20 (sound collecting devices 22A1 and 22A2) is emphasized (that is, the environmental sound component is suppressed).

FIG. 4 is a block diagram of the first signal processing unit 11. As illustrated in FIG. 4, the first signal processing unit 11 includes a reverberation suppression unit 111, a directivity control unit 112, a reverberation suppression unit 113, a noise suppression unit 114, a band emphasis unit 115, and an intensity adjustment unit 116. The reverberation suppression unit 111 estimates the echo component E superimposed on the collected sound signals MA (MA1, MA2) by adaptive filtering that uses the received signal SR (supplied from the receiving unit 34 to the sound emitting unit 50) and the collected sound signals MA generated by the sound collecting devices 22 (22A1, 22A2), and generates acoustic signals X1 (X1a, X1b) by suppressing the estimated echo component E in the collected sound signals MA. The estimated echo component E is an estimate of the feedback sound arriving at the sound collecting unit 20 from the sound emitting unit 50.
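The text says only that unit 111 uses adaptive filtering. A minimal single-channel sketch, assuming the normalized LMS (NLMS) algorithm, a common choice for acoustic echo cancellation; the tap count, step size, and the echo path in the demo are hypothetical. In the device, the same processing would run once per collected sound signal (MA1 and MA2).

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_suppress(far_end: Sequence[float], mic: Sequence[float],
                       taps: int = 8, mu: float = 0.5,
                       eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Subtract an adaptively estimated echo of far_end (SR) from mic (MA).

    Returns the echo-suppressed samples (X1) and the final filter weights.
    """
    w = [0.0] * taps    # running estimate of the speaker-to-mic echo path
    buf = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo E
        e = d - echo_est                                     # echo-suppressed sample
        norm = eps + sum(bi * bi for bi in buf)              # input power for normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


def simulate_echo(far_end: Sequence[float], path: Sequence[float]) -> List[float]:
    """Convolve far_end with a hypothetical echo path to fake a mic signal."""
    return [sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
            for n in range(len(far_end))]


if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    mic = simulate_echo(far, [0.6, 0.3, -0.1])  # hypothetical echo path
    res, w = nlms_echo_suppress(far, mic)
    print("residual peak (last 200 samples):", max(abs(v) for v in res[-200:]))
    print("first three taps:", [round(v, 3) for v in w[:3]])
```

Normalizing the step by the far-end input power keeps the adaptation speed roughly independent of how loud the call partner is.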

The directivity control unit 112 controls the directivity of the sound collecting devices 22A (22A1, 22A2). Specifically, the directivity control unit 112 uses a known beamforming process (for example, delay-and-sum beamforming) to steer the sound collection beam (the region of high pickup sensitivity) toward the mouth of user U1, and generates an acoustic signal X2 in which the target sound component of the acoustic signals X1 (X1a, X1b) is emphasized. The reverberation suppression unit 113 generates an acoustic signal X3 by suppressing the estimated echo component E in the acoustic signal X2. The estimated echo component E is suppressed in both the reverberation suppression unit 111 and the reverberation suppression unit 113 because a single suppression pass by the reverberation suppression unit 111 cannot suppress it sufficiently. The noise suppression unit 114 generates an acoustic signal X4 by suppressing noise components (acoustic components other than the target sound component) in the acoustic signal X3; any known noise suppression process, such as spectral subtraction, may be used.
The band emphasis unit 115 controls (equalizes) the frequency characteristics of the acoustic signal X4 so that the components in the frequency band containing the target sound component (the voice) are emphasized relative to other bands, thereby generating an acoustic signal X5. The intensity adjustment unit 116 generates the first acoustic signal S1 by adjusting the dynamic range of the level of the acoustic signal X5 for each frequency band (dynamic range control).
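A minimal sketch of delay-and-sum beamforming, the example the text gives for the directivity control unit 112: each channel is delayed so that the target sound lines up across channels, then the channels are averaged, which reinforces the target while partially cancelling sound from other directions. Integer sample delays and the path-length input are simplifying assumptions; practical implementations typically use fractional delays, and the function names are hypothetical.

```python
from typing import List, Sequence


def steering_delays(path_lengths_m: Sequence[float], fs: float,
                    c: float = 343.0) -> List[int]:
    """Integer sample delays that time-align a source whose acoustic path
    to each microphone has the given length (hypothetical geometry input).
    The microphone nearest the source receives the largest delay."""
    longest = max(path_lengths_m)
    return [int(round((longest - p) / c * fs)) for p in path_lengths_m]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Delay each channel by its sample delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, dly in zip(channels, delays):
        for t in range(n):
            src = t - dly
            if 0 <= src < n:
                out[t] += ch[src]
    return [v / len(channels) for v in out]
```

For example, if the target reaches the second microphone two samples after the first, delaying the first channel by two samples makes both copies add coherently.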

The second signal processing unit 12 in FIG. 2 generates a second acoustic signal S2 in which the environmental sound component of the collected sound signal MB generated by the sound collecting unit 20 (sound collecting device 22B) is emphasized (that is, the target sound component is suppressed). As illustrated in FIG. 2, the second signal processing unit 12 includes an articulation processing unit 120 and an adjustment unit 130.

FIG. 5 is a block diagram of the articulation processing unit 120. The articulation processing unit 120 of the first embodiment generates an environmental sound signal SE in which the environmental sound component of the collected sound signal MB is emphasized; as illustrated in FIG. 5, it includes a noise suppression unit 121, a band emphasis unit 122, and an intensity adjustment unit 123.

The noise suppression unit 121 generates an acoustic signal Y1 by suppressing, in the collected sound signal MB generated by the sound collecting device 22B, a noise component (for example, hiss noise) specific to the device constituting the sound emitting unit 50. The band emphasis unit 122 generates an acoustic signal Y2 by controlling (equalizing) the frequency characteristics of the acoustic signal Y1 so that components in the frequency band containing the environmental sound component are emphasized relative to other bands. The intensity adjustment unit 123 generates the environmental sound signal SE by adjusting the dynamic range of the level of the acoustic signal Y2 for each frequency band.
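The text does not specify the per-band dynamic range control used by the intensity adjustment units 116 and 123. A minimal sketch, assuming a static downward compressor applied independently to each band's level; band splitting, level detection, and attack/release smoothing are omitted, and the function names, thresholds, and ratios are hypothetical.

```python
from typing import List, Sequence


def compressor_gain_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static downward-compression gain (dB) for one band.

    Below the threshold the band passes unchanged; above it, the output
    rises only 1/ratio dB for each dB of input.
    """
    if level_db <= threshold_db:
        return 0.0
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db


def per_band_drc(band_levels_db: Sequence[float],
                 thresholds_db: Sequence[float],
                 ratios: Sequence[float]) -> List[float]:
    """Gain in dB to apply to each frequency band."""
    return [compressor_gain_db(lv, th, r)
            for lv, th, r in zip(band_levels_db, thresholds_db, ratios)]
```

Giving each band its own threshold and ratio is what makes the control per band: a loud low-frequency rumble can be compressed without touching quieter high-frequency content.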

The adjustment unit 130 in FIG. 2 generates the second acoustic signal S2 by adjusting the environmental sound signal SE generated by the articulation processing unit 120 according to an adjustment value G. Specifically, a multiplier that multiplies the environmental sound signal SE by the adjustment value G is well suited as the adjustment unit 130. The level of the second acoustic signal S2 is therefore adjusted according to the adjustment value (gain) G.

The adding unit 13 in FIG. 2 generates the transmission signal ST by adding the first acoustic signal S1 generated by the first signal processing unit 11 and the second acoustic signal S2 generated by the second signal processing unit 12. The summed transmission signal ST is transmitted from the transmitting unit 32 via the communication network 200 to user U2's voice communication device D2.
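Together, the adjustment unit 130 and the adding unit 13 compute ST = S1 + G·SE sample by sample. A minimal sketch for one block of samples with a single gain value (the function name is hypothetical):

```python
from typing import List, Sequence


def make_transmission_signal(s1: Sequence[float], se: Sequence[float],
                             g: float) -> List[float]:
    """Mix the target-sound path and the gain-adjusted environmental path."""
    s2 = [g * v for v in se]                # adjustment unit 130: S2 = G * SE
    return [a + b for a, b in zip(s1, s2)]  # adding unit 13: ST = S1 + S2
```

With G = 0 the transmission signal reduces to S1 alone, which is the limiting case the level control approaches when the received signal is loud.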

The control unit 40 variably controls the adjustment value G used by the adjustment unit 130. In the first embodiment, the control unit 40 controls G according to the level of the received signal SR, which the receiving unit 34 receives and supplies to the sound emitting unit 50. In other words, the control unit 40 functions as an element that controls the level of the second acoustic signal S2 according to the level of the received signal SR.

FIG. 6 is a block diagram of the control unit 40. As illustrated in FIG. 6, the control unit 40 includes a level calculation unit 42 and an adjustment value setting unit 44. The level calculation unit 42 calculates the level LE of the received signal SR. Any known technique may be used for this; for example, the level LE can be obtained by smoothing the power of the received signal SR along the time axis.
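
One possible form of the smoothing mentioned above is a one-pole (exponential) smoother applied to per-frame power. The sketch below assumes frame-based processing and a hypothetical smoothing coefficient `alpha`; it is one known technique among many, not necessarily the one used by the level calculation unit 42.

```python
import math

def frame_power(frame):
    """Mean-square power of one frame of received-signal samples."""
    return sum(x * x for x in frame) / len(frame)

def smoothed_levels_db(frames, alpha=0.9, eps=1e-12):
    """Level LE per frame: exponentially smoothed frame power, in dB.

    alpha (hypothetical) controls how slowly LE follows changes in SR.
    """
    p, levels = 0.0, []
    for frame in frames:
        p = alpha * p + (1.0 - alpha) * frame_power(frame)
        levels.append(10 * math.log10(p + eps))  # eps avoids log10(0) in silence
    return levels
```

A larger `alpha` gives a steadier LE but a slower response to the start and end of user U2's speech.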

The adjustment value setting unit 44 sets the adjustment value G according to the level LE of the received signal SR calculated by the level calculation unit 42. In the first embodiment, the adjustment value setting unit 44 uses an adjustment value table (not shown) that defines the relationship between the level LE and the adjustment value G; specifically, it reads from the table the adjustment value G corresponding to the level LE. As illustrated in FIG. 7, in the first embodiment the control unit 40 sets G so that G decreases as the level LE increases. Therefore, the higher the level LE of the received signal SR (that is, the louder the remote user U2 speaks), the lower the level of the second acoustic signal S2. Although a configuration using an adjustment value table has been described, G may instead be computed from the level LE by a predetermined calculation.
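
A table lookup of the kind described above can be sketched with linear interpolation between table entries. The two table points below are hypothetical values chosen only to give the decreasing relationship of FIG. 7; interpolation between entries is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical adjustment value table: (level LE in dB, adjustment value G).
GAIN_TABLE = [(-60.0, 1.0), (-20.0, 0.1)]

def gain_from_table(level_db, table=GAIN_TABLE):
    """Look up G for level LE, interpolating linearly between table entries."""
    levels = [lvl for lvl, _ in table]
    if level_db <= levels[0]:
        return table[0][1]   # quiet far end: keep G at its maximum
    if level_db >= levels[-1]:
        return table[-1][1]  # loud far end: G at its minimum
    i = bisect_right(levels, level_db)
    (l0, g0), (l1, g1) = table[i - 1], table[i]
    t = (level_db - l0) / (l1 - l0)
    return g0 + t * (g1 - g0)
```

For instance, a level halfway between the two entries (-40 dB) yields G = 0.55.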

FIG. 8 shows a specific example of how the adjustment value G varies with the level LE of the received signal SR. While the level LE is sufficiently low (t1 to t2), G is held at its maximum value (for example, 1). When user U2 starts speaking and LE rises, G decreases over time in step with LE (t2 to t3). When user U2's series of utterances nears its end and LE falls, G increases over time in step with LE (t3 to t4).

As these examples show, the level of the second acoustic signal S2, which contains the environmental sound component, varies from moment to moment according to whether the remote user U2 is speaking (the level LE of the received signal SR). Specifically, while user U2's voice is quiet (for example, while user U2 is silent and listening to user U1), the transmission signal ST sent to user U2's voice communication device D2 consists of the first acoustic signal S1 of user U1's voice (the target sound component) plus the second acoustic signal S2, which predominantly contains the environmental sound around user U1. On the other hand, while the feedback sound traveling from the sound emitting unit 50 to the sound collecting unit 20 predominantly contains user U2's voice (that is, while user U2 is speaking), the level of the second acoustic signal S2, which contains that feedback sound, is lowered, so the transmission signal ST sent to D2 contains little environmental sound and mostly user U1's voice.

As described above, in the first embodiment both the first acoustic signal S1, in which the target sound component of the sound collection signal MA collected by the sound collecting devices 22A (22A1, 22A2) is emphasized, and the second acoustic signal S2, in which the environmental sound component is emphasized, are transmitted to the voice communication device D2. Compared with a configuration that transmits only S1 to D2, this has the advantage that user U2 of D2 can hear not only user U1's voice but also the environmental sound around user U1. Moreover, because the level of S2 is adjusted independently of S1 in the first embodiment, the environmental sound around user U1 can be conveyed to user U2 at an appropriate level.

One conceivable way to generate a transmission signal ST in which a moderate level of environmental sound is added to user U1's voice (that is, to convey environmental sound to user U2) is to omit the second signal processing unit 12 of the first embodiment and have the first signal processing unit 11 leave the environmental sound component of the sound collection signal MA in the transmission signal ST at a moderate level instead of removing it completely (hereinafter "Comparative Example 1"). However, in Comparative Example 1, where the target sound component and the environmental sound component are processed together in a single path, the processing that suppresses the environmental sound component inevitably introduces waveform distortion into the target sound component, which can degrade sound quality (for example, reduce the perceived clarity of the target sound).
In contrast to Comparative Example 1, the first embodiment emphasizes the target sound component in the first signal processing unit 11 and the environmental sound component in the second signal processing unit 12 separately, and then transmits a transmission signal ST containing both processed signals S1 and S2. The first signal processing unit 11 can therefore fully emphasize the target sound component while preserving its quality (particularly its clarity) using acoustic processing optimized for that purpose, while the second signal processing unit 12 can fully emphasize the environmental sound component while preserving its quality using acoustic processing optimized for that purpose. As a result, a transmission signal ST in which both components retain high sound quality can be sent to user U2. In particular, because the clarity of each component is preserved, it is possible to generate a transmission signal ST with a sense of presence, from which user U2 can clearly perceive that user U1 is speaking amid the surrounding environmental sound.

If the only goal were to add the second acoustic signal S2, in which the environmental sound component is dominant, to the first acoustic signal S1, in which the target sound component is dominant, one could also omit the elements that control the level of S2 (the control unit 40 and the adjustment unit 130) and add the environmental sound signal SE generated by the articulation processing unit 120 directly to S1 as the second acoustic signal S2 (hereinafter "Comparative Example 2"). However, user U2's voice contained in the received signal SR travels from the sound emitting unit 50 to the sound collecting unit 20 as feedback sound and is therefore picked up along with the environmental sound. In Comparative Example 2, where the level of S2 is maintained regardless of the level LE of the received signal SR, user U2's voice can circulate between the voice communication devices D1 and D2 and may eventually cause howling. In contrast to Comparative Example 2, the first embodiment controls the level of S2 (the level of the environmental sound component added to S1) according to the level LE of the received signal SR.
Specifically, the higher the level LE, the lower the level of S2; that is, as described above, S2 is held at a low level while user U2 is speaking. The first embodiment can therefore prevent howling caused by user U2's voice more effectively than Comparative Example 2.

As the above description implies, the volume of the environmental sound around user U1 rises and falls from moment to moment in the sound that user U2 hears, so user U2 might perceive it as unnatural. However, people generally pay little attention to incoming sound while they themselves are speaking (they do not listen closely to others while talking), so the possibility that user U2 finds the fluctuating volume of user U1's environmental sound unnatural is not a significant problem.

<Second Embodiment>
A second embodiment of the present invention is described below. In the first embodiment, the level of the second acoustic signal S2 was controlled according to the level LE of the received signal SR. With that configuration, if the environmental sound fluctuates during a section in which user U2's voice pauses between consecutive utterances, user U2 may perceive it as unnatural. In view of this, the second embodiment sets the adjustment value G differently in sections of the received signal SR where user U2's voice is dominant (hereinafter "voice sections") and in the remaining sections (hereinafter "insertion sections"). In each embodiment below, elements whose operation and function are the same as in the first embodiment keep the reference signs used in the description of the first embodiment, and their detailed description is omitted as appropriate.

FIG. 9 is a block diagram of the control unit 40 in the second embodiment, and FIG. 10 illustrates how the adjustment value G is set in the second embodiment. As illustrated in FIG. 9, the adjustment value setting unit 44 of the second embodiment includes a section detection unit 46. As illustrated in FIG. 10, the section detection unit 46 divides the received signal SR along the time axis into voice sections TV (TV1, TV2) and insertion sections TD (TD1, TD2). An insertion section TD is, for example, a section in which user U2's voice pauses between consecutive utterances. The section detection unit 46 divides SR into voice sections TV and insertion sections TD by, for example, determining the presence or absence of voice in each of the frames obtained by dividing SR along the time axis. Any known voice detection technique may be used to distinguish voice sections TV from insertion sections TD.
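
The frame-wise classification and grouping performed by the section detection unit 46 can be sketched as below. As a stand-in for "any known voice detection technique", this sketch uses a simple per-frame level threshold; the threshold value and the function names are hypothetical. Sections are returned as (label, first frame, number of frames); note that this sketch also labels silence before the first utterance as "TD".

```python
from itertools import groupby

def is_voice(levels_db, threshold_db=-45.0):
    """Per-frame voice decision (hypothetical energy-threshold detector)."""
    return [lvl > threshold_db for lvl in levels_db]

def segment_sections(flags):
    """Group consecutive frames into voice sections TV and insertion sections TD."""
    sections, start = [], 0
    for voiced, run in groupby(flags):
        n = sum(1 for _ in run)  # length of this run of equal decisions
        sections.append(("TV" if voiced else "TD", start, n))
        start += n
    return sections
```

The duration of each "TD" entry (its frame count times the frame length) is what the adjustment value setting unit 44 would compare against the threshold T0.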

FIG. 10 illustrates the adjustment value G in the voice sections TV and the insertion sections TD. Of the sections along the time axis, FIG. 10 shows, for convenience, the voice sections TV1 and TV2 and the insertion sections TD1 and TD2. The adjustment value setting unit 44 of the second embodiment sets G separately for voice sections TV and insertion sections TD. The setting of G in each kind of section is described in detail below.

Within a voice section TV, the adjustment value setting unit 44 sets G according to the level LE of the received signal SR, as in the first embodiment. Specifically, it sets G so that the higher the level of SR, the lower the level of the second acoustic signal S2.

In an insertion section TD, on the other hand, G is set according to the speaking state of user U2 of the voice communication device D2. Specifically, as exemplified below, the adjustment value setting unit 44 sets G according to the result of comparing the duration of the insertion section TD with a threshold T0.

(a) When the duration of the insertion section TD is below the threshold T0 (TD < T0)
When the duration of the insertion section TD is below T0 (that is, when the gap between consecutive voice sections TV is short), user U2 of the voice communication device D2 is presumed to be in the middle of speaking. If the level of the environmental sound component (the second acoustic signal S2) fluctuated between consecutive utterances in the same way as in a voice section TV, user U2 might perceive it as unnatural. With this in mind, the adjustment value setting unit 44 sets G so that the variation of the level of S2 with the level LE of the received signal SR is smaller than in a voice section TV. Specifically, as illustrated by insertion section TD1 in FIG. 10, G is set so that its variation with LE (the rate of change of G with respect to LE) is gentler than in a voice section TV. G may also be held constant in an insertion section TD shorter than T0 (that is, the rate of change of G with respect to LE may be set to zero).

(b) When the duration of the insertion section TD exceeds the threshold T0 (TD > T0)
When the duration of the insertion section TD exceeds T0 (that is, when enough time has passed since the end of the preceding voice section TV without user U2 speaking), user U2's series of utterances is presumed to have ended (for example, the turn has passed to user U1 and user U2 is listening to user U1 of the voice communication device D1), so the environmental sound around user U1 should preferably be conveyed to user U2 at an appropriate level. With this in mind, the adjustment value setting unit 44 sets G so that the variation of the level of S2 with the level LE of the received signal SR is the same as in the voice section TV1. Specifically, as illustrated by insertion section TD2 in FIG. 10, control of G in the same manner as in a voice section TV starts when the time elapsed since the end of the preceding voice section TV2 exceeds T0. One could also raise G to its maximum value according to LE as soon as the elapsed time since the end of TV2 exceeds T0. However, if G rises abruptly, user U2 may perceive it as unnatural. Therefore, as illustrated in FIG. 10, G is increased gradually so that the environmental sound around user U1 is conveyed to user U2 at an appropriate level.

The adjustment unit 130 controls the level of the second acoustic signal S2 by multiplying the environmental sound signal SE by the adjustment value G set for each voice section TV and insertion section TD. The subsequent processing is the same as in the first embodiment, so its detailed description is omitted.

The second embodiment achieves the same effects as the first embodiment. In addition, in the second embodiment, within insertion sections TD of the received signal SR (those whose duration is below the threshold T0), the level of the second acoustic signal S2 is controlled so that its variation is smaller than in voice sections TV. This has the advantage of reducing the likelihood that user U2 perceives fluctuations of the environmental sound within an insertion section TD as unnatural.

On the other hand, in an insertion section TD whose duration exceeds T0 (where user U2's series of utterances is presumed to have ended), control of the level of S2 according to the level LE of the received signal SR returns to the same control as in a voice section TV. Therefore, after user U2 finishes speaking, the environmental sound on user U1's side can be conveyed appropriately to user U2. As this shows, the second embodiment can convey environmental sound to user U2 at a level appropriate to user U2's speaking state.

<Third Embodiment>
In the first embodiment, the adjustment value G was varied linearly with the level LE of the received signal SR, but the way G varies with LE is not limited to this example. For example, in a range where LE is low enough that the received signal SR does not cause howling, conveying environmental sound to user U2 of the voice communication device D2 should take priority over suppressing howling by lowering the level of the second acoustic signal S2. In view of this, in the third embodiment G is set so that, when LE exceeds a predetermined threshold L0, G decreases as LE increases.

FIG. 11 illustrates the relationship between the level LE of the received signal SR and the adjustment value G (the adjustment value table) in the third embodiment. As illustrated in FIG. 11, when LE is below the threshold L0 (LE < L0), G is held at a predetermined value independent of LE (for example, the maximum value 1). When LE exceeds L0 (LE > L0), on the other hand, G is set so that it decreases as LE increases. Specifically, the portion of the range of LE above L0 is divided into a plurality of ranges, and the rate of change of G with respect to LE is set individually for each range. FIG. 11 illustrates a case in which the rate of change (slope) of G with respect to LE is larger for ranges of higher LE.
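
The table of FIG. 11 can be sketched as a piecewise-linear function: G stays at its maximum up to L0, and above L0 each range has its own slope, with steeper slopes for higher ranges. The value of L0, the range widths, and the slopes below are hypothetical and serve only to illustrate the shape.

```python
def gain_piecewise(level_db, l0_db=-40.0,
                   segments=((10.0, 0.01), (10.0, 0.03), (float("inf"), 0.05)),
                   g_max=1.0, g_min=0.0):
    """Third-embodiment-style mapping (sketch) from level LE to adjustment value G.

    segments: (range width in dB, drop in G per dB), listed from L0 upward.
    """
    if level_db <= l0_db:
        return g_max  # below L0: G does not depend on LE
    g, remaining = g_max, level_db - l0_db
    for width, slope in segments:
        step = min(remaining, width)
        g -= slope * step  # each range contributes its own slope
        remaining -= step
        if remaining <= 0:
            break
    return max(g, g_min)
```

With these values, G is 1.0 up to -40 dB, 0.95 at -35 dB, 0.6 at -20 dB, and 0.1 at -10 dB, falling faster in each successive range.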

The third embodiment achieves the same effects as the first embodiment. In addition, in the third embodiment G is reduced (and hence the level of the second acoustic signal S2 is lowered) only when the level LE of the received signal SR exceeds the threshold L0; that is, G is set so that the degree to which G changes depends on LE. Compared with a configuration in which the level of S2 is linked to LE to the same degree regardless of LE, this has the advantage that environmental sound is conveyed more appropriately.

<Fourth Embodiment>
FIG. 12 is a block diagram of the voice communication device D1 of the fourth embodiment. In the first embodiment, the sound collecting device 22B for collecting environmental sound was installed separately from the sound collecting devices 22A for collecting the target sound. In the fourth embodiment, as illustrated in FIG. 12, a single sound collecting device 22A2 is shared for collecting both the target sound and the environmental sound. The sound collection signal MA2 generated by the sound collecting device 22A2 is supplied to both the first signal processing unit 11 and the second signal processing unit 12, and the second signal processing unit 12 generates the second acoustic signal S2 by emphasizing the environmental sound component of MA2. Because separate sound collecting devices for the target sound and the environmental sound are unnecessary, this configuration has the advantage of simplifying the device configuration of the voice communication device D1. The sound collecting device 22A1 may be a directional microphone aimed at the mouth of the user of D1, and the sound collecting device 22A2, shared between the target sound and the environmental sound, may be an omnidirectional microphone.

<Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are exemplified below. Two or more modes freely selected from the following examples may be combined as appropriate.

(1) The components of the first signal processing unit 11 and the second signal processing unit 12 in the above embodiments are optional, and the elements illustrated in FIGS. 4 and 5 may be omitted as appropriate. For example, the directivity control unit 112 may be omitted from the first signal processing unit 11. A configuration that includes the directivity control unit 112, as in the above embodiments, needs a plurality of sound collecting devices 22 (22A1, 22A2) for the beamforming process that controls the directivity, but in a configuration without it (one that performs no beamforming), the sound collecting unit 20 may consist of a single sound collecting device 22A1, as illustrated in FIG. 13, for example. The sound collection signal MA1 collected by the sound collecting device 22A1 is then supplied to both the first signal processing unit 11 and the second signal processing unit 12. As this shows, the sound collecting unit 20 of each embodiment is broadly an element that collects sound containing the target sound and the environmental sound and generates the sound collection signal M; the number of sound collecting devices 22 constituting it and whether they are directional do not matter.

(2) In each of the above embodiments, the transmission signal ST, obtained by adding the first acoustic signal S1 (corresponding to the target sound component) and the second acoustic signal S2 (corresponding to the environmental sound component), is transmitted to the voice communication device D2 of the other party. Alternatively, the first acoustic signal S1 and the second acoustic signal S2 may be transmitted separately and added at the voice communication device D2; in that case, the adding unit 13 of the voice communication device D1 may be omitted. As the above description shows, the transmission unit 32 of each embodiment is understood broadly as an element that transmits the first acoustic signal S1 and the second acoustic signal S2, and covers both a configuration that transmits their sum (the transmission signal ST) and a configuration that transmits each of the first acoustic signal S1 and the second acoustic signal S2 individually.
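
The two transmission configurations in (2) can be compared with a minimal sketch. The dictionary "payload" standing in for what crosses the communication network 200 is a hypothetical stand-in, not a format defined in the patent; the point is only where the addition of S1 and S2 takes place.

```python
import numpy as np


def send_mixed(s1: np.ndarray, s2: np.ndarray) -> dict:
    """Adding unit 13 present in D1: one stream ST = S1 + S2 is sent."""
    return {"ST": s1 + s2}


def send_separate(s1: np.ndarray, s2: np.ndarray) -> dict:
    """Adding unit 13 omitted in D1: S1 and S2 are sent as separate streams."""
    return {"S1": s1, "S2": s2}


def reconstruct_at_receiver(payload: dict) -> np.ndarray:
    """D2 obtains the signal to emit, adding S1 and S2 itself if needed."""
    if "ST" in payload:
        return payload["ST"]
    return payload["S1"] + payload["S2"]
```

Both routes deliver the same sum to D2; what changes is where the addition is performed and how many streams are transmitted.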

(3) A broadband IP (Internet Protocol) network or a public wireless LAN (Wi-Fi) may be suitably employed as the communication network 200. Because the frequency components of uttered speech (the target sound component) lie in a relatively low band whereas those of environmental sound lie in a relatively high band, a broadband communication system conforming to a high-speed data communication standard is preferable.

(4) In each of the above embodiments, a glasses-type wearable terminal is given as an example of the voice communication device D, but the voice communication device D may take any form as long as it is an electronic device capable of voice calls that the user can carry. For example, any known communication terminal, such as a mobile phone or a smartphone, may be used as the voice communication device D.

(5) The above embodiments illustrate a configuration in which the environmental sound around the user is conveyed to the other party together with the target sound. In practice, however, the user may sometimes not want to let the other party know where they are. Accordingly, a configuration may also be adopted in which the user can freely select between an operation mode that adds the environmental sound to the target sound as in the above embodiments and an operation mode that does not add the environmental sound (an operation mode that disables the operation of the second signal processing unit 12).
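
A minimal sketch of the user-selectable mode in (5), assuming the adding unit 13 is present and representing the second signal processing unit 12 by a hypothetical callable. The mode names are illustrative. In the environment-free mode the second path is not executed at all, matching "disables the operation of the second signal processing unit 12".

```python
from enum import Enum
from typing import Callable

import numpy as np


class Mode(Enum):
    WITH_ENVIRONMENT = "with_environment"  # behaviour of the embodiments
    TARGET_ONLY = "target_only"            # second signal processing disabled


def transmit_signal(s1: np.ndarray, collected: np.ndarray, mode: Mode,
                    second_processing: Callable[[np.ndarray], np.ndarray]
                    ) -> np.ndarray:
    """Build the signal to send, honouring the user's mode selection."""
    if mode is Mode.TARGET_ONLY:
        # The second path is never run, so no environmental sound is sent.
        return s1.copy()
    # Otherwise add the environmental-sound path as in the embodiments.
    return s1 + second_processing(collected)
```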

(6) In each of the above embodiments, the adjustment value G varies linearly with the level LE of the received signal SR, but the manner in which the adjustment value G varies with the level LE is not limited to this example. For example, a configuration in which the adjustment value G varies nonlinearly with the level LE of the received signal SR, or a configuration in which the relationship between the level LE and the adjustment value G is defined by a curve, may also be employed.
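
The difference between a linear and a curved mapping in (6) can be illustrated with two functions that share the same endpoints. Both decrease as the level LE rises, consistent with claim 2. The level unit (dB), the breakpoints `low_db`/`high_db` and the gain limits `g_max`/`g_min` are hypothetical values, not taken from the patent; the raised-cosine curve is just one possible smooth shape.

```python
import math


def _position(level_db: float, low_db: float, high_db: float) -> float:
    """Normalised position of LE between the breakpoints, clamped to [0, 1]."""
    t = (level_db - low_db) / (high_db - low_db)
    return min(max(t, 0.0), 1.0)


def gain_linear(level_db: float, low_db: float = -60.0, high_db: float = -20.0,
                g_max: float = 1.0, g_min: float = 0.1) -> float:
    """G falls on a straight line from g_max to g_min as LE rises."""
    t = _position(level_db, low_db, high_db)
    return g_max + (g_min - g_max) * t


def gain_curve(level_db: float, low_db: float = -60.0, high_db: float = -20.0,
               g_max: float = 1.0, g_min: float = 0.1) -> float:
    """Same endpoints, but a raised-cosine curve with zero slope at both ends."""
    t = _position(level_db, low_db, high_db)
    s = 0.5 - 0.5 * math.cos(math.pi * t)  # smooth 0 -> 1
    return g_max + (g_min - g_max) * s
```

The curved variant changes G gently near the breakpoints, which may avoid abrupt jumps in the level of the second acoustic signal; whether that matters depends on the level range actually used.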

D: voice communication device; 10: acoustic processing unit; 11: first signal processing unit; 12: second signal processing unit; 20: sound collection unit; 22A1, 22A2, 22B: sound collecting devices; 30: communication unit; 32: transmission unit; 34: receiving unit; 120: articulation processing unit; 130: adjustment unit; 40: control unit; 42: level calculation unit; 44: adjustment value setting unit; 200: communication network.

Claims

1. A voice communication device comprising:
a receiving unit that receives a received signal transmitted from a communication device of the other party;
a sound emitting unit that emits sound according to the received signal received by the receiving unit;
a sound collection unit that collects sound including a target sound and an environmental sound and generates a sound collection signal;
a first signal processing unit that generates a first acoustic signal by first signal processing that emphasizes the target sound component relative to the environmental sound component of the sound collection signal generated by the sound collection unit;
a second signal processing unit that generates a second acoustic signal by second signal processing, different from the first signal processing, that emphasizes the environmental sound component relative to the target sound component of the sound collection signal generated by the sound collection unit;
a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and
a transmission unit that transmits the first acoustic signal and the second acoustic signal.

2. The voice communication device according to claim 1, wherein the control unit controls the level of the second acoustic signal such that the level of the second acoustic signal decreases as the level of the received signal increases.

3. The voice communication device according to claim 1, wherein the control unit divides the received signal into voice sections and insertion sections other than the voice sections, and controls the level of the second acoustic signal such that, in the insertion sections, the variation of the level of the second acoustic signal with respect to the level of the received signal is smaller than in the voice sections.

4. The voice communication device according to claim 3, wherein the control unit controls the level of the second acoustic signal such that, when the time length of an insertion section is less than a threshold, the variation of the level of the second acoustic signal with respect to the level of the received signal in that insertion section is smaller than in the voice sections, and, when the time length of the insertion section exceeds the threshold, the variation of the level of the second acoustic signal with respect to the level of the received signal in that insertion section is equal to that in the voice sections.

5. A program causing a computer to function as:
a receiving unit that receives a received signal transmitted from a communication device of the other party;
a sound emitting unit that emits sound according to the received signal received by the receiving unit;
a sound collection unit that collects sound including a target sound and an environmental sound and generates a sound collection signal;
a first signal processing unit that generates a first acoustic signal by first signal processing that emphasizes the target sound component relative to the environmental sound component of the sound collection signal generated by the sound collection unit;
a second signal processing unit that generates a second acoustic signal by second signal processing, different from the first signal processing, that emphasizes the environmental sound component relative to the target sound component of the sound collection signal generated by the sound collection unit;
a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and
a transmission unit that transmits the first acoustic signal and the second acoustic signal.
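
The section-dependent level control of claims 3 and 4 can be read in several ways; the sketch below is one possible reading under stated assumptions. It works on hypothetical fixed-length frames, takes voice/insertion labels as given (for example from a voice activity detector, which the claims do not specify), and implements "smaller variation" in the strongest form: holding the gain of the preceding voice frame throughout a short insertion section. Because the decision depends on the whole length of each insertion section, this version looks ahead over the entire signal; a real-time implementation would need buffering or an approximation. A section exactly as long as the threshold is treated as long, since the claim leaves that boundary open.

```python
from __future__ import annotations

from typing import Callable, Sequence


def control_gains(levels: Sequence[float], is_voice: Sequence[bool],
                  threshold_frames: int,
                  gain_of_level: Callable[[float], float]) -> list[float]:
    """Per-frame gain for the second acoustic signal (illustrative reading).

    levels: per-frame level of the received signal
    is_voice: True for voice-section frames, False for insertion frames
    threshold_frames: hypothetical threshold on insertion-section length
    gain_of_level: level-to-gain mapping, e.g. decreasing as in claim 2
    """
    if len(levels) != len(is_voice):
        raise ValueError("levels and is_voice must have the same length")
    # Voice-section behaviour: the gain follows the received level.
    gains = [gain_of_level(lv) for lv in levels]
    i, n = 0, len(levels)
    while i < n:
        if is_voice[i]:
            i += 1
            continue
        # Frames i .. j-1 form one insertion section.
        j = i
        while j < n and not is_voice[j]:
            j += 1
        if j - i < threshold_frames:
            # Short section: hold the preceding voice frame's gain (or the
            # first frame's gain if the signal starts with an insertion).
            held = gains[i - 1] if i > 0 else gains[i]
            gains[i:j] = [held] * (j - i)
        # Long section: keep tracking the level, exactly as in voice sections.
        i = j
    return gains
```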