JP6409378B2 - Voice communication apparatus and program - Google Patents

Voice communication apparatus and program

Info

Publication number
JP6409378B2
Authority
JP
Japan
Prior art keywords
sound
signal
level
unit
acoustic signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2014143053A
Other languages
Japanese (ja)
Other versions
JP2016019263A (en
Inventor
祐弘 向嶋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corp
Priority to JP2014143053A
Publication of JP2016019263A
Application granted
Publication of JP6409378B2
Legal status: Active
Anticipated expiration

Description

The present invention relates to a voice communication technique for exchanging voice between users.

Voice communication apparatuses transmit and receive, among a plurality of users via a communication network, acoustic signals representing speech such as conversation uttered by the users. Techniques for suppressing noise components in the acoustic signal picked up by the sound collecting device of such an apparatus have long been available. For example, Patent Document 1 discloses a technique that removes from the acoustic signal the spectrum of a noise component estimated from that signal (an estimated noise spectrum). In the technique of Patent Document 1, a final estimated noise spectrum is generated from two estimates, one that depends on the presence or absence of a speech component in the acoustic signal and one estimated independently of the presence or absence of the speech component, and this final spectrum is removed from the acoustic signal.

Patent Document 1: JP 2010-102204 A

With an advanced noise suppression technique such as that of Patent Document 1, the noise component contained in the acoustic signal can be suppressed with very high accuracy before the signal is transmitted to the voice communication apparatus of the other party. However, if noise components, including the sound present around the speaker (hereinafter "environmental sound"), are removed too thoroughly, the environmental sound around the speaker cannot be conveyed to the other party. In view of these circumstances, an object of the present invention is to convey the environmental sound around the user to the other party appropriately.

To solve the above problem, a voice communication apparatus according to the present invention includes: a receiving unit that receives a received signal transmitted from a communication apparatus of the other party; a sound emitting unit that emits sound corresponding to the received signal received by the receiving unit; a sound collecting unit that collects sound including a target sound and an environmental sound and generates a collected sound signal; a first signal processing unit that generates a first acoustic signal in which the target sound component of the collected sound signal generated by the sound collecting unit is emphasized; a second signal processing unit that generates a second acoustic signal in which the environmental sound component of the collected sound signal generated by the sound collecting unit is emphasized; a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and a transmitting unit that transmits the first acoustic signal and the second acoustic signal.
In this configuration, the first acoustic signal, in which the target sound component of the collected sound signal is emphasized, and the second acoustic signal, in which the environmental sound component is emphasized, are both transmitted to the communication apparatus of the other party. The environmental sound around the user can therefore be conveyed to the other party appropriately. The target sound is the sound that the apparatus is intended to collect, specifically the voice uttered by the user of the voice communication apparatus. The environmental sound is any sound other than the target sound. It includes sound present around the user of the voice communication apparatus (for example, the bustle of a crowd or the operating noise of air conditioning equipment) and feedback sound that is radiated from the sound emitting unit and picked up by the sound collecting unit. The feedback sound is, for example, the voice of the other party that was transmitted from the other party's communication apparatus and radiated from the sound emitting unit.

The voice of the other party contained in the received signal is also contained in the environmental sound, as feedback sound that travels from the sound emitting unit to the sound collecting unit. Consequently, in a configuration in which the level of the second acoustic signal is maintained regardless of the level of the received signal, for example, the voice of the other party can circulate between the user's voice communication apparatus and the other party's voice communication apparatus and eventually cause howling. In view of this, the control unit in a preferred aspect of the present invention controls the level of the second acoustic signal so that it decreases as the level of the received signal increases. Compared with a configuration in which the level of the second acoustic signal is maintained regardless of the level of the received signal, this aspect has the advantage that howling can be prevented effectively.
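As a minimal sketch of this aspect, the mapping from received-signal level to the gain applied to the second acoustic signal can be any monotonically non-increasing function. The linear ramp in decibels, the function name `gain_for_level`, and every numeric constant below are illustrative assumptions and are not taken from the patent.

```python
def gain_for_level(level_db, quiet_db=-60.0, loud_db=-20.0,
                   g_max=1.0, g_min=0.1):
    """Gain for the second acoustic signal as a function of the
    received-signal level. All constants are hypothetical."""
    if level_db <= quiet_db:
        return g_max            # other party is silent: pass environmental sound
    if level_db >= loud_db:
        return g_min            # other party is loud: attenuate strongly
    # Linear interpolation from g_max down to g_min across the ramp.
    t = (level_db - quiet_db) / (loud_db - quiet_db)
    return g_max + t * (g_min - g_max)
```

Because the gain never rises when the received level rises, the feedback component of the environmental sound is attenuated exactly when it is strongest.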

In a preferred aspect of the present invention, the control unit controls the level of the second acoustic signal so that it decreases when the level of the received signal exceeds a threshold. Compared with a configuration in which the level of the second acoustic signal is always linked to the level of the received signal, however low that level is, this aspect has the advantage that the environmental sound is conveyed appropriately.
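A threshold-based variant could look like the following sketch. The name `threshold_gain`, the linear level scale, and the default values are assumptions made only for illustration.

```python
def threshold_gain(level, threshold=0.1, g_full=1.0, g_reduced=0.3):
    """Return the reduced gain only while the received-signal level
    exceeds the threshold; otherwise leave the second acoustic signal
    at full gain. All default values are hypothetical."""
    return g_reduced if level > threshold else g_full
```

Below the threshold the environmental sound is transmitted unchanged, so weak received signals do not disturb it.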

In a preferred aspect of the present invention, the control unit divides the received signal into voice sections and insertion sections (sections other than voice sections) and, in an insertion section, controls the level of the second acoustic signal so that its variation with respect to the level of the received signal is reduced compared with a voice section. This aspect has the advantage of reducing the possibility that the other party perceives something unnatural because of fluctuation of the environmental sound within an insertion section. A voice section is a section of the received signal in which the voice of the other party is dominant. An insertion section is any other section, for example a section between two successive voice sections in which the speaker's utterance is interrupted.
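One way to picture the reduced variation in insertion sections is a gain that follows its target immediately during voice sections but only slowly during insertion sections. The sketch below assumes a simple level threshold as the voice/insertion classifier and first-order tracking with two step sizes. Neither choice comes from the patent, and the name `track_gain` and all constants are hypothetical.

```python
def track_gain(levels, vad_threshold=0.05, g_max=1.0, g_min=0.2,
               step_voice=1.0, step_insert=0.1):
    """Per-frame gain for the second acoustic signal.

    levels: received-signal level per frame (linear scale).
    A frame counts as a voice section when its level exceeds
    vad_threshold (an assumed stand-in for a real voice detector).
    """
    gains = []
    g = g_max
    for level in levels:
        in_voice = level > vad_threshold
        target = g_min if in_voice else g_max
        # Smaller step in insertion sections: the gain varies less there.
        step = step_voice if in_voice else step_insert
        g += step * (target - g)
        gains.append(g)
    return gains
```

With these defaults, a short pause between two utterances raises the gain only slightly, so the environmental sound heard by the other party does not pump up and down.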

In a preferred aspect of the present invention, when the duration of an insertion section is below a threshold, the control unit controls the level of the second acoustic signal in that insertion section so that its variation with respect to the level of the received signal is reduced compared with a voice section. When the duration of the insertion section exceeds the threshold, the control unit controls the level of the second acoustic signal in that insertion section so that its variation with respect to the level of the received signal is equivalent to that in a voice section. In other words, when the duration exceeds the threshold (a situation in which the other party's series of utterances is presumed to have ended), the level of the second acoustic signal is controlled according to the level of the received signal in the same way as in a voice section. When the duration is below the threshold, that control is suppressed compared with a voice section.
Therefore, when the duration of the insertion section is below the threshold, the possibility that the other party perceives something unnatural because of fluctuation of the environmental sound is reduced. When the duration exceeds the threshold, the environmental sound on the user's side can be conveyed to the other party appropriately. This aspect thus has the advantage that environmental sound of a level suited to the other party's speaking situation can be conveyed to the other party.
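The duration test can be sketched with a frame counter. Because a running system cannot know the final length of a pause in advance, this sketch compares the elapsed length of the current pause against the threshold, which is an assumption, as are the level-threshold voice detector, the two-valued gain, the name `gains_with_hold`, and all constants.

```python
def gains_with_hold(levels, vad_threshold=0.05, hold_frames=3,
                    g_max=1.0, g_min=0.2):
    """Per-frame gain for the second acoustic signal.

    Short pauses (up to hold_frames frames) keep the previous gain;
    longer pauses are treated like voice sections, i.e. the gain again
    follows the received-signal level. All values are hypothetical.
    """
    gains = []
    g = g_max
    pause_len = 0
    for level in levels:
        if level > vad_threshold:      # voice section
            pause_len = 0
            g = g_min
        else:                          # insertion section
            pause_len += 1
            if pause_len > hold_frames:
                g = g_max              # long pause: level-dependent control resumes
            # otherwise hold the previous gain (reduced variation)
        gains.append(g)
    return gains
```

For example, a two-frame pause between utterances leaves the gain untouched, while a pause longer than three frames restores the full environmental sound.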

The voice communication apparatus according to the present invention can be realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to processing related to voice communication, or by cooperation between a general-purpose arithmetic processing unit such as a CPU (Central Processing Unit) and a program. A program according to the present invention causes a computer to function as: a receiving unit that receives a received signal transmitted from a communication apparatus of the other party; a sound emitting unit that emits sound corresponding to the received signal received by the receiving unit; a sound collecting unit that collects sound including a target sound and an environmental sound and generates a collected sound signal; a first signal processing unit that generates a first acoustic signal in which the target sound component of the collected sound signal is emphasized; a second signal processing unit that generates a second acoustic signal in which the environmental sound component of the collected sound signal is emphasized; a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and a transmitting unit that transmits the first acoustic signal and the second acoustic signal.
The program of the present invention can be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium. An optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, may be used. The program of the present invention can also be provided by distribution over a communication network and installed on a computer. The present invention is also specified as a method of operating the voice communication apparatus according to each of the above aspects (a voice communication method).

A diagram showing the configuration of a communication system according to the first embodiment.
A block diagram of the voice communication apparatus.
An explanatory diagram of a specific form of the voice communication apparatus.
A block diagram of the first signal processing unit.
A block diagram of the articulation processing unit.
A block diagram of the control unit.
An explanatory diagram of the relationship between the level of the received signal and the adjustment value.
An explanatory diagram of the change in the adjustment value according to the level of the received signal.
A block diagram of the control unit according to the second embodiment.
An explanatory diagram of the relationship between the level and the adjustment value in voice sections and insertion sections.
An explanatory diagram of the relationship between the level of the received signal and the adjustment value in the third embodiment.
A block diagram of the voice communication apparatus according to the fourth embodiment.
A block diagram of the voice communication apparatus according to a modification.

<First Embodiment>
FIG. 1 is a diagram showing the configuration of a communication system that uses the voice communication apparatus according to the first embodiment of the present invention. As illustrated in FIG. 1, the communication system 100 includes a communication network 200 and a plurality of voice communication apparatuses D (D1, D2). Each voice communication apparatus D is, for example, a communication terminal carried by a user, and conducts voice calls with other voice communication apparatuses D via the communication network 200. The communication network 200 (for example, a mobile communication network) consists of a large number of relay devices, including base stations and switching stations. For convenience, FIG. 1 shows only two voice communication apparatuses D (D1, D2) that communicate with each other. The following description assumes that user U1 and user U2 talk to each other using the voice communication apparatus D1 used by user U1 and the voice communication apparatus D2 used by user U2. For convenience, the configuration and operation are described with a focus on the voice communication apparatus D1, but the configuration and operation of the voice communication apparatus D2 are the same.

FIG. 2 is a block diagram of the voice communication apparatus D1. The voice communication apparatus D1 transmits to the communication network 200 an acoustic signal ST (hereinafter "transmission signal") representing surrounding sound such as the voice uttered by user U1, receives from the communication network 200 an acoustic signal SR (hereinafter "received signal") representing sound that includes the voice of user U2, the other party, and emits sound corresponding to the received signal SR. It includes an acoustic processing unit 10, a sound collecting unit 20, a communication unit 30, a control unit 40, and a sound emitting unit 50. Each element illustrated in FIG. 2 (for example, the acoustic processing unit 10 and the control unit 40) is realized by, for example, an arithmetic processing unit (CPU) executing a program stored on any of various recording media. A configuration in which the functions of the acoustic processing unit 10 are distributed over a plurality of integrated circuits, or in which a dedicated electronic circuit (DSP) realizes each function, may also be adopted. The A/D converter that converts acoustic signals into digital signals and the D/A converter that converts acoustic signals into analog signals are omitted from the figure for convenience.

The sound collecting unit 20 is audio equipment that collects surrounding sound and generates collected sound signals M (MA1, MA2, MB). It includes a plurality of sound collecting devices 22 (22A1, 22A2, 22B) arranged apart from one another. The sound emitting unit 50 is audio equipment (for example, a speaker or an earphone) that emits sound corresponding to the received signal SR received from the voice communication apparatus D2 of user U2 via the communication network 200.

A mixture of the target sound and the environmental sound arrives at the sound collecting unit 20. The target sound is the sound that the apparatus is intended to collect, specifically the voice uttered by user U1. The environmental sound is any sound other than the target sound. It includes sound present around user U1 (for example, the bustle of a crowd or the operating noise of air conditioning equipment) and feedback sound that is radiated from the sound emitting unit 50 and picked up by the sound collecting devices 22. The feedback sound is, for example, the voice of user U2 transmitted from the voice communication apparatus D2.

FIG. 3 is an external view of the voice communication apparatus D of the first embodiment. In FIG. 3, a glasses-type wearable terminal is illustrated as the voice communication apparatus D1. The voice communication apparatus D1 is an electronic device that includes a main body 60 positioned in front of both eyes of user U1 and support portions 62 and 64 installed on either side of the main body 60. The sound collecting device 22A1 is installed at the base end (the main body 60 side) of the support portion 62, which is worn on the left ear of user U1, and the sound collecting device 22A2 is installed at the base end of the support portion 64, which is worn on the right ear of user U1. The sound collecting devices 22A1 and 22A2 are thus separated from each other by a distance d1. The sound collecting device 22B is installed at the tip of the support portion 64 (the end opposite the sound collecting device 22A2, toward the rear as seen from user U1).

The sound collecting devices 22A (22A1, 22A2) are omnidirectional microphones arranged to collect the target sound. Each sound collecting device 22A generates a collected sound signal MA (MA1, MA2) representing the waveform of the surrounding sound (a mixture of the target sound and the environmental sound). The sound collecting device 22B is an omnidirectional microphone arranged to collect the environmental sound, and generates a collected sound signal MB representing the waveform of sound in which the environmental sound predominates over the target sound. With the sound collecting unit 20 and the sound emitting unit 50 arranged on the support portions 62 and 64 as described above, user U1 can make loudspeaker calls (hands-free calls).

The acoustic processing unit 10 in FIG. 2 generates the transmission signal ST from the collected sound signals M (MA1, MA2, MB) generated by the sound collecting unit 20. The communication unit 30 is communication equipment (an antenna and a modulation/demodulation circuit) that includes a transmitting unit 32 and a receiving unit 34 and communicates with the voice communication apparatus D2 via the communication network 200. The transmitting unit 32 transmits the transmission signal ST to the communication network 200 with the voice communication apparatus D2 as the destination. The receiving unit 34 receives from the communication network 200, as the received signal SR, the transmission signal ST transmitted by the voice communication apparatus D2. As described above, sound corresponding to the received signal SR received by the receiving unit 34 is radiated from the sound emitting unit 50.

As illustrated in FIG. 2, the acoustic processing unit 10 of the first embodiment includes a first signal processing unit 11, a second signal processing unit 12, and an addition unit 13. The first signal processing unit 11 generates a first acoustic signal S1 in which the target sound component of the collected sound signals MA generated by the sound collecting unit 20 (sound collecting devices 22A1 and 22A2) is emphasized (the environmental sound component is suppressed).

FIG. 4 is a block diagram of the first signal processing unit 11. As illustrated in FIG. 4, the first signal processing unit 11 includes a reverberation suppression unit 111, a directivity control unit 112, a reverberation suppression unit 113, a noise suppression unit 114, a band emphasis unit 115, and an intensity adjustment unit 116. The reverberation suppression unit 111 performs adaptive filter processing using the received signal SR supplied from the receiving unit 34 to the sound emitting unit 50 and the collected sound signals MA (MA1, MA2) generated by the sound collecting devices 22 (22A1, 22A2). It thereby estimates the estimated echo component E superimposed on the collected sound signals MA (MA1, MA2) and suppresses the estimated echo component E in those signals to generate acoustic signals X1 (X1a, X1b). The estimated echo component E is an acoustic component that estimates the feedback sound arriving at the sound collecting unit 20 from the sound emitting unit 50.
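The patent specifies only "adaptive filter processing" for estimating the echo component E. As one possible illustration, the sketch below uses a normalized LMS (NLMS) filter, a common choice for acoustic echo cancellation. The algorithm choice, the function name `nlms_echo_cancel`, and the parameter values are assumptions. Here `reference` plays the role of the received signal SR, `mic` the role of one collected sound signal MA, and the returned residual the role of an acoustic signal X1.

```python
def nlms_echo_cancel(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Suppress the echo of `reference` contained in `mic` with NLMS.

    Returns (residual, weights). taps, mu and eps are hypothetical
    tuning values, not taken from the patent.
    """
    w = [0.0] * taps          # adaptive filter coefficients
    buf = [0.0] * taps        # most recent reference samples, newest first
    residual = []
    for r, m in zip(reference, mic):
        buf = [r] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo E
        e = m - echo_est                                    # echo-suppressed sample
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

When the microphone signal contains only a delayed, scaled copy of the reference, the filter coefficients converge to that delay and scale and the residual tends to zero.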

The directivity control unit 112 controls the directivity of the sound collecting devices 22A (22A1, 22A2). Specifically, the directivity control unit 112 uses a known beamforming process (for example, delay-and-sum beamforming) to steer the sound collection beam (the region of high sound collection sensitivity) toward the mouth of user U1, and generates an acoustic signal X2 in which the target sound component of the acoustic signals X1 (X1a, X1b) is emphasized. The reverberation suppression unit 113 generates an acoustic signal X3 by suppressing the estimated echo component E in the acoustic signal X2. The estimated echo component E is suppressed by both the reverberation suppression unit 111 and the reverberation suppression unit 113 because a single suppression by the reverberation suppression unit 111 cannot suppress it sufficiently. The noise suppression unit 114 generates an acoustic signal X4 by suppressing the noise component (acoustic components other than the target sound component) in the acoustic signal X3. Any known noise suppression process, such as spectral subtraction, may be used for this purpose. The band emphasis unit 115 generates an acoustic signal X5 by controlling (equalizing) the frequency characteristics of the acoustic signal X4 so that the acoustic components in the frequency band containing the target sound component (the uttered voice) are emphasized relative to other bands. The intensity adjustment unit 116 generates the first acoustic signal S1 by adjusting the dynamic range of the level of the acoustic signal X5 for each frequency band (dynamic range control).
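Delay-and-sum beamforming, named above as one example of a known beamforming process, can be sketched as follows. The sketch assumes integer sample delays that are already known for the desired direction. The function name `delay_and_sum` and the toy signals in the usage note are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average.

    channels: list of equal-length sample lists (e.g. X1a and X1b).
    delays:   non-negative integer delay applied to each channel so that
              sound from the desired direction lines up in time.
    """
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            j = i - d                 # sample of this channel that lands at time i
            if 0 <= j < n:
                out[i] += ch[j]
    return [v / len(channels) for v in out]
```

For example, if a pulse reaches one microphone at sample 2 and the other at sample 4, delaying the first channel by 2 samples makes the two copies add coherently, while sound from other directions adds incoherently and is attenuated.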

The second signal processing unit 12 in FIG. 2 generates a second acoustic signal S2 in which the environmental sound component of the collected sound signal MB generated by the sound collecting unit 20 (sound collecting device 22B) is emphasized (the target sound component is suppressed). As illustrated in FIG. 2, the second signal processing unit 12 includes an articulation processing unit 120 and an adjustment unit 130.

FIG. 5 is a block diagram of the articulation processing unit 120. The articulation processing unit 120 of the first embodiment generates an environmental sound signal SE in which the environmental sound component of the collected sound signal MB is emphasized. As illustrated in FIG. 5, it includes a noise suppression unit 121, a band emphasis unit 122, and an intensity adjustment unit 123.

The noise suppression unit 121 generates an acoustic signal Y1 by suppressing, in the collected sound signal MB generated by the sound collecting device 22B, noise components specific to the equipment constituting the sound emitting unit 50 (for example, hiss noise). The band emphasis unit 122 generates an acoustic signal Y2 by controlling (equalizing) the frequency characteristics of the acoustic signal Y1 so that the acoustic components in the frequency band containing the environmental sound component are emphasized relative to other bands. The intensity adjustment unit 123 generates the environmental sound signal SE by adjusting the dynamic range of the level of the acoustic signal Y2 for each frequency band.

The adjustment unit 130 in FIG. 2 generates the second acoustic signal S2 by adjusting the environmental sound signal SE generated by the articulation processing unit 120 in accordance with an adjustment value G. Specifically, a multiplier that multiplies the environmental sound signal SE by the adjustment value G can be suitably employed as the adjustment unit 130. As understood from the above description, the level of the second acoustic signal S2 is adjusted according to the adjustment value (gain) G.

The adding unit 13 in FIG. 2 generates a transmission signal ST by adding the first acoustic signal S1 generated by the first signal processing unit 11 and the second acoustic signal S2 generated by the second signal processing unit 12. The transmission signal ST resulting from the addition is transmitted from the transmission unit 32 to the voice communication device D2 of the user U2 via the communication network 200.
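The multiplication in the adjustment unit 130 and the addition in the adding unit 13 can be sketched as follows. This is a minimal illustration only, assuming that the signals are equal-length sequences of samples; the function name `mix_transmission_signal` is hypothetical and does not appear in the description.

```python
def mix_transmission_signal(s1, se, gain):
    """Return ST = S1 + G * SE, sample by sample.

    s1   -- first acoustic signal S1 (target sound emphasized)
    se   -- environmental sound signal SE
    gain -- adjustment value G applied by the multiplier (adjustment unit 130)
    """
    if len(s1) != len(se):
        raise ValueError("S1 and SE must have the same number of samples")
    # The second acoustic signal S2 is G * SE; the adding unit sums S1 and S2.
    return [a + gain * b for a, b in zip(s1, se)]
```

With G = 1 the environmental sound is passed at full level, and with G = 0 only the first acoustic signal S1 remains in the transmission signal.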

The control unit 40 variably controls the adjustment value G applied to the adjustment by the adjustment unit 130. The control unit 40 of the first embodiment controls the adjustment value G according to the level of the received signal SR that is received by the receiving unit 34 and supplied to the sound emitting unit 50. As understood from the above description, the control unit 40 functions as an element that controls the level of the second acoustic signal S2 according to the level of the received signal SR.

FIG. 6 is a block diagram of the control unit 40. As illustrated in FIG. 6, the control unit 40 includes a level calculation unit 42 and an adjustment value setting unit 44. The level calculation unit 42 calculates the level LE of the received signal SR. Any known technique may be used to calculate the level LE; for example, the level LE can be calculated by smoothing the power of the received signal SR along the time axis.
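One way to smooth the power of the received signal SR along the time axis is a first-order exponential average. The sketch below is an assumption for illustration; the description leaves the method open, and the smoothing coefficient `alpha` and the function name are hypothetical.

```python
def smoothed_level(samples, alpha=0.1, initial=0.0):
    """Return the level LE after each sample of the received signal SR.

    The instantaneous power x*x is smoothed with a first-order
    exponential average; a larger alpha follows level changes faster.
    """
    level = initial
    levels = []
    for x in samples:
        level = (1.0 - alpha) * level + alpha * (x * x)
        levels.append(level)
    return levels
```

A small `alpha` gives a slowly varying level LE, which in turn keeps the adjustment value G from fluctuating on every sample.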

The adjustment value setting unit 44 sets the adjustment value G according to the level LE of the received signal SR calculated by the level calculation unit 42. The adjustment value setting unit 44 of the first embodiment uses an adjustment value table (not shown), which defines the relationship between the level LE of the received signal SR and the adjustment value G, to set the adjustment value G. Specifically, the adjustment value setting unit 44 obtains from the adjustment value table the adjustment value G corresponding to the level LE of the received signal SR. As illustrated in FIG. 7, in the first embodiment the control unit 40 sets the adjustment value G so that it decreases as the level LE of the received signal SR increases. Therefore, the higher the level LE of the received signal SR (the louder the voice of the user U2, who is the other party), the lower the level of the second acoustic signal S2. Although the above description exemplifies a configuration that uses an adjustment value table, a configuration that computes the adjustment value G by a predetermined calculation using the level LE of the received signal SR may also be employed.
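A table lookup of this kind can be sketched as follows. The table contents, the linear interpolation between entries, and the name `lookup_gain` are assumptions made for illustration; the description only states that the adjustment value G decreases as the level LE increases.

```python
from bisect import bisect_right

def lookup_gain(level, table):
    """Return the adjustment value G for a level LE.

    table -- list of (level, gain) pairs sorted by level, with gains
             that do not increase as the level increases (hypothetical).
    Values between entries are linearly interpolated; values outside
    the table are clamped to the first or last gain.
    """
    levels = [entry[0] for entry in table]
    gains = [entry[1] for entry in table]
    if level <= levels[0]:
        return gains[0]
    if level >= levels[-1]:
        return gains[-1]
    i = bisect_right(levels, level)  # index of first entry above `level`
    x0, x1 = levels[i - 1], levels[i]
    g0, g1 = gains[i - 1], gains[i]
    return g0 + (g1 - g0) * (level - x0) / (x1 - x0)
```

For example, with the two-entry table `[(0.0, 1.0), (1.0, 0.0)]` the gain falls linearly from the maximum value 1 to 0 as the level rises from 0 to 1.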

FIG. 8 illustrates a specific example of how the adjustment value G varies with the level LE of the received signal SR. While the level LE of the received signal SR is sufficiently small (t1 to t2), the adjustment value G is held at its maximum value (for example, 1). When the level LE of the received signal SR increases as the user U2 begins to speak, the adjustment value G decreases over time in step with the level LE (t2 to t3). When a series of utterances by the user U2 approaches its end and the level LE of the received signal SR decreases, the adjustment value G increases over time in step with the level LE (t3 to t4).

As understood from the above example, the level of the second acoustic signal S2, which contains the environmental sound component, changes from moment to moment according to the presence or absence of the voice of the user U2, the other party (that is, according to the level LE of the received signal SR). Specifically, while the voice of the user U2 is quiet (for example, while the user U2 is silent and listening to the voice of the user U1), a transmission signal ST in which the second acoustic signal S2, dominated by the environmental sound around the user U1, is added to the first acoustic signal S1 of the user U1's voice (target sound component) is transmitted to the voice communication device D2 of the user U2. On the other hand, while the voice of the user U2 dominates the feedback sound that reaches the sound collecting unit 20 from the sound emitting unit 50 (that is, while the user U2 is speaking), the level of the second acoustic signal S2, which contains the feedback sound, is lowered within the transmission signal ST sent to the voice communication device D2, so that a transmission signal ST with little environmental sound component and a large proportion of the user U1's voice component is transmitted.

As described above, in the first embodiment, both the first acoustic signal S1, in which the target sound component of the collected sound signal MA picked up by the sound collecting devices 22A (22A1, 22A2) is emphasized, and the second acoustic signal S2, in which the environmental sound component is emphasized, are transmitted to the voice communication device D2. Therefore, compared with a configuration in which only the first acoustic signal S1 with the emphasized target sound component is transmitted to the voice communication device D2, there is an advantage that the user U2 of the voice communication device D2 can hear not only the voice of the user U1 but also the environmental sound around the user U1. Moreover, in the first embodiment the level of the second acoustic signal S2 is adjusted independently of the first acoustic signal S1, so the environmental sound around the user U1 can be conveyed to the user U2 at an appropriate level.

Incidentally, as a configuration for generating a transmission signal ST in which environmental sound of a moderate level is added to the voice of the user U1 (a configuration that conveys the environmental sound to the user U2), one could also envisage a configuration (hereinafter "Comparative Example 1") in which the second signal processing unit 12 of the first embodiment is omitted and the environmental sound component of the collected sound signal MA is not completely removed by the first signal processing unit 11 but is left in the transmission signal ST at a moderate level. However, in Comparative Example 1, where the target sound component and the environmental sound component are processed together in a single path, the processing that suppresses the environmental sound component unavoidably distorts the waveform of the target sound component, which can degrade the sound quality (for example, reduce the audible clarity of the target sound component). In contrast to Comparative Example 1, in the first embodiment the emphasis of the target sound component by the first signal processing unit 11 and the emphasis of the environmental sound component by the second signal processing unit 12 are performed separately from each other, and a transmission signal ST containing the processed first acoustic signal S1 and second acoustic signal S2 is then transmitted. With this configuration, the first signal processing unit 11 can sufficiently emphasize the target sound component while maintaining sound quality (in particular clarity) through acoustic processing optimized for emphasizing the target sound component, and the second signal processing unit 12 can sufficiently emphasize the environmental sound component while maintaining sound quality through acoustic processing optimized for emphasizing the environmental sound component. Therefore, there is an advantage that a transmission signal ST in which both the target sound component and the environmental sound component retain high sound quality can be transmitted to the user U2. In particular, because the clarity of each of the target sound component and the environmental sound component is maintained, it is possible to generate a transmission signal ST with a sense of presence, from which the user U2 can clearly perceive that the user U1 is producing the target sound amid the surrounding environmental sound.

Considering only the aim of adding the second acoustic signal S2, in which the environmental sound component is dominant, to the first acoustic signal S1, in which the target sound component is dominant, one could also envisage a configuration (hereinafter "Comparative Example 2") in which the elements that control the level of the second acoustic signal S2 (the control unit 40 and the adjustment unit 130) are omitted and the environmental sound signal SE generated by the articulation processing unit 120 is added to the first acoustic signal S1 as the second acoustic signal S2. However, the voice of the user U2 contained in the received signal SR is included in the environmental sound as feedback sound that reaches the sound collecting unit 20 from the sound emitting unit 50. In Comparative Example 2, where the level of the second acoustic signal S2 is maintained regardless of the level LE of the received signal SR, the voice of the user U2 therefore circulates between the voice communication device D1 and the voice communication device D2, which can cause howling. In contrast to Comparative Example 2, in the first embodiment the level of the second acoustic signal S2 (the level of the environmental sound component added to the first acoustic signal S1) is controlled according to the level LE of the received signal SR. Specifically, the higher the level LE of the received signal SR, the lower the level of the second acoustic signal S2. That is, as described above, the second acoustic signal S2 is suppressed to a low level while the user U2 is speaking. Therefore, the first embodiment has the advantage that howling caused by the voice of the user U2 can be prevented more effectively than in Comparative Example 2.

As understood from the above description, the volume of the environmental sound around the user U1 rises and falls from moment to moment in the sound heard by the user U2, so the user U2 might conceivably perceive this as unnatural. However, people generally pay little attention to incoming sound while they themselves are speaking (they do not listen closely to other people's voices during their own utterances). Given this tendency, the possibility that the user U2 perceives something unnatural because the volume of the user U1's environmental sound fluctuates does not pose a particular problem.

<Second Embodiment>
A second embodiment of the present invention is described below. In the first embodiment, the level of the second acoustic signal S2 is controlled according to the level LE of the received signal SR. In that configuration, if the environmental sound fluctuates during a section in which the voice of the user U2 pauses between successive utterances, the user U2 may perceive the fluctuation as unnatural. In view of this, the second embodiment sets the adjustment value G differently for a section of the received signal SR in which the voice of the user U2 is dominant (hereinafter "voice section") and for a section other than a voice section (hereinafter "insertion section"). In each of the embodiments described below, elements whose operation and function are the same as in the first embodiment are denoted by the reference signs used in the description of the first embodiment, and detailed description of them is omitted as appropriate.

FIG. 9 is a block diagram of the control unit 40 of the second embodiment, and FIG. 10 is an explanatory diagram of the setting of the adjustment value G in the second embodiment. As illustrated in FIG. 9, the adjustment value setting unit 44 of the second embodiment includes a section detection unit 46. As illustrated in FIG. 10, the section detection unit 46 divides the received signal SR on the time axis into voice sections TV (TV1, TV2) and insertion sections TD (TD1, TD2). An insertion section TD is, for example, a section in which the voice of the user U2 pauses between successive utterances. The section detection unit 46 divides the received signal SR into voice sections TV and insertion sections TD by, for example, determining the presence or absence of voice for each frame obtained by dividing the received signal SR on the time axis. Any known voice detection technique may be used to distinguish voice sections TV from insertion sections TD.
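The frame-by-frame division into voice sections TV and insertion sections TD might look like the sketch below. The description does not specify the detection method, so the energy threshold used here is only one simple assumption, and `split_sections`, `frame_len`, and `power_threshold` are hypothetical names.

```python
def split_sections(samples, frame_len=4, power_threshold=0.01):
    """Divide a received signal into voice (TV) and insertion (TD) sections.

    Each frame is labeled "TV" when its mean power exceeds the threshold
    and "TD" otherwise; consecutive frames with the same label are merged.
    Returns a list of (label, first_frame, last_frame) tuples.
    """
    sections = []
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        label = "TV" if power > power_threshold else "TD"
        if sections and sections[-1][0] == label:
            # Extend the current section to include this frame.
            sections[-1] = (label, sections[-1][1], i)
        else:
            sections.append((label, i, i))
    return sections
```

A practical detector would usually be more robust to noise than a fixed power threshold; the sketch only shows how per-frame decisions are merged into sections.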

FIG. 10 is an explanatory diagram of the adjustment value G in the voice sections TV and the insertion sections TD. For convenience, FIG. 10 shows voice sections TV (TV1, TV2) and insertion sections TD (TD1, TD2) on the time axis. The adjustment value setting unit 44 of the second embodiment sets the adjustment value G separately for voice sections TV and insertion sections TD. The setting of the adjustment value G in each kind of section (voice section TV, insertion section TD) is described in detail below.

Within a voice section TV, the adjustment value setting unit 44 sets the adjustment value G according to the level LE of the received signal SR, as in the first embodiment. Specifically, the adjustment value setting unit 44 sets the adjustment value G so that the level of the second acoustic signal S2 decreases as the level LE of the received signal SR increases.

In an insertion section TD, on the other hand, the adjustment value G is set according to the speech situation of the user of the voice communication device D2. Specifically, as exemplified below, the adjustment value setting unit 44 sets the adjustment value G according to the result of comparing the time length of the insertion section TD with a threshold T0.

(a) When the time length of the insertion section TD is less than the threshold T0 (TD < T0)
When the time length of the insertion section TD is less than the threshold T0 (when the interval between successive voice sections TV is short), the user U2 of the voice communication device D2 is presumed to be in the middle of speaking. If the level of the environmental sound component (second acoustic signal S2) fluctuates between successive utterances in the same way as in a voice section TV, the user U2 may perceive the fluctuation as unnatural. In view of this, the adjustment value setting unit 44 sets the adjustment value G so that the variation in the level of the second acoustic signal S2 in response to the level LE of the received signal SR is suppressed compared with that in a voice section TV. Specifically, as illustrated by the insertion section TD1 in FIG. 10, the adjustment value G is set so that its variation with respect to the level LE of the received signal SR (the rate of change of the adjustment value G with respect to the level LE) is more gradual than in a voice section TV. A configuration in which the adjustment value G is held constant in an insertion section TD whose time length is less than the threshold T0 (that is, the rate of change of the adjustment value G with respect to the level LE is set to zero) may also be employed.

(b) When the time length of the insertion section TD exceeds the threshold T0 (TD > T0)
When the time length of the insertion section TD exceeds the threshold T0 (when a sufficient time has elapsed since the end of the immediately preceding voice section TV without the user U2 speaking), it is presumed that the series of utterances by the user U2 has ended (for example, the speaker has changed to the user U1 and the user U2 is listening to the voice of the user U1 of the voice communication device D1). In that situation it is preferable that the environmental sound around the user U1 be conveyed to the user U2 at an appropriate level. In view of this, the adjustment value setting unit 44 sets the adjustment value G so that the variation in the level of the second acoustic signal S2 in response to the level LE of the received signal SR is equivalent to that in the voice section TV1. Specifically, as illustrated by the insertion section TD2 in FIG. 10, control of the adjustment value G similar to that in a voice section TV starts from the point at which the time elapsed since the end of the immediately preceding voice section TV2 exceeds the threshold T0. A configuration is also conceivable in which the adjustment value G is raised to its maximum value according to the level LE of the received signal SR as soon as the time elapsed since the end of the voice section TV2 exceeds the threshold T0. However, if the adjustment value G rises abruptly, the user U2 may perceive the change as unnatural. Therefore, as illustrated in FIG. 10, the adjustment value G is increased gradually so that the environmental sound around the user U1 is conveyed to the user U2 at an appropriate level.

The adjustment unit 130 controls the level of the second acoustic signal S2 by multiplying the environmental sound signal SE by the adjustment value G set for each voice section TV and insertion section TD. The subsequent processing is the same as in the first embodiment, and detailed description of it is omitted.
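The section-dependent control of the adjustment value G can be sketched as a small per-frame state machine. The sketch assumes the variant in which G is held constant during a short insertion section, and it models the gradual recovery after the threshold T0 with a fixed maximum step per frame. The linear mapping in `level_to_gain` and the parameters `t0`, `frame_dt`, and `max_step` are hypothetical values chosen for illustration.

```python
def level_to_gain(level):
    """Hypothetical mapping: G falls linearly from 1 to 0 as LE goes from 0 to 1."""
    return min(1.0, max(0.0, 1.0 - level))

def track_gain(frames, t0=0.5, frame_dt=0.25, max_step=0.25):
    """Return the adjustment value G for each frame.

    frames -- iterable of (is_voice, level) pairs, one per frame
    t0     -- threshold T0 on the insertion-section length, in seconds
    """
    gain = 1.0
    silence = 0.0  # time elapsed since the last voice frame
    gains = []
    for is_voice, level in frames:
        target = level_to_gain(level)
        if is_voice:
            # Voice section TV: follow the level as in the first embodiment.
            silence = 0.0
            gain = target
        else:
            silence += frame_dt
            if silence > t0:
                # Long insertion section: move toward the target gradually.
                if target > gain:
                    gain = min(target, gain + max_step)
                else:
                    gain = max(target, gain - max_step)
            # Short insertion section (silence <= t0): hold G constant.
        gains.append(gain)
    return gains
```

With the default parameters, a pause of two frames leaves G unchanged, while a longer pause lets G climb back toward its maximum in steps of 0.25 per frame.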

The second embodiment achieves the same effects as the first embodiment. In addition, in the second embodiment, the level of the second acoustic signal S2 is controlled so that, in an insertion section TD of the received signal SR (an insertion section TD whose time length is less than the threshold T0), the variation in the level of the second acoustic signal S2 is suppressed compared with that in a voice section TV. This has the advantage of reducing the possibility that the user U2 perceives something unnatural because of fluctuation of the environmental sound within the insertion section TD.

On the other hand, in an insertion section TD whose time length exceeds the threshold T0 (a situation in which the series of utterances by the user U2 is presumed to have ended), the control of the level of the second acoustic signal S2 according to the level LE of the received signal SR returns to the same control as in a voice section TV. This has the advantage that, after the user U2 has finished speaking, the environmental sound on the user U1 side can be appropriately conveyed to the user U2. As understood from the above description, the second embodiment makes it possible to convey to the user U2 environmental sound of a level appropriate to the speech situation of the user U2.

<Third Embodiment>
In the first embodiment, the adjustment value G is changed linearly with respect to the level LE of the received signal SR, but the manner in which the adjustment value G changes with the level LE is not limited to that example. For example, in a range where the level LE of the received signal SR is low enough that howling caused by the received signal SR does not occur, conveying the environmental sound to the user U2 of the voice communication device D2 should take priority over suppressing howling by lowering the level of the second acoustic signal S2. In view of this, in the third embodiment the adjustment value G is set so that it is reduced as the level LE increases when the level LE of the received signal SR exceeds a predetermined threshold L0.

FIG. 11 is an explanatory diagram of the relationship (adjustment value table) between the level LE of the received signal SR and the adjustment value G in the third embodiment. As illustrated in FIG. 11, when the level LE of the received signal SR is below the threshold L0 (LE < L0), the adjustment value G is held at a predetermined value that does not depend on the level LE (for example, the maximum value 1). When the level LE of the received signal SR exceeds the threshold L0 (LE > L0), the adjustment value G is set so that it decreases as the level LE increases. Specifically, the part of the range of the level LE above the threshold L0 is divided into a plurality of ranges, and the rate of change of the adjustment value G with respect to the level LE is set individually for each range. FIG. 11 illustrates a case in which the rate of change (gradient) of the adjustment value G with respect to the level LE is larger in ranges where the level LE is higher.
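The thresholded, range-by-range relationship can be sketched as a piecewise-linear function. The threshold, the range boundaries, and the slopes below are hypothetical numbers chosen only to show a curve that is flat below L0 and becomes steeper in higher ranges; the actual values of FIG. 11 are not given in the text.

```python
def gain_with_threshold(level, l0=0.25, segments=((0.5, 1.0), (1.0, 2.0)),
                        g_max=1.0, g_min=0.0):
    """Return the adjustment value G for a level LE.

    l0       -- threshold L0; G stays at g_max for levels up to l0
    segments -- (upper_bound, slope) pairs for the ranges above l0, where
                slope is the decrease in G per unit increase in LE
    """
    if level <= l0:
        return g_max
    gain = g_max
    lower = l0
    for upper, slope in segments:
        span = min(level, upper) - lower  # part of this range below `level`
        if span <= 0:
            break
        gain -= slope * span
        lower = upper
    return max(gain, g_min)
```

With these example values, G stays at 1 up to LE = 0.25, falls to 0.75 at LE = 0.5, and then falls twice as fast in the next range.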

The third embodiment achieves the same effects as the first embodiment. In addition, in the third embodiment the adjustment value G is set so that it is reduced (and the level of the second acoustic signal S2 is therefore lowered) only when the level LE of the received signal SR exceeds the threshold L0, that is, so that the degree of change of the adjustment value G differs according to the level LE of the received signal SR. Therefore, compared with a configuration in which the degree to which the level of the second acoustic signal S2 follows the level LE of the received signal SR is constant regardless of the level LE, there is an advantage that the environmental sound is conveyed appropriately.

<Fourth Embodiment>
FIG. 12 is a block diagram of the voice communication device D1 of the fourth embodiment. In the first embodiment, the sound collecting device 22B for collecting environmental sound is provided separately from the sound collecting devices 22A for collecting the target sound. In the fourth embodiment, as illustrated in FIG. 12, a common sound collecting device 22A2 is used for collecting both the target sound and the environmental sound. The collected sound signal MA2 generated by the sound collecting device 22A2 is supplied to both the first signal processing unit 11 and the second signal processing unit 12, and the second signal processing unit 12 generates the second acoustic signal S2 by emphasizing the environmental sound component of the collected sound signal MA2. With this configuration, there is no need to provide separate sound collecting devices for the target sound and for the environmental sound, which has the advantage of simplifying the configuration of the voice communication device D1. The sound collecting device 22A1 may be a directional microphone whose directivity is aimed at the mouth of the user of the voice communication device D1. The sound collecting device 22A2, which is used for collecting both the target sound and the environmental sound, may be an omnidirectional microphone.

<Modifications>
Each of the embodiments described above can be modified in various ways. Specific modifications are exemplified below. Two or more modes arbitrarily selected from the following examples may be combined as appropriate.

(1) The constituent elements of the first signal processing unit 11 and the second signal processing unit 12 exemplified in the above embodiments are arbitrary, and the elements illustrated in FIG. 4 and FIG. 5 may be omitted as appropriate. For example, a configuration in which the directivity control unit 112 is omitted from the first signal processing unit 11 can be suitably employed. A configuration that includes the directivity control unit 112, as in the above embodiments, requires a plurality of sound collecting devices 22 (22A1, 22A2) for the beam forming processing that controls the directivity. In a configuration without the directivity control unit 112 (a configuration that does not perform beam forming processing), the sound collecting unit 20 can be formed of a single sound collecting device 22A1, as illustrated for example in FIG. 13. The collected sound signal MA1 picked up by the sound collecting device 22A1 is supplied to the first signal processing unit 11 and the second signal processing unit 12. As understood from the above description, the sound collecting unit 20 of each of the above embodiments is comprehensively expressed as an element that collects sound containing the target sound and the environmental sound and generates a collected sound signal M; the number of sound collecting devices 22 constituting the sound collecting unit 20 and whether they are directional do not matter.

(2) The above embodiments illustrate a configuration in which the transmission signal ST, corresponding to the sum of the first acoustic signal S1 (the target sound component) and the second acoustic signal S2 (the environmental sound component), is transmitted to the other party's voice communication device D2. Alternatively, the first acoustic signal S1 and the second acoustic signal S2 may be transmitted separately and added in the voice communication device D2. That is, the addition unit 13 of the voice communication device D1 may be omitted. As understood from the above description, the transmission unit 32 of each embodiment is comprehensively expressed as an element that transmits the first acoustic signal S1 and the second acoustic signal S2, and covers both a configuration that transmits the sum of the first acoustic signal S1 and the second acoustic signal S2 (the transmission signal ST) and a configuration that transmits the first acoustic signal S1 and the second acoustic signal S2 individually.
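
The two transmission options in (2) can be contrasted with a minimal Python sketch. The dictionary-based packet layout and all function names are hypothetical, chosen only for illustration; the source does not specify how S1 and S2 are framed on the network.

```python
def add_signals(s1, s2):
    """Sample-wise sum of two equal-length signals (the role of the addition unit 13)."""
    if len(s1) != len(s2):
        raise ValueError("signals must have the same length")
    return [a + b for a, b in zip(s1, s2)]


def pack_mixed(s1, s2):
    """Sender-side mixing: D1 adds S1 and S2 and transmits only ST."""
    return {"ST": add_signals(s1, s2)}


def pack_separate(s1, s2):
    """Receiver-side mixing: D1 omits the addition and transmits S1 and S2 individually."""
    return {"S1": list(s1), "S2": list(s2)}


def unpack(packet):
    """D2 obtains the playback signal from either (assumed) packet layout."""
    if "ST" in packet:
        return packet["ST"]
    return add_signals(packet["S1"], packet["S2"])
```

Under these assumptions both paths yield the same playback signal at D2; the only difference is which device performs the addition.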

(3) A broadband IP (Internet Protocol) network or a public wireless LAN (Wi-Fi) may be suitably adopted as the communication network 200. Because the frequency components corresponding to the uttered sound (target sound component) lie in a relatively low band while those corresponding to the environmental sound lie in a relatively high band, a broadband communication system compliant with a high-speed data communication standard is preferable.

(4) The above embodiments illustrate a glasses-type wearable terminal as the voice communication device D, but the voice communication device D may take any form as long as it is an electronic device that is capable of voice calls and can be carried by the user. For example, a known communication terminal such as a mobile phone or a smartphone may be used as the voice communication device D.

(5) The above embodiments illustrate a configuration in which the environmental sound around the user is conveyed to the other party together with the target sound. In practice, however, the user may not want the other party to know where he or she is. A configuration may therefore be adopted in which the user can freely select between an operation mode in which the environmental sound is added to the target sound as in the above embodiments and an operation mode in which the environmental sound is not added (an operation mode in which the operation of the second signal processing unit 12 is disabled).
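
The mode selection described in (5) might be sketched as follows in Python. The enum, its member names, and the function are hypothetical; the source only states that the user can choose whether the environmental sound is added.

```python
from enum import Enum


class AmbientMode(Enum):
    ADD_ENVIRONMENT = "add"   # environmental sound is added to the target sound
    TARGET_ONLY = "target"    # second signal processing is disabled


def build_transmit_signal(s1, s2, mode):
    """Return the signal to transmit for the selected operation mode.

    s1: first acoustic signal (target sound emphasized)
    s2: second acoustic signal (environmental sound emphasized)
    """
    if mode is AmbientMode.TARGET_ONLY:
        # The environmental component is left out entirely.
        return list(s1)
    return [a + b for a, b in zip(s1, s2)]
```

In a real device the second signal processing would simply not run in the target-only mode; here S2 is ignored to keep the sketch short.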

(6) In the above embodiments, the adjustment value G changes linearly with the level LE of the received signal SR, but the manner in which the adjustment value G changes with the level LE is not limited to this example. For example, a configuration in which the adjustment value G changes nonlinearly with the level LE of the received signal SR, or one in which the adjustment value G is defined by a curve over the level LE, may also be adopted.
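
As a rough illustration of (6), the Python sketch below compares a linear mapping from the level LE to the adjustment value G with one possible curved mapping. The level range, the gain range, and the power-law curve are all assumptions made for this example; the source does not give concrete values or a specific curve. The downward direction (G falls as LE rises) follows the relationship stated in the second claim.

```python
def _normalize(level, level_min, level_max):
    """Clamp the level into [level_min, level_max] and rescale it to [0, 1]."""
    t = (level - level_min) / (level_max - level_min)
    return min(max(t, 0.0), 1.0)


def gain_linear(level, level_min=0.0, level_max=1.0, g_max=1.0, g_min=0.0):
    """G falls on a straight line from g_max to g_min as the level rises."""
    t = _normalize(level, level_min, level_max)
    return g_max + (g_min - g_max) * t


def gain_curved(level, level_min=0.0, level_max=1.0, g_max=1.0, g_min=0.0,
                exponent=2.0):
    """G falls along a power-law curve (an assumed shape, not from the source)."""
    t = _normalize(level, level_min, level_max)
    return g_min + (g_max - g_min) * (1.0 - t) ** exponent
```

With `exponent` greater than 1, the curved mapping reduces G faster at low received levels than the linear one, while both agree at the two ends of the range.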

D: voice communication device, 10: acoustic processing unit, 11: first signal processing unit, 12: second signal processing unit, 20: sound collection unit, 22A1, 22A2, 22B: sound collection device, 30: communication unit, 32: transmission unit, 34: reception unit, 120: articulation processing unit, 130: adjustment unit, 40: control unit, 42: level calculation unit, 44: adjustment value setting unit, 200: communication network.

Claims (5)

A voice communication device comprising:
a receiving unit that receives a received signal transmitted from a communication device of a communication partner;
a sound emitting unit that emits sound corresponding to the received signal received by the receiving unit;
a sound collection unit that collects sound including a target sound and an environmental sound and generates a sound collection signal;
a first signal processing unit that generates a first acoustic signal by first signal processing that emphasizes the target sound component of the sound collection signal generated by the sound collection unit relative to the environmental sound component;
a second signal processing unit that generates a second acoustic signal by second signal processing, different from the first signal processing, that emphasizes the environmental sound component of the sound collection signal generated by the sound collection unit relative to the target sound component;
a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and
a transmission unit that transmits the first acoustic signal and the second acoustic signal.
The voice communication device according to claim 1, wherein the control unit controls the level of the second acoustic signal such that the level of the second acoustic signal decreases as the level of the received signal increases.
The voice communication device according to claim 1 or claim 2, wherein the control unit divides the received signal into voice sections and insertion sections other than the voice sections and, in an insertion section, controls the level of the second acoustic signal such that the variation in the level of the second acoustic signal relative to the level of the received signal is reduced compared with the voice sections.
The voice communication device according to claim 3, wherein, when the time length of an insertion section is less than a threshold, the control unit controls the level of the second acoustic signal in that insertion section such that the variation in the level of the second acoustic signal relative to the level of the received signal is reduced compared with the voice sections, and, when the time length of the insertion section exceeds the threshold, the control unit controls the level of the second acoustic signal in that insertion section such that the variation in the level of the second acoustic signal relative to the level of the received signal is equivalent to that in the voice sections.
A program that causes a computer to function as:
a receiving unit that receives a received signal transmitted from a communication device of a communication partner;
a sound emitting unit that emits sound corresponding to the received signal received by the receiving unit;
a sound collection unit that collects sound including a target sound and an environmental sound and generates a sound collection signal;
a first signal processing unit that generates a first acoustic signal by first signal processing that emphasizes the target sound component of the sound collection signal generated by the sound collection unit relative to the environmental sound component;
a second signal processing unit that generates a second acoustic signal by second signal processing, different from the first signal processing, that emphasizes the environmental sound component of the sound collection signal generated by the sound collection unit relative to the target sound component;
a control unit that controls the level of the second acoustic signal according to the level of the received signal received by the receiving unit; and
a transmission unit that transmits the first acoustic signal and the second acoustic signal.
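
The control behavior recited in the claims that distinguish voice sections from insertion sections can be sketched in Python as follows. This is an offline illustration under several assumptions that are not in the source: the received signal is already split into frames with a per-frame level and a voice/non-voice flag, "reduced variation" is modeled as holding the last gain used in a voice section, the threshold is counted in frames, and an insertion section whose length equals the threshold is treated as short. All names are hypothetical.

```python
from itertools import groupby


def gain_from_level(level):
    """Assumed mapping: gain falls linearly from 1 to 0 as the level rises from 0 to 1."""
    return 1.0 - min(max(level, 0.0), 1.0)


def second_signal_gains(levels, is_voice, threshold_frames):
    """Per-frame gain for the second acoustic signal.

    levels: received-signal level per frame
    is_voice: True for frames in a voice section, False for insertion sections
    threshold_frames: insertion sections longer than this follow the level again
    """
    gains = []
    last_voice_gain = 1.0  # assumed gain before any speech has been received
    index = 0
    for voiced, run in groupby(is_voice):
        length = len(list(run))
        segment = levels[index:index + length]
        if voiced or length > threshold_frames:
            # Voice section or long insertion section: gain tracks the level.
            segment_gains = [gain_from_level(v) for v in segment]
        else:
            # Short insertion section: hold the gain to suppress fluctuation.
            segment_gains = [last_voice_gain] * length
        if voiced:
            last_voice_gain = segment_gains[-1]
        gains.extend(segment_gains)
        index += length
    return gains
```

For example, with a threshold of three frames, a two-frame pause between utterances keeps the gain of the preceding speech, whereas a four-frame pause lets the gain rise again. A real-time device cannot know the length of a pause in advance, so an actual implementation would need a causal variant of this logic.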
JP2014143053A 2014-07-11 2014-07-11 Voice communication apparatus and program Active JP6409378B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014143053A JP6409378B2 (en) 2014-07-11 2014-07-11 Voice communication apparatus and program

Publications (2)

Publication Number Publication Date
JP2016019263A JP2016019263A (en) 2016-02-01
JP6409378B2 true JP6409378B2 (en) 2018-10-24

Family

ID=55234157

Country Status (1)

Country Link
JP (1) JP6409378B2 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2636897B2 (en) * 1988-08-25 1997-07-30 富士通株式会社 Hands-free communication circuit
JP3155791B2 (en) * 1991-11-11 2001-04-16 株式会社東芝 Wireless telephone equipment
JPH10224459A (en) * 1997-02-10 1998-08-21 Atsuden Kk Hands free device
JP4483134B2 (en) * 2001-06-11 2010-06-16 パナソニック電工株式会社 Loudspeaker call system
JP4042701B2 (en) * 2004-01-27 2008-02-06 松下電工株式会社 Intercom base unit
JP2006287431A (en) * 2005-03-31 2006-10-19 Saxa Inc Telephone device
JP5293342B2 (en) * 2009-03-30 2013-09-18 沖電気工業株式会社 Voice communication apparatus, method and program
JP2012105115A (en) * 2010-11-11 2012-05-31 Oki Electric Ind Co Ltd Echo canceller, echo cancellation program, and telephone apparatus

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20170524

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20180517

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20180522

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20180720

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20180828

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20180910

R151 Written notification of patent or utility model registration

Ref document number: 6409378

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R151