JP4735419B2

JP4735419B2 - Voice communication device

Info

Publication number: JP4735419B2
Application number: JP2006149590A
Authority: JP
Inventors: 拓夫阿部; 一樹丹羽
Original assignee: Aiphone Co Ltd
Current assignee: Aiphone Co Ltd
Priority date: 2006-05-30
Filing date: 2006-05-30
Publication date: 2011-07-27
Anticipated expiration: 2026-05-30
Also published as: JP2007324675A

Description

The present invention is used in a voice communication device that maintains a call state in accordance with noise.

A voice communication device allows two parties in different places (called the near end and the far end) to talk to each other. Of the two, the party at the near end is called the near-end speaker and the party at the far end is called the far-end speaker. The terminal that receives the far-end audio signal, spoken by the far-end speaker and picked up by the microphone at the far end, is called the reception input terminal. The terminal from which the audio signal fed to the reception input terminal is output by the loudspeaker at the near end is called the reception output terminal. The terminal that receives the near-end audio signal, spoken by the near-end speaker and picked up by the microphone at the near end, is called the transmission input terminal. The terminal from which the audio signal fed to the transmission input terminal is output by the far-end loudspeaker is called the transmission output terminal.

In the present invention, the term audio signal covers not only the voice uttered by a person but also background noise. A signal that contains only background noise, with no one speaking, is also called an audio signal.

In such a voice communication device, the far-end speaker's voice output from the near-end loudspeaker may, for example, enter the near-end microphone, reach the transmission output terminal at the far end, and be output from the far-end loudspeaker. It may then enter the far-end microphone again and be output from the near-end loudspeaker, and this cycle can repeat. When this happens, the far-end speaker hears their own voice persisting for a while as an echo, which can interfere with the far-end speaker's voice input.

A voice communication device also presupposes that, during a call with the far end, the gain from the transmission output terminal to the reception input terminal is less than 0 dB. If that gain is 0 dB (or the transmission output terminal is short-circuited to the reception input terminal), the near-end audio signal can drive the loop into oscillation and cause howling, which can disrupt the call.

The same holds when the roles of the near end and the far end are swapped.

To avoid these problems, the presence or absence of the near-end speaker's voice and of the far-end speaker's voice is detected separately, and loss is inserted into the signal path on whichever side, near end or far end, no voice is detected. This avoids echo or howling. When voice is detected at both the near end and the far end, loss is inserted into one of the two paths to avoid echo or howling.

Furthermore, providing an echo canceller, which detects that the same audio signal as the one output from the reception output terminal has entered the transmission input terminal and cancels it, makes it possible to reduce the amount of loss inserted. The smaller the inserted loss, the more efficiently the audio signal is conveyed and the easier it is to secure the volume needed for the call, so a smaller inserted loss is preferable.

Patent Document 1: Japanese Patent Laid-Open No. 10-243082
Patent Document 2: JP 2005-347875 A

The echo canceller of the prior art described above can be expected to cancel enough echo when near-end background noise is relatively low. When near-end background noise is relatively high, however, it becomes difficult to distinguish the echo to be cancelled from the background noise, so echo cancellation performance drops and the call is impaired. In addition, if the means for detecting voice mistakes background noise for the near-end speaker's transmission, loss is inserted into the signal path between the reception input terminal and the reception output terminal, and the far-end speaker's voice may become hard to hear at the near end.

To solve these problems, solutions using adaptive filters have been proposed, such as a method that adds the near-end input signal to the reception side of the echo canceller (see, for example, Patent Document 1). A method limited to bathrooms has also been proposed that switches between an estimated loss and a fixed loss depending on the near-end background noise (see, for example, Patent Document 2).

However, in the proposal of Patent Document 1, for example, the background noise must be noise that a filter can clearly separate from the transmitted audio signal, so the method cannot handle the variety of background noise that actually occurs. The proposal of Patent Document 2 is specialized for use in a bathroom, and extending it beyond bathrooms is difficult.

The present invention was made against this background, and its object is to provide a voice communication device that enables good calls in a variety of noise environments.

The present invention is a voice communication device comprising voice communication means in which a transmission signal path is provided between a transmission input terminal and a transmission output terminal, a reception signal path is provided between a reception input terminal and a reception output terminal, and a call takes place between a near-end speaker and a far-end speaker. The voice communication means comprises: audio signal detection means for detecting the presence or absence of an audio signal in the transmission signal path and in the reception signal path; loss insertion means for adaptively setting where and how much loss is inserted into the transmission signal path or the reception signal path according to the detection result of the audio signal detection means; an echo canceller, provided between the reception output terminal and the transmission input terminal, that cancels the echo caused by the far-end speaker's audio signal; and loss estimation means for estimating the required amount of inserted loss from the audio signal after reception loss insertion and the audio signal after echo cancellation by the echo canceller.

The feature of the present invention is that it comprises state classification means that takes, as an estimated noise level, the average signal strength over a predetermined time of the audio signal input from the transmission input terminal or of the audio signal after echo cancellation by the echo canceller, and classifies the quality of the call environment into a plurality of predetermined levels according to the magnitude of that estimated noise level. The present invention is thus characterized by classifying the quality of the call environment and applying the most suitable control in each noise environment.

For example, the state classification means comprises means for classifying, according to the magnitude of the estimated noise level, into three states. The first is a low-noise call state, in which noise is low, the inserted loss estimated by the loss estimation means (hereinafter called the estimated loss) is within its applicable range, and both detection of the presence or absence of an audio signal by the audio signal detection means and echo cancellation by the echo canceller are possible. The second is a noisy call state, in which noise is high, the estimated loss is outside its applicable range, detection of the presence or absence of an audio signal by the audio signal detection means is unreliable, and echo cancellation by the echo canceller is possible. The third is a high-noise call state, in which noise is higher still, the estimated loss is outside its applicable range, and neither detection of the presence or absence of an audio signal by the audio signal detection means nor echo cancellation by the echo canceller is possible.

With the classification made in this way, the loss insertion means operates, for example, as follows when the state classification means has selected the low-noise call state. When an audio signal is detected in the reception signal path and none is detected in the transmission signal path, it inserts the predetermined reception output limit loss into the reception signal path and the estimated loss into the transmission signal path. When an audio signal is detected in the transmission signal path and none in the reception signal path, when audio signals are detected in both paths, or when no audio signal is detected in either path, it inserts into the reception signal path the larger of the reception output limit loss and the estimated loss.
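The low-noise rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: losses are expressed in dB with larger values meaning more attenuation, the function and parameter names are hypothetical, and 0 dB is returned for the transmission path in the cases where the text names no transmission-path loss.

```python
def low_noise_losses(recv_voice, send_voice, recv_limit_db, estimated_db):
    """Return (reception_loss_db, transmission_loss_db) for the low-noise call state.

    Assumption: losses are in dB, larger means more attenuation.
    Assumption: 0.0 dB stands for "no transmission-path loss stated in the text".
    """
    if recv_voice and not send_voice:
        # Far-end voice only: limit the loudspeaker output and put the
        # estimated loss on the transmission path.
        return (recv_limit_db, estimated_db)
    # Near-end voice only, both sides talking, or silence on both paths:
    # the reception path takes the larger of the two candidate losses.
    return (max(recv_limit_db, estimated_db), 0.0)
```

Calling `low_noise_losses(True, False, 3.0, 12.0)` corresponds to the case where only the far-end speaker is talking.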

In other words, the low-noise call state is a state in which the near-end background noise level is low, transmitted voice is easy to detect, echo cancellation by the echo canceller is possible (specifically, the adaptive filter in the echo canceller converges), and the estimated loss is within its applicable range. When an audio signal is detected in the transmission signal path, no loss is inserted into that path, so the near-end speaker's audio signal reaches the far end well. When audio signals are detected in both signal paths at once, an estimated loss larger than the reception output limit loss may be inserted into the reception signal path. The far-end voice may then be somewhat quieter at the near end, but it remains clearly audible because the near-end background noise level is low. The low-noise call state has low background noise and is the best call state.

Also, for example, when the state classification means has selected the noisy call state, the loss insertion means operates as follows. When an audio signal is detected in the transmission signal path and none in the reception signal path, it inserts into the reception signal path the larger of the reception output limit loss and a predetermined first fixed loss capable of suppressing echo and howling. When audio signals are detected in both the reception signal path and the transmission signal path, it inserts the reception output limit loss into the reception signal path and the first fixed loss into the transmission signal path.

In other words, the noisy call state is a state in which the audio signal detection means is more likely to mistake near-end background noise for transmitted voice, and in which echo cancellation by the echo canceller is still possible but the estimated loss falls outside its applicable range. When the background noise level is high and audio signals are detected in both signal paths at once, inserting a loss larger than the reception output limit loss between the reception input terminal and the reception output terminal, as in the low-noise call state, can cause the reception output to be drowned out by background noise and become inaudible. In the noisy call state, therefore, when far-end voice is input, no loss beyond the reception output limit loss is inserted into the reception signal path, and loss is inserted into the transmission signal path instead.

Because the estimated loss is then outside its applicable range and cannot be applied as is, the first fixed loss is inserted as the minimum loss needed to prevent echo or howling.

As a result, in the noisy call state, if the near-end speaker and the far-end speaker speak at the same time, the near-end speaker's voice may have difficulty reaching the far end because of the inserted first fixed loss, but the call proceeds without trouble when the two speakers take turns.

However, a measure is also needed that keeps a call possible when the two speakers cannot take turns, for example because constantly loud background noise at the far end keeps the device in the receiving state. For this purpose the present invention provides a high-noise call state, which applies when the background noise level is higher still than in the noisy call state.

For example, when the state classification means has selected the high-noise call state and either an audio signal is detected in the transmission signal path and none in the reception signal path, or audio signals are detected in both paths, the loss insertion means inserts the reception output limit loss into the reception signal path and a second fixed loss, smaller than the first fixed loss, into the transmission signal path.
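The rules stated for the noisy call state and the high-noise call state can be sketched in the same spirit. The sketch assumes losses in dB (larger means more attenuation), uses hypothetical function names, and returns `None` for detection combinations that the text does not describe rather than guessing at them. Where the text names no transmission-path loss, 0 dB is used as a placeholder.

```python
def noisy_state_losses(recv_voice, send_voice, recv_limit_db, first_fixed_db):
    """Return (reception_loss_db, transmission_loss_db) for the noisy call state."""
    if send_voice and not recv_voice:
        # Near-end voice only: larger of the limit loss and the first fixed loss
        # goes on the reception path (0.0 dB is a placeholder for the other path).
        return (max(recv_limit_db, first_fixed_db), 0.0)
    if send_voice and recv_voice:
        # Both talking: keep the reception path at the limit loss and move the
        # first fixed loss to the transmission path.
        return (recv_limit_db, first_fixed_db)
    return None  # combination not described in the text


def high_noise_state_losses(recv_voice, send_voice, recv_limit_db, second_fixed_db):
    """Return (reception_loss_db, transmission_loss_db) for the high-noise call state."""
    if send_voice:
        # Near-end voice only, or both talking: limit loss on the reception path,
        # the smaller second fixed loss on the transmission path.
        return (recv_limit_db, second_fixed_db)
    return None  # combination not described in the text
```

Returning `None` keeps the sketch honest about what the description leaves open; a real device would need a defined behavior for those cases.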

In other words, the high-noise call state is a state in which the audio signal detection means mistakes near-end background noise for transmitted voice almost all the time, normal echo cancellation by the echo canceller is no longer possible, and the estimated loss is also outside its applicable range.

Suppose that, in the noisy call state described above, audio signals are being detected in both the transmission and reception signal paths at once, and the first fixed loss inserted into the transmission signal path makes it hard for the near-end speaker's voice to reach the far-end speaker. The near-end speaker then realizes that communication with the far-end speaker is difficult and tries to raise the transmission level by speaking louder or moving closer to the microphone. If communication is still difficult, the near-end speaker speaks louder still or moves even closer to the microphone, and as this behavior continues the estimated noise level also rises.

In the high-noise call state, where the estimated noise level is extremely high because of such behavior by the near-end speaker or excessive near-end background noise, the second fixed loss, which is the minimum loss that prevents howling at the near end, is inserted into the transmission signal path in place of the first fixed loss, so that the near-end voice is sent to the far end without being suppressed more than necessary. The second fixed loss is therefore smaller than the first fixed loss.

This makes it easier for the near-end speaker's voice to reach the far-end speaker, so the two can carry on a call.

Inserting appropriate loss in each noise environment in this way enables good calls in a variety of environments.

The present invention can also be viewed as a program. That is, the present invention is a program that, when installed on a general-purpose information processing apparatus, causes that apparatus to implement functions corresponding to the voice communication device of the present invention. When the program is recorded on a recording medium, the general-purpose information processing apparatus can install it from that medium. Alternatively, the program can be installed on the general-purpose information processing apparatus directly over a network from a server that holds it.

In this way, the voice communication device of the present invention can be implemented on a general-purpose information processing apparatus.

According to the present invention, a voice communication device that enables good calls in a variety of noise environments can be realized.

An embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the embodiment. The voice communication device of the embodiment comprises: a reception input terminal 1 that receives the far-end speaker signal; a reception output terminal 6 that outputs far-end voice to the near end; a transmission input terminal 5 that takes in sound present at the near end; a transmission output terminal 3 that outputs the near-end speaker's voice to the far end; an echo canceller 14 that cancels near-end echo arising at the near end; background noise estimation means 21 that estimates the noise condition at the near end; a voice switch unit 10 that controls the amount of loss inserted into the reception signal path or the transmission signal path according to the presence or absence of far-end or near-end speaker voice; reception loss insertion means 11 that inserts loss in the reception signal path; transmission loss insertion means 13 that inserts loss in the transmission signal path; loss estimation means 20 that estimates the loss needed to suppress howling or echo at the near end; and reception output limit estimation means 60 that estimates a reception output limit loss 61, the loss needed to hold down the output of the near-end loudspeaker.

The echo canceller 14 uses the transmission input signal 7, input at the transmission input terminal 5, together with the audio signal 12 after reception loss insertion, which serves as the reference signal. It obtains FIR echo-model filter coefficients that simulate the near-end echo by training an adaptive filter, and takes the convolution of the reference signal with the echo-model filter as the pseudo echo.

The difference between the transmission input signal 7 and the pseudo echo is taken as the error signal, and the power of the absolute-value component of this error signal divided by the absolute-value component of the transmission input signal 7 is taken as the echo cancellation amount. The echo canceller monitors the echo cancellation amount from the start of adaptive filter training. It judges the filter to be in the unconverged state until the echo cancellation amount meets the target echo cancellation amount, which is the minimum cancellation expected at convergence, and in the converged state thereafter, and it outputs this judgment as the adaptive filter state 16.

In the unconverged state it outputs the transmission input signal 7 as the audio signal 15 after echo cancellation. In the converged state it outputs the error signal as the audio signal 15 after echo cancellation.
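The convergence judgment and output selection described above can be sketched as follows. The class and function names are hypothetical. The sketch assumes that the echo cancellation amount is the ratio of error power to transmission input power, that "meeting the target" means the ratio falling to or below a target ratio, and that the converged state latches once reached; the adaptive filter itself is not modeled.

```python
class ConvergenceMonitor:
    """Tracks the adaptive filter state (unconverged until the target is met)."""

    def __init__(self, target_ratio):
        # Assumption: target expressed as error power / input power.
        self.target_ratio = target_ratio
        self.converged = False

    def update(self, error_power, input_power):
        # Once converged, stay converged ("converged state thereafter").
        if not self.converged and input_power > 0:
            if error_power / input_power <= self.target_ratio:
                self.converged = True
        return self.converged


def select_output(converged, transmission_input, error_signal):
    """Signal passed on as 'audio signal after echo cancellation'."""
    return error_signal if converged else transmission_input
```

Passing the raw transmission input through while unconverged avoids forwarding the output of a filter that has not yet learned the echo path.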

The loss estimation means 20 takes the time-averaged power (over, for example, 1 to 3 seconds) of the absolute-value component of the audio signal 12 after reception loss insertion as the loss-estimation reception power, and the time-averaged power of the absolute-value component of the audio signal 15 after echo cancellation as the loss-estimation transmission power.

It further defines a loss margin, which is a fixed loss that suppresses echo sufficiently and compensates for fluctuations in the near-end echo; a minimum estimated loss, which is the minimum value of the estimated loss 23; and a maximum estimated loss, which is the maximum value of the estimated loss 23.

The initial value of the loss-estimation reception power is a positive number other than 0, and the initial value of the loss-estimation transmission power is the loss-estimation reception power multiplied by the reciprocal of the second fixed loss. The estimated loss 23 is then obtained by dividing the loss-estimation reception power by the loss-estimation transmission power, multiplying the quotient by the loss margin, and limiting the result to the range between the minimum estimated loss and the maximum estimated loss.
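As a minimal sketch of this calculation, assuming all quantities are linear ratios (not dB) and using hypothetical function and parameter names:

```python
def initial_transmission_power(reception_power, second_fixed_loss):
    """Initial transmission power: reception power times 1 / second fixed loss."""
    if reception_power <= 0 or second_fixed_loss <= 0:
        raise ValueError("powers and losses must be positive")
    return reception_power / second_fixed_loss


def estimate_loss(reception_power, transmission_power, margin, min_loss, max_loss):
    """Estimated loss: (reception / transmission) * margin, clamped to [min, max]."""
    if transmission_power <= 0:
        raise ValueError("transmission power must be positive")
    raw = reception_power / transmission_power * margin
    return min(max(raw, min_loss), max_loss)
```

With these initial values the unclamped ratio starts out equal to the second fixed loss, so the first estimate is the second fixed loss scaled by the margin.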

The estimation procedure of the background noise estimation means 21 is shown in the flowchart of FIG. 4. As FIG. 4 shows, the background noise estimation means 21 computes the estimated noise level PN as the long-time average power of the absolute-value component of the audio signal 15 after echo cancellation (S1) and compares it with two thresholds, PNth0 and PNth1, where PNth0 is smaller than PNth1. It judges the low-noise call state when PN is less than PNth0 (S2→S5), the noisy call state when PN is at least PNth0 and less than PNth1 (S2→S3→S6), and the high-noise call state when PN is at least PNth1 (S2→S3→S4), and outputs the judgment as the near-end state 22. PNth0 and PNth1 can be set empirically to suit the usage environment.
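The threshold comparison of FIG. 4 can be sketched as below. The enum and function names are hypothetical, and the threshold values are left to the caller because the text says they are set empirically.

```python
from enum import Enum


class NearEndState(Enum):
    LOW_NOISE = "low-noise call state"
    NOISE = "noisy call state"
    HIGH_NOISE = "high-noise call state"


def classify_near_end(pn, pn_th0, pn_th1):
    """Map the estimated noise level PN to one of the three call states."""
    if not pn_th0 < pn_th1:
        raise ValueError("PNth0 must be smaller than PNth1")
    if pn < pn_th0:            # S2 -> S5
        return NearEndState.LOW_NOISE
    if pn < pn_th1:            # S2 -> S3 -> S6
        return NearEndState.NOISE
    return NearEndState.HIGH_NOISE  # S2 -> S3 -> S4
```

A level exactly equal to a threshold falls into the noisier of the two neighboring states, matching the "at least" wording of the description.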

As shown in FIG. 3, the reception output limit estimation means 60 consists of reception output limit reception input power calculation means 62, low-noise call state reception output upper limit level holding means 64, noisy call state reception output upper limit level holding means 65, high-noise call state reception output upper limit level holding means 66, reception output upper limit level selection means 67, and reception output limit loss update means 69.

The reception output limit reception input power calculation means 62 outputs the instantaneous power of the reception input signal 2 as the reception output limit reception input power 63. The reception output upper limit level selection means 67 outputs the reception output upper limit level 68 according to the near-end state 22 output by the near-end background noise estimation means 21: in the low-noise call state, the upper limit level held in the low-noise call state reception output upper limit level holding means 64; in the noisy call state, the level held in the noisy call state reception output upper limit level holding means 65; and in the high-noise call state, the level held in the high-noise call state reception output upper limit level holding means 66.

The low-noise call state reception output upper limit level holding means 64, the noisy call state reception output upper limit level holding means 65, and the high-noise call state reception output upper limit level holding means 66 allow the reception output upper limit level to be set empirically for each state according to the usage environment, so that the loudspeaker output is appropriate in each state.

The reception output limit loss update means 69 updates the current reception output limit loss 61 to the smallest loss for which the reception output limit reception input power 63 multiplied by the reception output limit loss 61 is less than the reception output upper limit level 68.
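One way to read this update, together with the per-state selection of the upper limit level, is sketched below. The sketch makes several assumptions that the text does not confirm: the loss is represented as a linear gain factor between 0 and 1 that multiplies the power (so "smallest loss" means the gain closest to 1), and a hypothetical `headroom` factor is used to satisfy the strict "less than" condition. All names are illustrative.

```python
def select_upper_limit(near_end_state, limits_by_state):
    """Pick the held upper limit level for the current near-end state."""
    return limits_by_state[near_end_state]


def update_reception_limit_gain(input_power, upper_limit, headroom=0.99):
    """Largest gain (smallest loss) keeping input_power * gain below upper_limit.

    Assumption: loss is modeled as a linear gain in (0, 1].
    Assumption: `headroom` (hypothetical) backs off slightly so the product
    is strictly below the limit.
    """
    if input_power < 0 or upper_limit <= 0 or not 0 < headroom < 1:
        raise ValueError("invalid power, limit, or headroom")
    if input_power < upper_limit:
        return 1.0  # already below the limit: no loss needed
    return headroom * upper_limit / input_power
```

For example, `update_reception_limit_gain(8.0, select_upper_limit("noise", {"low_noise": 1.0, "noise": 2.0, "high_noise": 4.0}))` yields a gain that brings a power of 8.0 just under the limit of 2.0.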

As shown in FIG. 2, the voice switch unit 10 consists of received voice determination means 30, transmitted voice determination means 40, loss control means 50, second fixed loss holding means 51, and first fixed loss holding means 52. As already explained, the first fixed loss held in the first fixed loss holding means 52 is a predetermined fixed loss capable of suppressing echo and howling. The second fixed loss held in the second fixed loss holding means 51 is smaller than the first fixed loss and is a fixed loss capable of suppressing howling.

The received voice determination means 30 has a function of determining the presence or absence of received voice by monitoring the reception input signal (far-end voice) 2. Its determination result is the reception state 31.

The transmitted voice determination means 40 determines the presence or absence of transmitted voice by monitoring the voice signal 15 after echo cancellation. In particular, when the adaptive filter state 16 is unconverged and the reception state 31 indicates that received voice is present, it makes residual echo left by the echo canceller less likely to be judged as transmitted voice. For example, when the near-end acoustic echo is not negligible relative to the near-end speaker's voice, voice is judged to be present only when the voice signal 15 after echo cancellation is larger than the maximum near-end acoustic echo known in advance. The determination result of the transmitted voice determination means 40 is the transmission state 41.
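
One way to read the behaviour of the transmitted voice determination means 40 is as a detection threshold that is raised while residual echo is likely. The sketch below assumes a simple level comparison; the function name, the parameters, and the idea of a separate `base_threshold` are hypothetical and are not taken from the patent.

```python
def detect_send_voice(
    level: float,
    base_threshold: float,
    max_echo_level: float,
    filter_converged: bool,
    receive_voice_present: bool,
) -> bool:
    """Return True when transmitted (near-end) voice is judged to be present.

    level is the level of the signal after echo cancellation (signal 15).
    max_echo_level is the largest near-end acoustic echo known in advance.
    """
    threshold = base_threshold
    # While the adaptive filter has not converged and the far end is talking,
    # residual echo may remain, so require the signal to exceed the maximum echo.
    if not filter_converged and receive_voice_present:
        threshold = max(base_threshold, max_echo_level)
    return level > threshold
```

With this reading, a signal that would normally count as voice is ignored during unconverged reception unless it is louder than the worst-case echo.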

In the background noise estimation means 21 described above, the long-term average power of the absolute value component of the voice signal 15 after echo cancellation is used as the estimated noise level PN. "Long-term" here means a duration over which the averaged power indicates an appropriate near-end noise level when no voice is present, and over which the presence of voice has only a slight effect on that near-end noise level. The duration is adjusted as appropriate according to the main voice level in the usage environment, the near-end background noise, and the response speed required for transitions between call states. Specifically, in this embodiment it was adjusted within the range of 1 to 5 seconds.

For example, when the long-term average power of background noise alone that has continued for at least the long-term duration is compared with the long-term average power when both background noise and voice are present, the difference is negligible for voice at normal call levels, whereas there is a clear difference for voice that is several tens of dB or more louder than voice in a normal call.
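
A long-term average of this kind could be computed with a sliding window, as in the sketch below. The class name, the sample-rate parameter, and the choice of a plain moving average of squared samples are assumptions made for illustration; the description specifies only that the averaging time is between 1 and 5 seconds.

```python
from collections import deque


class NoiseLevelEstimator:
    """Sliding-window mean power of the signal after echo cancellation."""

    def __init__(self, sample_rate_hz: int, window_seconds: float = 2.0):
        # Number of samples covered by the averaging window.
        self._window = deque(maxlen=int(sample_rate_hz * window_seconds))
        self._sum = 0.0

    def update(self, sample: float) -> float:
        """Add one sample and return the current estimated noise level."""
        power = sample * sample
        if len(self._window) == self._window.maxlen:
            # The oldest value is about to be dropped by the deque.
            self._sum -= self._window[0]
        self._window.append(power)
        self._sum += power
        return self._sum / len(self._window)
```

A longer window makes the estimate less sensitive to ordinary speech but slower to follow changes in the background noise, which is the trade-off the description refers to when it mentions response speed.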

The loss amount control means 50 controls the loss amounts inserted by the reception loss insertion means 11 and the transmission loss insertion means 13 on the basis of the reception state 31, the transmission state 41, the estimated loss amount 23, the near-end state 22, and the reception output limit loss amount 61. The loss amount control performed by the loss amount control means 50 is shown in Table 1, and its control procedure is shown in the flowchart of FIG. 5 and described below. In Table 1, "×" indicates that the corresponding state does not exist.

When the near-end state 22 is the low-noise call state (S10→S11), the reception state is taken from the reception state 31 and the transmission state from the transmission state 41. In the reception voiced and transmission silent state (S11→S12), the reception output limit loss amount is inserted as a loss at the reception loss insertion means 11, and the estimated loss amount is inserted as a loss at the transmission loss insertion means 13. In the reception silent and transmission voiced state, the reception voiced and transmission voiced state, and the reception silent and transmission silent state (S11→S13), the larger of the reception output limit loss amount and the estimated loss amount is inserted as a loss at the reception loss insertion means 11, and no loss is inserted at the transmission loss insertion means 13.

When the near-end state 22 is the noisy call state (S10→S14), the reception state is taken from the reception state 31 and the transmission state from the transmission state 41. In the noisy call state, however, the near-end background noise level is high, so the transmission side is always in the voiced state. The possible states are therefore the reception voiced and transmission voiced state or the reception silent and transmission voiced state. In the reception voiced and transmission voiced state (S14→S15), the reception output limit loss amount is inserted as a loss at the reception loss insertion means 11, and the first fixed loss amount is inserted as a loss at the transmission loss insertion means 13. In the reception silent and transmission voiced state (S14→S17), the larger of the reception output limit loss amount and the first fixed loss amount is inserted as a loss at the reception loss insertion means 11, and no loss is inserted at the transmission loss insertion means 13.

When the near-end state 22 is the high-noise call state (S10→S16), the reception state is taken from the reception state 31 and the transmission state from the transmission state 41. In the high-noise call state, however, the near-end background noise level is high, so the transmission side is always in the voiced state. The possible states are therefore the reception voiced and transmission voiced state or the reception silent and transmission voiced state. In either case, the reception output limit loss amount is inserted as a loss at the reception loss insertion means 11, and the second fixed loss amount is inserted as a loss at the transmission loss insertion means 13.
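
The three cases handled by the loss amount control means 50 can be summarised as a small decision function. This is a sketch under stated assumptions: losses are expressed in dB so that "larger" means more attenuation, 0.0 stands for "no loss inserted", and the function name, argument names, and state labels are hypothetical.

```python
def control_losses(
    near_end_state: str,
    receive_voiced: bool,
    send_voiced: bool,
    limit_loss: float,
    estimated_loss: float,
    first_fixed_loss: float,
    second_fixed_loss: float,
) -> tuple[float, float]:
    """Return (reception-side loss, transmission-side loss) in dB."""
    if near_end_state == "low_noise":
        if receive_voiced and not send_voiced:
            return limit_loss, estimated_loss  # S11 -> S12
        return max(limit_loss, estimated_loss), 0.0  # S11 -> S13
    if near_end_state == "noisy":
        # The transmission side is always judged voiced in this state,
        # so send_voiced is not consulted.
        if receive_voiced:
            return limit_loss, first_fixed_loss  # S14 -> S15
        return max(limit_loss, first_fixed_loss), 0.0  # S14 -> S17
    if near_end_state == "high_noise":
        # Same losses regardless of the reception state (S10 -> S16).
        return limit_loss, second_fixed_loss
    raise ValueError(f"unknown near-end state: {near_end_state}")
```

Note that the reception side always receives at least the reception output limit loss amount, so the speaker output stays below the selected upper limit in every state.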

The embodiment of the present invention can be implemented as a program that, when installed in a general-purpose information processing apparatus, causes that apparatus to realize functions corresponding to the voice communication device described above. The program is installed in the general-purpose information processing apparatus either from a recording medium on which it is recorded or via a communication line, and thereby causes the apparatus to realize functions corresponding to the voice communication device described above. The general-purpose information processing apparatus is, for example, a CPU (Central Processing Unit) such as a microcomputer chip mounted in a remote call device or the like.

According to the present invention, good calls can be made under a variety of noise conditions. It is therefore possible to realize a voice communication device suitable for use in environments where the background noise varies widely, for example as a door phone at a construction site, in a bathroom, or in a building facing a busy road.

FIG. 1 is an overall configuration diagram of a voice communication device according to an embodiment of the present invention.
FIG. 2 is a block configuration diagram of the voice switch unit.
FIG. 3 is a block configuration diagram of the reception output limit estimation means.
FIG. 4 is a flowchart showing the state determination procedure.
FIG. 5 is a flowchart showing the loss insertion procedure.

Explanation of symbols

1 Reception input end
2 Reception input signal
3 Transmission output end
4 Near-end signal after transmission loss insertion
5 Transmission input end
6 Reception output end
7 Transmission input signal
10 Voice switch unit
11 Reception loss insertion means
12 Voice signal after reception loss insertion
13 Transmission loss insertion means
14 Echo canceller
15 Voice signal after echo cancellation
16 Adaptive filter state
20 Loss amount estimation means
21 Background noise estimation means
22 Near-end state
23 Estimated loss amount
30 Received voice determination means
31 Reception state
40 Transmitted voice determination means
41 Transmission state
50 Loss amount control means
51 Second fixed loss amount holding means
52 First fixed loss amount holding means
60 Reception output limit estimation means
61 Reception output limit loss amount
62 Reception output limit reception input power calculation means
63 Reception output limit reception input power
64 Low-noise call state reception output upper limit level holding means
65 Noisy call state reception output upper limit level holding means
66 High-noise call state reception output upper limit level holding means
67 Reception output upper limit level selection means
68 Reception output upper limit level
69 Reception output limit loss amount updating means

Claims

A voice communication device in which one of two parties in different locations is a near-end speaker and the other is a far-end speaker, the end to which the voice signal of the near-end speaker is input is a transmission input end, the end to which the voice signal of the far-end speaker is input is a reception input end, the end from which the voice signal of the far-end speaker is output is a reception output end, and the end from which the voice signal of the near-end speaker is output is a transmission output end, the device having voice call means that forms a transmission signal path and a reception signal path between the transmission input end and the transmission output end and between the reception input end and the reception output end, respectively, to carry out a call between the two parties,
the voice call means comprising:
voice signal detection means for detecting the presence or absence of a voice signal in each of the transmission signal path and the reception signal path;
loss insertion means for adaptively setting the loss insertion location and the loss insertion amount in the transmission signal path or the reception signal path according to the detection result of the voice signal detection means;
an echo canceller provided between the reception output end and the transmission input end for cancelling echo produced by the voice signal of the far-end speaker; and
loss amount estimation means for estimating a required loss insertion amount on the basis of the voice signal after reception loss insertion and the voice signal after echo cancellation by the echo canceller,
wherein the device includes state classification means that takes, as an estimated noise level, the average signal strength over a predetermined time of the voice signal input from the transmission input end or of the voice signal after echo cancellation by the echo canceller, and classifies the quality of the call environment into a plurality of predetermined stages according to the strength of the estimated noise level,
the state classification means comprising means for classifying the call, on the basis of the strength of the estimated noise level, into:
a low-noise call state, in which the estimated value of the loss insertion amount by the loss amount estimation means is within the applicable range, and both detection of the presence or absence of a voice signal by the voice signal detection means and echo cancellation by the echo canceller are possible;
a noisy call state, in which the estimated value of the loss insertion amount by the loss amount estimation means is outside the applicable range, detection of the presence or absence of a voice signal by the voice signal detection means is uncertain, and echo cancellation by the echo canceller is possible; and
a high-noise call state, in which the estimated value of the loss insertion amount by the loss amount estimation means is outside the applicable range, and neither detection of the presence or absence of a voice signal by the voice signal detection means nor echo cancellation by the echo canceller is possible,
and wherein the loss insertion means operates as follows:
when the state classification means classifies the call into the low-noise call state,
if a voice signal is detected in the reception signal path and no voice signal is detected in the transmission signal path, a predetermined reception output limit loss amount is inserted into the reception signal path as a loss, and the loss insertion amount estimated by the loss amount estimation means is inserted into the transmission signal path as a loss,
if a voice signal is detected in the transmission signal path and no voice signal is detected in the reception signal path, or a voice signal is detected in both the reception signal path and the transmission signal path, or no voice signal is detected in either the reception signal path or the transmission signal path, the larger of the reception output limit loss amount and the loss insertion amount estimated by the loss amount estimation means is inserted into the reception signal path as a loss;
when the state classification means classifies the call into the noisy call state,
if a voice signal is detected in the transmission signal path and no voice signal is detected in the reception signal path, the larger of the reception output limit loss amount and a predetermined first fixed loss amount capable of suppressing echo and howling is inserted into the reception signal path as a loss,
if a voice signal is detected in both the reception signal path and the transmission signal path, the reception output limit loss amount is inserted into the reception signal path as a loss, and the first fixed loss amount is inserted into the transmission signal path as a loss; and
when the state classification means classifies the call into the high-noise call state,
if a voice signal is detected in the transmission signal path and no voice signal is detected in the reception signal path, or a voice signal is detected in both the reception signal path and the transmission signal path, the reception output limit loss amount is inserted into the reception signal path as a loss, and a second fixed loss amount smaller than the first fixed loss amount is inserted into the transmission signal path as a loss.

A program that, when installed in a general-purpose information processing apparatus, causes the general-purpose information processing apparatus to realize functions corresponding to the voice communication device according to claim 1.