JP3460783B2

JP3460783B2 - Voice switch for talker

Info

Publication number: JP3460783B2
Application number: JP11572497A
Authority: JP
Inventors: 泰山崎; 知紀佐藤; 均松澤; 正人伊藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1997-05-06
Filing date: 1997-05-06
Publication date: 2003-10-27
Anticipated expiration: 2017-05-06
Also published as: JPH10308814A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice switch used in hands-free telephones and similar devices. A hands-free telephone that adopts the voice switch method must be able to suppress acoustic echo reliably.

[0002]

[Prior Art] To realize a hands-free function, the speaker volume must be raised and the microphone sensitivity increased. Doing so, however, produces acoustic echo, as shown in FIG. 5: the received voice emitted from a voice output unit such as a speaker leaks back into a voice input unit such as a microphone. The other party hears their own voice return like an echo, which makes the telephone very hard to use. There are two methods for removing this acoustic echo: (1) the echo canceller method and (2) the voice switch method.

[0003] The echo canceller method removes acoustic echo using adaptive signal processing. As shown in FIG. 6, for example, the telephone internally generates a replica of the acoustic echo r, that is, of the received voice that leaks from the speaker into the microphone, and subtracts this replica from the microphone input signal. The pseudo echo r' is generated by modeling the transfer function from the speaker to the microphone as an FIR filter. Because this transfer function changes with the surroundings of the telephone, the filter is adapted so that the error between the pseudo echo r' and the acoustic echo r is minimized.
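
The adaptive filtering described here can be sketched as follows. This is a minimal illustration only: the text says no more than that an FIR filter is adapted to minimize the error between r' and r, so the normalized LMS (NLMS) update, the step size `mu`, the tap count, the function name `nlms_echo_cancel`, and the test echo path are all assumptions made for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo r' from the mic signal.

    far_end: samples sent to the speaker; mic: samples picked up by the mic.
    Returns the residual (mic minus pseudo echo) for every sample.
    NLMS is an assumed adaptation rule; the source does not name one.
    """
    weights = [0.0] * num_taps   # FIR model of the speaker-to-mic path
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        pseudo_echo = sum(w * h for w, h in zip(weights, history))  # r'
        err = d - pseudo_echo                                       # r - r'
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual


# Hypothetical echo path, used only to exercise the sketch.
random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [
    sum(true_path[k] * far[n - k] for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(far))
]
out = nlms_echo_cancel(far, echo)
```

Once the filter has adapted, the residual in `out` is far smaller than the echo itself. The per-sample filtering and adaptation is the processing load that the voice switch method avoids.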

[0004] The voice switch method, by contrast, compares the power of the speaker output voice with that of the microphone input voice, as shown in FIG. 7, and removes the acoustic echo by suppressing one of the two. While the speaker is producing output, a signal entering the microphone is very likely to be acoustic echo, so the microphone input signal is suppressed during that period to keep the echo from being sent to the other party.

[0005] Thus there are two methods, the echo canceller and the voice switch, for removing the acoustic echo that stands in the way of a hands-free function. FIG. 8 compares their advantages and disadvantages; the choice is a trade-off between processing load and performance. When cost has priority, the voice switch method is adopted. The present invention concerns this voice switch.

[0006] FIG. 9 shows the detailed conventional configuration of a hands-free telephone equipped with this voice switch. In FIG. 9, 1 is a receiving unit, consisting of a demodulator and the like, that receives the voice signal from the other party; 2 is a power suppression unit that can suppress the power of the received signal by varying the reception gain gain-r; and 3 is a voice output unit, consisting of an amplifier, a speaker, and the like, that emits the received voice (R). Further, 6 is a voice input unit, consisting of a microphone and an amplifier, that takes in the transmitted voice (S); 7 is a power suppression unit that can suppress the power of the transmitted signal by varying the transmission gain gain-s; and 8 is a transmitting unit, consisting of a modulator and the like, that sends the transmitted voice signal to the other party.

[0007] 4 is a decision unit that determines, from the levels of the received voice and the transmitted voice, whether the receive-side power suppression unit 2 should suppress the received voice or the transmit-side power suppression unit 7 should suppress the transmitted voice. Within it, 41 is a power calculation unit that calculates the power of the signal received by receiving unit 1; 42 is a voice detection unit that detects, from the power calculated by power calculation unit 41, whether the current received voice state s-r is silent or voiced; 43 is a power calculation unit that calculates the power of the voice signal input to voice input unit 6; 44 is a voice detection unit that detects, from the power calculated by power calculation unit 43, whether the current transmitted voice state s-s is silent or voiced; and 45 is a decision unit that determines, from the detection results of voice detection units 42 and 44, which of power suppression units 2 and 7 is to be placed in the suppressing state.

[0008] The power calculation units 41 and 43 calculate the power of the input voice data with the following formula. Letting x_i denote the input voice data, the output power p_i is given by

p_i = 10 × log[ Σ_{j=0}^{J} (x_{i-j} × x_{i-j}) ]

where the sum Σ runs from j = 0 to J.
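
The power formula can be written out directly. The following is a minimal sketch; the base-10 logarithm, the `floor` guard against taking the log of zero, and the handling of samples before the start of the signal are assumptions, since the text writes only "log" and does not discuss these cases.

```python
import math


def frame_power_db(x, i, J, floor=1e-12):
    """Return p_i = 10 * log10(sum over j = 0..J of x[i-j]^2).

    Base-10 log and the `floor` guard against log(0) are assumptions;
    the source writes only "log" and does not cover all-zero input.
    Samples before the start of `x` are treated as zero.
    """
    energy = sum(x[i - j] ** 2 for j in range(J + 1) if i - j >= 0)
    return 10.0 * math.log10(max(energy, floor))
```

For ten samples of amplitude 1.0 with J = 9, the summed energy is 10, so the function returns 10 dB.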

[0009] As shown in FIG. 10, each of the voice detection units 42 and 44 consists of a comparator that compares the input power p_i with a fixed threshold th and determines, by the following rule, whether the current voice state s_i is voiced or silent. Here s_i = 0 means silent and s_i = 1 means voiced.

if (p_i < th) s_i = 0
if (p_i > th) s_i = 1

That is, the voice state s_i is set to "0" when the input power p_i is below the threshold th and to "1" when it is above the threshold. This prevents background noise at or below the threshold th from being mistaken for voice.
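
The threshold rule is small enough to state as a function. A minimal sketch follows; the rule as written leaves the case p_i = th undefined, and treating it as silent is an assumption made here to match the remark about noise at or below the threshold. The threshold value in the example is hypothetical.

```python
def detect_voice(p_i, th):
    """Return the voice state s_i: 1 (voiced) if p_i > th, else 0 (silent).

    The source's rule leaves p_i == th undefined; treating it as silent
    is an assumption that matches "noise at or below the threshold".
    """
    return 1 if p_i > th else 0


# Illustrative frame powers in dB against a hypothetical -30 dB threshold.
states = [detect_voice(p, th=-30.0) for p in (-45.0, -30.0, -12.5)]
```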

[0010] The decision unit 45 controls the reception gain gain-r of the receive-side power suppression unit 2 and the transmission gain gain-s of the transmit-side power suppression unit 7 according to a decision logic table, an example of which is shown in FIG. 11. Both gain-r and gain-s lie in the range 0.0 ≤ gain ≤ 1.0. The decision logic table of FIG. 11 prescribes the following control:
When the transmitted voice state s-s = 0 and the received voice state s-r = 0, gain-s is set to 0.0 and gain-r to 0.0.
When s-s = 1 and s-r = 0, gain-s is set to 1.0 and gain-r to 0.0.
When s-s = 0 and s-r = 1, gain-s is set to 0.0 and gain-r to 1.0.
When s-s = 1 and s-r = 1, reception has priority: gain-s is set to 0.0 and gain-r to 1.0.
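
The four cases of the decision logic can be captured in a lookup table. This is a sketch of the example table only; the names `DECISION_TABLE` and `decide_gains` are hypothetical, and the identifiers `gain_s`, `gain_r`, `s_s`, and `s_r` stand for gain-s, gain-r, s-s, and s-r.

```python
# (s_s, s_r) -> (gain_s, gain_r), following the four cases described for FIG. 11.
DECISION_TABLE = {
    (0, 0): (0.0, 0.0),  # both silent
    (1, 0): (1.0, 0.0),  # only the transmit side is voiced
    (0, 1): (0.0, 1.0),  # only the receive side is voiced
    (1, 1): (0.0, 1.0),  # both voiced: reception has priority
}


def decide_gains(s_s, s_r):
    """Return (gain_s, gain_r) for the given transmit and receive voice states."""
    return DECISION_TABLE[(s_s, s_r)]
```

A different priority rule, such as favoring the side with the higher power, would change only the last entry or replace the table with a comparison.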

[0011] In accordance with the decision of decision unit 45, the power suppression units 2 and 7 apply the following operation to the input voice data x_i and output the result as the output voice data x_i:

x_i = x_i × gain
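
Applied to a whole frame, the suppression step is a per-sample multiplication. A minimal sketch, assuming a frame is a list of floating-point samples; the function name `suppress` and the range check are additions for illustration.

```python
def suppress(frame, gain):
    """Scale every sample of a frame by gain (x_i = x_i * gain)."""
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must lie in [0.0, 1.0]")
    return [x * gain for x in frame]
```

With a gain of 1.0 the frame passes unchanged, and with 0.0 it is muted entirely.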

[0012] In this way, the voice switch method suppresses either the received voice or the transmitted voice according to their states; the one that is not suppressed is output from the speaker if it is the received voice, or transmitted if it is the transmitted voice. When both are voiced, various criteria are possible, such as giving priority to the received voice or giving priority to whichever has the higher voice power.

[0013]

[Problem to be Solved by the Invention] As described above, the voice switch method removes acoustic echo by comparing the states of the received voice and the transmitted voice, suppressing one, and passing the other. This normally removes acoustic echo without trouble. When frame processing is used, however, a time offset arises between the received voice and the transmitted voice, and the acoustic echo may no longer be removed completely.

[0014] Frame processing, used for example in voice encoding and decoding, means processing a fixed duration of data as a single batch. FIG. 12 illustrates this frame processing: it shows the timing from the moment voice is input, through encoding on the transmitting side and decoding on the receiving side, until it is output. In FIG. 12, on the transmitting side the transmitted voice captured by the voice input unit is accumulated for a fixed time to form a frame; during the second frame period this frame is encoded by the encoding unit, processed by the transmitting unit, and sent to the other party. On the other party's side the frame is processed by the receiving unit and decoded by the decoding unit, and during the third frame period it is emitted from the voice output unit as received voice. As FIG. 12 shows, voice input on the transmitting side is therefore output at the other party with a delay of at least two frames.
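
The two-frame delay follows from counting pipeline stages. A minimal sketch of that arithmetic, assuming one frame period for capture, one for encoding and transmission, and playback in the period after that; the function name and the `processing_frames` parameter are hypothetical.

```python
def playback_frame(capture_frame, processing_frames=1):
    """Return the frame period in which a captured frame is played out.

    Assumes, as in the description of FIG. 12, one frame period for
    encoding and transmission and playback in the period after that.
    `processing_frames` is a hypothetical knob for longer pipelines.
    """
    return capture_frame + processing_frames + 1


timeline = {n: playback_frame(n) for n in (1, 2, 3)}
```

A frame captured in period 1 is played out in period 3, which is the two-frame delay noted in the text.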

[0015] When a voice switch operates with this frame processing, as shown in FIG. 13, the received voice and the transmitted voice that the decision unit compares are no longer a received voice being output from the speaker and a transmitted voice captured by the microphone at that same moment. Relative to the moment of the decision, the comparison is between transmitted voice that entered the microphone one frame earlier and received voice that will be output from the speaker one frame later. Because of this offset, a simple comparison cannot tell whether the transmitted voice captured by the voice input unit is acoustic echo that leaked from the voice output unit.

[0016] This is explained in detail below with reference to FIG. 13, in which the horizontal axis shows time in units of frames. The explanation follows this frame-by-frame timing.

[0017]
Frame 1: The data received by the receiving unit on the receive side is voiced, and the data input at the voice input unit on the transmit side is voiced.
Frame 2: The decision unit compares the voiced receive side with the voiced transmit side. Because both are voiced, it gives priority to reception and decides to suppress the voiced signal on the transmit side.
Frame 3: In accordance with that decision, the output unit outputs the received voiced signal as voiced', and the voiced signal on the transmit side is suppressed and transmitted as silence'. Assume that at this point the other party breaks off speaking, so the received voice stops and the data at the receiving unit becomes silent. At the same time, a voiced signal is observed at the voice input unit. It is not known, however, whether this voiced signal is speech by the local talker or an acoustic echo of the voiced' signal leaking from the voice output unit.
Frame 4: The decision unit compares the silent receive side with the voiced transmit side and decides not to suppress the voiced signal on the transmit side.
Frame 5: In accordance with that decision, the receive side outputs the silent signal from the voice output unit as silence', and the transmit side transmits the voiced signal from the voice input unit, unsuppressed, as voiced'.

[0018] In the above sequence, the voiced' signal is transmitted to the other party in frame 5, but it is not known whether this voiced signal was speech by the local talker or an acoustic echo of the voiced' signal that leaked from the voice output unit. In the latter case, an acoustic echo that should have been suppressed is sent to the other party, who then hears an echo of their own voice, for instance just after they stop speaking, and finds it unpleasant.

[0019] The present invention was made in view of this problem. Its object is to suppress acoustic echo reliably in a telephone that uses frame processing, regardless of the time offset between the transmitted and received signals.

[0020]

[Means for Solving the Problem] FIG. 1 illustrates the principle of the present invention. To solve the above problem, the voice switch for a telephone according to the present invention comprises: delay means for delaying the received voice signal by a predetermined time; received voice decision means for determining, from the received voice signal as received and the input transmitted voice signal, whether to suppress the received voice signal; transmitted voice decision means for determining, from the received voice signal delayed by the delay means and the input transmitted voice signal, whether to suppress the transmitted voice signal; receive-side suppression means for suppressing the received voice signal according to the result of the received voice decision means; and transmit-side suppression means for suppressing the transmitted voice signal according to the result of the transmitted voice decision means. The telephone to which this voice switch is applied processes voice data in blocks by frame processing, and the predetermined time of the delay means is the time that compensates for the offset between the received and transmitted voice signals caused by frame processing and so synchronizes the two. The delay means can be implemented as storage means that holds the signal temporarily. In this voice switch, the decision whether to suppress the received voice signal is made by comparing the received voice as received with the input transmitted voice. The decision whether to suppress the transmitted voice signal is made by comparing the transmitted voice just input with the received voice that has already been output and was observed at the speaker at the same time as that transmitted voice; this corrects the time offset and synchronizes the received and transmitted voice signals at the moment of decision. This is achieved by delaying the received voice signal with delay means such as temporary storage means.

[0021] The above voice switch can further include transmitted voice estimation means for estimating the state of the transmitted voice signal synchronized with the received voice signal, configured so that the output of the transmitted voice estimation means is supplied to the received voice decision means as the transmitted voice signal.

[0022]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 2 shows a hands-free telephone equipped with a voice switch as one embodiment of the present invention. In the figure, the receiving unit 1, power suppression units 2 and 7, voice output unit 3, voice input unit 6, and transmitting unit 8 are the same circuit elements described for the conventional device of FIG. 9, so their detailed description is omitted here.

[0023] The device of this embodiment also has a decoding unit 9 that decodes the received data and an encoding unit 10 that encodes the transmit data. When a call is made between connected personal computers, for example, the transmission capacity of the channel may well be small, so encoding becomes necessary. The device of this embodiment therefore implements the hands-free function with a voice switch and a voice codec.

[0024] The device of this embodiment also has a received voice temporary storage unit 48, which delays the received voice from power suppression unit 2 by a predetermined number of frame periods. The delay is chosen long enough to correct the time offset between the transmitted voice and the received voice and synchronize the two. In this embodiment, each frame of received voice data is delayed by two frame periods.
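
The temporary storage unit can be pictured as a fixed-length first-in, first-out buffer. The following is a minimal sketch, assuming frames are arbitrary Python objects and that the buffer starts out filled with a `fill` value standing in for silence; the class name `FrameDelay` is hypothetical.

```python
from collections import deque


class FrameDelay:
    """Delay a stream of frames by a fixed number of frame periods."""

    def __init__(self, delay_frames=2, fill=None):
        # Pre-fill so that the first `delay_frames` outputs are `fill`.
        self._buf = deque([fill] * delay_frames)

    def push(self, frame):
        """Store the newest frame and return the one from delay_frames ago."""
        self._buf.append(frame)
        return self._buf.popleft()


delay = FrameDelay(delay_frames=2)
delayed = [delay.push(f) for f in ("A", "B", "C", "D")]
```

With a two-frame delay, frame "A" comes back out when "C" is pushed, so the decision for the microphone frame captured while "A" is playing can use the state of "A" itself.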

[0025] In place of the decision unit 4 of the conventional device, this embodiment has two units: a received voice decision unit 46 and a transmitted voice decision unit 47. The received voice decision unit 46 determines whether the receive-side power suppression unit 2 should suppress the received voice; the received voice it takes as input is the output of decoding unit 9. The transmitted voice decision unit 47 determines whether the transmit-side power suppression unit 7 should suppress the transmitted voice; the received voice it takes as input is the output of received voice temporary storage unit 48, delayed by the predetermined number of frame periods. Each of these units has the same configuration as the decision unit 4 described for the prior art, so a detailed description of their configuration is omitted.

[0026] The operating sequence of the device of this embodiment is described with reference to FIG. 3, in which the horizontal axis shows time in units of frames. The description follows this frame-by-frame timing.

[0027] Frame 1: The data received by receiving unit 1 on the receive side is voiced, and the data input at voice input unit 6 on the transmit side is voiced.

[0028] Frame 2: The received voice decision unit 46 compares the voiced receive side with the voiced transmit side. Because both are voiced, it gives priority to reception and decides that the voiced signal on the receive side is to be output from voice output unit 3 as voiced' without suppression.

[0029] Frame 3: In accordance with that decision, voice output unit 3 outputs the received voiced signal as voiced', without suppression by the power suppression unit. Assume that at this point the other party breaks off speaking, so the received voice stops and the data at receiving unit 1 becomes silent. At the same time, a voiced signal is observed at voice input unit 6. It is not yet known, however, whether this voiced signal is speech by the local talker or an acoustic echo of the received voiced' signal leaking from voice output unit 3.

[0030] Frame 4: The received voice decision unit 46 compares the silent receive side with the voiced transmit side and decides that the silent signal on the receive side is to be suppressed and output from voice output unit 3 as silence'. Meanwhile, the transmitted voice decision unit 47 compares the receive-side voiced' signal, which the received voice temporary storage unit 48 has held back for two frame periods, with the voiced signal on the transmit side and, following the receive-priority rule, decides that the transmitted voice is to be suppressed by the transmit-side power suppression unit 7 and transmitted as silence'.

[0031] Frame 5: In accordance with those decisions, the receive side outputs the silent signal from the voice output unit as silence', and the transmit side suppresses the voiced signal from voice input unit 6 and transmits it as silence'.

[0032] In the above sequence, the voiced signal is transmitted as silence' in frame 5. The voiced signal that entered voice input unit 6 is very likely an acoustic echo of the voiced' signal, because voiced' was being output from voice output unit 3 at the same moment. The conventional device could not make this distinction. In the device of this embodiment, by contrast, the transmitted voice decision unit 47 compares the voiced' signal held back for two frame periods in the received voice temporary storage unit 48 with the voiced signal input to voice input unit 6, so the receive-side voiced' signal and the transmit-side voiced signal are aligned in time. If both are voiced, the unit can conclude that the transmit-side voiced signal is very likely an acoustic echo of the voiced' signal output on the receive side, and it decides to suppress that signal and transmit silence. The acoustic echo is thereby removed.
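
The difference between the conventional comparison and the delayed comparison can be checked with a small frame-level simulation. It is only a sketch under simplifying assumptions that are not in the source: voice states are 0/1 flags, the speaker plays each received frame exactly `DELAY_FRAMES` periods after reception, any speaker output is picked up by the microphone as voiced, and decisions are indexed by the capture frame rather than one frame later. The names `send_gain` and `leaked_echo_frames` are hypothetical.

```python
DELAY_FRAMES = 2  # receive-to-speaker delay assumed from the embodiment


def send_gain(s_s, s_r):
    """Receive-priority rule: pass the transmit side only if it alone is voiced."""
    return 1.0 if (s_s == 1 and s_r == 0) else 0.0


def leaked_echo_frames(recv, near, use_delay):
    """Return the frame indices in which pure echo is transmitted unsuppressed.

    recv[t]: far-end voice state received in frame t.
    near[t]: 1 if the local talker really speaks in frame t.
    use_delay: compare the mic with the delayed receive state (the embodiment)
               instead of the receive state of the same frame (conventional).
    """
    leaks = []
    for t in range(len(recv)):
        speaker = recv[t - DELAY_FRAMES] if t >= DELAY_FRAMES else 0
        mic = 1 if (near[t] or speaker) else 0
        reference = speaker if use_delay else recv[t]
        if send_gain(mic, reference) == 1.0 and speaker and not near[t]:
            leaks.append(t)
    return leaks


# Far end speaks in frame 0 only; the local talker stays silent throughout.
recv = [1, 0, 0, 0, 0]
near = [0, 0, 0, 0, 0]
conventional = leaked_echo_frames(recv, near, use_delay=False)
with_delay = leaked_echo_frames(recv, near, use_delay=True)
```

In this run the conventional comparison lets the echo through in frame index 2, when the far-end frame is playing but the receive side already looks silent, while the delayed comparison suppresses it.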

[0033] FIG. 4 shows another embodiment of the present invention. Like the embodiment above, it deals with frame delay, but it also aims to remove, as far as possible, the time offset between the received voice and the transmitted voice in the decision made by the received voice decision unit 46. To that end it has a transmitted voice estimation unit 49, which estimates the voice state of the transmitted voice that will be observed at the microphone of voice input unit 6 at the moment the received voice currently under decision is output. The estimate can be obtained, for example, by averaging the power of the transmitted voice over the past several frames. Because voice varies relatively smoothly over time, the estimate can be expected to be fairly accurate.
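
The averaging approach mentioned for the estimation unit can be sketched as a moving average over recent frame powers. The window length of three frames, the class name `SendPowerEstimator`, and the choice to return `None` before any history exists are assumptions; the text says only that the power of the past several frames can be averaged.

```python
from collections import deque


class SendPowerEstimator:
    """Predict upcoming transmit-side power as the mean of recent frames."""

    def __init__(self, window=3):
        self._recent = deque(maxlen=window)  # window size is an assumption

    def update(self, frame_power):
        """Record the measured power of the newest transmitted frame."""
        self._recent.append(frame_power)

    def estimate(self):
        # With no history yet, return None rather than guess a value.
        if not self._recent:
            return None
        return sum(self._recent) / len(self._recent)
```

The estimate would then be passed through the same threshold comparison as a measured power to obtain the transmitted voice state used by the received voice decision.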

[0034]

[Effects of the Invention] As described above, the present invention makes it possible to correct the time offset between the received voice and the transmitted voice at the moment of decision, which is the problem with a voice switch that uses frame processing. This secures removal of acoustic echo and quick tracking of changes in the received and transmitted voice, and makes a more natural conversation possible.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a diagram showing a hands-free telephone equipped with a voice switch as one embodiment of the present invention.

FIG. 3 is a diagram explaining the operating sequence of the embodiment device.

FIG. 4 is a diagram showing another embodiment of the present invention.

FIG. 5 is a diagram explaining acoustic echo in a hands-free telephone or the like.

FIG. 6 is a diagram explaining the echo canceller method.

FIG. 7 is a diagram explaining the voice switch method.

FIG. 8 is a diagram comparing the echo canceller method and the voice switch method.

FIG. 9 is a diagram showing a hands-free telephone equipped with a conventional voice switch.

FIG. 10 is a diagram showing the configuration of the voice detection unit in the conventional device.

FIG. 11 is a diagram showing an example of the voiced/silent decision table.

FIG. 12 is a diagram explaining the delay caused by frame processing.

FIG. 13 is a diagram showing the operating sequence of the conventional device.

[Explanation of symbols]

1: receiving unit
2, 7: power suppression units
3: voice output unit
4: decision unit
6: voice input unit
8: transmitting unit
41, 43: power calculation units
42, 44: voice detection units
45: decision unit
46: received voice decision unit
47: transmitted voice decision unit
48: received voice temporary storage unit
49: transmitted voice estimation unit

Continuation of the front page
(72) Inventor: Masato Ito, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(56) References: JP-A-5-75501 (JP, A); JP-A-6-78046 (JP, A); JP-A-7-202767 (JP, A); JP-A-8-274689 (JP, A); JP-A-5-75500 (JP, A)
(58) Fields searched (Int. Cl.7, DB name): H04M 1/60; H04B 3/23

Claims

(57) [Claims]

1. A voice switch for a telephone that performs frame processing, comprising: delay means for delaying a received voice signal by a predetermined time; received voice decision means for determining, from the received voice signal as received and an input transmitted voice signal, whether to suppress the received voice signal; transmitted voice decision means for determining, from the received voice signal delayed by the delay means and the input transmitted voice signal, whether to suppress the transmitted voice signal; receive-side suppression means for suppressing the received voice signal according to the result of the received voice decision means; and transmit-side suppression means for suppressing the transmitted voice signal according to the result of the transmitted voice decision means, wherein the time by which the delay means delays the signal is the time that compensates for the time offset between the received voice signal and the transmitted voice signal caused by the frame processing and synchronizes the two.

2. The voice switch for a telephone according to claim 1, further comprising transmitted voice estimation means for estimating the voice state of a transmitted voice signal synchronized with the received voice signal, wherein the transmitted voice signal from the transmitted voice estimation means is input to the received voice decision means as the transmitted voice signal.