JPH11163888A

JPH11163888A - Voice coding transmitter

Info

Publication number: JPH11163888A
Application number: JP33825997A
Authority: JP
Inventors: Nobuyuki Hamada; 信行浜田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-11-25
Filing date: 1997-11-25
Publication date: 1999-06-18

Abstract

PROBLEM TO BE SOLVED: To provide a voice coding transmitter that completely and reliably sends the 'voiced' signal at the head of speech. SOLUTION: A voice coding section 12 generates and outputs voice data obtained by coding a voice signal. A storage output section 13 stores the voice data output from the voice coding section 12 for a prescribed time and then outputs it. A cell processing section 14 assembles the voice data output from the storage output section 13 into voice cells and outputs them. An idle cell generating section 15 generates idle cells, which are sent only for management of the transmission line. A voice level detection section 16 detects a voice level denoting the strength of the voice signal. A voiced sound discrimination section 17 discriminates whether or not the detected voice level reaches a reference voice level. A selection section 18 selects either the voice cell or the idle cell as a transmission cell according to the discrimination result. A transmission processing section 19 sends the selected transmission cell to a network.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice coding transmitter that sends a coded voice signal onto a transmission line and, more specifically, to a voice coding transmitter that sends only the voiced portions of a given voice signal onto the transmission line.

[0002]

2. Description of the Related Art

In an ATM network, which transmits cells of a predetermined size in Asynchronous Transfer Mode, when congestion occurs in the network during communication, some of the cells in transit at that time are discarded to keep the congestion from spreading. It is generally known that, when cells are discarded, the packets containing those cells are retransmitted, so the retransmission raises the risk of further congestion, and the longer communication time caused by the retransmission lowers data transmission efficiency.

[0003] In recent years, techniques for transmitting coded voice signals over ATM networks have become common. Accordingly, various techniques addressing congestion in networks that carry voice data obtained by coding voice signals have been proposed and put to practical use. One such technique is called silence compression. In silence compression, the strength or a similar property of the given voice signal is compared with a predetermined criterion; a signal that satisfies the criterion is treated as a "voiced" voice signal and one that does not as a "silent" voice signal. For a "voiced" voice signal, the entire content is coded and sent to the network, whereas for a "silent" voice signal, a cell that conveys only the fact that the signal is "silent" is generated and sent to the network.
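The classification step of silence compression described above can be sketched roughly as follows. This is a minimal illustration, not the method of any cited system: the names `Cell`, `frame_level`, `encode` and `silence_compress`, the mean-absolute-amplitude measure, the threshold value and the 16-bit stand-in coder are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

REFERENCE_LEVEL = 500.0  # assumed criterion, in arbitrary amplitude units


@dataclass
class Cell:
    kind: str                        # "voice" or "silence"
    payload: Optional[bytes] = None  # coded content, only for voice cells


def frame_level(samples: Sequence[int]) -> float:
    """Mean absolute amplitude, used here as the 'strength' of one frame."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def encode(samples: Sequence[int]) -> bytes:
    """Stand-in coder (hypothetical): 16-bit little-endian signed samples."""
    return b"".join(int(s).to_bytes(2, "little", signed=True) for s in samples)


def silence_compress(frames: Sequence[Sequence[int]]) -> List[Cell]:
    """Voiced frames are coded in full; silent frames become a marker cell."""
    cells = []
    for frame in frames:
        if frame_level(frame) >= REFERENCE_LEVEL:
            cells.append(Cell("voice", encode(frame)))
        else:
            cells.append(Cell("silence"))  # conveys only that the frame is silent
    return cells
```

A frame whose level stays below the assumed threshold yields a payload-free cell, which is where the traffic saving comes from.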

[0004] Consider a communication system to which silence compression is applied, in which a conversation takes place between a first telephone and a second telephone connected via, for example, an ATM network. In an ordinary conversation, while the user at the first telephone is talking, the user at the second telephone will naturally be listening in silence. That is, while a "voiced" voice signal is being supplied continuously at the first telephone, a "silent" voice signal is expected to be supplied at the second telephone. Therefore, for the "voiced" voice signal supplied at the first telephone, cells in which its entire content is coded are sent to the network, and the second telephone reproduces the original "voiced" voice signal from these cells. At the same time, for the "silent" voice signal supplied at the second telephone, a cell conveying only the fact that the signal is "silent" is generated and sent to the network, and the first telephone restores the "silent" voice signal corresponding to this cell. With silence compression performed in this way, most of the cells transmitted over the ATM network are cells coded from the "voiced" voice signal at the telephone where the user is talking, and only a small remainder are cells conveying "silence" from the telephone where the user is listening. Applying silence compression therefore halves the traffic in the network, while users notice almost no loss of convenience.

[0005] As prior art addressing congestion in networks that presuppose the silence compression described above, the following have been disclosed: the "voice encoder" described in Japanese Unexamined Patent Publication No. H5-244104 (hereinafter, prior art 1), the "voice coding transmitter" described in Japanese Unexamined Patent Publication No. H6-69950 (hereinafter, prior art 2), and the "voice cell coding apparatus" described in Japanese Unexamined Patent Publication No. H8-8933 (hereinafter, prior art 3).

[0006] Prior art 1 increases or decreases the degree of voice data compression (including silence compression) in step with the number of cells held in a transmission buffer that stores the cells to be sent to the network. According to prior art 1, the degree of compression is raised only while the number of cells held in the transmission buffer is large because the network is busy. The loss of voice quality that inevitably accompanies data compression is thus confined to periods when the network is congested, which makes it possible to maintain voice quality when voice data is transmitted over an ATM network.
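As a rough illustration of compression that follows transmit-buffer occupancy, the sketch below maps the number of buffered cells to a discrete compression level. The function name `compression_level`, the linear mapping and the number of levels are assumptions for the example; the text above does not say how the cited publication derives the degree of compression.

```python
def compression_level(cells_in_buffer: int, capacity: int, max_level: int = 3) -> int:
    """Map transmit-buffer occupancy to a compression level in 0..max_level.

    Level 0 means no extra compression; higher levels mean stronger compression.
    The linear mapping is an assumption made for illustration only.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Clamp the count into [0, capacity] before computing the fill ratio.
    occupancy = min(max(cells_in_buffer, 0), capacity) / capacity
    return min(max_level, int(occupancy * (max_level + 1)))
```

With this mapping an empty buffer selects level 0, so voice quality is reduced only while cells are backing up.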

[0007] Prior art 2 attaches a priority to each cell sent to the network so that, when congestion in the network makes it unavoidable to discard some cells, the cells are discarded according to this priority. According to prior art 2, assigning the discard priority in the order of (1) cells consisting of auxiliary data, (2) cells consisting of data representing silence, and (3) all other cells reduces the effect that cell discarding has on the quality of the voice signal reproduced at the receiving side.
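The discard order described above (auxiliary data first, then silence, then everything else) can be sketched as below. The cell representation as `(kind, sequence_number)` tuples, the `DISCARD_ORDER` table and the function name are hypothetical; only the ordering itself comes from the text.

```python
from typing import List, Tuple

# Lower number = discarded first (assumed encoding of the priority order).
DISCARD_ORDER = {"auxiliary": 0, "silence": 1, "voice": 2}


def discard_for_congestion(cells: List[Tuple[str, int]], n_drop: int) -> List[Tuple[str, int]]:
    """Drop n_drop cells, most discardable kinds first, keeping arrival order."""
    n_drop = max(0, min(n_drop, len(cells)))
    # sorted() is stable, so within one kind the earliest cells are dropped first.
    by_priority = sorted(range(len(cells)), key=lambda i: DISCARD_ORDER[cells[i][0]])
    dropped = set(by_priority[:n_drop])
    return [cell for i, cell in enumerate(cells) if i not in dropped]
```

Voice cells are touched only after every auxiliary and silence cell has already been dropped.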

[0008] In prior art 3, when some of the cells making up the voice data are discarded and lost during transmission over the ATM network, the information in the lost cells is interpolated from the cells before and after them that arrived normally. According to prior art 3, the transmitting side multiplexes onto each cell a selection signal that indicates the optimal interpolation method for that cell's information and sends it; when a cell is lost, the receiving side can interpolate the lost cell's information by the optimal method indicated by this selection signal. Degradation of the voice signal reproduced at the receiving side can thus be suppressed when information is lost through cell discarding.

[0009]

Problem to Be Solved by the Invention

To implement the silence compression described above, it is necessary, for example, to compare the strength of the given voice signal with a predetermined reference strength and detect whether the reference strength has been exceeded. Then, according to the detection result, either a cell in which the voice signal is coded or a cell conveying "silence" must be selected and sent to the network. Consequently, depending on the accuracy of this comparison and detection and on the timing of the cell selection that follows the detection result, a problem arises at the beginning of an utterance, that is, at the speech head: during the very short period in which the supplied voice signal shifts from "silent" to "voiced", the leading portion of a voice signal that ought to be "voiced" may be treated as "silent" and not be sent to the network. When this happens, for example when the sound supplied at the speech head has a short, sharp consonant as in "sa, shi, su, se, so" or "ha, hi, fu, he, ho", the sound reaches the receiving side in an unnatural form, so the intended word may not be conveyed clearly or an unintended word may be conveyed by mistake.

[0010] The prior arts described above all presuppose silence compression and aim to improve the quality of the voice signal reproduced from the cells transmitted to the receiving side. One of them states that clipping of the speech head or speech tail can be prevented by preserving cells conveying "silence" as far as possible. However, that prior art prevents only the speech-head or speech-tail clipping caused by cell discarding due to network congestion; it does not prevent clipping at the speech head in ordinary conversation, that is, when a telephone user who had been listening until just before begins to speak. The other two prior arts contain no description of the speech head at all.

[0011]

SUMMARY OF THE INVENTION

An object of the present invention is therefore to provide a voice coding transmitter in which the "voiced" voice signal at the speech head is sent to the network completely and reliably.

[0012]

Means for Solving the Problem

In the invention according to claim 1, a voice coding transmitter is provided with (a) a delay processing section that outputs a second voice signal obtained by delaying a first voice signal, (b) a voice coding section that generates voice data by coding the second voice signal, (c) a cell processing section that outputs voice cells obtained by assembling the voice data generated by the voice coding section into cells, and (d) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the first voice signal satisfies a predetermined criterion.

[0013] That is, in the invention according to claim 1, voice data is generated by coding the second voice signal, which is the first voice signal delayed, and voice cells are obtained by assembling this voice data into cells. When the first voice signal satisfies the predetermined criterion, the voice cell obtained from the second voice signal is selected as the transmission cell. Consequently, when the first voice signal changes from a state that does not satisfy the criterion to a state that does, the voice cells selected as transmission cells correspond to the content of the first voice signal from a point earlier than the moment the change is actually detected by an amount equal to the delay. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.
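The effect of checking the criterion on the undelayed signal while coding the delayed one can be sketched at frame granularity as follows. This is only an illustrative model under stated assumptions: the signal is treated as a list of frames, the delay is a whole number of frames, and `level`, `select_with_delay` and the mean-absolute-amplitude measure are hypothetical names and choices, not taken from the text.

```python
from collections import deque
from typing import List, Optional, Sequence


def level(frame: Sequence[int]) -> float:
    """Mean absolute amplitude of one frame (assumed strength measure)."""
    return sum(abs(s) for s in frame) / len(frame) if frame else 0.0


def select_with_delay(frames: Sequence[Sequence[int]],
                      delay_frames: int,
                      reference: float) -> List[Optional[int]]:
    """Per tick, return the index of the frame whose voice cell is selected, or None."""
    line = deque([None] * delay_frames)  # delay line holding frame indices
    selected: List[Optional[int]] = []
    for n, frame in enumerate(frames):
        line.append(n)
        delayed = line.popleft()                 # frame of the second (delayed) signal
        voiced_now = level(frame) >= reference   # criterion checked on the first signal
        selected.append(delayed if voiced_now and delayed is not None else None)
    return selected
```

With a weak onset frame at index 2 followed by strong frames, a delay of zero never selects frame 2, while a delay of one frame selects it as soon as frame 3 crosses the reference. Note that this bare sketch models only the speech head; with this rule alone the last `delay_frames` frames of an utterance are not selected.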

[0014] In the invention according to claim 2, a voice coding transmitter is provided with (a) a delay processing section that outputs a second voice signal obtained by delaying a first voice signal, (b) a voice coding section that generates voice data by coding the second voice signal, (c) a cell processing section that outputs voice cells obtained by assembling the voice data generated by the voice coding section into cells, (d) an idle cell generating section that generates idle cells, which are transmitted only for management of the transmission line, and (e) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the first voice signal satisfies a predetermined criterion and selects the idle cell generated by the idle cell generating section as the transmission cell when the first voice signal does not satisfy the criterion.

[0015] That is, in the invention according to claim 2, voice data is generated by coding the second voice signal, which is the first voice signal delayed, and voice cells are obtained by assembling this voice data into cells. In addition, the idle cell generating section generates idle cells, which are transmitted only for management of the transmission line. When the first voice signal satisfies the predetermined criterion, the voice cell obtained from the second voice signal is selected as the transmission cell, whereas when the first voice signal does not satisfy the criterion, the idle cell is selected as the transmission cell. Consequently, when the first voice signal changes from a state that does not satisfy the criterion to a state that does, voice cells corresponding to the content of the first voice signal are selected as transmission cells from a point earlier than the moment the change is actually detected by an amount equal to the delay, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.
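The two-way selection between voice cells and idle cells can be sketched as a selector that emits exactly one transmission cell per slot. The `TxCell` type, the function name `transmit_stream`, the per-frame level list and the frame-based delay are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TxCell:
    kind: str                          # "voice" or "idle"
    frame_index: Optional[int] = None  # which delayed frame a voice cell carries


def transmit_stream(levels: Sequence[float], delay: int, reference: float) -> List[TxCell]:
    """Choose one transmission cell per slot from the undelayed signal level."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    out: List[TxCell] = []
    for n, current_level in enumerate(levels):
        delayed_index = n - delay  # frame currently leaving the delay stage
        if current_level >= reference and delayed_index >= 0:
            out.append(TxCell("voice", delayed_index))
        else:
            out.append(TxCell("idle"))  # idle cell fills the slot otherwise
    return out
```

Every slot carries a cell, so the receiving side sees an unbroken cell stream in which idle cells mark the periods judged silent.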

[0016] In the invention according to claim 3, a voice coding transmitter is provided with (a) a storage output section that stores voice data corresponding to a voice signal at a first point in time and outputs this voice data at a second point in time after the first point in time, (b) a cell processing section that assembles the voice data output by the storage output section into voice cells and outputs them, and (c) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the voice signal satisfies a predetermined criterion.

[0017] That is, in the invention according to claim 3, voice data corresponding to the voice signal is stored at the first point in time and then, at the second point in time after the first, assembled into cells to obtain the corresponding voice cells. When the voice signal satisfies the predetermined criterion, the voice cell obtained at the second point in time is selected as the transmission cell. Consequently, when the voice signal changes from a state that does not satisfy the criterion to a state that does, the voice cells selected as transmission cells correspond to the content of the voice signal from a point earlier by a time equal to the difference between the first and second points in time. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.

[0018] In the invention according to claim 4, a voice coding transmitter is provided with (a) a storage output section that stores voice data corresponding to a voice signal at a first point in time and outputs this voice data at a second point in time after the first point in time, (b) a cell processing section that assembles the voice data output by the storage output section into voice cells and outputs them, (c) an idle cell generating section that generates idle cells, which are transmitted only for management of the transmission line, and (d) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the voice signal satisfies a predetermined criterion and selects the idle cell generated by the idle cell generating section as the transmission cell when the voice signal does not satisfy the criterion.

[0019] That is, in the invention according to claim 4, voice data corresponding to the voice signal is stored at the first point in time and then, at the second point in time after the first, assembled into cells to obtain the corresponding voice cells. In addition, the idle cell generating section generates idle cells, which are transmitted only for management of the transmission line. When the voice signal satisfies the predetermined criterion, the voice cell obtained at the second point in time is selected as the transmission cell, whereas when the voice signal does not satisfy the criterion, the idle cell is selected as the transmission cell. Consequently, when the voice signal changes from a state that does not satisfy the criterion to a state that does, voice cells corresponding to the content of the voice signal are selected as transmission cells from a point earlier by a time equal to the difference between the first and second points in time, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.

[0020] In the invention according to claim 5, a voice coding transmitter is provided with (a) a voice coding section that generates and outputs voice data obtained by coding a voice signal, (b) a storage output section that stores the voice data output by the voice coding section at a first point in time and outputs this voice data at a second point in time after the first point in time, (c) a cell processing section that assembles the voice data output by the storage output section into voice cells and outputs them, and (d) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the voice signal satisfies a predetermined criterion.

[0021] That is, in the invention according to claim 5, the voice data obtained when the voice coding section codes the voice signal is stored at the first point in time and then, at the second point in time after the first, assembled into cells to obtain the corresponding voice cells. When the voice signal satisfies the predetermined criterion, the voice cell obtained at the second point in time is selected as the transmission cell. Consequently, when the voice signal changes from a state that does not satisfy the criterion to a state that does, the voice cells selected as transmission cells correspond to the content of the voice signal from a point earlier by a time equal to the difference between the first and second points in time. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.

[0022] In the invention according to claim 6, a voice coding transmitter is provided with (a) a voice coding section that generates and outputs voice data obtained by coding a voice signal, (b) a storage output section that stores the voice data output by the voice coding section at a first point in time and outputs this voice data at a second point in time after the first point in time, (c) a cell processing section that assembles the voice data output by the storage output section into voice cells and outputs them, (d) an idle cell generating section that generates idle cells, which are transmitted only for management of the transmission line, and (e) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the voice signal satisfies a predetermined criterion and selects the idle cell generated by the idle cell generating section as the transmission cell when the voice signal does not satisfy the criterion.

[0023] That is, in the invention according to claim 6, the voice data obtained when the voice coding section codes the voice signal is stored at the first point in time and then, at the second point in time after the first, assembled into cells to obtain the corresponding voice cells. In addition, the idle cell generating section generates idle cells, which are transmitted only for management of the transmission line. When the voice signal satisfies the predetermined criterion, the voice cell obtained at the second point in time is selected as the transmission cell, whereas when the voice signal does not satisfy the criterion, the idle cell is selected as the transmission cell. Consequently, when the voice signal changes from a state that does not satisfy the criterion to a state that does, voice cells corresponding to the content of the voice signal are selected as transmission cells from a point earlier by a time equal to the difference between the first and second points in time, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.

[0024] In the invention according to claim 7, a voice coding transmitter is provided with (a) a voice coding section that generates and outputs voice data obtained by coding a voice signal, (b) a storage output section that stores the voice data just output by the voice coding section in a first storage location and reads previously stored voice data from a second storage location and outputs it, (c) a cell processing section that assembles the voice data output by the storage output section into voice cells and outputs them, and (d) a selection section that selects the voice cell output from the cell processing section as a transmission cell when the voice signal satisfies a predetermined criterion.

[0025] That is, in the invention according to claim 7, the voice data just obtained when the voice coding section codes the voice signal is stored in the first storage location of the storage output section, while previously stored voice data is read from the second storage location of the storage output section and assembled into cells to obtain the corresponding voice cells. When the voice signal satisfies the predetermined criterion, the voice cell obtained from the voice data stored in the storage output section is selected as the transmission cell. Consequently, when the voice signal changes from a state that does not satisfy the criterion to a state that does, the voice cells selected as transmission cells correspond to the content of the voice signal from a point earlier by a time equal to the difference between the times at which voice data is stored in the first and second storage locations of the storage output section. Therefore, when detecting a change from "silent" to "voiced", for example, even if the moment at which the change is actually detected is not strictly accurate, voice cells corresponding to the content of the voice signal from slightly before that moment are selected as transmission cells, so the "voiced" voice signal at the speech head can be transmitted completely and reliably.
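One way to picture a storage output section with separate write and read locations is a small ring buffer, sketched below. The class name `StorageOutput`, the `step` method and the choice of a ring with `delay_slots + 1` slots are assumptions for illustration; the text above specifies only that new data goes to one location while older data is read from another.

```python
from typing import List, Optional


class StorageOutput:
    """Ring buffer: newest voice data is written to one slot, older data read from another."""

    def __init__(self, delay_slots: int) -> None:
        if delay_slots < 1:
            raise ValueError("delay_slots must be at least 1")
        self._slots: List[Optional[bytes]] = [None] * (delay_slots + 1)
        self._write = 0  # first storage location (newest data goes here)

    def step(self, voice_data: bytes) -> Optional[bytes]:
        """Store one unit of voice data and return the unit stored delay_slots steps ago."""
        size = len(self._slots)
        self._slots[self._write] = voice_data
        read = (self._write + 1) % size  # second storage location (oldest data)
        out = self._slots[read]
        self._write = read               # the next write reuses the slot just read
        return out
```

The distance between the two locations fixes how far back in time the output reaches; here it equals `delay_slots` calls to `step`, and `None` is returned until the buffer has filled.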

【００２６】請求項８記載の発明では、（イ）音声信号
を符号化した音声データを生成および出力する音声符号
化部と、（ロ）この音声符号化部が出力した直後の音声
データを第１の記憶位置に蓄積するとともに、以前に蓄
積された音声データを第２の記憶位置から読み出して出
力する蓄積出力部と、（ハ）この蓄積出力部が出力した
音声データをセル化した音声セルを求めて出力するセル
化処理部と、（ニ）伝送路の管理用のみに伝送される空
きセルを生成する空きセル生成部と、（ホ）あらかじめ
決めた基準を前記した音声信号が満たすときには前記し
たセル化処理部から出力される音声セルを伝送セルとし
て選択し、前記した基準を前記した音声信号が満たさな
いときには前記した空きセル生成部により生成された空
きセルを伝送セルとして選択する選択部を音声符号化伝
送装置に具備させる。According to the invention of claim 8, the voice coding transmitter is provided with (a) a voice encoding unit that generates and outputs voice data obtained by encoding a voice signal, (b) a storage output unit that stores the voice data just output by the voice encoding unit at a first storage location and reads out and outputs previously stored voice data from a second storage location, (c) a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells, (d) an idle cell generation unit that generates idle cells transmitted only for management of the transmission line, and (e) a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion and selects the idle cell generated by the idle cell generation unit as the transmission cell when the voice signal does not satisfy the criterion.

【００２７】すなわち請求項８記載の発明では、音声符
号化部による音声信号の符号化で得られた直後の音声デ
ータが蓄積出力部における第１の記憶位置に蓄積される
とともに、以前に蓄積された音声データが蓄積出力部に
おける第２の記憶位置から読み出されてセル化され、該
当する音声セルが求められる。また、空きセル生成部に
より、伝送路の管理用のみに伝送される空きセルが生成
される。そして、あらかじめ決めた基準を音声信号が満
たすとき、蓄積出力部に蓄積された音声データから求め
られた音声セルが伝送セルとして選択される一方、上記
基準を音声信号が満たさないとき、空きセルが伝送セル
として選択される。このため、音声信号が、基準を満た
さない状態から基準を満たす状態へ変化しつつあった場
合、蓄積出力部における上記第１および第２の記憶位置
にそれぞれ音声データが蓄積される時刻の差に相当する
時間だけ遡った時点以後については音声信号の内容に対
応する音声セルが伝送セルとして選択され、これ以前に
ついては空きセルが伝送セルとして選択される。したが
って、例えば「無音」から「有音」への変化を検出する
場合、上記変化を現実に検出した時点が厳密に正確でな
くとも、その少し前の時点以後における音声信号の内容
に対応する音声セルが伝送セルとして選択されるので、
話頭における「有音」の音声信号を完全かつ確実に伝送
することができる。That is, in the invention of claim 8, the voice data just obtained by the voice encoding unit encoding the voice signal is stored at the first storage location in the storage output unit, while previously stored voice data is read from the second storage location in the storage output unit and assembled into cells to obtain the corresponding voice cell. In addition, the idle cell generation unit generates idle cells, which are transmitted only for management of the transmission line. When the voice signal satisfies the predetermined criterion, the voice cell obtained from the voice data stored in the storage output unit is selected as the transmission cell, whereas when the voice signal does not satisfy the criterion, the idle cell is selected as the transmission cell. Consequently, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, voice cells corresponding to the content of the voice signal are selected as transmission cells from a point earlier by the time equal to the difference between the times at which voice data is stored at the first and second storage locations, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably.

【００２８】[0028]

【発明の実施の形態】以下実施例につき本発明を詳細に
説明する。DESCRIPTION OF THE PREFERRED EMBODIMENTS The present invention will be described in detail below with reference to embodiments.

【００２９】図１は本発明の一実施例における音声符号
化伝送装置の構成を表わしたものである。同図中、入力
装置１１は音声信号を入力するものであり、ＰＢＸ（構
内交換装置）などの機器に接続される。音声符号化部１
２は、入力装置１１から与えられる音声信号を符号化し
た音声データを生成および出力する。蓄積出力部１３
は、音声符号化装置１２が出力した音声データを一定時
間蓄積した後に出力する。セル化処理部１４は、この蓄
積出力部１３が出力した音声データをセル化した音声セ
ルを求めて出力する。空きセル生成部１５は、伝送路の
管理用のみに伝送される空きセルを生成する。音声レベ
ル検出部１６は、入力装置１１から与えられる音声信号
の強さを表わす音声レベルを検出する。有音判定部１７
は、この音声レベル検出部１６により検出された音声レ
ベルがあらかじめ決めた基準を満たすか否か、すなわ
ち、この音声信号が基準の音声レベルに達する「有音」
の音声信号であるか、あるいは基準の音声レベルに達し
ない「無音」の音声信号であるかを判定するものであ
る。選択部１８は、有音判定部１７の判定結果にしたが
って音声セルおよび空きセルのいずれかを伝送セルとし
て選択する。送出処理部１９は、この選択部１８により
選択された伝送セルをネットワークへ送出する処理を行
うものであり、例えばＡＴＭ交換機などから構成され
る。FIG. 1 shows the configuration of a voice coding transmitter according to an embodiment of the present invention. In the figure, an input device 11 receives a voice signal and is connected to equipment such as a PBX (private branch exchange). A voice encoding unit 12 generates and outputs voice data obtained by encoding the voice signal supplied from the input device 11. A storage output unit 13 stores the voice data output by the voice encoding unit 12 for a fixed time and then outputs it. A cell processing unit 14 obtains and outputs voice cells by assembling the voice data output by the storage output unit 13 into cells. An idle cell generation unit 15 generates idle cells, which are transmitted only for management of the transmission line. A voice level detection unit 16 detects a voice level representing the strength of the voice signal supplied from the input device 11. A voiced sound determination unit 17 determines whether the voice level detected by the voice level detection unit 16 satisfies a predetermined criterion, that is, whether the voice signal is a “voiced” signal that reaches the reference voice level or a “silent” signal that does not. A selection unit 18 selects either the voice cell or the idle cell as the transmission cell according to the determination result of the voiced sound determination unit 17. A transmission processing unit 19 sends the transmission cell selected by the selection unit 18 to the network and is composed of, for example, an ATM switch.

【００３０】図２は図１中に示した蓄積出力部の構成を
表わしたものである。同図中、記憶部２１は、与えられ
た音声データをそれぞれ記憶させておくものである。書
き込み手段２２は、音声符号化装置１２が出力した直後
の音声データを記憶部２１に記憶させる。第１の記憶位
置２２ａは、書き込み手段２２により音声データが書き
込まれる記憶部２１中の位置である。読み出し手段２３
は、記憶部２１が以前に記憶した音声データを読み出し
て出力する。第２の記憶位置２３ａは、読み出し手段２
３により音声データが読み出される記憶部２１中の位置
である。FIG. 2 shows the configuration of the storage output unit shown in FIG. 1. In the figure, a storage unit 21 stores the voice data given to it. A writing means 22 stores in the storage unit 21 the voice data just output by the voice encoding unit 12. A first storage location 22a is the location in the storage unit 21 at which the writing means 22 writes voice data. A reading means 23 reads out and outputs voice data previously stored in the storage unit 21. A second storage location 23a is the location in the storage unit 21 from which the reading means 23 reads voice data.

【００３１】図３は本発明の一実施例による音声信号の
処理を具体的に表わしたものであって、上段の図３
（ａ）は音声レベル検出部１６により検出された音声レ
ベルの変化を、下段の図３（ｂ）は送出処理部１９によ
りネットワークに送出された伝送セルから復元される音
声信号の音声レベルの変化を、それぞれ表わしている。
図３（ａ）において、横軸には音声レベル検出部１６に
より音声レベルが検出された時刻を、縦軸にはその音声
レベルの大きさを、それぞれ対応させてある。図３
（ｂ）において、横軸には送出処理部１９により伝送セ
ルが送出された時刻を、縦軸にはその伝送セルから復元
し得る音声信号の音声レベルの大きさを、それぞれ対応
させてある。また、図３（ａ）および図３（ｂ）は、横
軸に示した時刻が互いに一致するように描いてある。FIG. 3 illustrates concretely how a voice signal is processed in an embodiment of the present invention. FIG. 3(a), the upper part, shows the change in the voice level detected by the voice level detection unit 16, and FIG. 3(b), the lower part, shows the change in the voice level of the voice signal restored from the transmission cells sent to the network by the transmission processing unit 19. In FIG. 3(a), the horizontal axis represents the time at which the voice level is detected by the voice level detection unit 16 and the vertical axis represents the magnitude of that voice level. In FIG. 3(b), the horizontal axis represents the time at which the transmission cell is sent by the transmission processing unit 19 and the vertical axis represents the magnitude of the voice level of the voice signal that can be restored from that transmission cell. FIG. 3(a) and FIG. 3(b) are drawn so that the times on their horizontal axes coincide.

【００３２】図３（ａ）において、Ｌ₁は入力装置１１
から意図的に入力された音声信号をこれと無関係な雑音
から聞き分けるために必要な最小限の音声レベル、Ｌ₂
は入力装置１１から入力された音声信号が有音判定部１
７により「有音」であると判定されるときの音声レベル
である。そして、実線で描いたＳａ₁およびＳａ₂は入力
装置１１から意図的に入力された音声信号の変化を示
す。また、ｔ₁は音声信号Ｓａ₁が音声レベルＬ₁に達す
る時刻、ｔ₂は音声信号Ｓａ₁が音声レベルＬ₂に達する
時刻、ｔ₃は音声信号Ｓａ₁が音声レベルＬ₁を割込む時
刻、ｔ₄は音声信号Ｓａ₂が音声レベルＬ₁に達する時
刻、ｔ₅は音声信号Ｓａ₂が音声レベルＬ₂に達する時
刻、ｔ₆は音声信号Ｓａ₂が音声レベルＬ₁を割込む時刻
であって、時刻ｔ₁および時刻ｔ₂の差と時刻ｔ₄および
時刻ｔ₅との差は、いずれも同一の時間Δｔとなるよう
にされている。一方、図３（ｂ）において、Ｌ₃は伝送
セルから復元される音声信号をこれと無関係な雑音から
聞き分けるために必要な最小限の音声レベルであり、実
線で描いたＳｂ₁およびＳｂ₂は上述した音声信号Ｓａ₁
およびＳａ₂にそれぞれ対応して送出処理部１９から送
出される音声セルから復元される音声信号の変化を示
す。また、図３（ａ）および図３（ｂ）において、音声
信号Ｓｂ₁が音声レベルＬ₃に達する時刻は上述した時刻
ｔ₂、音声信号Ｓｂ₂が音声レベルＬ₃に達する時刻は上
述した時刻ｔ₅となるようにされている。In FIG. 3(a), L₁ is the minimum voice level needed to distinguish a voice signal intentionally input from the input device 11 from unrelated noise, and L₂ is the voice level at which the voice signal input from the input device 11 is judged to be “voiced” by the voiced sound determination unit 17. Sa₁ and Sa₂, drawn with solid lines, show the change in voice signals intentionally input from the input device 11. Further, t₁ is the time at which the voice signal Sa₁ reaches the voice level L₁, t₂ is the time at which Sa₁ reaches L₂, t₃ is the time at which Sa₁ falls below L₁, t₄ is the time at which the voice signal Sa₂ reaches L₁, t₅ is the time at which Sa₂ reaches L₂, and t₆ is the time at which Sa₂ falls below L₁. The difference between t₁ and t₂ and the difference between t₄ and t₅ are both set to the same time Δt. In FIG. 3(b), L₃ is the minimum voice level needed to distinguish the voice signal restored from the transmission cells from unrelated noise, and Sb₁ and Sb₂, drawn with solid lines, show the change in the voice signals restored from the voice cells sent from the transmission processing unit 19 in correspondence with the voice signals Sa₁ and Sa₂, respectively. In FIG. 3(a) and FIG. 3(b), the time at which the voice signal Sb₁ reaches the voice level L₃ is the time t₂ described above, and the time at which the voice signal Sb₂ reaches the voice level L₃ is the time t₅ described above.
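
The timing relations of FIG. 3 can be restated compactly. The following is only a sketch: it assumes, as an idealization not stated in the source, that the level restored from the transmitted voice cells is the input level shifted by Δt, and it writes S_{a,i} and S_{b,i} for the signals Sa₁, Sa₂ and Sb₁, Sb₂.

```latex
\Delta t = t_2 - t_1 = t_5 - t_4, \qquad
S_{b,i}(t) \approx S_{a,i}(t - \Delta t) \quad (i = 1, 2)
```

Under that assumption, S_{b,1}(t₂) ≈ S_{a,1}(t₁) = L₁, which is consistent with Sb₁ reaching the audibility level L₃ at time t₂ when L₃ plays the same role on the receiving side as L₁ does on the input side.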

【００３３】次に、本実施例における音声符号化伝送装
置の動作を説明する。Next, the operation of the speech coded transmission apparatus according to this embodiment will be described.

【００３４】入力装置１１は、入力された音声信号を音
声符号化部１２および音声レベル検出部１６にそれぞれ
与える。音声符号化部１２は、入力装置１１から与えら
れる音声信号をディジタル符号化した音声データを生成
および出力する。蓄積出力部１３は、音声符号化装置１
２が出力した音声データを一定時間すなわち図３中に示
した時間Δｔだけ蓄積した後、先に蓄積された音声デー
タから順に出力する。すなわち、図２に示したように、
音声符号化装置１２が出力した直後の音声データは、書
き込み手段２２により記憶部２１中の第１の記憶位置２
２ａに記憶されるとともに、この第１の記憶位置２２ａ
の値がインクリメントされる。そして時間Δｔが経過し
た後、第２の記憶位置２３ａの値がインクリメントされ
て先の第１の記憶位置２２ａの値に一致すると、読み出
し手段２３により以前に記憶された音声データが第２の
記憶位置２３ａから読み出されて出力される。The input device 11 supplies the input voice signal to both the voice encoding unit 12 and the voice level detection unit 16. The voice encoding unit 12 generates and outputs voice data obtained by digitally encoding the voice signal supplied from the input device 11. The storage output unit 13 stores the voice data output by the voice encoding unit 12 for a fixed time, namely the time Δt shown in FIG. 3, and then outputs it in order starting from the voice data stored first. That is, as shown in FIG. 2, the voice data just output by the voice encoding unit 12 is stored at the first storage location 22a in the storage unit 21 by the writing means 22, and the value of the first storage location 22a is incremented. Then, after the time Δt has elapsed, when the value of the second storage location 23a has been incremented and matches the earlier value of the first storage location 22a, the previously stored voice data is read from the second storage location 23a by the reading means 23 and output.
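
The write-then-read behaviour of the storage output unit can be sketched as a small circular buffer. This is an illustrative sketch, not the patented circuit: the class name, the frame-based notion of time (Δt expressed as `delay_frames`), and the wrap-around of the positions are assumptions made for the example; the source only says that the two storage locations are incremented.

```python
class StorageOutputUnit:
    """Fixed-delay buffer with separate write and read positions (hypothetical names)."""

    def __init__(self, delay_frames):
        if delay_frames < 1:
            raise ValueError("delay_frames must be at least 1")
        self.delay_frames = delay_frames
        # One extra slot so the newest frame never overwrites the one about to be read.
        self.storage = [None] * (delay_frames + 1)  # plays the role of storage unit 21
        self.write_pos = 0                          # first storage location 22a
        self.read_pos = 0                           # second storage location 23a
        self.stored = 0                             # number of frames written so far

    def push(self, frame):
        """Store the newest frame and return the frame written delay_frames pushes
        earlier, or None while the buffer is still filling."""
        self.storage[self.write_pos] = frame
        self.write_pos = (self.write_pos + 1) % len(self.storage)
        self.stored += 1
        if self.stored <= self.delay_frames:
            return None
        out = self.storage[self.read_pos]
        self.read_pos = (self.read_pos + 1) % len(self.storage)
        return out


unit = StorageOutputUnit(delay_frames=2)
outputs = [unit.push(f) for f in ["a", "b", "c", "d", "e"]]
print(outputs)  # [None, None, 'a', 'b', 'c']
```

The read position trails the write position by a constant number of slots, which is what produces the fixed delay Δt between storing a frame and outputting it.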

【００３５】セル化処理部１４は、この蓄積出力部１３
が出力した音声データをＡＴＭセル化した音声セルを求
めて出力する。空きセル生成部１５は、伝送路の管理用
のみに伝送され、「無音」を表わす空きセルを生成す
る。音声レベル検出部１６は、入力装置１１から与えら
れる音声信号の強さを表わす音声レベルを検出する。有
音判定部１７は、この音声レベル検出部１６により検出
された音声レベルがあらかじめ決めた基準の音声レベル
に達するか否かを判定し、その判定信号を出力する。選
択部１８は、有音判定部１７から出力される判定信号に
したがって、検出された音声レベルが基準の音声レベル
に達することを表わす判定信号であれば音声セルを伝送
セルとして選択する一方、検出された音声レベルが基準
の音声レベルに達しないことを表わす判定信号であれば
空きセルを伝送セルとして選択する。送出処理部１９
は、この選択部１８により選択された伝送セル、すなわ
ち音声セルおよび空きセルのいずれかをネットワークへ
送出する。The cell processing unit 14 obtains and outputs voice cells by assembling the voice data output by the storage output unit 13 into ATM cells. The idle cell generation unit 15 generates idle cells, which are transmitted only for management of the transmission line and represent “silence”. The voice level detection unit 16 detects the voice level representing the strength of the voice signal supplied from the input device 11. The voiced sound determination unit 17 determines whether the voice level detected by the voice level detection unit 16 reaches a predetermined reference voice level and outputs a determination signal. According to the determination signal output from the voiced sound determination unit 17, the selection unit 18 selects the voice cell as the transmission cell if the signal indicates that the detected voice level reaches the reference voice level, and selects the idle cell as the transmission cell if the signal indicates that it does not. The transmission processing unit 19 sends the transmission cell selected by the selection unit 18, that is, either the voice cell or the idle cell, to the network.
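
The selection rule described above reduces to a single comparison. The sketch below is a minimal illustration; the `Cell` type, the function name and the numeric levels are hypothetical, and "reaches the reference level" is interpreted here as greater than or equal to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    kind: str       # "voice" or "idle" (the empty cell used only for line management)
    payload: bytes  # encoded voice data; empty for an idle cell


def select_transmission_cell(level, reference_level, voice_cell, idle_cell):
    """Return the voice cell when the detected level reaches the reference level,
    otherwise the idle cell."""
    voiced = level >= reference_level  # determination signal of the voiced sound judgment
    return voice_cell if voiced else idle_cell


voice = Cell("voice", b"\x01\x02")
idle = Cell("idle", b"")
print(select_transmission_cell(6.0, 6.0, voice, idle).kind)  # voice
print(select_transmission_cell(5.9, 6.0, voice, idle).kind)  # idle
```

Note that the level passed in is the current input level, while the voice cell carries data that left the storage output unit, so the two are offset by the storage delay.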

【００３６】上述した本実施例における音声符号化伝送
装置によれば、音声信号の「無音」から「有音」への変
化を検出する場合、上記変化を現実に検出した時点が厳
密に正確でなくとも、その少し前の時点以後における音
声信号の内容に対応する音声セルが伝送セルとして選択
される。例えば、図３（ａ）に示すように、入力装置１
１から入力された音声信号が有音判定部１７により「有
音」であると判定されるときの音声レベルＬ₂は、この
音声信号と無関係な雑音から聞き分けるために必要な最
小限の音声レベルＬ₁より大きいのが普通であるため、
そのまま「有音」および「無音」の判定信号に応じて伝
送セルを選択すると、図３（ａ）中にハッチングで示し
た領域Ｐ₁および領域Ｐ₂の音声データが無視され、空き
セルが伝送セルとして選択されてしまう。これに対し本
実施例では、図３（ｂ）に示すように、判定信号に対し
て時間Δｔだけ音声データを遅延させたので、「無音」
から「有音」への変化を現実に検出した時刻ｔ₂あるい
は時刻ｔ₅より時間Δｔだけ遡った時点すなわち時刻ｔ₁
あるいは時刻ｔ₄以後については音声セルが伝送セルと
して選択され、時刻ｔ₁あるいは時刻ｔ₄以前については
空きセルが伝送セルとして選択される。したがって、時
間Δｔの値を適当に決めれば、話頭における「有音」の
音声信号を完全かつ確実に伝送することができ、無音圧
縮によるトラフィックの低減と話頭切断のない自然な音
声の伝送とを両立させることができる。According to the voice coding transmitter of this embodiment described above, when detecting the change of the voice signal from “silence” to “voiced sound”, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells. For example, as shown in FIG. 3(a), the voice level L₂ at which the voice signal input from the input device 11 is judged to be “voiced” by the voiced sound determination unit 17 is usually higher than the minimum voice level L₁ needed to distinguish the voice signal from unrelated noise. If transmission cells were therefore selected directly according to the “voiced” and “silent” determination signals, the voice data in the hatched regions P₁ and P₂ in FIG. 3(a) would be ignored and idle cells would be selected as transmission cells instead. In this embodiment, by contrast, the voice data is delayed by the time Δt relative to the determination signal, as shown in FIG. 3(b). As a result, voice cells are selected as transmission cells for the portion from the point Δt earlier than the time t₂ or t₅ at which the change from “silence” to “voiced sound” is actually detected, that is, from time t₁ or t₄ onward, and idle cells are selected as transmission cells for the portion before time t₁ or t₄. Therefore, if the value of Δt is chosen appropriately, the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.
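
The effect of the delay on the speech head can be checked with a toy simulation. This is a sketch under stated assumptions: time is counted in frames, the level values are invented for the example, and `transmit` is a hypothetical helper rather than anything defined in the source. Each output entry is the index of the input frame carried by the cell sent at that time, or `None` when an idle (empty) cell is sent.

```python
from collections import deque


def transmit(levels, reference_level, delay_frames):
    """Gate delayed frames with the undelayed voiced/silent decision."""
    buffer = deque([None] * delay_frames)  # frames waiting in the storage output unit
    sent = []
    for index, level in enumerate(levels):
        buffer.append(index)           # newest frame is stored
        delayed = buffer.popleft()     # frame stored delay_frames earlier is read out
        voiced = level >= reference_level
        sent.append(delayed if voiced and delayed is not None else None)
    return sent


# The level rises through L1 = 2 at frame 2 and reaches L2 = 6 at frame 4.
levels = [0, 0, 2, 4, 6, 6, 6]
print(transmit(levels, reference_level=6, delay_frames=0))
# [None, None, None, None, 4, 5, 6]  -> frames 2 and 3 (region P) are lost
print(transmit(levels, reference_level=6, delay_frames=2))
# [None, None, None, None, 2, 3, 4]  -> transmission starts with frame 2
```

With the delay set to the two frames it takes the level to climb from L₁ to L₂, the first voice cell sent carries the frame at which the signal first became audible, matching the behaviour described for FIG. 3(b).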

【００３７】なお、本実施例における音声符号化伝送装
置では、音声データを時間Δｔだけ遅延させる手段とし
て、音声データを一定時間蓄積した後に出力する蓄積出
力部１３を用いたが、結果的に音声信号を遅延させられ
るものであれば、他の手段を用いてもよい。また、Δｔ
の値を一定値とする必要はなく、例えば音声レベルの増
加の度合に応じてΔｔの値を可変とするようにしてもよ
い。In the voice coding transmitter of this embodiment, the storage output unit 13, which stores voice data for a fixed time and then outputs it, is used as the means for delaying the voice data by the time Δt, but any other means may be used as long as it delays the voice signal as a result. Also, the value of Δt need not be constant; for example, it may be made variable according to the rate at which the voice level increases.
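
One way a variable Δt could be derived is sketched below. The source does not give a formula, so everything here is an assumption made for illustration: the level is taken to rise roughly linearly, so the time needed to climb from L₁ to L₂ is (L₂ − L₁) divided by the rise rate, and the result is clamped to hypothetical minimum and maximum delays.

```python
def estimate_delay(l1, l2, rise_rate, min_delay, max_delay):
    """Estimate the delay needed to cover the rise from level l1 to level l2.

    rise_rate is the increase in voice level per unit time (assumed constant).
    A non-positive rate gives no usable estimate, so the maximum delay is used.
    """
    if rise_rate <= 0:
        return max_delay
    delay = (l2 - l1) / rise_rate
    return min(max(delay, min_delay), max_delay)


print(estimate_delay(2.0, 6.0, 2.0, min_delay=0.5, max_delay=5.0))    # 2.0
print(estimate_delay(2.0, 6.0, 100.0, min_delay=0.5, max_delay=5.0))  # 0.5
```

A slowly rising speech head yields a longer delay and a sharp onset a shorter one, which is one reading of making Δt depend on the rate of increase of the voice level.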

【００３８】[0038]

【発明の効果】以上説明したように請求項１記載の発明
によれば、第１の音声信号が基準を満たさない状態から
基準を満たす状態へ変化しつつあった場合、この変化を
現実に検出した時点より上記遅延相当分だけ遡った時点
以後における第１の音声信号の内容に対応する音声セル
が伝送セルとして選択される。したがって、例えば「無
音」から「有音」への変化を検出する場合、上記変化を
現実に検出した時点が厳密に正確でなくとも、その少し
前の時点以後における音声信号の内容に対応する音声セ
ルが伝送セルとして選択されるので、話頭における「有
音」の音声信号を完全かつ確実に伝送することができ、
無音圧縮によるトラフィックの低減と話頭切断のない自
然な音声の伝送とを両立させることができる。As described above, according to the invention of claim 1, when the first voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, the voice cells corresponding to the content of the first voice signal from the point earlier than the actual detection of this change by the amount of the delay are selected as transmission cells. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００３９】また、請求項２記載の発明によれば、第１
の音声信号が、基準を満たさない状態から基準を満たす
状態へ変化しつつあった場合、この変化を現実に検出し
た時点より上記遅延相当分だけ遡った時点以後について
は第１の音声信号の内容に対応する音声セルが伝送セル
として選択され、これ以前については空きセルが伝送セ
ルとして選択される。したがって、例えば「無音」から
「有音」への変化を検出する場合、上記変化を現実に検
出した時点が厳密に正確でなくとも、その少し前の時点
以後における音声信号の内容に対応する音声セルが伝送
セルとして選択されるので、話頭における「有音」の音
声信号を完全かつ確実に伝送することができ、無音圧縮
によるトラフィックの低減と話頭切断のない自然な音声
の伝送とを両立させることができる。According to the invention of claim 2, when the first voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, voice cells corresponding to the content of the first voice signal are selected as transmission cells from the point earlier than the actual detection of this change by the amount of the delay, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００４０】さらに、請求項３記載の発明によれば、音
声信号が、基準を満たさない状態から基準を満たす状態
へ変化しつつあった場合、上記第１および第２の時点の
差に相当する時間だけ遡った時点以後における音声信号
の内容に対応する音声セルが伝送セルとして選択され
る。したがって、例えば「無音」から「有音」への変化
を検出する場合、上記変化を現実に検出した時点が厳密
に正確でなくとも、その少し前の時点以後における音声
信号の内容に対応する音声セルが伝送セルとして選択さ
れるので、話頭における「有音」の音声信号を完全かつ
確実に伝送することができ、無音圧縮によるトラフィッ
クの低減と話頭切断のない自然な音声の伝送とを両立さ
せることができる。Further, according to the invention of claim 3, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, the voice cells corresponding to the content of the voice signal from the point earlier by the time equal to the difference between the first and second points in time are selected as transmission cells. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００４１】また、請求項４記載の発明によれば、音声
信号が、基準を満たさない状態から基準を満たす状態へ
変化しつつあった場合、上記第１および第２の時点の差
に相当する時間だけ遡った時点以後については音声信号
の内容に対応する音声セルが伝送セルとして選択され、
これ以前については空きセルが伝送セルとして選択され
る。したがって、例えば「無音」から「有音」への変化
を検出する場合、上記変化を現実に検出した時点が厳密
に正確でなくとも、その少し前の時点以後における音声
信号の内容に対応する音声セルが伝送セルとして選択さ
れるので、話頭における「有音」の音声信号を完全かつ
確実に伝送することができ、無音圧縮によるトラフィッ
クの低減と話頭切断のない自然な音声の伝送とを両立さ
せることができる。According to the invention of claim 4, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, voice cells corresponding to the content of the voice signal are selected as transmission cells from the point earlier by the time equal to the difference between the first and second points in time, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００４２】さらに、請求項５記載の発明によれば、音
声信号が、基準を満たさない状態から基準を満たす状態
へ変化しつつあった場合、上記第１および第２の時点の
差に相当する時間だけ遡った時点以後における音声信号
の内容に対応する音声セルが伝送セルとして選択され
る。したがって、例えば「無音」から「有音」への変化
を検出する場合、上記変化を現実に検出した時点が厳密
に正確でなくとも、その少し前の時点以後における音声
信号の内容に対応する音声セルが伝送セルとして選択さ
れるので、話頭における「有音」の音声信号を完全かつ
確実に伝送することができ、無音圧縮によるトラフィッ
クの低減と話頭切断のない自然な音声の伝送とを両立さ
せることができる。Further, according to the invention of claim 5, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, the voice cells corresponding to the content of the voice signal from the point earlier by the time equal to the difference between the first and second points in time are selected as transmission cells. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００４３】また、請求項６記載の発明によれば、音声
信号が、基準を満たさない状態から基準を満たす状態へ
変化しつつあった場合、上記第１および第２の時点の差
に相当する時間だけ遡った時点以後については音声信号
の内容に対応する音声セルが伝送セルとして選択され、
これ以前については空きセルが伝送セルとして選択され
る。したがって、例えば「無音」から「有音」への変化
を検出する場合、上記変化を現実に検出した時点が厳密
に正確でなくとも、その少し前の時点以後における音声
信号の内容に対応する音声セルが伝送セルとして選択さ
れるので、話頭における「有音」の音声信号を完全かつ
確実に伝送することができ、無音圧縮によるトラフィッ
クの低減と話頭切断のない自然な音声の伝送とを両立さ
せることができる。According to the invention of claim 6, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, voice cells corresponding to the content of the voice signal are selected as transmission cells from the point earlier by the time equal to the difference between the first and second points in time, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００４４】さらに、請求項７記載の発明によれば、音
声信号が、基準を満たさない状態から基準を満たす状態
へ変化しつつあった場合、蓄積出力部における上記第１
および第２の記憶位置にそれぞれ音声データが蓄積され
る時刻の差に相当する時間だけ遡った時点以後における
音声信号の内容に対応する音声セルが伝送セルとして選
択される。したがって、例えば「無音」から「有音」へ
の変化を検出する場合、上記変化を現実に検出した時点
が厳密に正確でなくとも、その少し前の時点以後におけ
る音声信号の内容に対応する音声セルが伝送セルとして
選択されるので、話頭における「有音」の音声信号を完
全かつ確実に伝送することができ、無音圧縮によるトラ
フィックの低減と話頭切断のない自然な音声の伝送とを
両立させることができる。Further, according to the invention of claim 7, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, the voice cells corresponding to the content of the voice signal from the point earlier by the time equal to the difference between the times at which voice data is stored at the first and second storage locations in the storage output unit are selected as transmission cells. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

【００４５】また、請求項８記載の発明によれば、音声
信号が、基準を満たさない状態から基準を満たす状態へ
変化しつつあった場合、蓄積出力部における上記第１お
よび第２の記憶位置にそれぞれ音声データが蓄積される
時刻の差に相当する時間だけ遡った時点以後については
音声信号の内容に対応する音声セルが伝送セルとして選
択され、これ以前については空きセルが伝送セルとして
選択される。したがって、例えば「無音」から「有音」
への変化を検出する場合、上記変化を現実に検出した時
点が厳密に正確でなくとも、その少し前の時点以後にお
ける音声信号の内容に対応する音声セルが伝送セルとし
て選択されるので、話頭における「有音」の音声信号を
完全かつ確実に伝送することができ、無音圧縮によるト
ラフィックの低減と話頭切断のない自然な音声の伝送と
を両立させることができる。According to the invention of claim 8, when the voice signal is changing from a state that does not satisfy the criterion to a state that satisfies it, voice cells corresponding to the content of the voice signal are selected as transmission cells from the point earlier by the time equal to the difference between the times at which voice data is stored at the first and second storage locations in the storage output unit, and idle cells are selected as transmission cells before that point. Therefore, when detecting a change from “silence” to “voiced sound”, for example, even if the moment at which the change is actually detected is not strictly accurate, the voice cells corresponding to the content of the voice signal from a slightly earlier point onward are selected as transmission cells, so the “voiced” voice signal at the speech head can be transmitted completely and reliably, achieving both traffic reduction through silence compression and natural voice transmission without clipping of the speech head.

[Brief description of the drawings]

【図１】本発明の一実施例における音声符号化伝送装置
の構成を表わしたブロック図である。FIG. 1 is a block diagram illustrating a configuration of a speech coded transmission device according to an embodiment of the present invention.

【図２】図１中に示した蓄積出力部の構成を表わした概
念図である。FIG. 2 is a conceptual diagram showing the configuration of the storage output unit shown in FIG. 1.

【図３】本発明の一実施例による音声信号の処理を具体
的に表わしたタイミング図である。FIG. 3 is a timing chart specifically illustrating processing of an audio signal according to an embodiment of the present invention.

[Explanation of symbols]

１２音声符号化部１３蓄積出力部１４セル化処理部１５空きセル生成部１６音声レベル検出部１７有音判定部１８選択部 Reference signs: 12 voice encoding unit; 13 storage output unit; 14 cell processing unit; 15 idle cell generation unit; 16 voice level detection unit; 17 voiced sound determination unit; 18 selection unit

Claims

[Claims]

1. A voice coding transmitter comprising: a delay processing unit that outputs a second voice signal obtained by delaying a first voice signal; a voice encoding unit that generates voice data by encoding the second voice signal; a cell processing unit that outputs voice cells obtained by assembling the voice data generated by the voice encoding unit into cells; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the first voice signal satisfies a predetermined criterion.

2. A voice coding transmitter comprising: a delay processing unit that outputs a second voice signal obtained by delaying a first voice signal; a voice encoding unit that generates voice data by encoding the second voice signal; a cell processing unit that outputs voice cells obtained by assembling the voice data generated by the voice encoding unit into cells; an idle cell generation unit that generates idle cells transmitted only for management of the transmission line; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the first voice signal satisfies a predetermined criterion, and selects the idle cell generated by the idle cell generation unit as the transmission cell when the first voice signal does not satisfy the criterion.

3. A voice coding transmitter comprising: a storage output unit that stores voice data corresponding to a voice signal at a first point in time and outputs the voice data at a second point in time later than the first point in time; a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion.

4. A voice coding transmitter comprising: a storage output unit that stores voice data corresponding to a voice signal at a first point in time and outputs the voice data at a second point in time later than the first point in time; a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells; an idle cell generation unit that generates idle cells transmitted only for management of the transmission line; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion, and selects the idle cell generated by the idle cell generation unit as the transmission cell when the voice signal does not satisfy the criterion.

5. A voice coding transmitter comprising: a voice encoding unit that generates and outputs voice data obtained by encoding a voice signal; a storage output unit that stores the voice data output by the voice encoding unit at a first point in time and outputs it at a second point in time later than the first point in time; a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion.

6. A voice coding transmitter comprising: a voice encoding unit that generates and outputs voice data obtained by encoding a voice signal; a storage output unit that stores the voice data output by the voice encoding unit at a first point in time and outputs it at a second point in time later than the first point in time; a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells; an idle cell generation unit that generates idle cells transmitted only for management of the transmission line; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion, and selects the idle cell generated by the idle cell generation unit as the transmission cell when the voice signal does not satisfy the criterion.

7. A voice coding transmitter comprising: a voice encoding unit that generates and outputs voice data obtained by encoding a voice signal; a storage output unit that stores the voice data just output by the voice encoding unit at a first storage location and reads out and outputs previously stored voice data from a second storage location; a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion.

8. A voice coding transmitter comprising: a voice encoding unit that generates and outputs voice data obtained by encoding a voice signal; a storage output unit that stores the voice data just output by the voice encoding unit at a first storage location and reads out and outputs previously stored voice data from a second storage location; a cell processing unit that obtains and outputs voice cells by assembling the voice data output by the storage output unit into cells; an idle cell generation unit that generates idle cells transmitted only for management of the transmission line; and a selection unit that selects the voice cell output from the cell processing unit as a transmission cell when the voice signal satisfies a predetermined criterion, and selects the idle cell generated by the idle cell generation unit as the transmission cell when the voice signal does not satisfy the criterion.