JP4406382B2 - Speech coding selection control method

Info

Publication number: JP4406382B2
Application number: JP2005140529A
Authority: JP
Inventors: 岳至森; 仲大室; 祐介日和▲崎▼; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-05-13
Filing date: 2005-05-13
Publication date: 2010-01-27
Anticipated expiration: 2025-05-13
Also published as: JP2006319685A

Description

The present invention relates to a speech coding selection control method for controlling the amount of coding computation in voice packet communication over an IP network, to a voice packet transmitting device, a receiving device, and a program that operate using this control method, and to a recording medium on which the program is recorded.

On the Internet, voice communication services using VoIP (Voice over Internet Protocol) technology, which transmits voice signals in real time in IP (Internet Protocol) packets (hereinafter simply called voice packet communication), and video conferencing services that combine it with image communication are becoming widespread, but sound dropouts can occur during communication.
Dropouts in voice packet communication arise because the Internet in wide use today is a best-effort IP network that does not guarantee that packets reach their destination. Network congestion and similar causes can make packets disappear in transit (packet loss); when a packet does not arrive, the receiving side cannot produce the sound to be played back, and a dropout results. Using low-bit-rate speech coding while the network is congested is an effective way to suppress packet loss caused by such congestion. For example, Patent Document 1 proposes evaluating the network state from the packet loss rate and the like, and switching the coding scheme according to that state to prevent packet loss.

For the receiving terminal to play back speech in real time, the transmitting terminal must send speech data at a fixed time interval (hereinafter called a frame; typically about 10 to 20 ms), and the coding process must finish within that frame interval. Low-bit-rate speech coding, however, has to analyze the signal precisely and therefore requires a large amount of computation. When, for example, a program that encodes and transmits image data runs at the same time, the load on the transmitting terminal tends to rise as the image-processing workload fluctuates. The speech data then cannot be encoded within the frame interval, and the packets carrying the speech data (hereinafter called voice packets) cannot be sent from the transmitting side at the frame interval. Packet loss results, and even when the network is uncongested the receiving side cannot play back in real time, so dropouts occur.
Patent Document 1: Japanese Patent Laid-Open No. 2002-300274, "Gateway Device and Voice Data Transfer Method"

The present invention has been made in view of the above problem. Its object is to suppress packet loss caused by the load on a communication terminal during voice packet communication and thereby achieve high-quality voice packet communication.

The essence of the present invention is as follows. In voice packet communication over an IP network, the transmitting terminal measures the computation time it needed for speech coding. When that time is equal to or longer than a predetermined time, the terminal's load is judged to have increased, and control is performed to select a low-load speech coding method. This suppresses packet loss caused by increased terminal load and prevents dropouts. The invention proposes this speech coding selection control method together with a voice packet transmitting device and a receiving device that operate using it.
The speech coding selection control method proposed in the present invention takes discrete speech/musical-tone signal samples as input, encodes them, and packetizes and transmits the resulting digital code, while dynamically switching between at least two speech coding schemes that differ in the amount of computation needed to encode the samples of each predetermined time. The method comprises a step of measuring the time taken to encode the speech section of each predetermined time, and a step of selecting a speech coding scheme with a smaller amount of computation when the time required for speech coding becomes longer than a predetermined time.

The speech coding selection control method of the invention further comprises a step of returning to a speech coding scheme that at least transmits a smaller amount of information when the time required for speech coding becomes shorter than the predetermined time.
In the step of selecting a speech coding scheme with a smaller amount of computation, the method further comprises a step of performing control to temporarily suspend an image coding process running at the same time when the computation time is longer than a certain threshold.

According to the present invention, when voice communication is performed over an IP network, packet loss caused by increased terminal load, which leads to sound dropouts, can be controlled by switching between speech coding methods with different amounts of coding computation and by switching image coding on and off, according to the computation time required for speech coding.

The voice packet transmitting device that executes the speech coding selection control method of the present invention, and the voice packet receiving device that receives the voice packets it sends, decodes them, and plays back the speech, can both be built entirely in hardware. A simpler realization, and the best mode, is to install the voice packet transmission program and voice packet reception program proposed in the present invention on a computer and have the computer's CPU interpret and execute them, so that the computer functions as the voice packet transmitting device and the voice packet receiving device.

To make a computer function as the voice packet transmitting device of the present invention, the following are built on the computer: a coding time measurement unit that measures the computation time required to encode the speech section of each predetermined time; a coding scheme selection unit that, based on the measured computation time, selects a speech coding scheme with a smaller amount of computation when the computation time required for speech coding becomes longer than a predetermined time; and a voice packet sending unit that adds a coding identification signal representing the selected speech coding scheme to the coded speech signal and sends the result as a packet signal.
To make a computer function as the voice packet receiving device of the present invention, the following are built on the computer: a voice packet receiving unit; a speech code sequence separation unit that separates the coded speech signal from the coding identification signal; at least two decoding units that each decode data encoded by coding schemes with different amounts of computation; and a decoding selection unit that selects one of the decoding units according to the separated coding identification signal and makes it perform decoding.

Embodiments of each part are described in detail below with reference to the drawings.

FIG. 1 shows a first embodiment of the voice packet transmitting device 100 according to the present invention. The voice packet transmitting device 100 comprises a timer 101, an encoding unit 102, a coding time measurement unit 103, a speech coding scheme selection unit 104, a coding scheme selection unit 105, a voice packet creation unit 106, a voice packet transmission buffer 107, and a voice packet sending unit 108.
The timer 101 outputs time data at a fixed interval (for example, every 1 ms). The time data may be in any format from which the relative time between two instants can be determined; for example, a counter that starts at 0 and increases by 1 at each fixed interval (for example, 1 ms).

The encoding unit 102 incorporates several coding schemes that differ in the amount of computation per unit time needed to encode the input speech signal. This embodiment is described for the case where coding scheme A of encoding means 102A is ITU-T G.729 and coding scheme B of encoding means 102B is ITU-T G.711. FIG. 4 shows the characteristics of these schemes. As FIG. 4 makes clear, ITU-T G.729 needs only a small amount of transmitted information but a large amount of computation for coding, whereas ITU-T G.711 needs a large amount of transmitted information but only a small amount of computation. A configuration with two speech coding schemes is described here, but three or more may be used.
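To make the difference in transmitted information concrete, the following sketch computes the coded payload per frame for the two schemes. FIG. 4 is not reproduced in this text, so the figures used here are the nominal bit rates from the ITU-T Recommendations (8 kbit/s for G.729, 64 kbit/s for G.711), not values read from the patent; the function name `payload_bytes` is hypothetical.

```python
# Nominal bit rates from the ITU-T Recommendations (an assumption of this
# sketch; the patent's FIG. 4 is not available in this text).
BIT_RATE_BPS = {"G.729": 8_000, "G.711": 64_000}


def payload_bytes(codec: str, frame_ms: int) -> int:
    """Coded payload size in bytes for one frame of the given duration."""
    return BIT_RATE_BPS[codec] * frame_ms // (1000 * 8)


if __name__ == "__main__":
    for codec in BIT_RATE_BPS:
        print(codec, payload_bytes(codec, 20), "bytes per 20 ms frame")
```

For a 20 ms frame this gives 20 bytes for G.729 against 160 bytes for G.711, which is the trade-off the embodiment exploits: switching to G.711 costs bandwidth but relieves the CPU.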

The input speech signal (digitized discrete speech/musical-tone signal samples) is fed to either encoding means 102A or 102B, as selected by the coding scheme selection unit 105, and encoded. Assume that the initial setting selects encoding means 102A, which uses the scheme with less transmitted information (G.729), out of consideration for the load on the transmission path. In the steady state, the coding scheme selection unit 105 then feeds the input speech signal to encoding means 102A and operates it.
The coding time measurement unit 103 uses the time data output by the timer 101 to measure the time the encoding unit 102 needs for coding. If the input speech frame length (the length into which the digital code string obtained by digitizing the speech signal is divided at predetermined time intervals) is, for example, 20 ms, the time required for coding is the time taken to encode the N samples contained in those 20 ms. The coding time measurement unit 103 measures the time needed by whichever of encoding means 102A and 102B is operating and outputs the result to the speech coding scheme selection unit 104. The computation time is obtained as computation time = (coding end time) − (coding start time).
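The measurement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `timed_encode` is a hypothetical name, the encoder is passed in as an arbitrary callable, and `time.perf_counter` stands in for timer 101 (any clock that yields the relative time between two instants would do).

```python
import time
from typing import Callable, Sequence, Tuple


def timed_encode(
    encode: Callable[[Sequence[int]], bytes],
    frame: Sequence[int],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[bytes, float]:
    """Encode one frame and return (coded data, computation time in ms).

    computation time = (coding end time) - (coding start time)
    """
    start = clock()          # coding start time, in seconds
    coded = encode(frame)    # whichever encoding means is currently selected
    end = clock()            # coding end time, in seconds
    return coded, (end - start) * 1000.0
```

Injecting the clock keeps the sketch testable: a fake clock can return fixed instants so that the measured time is deterministic.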

The speech coding scheme selection unit 104 outputs a coding identification signal according to the measured time supplied by the coding time measurement unit 103, and this signal switches the state of the coding scheme selection unit 105. FIG. 5 shows an example of the coding identification signal: logic 1 is assigned when the computation time is greater than 5 ms, and logic 0 when it is less than 5 ms. The coding identification signal may take any form as long as it can designate the coding schemes available on the transmitting and receiving sides.
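The FIG. 5 rule can be sketched as a small function, together with a helper that shows which scheme is in force for each frame. Two points are assumptions of this sketch rather than statements of the patent: a computation time of exactly 5 ms is treated as logic 0 (the text only specifies "greater than" and "less than", although the second embodiment later speaks of returning to "5 ms or less"), and each measurement is taken to select the scheme for the following frame, an interpretation suggested by the one-frame delay unit 109. The function names are hypothetical.

```python
from typing import Iterable, List

SWITCH_THRESHOLD_MS = 5.0  # example value from FIG. 5


def coding_id_for_next_frame(elapsed_ms: float) -> int:
    """Return 1 (low-computation scheme, e.g. G.711) or 0 (e.g. G.729)."""
    return 1 if elapsed_ms > SWITCH_THRESHOLD_MS else 0


def ids_in_force(elapsed_per_frame: Iterable[float], initial_id: int = 0) -> List[int]:
    """Coding id used for each frame when every measurement selects the
    scheme for the frame that follows it (assumed one-frame lag)."""
    in_force: List[int] = []
    current = initial_id
    for elapsed_ms in elapsed_per_frame:
        in_force.append(current)                        # scheme used for this frame
        current = coding_id_for_next_frame(elapsed_ms)  # takes effect next frame
    return in_force
```

With measured times of 3.0, 6.5, 4.0 and 2.0 ms, only the third frame is coded with the low-computation scheme: the overrun is observed on the second frame and acted on one frame later.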

When the coding identification signal is input to the coding scheme selection unit 105, the unit decides where to route the input speech signal accordingly. In this example, while the computation time is less than 5 ms, coding uses G.729, which transmits less information; that is, the input speech signal continues to be fed to encoding means 102A.
When the computation time exceeds 5 ms, the unit switches to feeding the input speech signal to encoding means 102B, which requires less computation.
The coded speech sequence data produced by the encoding unit 102 and the coding identification signal, delayed by one frame in the one-frame delay unit 109, are input to the voice packet creation unit 106. As in conventional VoIP, the voice packet creation unit 106 takes the coded speech sequence data as input and outputs voice packet data to which a packet header has been added; the header carries a packet number for distinguishing packets, the coding identification signal, time information, and so on. FIG. 3 shows this structure.
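A packet of this kind could be laid out as in the sketch below. FIG. 3 is not reproduced in this text, so the field widths and order (16-bit packet number, 8-bit coding identification signal, 32-bit time value, network byte order) are assumptions chosen for illustration, and `build_packet` and `split_packet` are hypothetical names.

```python
import struct
from typing import Tuple

# Assumed header layout: packet number (uint16), coding identification
# signal (uint8), time information (uint32), big-endian, no padding.
HEADER = struct.Struct("!HBI")


def build_packet(number: int, coding_id: int, timestamp: int, payload: bytes) -> bytes:
    """Prepend the packet header to the coded speech sequence data."""
    return HEADER.pack(number & 0xFFFF, coding_id, timestamp & 0xFFFFFFFF) + payload


def split_packet(packet: bytes) -> Tuple[int, int, int, bytes]:
    """Separate a packet into header fields and coded speech sequence data."""
    number, coding_id, timestamp = HEADER.unpack_from(packet)
    return number, coding_id, timestamp, packet[HEADER.size:]
```

Carrying the coding identification signal in every header lets the receiver pick the matching decoder packet by packet, without any separate signalling.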

The voice packet transmission buffer 107 takes voice packet data as input, buffers it as in ordinary real-time voice communication over an IP network, and outputs voice packet data at the same time interval as the frame. The voice packet sending unit 108 takes the voice packet data as input and sends it to the network.
FIG. 2 shows an embodiment of the voice packet receiving device according to the present invention. In this embodiment the voice packet receiving device 200 comprises a voice packet receiving unit 201, a voice packet reception buffer 202, a speech code sequence separation unit 203, a decoding selection unit 204, and a decoding unit 205.

The voice packet receiving unit 201 receives voice packet data from the network and outputs it.
The voice packet reception buffer 202 takes voice packet data as input, reorders the packets according to the packet number attached to each, and outputs the stored voice packet whose packet number indicates the earliest transmission.
The speech code sequence separation unit 203 takes voice packet data in the format shown in FIG. 3 and separates it into the packet header and the coded speech sequence data. It also extracts the coding identification signal from the packet header and inputs it to the decoding selection unit 204. The decoding selection unit 204 decides, according to the coding identification signal, where to route the coded speech sequence data supplied by the speech code sequence separation unit 203.
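The reordering behavior of the reception buffer 202 can be sketched with a priority queue keyed on the packet number. This is an illustration under simplifying assumptions: the class name `ReceiveBuffer` is hypothetical, packet numbers are assumed not to wrap around, and playout timing (when to release a packet) is left out.

```python
import heapq
from typing import List, Tuple


class ReceiveBuffer:
    """Holds received voice packets and releases them in packet-number order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []

    def push(self, packet_number: int, packet: bytes) -> None:
        """Store a packet that may have arrived out of order."""
        heapq.heappush(self._heap, (packet_number, packet))

    def pop_earliest(self) -> Tuple[int, bytes]:
        """Remove and return the stored packet with the smallest packet number."""
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)
```

A heap keeps both insertion and removal at logarithmic cost, which is more than enough for the handful of packets a jitter buffer typically holds.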

That is, the decoding unit 205 has decoding means 205A and 205B, which decode sequence data encoded by coding schemes with different amounts of computation. According to the coding identification signal, the coded sequence data is input to one of them, which then performs decoding. Here, if the coding identification signal is 0, decoding means 205A is selected and G.729 decoding, for the scheme that transmits less information, is performed. If the coding identification signal is 1, decoding means 205B is selected, G.711 decoding, for the scheme that needs less computation for coding, is performed, and the speech signal is output.
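The routing performed by the decoding selection unit 204 amounts to a lookup from the coding identification signal to a decoder. The sketch below uses stand-in decoders that merely tag the payload, since real G.729 and G.711 decoders are outside its scope; `select_and_decode` and `DECODERS` are hypothetical names, and rejecting an unknown signal value is a design choice of this sketch, not something the patent specifies.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], bytes]


def select_and_decode(coding_id: int, payload: bytes,
                      decoders: Dict[int, Decoder]) -> bytes:
    """Route the coded sequence data to the decoder named by coding_id."""
    try:
        decoder = decoders[coding_id]
    except KeyError:
        raise ValueError(f"unknown coding identification signal: {coding_id}") from None
    return decoder(payload)


# Stand-in decoders (placeholders only): 0 -> decoding means 205A (G.729),
# 1 -> decoding means 205B (G.711).
DECODERS: Dict[int, Decoder] = {
    0: lambda payload: b"G.729:" + payload,
    1: lambda payload: b"G.711:" + payload,
}
```

Using a table rather than an if/else chain makes it straightforward to register three or more schemes, which the description allows.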

As described above, with the speech coding selection control method of the present invention and the voice packet transmitting device that operates using it, coding normally uses G.729, which transmits less information, so communication proceeds with a small amount of transmitted data. When the computer on the transmitting side becomes more heavily loaded and the time needed to encode each frame exceeds the predetermined value, approaching the limit at which dropouts would occur, the selection control immediately switches to G.711, which requires less computation. As a result, while the load on the transmitting computer is elevated, coding uses the scheme with less computation, the sending interval of voice packets stays at its normal value, and dropouts are prevented.

When the computation time for speech coding falls back below the predetermined value (5 ms), the coding scheme also returns to the original scheme that transmits less information. The description above uses G.729, which requires more computation than G.711, as the low-information scheme switched to when coding time is short, but the invention is not limited to this scheme and other coding schemes can be applied. The point is that when the time required for speech coding is short, the amount of computation of the scheme switched to does not matter.

FIGS. 6 and 7 show a second embodiment of the present invention, in which the invention is applied to a voice packet transmitting device and receiving device that also send and receive image packet data. FIG. 6 shows the voice packet transmitting device with an image packet transmission function, and FIG. 7 shows the corresponding receiving device.
Reference numeral 100 in FIG. 6 denotes the voice packet transmitting device of the present invention described with reference to FIG. 1. In this embodiment too, the voice packet transmitting device 100 comprises the timer 101, encoding unit 102, coding time measurement unit 103, speech coding scheme selection unit 104, coding scheme selection unit 105, voice packet creation unit 106, voice packet transmission buffer 107, voice packet sending unit 108, and one-frame delay unit 109, and sends voice packets in the same way as the embodiment of FIG. 1.

The characteristic feature of the second embodiment is the addition of an image coding selection unit 604 and a coding operation control unit 610, which stops the coding operation of the image encoding unit 611 when necessary in response to the image coding control signal output by the image coding selection unit 604.
The image coding selection unit 604 takes the measurement result of the coding time measurement unit 103 as input and, when the computation time for speech coding reaches or exceeds a predetermined value, inverts the state of the image coding control signal to stop the coding operation of the image encoding unit 611.
That is, the speech coding scheme selection unit 104 selects the coding scheme according to the rule shown in FIG. 5. When the computation time for speech coding exceeds, for example, 5 ms, it switches to a speech coding scheme with less computation (for example, G.711), and the computation time then tends to decrease. Even with the low-computation scheme (for example, G.711) in operation, however, the computation time for speech coding starts to lengthen again if the load on the computer constituting the transmitter keeps increasing.

In the present invention, if the time required for speech coding (the time to encode one frame of speech data) exceeds a predetermined value even though the low-computation coding scheme is in operation, control is performed to stop the coding of image data. FIG. 8 shows the control rule for stopping image coding. In the example of FIG. 8, when the computation time for speech coding exceeds 15 ms, the image coding control signal is inverted to logic 1. This signal turns the coding operation control unit 610 off, cuts the supply of the input image signal to image encoding means 611A, and stops its coding operation.
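The two rules of the second embodiment (FIG. 5 for the speech coding scheme, FIG. 8 for image coding) can be combined in one sketch. The thresholds are the example values from the text; treating exactly 5 ms and exactly 15 ms as "not exceeded" follows the description's "5 ms or less" and "15 ms or less" wording. The names `ControlSignals` and `control_signals` are hypothetical.

```python
from typing import NamedTuple

CODEC_SWITCH_MS = 5.0   # FIG. 5 example threshold
IMAGE_STOP_MS = 15.0    # FIG. 8 example threshold


class ControlSignals(NamedTuple):
    coding_id: int    # 1 selects the low-computation speech scheme (e.g. G.711)
    image_stop: int   # 1 suspends image coding


def control_signals(elapsed_ms: float) -> ControlSignals:
    """Derive both control signals from one speech-coding time measurement."""
    return ControlSignals(
        coding_id=1 if elapsed_ms > CODEC_SWITCH_MS else 0,
        image_stop=1 if elapsed_ms > IMAGE_STOP_MS else 0,
    )
```

Because the image threshold is higher than the codec threshold, image coding is only suspended once the low-computation speech scheme has already been selected, which gives the staged response the description calls for: first trade bandwidth for CPU, then sacrifice video to protect audio.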

When image encoding means 611A stops coding, the computer's load decreases by the corresponding amount, so that at least the transmission of voice packets is maintained. When the computation time for speech coding returns to the steady state (15 ms or less), the sending of image packets resumes. When the computation time recovers further and returns to 5 ms or less, the speech coding scheme also returns to one that transmits less information, for example G.729.
The image data encoded by image encoding means 611A is input to the image packet creation unit 612, which adds the packet header shown in FIG. 3 together with time data, packetizes the data, and outputs it as an image packet. The image packets output by the image packet creation unit 612 are buffered in the image packet transmission buffer 613, which outputs the image packet that was packetized earliest among those in the buffer; the image packet sending unit 614 then sends it to the network.

In the voice packet receiving device 200 shown in FIG. 7, as described with reference to FIG. 2, either decoding means 205A or 205B is selected according to the coding identification signal sent with each voice packet, performs decoding, and reproduces the speech signal. In parallel, the image packet receiving device 700 receives image packets addressed to it from the network at the image packet receiving unit 701, reorders them into transmission order in the image packet reception buffer 702, separates the coded image data from the packet header in the image coding separation unit 703, and passes the coded image sequence data to the image decoding unit 704, which decodes the image signal.

Because of the operation of the image packet transmitting device 600 described with reference to FIG. 6, the image packet receiving device 700 may find that reception of image packets is temporarily interrupted. While reception is interrupted, keeping the most recently received image on display avoids an interruption of the reproduced image.
The voice packet transmission program and voice packet reception program of the present invention are each written in a computer-interpretable programming language, recorded on a computer-readable recording medium such as a magnetic disk or CD-ROM, and installed on a computer from such a medium or over a communication line. The CPU of the computer interprets the installed program, causing the computer to function as the voice packet transmitting device described with reference to FIG. 1 or FIG. 6 and the voice packet receiving device described with reference to FIG. 2 or FIG. 7.

The speech coding selection control method of the present invention, and the voice packet transmitting device and voice packet receiving device that operate using it, are used in the field of teleconference systems.

FIG. 1 is a block diagram illustrating the first embodiment of the voice packet transmitting device according to the present invention.
FIG. 2 is a block diagram illustrating the first embodiment of the voice packet receiving device according to the present invention.
FIG. 3 is a diagram illustrating the data structure of the voice packet described with reference to FIG. 1.
FIG. 4 is a diagram illustrating the characteristics of the coding schemes used in the embodiment described with reference to FIG. 1.
FIG. 5 is a diagram illustrating the control rule based on the coding identification signal used in the embodiment described with reference to FIG. 1.
FIG. 6 is a block diagram illustrating the second embodiment of the voice packet transmitting device of the present invention.
FIG. 7 is a block diagram illustrating the second embodiment of the voice packet receiving device of the present invention.
FIG. 8 is a diagram illustrating the image coding control signal used in the embodiment shown in FIG. 6.

Explanation of symbols

100 Voice packet transmitting device
101 Timer
102 Encoding unit
102A, 102B Encoding means
103 Coding time measurement unit
104 Speech coding scheme selection unit
105 Coding scheme selection unit
106 Voice packet creation unit
107 Voice packet transmission buffer
108 Voice packet sending unit
200 Voice packet receiving device
201 Voice packet receiving unit
202 Voice packet reception buffer
204 Decoding selection unit
205 Decoding unit
600 Image packet transmitting device
604 Image coding selection unit
610 Coding operation control unit
612 Image packet creation unit
613 Image packet transmission buffer
614 Image packet sending unit
700 Image packet receiving device
701 Image packet receiving unit
702 Image packet reception buffer
703 Image coding separation unit
704 Image decoding unit

Claims

1. A speech coding selection control method in which, with the length of a digital code string obtained by digitizing a speech signal taken as the frame length, discrete speech/musical-tone signal samples are input and encoded and the encoded digital code is packetized and transmitted, and in which coding processes that differ in the amount of computation required to encode the discrete speech/musical-tone signal samples of each predetermined time are switched dynamically in units of the frame length, the method comprising:
a step of measuring the time taken to speech-code the speech section of each predetermined time; and
a step of performing a coding process with a small amount of computation when the time required for the speech coding becomes longer than a predetermined time.

2. The speech coding selection control method according to claim 1, further comprising a step of performing a coding process with a small amount of information to be transmitted when the time required for the speech coding is shorter than the predetermined time.

3. The speech coding selection control method according to claim 1, wherein the step of performing the coding process with a small amount of computation further comprises a step of performing control to temporarily suspend an image coding process running at the same time when the computation time is longer than a certain threshold.