JP2000069179A

JP2000069179A - Multispot conferencing device and its method, and terminal device for multispot conferencing

Info

Publication number: JP2000069179A
Application number: JP10233014A
Authority: JP
Inventors: Jo Matsui; 丈松井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-08-19
Filing date: 1998-08-19
Publication date: 2000-03-03

Abstract

PROBLEM TO BE SOLVED: To prevent degradation of the voice quality of transmitted voice data by combining input voice data, generated without a filtering process by decoding transmitted 1st voice encoded data, with transmit voice data to be transmitted, encoding the composite data into 2nd voice encoded data, and sending that data to a transmission destination different from the transmission source of the 1st voice encoded data. SOLUTION: Each of terminal devices 11A to 11N takes in multiplexed data D5 through a 1st I/F circuit 18 and obtains 1st voice encoded data D2₁ and control data D4 at a demultiplexer 19. A control part 15 puts a 1st decoding part 24 into operation according to the control data D4 so that the data D2₁ is decoded on the basis of a 3rd decoding system, and the decoded data is sent to the speaker 33 side after a filtering process and to a 1st adder 26 without the filtering process. The 1st adder 26 combines the output of the decoding part 24 with the transmit voice data D1₂ generated by its own terminal device, and sends the composite data to the next stage through a 1st encoding part 39... a 2nd I/F circuit 42.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a multipoint conference apparatus and a method therefor, and is suitably applied to, for example, a video conference system and its terminal devices.

[0002]

PRIOR ART

Conventionally, in a video conference system, audio data and video data are transmitted and received between terminal devices installed at multiple points, so that a conference can be held using the sound based on the audio data and the video based on the video data.

[0003] FIG. 13 shows a configuration example of such a video conference system 1. A plurality of terminal devices 2A to 2N are each connected to a multipoint control unit (MCU) 4 via a public line network 3, and each of the terminal devices 2A to 2N can transmit its audio data and video data to be transmitted to the multipoint control unit 4 via the public line network 3.

[0004] From the audio data supplied by the terminal devices 2A to 2N, the multipoint control unit 4 excludes one terminal device at a time, in turn, and adds together and synthesizes the audio data of the remaining terminal devices 2A to 2N. Likewise, from the video data supplied by the terminal devices 2A to 2N, it adds together and synthesizes the video data of the terminal devices 2A to 2N other than the excluded one.

[0005] The multipoint control unit 4 then transmits the synthesized audio data and the corresponding synthesized video data, via the public line network 3, to the terminal device 2A to 2N whose audio data and video data were excluded from that synthesis.
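The mixing rule of the multipoint control unit described above (each terminal receives the sum of the audio of every other terminal, but not its own) can be sketched as follows. This is only an illustration under stated assumptions: the function name `mix_minus`, the dictionary layout, and the use of plain integer samples are hypothetical, and overflow handling is omitted.

```python
from typing import Dict, List


def mix_minus(streams: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each terminal, add together the samples of all *other* terminals.

    `streams` maps a terminal name (hypothetical labels such as "2A") to a
    block of samples; all blocks are assumed to have the same length.
    """
    if not streams:
        return {}
    length = len(next(iter(streams.values())))
    # Sum every terminal once, then subtract each terminal's own samples,
    # which is equivalent to summing all terminals except that one.
    total = [sum(block[i] for block in streams.values()) for i in range(length)]
    return {
        name: [total[i] - block[i] for i in range(length)]
        for name, block in streams.items()
    }
```

For three terminals, the block returned for "2A" contains only the contributions of "2B" and "2C", which is what terminal 2A would play from its speaker.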

[0006] Each of the terminal devices 2A to 2N thus emits, from a speaker, synthesized sound based on the synthesized audio data supplied from the multipoint control unit 4, and collectively displays, on a monitor, a plurality of images based on the synthesized video data. In this way, each terminal device can present the sound and video obtained at all of the terminal devices 2A to 2N other than itself.

[0007] In the video conference system 1 configured in this way, however, the overall system configuration becomes complicated because the terminal devices 2A to 2N are connected through the multipoint control unit 4. For this reason, some conventional video conference systems are instead configured as shown in FIG. 14.

[0008] In this video conference system 5, a plurality of terminal devices 6A to 6D are directly connected to one another via a public line network 7, and each of the terminal devices 6A to 6D transmits its audio data and video data to be transmitted to each of the other terminal devices 6A to 6D via the public line network 7.

[0009] Each of the terminal devices 6A to 6D synthesizes the audio data supplied from the other terminal devices 6A to 6D and emits synthesized sound based on the synthesized audio data from a speaker. It also synthesizes the video data supplied from the other terminal devices 6A to 6D and collectively displays a plurality of images based on the synthesized video data on a monitor. In this way, each terminal device can present the sound and video obtained at all of the terminal devices 6A to 6D other than itself.

[0010]

PROBLEMS TO BE SOLVED BY THE INVENTION

The video conference system 5 configured in this way has the advantage that the overall system configuration can be simplified, because the terminal devices 6A to 6D are connected via the public line network 7 without a multipoint control unit. However, it has the problem that connecting the terminal devices 6A to 6D to one another becomes complicated when the system is built.

[0011] Furthermore, in this video conference system 5, each of the terminal devices 6A to 6D is provided with interface circuits for connecting to the other terminal devices 6A to 6D via the public line network 7. Consequently, if it is requested that more terminal devices be connected than there are interface circuits, the request cannot easily be accommodated.

[0012] As a video conference system that solves these problems, it has conventionally been considered to cascade a plurality of terminal devices in series via a public line network. Each terminal device transmits the audio data to be transmitted in both directions, to the next stage and to the preceding stage, via the public line network while synthesizing it stage by stage, and transmits the video data to be transmitted in both directions in the same way. This configuration has the advantages that the terminal devices can be connected easily when the system is built and that terminal devices can easily be added.
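The scaling argument behind the two topologies can be made concrete with a short calculation. The sketch below assumes one interface circuit per directly connected peer, which is an interpretation of the text rather than a figure stated in it; all function names are hypothetical.

```python
def mesh_interfaces(n_terminals: int) -> int:
    """Interface circuits each terminal needs when every pair is linked directly."""
    return max(n_terminals - 1, 0)


def chain_interfaces(n_terminals: int) -> int:
    """Largest number of interface circuits any terminal needs in a cascade.

    Interior terminals have a preceding and a next stage; the two end
    terminals have only one neighbour.
    """
    return min(2, max(n_terminals - 1, 0))


def mesh_links(n_terminals: int) -> int:
    """Number of point-to-point connections in a full mesh."""
    return n_terminals * (n_terminals - 1) // 2


def chain_links(n_terminals: int) -> int:
    """Number of point-to-point connections in a cascade."""
    return max(n_terminals - 1, 0)
```

Under this assumption, the interface count per terminal grows with the number of participants in the directly connected system, while in the cascaded system it stays at two however many terminals are added.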

[0013] In practice, in a video conference system configured in this way, each terminal device receives encoded data from the next-stage and preceding-stage terminal devices via the public line network during data transmission. This consists of encoded data obtained by encoding audio data (hereinafter, input audio data) with a predetermined encoding method (hereinafter, first audio encoded data), and encoded data obtained by encoding video data (hereinafter, input video data) with a predetermined encoding method (hereinafter, first video encoded data).

[0014] Each terminal device decodes the first audio encoded data with a predetermined decoding method, synthesizes the resulting input audio data with the audio data to be transmitted that the terminal device itself has generated (hereinafter, transmission audio data), encodes the result with a predetermined encoding method, and transmits the resulting encoded data (hereinafter, second audio encoded data) to the next-stage and preceding-stage terminal devices via the public line network. Likewise, it decodes the first video encoded data with a predetermined decoding method, synthesizes the resulting input video data with the video data to be transmitted that the terminal device itself has generated (hereinafter, transmission video data), encodes the result with a predetermined encoding method, and transmits the resulting encoded data (hereinafter, second video encoded data) to the next-stage and preceding-stage terminal devices via the public line network.
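The stage-by-stage synthesis in a cascade can be illustrated with a small simulation. It is a sketch only: it uses one sample per terminal, ignores encoding and decoding, and assumes that data arriving from one side is forwarded only to the opposite side so that no terminal receives its own audio back. The function name `chain_mix` is hypothetical.

```python
from typing import List


def chain_mix(local: List[int]) -> List[int]:
    """Return what each terminal in a cascade hears, one sample per terminal.

    `local[i]` is the sample generated by terminal i. Terminal i forwards
    (data from its preceding stage + its own sample) to the next stage, and
    (data from its next stage + its own sample) to the preceding stage.
    """
    n = len(local)
    from_preceding = [0] * n  # mix arriving at terminal i from stage i - 1
    for i in range(1, n):
        from_preceding[i] = from_preceding[i - 1] + local[i - 1]
    from_next = [0] * n       # mix arriving at terminal i from stage i + 1
    for i in range(n - 2, -1, -1):
        from_next[i] = from_next[i + 1] + local[i + 1]
    # Each terminal plays the sum of both incoming mixes.
    return [from_preceding[i] + from_next[i] for i in range(n)]
```

With three terminals generating 1, 10 and 100, the middle terminal hears 101, that is, both of its neighbours but not itself.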

[0015] Considering the transmission of input audio data in a video conference system configured in this way, the input audio data is encoded using a desired one of several encoding methods with different compression ratios. These are the pulse code modulation (PCM) encoding method called A-law coding (hereinafter, first encoding method) and the one called μ-law coding (hereinafter, second encoding method), together with two methods standardized by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T): low-delay code excited linear prediction (LD-CELP) (hereinafter, third encoding method) and sub-band adaptive differential pulse code modulation (SB-ADPCM) (hereinafter, fourth encoding method).
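As an illustration of the second encoding method, the continuous μ-law companding curve can be written in a few lines. This is a sketch of the textbook formula with μ = 255, not the segmented 8-bit quantizer used in practice and not anything specified in this document; the function names are hypothetical and samples are assumed to be normalized to the range -1 to 1.

```python
import math

MU = 255.0  # value conventionally used for mu-law PCM telephony


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the mu-law curve (also in [-1, 1])."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The curve gives small amplitudes a larger share of the output range, so that a coarse uniform quantizer applied afterwards preserves quiet passages better than it would on the raw samples.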

[0016] Of the first to fourth encoding methods, when first audio encoded data obtained with the third encoding method is decoded, the original input audio data is generally regenerated from the first audio encoded data using a synthesis filter, in accordance with the corresponding predetermined decoding method. Because the input audio data obtained in this way contains audible noise and distortion (hereinafter collectively, noise), the input audio data is normally passed through a post filter, and this filtering process removes the noise.

[0017] However, when the third encoding method is selected at a plurality of terminal devices, the input audio data is passed through a post filter repeatedly at successive next-stage and preceding-stage terminal devices. The action of the post filter then attenuates the input audio data a little more at each stage, with the result that the quality of the sound based on the input audio data deteriorates markedly.
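The cumulative effect of repeated filtering can be demonstrated with a toy model. The two-tap moving average below is only a stand-in: the post filter of an actual LD-CELP decoder is adaptive and behaves differently, so the numbers are illustrative of cascading any filter whose gain is below unity, not of the codec itself. All names are hypothetical.

```python
import math
from typing import List


def smooth(x: List[float]) -> List[float]:
    """Two-tap moving average, used here as a stand-in 'post filter'."""
    if not x:
        return []
    return [x[0]] + [(x[i] + x[i - 1]) / 2 for i in range(1, len(x))]


def rms(x: List[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def levels_after_hops(signal: List[float], hops: int) -> List[float]:
    """RMS level of the signal before filtering and after each hop."""
    levels = [rms(signal)]
    for _ in range(hops):
        signal = smooth(signal)
        levels.append(rms(signal))
    return levels


# A tone at one quarter of the sampling rate, relayed through four terminals.
tone = [math.sin(2 * math.pi * 0.25 * n) for n in range(400)]
hop_levels = levels_after_hops(tone, 4)
```

Each pass lowers the level of the tone further, which is the behaviour the text attributes to a chain of terminals that each apply their post filter before forwarding.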

[0018] The present invention has been made in view of the above points, and proposes a multipoint conference apparatus and method, and a multipoint conference terminal device, capable of transmitting the input audio data to be transmitted while preventing deterioration in the quality of the sound based on that input audio data.

[0019]

MEANS FOR SOLVING THE PROBLEMS

To solve this problem, the present invention provides multipoint conference terminal devices installed at respective points and a line network that cascades the multipoint conference terminal devices in sequence. Each multipoint conference terminal device has: capturing means that takes in and outputs first audio encoded data transmitted via the line network from the preceding-stage and/or next-stage multipoint conference terminal device; decoding means that decodes the first audio encoded data output from the capturing means and outputs the resulting input audio data both after a filtering process that removes noise so that sound based on the input audio data can be emitted, and without that filtering process; audio data generating means that generates and outputs transmission audio data to be transmitted; synthesizing means that synthesizes the input audio data output from the decoding means without the filtering process with the transmission audio data output from the audio data generating means, and outputs the resulting output audio data; encoding means that encodes the output audio data output from the synthesizing means and outputs the resulting second audio encoded data; and transmission means that transmits the second audio encoded data output from the encoding means, via the line network, to the preceding-stage and/or next-stage multipoint conference terminal device that is different from the transmission source of the corresponding first audio encoded data.

[0020] As a result, the transmitted input audio data can be prevented from being attenuated by the action of a filtering process repeated at each successive multipoint conference terminal device.
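The arrangement of means described above, a decoder with one filtered output for the speaker and one unfiltered output for forwarding, can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the class and function names are hypothetical, the codec is passed in as plain callables, and mixing is a simple sample-wise addition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Samples = List[float]


@dataclass
class DualOutputDecoder:
    """Decoder that exposes both the post-filtered and the raw output."""

    synthesize: Callable[[bytes], Samples]     # encoded data -> input audio data
    post_filter: Callable[[Samples], Samples]  # noise-removing filtering process

    def decode(self, encoded: bytes) -> Tuple[Samples, Samples]:
        raw = self.synthesize(encoded)
        # The first element goes to the speaker, the second to the mixer.
        return self.post_filter(raw), raw


def relay(
    decoder: DualOutputDecoder,
    encoded_in: bytes,
    local_audio: Samples,
    encode: Callable[[Samples], bytes],
) -> Tuple[Samples, bytes]:
    """Return (samples for the speaker, encoded data to forward onward)."""
    for_speaker, for_mix = decoder.decode(encoded_in)
    # Only the unfiltered decoder output is mixed with the local audio,
    # so the post filter is never applied to data that travels further.
    mixed = [a + b for a, b in zip(for_mix, local_audio)]
    return for_speaker, encode(mixed)
```

Because the forwarded data never passes through `post_filter`, the filter acts once, at the terminal where the sound is actually played, regardless of how many terminals the data has crossed.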

[0021] The present invention also provides: a first step of cascading, in sequence via a predetermined line network, multipoint conference terminal devices installed at respective points; a second step of, in each multipoint conference terminal device, decoding first audio encoded data transmitted via the line network from the preceding-stage and/or next-stage multipoint conference terminal device, and outputting the resulting input audio data both after a filtering process that removes noise so that sound based on the input audio data can be emitted, and without that filtering process; a third step of generating transmission audio data to be transmitted, synthesizing the generated transmission audio data with the input audio data output without the filtering process, and encoding the resulting output audio data to generate second audio encoded data; and a fourth step of transmitting the second audio encoded data, via the line network, to the preceding-stage and/or next-stage multipoint conference terminal device that is different from the transmission source of the corresponding first audio encoded data.

[0022] As a result, the transmitted input audio data can be prevented from being attenuated by the action of a filtering process repeated at each successive multipoint conference terminal device.

[0023] Further, the present invention provides: decoding means that decodes first audio encoded data supplied from outside and outputs the resulting input audio data both after a filtering process that removes noise so that sound based on the input audio data can be emitted, and without that filtering process; audio data generating means that generates and outputs transmission audio data to be transmitted; synthesizing means that synthesizes the input audio data output from the decoding means without the filtering process with the transmission audio data output from the audio data generating means, and outputs the resulting output audio data; and encoding means that encodes the output audio data output from the synthesizing means.

[0024] As a result, the transmitted input audio data can be prevented from being attenuated by the action of a filtering process repeated in sequence at a plurality of multipoint conference terminal devices cascaded via a predetermined line network.

[0025]

EMBODIMENTS OF THE INVENTION

An embodiment of the present invention is described in detail below with reference to the drawings.

[0026] (1) Configuration of the Video Conference System According to the Embodiment

In FIG. 1, reference numeral 10 denotes, as a whole, a video conference system to which the present invention is applied. In it, a plurality of terminal devices 11A to 11N installed at respective points are cascaded in series via a public line network 12 such as an Integrated Services Digital Network (ISDN).

[0027] As shown in FIG. 2, each of the terminal devices 11A to 11N is provided with a control unit 15, a user interface 16 such as a keyboard through which various control information can be input, and a nonvolatile memory 17 in which a predetermined program for operating the terminal device is stored in advance. The control unit 15 starts the program in the memory 17 on the basis of the control information input through the user interface 16, and controls the whole of the terminal device in accordance with the started program.

[0028] When, during data transmission, lines are connected between each terminal device 11A to 11N and its preceding-stage and next-stage terminal devices 11A to 11N, multiplexed data D5 is transmitted to it from the preceding-stage terminal device via the public line network 12 and is taken into a first interface circuit 18. The multiplexed data D5 is obtained by time-division multiplexing first audio encoded data D2₁, first video encoded data D3₁, and control data D4 for setting an operation mode shared with the preceding-stage terminal device.

[0029] The first interface circuit 18 attaches, to the multiplexed data D5 it has taken in, first identification information indicating the preceding-stage terminal device 11A to 11N that is the transmission source, and sends the result to a demultiplexer 19.

[0030] When the multiplexed data D5 is supplied from the first interface circuit 18, the demultiplexer 19 detects the first identification information attached to it and then separates the multiplexed data D5 into the first audio encoded data D2₁, the first video encoded data D3₁, and the control data D4.

[0031] On the basis of the first identification information, the demultiplexer 19 attaches a first identification number (for example, "0") assigned in advance to the preceding-stage terminal device 11A to 11N to the first audio encoded data D2₁ and sends it to first and second recognizers 22 and 23 of first and second audio processing units 20 and 21. It also attaches the first identification number to the first video encoded data D3₁ and sends it to a video processing unit (not shown), and attaches the first identification number to the control data D4 and sends it to the control unit 15.

[0032] In this case, the first recognizer 22 recognizes the identification number attached to the first audio encoded data D2₁ supplied from the demultiplexer 19. On the basis of the recognition result, it selects only first audio encoded data D2₁ to which the first identification number designated in advance is attached, removes the first identification number, and sends the data to a first decoding unit 24.
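The tagging and selection just described can be sketched as follows. The value 0 for the first identification number is the example given in the text; the value 1 for the second identification number is the example the text gives for the next-stage terminal device. Everything else (the `Tagged` type, the function names, the use of `None` for rejected data) is a hypothetical illustration rather than the circuit's actual behaviour.

```python
from typing import Callable, NamedTuple, Optional

FIRST_ID = 0   # example value: data associated with the preceding stage
SECOND_ID = 1  # example value: data associated with the next stage


class Tagged(NamedTuple):
    ident: int
    payload: bytes


def tag(payload: bytes, from_preceding_stage: bool) -> Tagged:
    """Demultiplexer side: attach the identification number for the source."""
    return Tagged(FIRST_ID if from_preceding_stage else SECOND_ID, payload)


def make_recognizer(expected: int) -> Callable[[Tagged], Optional[bytes]]:
    """Recognizer side: pass only data carrying the designated number."""

    def recognize(item: Tagged) -> Optional[bytes]:
        # The identification number is stripped before the data moves on.
        return item.payload if item.ident == expected else None

    return recognize
```

Both recognizers receive every tagged block, and each lets through only the blocks meant for its own decoding unit.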

[0033] The first decoding unit 24 decodes the first audio encoded data D2₁ supplied from the first recognizer 22 with a predetermined decoding method, sends the resulting input audio data D1₁ to a first adder 26 via a first rate converter 25, and also sends it to a third adder 29 via second and third rate converters 27 and 28 in sequence.

[0034] At this time, each of the terminal devices 11A to 11N collects sound through a microphone 30 to obtain an audio signal S1 to be transmitted, converts it into transmission audio data D1₂ with an analog/digital converter 31, and sends the data to an echo canceller 32.

[0035] The echo canceller 32 suppresses, in the transmission audio data D1₂ supplied from the analog/digital converter 31, the echo and howling caused by acoustic coupling (sound emitted from a speaker 33 entering the microphone 30), and then sends the data to the first adder 26 and a second adder 36 via fourth rate converters 34 and 35.

[0036] The first adder 26 thus adds together and synthesizes the transmission audio data D1₂ generated in its own terminal device 11A to 11N, supplied via the fourth rate converter 34, and the input audio data D1₁ obtained from the preceding-stage terminal device 11A to 11N, supplied via the first rate converter 25. It sends the resulting audio data (hereinafter, output audio data) D1₃ to a first encoding unit 39 via a first selector 37 and a fifth rate converter 38 in sequence.

[0037] The first encoding unit 39 encodes the output audio data D1₃ supplied via the fifth rate converter 38 with a predetermined encoding method and sends the resulting second audio encoded data D2₂ to a first appending unit 40.

[0038] The first appending unit 40 attaches, to the second audio encoded data D2₂ supplied from the first encoding unit 39, a second identification number (for example, "1") assigned in advance to the next-stage terminal device 11A to 11N that is its destination, and sends the result to a multiplexer 41.

[0039] In this case, the multiplexer 41 is supplied by the video processing unit with second video encoded data D3₂, which is obtained by synthesizing the input video data obtained from the preceding-stage terminal device 11A to 11N with the transmission video data to be transmitted generated in its own terminal device 11A to 11N, encoding the result with a predetermined encoding method, and attaching the second identification number. It is also supplied with predetermined control data D4 by the control unit 15. The multiplexer 41 generates multiplexed data D5 by time-division multiplexing the second video encoded data D3₂, the control data D4, and the second audio encoded data D2₂ supplied from the first appending unit 40.
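Time-division multiplexing of the audio, video and control channels can be illustrated with a simple round-robin interleaver. The frame layout below is purely an assumption made for illustration (the text does not specify slot sizes or framing), and the function names are hypothetical.

```python
from typing import List


def tdm_multiplex(channels: List[bytes], slot: int = 1) -> bytes:
    """Interleave equal-length channels round-robin, `slot` bytes at a time."""
    if not channels:
        return b""
    length = len(channels[0])
    if any(len(c) != length for c in channels) or length % slot:
        raise ValueError("channels must share a length that is a multiple of slot")
    frame = bytearray()
    for start in range(0, length, slot):
        for channel in channels:
            frame += channel[start:start + slot]
    return bytes(frame)


def tdm_demultiplex(frame: bytes, n_channels: int, slot: int = 1) -> List[bytes]:
    """Split an interleaved frame back into its channels."""
    outputs = [bytearray() for _ in range(n_channels)]
    for index, start in enumerate(range(0, len(frame), slot)):
        outputs[index % n_channels] += frame[start:start + slot]
    return [bytes(o) for o in outputs]
```

The demultiplexer at the receiving terminal performs the inverse operation, which is why both ends must agree on the channel order and slot size in advance.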

[0040] At this time, the multiplexer 41 detects the second identification number attached to the second audio encoded data D2₂ and the second video encoded data D3₂ and, on the basis of the detection result, removes the second identification number contained in the corresponding multiplexed data D5. It then transmits the multiplexed data D5 to the corresponding destination, the next-stage terminal device 11A to 11N, via a second interface circuit 42 and the public line network 12 in sequence.

[0041] Meanwhile, multiplexed data D5 obtained by time-division multiplexing first audio encoded data D2₃, first video encoded data D3₃, and control data D4 is transmitted to each of the terminal devices 11A to 11N from its next-stage terminal device 11A to 11N via the public line network 12, and is taken into the second interface circuit 42.

[0042] The second interface circuit 42 attaches, to the multiplexed data D5 it has taken in, second identification information indicating the next-stage terminal device 11A to 11N that is the transmission source, and sends the result to the demultiplexer 19.

[0043] When the multiplexed data D5 is supplied from the second interface circuit 42, the demultiplexer 19 detects the second identification information attached to it and separates the multiplexed data D5 into the first audio encoded data D2₃, the first video encoded data D3₃, and the control data D4.

[0044] On the basis of the second identification information, the demultiplexer 19 attaches the second identification number to the first audio encoded data D2₃ and sends it to the first and second recognizers 22 and 23. It also attaches the second identification number to the first video encoded data D3₃ and sends it to the video processing unit (not shown), and attaches the second identification number to the control data D4 and sends it to the control unit 15.

[0045] In this case, the second recognizer 23 recognizes the identification number attached to the first audio encoded data D2₃ supplied from the demultiplexer 19. On the basis of the recognition result, it selects only first audio encoded data D2₃ to which the second identification number designated in advance is attached, removes the second identification number, and sends the data to a second decoding unit 43.

[0046] The second decoding unit 43 decodes the first audio encoded data D2₃ supplied from the second recognizer 23 with a predetermined decoding method, sends the resulting input audio data D1₄ to the second adder 36 via a first rate converter 44, and also sends it to the third adder 29 via second and third rate converters 45 and 46 in sequence.

【００４７】このとき第２の加算器３６は、第４のレー
トコンバータ３５を介して与えられた伝送音声データＤ
１₂と、第１のレートコンバータ４４を介して与えられ
た、次段の端末装置１１Ａ〜１１Ｎから得られた入力音
声データＤ１₄とを加算処理して合成し、得られた出力
音声データＤ１₅を第２のセレクタ４７及び第５のレー
トコンバータ４８を順次介して第２の符号化部４９に送
出する。At this time, the second adder 36 adds together, and thereby combines, the transmission audio data D1₂ supplied via the fourth rate converter 35 and the input audio data D1₄ that was obtained from the next-stage terminal device 11A to 11N and supplied via the first rate converter 44. It sends the resulting output audio data D1₅ to the second encoding unit 49 via the second selector 47 and the fifth rate converter 48 in sequence.

【００４８】第２の符号化部４９は、第５のレートコン
バータ４８を介して与えられた出力音声データＤ１₅を
所定の符号化方式によって符号化処理し、得られた第２
の音声符号化データＤ２₄を第２の付加器５０に送出す
る。The second encoding unit 49 encodes the output audio data D1₅ supplied via the fifth rate converter 48 using a predetermined encoding method and sends the resulting second encoded audio data D2₄ to the second appending unit 50.

【００４９】第２の付加器５０は、第２の符号化部４９
から与えられた第２の音声符号化データＤ２₄にその伝
送先に対応する第１の識別番号を付加し、これをマルチ
プレクサ４１に送出する。The second appending unit 50 adds the first identification number corresponding to the transmission destination to the second encoded audio data D2₄ supplied from the second encoding unit 49 and sends the result to the multiplexer 41.

【００５０】この場合マルチプレクサ４１には、映像処
理部から次段の端末装置１１Ａ〜１１Ｎから得られた入
力映像データと、自端末装置１１Ａ〜１１Ｎにおいて生
成した伝送対象の伝送映像データとを合成した後、所定
の符号化方式によって符号化処理し、かつ第１の識別番
号を付加して得られた第２の映像符号化データＤ３₄が
与えられると共に、制御部１５から所定の制御データＤ
４が与えられており、これによりマルチプレクサ４１
は、この第２の映像符号化データＤ３₄と、制御データ
Ｄ４と、第２の付加器５０から与えられた第２の音声符
号化データＤ２₄とを時分割多重化することにより多重
化データＤ５を生成する。In this case, the video processing unit supplies the multiplexer 41 with second encoded video data D3₄, which is obtained by combining the input video data obtained from the next-stage terminal device 11A to 11N with the transmission video data generated in the own terminal device 11A to 11N, encoding the result using a predetermined encoding method, and adding the first identification number. The control unit 15 also supplies predetermined control data D4. The multiplexer 41 generates multiplexed data D5 by time-division multiplexing the second encoded video data D3₄, the control data D4, and the second encoded audio data D2₄ supplied from the second appending unit 50.

【００５１】またマルチプレクサ４１は、このとき第２
の音声符号化データＤ２₄及び第２の映像符号化データ
Ｄ３₄に付加されている第１の識別番号を検出し、当該
検出結果に基づいて対応する多重化データＤ５をこれに
含まれる第１の識別番号を取り除いた後、第１のインタ
ーフェイス回路１８及び公衆回線網１２を順次介して対
応する伝送先の前段の端末装置１１Ａ〜１１Ｎに伝送す
る。At this time, the multiplexer 41 also detects the first identification number added to the second encoded audio data D2₄ and the second encoded video data D3₄. Based on the detection result, it removes the first identification number contained in the corresponding multiplexed data D5 and then transmits that data via the first interface circuit 18 and the public network 12 in sequence to the corresponding destination, namely the preceding-stage terminal device 11A to 11N.

【００５２】また第３の加算器２９は、第１の音声処理
部２０から与えられた入力音声データＤ１₁と、第２の
音声処理部２１から与えられた入力音声データＤ１₄と
を加算処理して合成し、得られた音声データ（以下、こ
れを放音用音声データと呼ぶ）Ｄ１₆をエコーキャンセ
ラ３２を介して音響結合によるエコーやハウリングを抑
止した後、ディジタル／アナログ変換器５１を介して音
声信号Ｓ２に変換してスピーカ３３に送出する。The third adder 29 adds together, and thereby combines, the input audio data D1₁ supplied from the first audio processing unit 20 and the input audio data D1₄ supplied from the second audio processing unit 21. The resulting audio data D1₆ (hereinafter referred to as playback audio data) passes through the echo canceller 32, which suppresses echo and howling caused by acoustic coupling, and is then converted into an audio signal S2 by the digital/analog converter 51 and sent to the speaker 33.

【００５３】これにより各端末装置１１Ａ〜１１Ｎにお
いては、スピーカ３３からこの音声信号Ｓ２に基づく、
自端末装置１１Ａ〜１１Ｎを除く他の各端末装置１１Ａ
〜１１Ｎにおいて得られた音声を放音させることができ
るようになされている。In this way, each of the terminal devices 11A to 11N can emit from the speaker 33, based on this audio signal S2, the sound obtained at every terminal device 11A to 11N other than itself.

【００５４】因みに一端の端末装置１１Ａにおいては、
前段に端末装置が接続されていないことから、第２の音
声処理部２１から得られた入力音声データＤ１₄を第３
の加算器２９を介して放音用音声データＤ１₆としてエ
コーキャンセラ３２に送出する。Incidentally, because no terminal device is connected at the preceding stage of the terminal device 11A at one end, that device sends the input audio data D1₄ obtained from the second audio processing unit 21 through the third adder 29 to the echo canceller 32 as the playback audio data D1₆.

【００５５】また他端の端末装置１１Ｎにおいては、次
段に端末装置が接続されていないことから、第１の音声
処理部２０から得られた入力音声データＤ１₁を第３の
加算器２９を介して放音用音声データＤ１₆としてエコ
ーキャンセラ３２に送出する。Likewise, because no terminal device is connected at the next stage of the terminal device 11N at the other end, that device sends the input audio data D1₁ obtained from the first audio processing unit 20 through the third adder 29 to the echo canceller 32 as the playback audio data D1₆.

【００５６】このようにして各端末装置１１Ａ〜１１Ｎ
においては、伝送対象の伝送音声データＤ１₂を順次次
段及び前段に合成しながら伝送すると共に、この際自端
末装置１１Ａ〜１１Ｎを除く他の端末装置１１Ａ〜１１
Ｎにおいて得られた音声を聞くことができるようになさ
れている。In this way, each of the terminal devices 11A to 11N passes the transmission audio data D1₂ on to the next and preceding stages, combining it stage by stage, while letting its user hear the sound obtained at the other terminal devices 11A to 11N.
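The chained mixing described above can be checked with a small model. The sketch below is an illustration under stated assumptions: each terminal contributes a single integer sample, the codecs are treated as lossless, and rate conversion is ignored. The function name `chain_playback` is hypothetical.

```python
def chain_playback(own):
    """Return what each terminal in a daisy chain plays back.

    own[i] is the sample produced at terminal i (terminal 0 is one end of
    the chain). Codecs are treated as lossless, an assumption made only to
    keep the sketch short.
    """
    n = len(own)
    # sent_to_next[i]: D1_3 at terminal i = audio from the preceding stage + own audio
    sent_to_next = []
    running = 0
    for sample in own:
        running += sample
        sent_to_next.append(running)
    # sent_to_prev[i]: D1_5 at terminal i = audio from the next stage + own audio
    sent_to_prev = [0] * n
    running = 0
    for i in reversed(range(n)):
        running += own[i]
        sent_to_prev[i] = running
    # Playback D1_6 = D1_1 (from the preceding stage) + D1_4 (from the next stage)
    heard = []
    for i in range(n):
        from_prev = sent_to_next[i - 1] if i > 0 else 0
        from_next = sent_to_prev[i + 1] if i < n - 1 else 0
        heard.append(from_prev + from_next)
    return heard


heard = chain_playback([1, 2, 4, 8])
```

With four terminals contributing 1, 2, 4, and 8, each terminal hears the sum of the other three, and the end terminals fall back to a single input just as described for the terminal devices 11A and 11N.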

【００５７】因みに各端末装置１１Ａ〜１１Ｎにおいて
は、故障検査時、第１及び第２の音声処理部２０及び２
１においてセレクタ３７及び４７の接点を切り換えるこ
とにより、第１及び第２の加算器２６及び３６から与え
られた出力音声データＤ１₃及びＤ１₅に代えて、第２
のレートコンバータ２７及び４５を介して得られた入力
音声データＤ１₁及びＤ１₄を対応する第１及び第２の
符号化部３９及び４９に送出する。Incidentally, during a failure check, each of the terminal devices 11A to 11N switches the contacts of the selectors 37 and 47 in the first and second audio processing units 20 and 21. In place of the output audio data D1₃ and D1₅ supplied from the first and second adders 26 and 36, the input audio data D1₁ and D1₄ obtained via the second rate converters 27 and 45 is then sent to the corresponding first and second encoding units 39 and 49.

【００５８】そして各端末装置１１Ａ〜１１Ｎにおいて
は、このとき第１の符号化部３９から与えられた第２の
音声符号化データＤ２₂に第１の付加器４０において第
１の識別番号を付加すると共に、第２の符号化部４９か
ら与えられた第２の音声符号化データＤ２₄に第２の付
加器５０において第２の識別番号を付加する。At this time, in each of the terminal devices 11A to 11N, the first appending unit 40 adds the first identification number to the second encoded audio data D2₂ supplied from the first encoding unit 39, and the second appending unit 50 adds the second identification number to the second encoded audio data D2₄ supplied from the second encoding unit 49.

【００５９】これにより各端末装置１１Ａ〜１１Ｎにお
いては、この故障検査時、次段及び前段の端末装置１１
Ａ〜１１Ｎに多重化データＤ５として伝送した入力音声
データＤ１₁及びＤ１₄をこの次段及び前段の端末装置
１１Ａ〜１１Ｎから返送させ、かくして伝送した入力音
声データＤ１₁及びＤ１₄を返送された入力音声データ
Ｄ１₁及びＤ１₄と比較し、この比較結果に基づいて自
端末装置１１Ａ〜１１Ｎにおいて故障が発生しているか
否かを検査することができるようになされている。Thus, during this failure check, each of the terminal devices 11A to 11N has the next-stage and preceding-stage terminal devices 11A to 11N return the input audio data D1₁ and D1₄ that it transmitted to them as multiplexed data D5. It compares the transmitted input audio data D1₁ and D1₄ with the returned input audio data D1₁ and D1₄ and can determine from the comparison result whether a failure has occurred in the own terminal device 11A to 11N.

【００６０】（２）各端末装置１１Ａ〜１１Ｎの詳細構
成かかる構成に加えてこのテレビジョン会議システム１０
の場合、出力音声データＤ１₃及びＤ１₅を符号化する
際には、第１〜第４の符号化方式のうち、所望する符号
化方式をユーザインターフェイス１６を介して選択指定
し得るようになされており、制御部１５がこの選択指定
された符号化方式に基づいて第１及び第２の符号化部３
９及び４９において対応する符号化処理を実行させる。(2) Detailed Configuration of Each of the Terminal Devices 11A to 11N
In addition to the configuration described above, in this television conference system 10 the encoding method used to encode the output audio data D1₃ and D1₅ can be selected and designated from among the first to fourth encoding methods via the user interface 16. The control unit 15 causes the first and second encoding units 39 and 49 to execute the corresponding encoding process based on the designated encoding method.

【００６１】また制御部１５は、指定された符号化方式
を表す情報を多重化データＤ５に含まれる制御データＤ
４として次段及び前段の端末装置１１Ａ〜１１Ｎに伝送
し、これにより次段及び前段の端末装置１１Ａ〜１１Ｎ
において制御部１５は、デマルチプレクサ１９から与え
られたこの制御データＤ４に基づいて第１及び第２の復
号化部２４及び４３に対応する復号化処理を実行させ
る。The control unit 15 also transmits information indicating the designated encoding method to the next-stage and preceding-stage terminal devices 11A to 11N as the control data D4 contained in the multiplexed data D5. In those terminal devices, the control unit 15 causes the first and second decoding units 24 and 43 to execute the corresponding decoding process based on the control data D4 supplied from the demultiplexer 19.

【００６２】実際に第１及び第２の符号化部３９及び４
９は、図３に示すように、第１〜第４の符号化方式に基
づいて符号化処理し得る第１〜第４の符号化器５５〜５
８を有し、対応する第５のレートコンバータ３８及び４
８を介して与えられた出力音声データＤ１₃及びＤ１₅
をセレクタ５９を介して、選択された第１〜第４の符号
化方式に対応する第１〜第４の符号化器５５〜５８に送
出する。In practice, as shown in FIG. 3, the first and second encoding units 39 and 49 each have first to fourth encoders 55 to 58 that can encode according to the first to fourth encoding methods. They send the output audio data D1₃ and D1₅ supplied via the corresponding fifth rate converters 38 and 48 through a selector 59 to whichever of the first to fourth encoders 55 to 58 corresponds to the selected encoding method.

【００６３】そして第１及び第２の符号化部３９及び４
９は、対応する第１〜第４の符号化器５５〜５８におい
てこの出力音声データＤ１₃及びＤ１₅を符号化処理
し、得られた第２の音声符号化データＤ２₂及びＤ２₄
をセレクタ６０を介して対応する第１及び第２の付加器
４０及び５０に送出する。The first and second encoding units 39 and 49 then encode the output audio data D1₃ and D1₅ in the corresponding one of the first to fourth encoders 55 to 58 and send the resulting second encoded audio data D2₂ and D2₄ through a selector 60 to the corresponding first and second appending units 40 and 50.

【００６４】また第１及び第２の復号化部２４及び４３
は、図４に示すように、第１〜第４の符号化方式に対応
する第１〜第４の復号化方式によって復号化処理し得る
第１〜第４の復号化器６１〜６４を有し、対応する第１
及び第２の認識器２２及び２３から与えられた第１の音
声符号化データＤ２₁及びＤ２₃をセレクタ６５を介し
て、選択された第１〜第４の符号化方式に対応する第１
〜第４の復号化器６１〜６４に送出する。As shown in FIG. 4, the first and second decoding units 24 and 43 each have first to fourth decoders 61 to 64 that can decode according to first to fourth decoding methods corresponding to the first to fourth encoding methods. They send the first encoded audio data D2₁ and D2₃ supplied from the corresponding first and second recognizers 22 and 23 through a selector 65 to whichever of the first to fourth decoders 61 to 64 corresponds to the selected encoding method.

【００６５】そして第１及び第２の復号化部２４及び４
３は、対応する第１〜第４の復号化器６１〜６４におい
てこの第１の音声符号化データＤ２₁及びＤ２₃を復号
化処理し、得られた入力音声データＤ１₁及びＤ１₄を
セレクタ６６を介して第１及び第２のレートコンバータ
２５、４４及び２７、４５に送出する。The first and second decoding units 24 and 43 then decode the first encoded audio data D2₁ and D2₃ in the corresponding one of the first to fourth decoders 61 to 64 and send the resulting input audio data D1₁ and D1₄ through a selector 66 to the first and second rate converters 25, 44 and 27, 45.

【００６６】ここで実際上第３の符号化器５７は、第３
の符号化方式が選択されることによりセレクタ５９を介
して出力音声データＤ１₃及びＤ１₅が与えられると、
この出力音声データＤ１₃及びＤ１₅を順次５個の連続
するサンプルブロックに分割すると共に、当該分割して
得られたサンプルブロック（以下、これを入力信号ブロ
ックと呼ぶ）にそれぞれ対応させて、予め記憶した例え
ば1024個のコードブックベクトルの候補に対して利得を
調整した後合成フィルタを通してフィルタリング処理を
施すことにより1024個の量子化ベクトルを生成する。Here, in practice, when the third encoding method is selected and the output audio data D1₃ and D1₅ is supplied via the selector 59, the third encoder 57 divides the output audio data D1₃ and D1₅ into successive blocks of five consecutive samples. For each block thus obtained (hereinafter referred to as an input signal block), it adjusts the gain of each of, for example, 1024 pre-stored codebook vector candidates and passes them through a synthesis filter, thereby generating 1024 quantized vectors.

【００６７】そして第３の符号化器５７は、1024個の量
子化ベクトルのうち、対応する入力信号ブロックと共に
周波数の重み付け処理を施して自乗平均誤差が最小とな
るものを選定し、当該選定した量子化ベクトルに対応す
るコードブックベクトルを表す例えば10ビットのコード
ブックインデックスを第２の音声符号化データＤ２₂及
びＤ２₄としてセレクタ６０に送出する。From the 1024 quantized vectors, the third encoder 57 then selects the one whose mean squared error with respect to the corresponding input signal block is smallest after frequency weighting has been applied to both. It sends, for example, a 10-bit codebook index representing the codebook vector that corresponds to the selected quantized vector to the selector 60 as the second encoded audio data D2₂ and D2₄.
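The search over codebook candidates can be sketched as below. This is a simplified illustration, not the encoder of the embodiment: it assumes a three-entry codebook instead of 1024 entries, a fixed first-order synthesis filter instead of a backward-adapted one, and plain mean squared error with no frequency weighting. The names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, lpc, history):
    """All-pole filter: y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    past = list(history)  # most recent output sample is last
    out = []
    for x in excitation:
        y = x - sum(a * past[-1 - k] for k, a in enumerate(lpc) if k < len(past))
        out.append(y)
        past.append(y)
    return out


def search_codebook(block, codebook, gain, lpc, history):
    """Return (index, error) of the candidate closest to the input block."""
    best_index, best_error = None, float("inf")
    for index, vector in enumerate(codebook):
        candidate = synthesize([gain * v for v in vector], lpc, history)
        error = sum((b - c) ** 2 for b, c in zip(block, candidate)) / len(block)
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Toy codebook of five-sample vectors (assumed values).
codebook = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 1, 1, 1]]
gain, lpc, history = 2.0, [-0.5], [0.0]
block = synthesize([gain * v for v in codebook[1]], lpc, history)
index, error = search_codebook(block, codebook, gain, lpc, history)
```

Only the winning index is transmitted. With 1024 candidates that index fits in 10 bits, which is where the 10-bit codebook index comes from.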

【００６８】因みに第３の符号化器５７は、上述した利
得及び合成フィルタの係数を対応する過去の利得調整さ
れたコードブックベクトル及び量子化ベクトルに基づく
バックワード適用により周期的に更新する。Incidentally, the third encoder 57 periodically updates the gain and the synthesis filter coefficients described above by backward adaptation based on the corresponding past gain-adjusted codebook vectors and quantized vectors.

【００６９】また第３の復号化器６３は、図５に示すよ
うに、５個のサンプルブロック単位で復号処理するよう
になされており、セレクタ６５を介して与えられた第１
の音声符号化データＤ２₁及びＤ２₃（すなわち、第２
の音声符号化データＤ２₂及びＤ２₄）を励振ベクトル
コードブック６８に取り込む。As shown in FIG. 5, the third decoder 63 decodes in units of five-sample blocks and takes the first encoded audio data D2₁ and D2₃ supplied via the selector 65 (that is, the second encoded audio data D2₂ and D2₄) into an excitation vector codebook 68.

【００７０】この場合励振ベクトルコードブック６８
は、第３の符号化器５７と同様に1024個のコードブック
ベクトルを予め記憶しており、第１の音声符号化データ
Ｄ２₁及びＤ２₃（すなわちインデックス情報）に基づ
いて、1024個のコードブックベクトルから対応するコー
ドブックベクトルを抽出し、これをコードブックベクト
ルデータＤ８として利得制御回路６９を介して合成フィ
ルタ７０に送出する。In this case, the excitation vector codebook 68 stores in advance the same 1024 codebook vectors as the third encoder 57. Based on the first encoded audio data D2₁ and D2₃ (that is, the index information), it extracts the corresponding codebook vector from the 1024 codebook vectors and sends it as codebook vector data D8 to a synthesis filter 70 via a gain control circuit 69.

【００７１】合成フィルタ７０は、フィードバックのパ
スに50次の線形予測符号化（LPC:Linear Predictive Co
ding）予測器を有する50次の全極フィルタによって構成
されており、利得制御回路６９を介して与えられたコー
ドブックベクトルデータＤ８に基づいて元の５個のサン
プルブロック単位の入力音声データＤ１₁及びＤ１₄を
生成し、これをポストフィルタ７１を通した後、セレク
タ６６を介して対応する第２のレートコンバータ２７、
４５に送出すると共に、当該ポストフィルタ７１を通さ
ずに、セレクタ６６を介して第１のレートコンバータ２
５、４４に送出する。The synthesis filter 70 is a 50th-order all-pole filter with a 50th-order linear predictive coding (LPC) predictor in its feedback path. Based on the codebook vector data D8 supplied via the gain control circuit 69, it reconstructs the original input audio data D1₁ and D1₄ in units of five-sample blocks. This data is passed through a post filter 71 and then sent via the selector 66 to the corresponding second rate converters 27 and 45. It is also sent via the selector 66 to the first rate converters 25 and 44 without passing through the post filter 71.
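The two output paths of the third decoder 63 can be sketched as follows. This is an illustration under stated assumptions: a first-order synthesis filter stands in for the 50th-order one, the gain is fixed rather than backward-adapted, and the post filter is passed in as a placeholder callable because its actual form is not given here. The name `decode_block` is hypothetical.

```python
def decode_block(index, codebook, gain, lpc, history, post_filter):
    """Decode one block; return (post-filtered output, unfiltered output).

    history holds the most recent synthesis-filter outputs (at least
    len(lpc) of them) and is updated in place for the next block.
    """
    excitation = [gain * v for v in codebook[index]]
    raw = []
    for x in excitation:
        # All-pole synthesis: y[n] = x[n] - sum_k lpc[k] * y[n-1-k]
        y = x - sum(a * history[-1 - k] for k, a in enumerate(lpc))
        history.append(y)
        raw.append(y)
    del history[:-len(lpc)]  # keep only the state the filter still needs
    return post_filter(raw), raw


codebook = [[1, 0, 0, 0, 0]]          # toy one-entry codebook (assumed)
history = [0.0]
halve = lambda blk: [0.5 * s for s in blk]  # stand-in for the post filter
to_speaker, to_adder = decode_block(0, codebook, 2.0, [-0.5], history, halve)
```

The first return value corresponds to the path toward the speaker, and the second to the path toward the adder, which bypasses the post filter.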

【００７２】因みにバックワード利得適応器７２は、利
得制御回路６９から利得の調整されたコードブックベク
トルデータＤ８が与えられており、以前に与えられたコ
ードブックベクトルデータＤ８の利得に基づいて対数領
域の適応線形予測を行うようにして現時点において利得
制御回路６９に与えられたコードブックベクトルデータ
Ｄ８の利得を予測し、当該予測結果に基づいてこの利得
制御回路６９を利得を制御する。Incidentally, a backward gain adapter 72 receives the gain-adjusted codebook vector data D8 from the gain control circuit 69. It predicts the gain of the codebook vector data D8 currently supplied to the gain control circuit 69 by performing adaptive linear prediction in the logarithmic domain on the gains of previously supplied codebook vector data D8, and it controls the gain of the gain control circuit 69 based on the prediction result.

【００７３】またバックワード合成フィルタ適応器７３
は、合成フィルタ７０から入力音声データＤ１₁及びＤ
１₄が与えられており、この入力音声データＤ１₁及び
Ｄ１₄に基づいて合成フィルタ７０のフィルタ係数を制
御する。A backward synthesis filter adapter 73 receives the input audio data D1₁ and D1₄ from the synthesis filter 70 and controls the filter coefficients of the synthesis filter 70 based on this input audio data D1₁ and D1₄.

【００７４】さらに第３の復号化器６３は、復号処理時
に得られる所定の情報に基づいてポストフィルタ７１の
フィルタ係数を周期的に更新するようになされている。Further, the third decoder 63 periodically updates the filter coefficients of the post filter 71 based on predetermined information obtained during decoding.

【００７５】このようにして第１及び第２の復号化部２
４及び４３は、第３の符号化方式が選択指定された場
合、第３の復号化器６３において合成フィルタ７０から
得られた入力音声データＤ１₁及びＤ１₄をポストフィ
ルタ７１を通してフィルタリング処理を施すことにより
聴感的な雑音を除去してスピーカ３３に送出すると共
に、当該合成フィルタ７０から得られた入力音声データ
Ｄ１₁及びＤ１₄をポストフィルタ７１を通さずにフィ
ルタリング処理を施さないようにして、次段及び前段の
端末装置１１Ａ〜１１Ｎに伝送するために対応する第１
及び第２の加算器２６及び３６に送出する。In this way, when the third encoding method is selected, the first and second decoding units 24 and 43 pass the input audio data D1₁ and D1₄ obtained from the synthesis filter 70 in the third decoder 63 through the post filter 71, which removes perceptual noise, and send the filtered data toward the speaker 33. At the same time, they send the input audio data D1₁ and D1₄ obtained from the synthesis filter 70 to the corresponding first and second adders 26 and 36 without passing it through the post filter 71, so that the data to be transmitted to the next-stage and preceding-stage terminal devices 11A to 11N receives no filtering.

【００７６】ところで第１〜第３の符号化方式において
は、８〔kHz 〕の標本化周波数によって生成されたデー
タを符号化処理するように規定されると共に、第４の符
号化方式においては、16〔kHz 〕の標本化周波数によっ
て生成されたデータを符号化処理するように規定されて
いる。The first to third encoding methods are specified to encode data generated at a sampling frequency of 8 [kHz], whereas the fourth encoding method is specified to encode data generated at a sampling frequency of 16 [kHz].

【００７７】また各端末装置１１Ａ〜１１Ｎにおいて、
アナログ／ディジタル変換器３１は例えば16〔kHz 〕の
標本化周波数でアナログ／ディジタル変換処理を実行す
ると共に、ディジタル／アナログ変換器５１は例えば16
〔kHz 〕の標本化周波数に基づいてディジタル／アナロ
グ変換処理を実行する。In each of the terminal devices 11A to 11N, the analog/digital converter 31 performs analog/digital conversion at a sampling frequency of, for example, 16 [kHz], and the digital/analog converter 51 performs digital/analog conversion based on a sampling frequency of, for example, 16 [kHz].

【００７８】従って各端末装置１１Ａ〜１１Ｎにおいて
は、第１〜第５のレートコンバータ２５、２７、２８、
３４、３５、３８、４４、４５、４６、４８を選択的に
用い、必要に応じて入力音声データＤ１₁、Ｄ１₄及び
伝送音声データＤ１₂並びに出力音声データＤ１₃、Ｄ
１₅の標本化周波数を16〔kHz 〕から８〔kHz 〕に変換
処理（以下、これを間引き処理と呼ぶ）すると共に、当
該入力音声データＤ１₁、Ｄ１₄及び伝送音声データＤ
１₂並びに出力音声データＤ１₃、Ｄ１₅の標本化周波
数を８〔kHz 〕から16〔kHz 〕に変換処理（以下、これ
を補間処理と呼ぶ）する。Accordingly, each of the terminal devices 11A to 11N selectively uses the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 to convert, as necessary, the sampling frequency of the input audio data D1₁ and D1₄, the transmission audio data D1₂, and the output audio data D1₃ and D1₅ from 16 [kHz] to 8 [kHz] (hereinafter referred to as decimation) or from 8 [kHz] to 16 [kHz] (hereinafter referred to as interpolation).

【００７９】この場合第１〜第５のレートコンバータ２
５、２７、２８、３４、３５、３８、４４、４５、４
６、４８は、内部に16〔kHz 〕の標本化周波数でアナロ
グ／ディジタル変換処理されたデータの 4.0〔kHz 〕程
度以上の周波数成分を減衰させる低域通過フィルタが設
けられており、間引き処理時、16〔kHz 〕の標本化周波
数の入力音声データＤ１₁、Ｄ１₄及び伝送音声データ
Ｄ１₂並びに出力音声データＤ１₃、Ｄ１₅が与えられ
ると、これを低域通過フィルタを通した後、標本値を順
次１つおきに間引き、かくして入力音声データＤ１₁、
Ｄ１₄及び伝送音声データＤ１₂並びに出力音声データ
Ｄ１₃、Ｄ１₅の標本化周波数を16〔kHz〕から８〔kHz
〕に間引き処理する。In this case, each of the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 contains a low-pass filter that attenuates frequency components at and above approximately 4.0 [kHz] in data that has been analog/digital converted at a sampling frequency of 16 [kHz]. During decimation, when input audio data D1₁ or D1₄, transmission audio data D1₂, or output audio data D1₃ or D1₅ with a sampling frequency of 16 [kHz] is supplied, the rate converter passes it through the low-pass filter and then discards every other sample, thereby converting the sampling frequency from 16 [kHz] to 8 [kHz].

【００８０】また第１〜第５のレートコンバータ２５、
２７、２８、３４、３５、３８、４４、４５、４６、４
８は、補間処理時、８〔kHz 〕の標本化周波数の入力音
声データＤ１₁、Ｄ１₄及び伝送音声データＤ１₂並び
に出力音声データＤ１₃、Ｄ１₅が与えられると、これ
を順次隣り合う標本値間に０振幅の標本値を挿入した
後、低域通過フィルタに通し、この結果得られた入力音
声データＤ１₁、Ｄ１₄及び伝送音声データＤ１₂並び
に出力音声データＤ１₃、Ｄ１₅の振幅を必要に応じて
２倍にし、かくして入力音声データＤ１₁、Ｄ１₄及び
伝送音声データＤ１₂並びに出力音声データＤ１₃、Ｄ
１₅の標本化周波数を８〔kHz 〕から16〔kHz 〕に補間
処理する。During interpolation, when input audio data D1₁ or D1₄, transmission audio data D1₂, or output audio data D1₃ or D1₅ with a sampling frequency of 8 [kHz] is supplied, each of the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 inserts a zero-amplitude sample between each pair of adjacent samples, passes the result through the low-pass filter, and doubles the amplitude of the resulting data as necessary, thereby converting the sampling frequency from 8 [kHz] to 16 [kHz].
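The decimation and interpolation steps can be sketched with a simple FIR low-pass filter. The filter design is an assumption: the description only calls for a low-pass filter attenuating components from about 4.0 [kHz], so a 31-tap Hamming-windowed sinc is used here purely for illustration, and all function names are hypothetical.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hamming-windowed sinc low-pass; cutoff is in cycles per sample.

    0.25 cycles/sample corresponds to 4 kHz at a 16 kHz sampling rate.
    num_taps is assumed to be odd.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]  # normalize to unity gain at DC


def fir(signal, taps):
    """Causal FIR filter with zero initial state."""
    return [sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]


def decimate(signal_16k, taps):
    """16 kHz -> 8 kHz: low-pass, then keep every other sample."""
    return fir(signal_16k, taps)[::2]


def interpolate(signal_8k, taps):
    """8 kHz -> 16 kHz: insert zeros, low-pass, then double the amplitude."""
    stuffed = []
    for s in signal_8k:
        stuffed.extend([s, 0.0])
    return [2.0 * y for y in fir(stuffed, taps)]


TAPS = lowpass_taps()
```

Doubling the amplitude after interpolation compensates for the inserted zeros, which halve the average level of the signal before filtering.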

【００８１】因みに第１〜第５のレートコンバータ２
５、２７、２８、３４、３５、３８、４４、４５、４
６、４８は、音声データＤ１₁〜Ｄ１₅の標本化周波数
を間引き処理及び補間処理する必要がない場合には、対
応する入力音声データＤ１₁、Ｄ１₄及び伝送音声デー
タＤ１₂並びに出力音声データＤ１₃、Ｄ１₅をそのま
ま後段に送出する。Incidentally, when the sampling frequency of the audio data D1₁ to D1₅ needs neither decimation nor interpolation, each of the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 passes the corresponding input audio data D1₁ or D1₄, transmission audio data D1₂, or output audio data D1₃ or D1₅ to the subsequent stage unchanged.

【００８２】実際上図６に示すように、第１及び第２の
符号化部３９及び４９において出力音声データＤ１₃及
びＤ１₅を第１〜第３の符号化方式によって符号化処理
し、また第１及び第２の復号化部２４及び４３において
第１の音声符号化データＤ２₁及びＤ２₃を第１〜第３
の復号化方式によって復号化処理する場合には、第１〜
第５のレートコンバータ２５、２７、２８、３４、３
５、３８、４４、４５、４６、４８のうち、第３及び第
４のレートコンバータ２８、４６及び３４、３５のみを
動作させ、エコーキャンセラ３２を介して得られた伝送
音声データＤ１₂の標本化周波数を対応する第４のレー
トコンバータ３４及び３５を介して間引き処理し、第１
及び第２の復号化部２４及び４３を介して得られた入力
音声データＤ１₁及びＤ１₄の標本化周波数を対応する
第３のレートコンバータ２８及び４６を介して補間処理
する。In practice, as shown in FIG. 6, when the first and second encoding units 39 and 49 encode the output audio data D1₃ and D1₅ using any of the first to third encoding methods and the first and second decoding units 24 and 43 decode the first encoded audio data D2₁ and D2₃ using any of the first to third decoding methods, only the third rate converters 28 and 46 and the fourth rate converters 34 and 35 among the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 are operated. The transmission audio data D1₂ obtained via the echo canceller 32 is decimated by the corresponding fourth rate converters 34 and 35, and the input audio data D1₁ and D1₄ obtained via the first and second decoding units 24 and 43 is interpolated by the corresponding third rate converters 28 and 46.

【００８３】また図７に示すように、第１及び第２の符
号化部３９及び４９において出力音声データＤ１₃及び
Ｄ１₅を第１〜第３の符号化方式によって符号化処理
し、また第１及び第２の復号化部２４及び４３において
第１の音声符号化データＤ２₁及びＤ２₃を第４の復号
化方式によって復号化処理する場合には、第１〜第５の
レートコンバータ２５、２７、２８、３４、３５、３
８、４４、４５、４６、４８のうち、第５のレートコン
バータ３８及び４８のみを動作させ、第１及び第２の加
算器２６及び３６を介して得られた出力音声データＤ１
₃及びＤ１₅の標本化周波数を対応する第５のレートコ
ンバータ３８及び４８を介して間引き処理する。As shown in FIG. 7, when the first and second encoding units 39 and 49 encode the output audio data D1₃ and D1₅ using any of the first to third encoding methods and the first and second decoding units 24 and 43 decode the first encoded audio data D2₁ and D2₃ using the fourth decoding method, only the fifth rate converters 38 and 48 among the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 are operated. The output audio data D1₃ and D1₅ obtained via the first and second adders 26 and 36 is decimated by the corresponding fifth rate converters 38 and 48.

【００８４】さらに図８に示すように、第１及び第２の
符号化部３９及び４９において出力音声データＤ１₃及
びＤ１₅を第４の符号化方式によって符号化処理し、ま
た第１及び第２の復号化部２４及び４３において第１の
音声符号化データＤ２₁及びＤ２₃を第１〜第３の復号
化方式によって復号化処理する場合には、第１〜第５の
レートコンバータ２５、２７、２８、３４、３５、３
８、４４、４５、４６、４８のうち、第１及び第２のレ
ートコンバータ２５、４４及び２７、４５のみを動作さ
せ、第１及び第２の復号化部２４及び４３を介して得ら
れた入力音声データＤ１₁及びＤ１₄の標本化周波数を
対応する第１及び第２のレートコンバータ２５、４４及
び２７、４５を介して補間処理する。Further, as shown in FIG. 8, when the first and second encoding units 39 and 49 encode the output audio data D1₃ and D1₅ using the fourth encoding method and the first and second decoding units 24 and 43 decode the first encoded audio data D2₁ and D2₃ using any of the first to third decoding methods, only the first rate converters 25 and 44 and the second rate converters 27 and 45 among the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 are operated. The input audio data D1₁ and D1₄ obtained via the first and second decoding units 24 and 43 is interpolated by the corresponding first and second rate converters 25, 44 and 27, 45.

【００８５】さらに図９に示すように、第１及び第２の
符号化部３９及び４９において出力音声データＤ１₃及
びＤ１₅を第４の符号化方式によって符号化処理し、ま
た第１及び第２の復号化部２４及び４３において第１の
音声符号化データＤ２₁及びＤ２₃を第４の復号化方式
によって復号化処理する場合には、全ての第１〜第５の
レートコンバータ２５、２７、２８、３４、３５、３
８、４４、４５、４６、４８の動作を停止させ、入力音
声データＤ１₁、Ｄ１₄及び伝送音声データＤ１₂並び
に出力音声データＤ１₃、Ｄ１₅の標本化周波数に対し
て補間処理及び間引き処理を実行しないようにする。Further, as shown in FIG. 9, when the first and second encoding units 39 and 49 encode the output audio data D1₃ and D1₅ using the fourth encoding method and the first and second decoding units 24 and 43 decode the first encoded audio data D2₁ and D2₃ using the fourth decoding method, all of the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 are stopped, so that neither interpolation nor decimation is applied to the input audio data D1₁ and D1₄, the transmission audio data D1₂, or the output audio data D1₃ and D1₅.

【００８６】このようにして各端末装置１１Ａ〜１１Ｎ
においては、第１及び第２の音声処理部２０及び２１に
おいてそれぞれ最大でも２つの第１〜第５のレートコン
バータ２５、２７、２８、３４、３５、３８、４４、４
５、４６、４８を使用して補間処理及び間引き処理を実
行する。In this way, in each of the terminal devices 11A to 11N, each of the first and second audio processing units 20 and 21 uses at most two of its first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46, and 48 to perform interpolation and decimation.
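The four cases of FIGS. 6 to 9 amount to a lookup keyed by the sampling rates of the selected encoding and decoding methods. The table below is a sketch of that mapping; the dictionary layout and the function names are assumptions made for illustration.

```python
def sampling_rate_khz(method):
    """Methods 1 to 3 operate on 8 kHz data, method 4 on 16 kHz data."""
    if method not in (1, 2, 3, 4):
        raise ValueError("method must be 1, 2, 3 or 4")
    return 16 if method == 4 else 8


# (encoder rate, decoder rate) -> rate converters that operate, and what they do
ACTIVE = {
    (8, 8): {"third": "interpolate", "fourth": "decimate"},      # FIG. 6
    (8, 16): {"fifth": "decimate"},                              # FIG. 7
    (16, 8): {"first": "interpolate", "second": "interpolate"},  # FIG. 8
    (16, 16): {},                                                # FIG. 9
}


def active_converters(encode_method, decode_method):
    """Return the rate converters that operate for a pair of methods."""
    key = (sampling_rate_khz(encode_method), sampling_rate_khz(decode_method))
    return ACTIVE[key]
```

No entry lists more than two converters, which matches the statement that each audio processing unit uses at most two of them.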

【００８７】なおこの実施の形態の場合、各端末装置１
１Ａ〜１１Ｎにおいては、第１及び第２の符号化部３９
及び４９の出力段と、第１及び第２の復号化部２４及び
４３の入力段とのデータ送信路が16ビットの幅を有し、
出力音声データＤ１₃及びＤ１₅を第１及び第２並びに
第４の符号化方式によって符号化処理すると、標本化周
波数が８〔kHz 〕でなり順次８ビット単位で送信し得る
（64〔kbps〕程度の伝送レートを有する）第２の音声符
号化データＤ２₂及びＤ２₄を生成し得ることから、当
該データ送信路の８ビットの空きを利用してこの第２の
音声符号化データＤ２₂及びＤ２₄に第１及び第２の識
別番号（例えば２ビットでなる）を付加する。In this embodiment, in each of the terminal devices 11A to 11N, the data path between the output stage of the first and second encoding units 39 and 49 and the input stage of the first and second decoding units 24 and 43 is 16 bits wide. When the output audio data D1₃ and D1₅ is encoded using the first, second, or fourth encoding method, the resulting second encoded audio data D2₂ and D2₄ has a sampling frequency of 8 [kHz] and can be sent in successive 8-bit units (a transmission rate of about 64 [kbps]). The 8 unused bits of the data path are therefore used to add the first and second identification numbers (consisting of, for example, 2 bits) to the second encoded audio data D2₂ and D2₄.

【００８８】また出力音声データＤ１₃及びＤ１₅を第
３の符号化方式によって符号化処理した場合には、通
常、標本化周波数が 1.6〔kHz 〕でなり順次10ビット単
位で送信し得る（16〔kbps〕程度の伝送レートを有す
る）第２の音声符号化データＤ２₂及びＤ２₄を生成す
るものの、この第２の音声符号化データＤ２₂及びＤ２₄
を標本化周波数を８〔ｋＨｚ〕にし、順次２ビット
単位で送信する（16〔kbps〕程度の伝送レートを有す
る）ことにより、データ送信路の14ビットの空きを利用
してこの第２の音声符号化データＤ２₂及びＤ２₄に第
１及び第２の識別番号を付加する。When the output audio data D1₃ and D1₅ is encoded using the third encoding method, the result would normally be second encoded audio data D2₂ and D2₄ with a sampling frequency of 1.6 [kHz] that can be sent in successive 10-bit units (a transmission rate of about 16 [kbps]). Instead, the second encoded audio data D2₂ and D2₄ is given a sampling frequency of 8 [kHz] and sent in successive 2-bit units (again a transmission rate of about 16 [kbps]), and the 14 unused bits of the data path are used to add the first and second identification numbers to the second encoded audio data D2₂ and D2₄.

【００８９】これにより各端末装置１１Ａ〜１１Ｎにお
いては、第２の音声符号化データＤ２₂及びＤ２₄を最
大でも10ビット単位（第１及び第２の識別番号を含む）
で順次伝送することから装置内部の処理負荷を低減させ
ることができるようになされている。As a result, each of the terminal devices 11A to 11N transmits the second encoded audio data D2₂ and D2₄ in units of at most 10 bits (including the first and second identification numbers), which reduces the processing load inside the device.
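The use of spare bits on the 16-bit data path can be sketched as simple bit packing. The bit layout below (code in the low bits, identification number immediately above it) is an assumption chosen for illustration, as are the function names. The description states only that the unused bits carry an identification number of, for example, 2 bits.

```python
def pack_word(code, code_bits, ident, ident_bits=2):
    """Place `code` in the low bits of a 16-bit word and `ident` just above it."""
    if code_bits + ident_bits > 16:
        raise ValueError("does not fit in a 16-bit word")
    if not 0 <= code < (1 << code_bits) or not 0 <= ident < (1 << ident_bits):
        raise ValueError("value out of range")
    return (ident << code_bits) | code


def unpack_word(word, code_bits, ident_bits=2):
    """Split a packed word back into (code, ident)."""
    code = word & ((1 << code_bits) - 1)
    ident = (word >> code_bits) & ((1 << ident_bits) - 1)
    return code, ident


def bit_rate_kbps(units_khz, bits_per_unit):
    """Transmission rate in kbps for units sent at units_khz thousand per second."""
    return units_khz * bits_per_unit
```

For example, an 8-bit code with a 2-bit identification number occupies 10 bits of the word, and the 2-bit units of the third encoding method occupy 4 bits. In both cases the rate of the audio code itself is unchanged, at 8 x 8 = 64 [kbps] and 8 x 2 = 16 [kbps] respectively.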

【００９０】因みに各端末装置１１Ａ〜１１Ｎにおいて
は、１つの半導体チップに第１及び第２の音声処理部２
０及び２１が一括して作り込まれていることから、これ
ら各端末装置１１Ａ〜１１Ｎの製造時、回路基板への各
種回路素子の実装を簡易化し得るようになされている。Incidentally, in each of the terminal devices 11A to 11N, the first and second audio processing units 20 and 21 are fabricated together on a single semiconductor chip, which simplifies the mounting of the various circuit elements on the circuit board when the terminal devices 11A to 11N are manufactured.

【００９１】（３）本実施の形態の動作及び効果以上の構成において、このテレビジョン会議システム１
０では、各端末装置１１Ａ〜１１Ｎにおいて第１の音声
処理部２０における符号化方式として第３の符号化方式
が選択された場合、第１の加算器２６を介して得られた
出力音声データＤ１₃を第１の符号化部３９を介して第
３の符号化方式によって符号化処理し、得られた第２の
音声符号化データＤ２₂を対応する第２の映像符号化デ
ータＤ３₂及びこの第３の符号化方式が選択されたこと
を表す制御データＤ４と共に時分割多重化し、得られた
多重化データＤ５を公衆回線網１２を介して次段の端末
装置１１Ａ〜１１Ｎに伝送する。(3) Operation and Effects of the Present Embodiment
In the above configuration, in this television conference system 10, when the third encoding method is selected as the encoding method in the first audio processing unit 20 of each of the terminal devices 11A to 11N, the output audio data D1₃ obtained via the first adder 26 is encoded by the first encoding unit 39 using the third encoding method. The resulting second encoded audio data D2₂ is time-division multiplexed together with the corresponding second encoded video data D3₂ and the control data D4 indicating that the third encoding method has been selected, and the resulting multiplexed data D5 is transmitted to the next-stage terminal device 11A to 11N via the public network 12.

【００９２】そしてこの次段の端末装置１１Ａ〜１１Ｎ
では、この多重化データＤ５を第１のインターフェイス
回路１８を介してデマルチプレクサ１９に取り込み、当
該デマルチプレクサ１９において多重化データＤ５を第
１の音声符号化データＤ２₁、第１の映像符号化データ
Ｄ３₁及び制御データＤ４に分離し、当該第１の音声符
号化データＤ２₁を第１の復号化部２４に送出すると共
に、第１の映像符号化データＤ３₁を映像処理部に送出
し、また制御データＤ４を制御部１５に送出する。The next-stage terminal device 11A to 11N takes this multiplexed data D5 into the demultiplexer 19 via the first interface circuit 18. The demultiplexer 19 separates the multiplexed data D5 into the first encoded audio data D2₁, the first encoded video data D3₁, and the control data D4, and sends the first encoded audio data D2₁ to the first decoding unit 24, the first encoded video data D3₁ to the video processing unit, and the control data D4 to the control unit 15.

【００９３】このとき制御部１５は、制御データＤ４に
基づいて第１の復号化部２４の第３の復号化器６３を動
作させ、当該第３の復号化器６３において第１の音声符
号化データＤ２₁を合成フィルタ７０を介して第３の復
号化方式により復号化処理し、得られた入力音声データ
Ｄ１₁をポストフィルタ７１を通することによりフィル
タリング処理を施してスピーカ３３側に送出すると共
に、当該ポストフィルタ７１を通さないことによりフィ
ルタリング処理を施さずに第１の加算器２６に送出す
る。At this time, the control unit 15 operates the third decoder 63 of the first decoding unit 24 based on the control data D4. The third decoder 63 decodes the first encoded audio data D2₁ through the synthesis filter 70 using the third decoding method. The resulting input audio data D1₁ is passed through the post filter 71, and thus filtered, before being sent toward the speaker 33. It is also sent to the first adder 26 without passing through the post filter 71, and thus without filtering.

【００９４】このようにして各端末装置１１Ａ〜１１Ｎ
では、第１の加算器２６において、このポストフィルタ
７１を通さずに得られた入力音声データＤ１₁と、自端
末装置１１Ａ〜１１Ｎにおいて生成した伝送音声データ
Ｄ１₂とを加算処理して合成し、得られた出力音声デー
タＤ１₃を第１の符号化部３９を介して、選択されたい
ずれかの第１〜第４の符号化方式によって符号化処理
し、得られた第２の音声符号化データＤ２₂をマルチプ
レクサ４１を介して対応する第２の映像符号化データＤ
３₂及び制御データＤ４と共に時分割多重化した後、第
２のインターフェイス回路４２及び公衆回線網１２を順
次介して次段の端末装置１１Ａ〜１１Ｎに伝送する。In this way, in each of the terminal devices 11A to 11N, the first adder 26 adds together, and thereby combines, the input audio data D1₁ obtained without passing through the post filter 71 and the transmission audio data D1₂ generated in the own terminal device 11A to 11N. The resulting output audio data D1₃ is encoded by the first encoding unit 39 using whichever of the first to fourth encoding methods has been selected. The resulting second encoded audio data D2₂ is time-division multiplexed by the multiplexer 41 together with the corresponding second encoded video data D3₂ and the control data D4, and is then transmitted to the next-stage terminal device 11A to 11N via the second interface circuit 42 and the public network 12 in sequence.

【００９５】因みに各端末装置１１Ａ〜１１Ｎでは、第
２の音声処理部２１における符号化方式として第３の符
号化方式が選択された場合、この第２の音声処理部２１
において上述した第１の音声処理部２０と同様の処理を
実行する。Incidentally, in each of the terminal devices 11A to 11N, when the third encoding method is selected as the encoding method in the second audio processing unit 21, the second audio processing unit 21 performs the same processing as the first audio processing unit 20 described above.

[0096] Therefore, in this television conference system 10, the input audio data is protected even when the third encoding method is selected in each of the terminal devices 11A to 11N, the output audio data D1₃ and/or D1₅ is encoded by the third encoding method, and the resulting second audio encoded data D2₂ and/or D2₄ is transmitted in turn to the next-stage and/or preceding-stage terminal devices 11A to 11N. When the first audio encoded data D2₁ and/or D2₃ supplied from the next-stage and/or preceding-stage terminal devices 11A to 11N is decoded by the third decoding method, the input audio data D1₁ and/or D1₄ to be forwarded to the next-stage and/or preceding-stage terminal devices 11A to 11N is synthesized with the transmission audio data D1₂ without passing through the post filter 71. The input audio data D1₁ and/or D1₄ is thereby prevented from being attenuated by the action of the post filter 71 during transmission.

[0097] Incidentally, when a television conference system 10 of this kind is constructed in the conventional manner, it is conceivable to use three rate converters: the first rate converters 25 and 44, the fifth rate converters 38 and 48, and the second or third rate converters 27 and 45 or 28 and 46. This would be the case, for example, when the output audio data D1₃ and D1₅ are encoded by the first to third encoding methods in the first and second encoding units 39 and 49, and the first audio encoded data D2₁ and D2₃ are decoded by the first to third decoding methods in the first and second decoding units 24 and 43.

[0098] That is, the sampling frequency of the input audio data D1₁ and D1₄ obtained from the first and second decoding units 24 and 43 is interpolated by the first rate converters 25 and 44, and the input audio data D1₁ and D1₄ are then added to, and synthesized with, the transmission audio data D1₂ by the corresponding first and second adders 26 and 36. The sampling frequency of the resulting output audio data D1₃ and D1₅ is decimated by the fifth rate converters 38 and 48. In addition, the sampling frequency of the input audio data D1₁ and D1₄ obtained from the first and second decoding units 24 and 43 is interpolated by the second or third rate converters 27 and 45 or 28 and 46.

[0099] In the television conference system 10 according to the present embodiment, however, interpolation and decimation are performed using at most two of the first to fifth rate converters 25, 27, 28, 34, 35, 38, 44, 45, 46 and 48. Compared with the case described above, this reduces the data-processing load inside the device and keeps to a minimum the degradation of the input audio data D1₁ and D1₄, the transmission audio data D1₂, and the output audio data D1₃ and D1₅ caused by repeated interpolation and decimation.
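
Why each additional rate conversion matters can be illustrated with a small sketch. The two functions below are deliberately naive stand-ins, not the rate converters of the embodiment: decimation simply drops every second sample and interpolation is linear, both of which are assumptions made for the example. Practical converters use band-limiting filters, which cost computation and introduce some distortion of their own; the sketch only shows that a conversion round trip is not guaranteed to be lossless.

```python
from typing import List


def decimate2(samples: List[float]) -> List[float]:
    """Halve the sampling rate by keeping every second sample (no anti-alias filter)."""
    return samples[::2]


def interpolate2(samples: List[float]) -> List[float]:
    """Double the sampling rate by linear interpolation between neighbours."""
    out: List[float] = []
    for i, s in enumerate(samples):
        # Hold the last sample at the right-hand edge.
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) / 2)
    return out


def max_error(a: List[float], b: List[float]) -> float:
    """Largest absolute sample difference between two equal-length signals."""
    return max(abs(x - y) for x, y in zip(a, b))


slow = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]   # slowly varying signal
fast = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]   # rapidly varying signal

slow_error = max_error(slow, interpolate2(decimate2(slow)))
fast_error = max_error(fast, interpolate2(decimate2(fast)))
```

In this toy case the slowly varying signal survives the round trip, while the rapidly varying one is flattened entirely, which is the kind of loss that accumulates when more conversions than necessary sit on the signal path.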

[0100] Therefore, in this television conference system 10, encoding can be performed using whichever of the first to fourth encoding methods is desired while keeping the degradation of the output audio data D1₃ and D1₅ to a minimum, and the usability of the system as a whole can thus be improved.

[0101] Furthermore, in this television conference system 10, the first audio encoded data D2₁ and D2₃ transmitted from the next-stage and preceding-stage terminal devices 11A to 11N are recognized by attaching the corresponding first and second identification numbers to them, and the second audio encoded data D2₂ and D2₄ to be transmitted to the next-stage and preceding-stage terminal devices 11A to 11N are processed with the corresponding first and second identification numbers attached. Consequently, each of the cascade-connected terminal devices 11A to 11N can accurately deliver the second audio encoded data D2₂ and D2₄ to the next-stage and preceding-stage terminal devices 11A to 11N in turn.
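
The role of the identification numbers can be sketched as follows. The frame structure, the function names and the concrete numbers are hypothetical, chosen only for illustration; the text states that identification numbers are attached and used for selection, not how they are represented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaggedFrame:
    ident: int      # identification number of the source or destination terminal
    payload: bytes  # audio encoded data


def select_for_decoder(frames: List[TaggedFrame], expected_ident: int) -> List[bytes]:
    """Role of a recognizer: keep only frames carrying the expected number."""
    return [f.payload for f in frames if f.ident == expected_ident]


def tag_for_destination(payload: bytes, dest_ident: int) -> TaggedFrame:
    """Attach the destination terminal's identification number before sending."""
    return TaggedFrame(ident=dest_ident, payload=payload)


# Hypothetical identification numbers for the two neighbouring terminals.
PREV_ID, NEXT_ID = 1, 2

received = [
    TaggedFrame(PREV_ID, b"from-prev"),
    TaggedFrame(NEXT_ID, b"from-next"),
]
to_first_decoder = select_for_decoder(received, PREV_ID)
to_second_decoder = select_for_decoder(received, NEXT_ID)
outgoing = tag_for_destination(b"mixed-audio", NEXT_ID)
```

Because each decoder only sees frames carrying the number it was told to expect, audio arriving from the two directions of the cascade cannot be routed into the wrong processing path.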

[0102] According to the above configuration, when first audio encoded data D2₁ and/or D2₃ obtained by encoding with the third encoding method is transmitted from the next-stage and/or preceding-stage terminal devices 11A to 11N, the receiving terminal device 11A to 11N decodes the first audio encoded data D2₁ and D2₃ by the corresponding third decoding method. The input audio data D1₁ and/or D1₄ thus obtained, which is to be forwarded to the next-stage and/or preceding-stage terminal devices 11A to 11N, is synthesized, without passing through the post filter 71, with the transmission audio data D1₂ generated in the terminal device itself, and is then encoded and transmitted to the next-stage and/or preceding-stage terminal devices 11A to 11N. This prevents the input audio data D1₁ and D1₄ from being attenuated by the filtering that would otherwise be applied each time the data passed through the post filter 71 in successive terminal devices 11A to 11N. A television conference system can thus be realized that transmits the input audio data while preventing degradation of the quality of the audio based on that input audio data.
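
The cumulative effect that this configuration avoids can be illustrated numerically. In the sketch below the post filter is modelled as a circular three-tap smoothing filter, and encoding and decoding between terminals are omitted; both are simplifying assumptions, and the actual post filter 71 behaves differently. The names `relay` and `bypass_post_filter` are likewise hypothetical.

```python
from typing import List


def post_filter(samples: List[float]) -> List[float]:
    """Hypothetical post filter: circular 3-tap smoothing (0.25, 0.5, 0.25)."""
    n = len(samples)
    return [
        0.25 * samples[(i - 1) % n] + 0.5 * samples[i] + 0.25 * samples[(i + 1) % n]
        for i in range(n)
    ]


def relay(samples: List[float], hops: int, bypass_post_filter: bool) -> List[float]:
    """Pass audio through `hops` terminals, filtering at each hop unless bypassed."""
    for _ in range(hops):
        if not bypass_post_filter:
            samples = post_filter(samples)
    return samples


tone = [1.0, 0.0, -1.0, 0.0] * 4  # a tone with a period of four samples

filtered_peak = max(abs(s) for s in relay(tone, hops=3, bypass_post_filter=False))
bypassed_peak = max(abs(s) for s in relay(tone, hops=3, bypass_post_filter=True))
```

With this particular filter and tone, each filtered hop halves the amplitude, so three hops leave one eighth of the original level, whereas the bypassed path keeps the level unchanged.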

[0103] (4) Other Embodiments
In the embodiment described above, the case where each of the terminal devices 11A to 11N is configured as shown in FIG. 2 has been described. The present invention is not limited to this, however. As in the terminal devices 80A to 80N according to another embodiment shown in FIG. 10, in which parts corresponding to those in FIG. 2 are given the same reference numerals, a voice detector 81 may be provided between the echo canceller 32 and the fourth rate converter 35 of the second audio processing unit 21. The voice detector 81 detects whether the audio based on the transmission audio data D1₂ supplied from the echo canceller 32 is silent, and when it is silent, the connection between the echo canceller 32 and the fourth rate converter 35 is cut off.

[0104] This prevents the ambient noise picked up by the microphone 30 during silence from being transmitted as transmission audio data D1₂ to the next-stage or preceding-stage terminal devices 11A to 11N, and thus improves the quality of the transmitted audio.
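
A silence gate of this kind could look like the sketch below. The text does not say how the voice detector 81 decides that the audio is silent, so the mean-square energy test, the threshold value and the function names are all assumptions made for the example.

```python
from typing import List, Optional

SILENCE_THRESHOLD = 1e-4  # assumed mean-square energy threshold


def is_silent(frame: List[float], threshold: float = SILENCE_THRESHOLD) -> bool:
    """Treat a frame as silent when its mean-square energy is below the threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy < threshold


def gate_frame(frame: List[float]) -> Optional[List[float]]:
    """Pass speech frames on and block silent ones.

    Returning None models cutting the connection between the echo canceller
    and the rate converter, so nothing is forwarded for that frame.
    """
    return None if is_silent(frame) else frame


speech = [0.2, -0.3, 0.25, -0.1]          # illustrative speech frame
background = [0.001, -0.002, 0.001, 0.0]  # illustrative room noise

gated_speech = gate_frame(speech)
gated_background = gate_frame(background)
```

A fixed threshold is the simplest possible choice; a practical detector would more likely adapt to the measured noise floor, but the gating step itself would stay the same.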

[0105] Also, as in the terminal devices 82A to 82N according to another embodiment shown in FIG. 11, in which parts corresponding to those in FIG. 2 are given the same reference numerals, a headset 83 (or a handset, not shown) provided with a microphone and a speaker may be used in place of the microphone 30 (FIG. 2) and the speaker 33 (FIG. 2). In this case, because the headset 83 (or the handset, not shown) provides the function of the echo canceller 32 (FIG. 2), the echo canceller 32 can be omitted from each of the terminal devices 82A to 82N and the circuit configuration simplified.

[0106] In the embodiment described above, the case has been described where only the third and fourth rate converters 28, 34, 35 and 46 are operated when the output audio data D1₃ and D1₅ are encoded by the first to third encoding methods in the first and second encoding units 39 and 49 and the first audio encoded data D2₁ and D2₃ are decoded by the first to third decoding methods in the first and second decoding units 24 and 43. The present invention is not limited to this, however. As shown in FIG. 12, only the second and fourth rate converters 27, 34, 35 and 45 may be operated, with the sampling frequency of the input audio data D1₁ and D1₄ sent from the first and second decoding units 24 and 43 to the third adder 29 being interpolated by the second rate converters 27 and 45. The first and second audio processing units 20 and 21 can then be built without the third rate converters 28 and 46, which simplifies the configuration of the terminal devices 11A to 11N.

[0107] Furthermore, in the embodiment described above, the case has been described where each of the terminal devices 11A to 11N selects and uses a desired one of the first to fourth encoding methods. The present invention is not limited to this, however, and various other encoding methods may be made selectable, such as encoding methods devised on the basis of code-excited linear prediction. Examples are the method called conjugate-structure algebraic code-excited linear prediction (CS-ACELP: Conjugate Structure Algebraic Code Excited Linear Prediction), the method called maximum-likelihood-quantization multipulse excitation (MP-MLQ excitation: Multi-Pulse Maximum Likelihood Quantization excitation), and the method called algebraic code-excited linear prediction (ACELP: Algebraic Code Excited Linear Prediction).

[0108] Incidentally, the same approach applies when an encoding method devised on the basis of code-excited linear prediction is used, or an encoding method whose corresponding decoding method requires filtering by a given filter such as the post filter 71, where repeating that filtering would attenuate the input audio data D1₁ and D1₄. When the corresponding first audio encoded data D2₁ and D2₃ is decoded, it suffices to output the resulting input audio data D1₁ and D1₄ with the filtering applied for sound emission and without the filtering applied for transmission. Degradation of quality can thereby be prevented in the same way as in the embodiment described above.

[0109] Note that in the encoding method called maximum-likelihood-quantization multipulse and the encoding method called algebraic code-excited linear prediction, the decoder normally uses two kinds of post filter, called a pitch post filter and a formant post filter. Degradation of audio quality can therefore be prevented by outputting the input audio data D1₁ and D1₄ without passing them through the pitch post filter and/or the formant post filter.

[0110] Furthermore, in the embodiment described above, the case has been described where spare capacity on the data transmission path is used to transmit the first and second identification numbers together with the second audio encoded data D2₁ to D2₄. The present invention is not limited to this, however. Because the data transmission path still has spare capacity even when the first and second identification numbers are transmitted together with the second audio encoded data D2₁ to D2₄, this spare capacity may be used to transmit various other kinds of information.

[0111] Furthermore, in the embodiment described above, the case has been described where the terminal devices 11A to 11N are cascade-connected in series, one after another, via the public line network 12 as shown in FIG. 1. The present invention is not limited to this, however, and the terminal devices 11A to 11N may be cascade-connected in various other arrangements, for example in a branching arrangement via a given line network. Incidentally, when the terminal devices 11A to 11N are connected in such arrangements, it suffices to provide, in each terminal device 11A to 11N, audio processing units in a number corresponding to the number of terminal devices 11A to 11N connected to it.
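
The sizing rule in the last sentence can be sketched as a simple count of direct neighbours. The link-list representation and the function name are assumptions made for the example, and the terminal labels are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def audio_units_needed(links: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count, for each terminal, how many terminals are directly connected to it."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {terminal: len(peers) for terminal, peers in neighbours.items()}


# Series connection of three terminals.
chain = audio_units_needed([("11A", "11B"), ("11B", "11C")])

# Branching connection: 11B is connected to three other terminals.
branch = audio_units_needed([("11A", "11B"), ("11B", "11C"), ("11B", "11D")])
```

In the series case the middle terminal needs two audio processing units, one per neighbour, while in the branching case the terminal at the junction needs three.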

[0112] Furthermore, in the embodiment described above, the case has been described where the terminal devices 11A to 11N, which transmit the output audio data D1₃ and D1₅ and the output video data D3₂ and D3₄, are applied as the multipoint conference terminal devices installed at the respective points. The present invention is not limited to this, however, and can be widely applied to various other multipoint conference terminal devices, such as the terminal devices of a telephone conference system that transmits audio data only.

[0113] Furthermore, in the embodiment described above, the case has been described where the first and second interface circuits 18 and 42 are applied as the capturing means that captures and outputs the first audio encoded data transmitted via the line network from the preceding-stage and/or next-stage multipoint conference terminal device. The present invention is not limited to this, however, and various other capturing means may be applied as long as they can capture and output the first audio encoded data transmitted via the line network from the preceding-stage and/or next-stage multipoint conference terminal device.

[0114] Furthermore, in the embodiment described above, the case has been described where the microphone 30 and the analog-to-digital converter 31 are applied as the audio data generating means that generates and outputs the transmission audio data to be transmitted. The present invention is not limited to this, however, and various other audio data generating means may be applied as long as they can generate and output the transmission audio data to be transmitted.

[0115] Furthermore, in the embodiment described above, the case has been described where the first and second adders 26 and 36 are applied as the synthesizing means that synthesizes the input audio data output from the decoding means without the filtering process and the transmission audio data output from the audio data generating means, and outputs the resulting output audio data. The present invention is not limited to this, however, and various other synthesizing means may be applied as long as they can synthesize the input audio data output from the decoding means without the filtering process and the transmission audio data output from the audio data generating means, and output the resulting output audio data.

[0116] Furthermore, in the embodiment described above, the case has been described where the first and second interface circuits 18 and 42 are applied as the transmission means that transmits the second audio encoded data output from the encoding means, via the line network, to the preceding-stage and/or next-stage multipoint conference terminal device that is different from the transmission source of the corresponding first audio encoded data. The present invention is not limited to this, however, and various other transmission means may be applied as long as they can transmit the second audio encoded data output from the encoding means, via the line network, to the preceding-stage and/or next-stage multipoint conference terminal device that is different from the transmission source of the corresponding first audio encoded data.

[0117] Furthermore, in the embodiment described above, the case has been described where the demultiplexer 19 is applied as the first identification number adding means that attaches, to the first audio encoded data output from the capturing means, the given first identification number assigned in advance to the corresponding source multipoint conference terminal device, and outputs the result. The present invention is not limited to this, however, and various other first identification number adding means may be applied as long as they can attach, to the first audio encoded data output from the capturing means, the given first identification number assigned in advance to the corresponding source multipoint conference terminal device, and output the result.

[0118] Furthermore, in the embodiment described above, the case has been described where the first and second recognizers 22 and 23 are applied as the plurality of selecting means that are provided in correspondence with the respective decoding means and that select, from the first audio encoded data output from the first identification number adding means, only the first audio encoded data carrying a given first identification number designated in advance, and send it to the corresponding decoding means. The present invention is not limited to this, however, and various other selecting means may be applied as long as they are provided in correspondence with the respective decoding means and can select, from the first audio encoded data output from the first identification number adding means, only the first audio encoded data carrying a given first identification number designated in advance, and send it to the corresponding decoding means.

[0119] Furthermore, in the embodiment described above, the case has been described where the first and second appending units 40 and 50 are applied as the plurality of second identification number adding means that are provided in correspondence with the respective encoding means and that attach, to the second audio encoded data output from the corresponding encoding means, the given second identification number assigned in advance to the corresponding destination multipoint conference terminal device, and send the result to the transmission means. The present invention is not limited to this, however, and various other second identification number adding means may be applied as long as they are provided in correspondence with the respective encoding means and can attach, to the second audio encoded data output from the corresponding encoding means, the given second identification number assigned in advance to the corresponding destination multipoint conference terminal device, and send the result to the transmission means.

【０１２０】[0120]

[Effects of the Invention] As described above, according to the present invention, multipoint conference terminal devices installed at respective points and a line network that cascade-connects the multipoint conference terminal devices in sequence are provided, and each multipoint conference terminal device has: capturing means for capturing and outputting first audio encoded data transmitted via the line network from the preceding-stage and/or next-stage multipoint conference terminal device; decoding means for decoding the first audio encoded data output from the capturing means and for outputting the resulting input audio data both with a noise-removing filtering process applied, so that audio based on the input audio data can be emitted, and without the filtering process applied; audio data generating means for generating and outputting transmission audio data to be transmitted; synthesizing means for synthesizing the input audio data output from the decoding means without the filtering process and the transmission audio data output from the audio data generating means, and for outputting the resulting output audio data; encoding means for encoding the output audio data output from the synthesizing means and outputting the resulting second audio encoded data; and transmission means for transmitting the second audio encoded data output from the encoding means, via the line network, to the preceding-stage and/or next-stage multipoint conference terminal device that is different from the transmission source of the corresponding first audio encoded data. The transmitted input audio data is thereby prevented from being attenuated by the filtering process that would otherwise be repeated in each multipoint conference terminal device in turn. A multipoint conference apparatus can thus be realized that transmits the input audio data while preventing degradation of the quality of the audio based on that input audio data.

[0121] Also provided are: a first step of cascade-connecting, in sequence via a given line network, multipoint conference terminal devices installed at respective points; a second step of, in each multipoint conference terminal device, decoding first audio encoded data transmitted via the line network from the preceding-stage and/or next-stage multipoint conference terminal device, and outputting the resulting input audio data both with a noise-removing filtering process applied, so that audio based on the input audio data can be emitted, and without the filtering process applied; a third step of generating transmission audio data to be transmitted, synthesizing the generated transmission audio data and the input audio data output without the filtering process, and encoding the resulting output audio data to generate second audio encoded data; and a fourth step of transmitting the second audio encoded data, via the line network, to the preceding-stage and/or next-stage multipoint conference terminal device that is different from the transmission source of the corresponding first audio encoded data. The transmitted input audio data is thereby prevented from being attenuated by the filtering process that would otherwise be repeated in each multipoint conference terminal device in turn. A multipoint conference method can thus be realized that transmits the input audio data while preventing degradation of the quality of the audio based on that input audio data.

[0122] Furthermore, there are provided: decoding means for decoding first audio encoded data supplied from outside and for outputting the resulting input audio data both with a noise-removing filtering process applied, so that audio based on the input audio data can be emitted, and without the filtering process applied; audio data generating means for generating and outputting transmission audio data to be transmitted; synthesizing means for synthesizing the input audio data output from the decoding means without the filtering process and the transmission audio data output from the audio data generating means, and for outputting the resulting output audio data; and encoding means for encoding the output audio data output from the synthesizing means. The transmitted input audio data is thereby prevented from being attenuated by the filtering process that would otherwise be repeated in turn in a plurality of multipoint conference terminal devices cascade-connected in sequence via a given line network. A multipoint conference terminal device can thus be realized that transmits the input audio data while preventing degradation of the quality of the audio based on that input audio data.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the configuration of a television conference system according to the present invention.

FIG. 2 is a block diagram showing the circuit configuration of each terminal device.

FIG. 3 is a block diagram showing the circuit configuration of the first and second encoding units.

FIG. 4 is a block diagram showing the circuit configuration of the first and second decoding units.

FIG. 5 is a block diagram showing the circuit configuration of the third decoder.

FIG. 6 is a block diagram used to explain the operation of the first to fifth rate converters.

FIG. 7 is a block diagram used to explain the operation of the first to fifth rate converters.

FIG. 8 is a block diagram used to explain the operation of the first to fifth rate converters.

FIG. 9 is a block diagram used to explain the operation of the first to fifth rate converters.

FIG. 10 is a block diagram showing the circuit configuration of each terminal device according to another embodiment.

FIG. 11 is a block diagram showing the circuit configuration of each terminal device according to another embodiment.

FIG. 12 is a block diagram used to explain the operation of the rate converters according to another embodiment.

FIG. 13 is a block diagram showing the configuration of a conventional television conference system.

FIG. 14 is a block diagram showing the configuration of a conventional television conference system.

[Explanation of symbols]

10 ... television conference system; 11A to 11N, 80A to 80N, 82A to 82N ... terminal devices; 15 ... control unit; 18 ... first interface circuit; 19 ... demultiplexer; 22 ... first recognizer; 23 ... second recognizer; 24 ... first decoding unit; 25, 44 ... first rate converters; 26 ... first adder; 27, 45 ... second rate converters; 28, 46 ... third rate converters; 30 ... microphone; 31 ... analog-to-digital converter; 34, 35 ... fourth rate converters; 36 ... second adder; 38, 48 ... fifth rate converters; 39 ... first encoding unit; 40 ... first appending unit; 42 ... second interface circuit; 43 ... second decoding unit; 49 ... second encoding unit; 50 ... second appending unit; 57 ... third encoder; 63 ... third decoder; 71 ... post filter; D1₁, D1₄ ... input audio data; D1₂ ... transmission audio data; D1₃, D1₅ ... output audio data; D2₁, D2₃ ... first audio encoded data; D2₂, D2₄ ... second audio encoded data.

Claims

1. A multipoint conference apparatus comprising multipoint conference terminal devices respectively installed at multiple points, and a line network that sequentially cascade-connects the multipoint conference terminal devices, wherein each of the multipoint conference terminal devices comprises: capturing means for capturing and outputting first audio encoded data transmitted via the line network from the multipoint conference terminal device of the preceding stage and/or the next stage; decoding means for decoding the first audio encoded data output from the capturing means, and for outputting the obtained input audio data both after applying a filtering process that removes noise, so that sound based on the input audio data can be emitted, and without applying the filtering process; audio data generating means for generating and outputting transmission audio data to be transmitted; synthesizing means for synthesizing the input audio data output from the decoding means without the filtering process and the transmission audio data output from the audio data generating means, and outputting the obtained output audio data; encoding means for encoding the output audio data output from the synthesizing means and outputting the obtained second audio encoded data; and transmission means for transmitting the second audio encoded data output from the encoding means, via the line network, to the multipoint conference terminal device of the preceding stage and/or the next stage that is different from the transmission source of the corresponding first audio encoded data.
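The data flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here (`Terminal`, `toy_encode`, `toy_decode`, `toy_postfilter`, `clip16`) is hypothetical and is not taken from the patent: the "codec" is plain 16-bit PCM packing, and the "postfilter" is a two-tap average standing in for the noise-removing filter. The point shown is only that the filtered signal goes to the speaker while the unfiltered signal is mixed with the local audio and re-encoded.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Tuple

Samples = List[int]

def toy_encode(samples: Samples) -> bytes:
    """Stand-in encoder (assumption): pack samples as little-endian int16."""
    return struct.pack(f"<{len(samples)}h", *samples)

def toy_decode(data: bytes) -> Samples:
    """Stand-in decoder (assumption): unpack little-endian int16 samples."""
    return list(struct.unpack(f"<{len(data) // 2}h", data))

def toy_postfilter(samples: Samples) -> Samples:
    """Stand-in postfilter (assumption): average each sample with the previous one."""
    out, prev = [], 0
    for s in samples:
        out.append((s + prev) // 2)
        prev = s
    return out

def clip16(x: int) -> int:
    """Clamp a mixed sample to the signed 16-bit range."""
    return max(-32768, min(32767, x))

@dataclass
class Terminal:
    decode: Callable[[bytes], Samples]        # decoding means
    encode: Callable[[Samples], bytes]        # encoding means
    postfilter: Callable[[Samples], Samples]  # filtering used for playback only

    def process(self, received: bytes, local: Samples) -> Tuple[Samples, bytes]:
        decoded = self.decode(received)
        to_speaker = self.postfilter(decoded)  # filtered path, for sound emission
        # Unfiltered path: mix the decoded audio with the locally generated audio.
        mixed = [clip16(a + b) for a, b in zip(decoded, local)]
        return to_speaker, self.encode(mixed)  # second encoded data to forward
```

Calling `Terminal(toy_decode, toy_encode, toy_postfilter).process(frame, mic_samples)` returns the playback samples and the encoded frame that would be forwarded to the neighbouring terminal other than the one the frame came from.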

2. The multipoint conference apparatus according to claim 1, wherein the decoding means decodes the first audio encoded data by a decoding method corresponding to an encoding method based on code-excited linear prediction, or by a decoding method that requires the filtering process.

3. The multipoint conference apparatus according to claim 1, wherein each of the multipoint conference terminal devices further comprises: at least four rate converters that convert the sampling frequency of the input audio data, the transmission audio data, and the output audio data, the rate converters being provided respectively at the output stage of the decoding means that outputs the input audio data with the filtering process applied, at the output stage of the decoding means that outputs the input audio data without the filtering process, between the output stage of the audio data generating means and the input stage of the synthesizing means, and between the output stage of the synthesizing means and the input stage of the encoding means; and control means for controlling the rate converters, based on the sampling frequency specified by the decoding method used by the decoding means and the sampling frequency specified by the encoding method used by the encoding means, so that the sampling frequency is converted in at most two of the rate converters.
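The claim does not say how the control means decides which rate converters run. As a hedged illustration only, the Python sketch below uses one hypothetical policy: it assumes the microphone and speaker run at a fixed local rate (`local_hz`, an assumption not stated in the claim), tries each available rate as the mixing rate, and keeps the choice that activates the fewest of the four converters. The function name and the converter labels are invented for this example.

```python
from typing import Dict, Tuple

def plan_rate_converters(dec_hz: int, enc_hz: int, local_hz: int) -> Tuple[int, Dict[str, bool]]:
    """Pick a mixing rate and report which of the four converters must run."""
    best = None
    for mix_hz in (dec_hz, enc_hz, local_hz):  # candidate mixing rates
        active = {
            "decoder_to_speaker": dec_hz != local_hz,   # filtered output path
            "decoder_to_mixer": dec_hz != mix_hz,       # unfiltered output path
            "microphone_to_mixer": local_hz != mix_hz,  # local audio into the mixer
            "mixer_to_encoder": mix_hz != enc_hz,       # mixed audio into the encoder
        }
        count = sum(active.values())
        # Keep the first candidate with the smallest number of active converters.
        if best is None or count < best[0]:
            best = (count, mix_hz, active)
    _, mix_hz, active = best
    return mix_hz, active
```

Under this model, if only two distinct sampling frequencies are in play (for example 8 kHz and 16 kHz), no more than two converters are ever active. With three distinct frequencies, three converters are needed in this model, so the two-converter bound depends on the stated assumption.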

4. The multipoint conference apparatus according to claim 1, wherein each of the multipoint conference terminal devices comprises: a plurality of data processing means, each comprising the decoding means, the synthesizing means, and the encoding means, provided in correspondence with each of the multipoint conference terminal devices that are transmission destinations of the second audio encoded data; first identification number adding means for adding, to the first audio encoded data output from the capturing means, a predetermined first identification number assigned in advance to each multipoint conference terminal device that is the corresponding transmission source, and outputting the result; a plurality of selecting means, provided in correspondence with the respective decoding means, each for selecting, from the first audio encoded data output from the first identification number adding means, only the first audio encoded data to which a predetermined first identification number is added, and sending it to the corresponding decoding means; and a plurality of second identification number adding means, provided in correspondence with the respective encoding means, each for adding, to the second audio encoded data output from the corresponding encoding means, a predetermined second identification number assigned in advance to each multipoint conference terminal device that is the corresponding transmission destination, and sending the result to the transmission means, and wherein the transmission means transmits the second audio encoded data output from each of the second identification number adding means, via the line network, to the corresponding multipoint conference terminal device based on the second identification number.
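The identification-number routing of claim 4 can be sketched in Python as follows. This is an illustration under assumptions, not the patented circuit: the `Tagged` type and `route` function are hypothetical, decoding and encoding are omitted (frames carry plain samples), and each per-destination processing system is assumed to select every received frame whose source is not that destination, which the claim itself does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Tagged:
    ident: str                # identification number (source or destination)
    samples: Tuple[int, ...]  # plain samples stand in for encoded data (assumption)

def route(received: List[Tagged], local: Tuple[int, ...], neighbours: List[str]) -> Dict[str, Tagged]:
    """Build one outgoing frame per neighbouring terminal."""
    out: Dict[str, Tagged] = {}
    for dest in neighbours:  # one processing system per transmission destination
        # Selecting step (assumed rule): keep frames that did not originate at dest.
        selected = [f for f in received if f.ident != dest]
        mixed = list(local)
        for frame in selected:
            mixed = [a + b for a, b in zip(mixed, frame.samples)]
        # Tag the result with the destination's identification number.
        out[dest] = Tagged(dest, tuple(mixed))
    return out
```

For a terminal in the middle of a cascade with neighbours `"prev"` and `"next"`, the frame sent to `"next"` contains the local audio plus the audio received from `"prev"`, and vice versa, so no terminal receives its own audio back.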

5. A multipoint conference method comprising: a first step of sequentially cascade-connecting, via a predetermined line network, multipoint conference terminal devices respectively installed at multiple points; a second step of, in each of the multipoint conference terminal devices, decoding first audio encoded data transmitted via the line network from the multipoint conference terminal device of the preceding stage and/or the next stage, and outputting the obtained input audio data both after applying a filtering process that removes noise, so that sound based on the input audio data can be emitted, and without applying the filtering process; a third step of generating transmission audio data to be transmitted, synthesizing the generated transmission audio data with the input audio data output without the filtering process, and encoding the obtained output audio data to generate second audio encoded data; and a fourth step of transmitting the second audio encoded data, via the line network, to the multipoint conference terminal device of the preceding stage and/or the next stage that is different from the transmission source of the corresponding first audio encoded data.

6. The multipoint conference method according to claim 5, wherein, in the second step, the first audio encoded data is decoded by a decoding method corresponding to an encoding method based on code-excited linear prediction, or by a decoding method that requires the filtering process.

7. The multipoint conference method according to claim 5, wherein, in the third step, based on the sampling frequency specified by the decoding method used to decode the first audio encoded data and the sampling frequency specified by the encoding method used to encode the output audio data, the sampling frequency of the input audio data, the transmission audio data, and the output audio data is converted as needed at no more than two predetermined positions.

8. The multipoint conference method according to claim 5, wherein: in the second step, a predetermined first identification number assigned in advance to each multipoint conference terminal device that is the corresponding transmission source is added to the first audio encoded data, and, in each of a plurality of processing systems provided in correspondence with the multipoint conference terminal devices of the preceding stage and/or the next stage that are transmission destinations of the second audio encoded data, only the first audio encoded data to which a predetermined first identification number is added is selected and decoded; in the third step, a predetermined second identification number assigned in advance to each multipoint conference terminal device of the preceding stage and/or the next stage that is the corresponding transmission destination is added to the second audio encoded data generated in each of the processing systems; and in the fourth step, each piece of the second audio encoded data is transmitted, via the line network, to the corresponding multipoint conference terminal device based on the second identification number.

9. A multipoint conference terminal device installed at each of multiple points, comprising: decoding means for decoding first audio encoded data supplied from outside, and for outputting the obtained input audio data both after applying a filtering process that removes noise, so that sound based on the input audio data can be emitted, and without applying the filtering process; audio data generating means for generating and outputting transmission audio data to be transmitted; synthesizing means for synthesizing the input audio data output from the decoding means without the filtering process and the transmission audio data output from the audio data generating means, and outputting the obtained output audio data; and encoding means for encoding the output audio data output from the synthesizing means.

10. The multipoint conference terminal device according to claim 9, wherein the decoding means decodes the first audio encoded data by a predetermined decoding method corresponding to an encoding method based on code-excited linear prediction, or by a predetermined decoding method that requires the filtering process.

11. The multipoint conference terminal device according to claim 9, further comprising: at least four rate converters that convert the sampling frequency of the input audio data, the transmission audio data, and the output audio data, the rate converters being provided respectively at the output stage of the decoding means that outputs the input audio data with the filtering process applied, at the output stage of the decoding means that outputs the input audio data without the filtering process, between the output stage of the audio data generating means and the input stage of the synthesizing means, and between the output stage of the synthesizing means and the input stage of the encoding means; and control means for controlling the rate converters, based on the sampling frequency specified by the decoding method used by the decoding means and the sampling frequency specified by the encoding method used by the encoding means, so that the sampling frequency is converted in at most two of the rate converters.