JP4748113B2

JP4748113B2 - Learning device, learning method, program, and recording medium

Info

Publication number: JP4748113B2
Application number: JP2007147720A
Authority: JP
Inventors: 哲二郎近藤; 勉渡辺; 裕人木村
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-06-04
Filing date: 2007-06-04
Publication date: 2011-08-17
Anticipated expiration: 2022-03-07
Also published as: JP2007295599A

Abstract

PROBLEM TO BE SOLVED: To provide a technique for learning tap coefficients that make it possible to decode coded data into high-quality images and high-quality sound. SOLUTION: A teacher data generating section 161 generates, from learning data, teacher data that serves as the teacher for learning the tap coefficients. A student data generating section 163 generates, from the learning data, student data that serves as the student for learning the tap coefficients. A coding section 12 codes the learning data and outputs learning coded data, including characteristic data of the learning data, to a mismatch detection section 13. The mismatch detection section 13 judges whether the characteristic data included in the learning coded data is correct and outputs mismatch information indicating the result of that judgement to an adaptive learning section 160. The adaptive learning section 160 learns the tap coefficients from the teacher data and the student data on the basis of the mismatch information. The technique is applicable to, for example, learning apparatuses. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a learning apparatus, a learning method, a program, and a recording medium, and more particularly to a learning apparatus, learning method, program, and recording medium for learning tap coefficients that make it possible to decode encoded data, obtained by encoding images, sound, or the like, into high-quality (high image quality or high sound quality) images or sound.

A well-known high-efficiency coding method for image (moving-image) data is the MPEG (Moving Picture Experts Group) method. In MPEG, image data is processed in blocks of 8 × 8 pixels (horizontal × vertical): each block undergoes a two-dimensional DCT (Discrete Cosine Transform) in both the horizontal and vertical directions, and the result is then quantized.
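As a rough illustration of the transform-and-quantize step just described, the following Python sketch applies an orthonormal 2-D DCT-II to one 8 × 8 block and then quantizes the coefficients. It is a simplification under stated assumptions: real MPEG encoders use weighting (quantization) matrices and a per-macroblock quantizer scale, whereas the hypothetical `quantize` helper here uses a single uniform step size.

```python
import math

N = 8  # MPEG block size: 8 x 8 pixels (horizontal x vertical)


def _alpha(k: int) -> float:
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (u: vertical, v: horizontal frequency)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                cy = math.cos((2 * y + 1) * u * math.pi / (2 * N))
                for x in range(N):
                    s += block[y][x] * cy * math.cos((2 * x + 1) * v * math.pi / (2 * N))
            out[u][v] = _alpha(u) * _alpha(v) * s
    return out


def quantize(coeffs, step):
    """Hypothetical uniform quantizer: one step size for every coefficient."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


if __name__ == "__main__":
    flat = [[10] * N for _ in range(N)]  # a perfectly flat block
    q = quantize(dct_2d(flat), step=16)
    print(q[0][0])  # DC coefficient 80 / 16 -> 5; every AC coefficient quantizes to 0
```

A flat block concentrates all of its energy in the DC coefficient, which is why block-based DCT coding compresses smooth regions so well.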

As described above, MPEG applies a two-dimensional DCT to image data. In MPEG2, for example, the DCT type of the blocks to be transformed can be switched, on a macroblock basis, between frame DCT mode and field DCT mode. In frame DCT mode, a block consists of pixels from the same frame, and the pixel values of that block are transformed by the two-dimensional DCT. In field DCT mode, a block consists of pixels from the same field, and the pixel values of that block are transformed by the two-dimensional DCT.
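To make the two modes concrete, the sketch below shows how a 16 × 16 luminance macroblock could be split into four 8 × 8 blocks under each DCT type, following the MPEG-2 arrangement in which field DCT places the lines of one field in the upper half of the macroblock and the lines of the other field in the lower half. The function name and the string values `"frame"` and `"field"` are illustrative assumptions.

```python
def split_macroblock(mb, dct_type):
    """Split a 16 x 16 luminance macroblock into four 8 x 8 DCT blocks.

    "frame": lines stay in frame order, so each block mixes both fields.
    "field": even lines (one field) go to the upper half and odd lines
             (the other field) to the lower half, so each block holds a
             single field.
    Blocks are returned as top-left, top-right, bottom-left, bottom-right.
    """
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16 x 16 macroblock")
    if dct_type == "frame":
        rows = list(mb)
    elif dct_type == "field":
        rows = [mb[r] for r in range(0, 16, 2)] + [mb[r] for r in range(1, 16, 2)]
    else:
        raise ValueError("dct_type must be 'frame' or 'field'")
    return [
        [row[c0:c0 + 8] for row in rows[r0:r0 + 8]]
        for r0 in (0, 8)
        for c0 in (0, 8)
    ]


if __name__ == "__main__":
    mb = [[r] * 16 for r in range(16)]  # each pixel stores its own line number
    print([row[0] for row in split_macroblock(mb, "frame")[0]])  # [0, 1, 2, 3, 4, 5, 6, 7]
    print([row[0] for row in split_macroblock(mb, "field")[0]])  # [0, 2, 4, 6, 8, 10, 12, 14]
```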

Whether the DCT type is set to frame DCT mode or field DCT mode is basically decided on the basis of image characteristics, such as the motion of the image and its continuity with the surrounding macroblocks, so as to reduce block distortion, mosquito noise, and the like in the decoded image. For example, field DCT mode is selected for an image with large motion, and frame DCT mode is selected for an image with almost no motion.
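One simple way to turn this rule of thumb into code, offered only as an illustrative heuristic and not as the decision rule of any particular encoder, is to compare the vertical activity between adjacent frame lines with that between adjacent lines of the same field: motion between the two fields makes neighbouring frame lines differ strongly while same-field lines stay similar. The function name and the averaging are assumptions.

```python
def choose_dct_type(mb):
    """Illustrative frame/field DCT decision for a 16 x 16 macroblock.

    Frame activity: mean absolute difference between vertically adjacent lines.
    Field activity: mean absolute difference between lines two apart (same field).
    Large motion between fields inflates frame activity, favouring field DCT.
    """
    h, w = len(mb), len(mb[0])
    frame_pairs = [(r, r + 1) for r in range(h - 1)]
    field_pairs = [(r, r + 2) for r in range(h - 2)]

    def activity(pairs):
        total = sum(abs(mb[a][c] - mb[b][c]) for a, b in pairs for c in range(w))
        return total / (len(pairs) * w)

    return "field" if activity(field_pairs) < activity(frame_pairs) else "frame"


if __name__ == "__main__":
    comb = [[0 if r % 2 == 0 else 100] * 16 for r in range(16)]  # fields disagree (motion)
    ramp = [[r] * 16 for r in range(16)]                          # smooth, static-looking
    print(choose_dct_type(comb), choose_dct_type(ramp))  # field frame
```

Averaging over the number of line pairs keeps the comparison fair, since a 16-line macroblock has 15 adjacent frame-line pairs but only 14 same-field pairs.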

The encoded data obtained by MPEG-encoding an image contains the DCT type and other information in addition to the two-dimensional DCT coefficients obtained by applying the two-dimensional DCT to the image and quantizing the result. Because the DCT type is determined from the motion of the image and the like, as described above, it can be said to represent a characteristic of the image.

In MPEG encoding, however, the data rate of the encoded data is limited so that neither overflow nor underflow occurs on the decoder side. To keep within this data-rate limit, the DCT type may be set, so to speak, inappropriately: a macroblock that should originally be coded in frame DCT mode may be set to field DCT mode, or one that should be coded in field DCT mode may be set to frame DCT mode.

Even when such an inappropriate DCT type has been set, the decoder must still decode the encoded data according to that inappropriate DCT type, and the quality of the decoded image deteriorates as a result.

The present invention has been made in view of this situation, and makes it possible to learn tap coefficients that can decode encoded data into high-quality images and sound.

A learning apparatus according to the present invention learns tap coefficients used in a prediction calculation with encoded data, the prediction calculation being performed to decode encoded data that is obtained by encoding data and that includes at least characteristic data representing a characteristic of the data. The learning apparatus comprises: teacher data generating means for generating, from learning data, teacher data that serves as the teacher for learning the tap coefficients, and outputting the teacher data; student data generating means for generating, from the learning data, student data that serves as the student for learning the tap coefficients, and outputting the student data; encoding means for encoding the learning data and outputting learning encoded data that includes characteristic data of that data; comparison output means for outputting mismatch information obtained by comparing the characteristic data included in the learning encoded data with the actual characteristic of the learning data corresponding to the learning encoded data; and learning means for learning the tap coefficients using the teacher data and the student data on the basis of the mismatch information.

The learning means has: class tap extracting means for extracting, from the student data and with a class tap extraction pattern associated in advance with the mismatch information, class taps used to classify the teacher data of interest (the teacher data currently being processed) into one of a plurality of classes; class classification means for classifying the teacher data of interest on the basis of the class taps and outputting the class code of the corresponding class; prediction tap extracting means for extracting, from the student data and with a prediction tap extraction pattern associated in advance with the mismatch information, prediction taps used in the prediction calculation with the tap coefficients for the teacher data of interest; and tap coefficient calculating means for obtaining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by the prediction calculation using the prediction taps and the tap coefficients.

When the learning data is speech data, the encoding means encodes the learning speech data by the CELP (Code Excited Linear Prediction coding) method and outputs the learning encoded data, and the comparison output means outputs mismatch information representing the difference between the characteristic data included in the learning encoded data and the actual characteristic of the learning speech data corresponding to the learning encoded data. The teacher data generating means and the student data generating means then operate in one of three ways: (1) the teacher data generating means outputs the learning speech data as-is as teacher data, and the student data generating means encodes and decodes the learning speech data by the CELP method and outputs the decoded result as student data; (2) the teacher data generating means performs linear prediction analysis on the learning speech data, generates a residual signal by driving, with the learning speech data, a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the residual signal as teacher data, while the student data generating means encodes and decodes the learning speech data by the CELP method and outputs the resulting residual signal that drives the speech synthesis filter as student data; or (3) the teacher data generating means performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as teacher data, while the student data generating means encodes and decodes the learning speech data by the CELP method and outputs the resulting linear prediction coefficients, which serve as the filter coefficients of the speech synthesis filter, as student data.

When the learning data is image data, the encoding means encodes the learning image data by the MPEG (Moving Picture Experts Group) method and outputs the learning encoded data, and the comparison output means outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to the learning encoded data. The teacher data generating means and the student data generating means then operate in one of three ways: (1) the teacher data generating means outputs the learning image data as-is as teacher data, and the student data generating means encodes and decodes the learning image data by the MPEG method and outputs the decoded result as student data; (2) the teacher data generating means outputs the learning image data as-is as teacher data, and the student data generating means encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data; or (3) the teacher data generating means applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as teacher data, and the student data generating means encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data.
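The mismatch information described above can be pictured with the following Python sketch. For image data it encodes the combination of the DCT type written in the MPEG stream and the DCT type the original image actually calls for (four combinations); for speech data it is the difference between a characteristic value carried in the CELP code and the value measured on the original speech. This section does not say which speech parameter is compared, so the pitch-lag example is purely hypothetical, as are all names and the numbering of the combinations.

```python
from itertools import product

DCT_TYPES = ("frame", "field")

# Hypothetical numbering of the four (coded, actual) DCT-type combinations.
IMAGE_MISMATCH_CODES = {
    pair: code for code, pair in enumerate(product(DCT_TYPES, repeat=2))
}


def image_mismatch(coded_dct_type: str, actual_dct_type: str) -> int:
    """Mismatch information for image data: the (coded, actual) combination."""
    return IMAGE_MISMATCH_CODES[(coded_dct_type, actual_dct_type)]


def speech_mismatch(coded_value: float, actual_value: float) -> float:
    """Mismatch information for speech data: coded minus actual characteristic."""
    return coded_value - actual_value


if __name__ == "__main__":
    # A macroblock coded with field DCT although the picture is effectively static:
    print(image_mismatch("field", "frame"))  # 2
    # Hypothetical pitch lag of 42 samples in the CELP code vs. 40 measured:
    print(speech_mismatch(42, 40))  # 2
```

Keeping the image case as a combination, rather than a simple match/no-match flag, lets the learner treat "coded as field but really frame" differently from "coded as frame but really field".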

A learning method according to the present invention is a learning method for a learning apparatus that learns tap coefficients used in a prediction calculation with encoded data, the prediction calculation being performed to decode encoded data that is obtained by encoding data and that includes at least characteristic data representing a characteristic of the data. The method includes: a teacher data generating step of generating, from learning data, teacher data that serves as the teacher for learning the tap coefficients, and outputting the teacher data; a student data generating step of generating, from the learning data, student data that serves as the student for learning the tap coefficients, and outputting the student data; an encoding step of encoding the learning data and outputting learning encoded data that includes characteristic data of that data; a comparison output step of outputting mismatch information obtained by comparing the characteristic data included in the learning encoded data with the actual characteristic of the learning data corresponding to the learning encoded data; and a learning step of learning the tap coefficients using the teacher data and the student data on the basis of the mismatch information.

The learning step includes: a class tap extraction step of extracting, from the student data and with a class tap extraction pattern associated in advance with the mismatch information, class taps used to classify the teacher data of interest (the teacher data currently being processed) into one of a plurality of classes; a class classification step of classifying the teacher data of interest on the basis of the class taps and outputting the class code of the corresponding class; a prediction tap extraction step of extracting, from the student data and with a prediction tap extraction pattern associated in advance with the mismatch information, prediction taps used in the prediction calculation with the tap coefficients for the teacher data of interest; and a tap coefficient calculation step of obtaining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by the prediction calculation using the prediction taps and the tap coefficients.

When the learning data is speech data, the encoding step encodes the learning speech data by the CELP (Code Excited Linear Prediction coding) method and outputs the learning encoded data, and the comparison output step outputs mismatch information representing the difference between the characteristic data included in the learning encoded data and the actual characteristic of the learning speech data corresponding to the learning encoded data. The teacher data generating step and the student data generating step then operate in one of three ways: (1) the teacher data generating step outputs the learning speech data as-is as teacher data, and the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the decoded result as student data; (2) the teacher data generating step performs linear prediction analysis on the learning speech data, generates a residual signal by driving, with the learning speech data, a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the residual signal as teacher data, while the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the resulting residual signal that drives the speech synthesis filter as student data; or (3) the teacher data generating step performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as teacher data, while the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the resulting linear prediction coefficients, which serve as the filter coefficients of the speech synthesis filter, as student data.

When the learning data is image data, the encoding step encodes the learning image data by the MPEG (Moving Picture Experts Group) method and outputs the learning encoded data, and the comparison output step outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to the learning encoded data. The teacher data generating step and the student data generating step then operate in one of three ways: (1) the teacher data generating step outputs the learning image data as-is as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the decoded result as student data; (2) the teacher data generating step outputs the learning image data as-is as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data; or (3) the teacher data generating step applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data.
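The class tap extraction, class classification, and prediction tap extraction steps, whose extraction patterns are associated in advance with the mismatch information, can be sketched as follows for image data. The offset patterns and their assignment to mismatch codes 0 to 3 are invented for illustration, and the class code is computed with 1-bit ADRC (adaptive dynamic range coding), a classification commonly used in classification adaptive processing; this section does not say which classification method is used.

```python
# Hypothetical tap patterns: (dy, dx) offsets from the pixel of interest.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]        # adjacent frame lines
SAME_FIELD = [(0, 0), (-2, 0), (2, 0), (0, -1), (0, 1)]   # lines of the same field
FRAME_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
FIELD_3X3 = [(dy, dx) for dy in (-2, 0, 2) for dx in (-1, 0, 1)]

# Hypothetical association of mismatch codes 0-3 with extraction patterns.
CLASS_TAP_PATTERNS = {0: CROSS, 1: SAME_FIELD, 2: CROSS, 3: SAME_FIELD}
PREDICTION_TAP_PATTERNS = {0: FRAME_3X3, 1: FIELD_3X3, 2: FRAME_3X3, 3: FIELD_3X3}


def extract_taps(student, y, x, offsets):
    """Gather student-data samples at the given offsets, clamping at the borders."""
    h, w = len(student), len(student[0])
    return [
        student[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy, dx in offsets
    ]


def adrc_class_code(class_taps):
    """1-bit ADRC: a tap becomes 1 if it lies in the upper half of the taps'
    dynamic range, else 0; the bits are packed into an integer class code."""
    lo, hi = min(class_taps), max(class_taps)
    code = 0
    for v in class_taps:
        bit = 1 if hi > lo and (v - lo) * 2 >= (hi - lo) else 0
        code = (code << 1) | bit
    return code


def taps_for(student, y, x, mismatch_code):
    """Class code and prediction taps for the teacher sample at (y, x)."""
    class_taps = extract_taps(student, y, x, CLASS_TAP_PATTERNS[mismatch_code])
    prediction_taps = extract_taps(student, y, x, PREDICTION_TAP_PATTERNS[mismatch_code])
    return adrc_class_code(class_taps), prediction_taps


if __name__ == "__main__":
    student = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    print(taps_for(student, 1, 1, 0)[0])  # 21 (bits 10101)
```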

A program according to the present invention causes a computer to perform a learning process for learning tap coefficients used in a prediction calculation with encoded data, the prediction calculation being performed to decode encoded data that is obtained by encoding data and that includes at least characteristic data representing a characteristic of the data. The learning process includes: a teacher data generating step of generating, from learning data, teacher data that serves as the teacher for learning the tap coefficients, and outputting the teacher data; a student data generating step of generating, from the learning data, student data that serves as the student for learning the tap coefficients, and outputting the student data; an encoding step of encoding the learning data and outputting learning encoded data that includes characteristic data of that data; a comparison output step of outputting mismatch information obtained by comparing the characteristic data included in the learning encoded data with the actual characteristic of the learning data corresponding to the learning encoded data; and a learning step of learning the tap coefficients using the teacher data and the student data on the basis of the mismatch information.

The learning step includes: a class tap extraction step of extracting, from the student data and with a class tap extraction pattern associated in advance with the mismatch information, class taps used to classify the teacher data of interest (the teacher data currently being processed) into one of a plurality of classes; a class classification step of classifying the teacher data of interest on the basis of the class taps and outputting the class code of the corresponding class; a prediction tap extraction step of extracting, from the student data and with a prediction tap extraction pattern associated in advance with the mismatch information, prediction taps used in the prediction calculation with the tap coefficients for the teacher data of interest; and a tap coefficient calculation step of obtaining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by the prediction calculation using the prediction taps and the tap coefficients.

When the learning data is speech data, the encoding step encodes the learning speech data by the CELP (Code Excited Linear Prediction coding) method and outputs the learning encoded data, and the comparison output step outputs mismatch information representing the difference between the characteristic data included in the learning encoded data and the actual characteristic of the learning speech data corresponding to the learning encoded data. The teacher data generating step and the student data generating step then operate in one of three ways: (1) the teacher data generating step outputs the learning speech data as-is as teacher data, and the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the decoded result as student data; (2) the teacher data generating step performs linear prediction analysis on the learning speech data, generates a residual signal by driving, with the learning speech data, a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the residual signal as teacher data, while the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the resulting residual signal that drives the speech synthesis filter as student data; or (3) the teacher data generating step performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as teacher data, while the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the resulting linear prediction coefficients, which serve as the filter coefficients of the speech synthesis filter, as student data.

When the learning data is image data, the encoding step encodes the learning image data by the MPEG (Moving Picture Experts Group) method and outputs the learning encoded data, and the comparison output step outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to the learning encoded data. The teacher data generating step and the student data generating step then operate in one of three ways: (1) the teacher data generating step outputs the learning image data as-is as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the decoded result as student data; (2) the teacher data generating step outputs the learning image data as-is as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data; or (3) the teacher data generating step applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data.
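The tap coefficient calculation step asks, for each class, for the coefficients that statistically minimize the prediction error. A common way to realize this, assumed here rather than stated in this section, is a linear first-order prediction y' = w_1 x_1 + ... + w_N x_N fitted by least squares: for each class, accumulate the normal equations (sum of x xᵀ) w = (sum of x y) over all teacher/student samples and solve them. `TapCoefficientLearner`, `solve_linear`, and `predict` are hypothetical names.

```python
from collections import defaultdict


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations: too few samples in class")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class TapCoefficientLearner:
    """Accumulates per-class normal equations and solves them for tap coefficients."""

    def __init__(self, n_taps):
        self.n = n_taps
        self.xtx = defaultdict(lambda: [[0.0] * n_taps for _ in range(n_taps)])
        self.xty = defaultdict(lambda: [0.0] * n_taps)

    def add(self, class_code, prediction_taps, teacher_value):
        """Add one (student prediction taps, teacher sample) pair to its class."""
        a, b = self.xtx[class_code], self.xty[class_code]
        for i, xi in enumerate(prediction_taps):
            b[i] += xi * teacher_value
            for j, xj in enumerate(prediction_taps):
                a[i][j] += xi * xj

    def solve(self):
        """Return {class_code: tap coefficients} minimizing squared error per class."""
        return {c: solve_linear(self.xtx[c], self.xty[c]) for c in self.xtx}


def predict(coeffs, prediction_taps):
    """Linear first-order prediction of one teacher sample."""
    return sum(w * x for w, x in zip(coeffs, prediction_taps))


if __name__ == "__main__":
    learner = TapCoefficientLearner(n_taps=2)
    for x0, x1 in [(1, 0), (0, 1), (1, 1), (2, 3)]:
        learner.add(0, (x0, x1), 2 * x0 - 0.5 * x1)  # class 0 follows y = 2 x0 - 0.5 x1
        learner.add(1, (x0, x1), x0 + x1)            # class 1 follows y = x0 + x1
    print(learner.solve())  # approximately {0: [2.0, -0.5], 1: [1.0, 1.0]}
```

Accumulating sums per class means the learning data need only be scanned once; a class that receives too few independent samples is reported as singular instead of silently producing arbitrary coefficients.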

A recording medium according to the present invention has recorded on it a program that causes a computer to perform a learning process for learning tap coefficients used in a prediction calculation with encoded data, the prediction calculation being performed to decode encoded data that is obtained by encoding data and that includes at least characteristic data representing a characteristic of the data. The learning process includes: a teacher data generating step of generating, from learning data, teacher data that serves as the teacher for learning the tap coefficients, and outputting the teacher data; a student data generating step of generating, from the learning data, student data that serves as the student for learning the tap coefficients, and outputting the student data; an encoding step of encoding the learning data and outputting learning encoded data that includes characteristic data of that data; a comparison output step of outputting mismatch information obtained by comparing the characteristic data included in the learning encoded data with the actual characteristic of the learning data corresponding to the learning encoded data; and a learning step of learning the tap coefficients using the teacher data and the student data on the basis of the mismatch information.

The learning step includes: a class tap extraction step of extracting, from the student data and with a class tap extraction pattern associated in advance with the mismatch information, class taps used to classify the teacher data of interest (the teacher data currently being processed) into one of a plurality of classes; a class classification step of classifying the teacher data of interest on the basis of the class taps and outputting the class code of the corresponding class; a prediction tap extraction step of extracting, from the student data and with a prediction tap extraction pattern associated in advance with the mismatch information, prediction taps used in the prediction calculation with the tap coefficients for the teacher data of interest; and a tap coefficient calculation step of obtaining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by the prediction calculation using the prediction taps and the tap coefficients.

When the learning data is speech data, the encoding step encodes the learning speech data by the CELP (Code Excited Linear Prediction coding) method and outputs the learning encoded data, and the comparison output step outputs mismatch information representing the difference between the characteristic data included in the learning encoded data and the actual characteristic of the learning speech data corresponding to the learning encoded data. The teacher data generating step and the student data generating step then operate in one of three ways: (1) the teacher data generating step outputs the learning speech data as-is as teacher data, and the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the decoded result as student data; (2) the teacher data generating step performs linear prediction analysis on the learning speech data, generates a residual signal by driving, with the learning speech data, a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the residual signal as teacher data, while the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the resulting residual signal that drives the speech synthesis filter as student data; or (3) the teacher data generating step performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as teacher data, while the student data generating step encodes and decodes the learning speech data by the CELP method and outputs the resulting linear prediction coefficients, which serve as the filter coefficients of the speech synthesis filter, as student data.

When the learning data is image data, the encoding step encodes the learning image data by the MPEG (Moving Picture Experts Group) method and outputs the learning encoded data, and the comparison output step outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to the learning encoded data. The teacher data generating step and the student data generating step then operate in one of three ways: (1) the teacher data generating step outputs the learning image data as-is as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the decoded result as student data; (2) the teacher data generating step outputs the learning image data as-is as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data; or (3) the teacher data generating step applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as teacher data, and the student data generating step encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as student data.

In the learning device, learning method, program, and recording medium of the present invention, two kinds of data are generated from the learning data and output: teacher data, which serves as the teacher for learning the tap coefficients, and student data, which serves as the student. The learning data is also encoded, and learning encoded data including characteristic data about that data is output. The tap coefficients are then learned using the teacher data and the student data. This learning is based on mismatch information, obtained by comparing the characteristic data included in the learning encoded data with the actual characteristic of the corresponding learning data.

According to the learning device, learning method, program, and recording medium of the present invention, tap coefficients that make it possible to decode encoded data into high-quality data can be learned.

FIG. 1 shows a configuration example of an embodiment of a decoding device to which the present invention is applied.

Two sources of encoded data are input to the decoding device for decoding. The first is encoded data reproduced from a recording medium (not shown), such as an optical disc, magneto-optical disc, phase-change disc, magnetic tape, or semiconductor memory. The second is encoded data transmitted over a transmission medium such as the Internet, a CATV network, a satellite link, or terrestrial broadcasting. The encoded data is obtained by encoding predetermined data with a predetermined encoding method. It includes at least characteristic data representing a characteristic of that predetermined data.

As described later, the encoded data may be, for example, speech data encoded by the CELP (Code Excited Linear Prediction coding) method or image data encoded by the MPEG2 method.

When the encoded data is speech data encoded by the CELP method, the encoded data includes an L code representing the lag. The lag corresponds to the pitch period of the encoded speech data. Because it represents a characteristic of the speech data, namely its pitch period, it can be regarded as characteristic data.

Likewise, when the encoded data is image data encoded by the MPEG2 method, the encoded data includes the DCT type, as described above. The DCT type is determined on the basis of factors such as motion in the image. It therefore represents a characteristic of the image and can also be regarded as characteristic data.

The encoded data decoded by the decoding device is not limited to speech data encoded by the CELP method or image data encoded by the MPEG2 method, as described above.

Encoded data input to the decoding device is supplied to the mismatch detection unit 1 and the decoding processing unit 2.

The mismatch detection unit 1 detects mismatch information from the encoded data. It judges whether the characteristic data included in the encoded data is correct and outputs mismatch information representing the result of that judgment to the decoding processing unit 2. The decoding processing unit 2 decodes the encoded data on the basis of the mismatch information supplied from the mismatch detection unit 1 and outputs the resulting decoded data.

Next, the processing of the decoding device in FIG. 1 (the decoding process) will be described with reference to the flowchart in FIG. 2.

Encoded data is supplied to the mismatch detection unit 1 and the decoding processing unit 2. First, in step S1, the mismatch detection unit 1 detects mismatch information from the encoded data and supplies it to the decoding processing unit 2. The process then proceeds to step S2. In step S2, the decoding processing unit 2 uses that mismatch information to decode the encoded data from which it was detected, outputs the decoded data, and the process proceeds to step S3. In step S3, the mismatch detection unit 1 or the decoding processing unit 2 determines whether encoded data to be decoded still remains. If it does, the process returns to step S1 and the same processing is repeated.

If it is determined in step S3 that no encoded data to be decoded remains, the process ends.

FIG. 3 shows a configuration example of another embodiment of a decoding device to which the present invention is applied. In the figure, parts corresponding to those in FIG. 1 are given the same reference numerals, and their description is omitted below where appropriate. The decoding device in FIG. 3 is basically configured in the same manner as that in FIG. 1, except that a parameter storage unit 3 is newly provided.

The parameter storage unit 3 stores parameters obtained by learning in a learning device, described later. The decoding processing unit 2 decodes the encoded data supplied to it using the parameters stored in the parameter storage unit 3.

The decoding device in FIG. 3 therefore performs the same processing as the decoding device in FIG. 1. The only difference is that the decoding processing unit 2 decodes the encoded data using the parameters stored in the parameter storage unit 3. A description of that processing is therefore omitted.

FIG. 4 shows a configuration example of an embodiment of a learning device that learns the parameters to be stored in the parameter storage unit 3 of FIG. 3.

The learning data storage unit 11 stores learning data, i.e. the data used to learn the parameters.

The encoding unit 12 reads the learning data stored in the learning data storage unit 11. It encodes that data with the same encoding method as that of the encoded data decoded by the decoding device in FIG. 3. The encoded data obtained by encoding the learning data (hereinafter referred to as learning encoded data where appropriate) is supplied from the encoding unit 12 to the mismatch detection unit 13.

The mismatch detection unit 13 is configured in the same manner as the mismatch detection unit 1 in FIG. 3. It detects mismatch information from the learning encoded data supplied from the encoding unit 12 and supplies it to the learning processing unit 14.

The learning processing unit 14 reads the learning data stored in the learning data storage unit 11. From it, the unit generates teacher data, which serves as the teacher in learning the parameters, and student data, which serves as the student. The learning processing unit 14 then learns the parameters using the generated teacher data and student data, on the basis of the mismatch information supplied from the mismatch detection unit 13.

Next, the processing of the learning device in FIG. 4 (the learning process) will be described with reference to the flowchart in FIG. 5.

First, in step S11, the encoding unit 12 reads and encodes the learning data stored in the learning data storage unit 11. It supplies the resulting learning encoded data to the mismatch detection unit 13, and the process proceeds to step S12. In step S12, the mismatch detection unit 13 detects mismatch information from the learning encoded data supplied from the encoding unit 12. It supplies that information to the learning processing unit 14, and the process proceeds to step S13.

In step S13, the learning processing unit 14 reads the learning data from the learning data storage unit 11 and generates teacher data and student data from it. The learning processing unit 14 then learns the parameters using the generated teacher data and student data, on the basis of the mismatch information supplied from the mismatch detection unit 13.

That is, on the basis of the mismatch information, the learning processing unit 14 performs the processing (learning) needed to compute the parameters best suited to obtaining the corresponding teacher data from the student data.

The process then proceeds to step S14, where the encoding unit 12 or the learning processing unit 14 determines whether unprocessed learning data remains in the learning data storage unit 11. If it does, the process returns to step S11 and the same processing is repeated for the unprocessed learning data.

If step S14 determines that no unprocessed learning data remains in the learning data storage unit 11, learning has been performed with all the stored learning data. The process then proceeds to step S15. There, the learning processing unit 14 computes the parameters on the basis of the learning results of step S13, and the process ends.
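The flow of steps S11 to S15 can be sketched as a loop that keeps least-squares statistics separately for each value of the mismatch information and solves them once all learning data has been used. This is a minimal sketch under assumptions. `encode`, `detect_mismatch` and `make_pairs` are hypothetical stand-ins for the encoding unit 12, the mismatch detection unit 13 and the teacher/student generation in the learning processing unit 14. Keying the statistics by the mismatch information is one plausible reading of "learning on the basis of the mismatch information", not a data layout stated in the text.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

# One (student taps, teacher value) training pair.
Pair = Tuple[Sequence[float], float]


class KeyedNormalEquations:
    """Accumulates one set of normal equations (A, v) per key.

    Hypothetical: the key stands for the mismatch information (possibly
    combined with a class code); the text does not fix this representation.
    """

    def __init__(self, num_taps: int) -> None:
        self.num_taps = num_taps
        self.A: Dict[Hashable, List[List[float]]] = {}
        self.v: Dict[Hashable, List[float]] = {}

    def add(self, key: Hashable, taps: Sequence[float], teacher: float) -> None:
        if key not in self.A:
            self.A[key] = [[0.0] * self.num_taps for _ in range(self.num_taps)]
            self.v[key] = [0.0] * self.num_taps
        A, v = self.A[key], self.v[key]
        for j in range(self.num_taps):
            v[j] += taps[j] * teacher
            for k in range(self.num_taps):
                A[j][k] += taps[j] * taps[k]

    def solve_all(self, solver: Callable[[List[List[float]], List[float]], List[float]]):
        """Step S15: solve each key's system with any linear solver."""
        return {key: solver(self.A[key], self.v[key]) for key in self.A}


def learn(items: Iterable[object],
          encode: Callable[[object], object],
          detect_mismatch: Callable[[object], Hashable],
          make_pairs: Callable[[object], Iterable[Pair]],
          num_taps: int) -> KeyedNormalEquations:
    """Steps S11-S14 as a loop over hypothetical stand-ins for units 12-14."""
    eqs = KeyedNormalEquations(num_taps)
    for item in items:                          # S14: repeat while data remains
        coded = encode(item)                    # S11: encoding unit 12
        mismatch = detect_mismatch(coded)       # S12: mismatch detection unit 13
        for taps, teacher in make_pairs(item):  # S13: teacher/student pairs
            eqs.add(mismatch, taps, teacher)
    return eqs
```

Keeping separate sums per key lets a decoder later select a coefficient set using the same mismatch information. The `solver` passed to `solve_all` can be any routine for the resulting linear systems.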

Next, the decoding device and the learning device will be described in detail for the case where the encoded data is speech data encoded by the CELP method. In this embodiment, the decoding device and the learning device use class classification adaptive processing, which the present applicant proposed previously.

Class classification adaptive processing consists of class classification processing and adaptive processing. The class classification processing divides data into classes according to its properties, and the adaptive processing is then performed separately for each class.

The adaptive processing is explained here using the example of converting low-sound-quality speech (hereinafter referred to as low-quality speech where appropriate) into high-sound-quality speech (hereinafter referred to as high-quality speech where appropriate).

In this case, the adaptive processing computes predicted values of the speech samples of high-quality speech, i.e. speech whose sound quality has been improved relative to the low-quality speech. Each predicted value is a linear combination of predetermined tap coefficients and speech samples constituting the low-quality speech (hereinafter referred to as low-quality speech samples where appropriate). Computing these predicted values yields speech with improved sound quality.

Specifically, suppose that certain high-quality speech data is used as teacher data. Low-quality speech data, obtained by degrading its sound quality, is used as student data. Consider obtaining the predicted value $E[y]$ of a speech sample $y$ of the high-quality speech (hereinafter referred to as a high-quality speech sample where appropriate) with a linear first-order combination model. The model is defined by the linear combination of a set of several low-quality speech samples $x_1, x_2, \ldots$ and predetermined tap coefficients $w_1, w_2, \ldots$. In this case, the predicted value $E[y]$ can be expressed by the following equation.

$$E[y] = w_1 x_1 + w_2 x_2 + \cdots \qquad (1)$$

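To generalize equation (1), define the matrix $X$ consisting of the set of student data $x_{ij}$, the matrix $W$ consisting of the set of tap coefficients $w_j$, and the matrix $Y'$ consisting of the set of predicted values $E[y_i]$ as

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1J} \\ x_{21} & x_{22} & \cdots & x_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ x_{I1} & x_{I2} & \cdots & x_{IJ} \end{pmatrix}, \quad W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_J \end{pmatrix}, \quad Y' = \begin{pmatrix} E[y_1] \\ E[y_2] \\ \vdots \\ E[y_I] \end{pmatrix}$$

Then the following observation equation holds:

$$XW = Y' \qquad (2)$$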

Here, the component $x_{ij}$ of matrix $X$ denotes the j-th student data in the i-th set of student data, i.e. the set used to predict the i-th teacher data $y_i$. The component $w_j$ of matrix $W$ is the tap coefficient by which the j-th student data in that set is multiplied. $y_i$ denotes the i-th teacher data, so $E[y_i]$ is the predicted value of the i-th teacher data. The $y$ on the left side of equation (1) is $y_i$ with the suffix $i$ omitted. Likewise, $x_1, x_2, \ldots$ on the right side of equation (1) are the components $x_{ij}$ of matrix $X$ with the suffix $i$ omitted.

Consider applying the least squares method to the observation equation (2) to obtain predicted values $E[y]$ close to the high-quality speech samples $y$. Define the matrix $Y$, consisting of the set of true values $y$ of the high-quality speech samples serving as teacher data. Define also the matrix $E$, consisting of the set of residuals $e$ of the predicted values $E[y]$, i.e. their errors relative to the true values $y$:

$$E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_I \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{pmatrix}$$

Then, from equation (2), the following residual equation holds:

$$XW = Y + E \qquad (3)$$

In this case, the tap coefficients $w_j$ for obtaining predicted values $E[y]$ close to the high-quality speech samples $y$ can be found by minimizing the squared error

$$\sum_{i=1}^{I} e_i^2$$

The optimal tap coefficients $w_j$ for obtaining predicted values $E[y]$ close to $y$ are therefore those at which the derivative of this squared error with respect to $w_j$ is zero. Since $\partial\big(\sum_i e_i^2\big)/\partial w_j = 2\sum_i e_i\,\partial e_i/\partial w_j$, these are the $w_j$ satisfying

$$e_1 \frac{\partial e_1}{\partial w_j} + e_2 \frac{\partial e_2}{\partial w_j} + \cdots + e_I \frac{\partial e_I}{\partial w_j} = 0 \quad (j = 1, 2, \ldots, J) \qquad (4)$$

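First, equation (3) gives the residuals as $e_i = x_{i1}w_1 + x_{i2}w_2 + \cdots + x_{iJ}w_J - y_i$. Differentiating them with respect to the tap coefficients $w_j$ gives

$$\frac{\partial e_i}{\partial w_1} = x_{i1}, \quad \frac{\partial e_i}{\partial w_2} = x_{i2}, \quad \ldots, \quad \frac{\partial e_i}{\partial w_J} = x_{iJ} \quad (i = 1, 2, \ldots, I) \qquad (5)$$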

From equations (4) and (5), equation (6) is obtained:

$$\sum_{i=1}^{I} e_i x_{i1} = 0, \quad \sum_{i=1}^{I} e_i x_{i2} = 0, \quad \ldots, \quad \sum_{i=1}^{I} e_i x_{iJ} = 0 \qquad (6)$$

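Furthermore, equation (6) can be rewritten using the relationship in residual equation (3) among the student data $x_{ij}$, tap coefficients $w_j$, teacher data $y_i$, and residuals $e_i$. Substituting $e_i = \sum_{k} x_{ik} w_k - y_i$ into equation (6) yields the following normal equations:

$$\begin{aligned}
\Big(\sum_{i=1}^{I} x_{i1}x_{i1}\Big)w_1 + \Big(\sum_{i=1}^{I} x_{i1}x_{i2}\Big)w_2 + \cdots + \Big(\sum_{i=1}^{I} x_{i1}x_{iJ}\Big)w_J &= \sum_{i=1}^{I} x_{i1}y_i \\
\Big(\sum_{i=1}^{I} x_{i2}x_{i1}\Big)w_1 + \Big(\sum_{i=1}^{I} x_{i2}x_{i2}\Big)w_2 + \cdots + \Big(\sum_{i=1}^{I} x_{i2}x_{iJ}\Big)w_J &= \sum_{i=1}^{I} x_{i2}y_i \\
&\;\;\vdots \\
\Big(\sum_{i=1}^{I} x_{iJ}x_{i1}\Big)w_1 + \Big(\sum_{i=1}^{I} x_{iJ}x_{i2}\Big)w_2 + \cdots + \Big(\sum_{i=1}^{I} x_{iJ}x_{iJ}\Big)w_J &= \sum_{i=1}^{I} x_{iJ}y_i
\end{aligned} \qquad (7)$$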

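The normal equations (7) can be written compactly. Define the matrix (covariance matrix) $A$ and the vector $v$ as

$$A = \begin{pmatrix} \sum_{i=1}^{I} x_{i1}x_{i1} & \sum_{i=1}^{I} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{I} x_{i1}x_{iJ} \\ \sum_{i=1}^{I} x_{i2}x_{i1} & \sum_{i=1}^{I} x_{i2}x_{i2} & \cdots & \sum_{i=1}^{I} x_{i2}x_{iJ} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{I} x_{iJ}x_{i1} & \sum_{i=1}^{I} x_{iJ}x_{i2} & \cdots & \sum_{i=1}^{I} x_{iJ}x_{iJ} \end{pmatrix}, \quad v = \begin{pmatrix} \sum_{i=1}^{I} x_{i1}y_i \\ \sum_{i=1}^{I} x_{i2}y_i \\ \vdots \\ \sum_{i=1}^{I} x_{iJ}y_i \end{pmatrix}$$

With the vector $W$ of tap coefficients defined as for equation (2), the normal equations become

$$AW = v \qquad (8)$$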

Preparing a sufficient number of sets of student data $x_{ij}$ and teacher data $y_i$ yields as many normal equations (7) as there are tap coefficients $w_j$ to be found, namely $J$. Solving equation (8) for the vector $W$ then gives the optimal tap coefficients $w_j$. This requires the matrix $A$ in equation (8) to be regular (nonsingular). Equation (8) can be solved by, for example, the sweep-out method (Gauss-Jordan elimination).
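Equations (7) and (8) can be illustrated with a minimal pure-Python sketch. It accumulates $A$ and $v$ from student/teacher pairs and then solves $AW = v$ by Gauss-Jordan (sweep-out) elimination. Partial pivoting is an addition of this sketch for numerical stability, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_normal_equations(X: Sequence[Sequence[float]],
                           y: Sequence[float]) -> Tuple[Matrix, List[float]]:
    """Accumulate A[j][k] = sum_i x_ij * x_ik and v[j] = sum_i x_ij * y_i (equation (7))."""
    J = len(X[0])
    A = [[0.0] * J for _ in range(J)]
    v = [0.0] * J
    for row, target in zip(X, y):
        for j in range(J):
            v[j] += row[j] * target
            for k in range(J):
                A[j][k] += row[j] * row[k]
    return A, v


def gauss_jordan_solve(A: Sequence[Sequence[float]], v: Sequence[float],
                       eps: float = 1e-12) -> List[float]:
    """Solve A W = v (equation (8)) by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    M = [[float(a) for a in A[r]] + [float(v[r])] for r in range(n)]  # augmented [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            raise ValueError("A is singular: more (or more varied) learning data is needed")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [a / p for a in M[col]]           # scale the pivot row to 1
        for r in range(n):                          # sweep the column out of other rows
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]
```

For example, the student rows `[[1, 0], [0, 1], [1, 1], [2, 1]]` with teacher values `[2, -1, 1, 3]` (generated exactly by $y = 2x_1 - x_2$) give $A = \begin{pmatrix}6 & 3\\3 & 3\end{pmatrix}$, $v = (9, 3)^T$ and recover $W = (2, -1)^T$. If $A$ is singular the solver raises, which mirrors the regularity condition on $A$.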

As described above, adaptive processing works in two stages. It first learns the optimal tap coefficients $w_j$ from student data and teacher data; here, these are the tap coefficients that minimize the sum of squared errors of the teacher-data predictions computed from the student data. It then uses those tap coefficients in equation (1) to obtain predicted values $E[y]$ close to the teacher data $y$.

Adaptive processing differs from simple interpolation in that it reproduces components contained in the high-quality speech but not in the low-quality speech. Judged by equation (1) alone, adaptive processing looks the same as simple interpolation with a so-called interpolation filter. However, the tap coefficients $w$, which correspond to the interpolation filter's coefficients, are obtained by learning with the teacher data $y$. As a result, components contained in high-quality speech can be reproduced. In this sense, adaptive processing can be said to have a speech-creating effect.

In the case above, high-quality speech data is used as teacher data, and speech data obtained by lowering the sound quality of that teacher data is used as student data. Alternatively, high-quality image data may be used as teacher data. The student data is then the teacher image degraded, for example, by thinning out pixels, adding noise, or applying a low-pass filter. In this case, the learning yields tap coefficients that convert a low-quality image into (a predicted value of) a high-quality image.

As another example, high-quality image data can be used as teacher data, with two-dimensional DCT coefficients as student data. These coefficients are obtained by applying a two-dimensional DCT to the teacher image data and then quantizing and dequantizing the result. In this case, the learning yields tap coefficients that convert two-dimensional DCT coefficients into (a predicted value of) a high-quality image.
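The sketch below shows how such student data might be produced for one square block of teacher pixels. It applies an orthonormal two-dimensional DCT-II and then quantizes and dequantizes the coefficients. It assumes a single uniform quantization step for every coefficient, which is a simplification: MPEG video uses quantization matrices and a quantizer scale. The default step value is arbitrary.

```python
import math
from typing import List, Sequence

Block = List[List[float]]


def dct2(block: Sequence[Sequence[float]]) -> Block:
    """Orthonormal 2-D DCT-II of a square N x N block (direct O(N^4) form)."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize_dequantize(coeffs: Sequence[Sequence[float]], step: float) -> Block:
    """Uniform quantization then dequantization (assumed single step size).
    Python's round() rounds halves to the nearest even integer."""
    return [[round(c / step) * step for c in row] for row in coeffs]


def make_dct_student(teacher_block: Sequence[Sequence[float]], step: float = 16.0) -> Block:
    """Student data for one block: 2-D DCT, then quantize and dequantize."""
    return quantize_dequantize(dct2(teacher_block), step)
```

For a flat 8 x 8 block of value 10, only the DC coefficient survives: the orthonormal DC term is $8 \times 10 = 80$, which the step of 16 represents exactly.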

In the case above, the predicted value of high-quality speech is obtained by linear first-order prediction. The predicted value can also be obtained with an equation of second or higher order.

FIG. 6 shows a configuration example of a speech data processing device that converts low-quality speech data into high-quality speech data by the class classification adaptive processing described above.

The low-quality speech data is supplied to the pitch detection unit 21 and to the tap extraction units 22 and 23.

The pitch detection unit 21 detects the pitch period of the low-quality speech data supplied to it and supplies that pitch period to the tap extraction units 22 and 23.

The tap extraction unit 22 takes each speech sample of the high-quality speech data in turn as the data of interest. It extracts, as a prediction tap, several speech samples of the low-quality speech data that are used to predict that data of interest. The tap extraction unit 23 extracts, as a class tap, several speech samples of the low-quality speech data that are used to classify the data of interest.

Specifically, the tap extraction unit 22 extracts as the prediction tap several low-quality speech samples located near the speech sample corresponding to the data of interest. It also changes the structure of the prediction tap according to the pitch period supplied from the pitch detection unit 21 for that position. That is, it changes which low-quality speech samples make up the prediction tap depending on the pitch period. For example, when the pitch period is long, the tap extraction unit 22 extracts a predetermined number of speech samples spread over a relatively wide range around the speech sample corresponding to the data of interest. When the pitch period is short, it extracts a predetermined number of speech samples spread over a relatively narrow range.
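The pitch-dependent tap structure can be sketched as follows. A fixed number of taps is taken around the sample corresponding to the data of interest, and the spacing between taps widens when the pitch period is long. The threshold (80 samples), the two strides (1 and 2), the five-tap size and the edge clamping are all hypothetical choices for illustration. The text specifies only that a long pitch period spreads the taps over a wider range and a short one over a narrower range.

```python
from typing import List, Sequence


def extract_prediction_taps(samples: Sequence[float], center: int, pitch_period: int,
                            num_taps: int = 5, long_pitch_threshold: int = 80
                            ) -> List[float]:
    """Hypothetical pitch-dependent prediction tap.

    A long pitch period widens the span (stride 2); a short one keeps it
    narrow (stride 1). Indices outside the signal are clamped to the edges.
    """
    stride = 2 if pitch_period >= long_pitch_threshold else 1
    half = num_taps // 2
    last = len(samples) - 1
    return [samples[min(max(center + i * stride, 0), last)]
            for i in range(-half, num_taps - half)]
```

Because the number of taps stays fixed, the same tap coefficients layout can be used for every pitch period; only the positions sampled change.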

The tap extraction unit 23 extracts class taps from the low-quality speech data in the same manner as the tap extraction unit 22.

To simplify the description, the prediction tap and the class tap are assumed here to have the same tap structure. They can, however, have different tap structures.

The prediction tap obtained by the tap extraction unit 22 is supplied to the prediction unit 26, and the class tap obtained by the tap extraction unit 23 is supplied to the class classification unit 24.

The class classification unit 24 classifies the data of interest on the basis of the class tap from the tap extraction unit 23. It outputs the class code corresponding to the resulting class to the coefficient memory 25.

Class classification can use, for example, ADRC (Adaptive Dynamic Range Coding).

In the ADRC-based method, the speech samples constituting the class tap are subjected to ADRC processing. The class of the data of interest is then determined from the resulting ADRC code.

In K-bit ADRC, for example, the maximum value MAX and minimum value MIN of the speech samples constituting the class tap are detected. DR = MAX - MIN is taken as the local dynamic range of the set, and the class-tap samples are requantized to K bits on the basis of DR. That is, MIN is subtracted from each speech sample in the class tap, and the difference is divided (quantized) by DR/2^K. The resulting K-bit values of the class-tap samples are arranged in a predetermined order, and the bit string is output as the ADRC code. When a class tap undergoes 1-bit ADRC processing, for instance, MIN is subtracted from each speech sample and the difference is divided by DR/2 (half the dynamic range), discarding the fractional part. In effect, each sample is binarized against the midpoint (MAX + MIN)/2, reducing each speech sample to one bit. The bit string in which these 1-bit samples are arranged in a predetermined order is output as the ADRC code.
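A minimal sketch of K-bit ADRC class coding as described above, assuming integer sample values. Each sample is requantized as floor((x - MIN) / (DR / 2^K)), and the K-bit values are concatenated in tap order. Two details are assumptions of this sketch. First, a sample equal to MAX, which would otherwise map to 2^K, is clamped to the top level 2^K - 1. Second, a flat class tap (DR = 0) maps every sample to 0.

```python
from typing import Sequence


def adrc_class_code(class_tap: Sequence[int], k: int = 1) -> int:
    """K-bit ADRC code of a class tap (first tap -> most significant bits)."""
    mn, mx = min(class_tap), max(class_tap)
    dr = mx - mn                       # local dynamic range DR = MAX - MIN
    levels = 1 << k                    # 2^K requantization levels
    code = 0
    for x in class_tap:
        if dr == 0:
            q = 0                      # assumption: flat tap -> level 0
        else:
            # floor((x - MIN) / (DR / 2^K)), computed exactly in integers;
            # clamp so that x == MAX lands in the top level (assumption)
            q = min((x - mn) * levels // dr, levels - 1)
        code = (code << k) | q
    return code
```

For example, the 1-bit code of the tap `[10, 50, 30, 90]` is `0b0101 = 5`: the midpoint is 50, so only 50 and 90 map to 1. With N = 8 taps, 1-bit ADRC yields at most 2^8 = 256 classes. Using raw 8-bit samples directly as the class code would instead allow (2^8)^8 = 2^64 codes.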

The class classification unit 24 could, for example, output the level distribution pattern of the class-tap speech samples directly as the class code. In that case, however, suppose the class tap consists of N speech samples with K bits assigned to each. The number of possible class codes output by the class classification unit 24 is then (2^N)^K = 2^(NK). This is an enormous number that grows exponentially with the sample bit count K.

It is therefore preferable for the class classification unit 24 to perform class classification after compressing the information content of the class tap, for example by the ADRC processing described above or by vector quantization.

The coefficient memory 25 stores, at the address corresponding to each class code, the tap coefficients of the class corresponding to that class code, and supplies the tap coefficients stored at the address corresponding to the class code supplied from the class classification unit 24 to the prediction unit 26.

The prediction unit 26 acquires the prediction tap output by the tap extraction unit 22 and the tap coefficients output by the coefficient memory 25, and uses them to perform the linear prediction operation shown in Expression (1). The prediction unit 26 thereby obtains and outputs (a predicted value of) the high-quality audio data serving as the data of interest.
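Expression (1) lies outside this section; the sketch below assumes it is the weighted sum $y = \sum_n w_n x_n$ of the prediction tap $x_n$ and the tap coefficients $w_n$ of the selected class. A Python dict stands in for the address-indexed coefficient memory 25; the function name is hypothetical.

```python
def predict(prediction_tap, class_code, coefficient_memory):
    """Predict one high-quality sample from a prediction tap.

    coefficient_memory maps a class code to that class's tap coefficients
    (a dict standing in for the address-indexed memory). The prediction is
    the linear combination y = sum_n w_n * x_n (assumed form of Expression (1)).
    """
    w = coefficient_memory[class_code]
    if len(w) != len(prediction_tap):
        raise ValueError("tap and coefficient lengths differ")
    return sum(wn * xn for wn, xn in zip(w, prediction_tap))
```

For example, with class 5 holding the coefficients `[0.25, 0.5, 0.25]`, the tap `[4, 8, 12]` is mapped to 8.0.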

Next, FIG. 7 shows an example configuration of a learning device that learns the tap coefficients to be stored in the coefficient memory 25 of FIG. 6.

High-quality audio data is input to the learning device as learning audio data. This learning audio data is supplied to the time decimation filter 31 and also to the adding unit 36 as teacher data.

The time decimation filter 31 decimates the audio samples of the high-quality audio data serving as learning audio data at a predetermined decimation rate, thereby generating low-quality audio data, which it supplies as student data to the pitch detection unit 32 and to the tap extraction units 33 and 34.
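A minimal sketch of generating student data by decimation. The section does not say how the time decimation filter 31 band-limits the signal, so a boxcar average over each block of `rate` samples is assumed here purely for illustration; trailing samples that do not fill a whole block are dropped.

```python
def decimate(samples, rate):
    """Generate student data by keeping one value per block of `rate` samples.

    A boxcar average over each block stands in for the band-limiting a
    real decimation filter would apply (assumption).
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    out = []
    for start in range(0, len(samples) - rate + 1, rate):
        block = samples[start:start + rate]
        out.append(sum(block) / rate)
    return out
```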

The pitch detection unit 32 detects the pitch period of the low-quality audio data supplied to it as student data and supplies the pitch period to the tap extraction units 33 and 34.

The tap extraction unit 33 takes the audio samples of the high-quality audio data serving as teacher data, in turn, as the data of interest, and for each item of data of interest constructs a prediction tap with the same structure as that constructed by the tap extraction unit 22 of FIG. 6 by extracting several audio samples from the low-quality audio data supplied to it as student data. Likewise, the tap extraction unit 34 constructs, for the data of interest, a class tap with the same structure as that constructed by the tap extraction unit 23 of FIG. 6 by extracting several audio samples from the low-quality audio data supplied to it as student data.

Like the tap extraction units 22 and 23 of FIG. 6, the tap extraction units 33 and 34 change the tap structure of the prediction tap and the class tap, respectively, according to the pitch period, supplied from the pitch detection unit 32, at the position corresponding to the data of interest.

The prediction tap obtained by the tap extraction unit 33 is supplied to the adding unit 36, and the class tap obtained by the tap extraction unit 34 is supplied to the class classification unit 35.

As with the class classification unit 24 of FIG. 6, the class classification unit 35 classifies the data of interest on the basis of the class tap from the tap extraction unit 34 and outputs the class code corresponding to the resulting class to the adding unit 36.

For each class code supplied from the class classification unit 35, the adding unit 36 performs summation over the teacher data serving as the data of interest (among the teacher data supplied to it) and the student data making up the prediction tap supplied from the tap extraction unit 33.

That is, for each class corresponding to the class code supplied from the class classification unit 35, the adding unit 36 uses the prediction tap (student data) to perform the operations corresponding to the multiplications of student data by one another ($x_{in} x_{im}$) and the summation (Σ) that make up each component of the matrix A in Expression (8).

Furthermore, for each class corresponding to the class code supplied from the class classification unit 35, the adding unit 36 likewise uses the prediction tap (student data) and the data of interest (teacher data) to perform the operations corresponding to the multiplication of student data by teacher data ($x_{in} y_i$) and the summation (Σ) that make up each component of the vector v in Expression (8).

In other words, the adding unit 36 stores, in a built-in memory (not shown), the components of the matrix A and the vector v of Expression (8) obtained for the teacher data previously taken as data of interest. To each component of A or v it adds the corresponding term $x_{in} x_{im}$ or $x_{in} y_i$, calculated from the teacher data $y_i$ and the student data $x_{in}$ ($x_{im}$) for the teacher data newly taken as data of interest (that is, it performs the additions represented by the summations in A and v).

By performing the above summation with all of the teacher data supplied to it taken as data of interest, the adding unit 36 sets up the normal equation shown in Expression (8) for each class and supplies the normal equations to the tap coefficient calculation unit 37.
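The per-class summation can be sketched as a set of running sums. Expression (8) itself is outside this section, so the sketch assumes only what the text states: $A_{nm} = \sum_i x_{in} x_{im}$ and $v_n = \sum_i x_{in} y_i$, accumulated separately for each class code. The class name `NormalEquationAccumulator` is hypothetical.

```python
from collections import defaultdict


class NormalEquationAccumulator:
    """Per-class running sums for the normal equations A w = v.

    For each (class code, prediction tap x, teacher sample y) it adds
    x_n * x_m into A[class][n][m] and x_n * y into v[class][n].
    """

    def __init__(self, num_taps):
        self.n = num_taps
        self.A = defaultdict(lambda: [[0.0] * num_taps for _ in range(num_taps)])
        self.v = defaultdict(lambda: [0.0] * num_taps)
        self.count = defaultdict(int)  # samples seen per class

    def add(self, class_code, tap, teacher):
        A, v = self.A[class_code], self.v[class_code]
        for i in range(self.n):
            v[i] += tap[i] * teacher
            for j in range(self.n):
                A[i][j] += tap[i] * tap[j]
        self.count[class_code] += 1
```

Keeping only these sums, rather than every (tap, teacher) pair, is what lets the learning device process arbitrarily long learning data with fixed memory per class.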

The tap coefficient calculation unit 37 obtains and outputs the tap coefficients for each class by solving the normal equation for each class supplied from the adding unit 36. The coefficient memory 25 of FIG. 6 stores the per-class tap coefficients obtained in this way.

Because of, for example, an insufficient number of samples of input learning audio data, there may be classes for which the number of normal equations needed to obtain the tap coefficients cannot be obtained. For such classes, the tap coefficient calculation unit 37 outputs, for example, default tap coefficients.
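A sketch of solving one class's normal equation with a fallback to default coefficients. Gaussian elimination with partial pivoting is used here as one possible solver (the section does not name a method), and a near-zero pivot is treated as the "not enough equations" case; the threshold `1e-12` and the function name are assumptions.

```python
def solve_tap_coefficients(A, v, default, eps=1e-12):
    """Solve A w = v by Gaussian elimination with partial pivoting.

    Returns a copy of `default` when the system is (numerically) singular,
    standing in for a class that received too few samples.
    """
    n = len(v)
    # Augmented matrix copy so the caller's sums are left untouched.
    M = [[float(x) for x in A[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return list(default)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / M[r][r]
    return w
```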

Next, encoding and decoding of audio data by the CELP method will be described with reference to FIGS. 8 and 9. In the broad sense, CELP methods include VSELP (Vector Sum Excited Linear Prediction), PSI-CELP (Pitch Synchronous Innovation CELP), CS-ACELP (Conjugate Structure Algebraic CELP), and others; the VSELP method is used as the example here.

FIG. 8 shows an example configuration of a VSELP encoding device that encodes audio data by the VSELP method.

The audio to be encoded is input to a microphone 41, where it is converted into an electrical audio signal and supplied to an A/D (Analog/Digital) converter 42. The A/D converter 42 converts the analog audio signal from the microphone 41 into a digital audio signal by sampling it at a sampling frequency of, for example, 8 kHz, further quantizes it with a predetermined number of bits, and supplies it to a computing unit 43 and an LPC (Linear Prediction Coefficient) analysis unit 44.

The LPC analysis unit 44 performs LPC analysis on the audio signal from the A/D converter 42 for each frame of, for example, 160 samples, and obtains the P-th order linear prediction coefficients $\alpha_1, \alpha_2, \ldots, \alpha_P$. The LPC analysis unit 44 then supplies a vector whose elements are these linear prediction coefficients $\alpha_p$ ($p = 1, 2, \ldots, P$) to the vector quantization unit 45 as a feature vector of the audio.

The vector quantization unit 45 stores a codebook that associates code vectors, whose elements are linear prediction coefficients, with codes. On the basis of this codebook, it vector-quantizes the feature vector α from the LPC analysis unit 44 and supplies the code obtained as a result (hereinafter referred to as the A code (A_code) where appropriate) to the code determination unit 55.

The vector quantization unit 45 also supplies the linear prediction coefficients $\alpha_1', \alpha_2', \ldots, \alpha_P'$, which are the elements of the code vector α′ corresponding to the A code output to the code determination unit 55, to the speech synthesis filter 46.
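The codebook lookup performed by the vector quantization unit 45 can be sketched as a nearest-neighbour search. The distortion measure is not specified in this section; plain squared Euclidean distance between coefficient vectors is assumed here, and the list index plays the role of the A code.

```python
def vector_quantize(feature, codebook):
    """Return (A code, code vector) for the codeword nearest to `feature`.

    `codebook` is a list of code vectors of linear prediction coefficients;
    the list index serves as the A code. "Nearest" means smallest squared
    Euclidean distance (assumed distortion measure).
    """
    def dist2(cv):
        return sum((f - c) ** 2 for f, c in zip(feature, cv))

    a_code = min(range(len(codebook)), key=lambda i: dist2(codebook[i]))
    return a_code, codebook[a_code]
```

The returned code vector corresponds to α′, which is what the speech synthesis filter receives instead of the unquantized α.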

The speech synthesis filter 46 is, for example, an IIR (Infinite Impulse Response) digital filter. It performs speech synthesis using the linear prediction coefficients $\alpha_p'$ ($p = 1, 2, \ldots, P$) from the vector quantization unit 45 as the filter coefficients (tap coefficients) of the IIR filter and the residual signal e supplied from the computing unit 54 as the input signal.

That is, the LPC analysis performed by the LPC analysis unit 44 assumes that the sample value $s_n$ of the audio signal at the current time n and the P past sample values $s_{n-1}, s_{n-2}, \ldots, s_{n-P}$ adjacent to it satisfy the linear combination

$$s_n + \alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P} = e_n \tag{9}$$

When the sample value $s_n$ at the current time n is linearly predicted from the past P sample values $s_{n-1}, s_{n-2}, \ldots, s_{n-P}$ as the predicted value (linear prediction value)

$$s_n' = -(\alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P}) \tag{10}$$

the analysis obtains the linear prediction coefficients $\alpha_p$ that minimize the squared error between the actual sample value $s_n$ and the linear prediction value $s_n'$.

Here, in Expression (9), $\{e_n\}$ $(\ldots, e_{n-1}, e_n, e_{n+1}, \ldots)$ are mutually uncorrelated random variables with mean 0 and variance equal to a predetermined value $\sigma^2$.

From Expression (9), the sample value $s_n$ can be expressed as

$$s_n = e_n - (\alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P}) \tag{11}$$

Taking the Z-transform of this gives

$$S = \frac{E}{1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}} \tag{12}$$

where S and E denote the Z-transforms of $s_n$ and $e_n$ in Expression (11), respectively.

Here, from Expressions (9) and (10), $e_n$ can be expressed as

$$e_n = s_n - s_n' \tag{13}$$

and is called the residual signal between the actual sample value $s_n$ and the linear prediction value $s_n'$.

Therefore, from Expression (12), the audio signal $s_n$ can be obtained by using the linear prediction coefficients $\alpha_p$ as the tap coefficients of an IIR filter and the residual signal $e_n$ as the input signal of that IIR filter.
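In the time domain, the all-pole filter of Expression (12) is the recursion of Expression (11). A minimal sketch, assuming the filter state starts at zero (a real codec would carry the state across frames):

```python
def synthesize(residual, alpha):
    """All-pole (IIR) synthesis: s_n = e_n - sum_p alpha_p * s_{n-p} (Expression (11)).

    The filter memory (past outputs) starts at zero (assumption).
    """
    P = len(alpha)
    past = [0.0] * P  # past[-p] holds s_{n-p}
    out = []
    for e in residual:
        s = e - sum(alpha[p - 1] * past[-p] for p in range(1, P + 1))
        out.append(s)
        past.append(s)
    return out
```

Feeding a unit impulse through the filter with $\alpha_1 = -0.9$ yields the decaying sequence 1.0, 0.9, 0.81, 0.729, which is the inverse of the residual computation of Expression (13).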

Accordingly, as described above, the speech synthesis filter 46 uses the linear prediction coefficients $\alpha_p'$ from the vector quantization unit 45 as tap coefficients and the residual signal e supplied from the computing unit 54 as the input signal, computes Expression (12) (that is, filters the residual signal e), and obtains the audio signal (synthesized sound signal) ss.

Note that the speech synthesis filter 46 uses as filter coefficients not the linear prediction coefficients $\alpha_p$ obtained by the LPC analysis in the LPC analysis unit 44, but the linear prediction coefficients $\alpha_p'$ forming the code vector that corresponds to the code obtained by vector-quantizing them. The synthesized sound signal output by the speech synthesis filter 46 therefore generally does not coincide with the audio signal output by the A/D converter 42.

The synthesized sound signal ss output by the speech synthesis filter 46 is supplied to the computing unit 43. The computing unit 43 subtracts the audio signal s output by the A/D converter 42 from the synthesized sound signal ss and supplies the difference to the square error calculation unit 47. The square error calculation unit 47 computes the sum of squares of the differences from the computing unit 43 (the sum of squares over the sample values of the k-th frame) and supplies the resulting squared error to the square error minimum determination unit 48.

The square error minimum determination unit 48 stores, in association with the squared error output by the square error calculation unit 47, an L code (L_code) representing the lag, a G code (G_code) representing the gain, and an I code (I_code) representing the codeword, and outputs the L code, G code, and I code corresponding to the squared error output by the square error calculation unit 47. The L code is supplied to the adaptive codebook storage unit 49, the G code to the gain decoder 50, and the I code to the excitation codebook storage unit 51. The L code, G code, and I code are also supplied to the code determination unit 55.

The adaptive codebook storage unit 49 stores an adaptive codebook that associates, for example, 7-bit L codes with predetermined delay times (lags). It delays the residual signal e supplied from the computing unit 54 by the delay time associated with the L code supplied from the square error minimum determination unit 48 and outputs the result to the computing unit 52.

Because the adaptive codebook storage unit 49 outputs the residual signal e delayed by the time corresponding to the L code, its output is close to a periodic signal whose period is that delay time. In speech synthesis using linear prediction coefficients, this signal serves mainly as the drive signal for generating synthesized voiced sound. The time corresponding to the L code therefore represents the pitch period of the voiced sound.
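The delay-and-repeat behaviour described above can be sketched as follows. Reusing samples generated within the same call when the lag is shorter than the output length is a common CELP convention and is assumed here; the function name is hypothetical.

```python
def adaptive_codebook_vector(past_excitation, lag, length):
    """Return `length` samples of the past excitation delayed by `lag`.

    When the lag is shorter than the output, samples generated in this
    call are reused, so the output repeats with period `lag` (a common
    CELP convention, assumed here).
    """
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must be in 1..len(past_excitation)")
    buf = list(past_excitation)
    start = len(buf) - lag
    out = []
    for i in range(length):
        sample = buf[start + i]
        out.append(sample)
        buf.append(sample)  # make the new sample available for short lags
    return out
```

With past excitation `[0, 0, 1, 2, 3]`, a lag of 3, and 7 output samples, the result is `[1, 2, 3, 1, 2, 3, 1]`: a signal with period 3, which is why the lag plays the role of the pitch period.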

The gain decoder 50 stores a table associating G codes with predetermined gains β and γ, and outputs the gains β and γ associated with the G code supplied from the square error minimum determination unit 48. The gains β and γ are supplied to the computing units 52 and 53, respectively.

The excitation codebook storage unit 51 stores an excitation codebook associating, for example, 9-bit I codes with predetermined excitation signals, and outputs the excitation signal associated with the I code supplied from the square error minimum determination unit 48 to the computing unit 53.

The excitation signals stored in the excitation codebook are, for example, signals close to white noise, and in speech synthesis using linear prediction coefficients they serve mainly as drive signals for generating synthesized unvoiced sound.

The computing unit 52 multiplies the output signal of the adaptive codebook storage unit 49 by the gain β output by the gain decoder 50 and supplies the product l to the computing unit 54. The computing unit 53 multiplies the output signal of the excitation codebook storage unit 51 by the gain γ output by the gain decoder 50 and supplies the product n to the computing unit 54. The computing unit 54 adds the product l from the computing unit 52 and the product n from the computing unit 53 and supplies the sum to the speech synthesis filter 46 as the residual signal e.
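The gain lookup and the work of the computing units 52 to 54 amount to $e = \beta \cdot (\text{adaptive}) + \gamma \cdot (\text{excitation})$. In the sketch below a dict stands in for the gain decoder's table; the gain values in the usage example are illustrative only.

```python
def build_excitation(adaptive_vec, fixed_vec, g_code, gain_table):
    """Residual (excitation) e = beta * adaptive + gamma * fixed.

    `gain_table` maps a G code to its (beta, gamma) pair, standing in for
    the table held by the gain decoder (values are illustrative).
    """
    beta, gamma = gain_table[g_code]
    l = [beta * a for a in adaptive_vec]      # product l (computing unit 52)
    n = [gamma * f for f in fixed_vec]        # product n (computing unit 53)
    return [li + ni for li, ni in zip(l, n)]  # sum e (computing unit 54)
```

For example, `build_excitation([1, 2], [1, -1], 0, {0: (0.5, 2.0)})` gives `[2.5, -1.0]`.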

In the speech synthesis filter 46, the residual signal e supplied from the computing unit 54 in this way is filtered by the IIR filter whose tap coefficients are the linear prediction coefficients $\alpha_p'$ supplied from the vector quantization unit 45, and the resulting synthesized sound signal is supplied to the computing unit 43. The computing unit 43 and the square error calculation unit 47 then perform the same processing as described above, and the resulting squared error is supplied to the square error minimum determination unit 48.

The square error minimum determination unit 48 determines whether the squared error from the square error calculation unit 47 has reached a minimum (local minimum). If it determines that the squared error has not reached a minimum, it outputs the L code, G code, and I code corresponding to that squared error as described above, and the same processing is repeated.

If, on the other hand, the square error minimum determination unit 48 determines that the squared error has reached a minimum (for example, when the squared error has fallen to or below a predetermined threshold), it outputs a confirmation signal to the code determination unit 55. The code determination unit 55 latches the A code supplied from the vector quantization unit 45 and successively latches the L code, G code, and I code supplied from the square error minimum determination unit 48; on receiving the confirmation signal from the square error minimum determination unit 48, it supplies the A code, L code, G code, and I code latched at that time to the channel encoder 56. The channel encoder 56 multiplexes the A code, L code, G code, and I code from the code determination unit 55 and outputs them as encoded data.
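The closed loop formed by units 43 to 54 is an analysis-by-synthesis search. The sketch below simplifies it to an exhaustive search over tiny, hypothetical codebooks that keeps the (L, G, I) combination with the smallest squared error; the document's unit 48 instead repeats the loop until a minimum or a threshold is reached, and practical coders search the codebooks sequentially. All codebook contents and function names here are assumptions for illustration.

```python
import itertools


def synthesize(e, alpha):
    """All-pole synthesis s_n = e_n - sum_p alpha_p s_{n-p}, zero initial state."""
    past, out = [0.0] * len(alpha), []
    for x in e:
        s = x - sum(alpha[p - 1] * past[-p] for p in range(1, len(alpha) + 1))
        out.append(s)
        past.append(s)
    return out


def search_codes(target, alpha, past_exc, lags, gain_table, excitation_codebook):
    """Exhaustive analysis-by-synthesis search over (L, G, I) codes.

    Returns (squared error, L code, G code, I code) for the combination whose
    synthesized frame is closest to `target`.
    """
    N = len(target)
    best = None
    for (l_code, lag), g_code, i_code in itertools.product(
            enumerate(lags), range(len(gain_table)), range(len(excitation_codebook))):
        # Adaptive codebook output: past excitation delayed by `lag`, repeated.
        buf = list(past_exc)
        adaptive = []
        for i in range(N):
            adaptive.append(buf[len(past_exc) - lag + i])
            buf.append(adaptive[-1])
        beta, gamma = gain_table[g_code]
        e = [beta * a + gamma * c for a, c in zip(adaptive, excitation_codebook[i_code])]
        ss = synthesize(e, alpha)  # synthesized sound signal for this candidate
        err = sum((y - x) ** 2 for y, x in zip(ss, target))
        if best is None or err < best[0]:
            best = (err, l_code, g_code, i_code)
    return best
```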

For simplicity, the following description assumes that the A code, L code, G code, and I code are obtained for each frame. It is also possible, however, to divide one frame into, for example, four subframes and obtain the L code, G code, and I code for each subframe.

In FIG. 8 (and likewise in FIGS. 9 to 11, described later), [k] is appended to each variable, making it an array variable. This k denotes the frame number, but it is omitted from the description in this specification where appropriate.

Next, FIG. 9 shows an example configuration of a VSELP decoding device that decodes, by the VSELP method, the encoded data output by the VSELP encoding device of FIG. 8.

The encoded data output by the VSELP encoding device of FIG. 8 is supplied to a channel decoder 61. The channel decoder 61 separates the L code, G code, I code, and A code from the encoded data and supplies them to an adaptive codebook storage unit 62, a gain decoder 63, an excitation codebook storage unit 64, and a filter coefficient decoder 65, respectively.

The adaptive codebook storage unit 62, the gain decoder 63, the excitation codebook storage unit 64, and the computing units 66 to 68 are configured in the same way as the adaptive codebook storage unit 49, the gain decoder 50, the excitation codebook storage unit 51, and the computing units 52 to 54 of FIG. 8, respectively, and perform the same processing as described for FIG. 8, whereby the L code, G code, and I code are decoded into the residual signal e. This residual signal e is given to the speech synthesis filter 69 as its input signal.

The filter coefficient decoder 65 stores the same codebook as the vector quantization unit 45 of FIG. 8, decodes the A code into the linear prediction coefficients $\alpha_p'$, and supplies them to the speech synthesis filter 69.

The speech synthesis filter 69 is configured in the same way as the speech synthesis filter 46 of FIG. 8. Using the linear prediction coefficients $\alpha_p'$ from the filter coefficient decoder 65 as filter coefficients (tap coefficients) and the residual signal e supplied from the computing unit 68 as the input signal, it computes Expression (12), thereby generating the synthesized sound signal obtained when the square error minimum determination unit 48 of FIG. 8 determined that the squared error was minimized, and outputs it as decoded audio data.

As described above, the VSELP encoding device of FIG. 8 encodes and transmits the residual signal and the linear prediction coefficients to be given to the speech synthesis filter 69 of the VSELP decoding device of FIG. 9, and the VSELP decoding device of FIG. 9 decodes these codes into the residual signal and linear prediction coefficients and gives them to the speech synthesis filter 69.

However, the decoded residual signal and linear prediction coefficients (hereinafter referred to where appropriate as the decoded residual signal and the decoded linear prediction coefficients, respectively) contain errors such as quantization error (error due to vector quantization), so they do not coincide with the residual signal and linear prediction coefficients obtained by LPC analysis of the audio.

Consequently, the decoded audio data output by the speech synthesis filter 69 of the VSELP decoding device of FIG. 9 is distorted and has degraded sound quality.

By having the VSELP decoding device perform the class classification adaptive processing described above, it becomes possible to obtain decoded audio data with improved sound quality.

FIG. 10 shows an example configuration of such a VSELP decoding device. In the figure, parts corresponding to those in FIG. 9 are given the same reference numerals, and their description is omitted below where appropriate.

The tap extraction unit 81 is supplied with the decoded audio data output by the speech synthesis filter 69. Like the tap extraction unit 22 of FIG. 6, the tap extraction unit 81 extracts from the decoded audio data the sample values to be used as the prediction tap and supplies them to the prediction unit 85.

The tap extraction unit 82 is also supplied with the decoded audio data output by the speech synthesis filter 69. Like the tap extraction unit 23 of FIG. 6, the tap extraction unit 82 extracts from the decoded audio data the sample values to be used as the class tap and supplies them to the class classification unit 83.

Like the class classification unit 24 of FIG. 6, the class classification unit 83 performs class classification on the basis of the class tap supplied from the tap extraction unit 82 and supplies the class code resulting from that classification to the coefficient memory 84.

The coefficient memory 84 stores the per-class tap coefficients obtained by the learning process performed in the learning device of FIG. 11, described later, and supplies the tap coefficients stored at the address corresponding to the class code output by the class classification unit 83 to the prediction unit 85.

Like the prediction unit 26 of FIG. 6, the prediction unit 85 acquires the prediction tap output by the tap extraction unit 81 and the tap coefficients output by the coefficient memory 84 and uses them to perform the linear prediction operation shown in Expression (1). The prediction unit 85 thereby outputs high-quality audio data obtained by improving the quality of the low-quality decoded audio data output by the speech synthesis filter 69.

The tap extraction unit 81 is also supplied with the L code, G code, I code, and A code of each frame (or subframe) output by the channel decoder 61. The tap extraction unit 81 can extract prediction taps from the L code, G code, I code, or A code as well, and can also change the tap structure of the prediction tap on the basis of the L code, G code, I code, or A code.

The tap extraction unit 82 is likewise supplied with the L code, G code, I code, and A code output by the channel decoder 61. As with the tap extraction unit 81, the tap extraction unit 82 can extract class taps from the L code, G code, I code, or A code as well, and can change the tap structure of the class tap on the basis of the L code, G code, I code, or A code.

Next, FIG. 11 shows an example configuration of a learning device that performs the learning process for the tap coefficients to be stored in the coefficient memory 84 of FIG. 10.

The computing unit 93 through the code determination unit 105 are configured in the same way as the computing unit 43 through the code determination unit 55 of FIG. 8, respectively. A learning audio signal is input to the computing unit 93, so the computing unit 93 through the code determination unit 105 apply the same processing to the learning audio signal as in FIG. 8.

The tap extraction units 111 and 112 are supplied, as student data, with the decoded speech data that the speech synthesis filter 96 outputs when the square-error minimum determination unit 98 determines that the square error has been minimized. The adding unit 114 is supplied with the learning speech signal itself as teacher data.

The tap extraction unit 111 extracts, from the speech samples of the decoded speech data output by the speech synthesis filter 96, prediction taps with the same structure as those of the tap extraction unit 81 of FIG. 10, and supplies them to the adding unit 114.

Likewise, the tap extraction unit 112 extracts, from the speech samples of the decoded speech data output by the speech synthesis filter 96, class taps with the same structure as those of the tap extraction unit 82 of FIG. 10, and supplies them to the class classification unit 113.

Based on the class taps from the tap extraction unit 112, the class classification unit 113 performs the same class classification as the class classification unit 83 of FIG. 10 and supplies the resulting class code to the adding unit 114.

The adding unit 114 receives the learning speech signal as teacher data and the prediction taps from the tap extraction unit 111 as student data. For each class code from the class classification unit 113, it performs on this teacher and student data the same addition as the adding unit 36 of FIG. 7, thereby setting up the normal equation shown in Expression (8) for each class.

Like the tap coefficient calculation unit 37 of FIG. 7, the tap coefficient calculation unit 115 solves the normal equation generated for each class by the adding unit 114, thereby obtaining and outputting the tap coefficients for each class.
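Solving the per-class normal equation amounts to solving a small linear system A w = v for each class. A minimal pure-Python sketch using Gaussian elimination with partial pivoting is shown below; it returns None when a class's matrix is singular. The names `solve_normal_equation` and `solve_all_classes` and the `eps` threshold are assumptions, not the patent's implementation.

```python
def solve_normal_equation(A, v, eps=1e-12):
    """Solve A w = v by Gaussian elimination with partial pivoting.

    Returns None if A is (numerically) singular.
    """
    n = len(v)
    # Work on an augmented copy so the caller's sums are left untouched.
    M = [list(map(float, row)) + [float(b)] for row, b in zip(A, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / M[r][r]
    return w


def solve_all_classes(equations):
    """equations: {class_code: (A, v)} -> {class_code: tap coefficients or None}."""
    return {cls: solve_normal_equation(A, v) for cls, (A, v) in equations.items()}
```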

The coefficient memory 84 of FIG. 10 stores the per-class tap coefficients output by the tap coefficient calculation unit 115 as described above.

Because the tap coefficients stored in the coefficient memory 84 of FIG. 10 are obtained by learning that statistically minimizes the prediction error (square error) of the predicted values of high-quality speech computed by the linear prediction calculation, the speech data output by the prediction unit 85 of FIG. 10 has high sound quality.
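Why minimizing the square error leads to a normal equation can be shown with the standard least-squares derivation, sketched here under the assumption that Expression (1) is y = Σ_n w_n x_n. Here y_i is the i-th teacher sample, x_in is the n-th prediction tap (student data) for that sample, and the sums over i run over all learning pairs in one class. This is a reconstruction of the usual derivation, not a copy of the patent's Expression (8).

```latex
E = \sum_{i} \Bigl( y_i - \sum_{n=1}^{N} w_n x_{in} \Bigr)^{2}

\frac{\partial E}{\partial w_n}
  = -2 \sum_{i} x_{in} \Bigl( y_i - \sum_{m=1}^{N} w_m x_{im} \Bigr) = 0,
  \qquad n = 1, \dots, N

\sum_{m=1}^{N} \Bigl( \sum_{i} x_{in} x_{im} \Bigr) w_m = \sum_{i} x_{in} y_i
  \quad\Longleftrightarrow\quad A W = v,
  \qquad A_{nm} = \sum_{i} x_{in} x_{im}, \quad v_n = \sum_{i} x_{in} y_i
```

Each class contributes its own A and v, which is why the adding units keep separate sums per class code.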

The tap extraction units 111 and 112 are also supplied with the L, G, I, and A codes that the code determination unit 105 outputs when it receives the confirmation signal from the square-error minimum determination unit 98. When the tap extraction unit 81 or 82 of FIG. 10 forms prediction taps or class taps using the L, G, I, or A code, the tap extraction unit 111 or 112 likewise forms its prediction taps or class taps using the L, G, I, or A code.

Next, FIG. 12 shows a detailed configuration example of the decoding device of FIG. 3.

The encoding characteristic information extraction unit 121 is supplied with the encoded data to be decoded. It extracts from the encoded data the characteristic data contained in it and supplies that characteristic data to the determination unit 123.

The actual characteristic extraction unit 122 is also supplied with the encoded data to be decoded. It extracts the actual characteristic, that is, the real characteristic of the original data corresponding to the encoded data, and supplies it to the determination unit 123.

For example, when the encoded data is encoded speech data, the actual characteristic extraction unit 122 obtains the pitch period of that speech data as the actual characteristic. When the encoded data is encoded image data, the actual characteristic extraction unit 122 obtains, for example, an evaluation value that evaluates the motion of the image data as the actual characteristic.

The determination unit 123 judges the correctness of the characteristic data by comparing the characteristic data supplied from the encoding characteristic information extraction unit 121 with the actual characteristic supplied from the actual characteristic extraction unit 122. It then outputs mismatch information, the result of this judgment, to the decoding processing unit 2.
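For speech, the check made by the determination unit can be sketched as a comparison between an encoded pitch lag and a pitch period measured from the waveform. The sketch assumes that the characteristic data is a long-term prediction lag in samples, that the actual characteristic extraction unit first decodes the encoded data to a waveform (this section does not say how), and that the pitch is estimated by maximizing the normalized autocorrelation. The function names, the 20 to 160 sample search range, and the tolerance of 2 samples are all hypothetical.

```python
import math


def estimate_pitch_period(samples, min_lag, max_lag):
    """Hypothetical actual-characteristic extractor: the lag (in samples) that
    maximizes the normalized autocorrelation of the decoded waveform."""
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        a, b = samples[:-lag], samples[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den == 0.0:
            continue  # silent segment: correlation undefined for this lag
        r = num / den
        # Require a clear improvement so multiples of the true period, which
        # correlate almost equally well, do not win on rounding noise.
        if r > best_r + 1e-9:
            best_lag, best_r = lag, r
    return best_lag


def mismatch_info(encoded_lag, decoded, min_lag=20, max_lag=160, tolerance=2):
    """True when the characteristic data (encoded lag) contradicts the measured pitch."""
    actual = estimate_pitch_period(decoded, min_lag, max_lag)
    if actual is None:
        return False  # no measurable pitch, so nothing to contradict (assumption)
    return abs(encoded_lag - actual) > tolerance
```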

The encoding characteristic information extraction unit 121, the actual characteristic extraction unit 122, and the determination unit 123 described above constitute the mismatch detection unit 1.

The preprocessing unit 131 is supplied with the encoded data to be decoded. It applies predetermined preprocessing to the encoded data and supplies the resulting preprocessed data to the class classification adaptive processing unit 132.

The class classification adaptive processing unit 132 forms prediction taps and class taps from the preprocessed data supplied by the preprocessing unit 131 and performs the class classification adaptive processing described above by referring to the coefficient memory 141. It outputs the data obtained by this processing (hereinafter referred to as adaptive processing data) to the post-processing unit 133.

The class classification adaptive processing unit 132 is also supplied with the mismatch information output by the determination unit 123 of the mismatch detection unit 1, and it performs the class classification adaptive processing based on that mismatch information.

The post-processing unit 133 applies predetermined post-processing to the data output by the class classification adaptive processing unit 132, thereby obtaining and outputting high-quality decoded data decoded from the encoded data.

The preprocessing unit 131, the class classification adaptive processing unit 132, and the post-processing unit 133 described above constitute the decoding processing unit 2.

The coefficient memory 141 stores the per-class tap coefficients that the class classification adaptive processing unit 132 uses to perform the class classification adaptive processing.

The coefficient memory 141 constitutes the parameter storage unit 3.

Next, FIG. 13 shows a configuration example of the class classification adaptive processing unit 132 of FIG. 12.

The preprocessed data output by the preprocessing unit 131 is supplied to the tap extraction units 151 and 152.

The tap extraction unit 151 takes the adaptive processing data to be obtained as the data of interest and extracts, as prediction taps, some of the preprocessed data used to predict that data of interest. The tap extraction unit 152 extracts, as class taps, some of the preprocessed data used to classify the data of interest.

The tap extraction units 151 and 152 are also supplied with the mismatch information output by the determination unit 123 (FIG. 12), and they change the structure of the prediction taps and the class taps, respectively, based on that mismatch information.
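The source does not specify how the tap structure depends on the mismatch information. As a purely hypothetical illustration, the sketch below selects tap offsets from a table keyed by a boolean mismatch flag, using a narrow window when the characteristic data is judged correct and a wider one when it is judged incorrect. `TAP_OFFSETS` and `taps_for` are invented names, and the offsets are arbitrary.

```python
# Hypothetical tap layouts keyed by the mismatch flag
# (offsets relative to the position of interest).
TAP_OFFSETS = {
    False: (-1, 0, 1),          # characteristic data judged correct
    True: (-2, -1, 0, 1, 2),    # characteristic data judged incorrect
}


def taps_for(preprocessed, n, mismatch):
    """Extract taps around index n with the layout chosen by the mismatch flag."""
    last = len(preprocessed) - 1
    # Clamp at the edges so every offset maps to an existing sample.
    return [preprocessed[min(max(n + k, 0), last)] for k in TAP_OFFSETS[mismatch]]
```

If the mismatch flag is also folded into the class code, every class sees a single tap count, so per-class coefficient vectors of different lengths do not conflict.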

For simplicity of explanation, the prediction taps and class taps are assumed here to have the same tap structure. They can, however, have different tap structures.

The prediction taps obtained by the tap extraction unit 151 are supplied to the prediction unit 154, and the class taps obtained by the tap extraction unit 152 are supplied to the class classification unit 153.

In addition to the class taps, the class classification unit 153 is supplied with the mismatch information. Based on the class taps from the tap extraction unit 152 and the mismatch information, it classifies the data of interest and supplies the class code corresponding to the resulting class to the coefficient memory 141.
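One way to combine the class taps with the mismatch information into a single class code is sketched below. It assumes 1-bit ADRC (adaptive dynamic range coding) requantization of the class taps, one common choice in class classification adaptive processing, and places the mismatch flag in the bit above the ADRC bits; this bit layout is an assumption. With N class taps the code ranges over 2^(N+1) values, each of which can serve directly as an address into the coefficient memory. The function names are hypothetical.

```python
def adrc_1bit(taps):
    """1-bit ADRC: bit 1 for taps in the upper half of the local dynamic range."""
    lo, hi = min(taps), max(taps)
    dr = hi - lo
    code = 0
    for x in taps:
        bit = 1 if dr > 0 and 2 * (x - lo) >= dr else 0
        code = (code << 1) | bit
    return code


def class_code(class_taps, mismatch):
    """Hypothetical layout: mismatch flag as the bit above the ADRC bits."""
    return (int(bool(mismatch)) << len(class_taps)) | adrc_1bit(class_taps)


print(class_code([0, 10, 4], False), class_code([0, 10, 4], True))
```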

The coefficient memory 141 stores, at the address corresponding to each class code, the tap coefficients of the class corresponding to that class code. It supplies the tap coefficients stored at the address corresponding to the class code from the class classification unit 153 to the prediction unit 154.

The prediction unit 154 acquires the prediction taps output by the tap extraction unit 151 and the tap coefficients output by the coefficient memory 141, and uses them to perform the linear prediction calculation shown in Expression (1). The prediction unit 154 thereby obtains and outputs (the predicted value of) the adaptive processing data.

Next, the processing of the decoding device of FIG. 12 (decoding processing) is described with reference to the flowchart of FIG. 14.

The tap extraction unit 151 of the class classification adaptive processing unit 132 (FIG. 13) takes an item of the adaptive processing data to be obtained as the data of interest. In step S21, the mismatch detection unit 1 generates mismatch information from the encoded data corresponding to that data of interest (hereinafter referred to as the encoded data of interest).

That is, in the mismatch detection unit 1, the encoding characteristic information extraction unit 121 extracts from the encoded data of interest the characteristic data it contains and supplies it to the determination unit 123, while the actual characteristic extraction unit 122 extracts the actual characteristic of the original data corresponding to the encoded data of interest and supplies it to the determination unit 123. The determination unit 123 judges the correctness of the characteristic data by comparing the characteristic data from the encoding characteristic information extraction unit 121 with the actual characteristic from the actual characteristic extraction unit 122, and supplies the resulting mismatch information to the class classification adaptive processing unit 132.

The process then proceeds to step S22, where the preprocessing unit 131 preprocesses the encoded data from which the preprocessed data needed to form the prediction taps and class taps for the data of interest is obtained, and supplies the resulting preprocessed data to the class classification adaptive processing unit 132.

In step S23, in the class classification adaptive processing unit 132 (FIG. 13), the tap extraction units 151 and 152 use the preprocessed data from the preprocessing unit 131 to form prediction taps and class taps, respectively, with a tap structure based on the mismatch information from the mismatch detection unit 1. The prediction taps are supplied from the tap extraction unit 151 to the prediction unit 154, and the class taps are supplied from the tap extraction unit 152 to the class classification unit 153.

The class classification unit 153 receives the class taps for the data of interest from the tap extraction unit 152. In step S24, it classifies the data of interest based on those class taps and the mismatch information supplied from the mismatch detection unit 1, and outputs the class code representing the class of the data of interest to the coefficient memory 141.

The coefficient memory 141 reads out and outputs the tap coefficients stored at the address corresponding to the class code supplied from the class classification unit 153. In step S25, the prediction unit 154 acquires the tap coefficients output by the coefficient memory 141, and the process proceeds to step S26.

In step S26, the prediction unit 154 performs the linear prediction calculation shown in Expression (1) using the prediction taps output by the tap extraction unit 151 and the tap coefficients acquired from the coefficient memory 141. The prediction unit 154 thereby obtains (the predicted value of) the adaptive processing data serving as the data of interest and supplies it to the post-processing unit 133.

In step S27, the post-processing unit 133 (FIG. 12) applies predetermined post-processing to the data of interest from (the prediction unit 154 of) the class classification adaptive processing unit 132, thereby obtaining and outputting decoded data.

The process then proceeds to step S28, where it is determined whether any adaptive processing data has not yet been taken as the data of interest. If so, one such item of adaptive processing data is newly taken as the data of interest, the process returns to step S21, and the same processing is repeated.

If it is determined in step S28 that no adaptive processing data remains that has not yet been taken as the data of interest, the processing ends.
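The flowchart of FIG. 14 maps onto a simple loop. In the sketch below every processing unit is passed in as a callable, since the section does not define them concretely, and, for simplicity, each item of data of interest is assumed to have its own encoded unit from which all its taps are taken. The function `decode` and its parameter names are hypothetical.

```python
def decode(encoded_units, detect_mismatch, preprocess, pred_taps, class_taps,
           classify, coefficient_memory, postprocess):
    """Steps S21-S28 of FIG. 14 as a loop; every callable is a hypothetical stand-in."""
    decoded = []
    for unit in encoded_units:                    # S28: repeat while data of interest remains
        mismatch = detect_mismatch(unit)          # S21: mismatch detection unit 1
        pre = preprocess(unit)                    # S22: preprocessing unit 131
        x = pred_taps(pre, mismatch)              # S23: tap extraction unit 151
        c = class_taps(pre, mismatch)             #      tap extraction unit 152
        code = classify(c, mismatch)              # S24: class classification unit 153
        w = coefficient_memory[code]              # S25: coefficient memory 141
        y = sum(wn * xn for wn, xn in zip(w, x))  # S26: Expression (1), prediction unit 154
        decoded.append(postprocess(y))            # S27: post-processing unit 133
    return decoded
```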

Next, FIG. 15 shows a detailed configuration example of the learning device of FIG. 4 for learning the tap coefficients to be stored in the coefficient memory 141 of FIG. 12.

In the embodiment of FIG. 15, the mismatch detection unit 13 consists of an encoding characteristic information extraction unit 171, an actual characteristic extraction unit 172, and a determination unit 173, and the encoded data output by the encoding unit 12 is supplied to the encoding characteristic information extraction unit 171 and the actual characteristic extraction unit 172. These three units are configured in the same manner as the encoding characteristic information extraction unit 121, the actual characteristic extraction unit 122, and the determination unit 123 of FIG. 12, respectively. As described for FIG. 12, they obtain mismatch information from the encoded data corresponding to the teacher data of interest (described later) and supply it to the learning processing unit 14.

The learning processing unit 14 consists of an adaptive learning unit 160, a teacher data generation unit 161, and a student data generation unit 163.

The adaptive learning unit 160 consists of a teacher data storage unit 162, a student data storage unit 164, tap extraction units 165 and 166, a class classification unit 167, an adding unit 168, and a tap coefficient calculation unit 169. The teacher data generation unit 161 consists of an inverse post-processing unit 161A, and the student data generation unit 163 consists of an encoding unit 163A and a preprocessing unit 163B.

The inverse post-processing unit 161A reads the learning data from the learning data storage unit 11 and applies processing complementary to that performed by the post-processing unit 133 of FIG. 12 (hereinafter referred to as inverse post-processing). For example, let y denote the learning data and let the function f(x) denote the post-processing that the post-processing unit 133 of FIG. 12 applies to adaptive processing data x. The inverse post-processing unit 161A then applies to the learning data y the processing f⁻¹(y), where f⁻¹ is the inverse function of f, and outputs the resulting data to the adaptive learning unit 160 as teacher data. The teacher data output by the inverse post-processing unit 161A thus corresponds to the adaptive processing data supplied from the class classification adaptive processing unit 132 of FIG. 12 to the post-processing unit 133.
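The patent leaves the post-processing f unspecified. Assuming, purely for illustration, that the post-processing unit 133 applies a first-order de-emphasis filter, its inverse is the matching pre-emphasis filter, and applying that inverse to learning data yields teacher data in the domain the decoder predicts before post-processing. The filter form, the coefficient a = 0.9, and the function names are hypothetical.

```python
def post_process(x, a=0.9):
    """Hypothetical f: first-order de-emphasis, y[n] = x[n] + a * y[n-1]."""
    y, prev = [], 0.0
    for xn in x:
        prev = xn + a * prev
        y.append(prev)
    return y


def inverse_post_process(y, a=0.9):
    """f^-1 for the filter above (pre-emphasis): x[n] = y[n] - a * y[n-1]."""
    x, prev = [], 0.0
    for yn in y:
        x.append(yn - a * prev)
        prev = yn
    return x


learning_data = [1.0, 0.4, 0.61]                 # plays the role of y
teacher_data = inverse_post_process(learning_data)
```

The round trip post_process(inverse_post_process(y)) returns y up to rounding, which is the property the teacher data relies on.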

The teacher data storage unit 162 temporarily stores the teacher data output by (the inverse post-processing unit 161A of) the teacher data generation unit 161.

The encoding unit 163A reads the learning data from the learning data storage unit 11, encodes it with the same encoding scheme as the encoding unit 12, and outputs the result. The encoding unit 163A therefore outputs the same encoded data as the encoding unit 12. The encoding units 12 and 163A can also be implemented as a single shared encoding unit.

The preprocessing unit 163B applies to the encoded data output by the encoding unit 163A the same preprocessing as that performed by the preprocessing unit 131 of FIG. 12, and outputs the resulting preprocessed data to the adaptive learning unit 160 as student data. The student data output by the preprocessing unit 163B thus corresponds to the preprocessed data supplied from the preprocessing unit 131 of FIG. 12 to the class classification adaptive processing unit 132.
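The essential property of student data generation is that it reproduces exactly what the decoder's class classification adaptive processing unit receives: the same encoding followed by the same preprocessing. The sketch below uses a uniform quantizer as a hypothetical stand-in for the encoding units 12 and 163A, and dequantization as a stand-in for the preprocessing units 131 and 163B; the step size and function names are assumptions.

```python
def encode(samples, step=0.25):
    """Hypothetical stand-in for encoding units 12/163A: uniform quantization to indices."""
    return [round(s / step) for s in samples]


def preprocess(codes, step=0.25):
    """Hypothetical stand-in for preprocessing units 131/163B: dequantization."""
    return [c * step for c in codes]


def make_student_data(learning_data, step=0.25):
    # Same encoder and same preprocessing as the decoding side, so student data
    # matches what the decoder's class classification adaptive processing sees.
    return preprocess(encode(learning_data, step), step)
```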

The student data storage unit 164 temporarily stores the student data output by (the preprocessing unit 163B of) the student data generation unit 163.

The tap extraction unit 165 takes the teacher data stored in the teacher data storage unit 162, in turn, as the teacher data of interest. For each teacher data of interest, it extracts student data stored in the student data storage unit 164 to form and output prediction taps with the same tap structure as those formed by the tap extraction unit 151 of FIG. 13. The tap extraction unit 165 is also supplied with the mismatch information output by (the determination unit 173 of) the mismatch detection unit 13 and, like the tap extraction unit 151 of FIG. 13, changes the tap structure of the prediction taps based on the mismatch information for the teacher data of interest.

For the teacher data of interest, the tap extraction unit 166 extracts student data stored in the student data storage unit 164 to form and output class taps with the same tap structure as those formed by the tap extraction unit 152 of FIG. 13. The tap extraction unit 166 is also supplied with the mismatch information output by the mismatch detection unit 13 and, like the tap extraction unit 152 of FIG. 13, changes the tap structure of the class taps based on the mismatch information for the teacher data of interest.

The class classification unit 167 is supplied with the class taps output by the tap extraction unit 166 and the mismatch information output by the mismatch detection unit 13. Based on the class taps and mismatch information for the teacher data of interest, it performs the same class classification as the class classification unit 153 of FIG. 13 and outputs the class code corresponding to the resulting class to the adding unit 168.

The adding unit 168 reads the teacher data of interest from the teacher data storage unit 162 and, for each class code supplied from the class classification unit 167, performs addition over that teacher data of interest and the student data constituting the prediction taps formed for it and supplied by the tap extraction unit 165.

That is, for each class corresponding to the class code supplied from the class classification unit 167, the adding unit 168 uses the prediction taps (student data) to compute the products of student data x_in·x_im and their summation (Σ), which form the components of the matrix A in Expression (8).

Furthermore, again for each class corresponding to the class code supplied from the class classification unit 167, the adding unit 168 uses the prediction taps (student data) and the teacher data to compute the products of student data and teacher data x_in·y_i and their summation (Σ), which form the components of the vector v in Expression (8).

Specifically, the adding unit 168 stores in its built-in memory (not shown) the components of the matrix A and the vector v in Expression (8) obtained for the teacher data previously taken as the teacher data of interest. To each component of A or v, it adds the corresponding term x_in·x_im or x_in·y_i computed for the teacher data newly taken as the teacher data of interest from that teacher data y_i and the student data x_in (x_im); that is, it performs the additions represented by the summations in A and v.
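The incremental addition performed by the adding unit can be sketched as per-class running sums, where each new pair of prediction taps and teacher sample adds x_in·x_im to A and x_in·y_i to v. The class `NormalEquationAccumulator` and its attributes are hypothetical; a per-class sample count is also kept because it is useful for detecting classes with too few equations.

```python
from collections import defaultdict


class NormalEquationAccumulator:
    """Per-class running sums for the normal equation: A[n][m] += x_in*x_im, v[n] += x_in*y_i."""

    def __init__(self, num_taps):
        self.num_taps = num_taps
        self.A = defaultdict(lambda: [[0.0] * num_taps for _ in range(num_taps)])
        self.v = defaultdict(lambda: [0.0] * num_taps)
        self.count = defaultdict(int)

    def add(self, class_code, taps, teacher):
        """Add one (prediction taps, teacher sample) pair into its class's sums."""
        if len(taps) != self.num_taps:
            raise ValueError("unexpected number of prediction taps")
        A, v = self.A[class_code], self.v[class_code]
        for n, x_n in enumerate(taps):
            for m, x_m in enumerate(taps):
                A[n][m] += x_n * x_m
            v[n] += x_n * teacher
        self.count[class_code] += 1
```

Since A is symmetric, an implementation could accumulate only its upper triangle and mirror it before solving.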

By performing the above addition with all the teacher data stored in the teacher data storage unit 162 taken in turn as the teacher data of interest, the adding unit 168 sets up the normal equation shown in Expression (8) for each class, and then supplies those normal equations to the tap coefficient calculation unit 169.

The tap coefficient calculation unit 169 solves the normal equation for each class supplied from the adding unit 168, thereby obtaining and outputting the tap coefficients for each class.

Next, the processing of the learning device of FIG. 15 (learning processing) is described with reference to the flowchart of FIG. 16.

First, in step S31, the teacher data generation unit 161 and the student data generation unit 163 generate teacher data and student data, respectively, from the learning data stored in the learning data storage unit 11. The teacher data is supplied from the teacher data generation unit 161 to the teacher data storage unit 162 and stored there, and the student data is supplied from the student data generation unit 163 to the student data storage unit 164 and stored there.

The tap extraction unit 165 then takes, as the teacher data of interest, an item of teacher data stored in the teacher data storage unit 162 that has not yet been taken as the teacher data of interest. In step S32, the encoding unit 12 encodes the learning data stored in the learning data storage unit 11, thereby obtaining the encoded data corresponding to the teacher data of interest (the encoded form of the learning data corresponding to the teacher data of interest), and supplies it to the mismatch detection unit 13.

The mismatch detection unit 13 generates mismatch information for the teacher data of interest from the encoded data supplied by the encoding unit 12 and supplies it to the tap extraction units 165 and 166 and the class classification unit 167 of the learning processing unit 14.

The process then proceeds to step S34. Based on the mismatch information, the tap extraction unit 165 reads student data stored in the student data storage unit 164 for the teacher data of interest, forms prediction taps, and supplies them to the adding unit 168. Likewise based on the mismatch information, the tap extraction unit 166 reads student data stored in the student data storage unit 164 for the teacher data of interest, forms class taps, and supplies them to the class classification unit 167.

In step S35, the class classification unit 167 classifies the teacher data of interest based on its class taps and mismatch information and outputs the class code corresponding to the resulting class to the adding unit 168.

In step S36, the adding unit 168 reads the teacher data of interest from the teacher data storage unit 162 and uses it together with the prediction taps from the tap extraction unit 165 to compute the components of the matrix A and the vector v in Expression (8). It then adds these components, computed from the teacher data of interest and the prediction taps, to the already obtained components of A and v that correspond to the class code from the class classification unit 167, and the process proceeds to step S37.

In step S37, the tap extraction unit 165 determines whether the teacher data storage unit 162 still holds teacher data that has not yet been taken as the teacher data of interest. If it does, the tap extraction unit 165 takes such teacher data as the new teacher data of interest, the process returns to step S32, and the same processing is repeated.

If it is determined in step S37 that no such teacher data remains in the teacher data storage unit 162, the adding unit 168 supplies the per-class normal equations of Expression (8) to the tap coefficient calculation unit 169. These equations consist of the components of the matrix A and the vector v accumulated so far for each class. The process then proceeds to step S38.

In step S38, the tap coefficient calculation unit 169 obtains and outputs the tap coefficients for each class by solving the per-class normal equations supplied from the adding unit 168, and the process ends.

Some classes may not yield as many normal equations as are needed to determine their tap coefficients, for example because the learning data stored in the learning data storage unit 11 is insufficient. For such classes, the tap coefficient calculation unit 169 outputs, for example, default tap coefficients.
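The sketch below illustrates step S38 together with the default-coefficient fallback. It makes three assumptions:

- The per-class sums are stored as plain nested lists.
- Each system is solved by Gaussian elimination with partial pivoting. This is a stand-in, since the text does not name a solver.
- A class is treated as underdetermined when it has fewer samples than taps or when its matrix is numerically singular.

The names `solve` and `tap_coefficients` are hypothetical.

```python
def solve(A, v, eps=1e-12):
    """Solve A w = v by Gaussian elimination with partial pivoting; return None if singular."""
    n = len(v)
    M = [list(row) + [b] for row, b in zip(A, v)]  # augmented copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None  # no unique solution for this class
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def tap_coefficients(A_by_class, v_by_class, count_by_class, default):
    """Per-class tap coefficients, falling back to `default` for underdetermined classes."""
    coeffs = {}
    for code, A in A_by_class.items():
        w = None
        if count_by_class.get(code, 0) >= len(default):
            w = solve(A, v_by_class[code])
        coeffs[code] = w if w is not None else list(default)
    return coeffs
```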

Next, FIG. 17 shows a first detailed configuration example of the decoding device of FIG. 12 for the case where the encoded data is speech data encoded by the CELP method.

In the embodiment of FIG. 17, the encoding characteristic information extraction unit 121 consists of a channel decoder 181. The channel decoder 181 is configured, for example, in the same manner as the channel decoder 61 of FIG. 9. It extracts the L code from the encoded data and supplies it to the determination unit 123 as characteristic data.

The actual characteristic extraction unit 122 consists of a VSELP decoding device 182 and a pitch detection unit 183. The VSELP decoding device 182 is configured in the same manner as the VSELP decoding device shown in FIG. 9. It decodes the encoded data by the VSELP method and supplies the resulting decoded speech data to the pitch detection unit 183.

The pitch detection unit 183 detects the pitch period of the decoded speech data output by the VSELP decoding device 182. For example, it computes the autocorrelation of the decoded speech data, detects the pitch period from that autocorrelation, and supplies the result to the determination unit 123 as the actual characteristic.
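An autocorrelation pitch detector of the kind described for the pitch detection unit 183 can be sketched as follows. The search range (`min_lag`, `max_lag`, in samples) is an assumption, as is the plain, unnormalized autocorrelation. A practical detector would likely also need normalization and a voicing decision, and the function name is hypothetical.

```python
import math


def detect_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that maximizes the autocorrelation of x."""
    best_lag, best_r = min_lag, -math.inf
    for k in range(min_lag, min(max_lag, len(x) - 1) + 1):
        # r(k) = sum over n of x[n] * x[n + k]
        r = sum(x[n] * x[n + k] for n in range(len(x) - k))
        if r > best_r:  # strict '>' keeps the shortest lag on ties
            best_lag, best_r = k, r
    return best_lag
```

Usage: for a sine wave with a 40-sample period (200 Hz at an assumed 8 kHz sampling rate), `detect_pitch_period(signal, 20, 120)` picks 40. The unnormalized sum shrinks as the lag grows, which favours the fundamental over its multiples.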

The determination unit 123 consists of a difference calculation unit 184. The difference calculation unit 184 computes the difference between two quantities:

- the time corresponding to the L code from the channel decoder 181 (a time representing the pitch period of the speech), and
- the pitch period actually obtained from the decoded speech data.

It supplies this difference to the class classification adaptive processing unit 132 as mismatch information.

Meanwhile, the preprocessing unit 131 consists of a VSELP decoding device 185. Like the VSELP decoding device 182, the VSELP decoding device 185 decodes the encoded data by the VSELP method. It outputs the decoded speech data to the class classification adaptive processing unit 132 as preprocessed data.

The class classification adaptive processing unit 132 applies class classification adaptive processing to the decoded speech data output by the VSELP decoding device 185 of the preprocessing unit 131. It outputs the resulting adaptive processing data to the post-processing unit 133, which passes that data through unchanged as high-quality speech data.

Thus, in the embodiment of FIG. 17, the class classification adaptive processing unit 132 converts the decoded speech data into high-quality speech data and outputs it. Here the decoded speech data is what the VSELP decoding device 185 of the preprocessing unit 131 produces by decoding the encoded data by the VSELP method.

That is, in the class classification adaptive processing unit 132 (FIG. 13), the decoded speech data output by the VSELP decoding device 185 of the preprocessing unit 131 is supplied to the tap extraction units 151 and 152.

The tap extraction unit 151 takes, as the data of interest, a sample of high-quality speech data that has not yet been the data of interest. It extracts, as the prediction tap, several speech samples of the decoded speech data that are used to predict that data of interest. The tap extraction unit 152 likewise extracts, as the class tap, several speech samples of the decoded speech data that are used to classify the data of interest.

As described above, the tap extraction units 151 and 152 also receive the mismatch information from the determination unit 123. Based on it, they change the structures of the prediction tap and the class tap, respectively.

Specifically, the channel decoder 181 of the encoding characteristic information extraction unit 121 (FIG. 17) extracts, for example, the L code of the subframe (or frame) that contains the decoded speech data at the position corresponding to the data of interest. It supplies this L code to the difference calculation unit 184 of the determination unit 123.

The decoded speech data at the position corresponding to the data of interest is hereinafter referred to as the decoded speech data of interest. The processing proceeds as follows:

- The VSELP decoding device 182 of the actual characteristic extraction unit 122 decodes, for example, several tens of frames before and after the frame containing the decoded speech data of interest. It supplies the resulting decoded speech data to the pitch detection unit 183.
- The pitch detection unit 183 computes the autocorrelation of this decoded speech data. From it, the unit detects the pitch period in the vicinity of the decoded speech data of interest and supplies that period to the difference calculation unit 184.
- The difference calculation unit 184 computes the difference between the time T1 corresponding to the L code supplied from the channel decoder 181 and the pitch period T2 supplied from the pitch detection unit 183. It outputs the difference ΔT (= T1 − T2) as the mismatch information for the data of interest.

On receiving the difference ΔT as the mismatch information for the data of interest, the tap extraction unit 151 (FIG. 13) compares, for example, the absolute value of ΔT with a predetermined threshold TH_T.

Suppose |ΔT| is less than or equal to (or less than) TH_T. In that case, the time corresponding to the L code of the subframe containing the decoded speech data of interest correctly represents the pitch period of that data. The subframe containing the decoded speech data of interest is hereinafter referred to as the subframe of interest. The tap extraction unit 151 then extracts the following as the prediction tap, for example:

- all speech samples of the subframe of interest,
- every other speech sample of the subframe immediately preceding it, and
- every other speech sample of the subframe immediately following it.

Suppose instead that |ΔT| is greater than (or greater than or equal to) TH_T. In that case, the time corresponding to the L code of the subframe containing the decoded speech data of interest does not correctly represent its pitch period. The tap extraction unit 151 then extracts the following as the prediction tap, for example, with two samples skipped between taps in the outer subframes:

- all speech samples of the subframe of interest,
- samples of the two subframes preceding it, and
- samples of the two subframes following it.
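The mismatch-dependent tap structure of the tap extraction unit 151 can be sketched as follows, under these assumptions:

- "Every other sample" is read as a stride of 2, and "two samples skipped" as a stride of 3.
- Subframes have a fixed length.
- Subframes that fall outside the signal are rejected rather than padded.

The function name and parameters are hypothetical.

```python
def extract_prediction_tap(decoded, subframe_len, center_sf, delta_t, th_t):
    """Prediction tap around subframe `center_sf`, with structure chosen by |delta_t| vs th_t."""
    def subframe(i):
        # Samples of subframe i; this sketch does not pad at the signal edges
        if i < 0 or (i + 1) * subframe_len > len(decoded):
            raise ValueError("subframe %d lies outside the signal" % i)
        return decoded[i * subframe_len:(i + 1) * subframe_len]

    if abs(delta_t) <= th_t:
        # L code matches the detected pitch: 1 neighbour on each side, every other sample
        neighbours, stride = [center_sf - 1, center_sf + 1], 2
    else:
        # Mismatch: 2 neighbours on each side, two samples skipped between taps
        neighbours, stride = [center_sf - 2, center_sf - 1, center_sf + 1, center_sf + 2], 3

    tap = list(subframe(center_sf))  # all samples of the subframe of interest
    for i in neighbours:
        tap.extend(subframe(i)[::stride])
    return tap
```

Each structure yields taps in a fixed order, so a separate set of tap coefficients can be learned for each structure.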

Like the tap extraction unit 151, the tap extraction unit 152 extracts from the decoded speech data a class tap whose tap structure is changed based on the mismatch information.

Here, the mismatch information changes only the positions of the speech samples extracted as the prediction tap, not the number of speech samples that make up the tap. However, the tap extraction unit 151 can also change the number of decoded speech samples in the prediction tap based on the mismatch information.

As in the case described with reference to FIG. 10, the tap extraction unit 151 can also include in the prediction tap the L code, G code, I code, or A code obtained in the VSELP decoding device 185. In this case too, the positions and the number of subframes whose L, G, I, or A codes serve as prediction taps can be changed based on the mismatch information.

Furthermore, the mismatch information can include not only the difference ΔT but also the values used to obtain it: the L code output by the channel decoder 181 and the pitch period T2 of the decoded speech data output by the pitch detection unit 183. In this case, the tap extraction unit 151 can change the tap structure of the prediction tap as described above based not only on ΔT but also on the L code and the pitch period T2.

The tap extraction unit 152 can construct the class tap in the same manner as the tap extraction unit 151.

In addition to the class tap, the class classification unit 153 receives the mismatch information for the data of interest. As described above, it classifies the data of interest based on both the class tap and the mismatch information.

That is, the class classification unit 153 obtains a class code, for example, by applying the ADRC processing described above to the class tap for the data of interest. The class code obtained from the class tap is hereinafter referred to as the class tap code.
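A minimal K-bit ADRC sketch for computing the class tap code. The ADRC processing is defined earlier in the document and is not repeated in this section, so the requantization rule used here is an assumption:

- Each sample is mapped to floor((x − MIN) · 2^K / DR), clipped to 2^K − 1, where DR = MAX − MIN.
- A flat tap (DR = 0) gives all zeros.

The function name is hypothetical.

```python
def adrc_class_code(tap, bits=1):
    """Requantize each tap sample to `bits` bits relative to the tap's range and pack the results."""
    lo, hi = min(tap), max(tap)
    dr = hi - lo  # dynamic range of the class tap
    levels = 1 << bits
    code = 0
    for x in tap:
        q = 0 if dr == 0 else min(levels - 1, int((x - lo) * levels / dr))
        code = (code << bits) | q  # first tap sample ends up in the most significant bits
    return code
```

Usage: with 1-bit ADRC, the tap `[1, 5, 3, 9]` has MIN = 1 and DR = 8. It requantizes to the bits 0, 1, 0, 1, giving the class tap code `0b0101` = 5.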

The class classification unit 153 also obtains a 1-bit class code, for example, by comparing the absolute value of the difference ΔT (the mismatch information for the data of interest) with the predetermined threshold TH_T.

The two cases are as follows:

- If |ΔT| is less than or equal to TH_T, the time corresponding to the L code of the subframe containing the decoded speech data of interest correctly represents its pitch period. The class classification unit 153 then sets the class code to one of the values 0 and 1, for example 1.
- If |ΔT| is greater than TH_T, that time does not correctly represent the pitch period. The class classification unit 153 then sets the class code to, for example, 0.

The class code obtained from the mismatch information is hereinafter referred to as the mismatch code.

The class classification unit 153 then, for example, places the mismatch code obtained for the data of interest in the upper bits, above the class tap code obtained for it. It outputs the code composed of the class tap code and the mismatch code as the final class code for the data of interest.
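The two preceding steps (the 1-bit mismatch code and its placement above the class tap code) fit in a few lines. This is a sketch assuming the class tap code occupies a known number of bits; the function name is hypothetical.

```python
def final_class_code(class_tap_code, class_tap_bits, delta_t, th_t):
    """Combine a 1-bit mismatch code (upper bit) with the class tap code (lower bits)."""
    if not 0 <= class_tap_code < (1 << class_tap_bits):
        raise ValueError("class tap code does not fit in class_tap_bits")
    # Mismatch code: 1 when |ΔT| <= TH_T (L code agrees with detected pitch), else 0
    mismatch_code = 1 if abs(delta_t) <= th_t else 0
    # Put the mismatch bit directly above the class tap code bits
    return (mismatch_code << class_tap_bits) | class_tap_code
```

With a 4-bit class tap code this yields 2^5 = 32 classes. A given class tap pattern is therefore learned separately for frames whose L code is trustworthy and frames whose L code is not.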

This class code is supplied to the coefficient memory 141, which reads out the tap coefficients corresponding to it and supplies them to the prediction unit 154.

As noted above, the mismatch information may include, besides ΔT, the values used to obtain it: the L code output by the channel decoder 181 and the pitch period T2 of the decoded speech data output by the pitch detection unit 183. In that case, the class classification unit 153 can also base the classification on the L code and the pitch period T2 contained in the mismatch information.

In the case above, the 1-bit mismatch code is determined by whether |ΔT| exceeds TH_T. Other mismatch codes can also be used, for example the two's complement representation of ΔT.
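As an illustration of the two's-complement alternative, the sketch below encodes an integer ΔT (in samples) as a fixed-width code. The 8-bit width and the saturation of out-of-range values are assumptions not stated in the text, and the function name is hypothetical.

```python
def twos_complement_mismatch_code(delta_t, bits=8):
    """Encode integer ΔT as a `bits`-wide two's-complement code, saturating out-of-range values."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    d = max(lo, min(hi, int(delta_t)))  # clamp to the representable range
    return d & ((1 << bits) - 1)  # Python's & on a negative int yields the two's-complement bits
```

A wider mismatch code distinguishes the size and sign of the pitch error, at the cost of multiplying the number of classes by 2^bits instead of 2.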

The prediction unit 154 performs the linear prediction calculation of Expression (1) using the prediction tap output by the tap extraction unit 151 and the tap coefficients obtained from the coefficient memory 141. It thereby obtains (the predicted value of) the data of interest, that is, the high-quality speech data, and supplies it to the post-processing unit 133.
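A sketch of the coefficient lookup and prediction step. It assumes Expression (1) is the first-order linear prediction y = Σ w_n x_n over the N prediction tap samples x_n, and it models the coefficient memory 141 as a dictionary keyed by class code. Both the dictionary and the function name are hypothetical.

```python
def predict(coefficient_memory, class_code, prediction_tap):
    """First-order linear prediction y = sum_n w_n * x_n (assumed form of Expression (1))."""
    w = coefficient_memory[class_code]  # tap coefficients learned for this class
    if len(w) != len(prediction_tap):
        raise ValueError("tap coefficients and prediction tap differ in length")
    return sum(wn * xn for wn, xn in zip(w, prediction_tap))
```

Usage: `predict({21: [0.5, 1.25]}, 21, [2.0, 4.0])` returns 0.5 × 2.0 + 1.25 × 4.0 = 6.0.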

As described above, the post-processing unit 133 outputs the output of (the prediction unit 154 of) the class classification adaptive processing unit 132, that is, the high-quality speech data, unchanged.

Next, FIG. 18 shows a detailed configuration example of the learning device of FIG. 15 for learning the tap coefficients to be stored in the coefficient memory 141 of the decoding device of FIG. 17.

In the embodiment of FIG. 18, the learning data storage unit 11 stores high-quality speech data (learning speech data) as the learning data.

The encoding unit 12 consists of a VSELP encoding device 191. The VSELP encoding device 191 is configured, for example, in the same manner as the VSELP encoding device shown in FIG. 8, except that it lacks that device's microphone 41 and A/D conversion unit 42.

The VSELP encoding device 191 reads the learning speech data from the learning data storage unit 11 and encodes it by the VSELP method. It supplies the resulting encoded data to the encoding characteristic information extraction unit 171 and the actual characteristic extraction unit 172.

The learning device's mismatch-detection path mirrors that of FIG. 17:

- The encoding characteristic information extraction unit 171 consists of a channel decoder 192.
- The actual characteristic extraction unit 172 consists of a VSELP decoding device 193 and a pitch detection unit 194.
- The determination unit 173 consists of a difference calculation unit 195.

The channel decoder 192, VSELP decoding device 193, pitch detection unit 194, and difference calculation unit 195 perform the same processing as the channel decoder 181, VSELP decoding device 182, pitch detection unit 183, and difference calculation unit 184 of FIG. 17, respectively. They thereby obtain the difference ΔT described with reference to FIG. 17 as the mismatch information for the teacher data of interest, and output it to the adaptive learning unit 160.

The inverse post-processing unit 161A reads the learning speech data from the learning data storage unit 11 and outputs it unchanged to the adaptive learning unit 160 as teacher data. In the adaptive learning unit 160 (FIG. 15), the teacher data from the inverse post-processing unit 161A is stored in the teacher data storage unit 162.

The encoding unit 163A consists of a VSELP encoding device 196. Like the VSELP encoding device 191, it reads the learning speech data from the learning data storage unit 11 and encodes it by the VSELP method. It outputs the resulting encoded data to the preprocessing unit 163B.

The preprocessing unit 163B consists of a VSELP decoding device 197, configured in the same manner as the VSELP decoding device of FIG. 9. The VSELP decoding device 197 decodes the encoded data from the VSELP encoding device 196 by the VSELP method. It outputs the resulting decoded speech data to the adaptive learning unit 160 as student data. In the adaptive learning unit 160 (FIG. 15), this student data is stored in the student data storage unit 164.

The adaptive learning unit 160 then uses the teacher data and the student data to learn tap coefficients. The learned coefficients are those that statistically minimize the error of the predicted teacher data, where each prediction is obtained by the linear prediction calculation of Expression (1) from a prediction tap extracted from the student data and the tap coefficients.

That is, in the adaptive learning unit 160 (FIG. 15), the tap extraction unit 165 takes, as the teacher data of interest, teacher data stored in the teacher data storage unit 162 that has not yet been the teacher data of interest. It forms a prediction tap for it from the student data stored in the student data storage unit 164 and supplies the tap to the adding unit 168. The tap extraction unit 166 likewise forms a class tap for the teacher data of interest from the student data stored in the student data storage unit 164 and supplies it to the class classification unit 167.

Here, the channel decoder 192, VSELP decoding device 193, pitch detection unit 194, and difference calculation unit 195 perform the same processing as the channel decoder 181, VSELP decoding device 182, pitch detection unit 183, and difference calculation unit 184 of FIG. 17, respectively. As a result, the difference ΔT is supplied to the tap extraction units 165 and 166 and to the class classification unit 167 as the mismatch information for the teacher data of interest.

The tap extraction units 165 and 166 then construct a prediction tap and a class tap, respectively, whose tap structures are changed based on the mismatch information. They do so in the same way as the tap extraction units 151 and 152 (FIG. 13) described with reference to FIG. 17, using the decoded speech data stored as student data in the student data storage unit 164.

The tap extraction units 165 and 166 construct prediction taps and class taps with the same tap structures as the tap extraction units 151 and 152 (FIG. 13) described with reference to FIG. 17, respectively. Therefore, if the tap extraction unit 151 or 152 also uses the L, G, I, or A code obtained by the VSELP decoding device 185, the tap extraction unit 165 or 166 does the same. It uses the L, G, I, or A code obtained by the VSELP decoding device 197 to build a prediction tap or class tap with the same tap structure.

Furthermore, the mismatch information may include, besides ΔT, the L code and the pitch period T2 of the decoded speech data used to obtain it. In that case, the tap extraction units 165 and 166 change the tap structures of the prediction tap and the class tap based on the L code and the pitch period T2 as well as ΔT. This matches the behaviour of the tap extraction units 151 and 152 (FIG. 13) described with reference to FIG. 17.

The class classification unit 167 then classifies the teacher data of interest based on its class tap and mismatch information, in the same manner as the class classification unit 153 (FIG. 13) described with reference to FIG. 17. It outputs the class code corresponding to the resulting class to the adding unit 168.

The adding unit 168 reads the teacher data of interest from the teacher data storage unit 162. Using that teacher data and the prediction tap from the tap extraction unit 165, it computes the components of the matrix A and the vector v in Expression (8). It then adds these components to the previously accumulated components of A and v that correspond to the class code from the class classification unit 167.

Once every piece of teacher data in the teacher data storage unit 162 has been processed as the teacher data of interest, the adding unit 168 supplies the per-class normal equations of Expression (8) to the tap coefficient calculation unit 169. These equations consist of the accumulated components of A and v. The tap coefficient calculation unit 169 solves the normal equation of each class to obtain and output that class's tap coefficients.

Next, FIG. 19 shows a second detailed configuration example of the decoding device of FIG. 12 for the case where the encoded data is speech data encoded by the CELP method. In the figure, parts corresponding to those in FIG. 17 carry the same reference numerals, and their description is omitted below where appropriate.

The decoding device of FIG. 19 is basically configured in the same manner as that of FIG. 17. The difference is that its post-processing unit 133 consists of a speech synthesis filter 201, configured in the same manner as the speech synthesis filter 69 of FIG. 9.

However, the VSELP decoding device 185 of the preprocessing unit 131 does not output the decoded speech data produced by the speech synthesis filter 69 in FIG. 9. Instead, it outputs the linear prediction coefficients from the filter coefficient decoder 65 and the residual signal from the arithmetic unit 68 to the class classification adaptive processing unit 132 as preprocessed data.

The class classification adaptive processing unit 132 applies class classification adaptive processing to the residual signal (decoded residual signal) and the linear prediction coefficients (decoded linear prediction coefficients) output by the VSELP decoding device 185 of the preprocessing unit 131. It thereby obtains, as adaptive processing data, a residual signal and linear prediction coefficients from which the speech synthesis filter 201 can produce (predicted values of) high-quality speech data. These are hereinafter referred to as the high-quality residual signal and the high-quality linear prediction coefficients, respectively.
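The way the speech synthesis filter 201 could turn the high-quality residual signal and linear prediction coefficients into speech is sketched below. The sign convention is an assumption: s[n] + α_1 s[n−1] + … + α_P s[n−P] = e[n], so s[n] = e[n] − Σ α_p s[n−p]. Other conventions flip the sign of the sum. The sketch also uses a single coefficient set, whereas a real decoder would update the coefficients per frame or subframe. The function name is hypothetical.

```python
def lpc_synthesis(residual, alpha, history=None):
    """All-pole synthesis, assuming s[n] = e[n] - sum_{p=1..P} alpha[p-1] * s[n-p]."""
    order = len(alpha)
    # Filter memory: past[0] holds s[n-1], past[1] holds s[n-2], and so on
    past = list(history) if history is not None else [0.0] * order
    if len(past) != order:
        raise ValueError("history must contain one value per coefficient")
    out = []
    for e in residual:
        s = e - sum(a * sp for a, sp in zip(alpha, past))
        out.append(s)
        past = ([s] + past)[:order]  # shift the memory by one sample
    return out
```

Usage: `lpc_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5])` gives the decaying impulse response `[1.0, 0.5, 0.25, 0.125]`.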

That is, in the class classification adaptive processing unit 132 (FIG. 13), the decoded residual signal output by the VSELP decoding device 185 of the preprocessing unit 131 is supplied to the tap extraction units 151 and 152.

The tap extraction unit 151 takes, as the data of interest, a sample of the high-quality residual signal that has not yet been the data of interest. It extracts, as the prediction tap, several samples of the decoded residual signal that are used to predict that data of interest. The tap extraction unit 152 likewise extracts, as the class tap, several samples of the decoded residual signal that are used to classify the data of interest.

As described with reference to FIG. 17, the tap extraction units 151 and 152 receive the mismatch information for the data of interest. Based on it, they construct a prediction tap and a class tap, respectively, with tap structures like those described with reference to FIG. 17.

In addition to the class taps, the class classification unit 153 is supplied with the mismatch information for the data of interest. As in the case described with reference to FIG. 17, the class classification unit 153 classifies the data of interest based on the class taps and the mismatch information, and supplies the resulting class code for the data of interest to the coefficient memory 141. The coefficient memory 141 reads out the tap coefficients corresponding to that class code and supplies them to the prediction unit 154.
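As a minimal sketch of how a class code might combine the class taps with the mismatch information, the following assumes a 1-bit ADRC (adaptive dynamic range coding) style requantization of the class taps. This technique is often used in class classification adaptive processing, but this passage does not specify it. The sketch also assumes a hypothetical two-flag encoding of the mismatch information, appended as extra bits. The function names and the bit layout are illustrative only.

```python
def adrc_1bit(class_taps):
    """1-bit ADRC-style requantization: 1 if a tap lies in the upper half of the
    taps' dynamic range, else 0 (all zeros for a flat set of taps)."""
    lo, hi = min(class_taps), max(class_taps)
    if hi == lo:
        return [0] * len(class_taps)
    mid = (lo + hi) / 2
    return [1 if v >= mid else 0 for v in class_taps]


def class_code(class_taps, mismatch_bits):
    """Pack the ADRC bits of the class taps, followed by the mismatch bits,
    into one integer class code (hypothetical layout)."""
    code = 0
    for bit in adrc_1bit(class_taps) + list(mismatch_bits):
        code = (code << 1) | bit
    return code


# Class taps [10, 20, 30, 40] give ADRC bits 0011; mismatch flags are 1, 0.
print(class_code([10, 20, 30, 40], [1, 0]))  # 14 (binary 001110)
```

Appending the mismatch bits means that blocks whose coded characteristic data disagrees with their actual characteristics fall into different classes, and therefore receive different tap coefficients.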

The prediction unit 154 performs the linear prediction operation of Expression (1) using the prediction taps output by the tap extraction unit 151 and the tap coefficients obtained from the coefficient memory 141. The prediction unit 154 thereby obtains (the predicted value of) the data of interest, that is, the high-quality residual signal, and supplies it to the post-processing unit 133.
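This sketch assumes that Expression (1) has the usual form y = w_1 x_1 + w_2 x_2 + ... + w_N x_N, where x_n are the prediction taps and w_n are the tap coefficients of the class. Under that assumption, the work of the coefficient memory 141 and the prediction unit 154 can be sketched as follows. The dictionary that stands in for the coefficient memory, and its contents, are hypothetical.

```python
def predict(prediction_taps, coefficient_memory, class_code):
    """Linear prediction y = sum_n w_n * x_n using the tap coefficients of the class."""
    w = coefficient_memory[class_code]
    if len(w) != len(prediction_taps):
        raise ValueError("number of tap coefficients and prediction taps differ")
    return sum(wn * xn for wn, xn in zip(w, prediction_taps))


# Hypothetical coefficient memory: class code -> tap coefficients.
coefficient_memory = {0: [0.25, 0.5, 0.25], 1: [0.0, 1.0, 0.0]}
print(predict([4.0, 8.0, 12.0], coefficient_memory, 0))  # 8.0
```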

In the embodiment of FIG. 19, two systems of the class classification adaptive processing unit 132 and the coefficient memory 141 are provided. In one system, the decoded residual signal is processed as described above. In the other system, the decoded linear prediction coefficients output by the VSELP decoding device 185 of the preprocessing unit 131 undergo the same processing as the decoded residual signal. This yields the high-quality linear prediction coefficients, which are supplied to the post-processing unit 133.

In the post-processing unit 133, the speech synthesis filter 201 filters the high-quality residual signal from the class classification adaptive processing unit 132. As its filter coefficients, the filter uses the high-quality linear prediction coefficients, which also come from the class classification adaptive processing unit 132. High-quality speech data is thereby obtained and output.
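The speech synthesis filter 201 inverts the prediction-error filter of Expression (14). If E = A(z)S with A(z) = 1 + α_1 z^-1 + ... + α_P z^-P, then S = E / A(z), which is an all-pole (IIR) recursion. The following is a minimal sketch that assumes a zero initial filter state and the sign convention of Expression (14). The function name is illustrative.

```python
def synthesize(residual, alphas):
    """All-pole synthesis filter 1/A(z), where A(z) = 1 + sum_p alpha_p z^-p.

    s_n = e_n - sum_{p=1..P} alpha_p * s_{n-p}, with samples before n = 0 taken as zero.
    """
    s = []
    for n, e in enumerate(residual):
        acc = e
        for p, a in enumerate(alphas, start=1):
            if n - p >= 0:
                acc -= a * s[n - p]
        s.append(acc)
    return s


# Impulse response of 1 / (1 - 0.5 z^-1).
print(synthesize([1.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25]
```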

Next, FIGS. 20 and 21 show detailed configuration examples of the learning device of FIG. 15 for learning the tap coefficients to be stored in the coefficient memory 141 of the decoding device of FIG. 19. In these figures, portions corresponding to those in FIG. 18 are denoted by the same reference numerals, and their description is omitted below as appropriate.

FIG. 20 shows a configuration example of a learning device that learns tap coefficients for converting the decoded residual signal into the high-quality residual signal. FIG. 21 shows a configuration example of a learning device that learns tap coefficients for converting the decoded linear prediction coefficients into the high-quality linear prediction coefficients.

In the embodiment of FIG. 20, the inverse post-processing unit 161A consists of an LPC analysis unit 211 and a prediction filter 212. The VSELP decoding device 197, which constitutes the preprocessing unit 163B, supplies the decoded residual signal (the residual signal output by the arithmetic unit 68 of FIG. 9) to the adaptive learning unit 160 as student data.

The LPC analysis unit 211 reads the learning speech data from the learning data storage unit 11. As in the LPC analysis unit 44 of FIG. 8, it performs LPC analysis on that data to obtain P-th order linear prediction coefficients, which it supplies to the prediction filter 212.
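The text does not say how the LPC analysis unit 44 of FIG. 8 computes the coefficients. As an assumption, the sketch below uses the common autocorrelation method with the Levinson-Durbin recursion. It returns α_1 to α_P in the sign convention of Expression (14), that is, A(z) = 1 + α_1 z^-1 + ... + α_P z^-P. Windowing of the analysis frame is omitted, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (no windowing)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for alpha_1..alpha_P such that A(z) = 1 + sum_p alpha_p z^-p.

    r: autocorrelation values r[0..order].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            raise ValueError("non-positive prediction error; signal may be all zeros")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # prediction error power after stage i
    return a[1:]


# Autocorrelation of a first-order process with coefficient 0.5.
print(levinson_durbin([1.0, 0.5, 0.25], 2))  # alpha_1 = -0.5, alpha_2 = 0
```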

The prediction filter 212 reads, from the learning data storage unit 11, the learning data on which the LPC analysis unit 211 performed LPC analysis. Using that learning data and the linear prediction coefficients supplied from the LPC analysis unit 211, it obtains a residual signal, for example by computing Expression (9), and supplies that residual signal to the adaptive learning unit 160 as teacher data.

Here, let S and E denote the Z-transforms of the speech data (speech signal) s_n and the residual signal e_n in Expression (9), respectively. Expression (9) can then be written as follows.

$$E = \left(1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}\right) S \qquad (14)$$

From Expression (14), the residual signal e can be obtained by a product-sum operation on the speech data s and the linear prediction coefficients α_p (p = 1 to P). The prediction filter 212 that computes the residual signal e can therefore be implemented as an FIR (Finite Impulse Response) digital filter.
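Read as a time-domain difference equation, Expression (14) gives e_n = s_n + α_1 s_{n-1} + ... + α_P s_{n-P}, which is a product-sum of the current and past speech samples. The following is a minimal FIR sketch of the prediction filter 212. It assumes that samples before the start of the signal are zero, and the function name is illustrative.

```python
def residual(speech, alphas):
    """FIR prediction-error filter: e_n = s_n + sum_{p=1..P} alpha_p * s_{n-p}."""
    e = []
    for n, s_n in enumerate(speech):
        acc = s_n
        for p, a in enumerate(alphas, start=1):
            if n - p >= 0:
                acc += a * speech[n - p]
        e.append(acc)
    return e


# First difference of a ramp.
print(residual([1.0, 2.0, 3.0], [-1.0]))  # [1.0, 1.0, 1.0]
```

Because the filter has no feedback, each output sample depends only on input samples. This is why a direct product-sum suffices, in contrast to the recursive synthesis filter that undoes it.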

In the adaptive learning unit 160 (FIG. 15), the teacher data storage unit 162 stores, as teacher data, the residual signal supplied from the prediction filter 212 (corresponding to the high-quality residual signal described above). The student data storage unit 164 stores, as student data, the decoded residual signal supplied from the VSELP decoding device 197.

Then, as in the case described with reference to FIG. 18, the adaptive learning unit 160 uses the teacher data and the student data to learn tap coefficients. The predicted values of the teacher data are obtained by the linear prediction operation of Expression (1) on prediction taps extracted from the student data, and the learned tap coefficients statistically minimize the prediction error of those predicted values. This yields, for each class, tap coefficients that convert the decoded residual signal into the high-quality residual signal.
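Under the reading that "statistically minimize" means minimizing the squared prediction error, the learning amounts to ordinary least squares solved separately for each class. For the samples of a class, the learner accumulates the sums of x_i x_j and x_i y over prediction taps x and teacher values y, then solves the resulting normal equations for the tap coefficients. The pure-Python sketch below is an illustration under that assumption, not the device's actual implementation. The sample format (class code, prediction taps, teacher value) and the function names are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a w = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations; more samples are needed for this class")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def learn_tap_coefficients(samples):
    """samples: iterable of (class_code, prediction_taps, teacher_value).

    Accumulates sum(x x^T) and sum(x y) per class, then solves the normal
    equations, which minimizes the squared error of y ~ sum_n w_n x_n.
    """
    xtx, xty = {}, {}
    for cls, x, y in samples:
        n = len(x)
        if cls not in xtx:
            xtx[cls] = [[0.0] * n for _ in range(n)]
            xty[cls] = [0.0] * n
        for i in range(n):
            xty[cls][i] += x[i] * y
            for j in range(n):
                xtx[cls][i][j] += x[i] * x[j]
    return {cls: solve(xtx[cls], xty[cls]) for cls in xtx}


# Teacher values generated with w = [0.5, 2.0] for class 0.
samples = [(0, [1.0, 0.0], 0.5), (0, [0.0, 1.0], 2.0), (0, [1.0, 1.0], 2.5)]
print(learn_tap_coefficients(samples))  # about {0: [0.5, 2.0]}
```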

Next, in the embodiment of FIG. 21, the inverse post-processing unit 161A consists of an LPC analysis unit 221. The VSELP decoding device 197, which constitutes the preprocessing unit 163B, supplies the decoded linear prediction coefficients (the linear prediction coefficients output by the filter coefficient decoder 65 of FIG. 9) to the adaptive learning unit 160 as student data.

The LPC analysis unit 221 reads the learning speech data from the learning data storage unit 11. As in the LPC analysis unit 44 of FIG. 8, it performs LPC analysis on that data to obtain P-th order linear prediction coefficients, which it supplies to the adaptive learning unit 160 as teacher data.

In the adaptive learning unit 160 (FIG. 15), the teacher data storage unit 162 stores, as teacher data, the linear prediction coefficients supplied from the LPC analysis unit 221 (corresponding to the high-quality linear prediction coefficients described above). The student data storage unit 164 stores, as student data, the decoded linear prediction coefficients supplied from the VSELP decoding device 197.

Then, as in the case described with reference to FIG. 18, the adaptive learning unit 160 uses the teacher data and the student data to learn tap coefficients. The predicted values of the teacher data are obtained by the linear prediction operation of Expression (1) on prediction taps extracted from the student data, and the learned tap coefficients statistically minimize the prediction error of those predicted values. This yields, for each class, tap coefficients that convert the decoded linear prediction coefficients into the high-quality linear prediction coefficients.

Next, FIG. 22 shows a first detailed configuration example of the decoding device of FIG. 12 for the case where the encoded data was obtained by encoding image data with the MPEG2 system.

In the embodiment of FIG. 22, the encoding characteristic information extraction unit 121 consists of an inverse VLC unit 231. The inverse VLC unit 231 is configured, for example, in the same way as the inverse VLC unit 241 (FIG. 23) of the MPEG decoder 232 described later. It extracts the DCT type from the encoded data and supplies it to the determination unit 123 as characteristic data.

The actual characteristic extraction unit 122 consists of an MPEG decoder 232 and a correlation calculation unit 233. The MPEG decoder 232 decodes the encoded data by the MPEG system and supplies the resulting decoded image data to the correlation calculation unit 233.

FIG. 23 shows a configuration example of the MPEG decoder 232.

The encoded data is supplied to the inverse VLC unit 241. From the encoded data, the inverse VLC unit 241 separates the following: the VLC codes of the quantized DCT coefficients (quantized two-dimensional DCT coefficients that have been variable-length coded), the quantization step, the motion vectors, the picture type, the temporal reference, and other information.

The inverse VLC unit 241 then applies inverse VLC processing to the VLC codes of the quantized DCT coefficients, decoding them into quantized DCT coefficients, which it supplies to the inverse quantization unit 242. The inverse VLC unit 241 also supplies the quantization step to the inverse quantization unit 242, the motion vectors to the motion compensation unit 246, the picture type to the memory 245, and the temporal reference to the picture selection unit 247.

The inverse quantization unit 242 dequantizes the quantized DCT coefficients supplied from the inverse VLC unit 241, using the quantization step that is also supplied from the inverse VLC unit 241. It supplies the resulting two-dimensional DCT coefficients to the inverse DCT transform unit 243. The inverse DCT transform unit 243 applies a two-dimensional inverse DCT to these coefficients and supplies the result to the calculation unit 244.
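The sketch below is a simplified version of the inverse quantization unit 242 and the inverse DCT transform unit 243. It multiplies each quantized level by a single quantization step and then applies a direct orthonormal 8 × 8 two-dimensional inverse DCT. The single-step dequantization is an assumption made for brevity; actual MPEG2 inverse quantization also involves quantizer weighting matrices, among other rules that are omitted here. The function names are illustrative.

```python
import math

N = 8  # block size used by MPEG2 DCT coding


def dequantize(qblock, qstep):
    """Simplified inverse quantization: coefficient = quantized level * step."""
    return [[level * qstep for level in row] for row in qblock]


def idct_2d(coeffs):
    """Orthonormal 8x8 two-dimensional inverse DCT, computed directly.

    coeffs[v][u]: v is the vertical frequency and u is the horizontal frequency.
    """
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for v in range(N):
                for u in range(N):
                    s += (c(u) * c(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = s / 4
    return out


# A DC-only block: level 10 with step 8 gives coefficient 80,
# which decodes to a flat block of value 80 / 8 = 10.
block = [[0] * N for _ in range(N)]
block[0][0] = 10
pixels = idct_2d(dequantize(block, 8))
```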

In addition to the output of the inverse DCT transform unit 243, the calculation unit 244 is supplied with the output of the motion compensation unit 246. The calculation unit 244 obtains and outputs decoded image data by adding, as necessary, the output of the motion compensation unit 246 to the output of the inverse DCT transform unit 243.

That is, MPEG encoding defines three picture types, I, P, and B, and each picture is transformed by a two-dimensional DCT in blocks of 8 × 8 pixels (horizontal × vertical). In this process, I-picture blocks are intra-coded. P-picture blocks are either intra-coded or forward predictive coded. B-picture blocks are intra-coded, forward predictive coded, backward predictive coded, or bidirectionally predictive coded.

In forward predictive coding, the image of a frame (or field) that temporally precedes the frame (or field) of the block to be encoded is used as a reference image. Motion compensation of the reference image gives a predicted image of the block to be encoded. The difference between the block and this predicted image (hereinafter referred to, as appropriate, as the residual image) is computed and subjected to a two-dimensional DCT.

In backward predictive coding, the image of a frame that temporally follows the frame of the block to be encoded is used as the reference image. Motion compensation of the reference image gives a predicted image of the block to be encoded. The difference between the block and this predicted image (the residual image) is computed and subjected to a two-dimensional DCT.

In bidirectional predictive coding, the images of two frames (or fields) are used as reference images: one that temporally precedes and one that temporally follows the frame of the block to be encoded. Motion compensation of the reference images gives a predicted image of the block to be encoded. The difference between the block and this predicted image (the residual image) is computed and subjected to a two-dimensional DCT.

Therefore, when a block is non-intra coded (forward, backward, or bidirectionally predictive coded), the output of the inverse DCT transform unit 243 is the decoded residual image, that is, the decoded difference between the original image and its predicted image (hereinafter referred to, as appropriate, as the decoded residual image). The calculation unit 244 decodes the non-intra coded block by adding this decoded residual image to the predicted image supplied from the motion compensation unit 246, and outputs the resulting decoded image data.

On the other hand, when the block output by the inverse DCT transform unit 243 is intra-coded, the output of the inverse DCT transform unit 243 is the decoded original image. The calculation unit 244 outputs it unchanged as decoded image data.

The decoded image data output by the calculation unit 244 is supplied to the memory 245 and the picture selection unit 247.

When the decoded image data supplied from the calculation unit 244 is I-picture or P-picture data, the memory 245 temporarily stores it as a reference image for encoded data that is decoded later. In MPEG2, B-pictures are not used as reference images, so when the decoded image supplied from the calculation unit 244 is a B-picture, the memory 245 does not store it. The memory 245 determines whether the decoded image is an I, P, or B picture by referring to the picture type supplied from the inverse VLC unit 241.

The picture selection unit 247 selects, in display order, frames (or fields) of the decoded images output by the calculation unit 244 or stored in the memory 245, and outputs them. In the MPEG2 system, the display order of frames (or fields) differs from the decoding (encoding) order. The picture selection unit 247 therefore rearranges the frames (or fields) of the decoded images, which are obtained in decoding order, into display order before outputting them. It determines the display order by referring to the temporal reference supplied from the inverse VLC unit 241.
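The reordering performed by the picture selection unit 247 can be sketched with a small buffer keyed by temporal reference. Each decoded picture is held until every picture with a smaller temporal reference has been output. The sketch assumes a single GOP whose temporal references start at 0 and are contiguous. The (picture type, temporal reference) tuple format and the function name are illustrative.

```python
def reorder_by_temporal_reference(decoded):
    """Rearrange pictures from decoding order into display order.

    decoded: iterable of (picture_type, temporal_reference) in decoding order.
    A picture is output as soon as all pictures with smaller temporal
    references have been output.
    """
    pending = {}
    next_tr = 0
    display = []
    for ptype, tr in decoded:
        pending[tr] = (ptype, tr)
        while next_tr in pending:
            display.append(pending.pop(next_tr))
            next_tr += 1
    if pending:
        raise ValueError("gap in temporal references: %s" % sorted(pending))
    return display


decoding_order = [("I", 0), ("P", 3), ("B", 1), ("B", 2), ("P", 6), ("B", 4), ("B", 5)]
print([p + str(t) for p, t in reorder_by_temporal_reference(decoding_order)])
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```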

Meanwhile, the motion compensation unit 246 receives the motion vectors output by the inverse VLC unit 241 and reads the frame (or field) serving as the reference image from the memory 245. It applies motion compensation to that reference image according to the motion vectors from the inverse VLC unit 241, and supplies the resulting predicted image to the calculation unit 244. As described above, the calculation unit 244 adds the predicted image from the motion compensation unit 246 to the residual image output by the inverse DCT transform unit 243, thereby decoding the non-intra coded block.
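The following is a minimal sketch of the motion compensation unit 246 and the addition performed in the calculation unit 244. It assumes integer-pel motion vectors given as (vertical, horizontal) displacements, 8-bit pixel values clipped to the range 0 to 255, and frames stored as lists of rows. MPEG2 also allows half-pel motion vectors with interpolation, which this sketch omits. The function names are illustrative.

```python
def motion_compensate(reference, top, left, mv, size=8):
    """Integer-pel predicted block: the size x size block of the reference frame
    at (top, left), displaced by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    h, w = len(reference), len(reference[0])
    block = []
    for y in range(size):
        ry = top + y + dy
        row = []
        for x in range(size):
            rx = left + x + dx
            if not (0 <= ry < h and 0 <= rx < w):
                raise ValueError("motion vector points outside the reference frame")
            row.append(reference[ry][rx])
        block.append(row)
    return block


def reconstruct(residual, prediction=None):
    """Add the predicted block to the decoded residual (non-intra), or pass the
    decoded block through (intra, prediction=None), clipping to 0..255."""
    out = []
    for y, row in enumerate(residual):
        out_row = []
        for x, r in enumerate(row):
            v = r + (prediction[y][x] if prediction is not None else 0)
            out_row.append(min(255, max(0, int(round(v)))))
        out.append(out_row)
    return out


reference = [[y * 16 + x for x in range(16)] for y in range(16)]
pred = motion_compensate(reference, 0, 0, (1, 2), size=2)
print(pred)                                # [[18, 19], [34, 35]]
print(reconstruct([[1, -1], [0, 2]], pred))  # [[19, 18], [34, 37]]
```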

Returning to FIG. 22, the correlation calculation unit 233 calculates the correlation between lines for each block of the decoded image data output by the MPEG decoder 232.

That is, for each block, the correlation calculation unit 233 calculates two values: the correlation between the lines that make up a frame (hereinafter referred to, as appropriate, as the frame line correlation), and the correlation between the lines that make up a field (hereinafter referred to, as appropriate, as the field line correlation).

Specifically, as shown in FIG. 24, the correlation calculation unit 233 obtains the correlation P(i, i+1) between the adjacent i-th line (the i-th line from the top) and (i+1)-th line of the block, for example according to the following expression.

$$P(i, i+1) = \frac{1}{\sum_{j=1}^{8} \left( x(i,j) - x(i+1,j) \right)^{2}} \qquad (15)$$

Here, x(i, j) denotes the value of the j-th pixel from the left (the j-th column) of the i-th line, and Σ denotes summation over j = 1 to 8.

The correlation calculation unit 233 then obtains, for example, the average of the correlations P(i, i+1), that is, (P(1,2) + P(2,3) + P(3,4) + P(4,5) + P(5,6) + P(6,7) + P(7,8)) / 7, and outputs this average as the frame line correlation.

Further, as shown in FIG. 24, the correlation calculation unit 233 obtains the correlation P(i, i+2) between the i-th line and the (i+2)-th line of the block, which are separated by one line. It does so, for example, according to Expression (15) with i+1 replaced by i+2.

The correlation calculation unit 233 then obtains, for example, the average of the correlations P(i, i+2), that is, (P(1,3) + P(2,4) + P(3,5) + P(4,6) + P(5,7) + P(6,8)) / 6, and outputs this average as the field line correlation.
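The following sketch computes the frame line correlation and the field line correlation of one block. It reads Expression (15) as the reciprocal of the sum of squared differences between two lines; the exponent appears to have been lost in the source text, so the squaring is an assumption. It also adds a small constant eps, another assumption not in the text, so that identical lines do not cause division by zero. Lines are indexed from 0 rather than 1, and the function names are illustrative.

```python
def line_correlation(block, a, b, eps=1e-6):
    """Expression (15), read as 1 / sum_j (x(a, j) - x(b, j))**2.

    eps (an assumption) keeps identical lines from causing a division by zero.
    """
    ssd = sum((p - q) ** 2 for p, q in zip(block[a], block[b]))
    return 1.0 / (ssd + eps)


def frame_field_line_correlation(block):
    """Return (frame line correlation, field line correlation) for one block.

    Frame: average of P(i, i+1) over adjacent lines (7 pairs for 8 lines).
    Field: average of P(i, i+2) over lines two apart (6 pairs for 8 lines).
    """
    n = len(block)
    frame = sum(line_correlation(block, i, i + 1) for i in range(n - 1)) / (n - 1)
    field = sum(line_correlation(block, i, i + 2) for i in range(n - 2)) / (n - 2)
    return frame, field


# An 8x8 block whose even and odd lines differ strongly, as with a moving
# interlaced object: lines of the same field match, adjacent frame lines do not.
interlaced = [[0] * 8 if i % 2 == 0 else [100] * 8 for i in range(8)]
frame_corr, field_corr = frame_field_line_correlation(interlaced)
print(field_corr > frame_corr)  # True
```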

The frame line correlation and the field line correlation output by the correlation calculation unit 233 are supplied to the determination unit 123 as actual characteristics.

When the motion of the image within a block is relatively small, the frame line correlation is generally large and the field line correlation is small. When the motion is relatively large, the field line correlation is generally large and the frame line correlation is small. The frame line correlation and the field line correlation can therefore be said to represent the actual characteristics of the image.

The determination unit 123 consists of a block characteristic determination unit 234 and a comparison unit 235. The block of interest is the block containing the pixel that corresponds to the data of interest in the class classification adaptive processing unit 132 (hereinafter referred to, as appropriate, as the block of interest). Based on the frame line correlation and the field line correlation of this block, the block characteristic determination unit 234 determines whether the block has characteristics for which it should be encoded in the frame DCT mode or in the field DCT mode. It supplies the result (hereinafter referred to, as appropriate, as the actual characteristic type) to the comparison unit 235.

For example, when the frame line correlation of the block of interest is smaller than (or no greater than) the field line correlation, the block characteristic determination unit 234 supplies the comparison unit 235 with an actual characteristic type indicating that the block of interest should be encoded in the field DCT mode. When the frame line correlation is not smaller than the field line correlation, it supplies an actual characteristic type indicating that the block of interest should be encoded in the frame DCT mode.

The comparison unit 235 compares two inputs for the block of interest. The first is the DCT type (the DCT type of the macroblock containing the block), supplied from the inverse VLC unit 231 of the encoding characteristic information extraction unit 121. The second is the actual characteristic type, supplied from the block characteristic determination unit 234. The comparison unit 235 supplies the comparison result to the class classification adaptive processing unit 132 as mismatch information. For example, this result is a pair consisting of a flag representing the DCT type and a flag representing the actual characteristic type of the block of interest.
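The decision made by the block characteristic determination unit 234 and the output of the comparison unit 235 can be sketched as below. Representing the DCT type and the actual characteristic type as the strings "field" and "frame" is an illustrative choice; the text only requires a pair of flags. This sketch uses the strict "smaller than" variant of the test.

```python
FIELD = "field"
FRAME = "frame"


def actual_characteristic_type(frame_corr, field_corr):
    """Block characteristic determination (sketch): field DCT mode if the frame
    line correlation is smaller than the field line correlation, else frame DCT mode."""
    return FIELD if frame_corr < field_corr else FRAME


def mismatch_information(dct_type, frame_corr, field_corr):
    """Comparison (sketch): pair of the coded DCT type and the actual
    characteristic type of the block of interest."""
    if dct_type not in (FIELD, FRAME):
        raise ValueError("unknown DCT type: %r" % (dct_type,))
    return (dct_type, actual_characteristic_type(frame_corr, field_corr))


# A block coded in frame DCT mode whose lines behave like a moving block.
print(mismatch_information(FRAME, 0.1, 0.5))  # ('frame', 'field')
```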

The preprocessing unit 131, in turn, consists of an MPEG decoder 236. Like the MPEG decoder 232, the MPEG decoder 236 decodes the encoded data by the MPEG system and outputs the decoded image data to the class classification adaptive processing unit 132 as preprocessed data.

The class classification adaptive processing unit 132 applies class classification adaptive processing to the decoded image data output by the MPEG decoder 236 of the preprocessing unit 131, and outputs the resulting adaptive processing data to the post-processing unit 133. The post-processing unit 133 outputs the adaptive processing data from the class classification adaptive processing unit 132 unchanged, as high-quality image data.

Therefore, in the embodiment of FIG. 22, the class classification adaptive processing unit 132 performs class classification adaptive processing on the decoded image data output by the MPEG decoder 236 of the preprocessing unit 131, which is the encoded data decoded by the MPEG system. This processing converts the decoded image data into high-quality image data, which is then output.

That is, in the class classification adaptive processing unit 132 (FIG. 13), the decoded image data output by the MPEG decoder 236 of the preprocessing unit 131 is supplied to the tap extraction units 151 and 152.

The tap extraction unit 151 takes, as the data of interest, a pixel of the high-quality image data that has not yet been treated as data of interest. It extracts, as prediction taps, several pixels of the decoded image data used to predict (the pixel value of) that data of interest. The tap extraction unit 152 likewise extracts, as class taps, several pixels of the decoded image data used to classify the data of interest.

As described above, the determination unit 123 (specifically, its comparison unit 235) supplies the class classification adaptive processing unit 132 with the pair of the DCT type and the actual characteristic type of the block of interest, as mismatch information for the data of interest.

On receiving the pair of the DCT type and the actual characteristic type of the block of interest as mismatch information, the tap extraction unit 151 extracts prediction taps from the decoded image data supplied from the MPEG decoder 236. The tap structure of these prediction taps follows, for example, the tap structure setting table shown in FIG. 25.

That is, the tap extraction unit 151 selects the tap structure of the prediction taps according to the DCT type and the actual characteristic type in the mismatch information. When both are the field DCT mode, it forms prediction taps with the pattern A tap structure, which consists only of field taps (described later). When the DCT type is the field DCT mode and the actual characteristic type is the frame DCT mode, it forms prediction taps with the pattern B tap structure, in which field taps outnumber frame taps (described later). When the DCT type is the frame DCT mode and the actual characteristic type is the field DCT mode, it forms prediction taps with the pattern C tap structure, in which frame taps outnumber field taps. When both are the frame DCT mode, it forms prediction taps with the pattern D tap structure, which consists only of frame taps.
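As described above, the tap structure setting table of FIG. 25 maps each (DCT type, actual characteristic type) pair to one of the patterns A to D. The following is a minimal lookup sketch. It uses an illustrative encoding of the two types as the strings "field" and "frame", and the names are hypothetical.

```python
# Tap structure setting table (FIG. 25) as described in the text:
# (DCT type, actual characteristic type) -> tap structure pattern.
TAP_STRUCTURE_TABLE = {
    ("field", "field"): "A",  # field taps only
    ("field", "frame"): "B",  # more field taps than frame taps
    ("frame", "field"): "C",  # more frame taps than field taps
    ("frame", "frame"): "D",  # frame taps only
}


def select_tap_pattern(mismatch_info):
    """Return the tap structure pattern for a (dct_type, actual_type) pair."""
    try:
        return TAP_STRUCTURE_TABLE[tuple(mismatch_info)]
    except KeyError:
        raise ValueError("unexpected mismatch information: %r" % (mismatch_info,)) from None


print(select_tap_pattern(("frame", "field")))  # C
```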

FIG. 26 shows the tap structures of patterns A to D. In FIG. 26, circles (○) represent pixels of the decoded image data. Hatched circles represent pixels that are field taps, and filled circles (●) represent pixels that are frame taps.

FIG. 26(A) shows the tap structure of pattern A, which consists of 25 pixels in total:
the pixel of the decoded image data corresponding to the data of interest (hereinafter referred to, as appropriate, as the pixel of interest) and the two pixels on each side of it horizontally;
the pixel two lines above the pixel of interest (one line skipped) and the two pixels on each side of it;
the pixel four lines above (three lines skipped) and the two pixels on each side of it;
the pixel two lines below and the two pixels on each side of it; and
the pixel four lines below and the two pixels on each side of it.

Here, a field tap means a tap pixel for which neither of the two vertically adjacent pixels (above and below) is a tap (here, a prediction tap or a class tap). In the pattern A tap structure of FIG. 26(A), no tap has a vertically adjacent tap, so all taps are field taps.

FIG. 26(B) shows the pattern B tap structure. It consists of 25 pixels in total: the pixel of interest and the two pixels on each side of it; the two pixels on each side of the pixel two lines above the pixel of interest; the one pixel on each side of the pixel four lines above the pixel of interest; the two pixels on each side of the pixel two lines below the pixel of interest; the one pixel on each side of the pixel four lines below the pixel of interest; the four pixels directly above the pixel of interest; and the four pixels directly below it.

A frame tap, by contrast, means a tap pixel for which at least one of the pixels directly above or below it is also a tap. In the pattern B tap structure of FIG. 26(B), nine pixels are frame taps (the pixel of interest plus the four pixels above it and the four pixels below it), and the remaining 16 pixels are field taps.

FIG. 26(C) shows the pattern C tap structure. It consists of 25 pixels in total: the pixel of interest and the two pixels on each side of it; the two pixels on each side of the pixel two lines above the pixel of interest; the two pixels on each side of the pixel two lines below the pixel of interest; the four pixels above and the four pixels below the pixel of interest; the one pixel on each side of the pixel directly above the pixel of interest; and the one pixel on each side of the pixel directly below the pixel of interest.

In the pattern C tap structure, 19 pixels are frame taps: the pixel of interest with the four pixels above it and the four pixels below it; the pixel to the left of the pixel of interest with the two pixels above it and the two pixels below it; and the pixel to the right of the pixel of interest with the two pixels above it and the two pixels below it. The remaining six pixels are field taps.

FIG. 26(D) shows the pattern D tap structure, which consists of the 25 pixels of the 5 × 5 (horizontal × vertical) block centered on the pixel of interest.

In the pattern D tap structure, every tap has a tap directly above or below it, so all of the taps are frame taps.
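The four tap structures can be written down as sets of (row, column) offsets from the pixel of interest and checked against the field/frame tap definitions above. This is a minimal sketch; the offset convention (negative row = up), the set names, and the string keys "field"/"frame" in the mode table are illustrative assumptions, not identifiers from the patent.

```python
# Tap offsets are (dy, dx) relative to the pixel of interest; dy < 0 means "above".
FULL = (-2, -1, 0, 1, 2)   # pixel plus two on each side
SIDES2 = (-2, -1, 1, 2)    # two pixels on each side, centre excluded
SIDES1 = (-1, 1)           # one pixel on each side, centre excluded
COLUMN = (-4, -3, -2, -1, 1, 2, 3, 4)  # four pixels above and four below


def _row(dy, dxs):
    return {(dy, dx) for dx in dxs}


PATTERN_A = set().union(*(_row(dy, FULL) for dy in (-4, -2, 0, 2, 4)))
PATTERN_B = (_row(0, FULL) | _row(-2, SIDES2) | _row(2, SIDES2)
             | _row(-4, SIDES1) | _row(4, SIDES1)
             | {(dy, 0) for dy in COLUMN})
PATTERN_C = (_row(0, FULL) | _row(-2, SIDES2) | _row(2, SIDES2)
             | {(dy, 0) for dy in COLUMN}
             | _row(-1, SIDES1) | _row(1, SIDES1))
PATTERN_D = {(dy, dx) for dy in FULL for dx in FULL}

# (DCT type, actual characteristic type) -> tap structure, as in the text.
PATTERN_BY_MISMATCH = {
    ("field", "field"): PATTERN_A,
    ("field", "frame"): PATTERN_B,
    ("frame", "field"): PATTERN_C,
    ("frame", "frame"): PATTERN_D,
}


def split_taps(pattern):
    """Return (frame taps, field taps): a frame tap has a tap directly above or below it."""
    frame = {(dy, dx) for dy, dx in pattern
             if (dy - 1, dx) in pattern or (dy + 1, dx) in pattern}
    return frame, pattern - frame
```

Running `split_taps` on each pattern gives 0, 9, 19, and 25 frame taps, matching the counts stated for patterns A to D.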

Based on the mismatch information, the tap extraction unit 151 (FIG. 13) forms, for the data of interest, a prediction tap with one of the tap structures of patterns A to D shown in FIG. 26.

Like the tap extraction unit 151, the tap extraction unit 152 forms a class tap whose tap structure depends on the mismatch information.

In this example, the mismatch information changes only the positions of the decoded-image pixels extracted as the prediction tap, and the prediction tap always consists of 25 pixels. The tap extraction unit 151 could, however, also change the number of decoded-image pixels making up the prediction tap based on the mismatch information.

In addition, the MPEG decoder 236 of the preprocessing unit 131 decodes the encoded data into an image using not only the quantized DCT coefficients contained in the encoded data but also motion vectors, DCT types, quantization steps, and other information that controls decoding (hereinafter referred to as decoding control information where appropriate). The tap extraction unit 151 can also include such decoding control information in the prediction tap, and in that case the decoding control information used as the prediction tap can itself be changed based on the mismatch information. Furthermore, the tap extraction unit 151 can include in the prediction tap the quantized DCT coefficients contained in the encoded data, or the two-dimensional DCT coefficients obtained by inversely quantizing them.

The class classification unit 153 obtains a class code (class tap code), for example, by applying the ADRC processing described above to the class tap for the data of interest.

The class classification unit 153 further obtains a 2-bit class code (mismatch code), for example, from the combination of the DCT type and the actual characteristic type that form the mismatch information for the data of interest.

Specifically, the class classification unit 153 sets the 2-bit mismatch code to, for example, "00" when the DCT type and the actual characteristic type are both the field DCT mode; to "01" when they are the field DCT mode and the frame DCT mode, respectively; to "10" when they are the frame DCT mode and the field DCT mode, respectively; and to "11" when both are the frame DCT mode.
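The class code can be sketched as an ADRC code of the class tap combined with the 2-bit mismatch code above. This sketch assumes K-bit ADRC in the common requantization form floor((x − MIN) · 2^K / (MAX − MIN + 1)) and assumes, hypothetically, that the mismatch code is packed into the two lowest bits; the patent does not specify how the two codes are combined.

```python
# 2-bit mismatch codes from the text: (DCT type, actual characteristic type) -> code.
MISMATCH_CODE = {
    ("field", "field"): 0b00,
    ("field", "frame"): 0b01,
    ("frame", "field"): 0b10,
    ("frame", "frame"): 0b11,
}


def adrc_code(class_tap, bits=1):
    """K-bit ADRC: requantize each integer tap relative to the tap's dynamic range."""
    lo, hi = min(class_tap), max(class_tap)
    dr = hi - lo + 1  # the +1 keeps the maximum value inside the top level
    code = 0
    for x in class_tap:
        q = (x - lo) * (1 << bits) // dr
        code = (code << bits) | q
    return code


def class_code(class_tap, dct_type, actual_type, bits=1):
    """Hypothetical packing: ADRC code in the high bits, mismatch code in the low 2 bits."""
    return (adrc_code(class_tap, bits) << 2) | MISMATCH_CODE[(dct_type, actual_type)]
```

For example, the class tap `[10, 20, 30]` gives the 1-bit ADRC code `0b001`, and with a frame-mode DCT type but a field-mode actual type the combined code is `0b00110`.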

The class classification unit 153 may additionally perform class classification based on other information, for example the decoding control information.

The class code output by the class classification unit 153 is supplied to the coefficient memory 141, which reads out the tap coefficients corresponding to that class code and supplies them to the prediction unit 154.

The prediction unit 154 performs the linear prediction operation of Expression (1) using the prediction tap output by the tap extraction unit 151 and the tap coefficients obtained from the coefficient memory 141. The prediction unit 154 thereby obtains (the predicted value of) the data of interest, that is, the high-quality image data, and supplies it to the post-processing unit 133.
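Expression (1) is not reproduced in this section; assuming it is the usual first-order linear combination y = Σ w_n · x_n of prediction taps x_n and tap coefficients w_n, the prediction step with a per-class coefficient memory can be sketched as follows. The dictionary standing in for the coefficient memory 141 is an illustrative assumption.

```python
def predict(prediction_tap, class_code, coefficient_memory):
    """Linear prediction y = sum_n w_n * x_n, with the w_n read out for the given class."""
    weights = coefficient_memory[class_code]  # stands in for the coefficient memory 141
    if len(weights) != len(prediction_tap):
        raise ValueError("prediction tap and tap coefficients differ in length")
    return sum(w * x for w, x in zip(weights, prediction_tap))
```

With `coefficient_memory = {5: [0.5, 0.25, 0.25]}`, the call `predict([1, 2, 3], 5, coefficient_memory)` returns `1.75`.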

As described above, the post-processing unit 133 outputs the output of (the prediction unit 154 of) the class classification adaptive processing unit 132, that is, the high-quality image data, unchanged.

In the embodiment of FIG. 22, the block characteristic determination unit 234 outputs an actual characteristic type indicating exactly one of the frame DCT mode and the field DCT mode. Alternatively, the frame line correlation and field line correlation of the block of interest could themselves be used as the actual characteristic type. In that case, the comparison unit 235 can use the frame line correlation and field line correlation of the block of interest to obtain an evaluation value indicating how appropriate the DCT type output by the inverse VLC unit 231 is for that block, and output the evaluation value as the mismatch information. Denoting the frame line correlation and field line correlation of the block of interest by F1 and F2, respectively, the evaluation value can be, for example, F1/(F1 + F2) when the DCT type of the block of interest is the frame DCT mode, and F2/(F1 + F2) when it is the field DCT mode.
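The evaluation value described above is simply the share of the total correlation that supports the signalled DCT type. A minimal sketch, assuming both correlations are non-negative and not both zero (the string mode names are illustrative):

```python
def evaluation_value(dct_type, frame_corr, field_corr):
    """F1/(F1+F2) for a frame-mode block, F2/(F1+F2) for a field-mode block."""
    total = frame_corr + field_corr
    if total <= 0:
        raise ValueError("correlations must be non-negative and not both zero")
    supporting = frame_corr if dct_type == "frame" else field_corr
    return supporting / total
```

A value near 1 means the signalled DCT type agrees with the block's line correlations; a value near 0 indicates a mismatch.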

The tap extraction units 151 and 152 can then compare the evaluation value serving as the mismatch information with one or more thresholds and change the tap structure of the prediction tap or class tap according to the result.

Likewise, the class classification unit 153 can quantize the evaluation value serving as the mismatch information and use the quantized value as the mismatch code.
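Both the threshold comparison and the quantization can be expressed as counting how many thresholds the evaluation value reaches; the resulting level can serve as a mismatch code or index a tap structure. The threshold values below are hypothetical; the patent does not give any.

```python
import bisect


def quantize_evaluation(value, thresholds=(0.25, 0.5, 0.75)):
    """Return the number of (hypothetical) thresholds reached: 0 .. len(thresholds)."""
    return bisect.bisect_right(sorted(thresholds), value)
```

With three thresholds this yields a level from 0 to 3, which fits in a 2-bit mismatch code.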

Further, in the embodiment of FIG. 22, the actual characteristic type of the block of interest is determined from that block's frame line correlation and field line correlation, but it can also be determined using the blocks around the block of interest. For example, the final actual characteristic type of the block of interest can be determined by a majority vote among the actual characteristic type obtained from the block's own frame line and field line correlations and the actual characteristic types obtained in the same way for one or more blocks adjacent to it.
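The majority vote can be sketched with a counter over the block's own type and its neighbours' types. The patent does not say how ties are broken; keeping the block's own type on a tie is an assumption made here.

```python
from collections import Counter


def majority_type(own_type, neighbour_types):
    """Majority vote over the block's own actual characteristic type and its neighbours'."""
    votes = Counter([own_type, *neighbour_types])
    best = max(votes.values())
    if votes[own_type] == best:  # assumed tie-break: keep the block's own type
        return own_type
    return next(t for t, n in votes.items() if n == best)
```

For instance, a frame-mode block whose two neighbours are both field-mode is reclassified as field mode.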

Next, in the embodiment of FIG. 22, the actual characteristic extraction unit 122 decodes the encoded data by the MPEG method and obtains the frame line correlation and field line correlation from the resulting decoded image data, and the determination unit 123 obtains the actual characteristic type from those correlations. Alternatively, the determination unit 123 can obtain the actual characteristic type from, for example, the two-dimensional DCT coefficients contained in the encoded data.

For example, as shown in FIG. 27, the actual characteristic extraction unit 122 can obtain as the actual characteristic those two-dimensional DCT coefficients of a block, obtained from the encoded data, whose basis patterns are horizontal stripes, that is, the seven two-dimensional DCT coefficients in the leftmost column of the block excluding the DC (direct current) coefficient (hereinafter referred to as horizontal-stripe two-dimensional DCT coefficients where appropriate; shown hatched in FIG. 27). The determination unit 123 can then obtain the actual characteristic type based on these horizontal-stripe two-dimensional DCT coefficients.
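Extracting the horizontal-stripe coefficients amounts to reading the leftmost column of the 8 × 8 coefficient block below the DC term. This sketch assumes the block is stored row-major as `F[v][u]`, with `v` the vertical frequency (row) and `u` the horizontal frequency (column), so the leftmost column holds patterns that vary only vertically, i.e. horizontal stripes.

```python
def horizontal_stripe_coeffs(F):
    """Return F[1][0] .. F[7][0]: leftmost column of an 8x8 2-D DCT block, DC excluded."""
    return [F[v][0] for v in range(1, 8)]
```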

Alternatively, the actual characteristic extraction unit 122 can obtain, from the two-dimensional DCT coefficients of a block obtained from the encoded data, either the difference between an arbitrary horizontal-stripe two-dimensional DCT coefficient and an arbitrary AC (alternating current) coefficient other than the horizontal-stripe coefficients (hereinafter referred to as the coefficient difference where appropriate), or the difference between the power of an arbitrary horizontal-stripe two-dimensional DCT coefficient (for example, its square) and the power of an arbitrary AC coefficient other than the horizontal-stripe coefficients (hereinafter referred to as the power difference where appropriate). The determination unit 123 can then obtain the actual characteristic type based on the coefficient difference or power difference.

FIG. 28 shows a configuration example of the actual characteristic extraction unit 122 that obtains the coefficient difference or power difference as the actual characteristic.

The encoded data is supplied to the inverse VLC unit 251 and the MPEG decoder 254.

The inverse VLC unit 251 separates the VLC codes of the quantized DCT coefficients, the quantization step, the motion vectors, and other information contained in the encoded data. It applies inverse VLC processing to the VLC codes of the quantized DCT coefficients to decode them, and supplies the resulting quantized DCT coefficients to the inverse quantization unit 252. It also supplies the quantization step to the inverse quantization unit 252 and the motion vectors to the motion compensation unit 256.

The inverse quantization unit 252 inversely quantizes the quantized DCT coefficients supplied by the inverse VLC unit 251 using the quantization step, also supplied by the inverse VLC unit 251, and supplies the resulting two-dimensional DCT coefficients of each 8 × 8 pixel block to the calculation unit 253.

Meanwhile, the MPEG decoder 254 decodes the encoded data by the MPEG method and outputs decoded image data. Of the decoded images output by the MPEG decoder 254, the I pictures and P pictures, which can serve as reference images, are supplied to and stored in the memory 255.

The motion compensation unit 256 reads a reference image from the memory 255 and applies motion compensation to it according to the motion vector supplied by the inverse VLC unit 251, thereby generating the predicted image of the block supplied from the inverse quantization unit 252 to the calculation unit 253. It supplies this predicted image to the DCT conversion unit 257, which applies a two-dimensional DCT to it and supplies the resulting two-dimensional DCT coefficients to the calculation unit 253.

The calculation unit 253 adds, as necessary, each two-dimensional DCT coefficient of the block supplied by the inverse quantization unit 252 to the corresponding two-dimensional DCT coefficient supplied by the DCT conversion unit 257, thereby obtaining the two-dimensional DCT coefficients of the block's pixel values.

Specifically, when the block supplied by the inverse quantization unit 252 is intra-coded, its two-dimensional DCT coefficients are already the two-dimensional DCT of the original pixel values, so the calculation unit 253 outputs them unchanged.

When the block supplied by the inverse quantization unit 252 is non-intra-coded, its two-dimensional DCT coefficients are the two-dimensional DCT of the difference (residual image) between the original pixel values and the predicted image. The calculation unit 253 therefore adds each DCT coefficient of the block to the corresponding coefficient obtained by the two-dimensional DCT of the predicted image, supplied by the DCT conversion unit 257, and outputs the result, which is the two-dimensional DCT of the original pixel values. This works because the DCT is linear: the DCT of the residual plus the DCT of the prediction equals the DCT of the original block.
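The role of the calculation unit 253 can be checked numerically: adding the DCT of the predicted block to the DCT of the residual reproduces the DCT of the original block. This pure-Python sketch uses the orthonormal 8-point DCT matrix of Expression (18); the function names and the test blocks are illustrative assumptions.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT matrix; for n = 8 this matches Expression (18)."""
    return [[(math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def dct2(X, C):
    """Two-dimensional DCT, F = C X C^T."""
    return matmul(matmul(C, X), transpose(C))


def original_block_dct(block_coeffs, intra, predicted_block=None, C=None):
    """Intra: coefficients are passed through. Non-intra: add the DCT of the prediction."""
    if intra:
        return block_coeffs
    P = dct2(predicted_block, C)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(block_coeffs, P)]
```

In practice `predicted_block` would be the motion-compensated reference block; here any 8 × 8 list of numbers works.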

The two-dimensional DCT coefficients of the block output by the calculation unit 253 are supplied to the DCT coefficient difference calculation unit 258.

The DCT coefficient difference calculation unit 258 uses the block's two-dimensional DCT coefficients to obtain the coefficient difference or power difference described above and supplies it to the determination unit 123 as the actual characteristic.

In this case, the determination unit 123 determines, for example by referring to the coefficient difference or power difference of the block of interest, the magnitude relationship between the horizontal-stripe two-dimensional DCT coefficient and the AC coefficient used to compute that difference. If the horizontal-stripe coefficient is smaller than (or less than or equal to) the AC coefficient, the determination unit 123 recognizes the actual characteristic type as the field DCT mode; otherwise, it recognizes the actual characteristic type as the frame DCT mode. A horizontal-stripe coefficient smaller than the AC coefficient indicates not only that the image of the block of interest should be encoded in the field DCT mode but also that it is an image with many horizontal stripes.
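The decision rule above can be sketched as follows. Which horizontal-stripe coefficient and which AC coefficient are compared is left open by the text, so the caller picks them; the `inclusive` flag covers the "(or less than or equal to)" variant. The function names are hypothetical.

```python
def coefficient_difference(stripe_coeff, ac_coeff):
    """Horizontal-stripe coefficient minus another AC coefficient."""
    return stripe_coeff - ac_coeff


def power_difference(stripe_coeff, ac_coeff):
    """Difference of powers, using the square as the power."""
    return stripe_coeff ** 2 - ac_coeff ** 2


def actual_type_from_difference(diff, inclusive=False):
    """A negative difference (stripe term smaller) means field DCT mode, else frame DCT mode."""
    if diff < 0 or (inclusive and diff == 0):
        return "field"
    return "frame"
```

Note that the two differences can disagree: for a stripe coefficient of −6 and an AC coefficient of 5, the coefficient difference is negative (field) while the power difference is positive (frame), so the choice between them matters.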

The determination unit 123 can also include in the mismatch information the coefficient difference or power difference, and further the two-dimensional DCT coefficients used to compute it. In that case, for example, in the class classification adaptive processing unit 132 (FIG. 13), the tap extraction units 151 and 152 can change the tap structures of the prediction tap and class tap based also on the coefficient difference or power difference and the two-dimensional DCT coefficients included in the mismatch information, and the class classification unit 153 can likewise perform class classification based also on them.

Next, the frame line correlation and field line correlation of the block of interest can also be obtained from, for example, the one-dimensional DCT coefficients of the block of interest.

One-dimensional DCT coefficients are described here with reference to FIGS. 29 and 30.

In image coding methods that use the DCT, such as MPEG and JPEG (Joint Photographic Experts Group), image data is subjected to a two-dimensional DCT in the horizontal and vertical directions and, on decoding, to the corresponding two-dimensional inverse DCT.

If the pixel values of an 8 × 8 pixel block as shown in FIG. 29(A) are represented by an 8-row × 8-column matrix X, and the two-dimensional DCT coefficients of the 8 × 8 block as shown in FIG. 29(B) are represented by an 8-row × 8-column matrix F, the two-dimensional DCT and two-dimensional inverse DCT can be expressed as follows.

$$C X C^{T} = F \qquad (16)$$

$$C^{T} F C = X \qquad (17)$$

Here, the superscript T denotes transposition, and C is the 8 × 8 DCT matrix, whose component $c_{ij}$ in row $i+1$, column $j+1$ is given by the following expression.

$$c_{ij} = A_i \cos\left(\frac{(2j+1)\, i\, \pi}{16}\right) \qquad (18)$$

In Expression (18), $A_i = 1/(2\sqrt{2})$ when $i = 0$ and $A_i = 1/2$ when $i \neq 0$; $i$ and $j$ are integers from 0 to 7.
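Expressions (16) to (18) can be checked numerically: the matrix C built from Expression (18) is orthonormal, so the inverse transform of Expression (17) exactly undoes the forward transform of Expression (16). A pure-Python sketch; the helper names are illustrative.

```python
import math


def dct_matrix(n=8):
    """c_ij = A_i * cos((2j + 1) * i * pi / (2n)); n = 8 gives Expression (18).

    A_0 = sqrt(1/n) (= 1/(2*sqrt(2)) for n = 8), A_i = sqrt(2/n) (= 1/2) otherwise.
    """
    return [[(math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def dct2(X, C):
    """Expression (16): F = C X C^T."""
    return matmul(matmul(C, X), transpose(C))


def idct2(F, C):
    """Expression (17): X = C^T F C."""
    return matmul(matmul(transpose(C), F), C)
```

For a flat block of ones, `dct2` yields 8 in the DC position and (numerically) zero elsewhere, since 64 · (1/(2√2))² = 8.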

Expression (16) is the two-dimensional DCT, which converts the pixel values X into the two-dimensional DCT coefficients F, and Expression (17) is the two-dimensional inverse DCT, which converts the two-dimensional DCT coefficients F back into the pixel values X.

According to Expression (17), the two-dimensional DCT coefficients F are converted into the pixel values X by multiplying them by $C^{T}$ on the left and by $C$ on the right. Multiplying F only by $C^{T}$ on the left, or only by $C$ on the right, instead yields one-dimensional DCT coefficients.

Multiplying F by $C^{T}$ on the left only performs a vertical one-dimensional inverse DCT, which converts the vertical direction of F to the spatial domain while leaving the horizontal direction in the frequency domain, as shown in FIG. 29(C). The result is the horizontal one-dimensional DCT coefficients vXhF, which represent the horizontal spatial frequency components.

Multiplying F by $C$ on the right only performs a horizontal one-dimensional inverse DCT, which converts the horizontal direction of F to the spatial domain while leaving the vertical direction in the frequency domain, as shown in FIG. 29(D). The result is the vertical one-dimensional DCT coefficients hXvF, which represent the vertical spatial frequency components.

Applying the vertical one-dimensional inverse DCT to the 8 × 8 (horizontal × vertical) two-dimensional DCT coefficients F yields eight sets (one per row) of 8 × 1 horizontal one-dimensional DCT coefficients (FIG. 29(C)). Applying the horizontal one-dimensional inverse DCT to F yields eight sets (one per column) of 1 × 8 vertical one-dimensional DCT coefficients (FIG. 29(D)).

In the 8 × 1 horizontal one-dimensional DCT coefficients of a row, the leftmost coefficient represents the direct-current (DC) component of the eight pixel values in that row (which is proportional to their average), and the other seven coefficients represent the horizontal AC components of the row. Likewise, in the 1 × 8 vertical one-dimensional DCT coefficients of a column, the top coefficient represents the DC component of the eight pixel values in that column, and the other seven represent its vertical AC components.

By Expression (16), the horizontal one-dimensional DCT coefficients can also be obtained by applying a horizontal one-dimensional DCT to the pixel values X corresponding to the two-dimensional DCT coefficients F, that is, by multiplying X by $C^{T}$ on the right. Similarly, the vertical one-dimensional DCT coefficients can be obtained by applying a vertical one-dimensional DCT to X, that is, by multiplying X by $C$ on the left.
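Because C is orthonormal, the two routes to each set of one-dimensional coefficients agree: $C^{T}F = XC^{T}$ (horizontal, FIG. 29(C)) and $FC = CX$ (vertical, FIG. 29(D)). The sketch below verifies this and the DC relationship noted above; helper names are illustrative.

```python
import math


def dct_matrix(n=8):
    return [[(math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def dct2(X, C):
    return matmul(matmul(C, X), transpose(C))


def horizontal_1d_from_2d(F, C):
    """C^T F: vertical inverse DCT only, leaving one row of horizontal coefficients per line."""
    return matmul(transpose(C), F)


def vertical_1d_from_2d(F, C):
    """F C: horizontal inverse DCT only, leaving one column of vertical coefficients."""
    return matmul(F, C)


def horizontal_1d_from_pixels(X, C):
    """X C^T: horizontal 1-D DCT of each pixel row."""
    return matmul(X, transpose(C))


def vertical_1d_from_pixels(X, C):
    """C X: vertical 1-D DCT of each pixel column."""
    return matmul(C, X)
```

The leftmost horizontal coefficient of a row equals the row sum divided by 2√2, which is why it tracks the row's average.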

FIG. 30 shows an actual image together with its two-dimensional DCT coefficients, horizontal one-dimensional DCT coefficients, and vertical one-dimensional DCT coefficients.

The image in FIG. 30 is an 8 × 8 block: FIG. 30(A) shows the image itself, FIG. 30(B) its two-dimensional DCT coefficients, FIG. 30(C) its horizontal one-dimensional DCT coefficients, and FIG. 30(D) its vertical one-dimensional DCT coefficients.

The image in FIG. 30(A) has 8-bit pixel values, and DCT coefficients computed from such pixel values can be negative. For display in FIGS. 30(B) to 30(D), 128 (= 2^7) is added to each DCT coefficient, sums below 0 are clipped to 0, and sums of 256 or more are clipped to 255, so that the coefficients are shown in the range 0 to 255.
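The display mapping used for FIGS. 30(B) to 30(D) is an offset followed by clipping. A minimal sketch (no rounding is applied, since the text does not mention any):

```python
def display_value(coeff):
    """Offset a DCT coefficient by 128 (= 2**7) and clip it to the 8-bit range 0..255."""
    return min(max(coeff + 128, 0), 255)
```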

Because the two-dimensional DCT coefficients reflect information about the entire 8 × 8 pixel block, it is difficult to grasp local information, such as information about a specific pixel in the block, from them. In contrast, each set of horizontal or vertical one-dimensional DCT coefficients reflects the information of only one row or one column of the block, so local information within the block can be grasped more easily than from the two-dimensional DCT coefficients.

That is, the features of a row of the block can be grasped from that row's 8 × 1 horizontal one-dimensional DCT coefficients, and the features of a column from that column's 1 × 8 vertical one-dimensional DCT coefficients. Furthermore, the features of a given pixel in the block can be grasped from the 8 × 1 horizontal one-dimensional DCT coefficients of the row containing it together with the 1 × 8 vertical one-dimensional DCT coefficients of the column containing it.

Likewise, the state of the boundary between horizontally adjacent blocks can be grasped more accurately using the vertical one-dimensional DCT coefficients, which represent the vertical spatial frequency components at the block boundary, than using the two-dimensional DCT coefficients, which reflect the entire block. Similarly, the state of the boundary between vertically adjacent blocks can be grasped more accurately using the horizontal one-dimensional DCT coefficients, which represent the horizontal spatial frequency components at the block boundary.

Using one-dimensional DCT coefficients as described above, the actual characteristic extraction unit 122 calculates the frame line correlation and the field line correlation of the block of interest, for example, as follows.

That is, as shown in FIG. 31, the actual characteristic extraction unit 122 obtains the correlation Q(i, i+1) between two adjacent lines of the block, the i-th line (the i-th line from the top) and the (i+1)-th line, for example according to the following equation.

$$Q(i,i+1) = \frac{1}{\sum_{j=1}^{8}\bigl(d_H(i,j) - d_H(i+1,j)\bigr)} \qquad (19)$$

Here, d_H(i, j) denotes the j-th horizontal one-dimensional DCT coefficient from the left (the j-th column) of the i-th line, and Σ denotes summation over j = 1 to 8.

The actual characteristic extraction unit 122 then obtains, for example, the average of the correlations Q(i, i+1), that is, (Q(1,2) + Q(2,3) + Q(3,4) + Q(4,5) + Q(5,6) + Q(6,7) + Q(7,8)) / 7, and outputs this average as the frame line correlation.

Also, as shown in FIG. 31, the actual characteristic extraction unit 122 obtains the correlation Q(i, i+2) between the i-th line and the (i+2)-th line, which are adjacent when every other line is skipped, for example according to Equation (19) with i+1 replaced by i+2.

It then obtains, for example, the average of the correlations Q(i, i+2), that is, (Q(1,3) + Q(2,4) + Q(3,5) + Q(4,6) + Q(5,7) + Q(6,8)) / 6, and outputs this average as the field line correlation.
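The frame and field line correlations can be sketched as follows. The input `d_h` is assumed to be an 8 × 8 block of horizontal one-dimensional DCT coefficients indexed from 0, so the patent's Q(1, 2) is `line_correlation(d_h, 0, 1)` here. Equation (19) as extracted shows a plain sum of differences; this sketch assumes squared differences, so the denominator cannot be negative, and adds a small hypothetical `eps` to avoid division by zero when two lines are identical. Treat it as an illustration under those assumptions, not as the patent's exact formula.

```python
def line_correlation(d_h, i, k, eps=1e-6):
    """Q(i, k) between lines i and k of a block of horizontal 1-D DCT coefficients.

    ASSUMPTION: each summand is a squared difference; eps is a hypothetical guard
    against division by zero for identical lines.
    """
    diff = sum((a - b) ** 2 for a, b in zip(d_h[i], d_h[k]))
    return 1.0 / (diff + eps)


def frame_and_field_correlation(d_h):
    """Average Q over adjacent line pairs (frame) and over pairs two lines apart (field)."""
    n = len(d_h)  # 8 lines per block
    frame = [line_correlation(d_h, i, i + 1) for i in range(n - 1)]  # 7 pairs
    field = [line_correlation(d_h, i, i + 2) for i in range(n - 2)]  # 6 pairs
    return sum(frame) / len(frame), sum(field) / len(field)


# Interlace-like block: even lines share one spectrum, odd lines another.
even = [100.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
odd = [60.0, -5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
block = [even if i % 2 == 0 else odd for i in range(8)]
frame_corr, field_corr = frame_and_field_correlation(block)
print(field_corr > frame_corr)  # True
```

For this block every line matches the line two below it, so the field line correlation dominates, which is the situation in which field DCT would normally be preferred.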

Next, FIG. 32 shows a configuration example of the actual characteristic extraction unit 122 that obtains the frame line correlation and the field line correlation using one-dimensional DCT coefficients as described above. In the figure, portions corresponding to those in FIG. 28 are denoted by the same reference numerals, and their description is omitted below as appropriate. That is, the actual characteristic extraction unit 122 of FIG. 32 is configured in the same manner as in FIG. 28, except that a vertical one-dimensional inverse DCT transform unit 261 and a correlation calculation unit 262 are provided in place of the DCT coefficient difference calculation unit 258.

The vertical one-dimensional inverse DCT transform unit 261 is supplied with the two-dimensional DCT coefficients of the block output by the calculation unit 253. It applies a vertical one-dimensional inverse DCT to this block of two-dimensional DCT coefficients, thereby obtaining a block of horizontal one-dimensional DCT coefficients, which it supplies to the correlation calculation unit 262. The correlation calculation unit 262 obtains and outputs the frame line correlation and the field line correlation from these horizontal one-dimensional DCT coefficients, as described with reference to FIG. 31.
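Because the two-dimensional DCT is separable, undoing only its vertical stage leaves the horizontal one-dimensional DCT coefficients of each row, which is why the vertical one-dimensional inverse DCT transform unit 261 can feed the correlation calculation unit 262 directly. The following Python sketch checks that identity numerically for an orthonormal DCT; the helper names and the test block are illustrative assumptions.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                               for m in range(n))
            for k in range(n)]


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                for k in range(n))
            for m in range(n)]


def columns_apply(f, block):
    """Apply the 1-D transform f to every column of a row-major block."""
    cols = [f(list(col)) for col in zip(*block)]
    return [list(row) for row in zip(*cols)]


def dct_2d(block):
    """2-D DCT: horizontal 1-D DCT of each row, then vertical 1-D DCT of each column."""
    return columns_apply(dct_1d, [dct_1d(row) for row in block])


def vertical_1d_idct(coeffs_2d):
    """Undo only the vertical stage, leaving each row's horizontal 1-D DCT coefficients."""
    return columns_apply(idct_1d, coeffs_2d)


block = [[float((3 * i + 7 * j) % 17 * 10) for j in range(8)] for i in range(8)]
h_direct = [dct_1d(row) for row in block]
h_via_2d = vertical_1d_idct(dct_2d(block))
max_err = max(abs(a - b) for r1, r2 in zip(h_direct, h_via_2d) for a, b in zip(r1, r2))
print(max_err < 1e-9)  # True
```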

In the embodiments of FIGS. 28 and 32, a predicted image is generated from the decoded image data output by the MPEG decoder 254 and converted into two-dimensional DCT coefficients, and the calculation unit 253 adds the two-dimensional DCT coefficients of the residual image obtained from the encoded data to the two-dimensional DCT coefficients of the predicted image, thereby obtaining the two-dimensional DCT coefficients of the original image. Alternatively, the actual characteristic extraction unit 122 may, for example, apply a two-dimensional DCT to the decoded image data output by the MPEG decoder 254 and use the resulting two-dimensional DCT coefficients as the two-dimensional DCT coefficients of the original image for the processing in the DCT coefficient difference calculation unit 258 of FIG. 28 or the vertical one-dimensional inverse DCT transform unit 261 of FIG. 32.

Also, in the actual characteristic extraction unit 122 of FIGS. 28 and 32, the DCT coefficient difference calculation unit 258 or the vertical one-dimensional inverse DCT transform unit 261 may perform its processing using the two-dimensional DCT coefficients of the residual image obtained from the encoded data (the output of the inverse quantization unit 252) instead of the two-dimensional DCT coefficients of the original image output by the calculation unit 253.

Next, FIG. 33 shows a detailed configuration example of the learning device of FIG. 15 for learning the tap coefficients to be stored in the coefficient memory 141 of FIG. 22.

In the embodiment of FIG. 33, high-quality image data (learning image data) is stored as learning data in the learning data storage unit 11.

In the embodiment of FIG. 33, the encoding unit 12 is composed of an MPEG encoder 271. The MPEG encoder 271 reads the learning image data from the learning data storage unit 11, encodes it by the MPEG2 method, and outputs the resulting encoded data.

FIG. 34 shows a configuration example of the MPEG encoder 271 of FIG. 33.

The learning image data is supplied to the motion vector detection unit 321 and the calculation unit 323. The motion vector detection unit 321 detects motion vectors of the learning image data, for example by performing block matching on it, and supplies them to the motion compensation unit 322.
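The text says only that motion vectors are detected "for example, by block matching". A minimal full-search version using the sum of absolute differences (SAD) as the matching cost might look like this. The cost function, the ±4 search range, and skipping displacements that leave the reference frame are assumptions made for illustration; MPEG2 also allows half-pixel motion vectors, which this sketch omits.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def block_matching(cur, ref, bx, by, size=8, search=4):
    """Full search over [-search, search]^2; returns the (dx, dy) with minimum SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements whose reference block would fall outside the frame.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def pattern(x, y):
    """Deterministic texture so that only one displacement matches exactly."""
    return (x * 37 + y * 91 + x * y * 11) % 256


ref = [[pattern(x, y) for x in range(24)] for y in range(24)]
cur = [[pattern(x + 2, y + 1) for x in range(24)] for y in range(24)]  # content shifted by (2, 1)
print(block_matching(cur, ref, bx=8, by=8))  # (2, 1)
```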

The calculation unit 323 subtracts, as necessary, the predicted image supplied from the motion compensation unit 322 from the learning image data (original image) and supplies the resulting residual image to the DCT transform unit 324. The DCT transform unit 324 applies a two-dimensional DCT to the residual image from the calculation unit 323 and supplies the resulting two-dimensional DCT coefficients to the quantization unit 325. The quantization unit 325 quantizes these two-dimensional DCT coefficients with a predetermined quantization step to obtain quantized DCT coefficients, which it supplies to the VLC unit 326 and the inverse quantization unit 327.

The VLC unit 326 variable-length codes the quantized DCT coefficients supplied from the quantization unit 325 into VLC codes and multiplexes them with the necessary decoding control information (for example, the motion vectors detected by the motion vector detection unit 321 and the quantization step used in the quantization unit 325), thereby obtaining and outputting the encoded data.

Meanwhile, the inverse quantization unit 327 inversely quantizes the quantized DCT coefficients output by the quantization unit 325 to obtain two-dimensional DCT coefficients and supplies them to the inverse DCT transform unit 328. The inverse DCT transform unit 328 applies a two-dimensional inverse DCT to these coefficients, thereby decoding them into a residual image, and supplies it to the calculation unit 329.

The calculation unit 329 is supplied with the residual image from the inverse DCT transform unit 328 and, from the motion compensation unit 322, with the same predicted image that the calculation unit 323 used to obtain that residual image. The calculation unit 329 adds the residual image and the predicted image to decode the original image (local decoding). The decoded image is supplied to the memory 330 and stored as a reference image.
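The residual, DCT, quantization, inverse quantization, inverse DCT, and local-decoding path of units 323 through 329 can be sketched for a single 8 × 8 block as below. As a simplifying assumption, one uniform quantization step is applied to every coefficient; actual MPEG2 quantization also uses quantization matrices, and variable-length coding (VLC unit 326) is omitted. The final check relies on the fact that an orthonormal DCT preserves energy, so the reconstruction error is bounded by the quantization error.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    n = len(x)
    return [_scale(k, n) * sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                               for m in range(n)) for k in range(n)]


def idct_1d(c):
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                for k in range(n)) for m in range(n)]


def transform_2d(f, block):
    """Separable 2-D transform: f on every row, then f on every column."""
    rows = [f(list(r)) for r in block]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def encode_block(orig, pred, qstep):
    """Residual -> 2-D DCT -> uniform quantization with a single step (simplified)."""
    residual = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    coeffs = transform_2d(dct_1d, residual)
    return [[round(c / qstep) for c in row] for row in coeffs]


def local_decode(levels, pred, qstep):
    """Inverse quantization -> 2-D inverse DCT -> add the same prediction."""
    coeffs = [[lv * qstep for lv in row] for row in levels]
    residual = transform_2d(idct_1d, coeffs)
    return [[r + p for r, p in zip(rr, rp)] for rr, rp in zip(residual, pred)]


orig = [[float((5 * i + 3 * j) % 32 * 8) for j in range(8)] for i in range(8)]
# Stand-in for a motion-compensated prediction that is close to, but not equal to, orig.
pred = [[orig[i][j] + float((i * j) % 5 - 2) for j in range(8)] for i in range(8)]
qstep = 16
levels = encode_block(orig, pred, qstep)
recon = local_decode(levels, pred, qstep)
sq_err = sum((a - b) ** 2 for ra, rb in zip(orig, recon) for a, b in zip(ra, rb))
# Parseval: pixel-domain error energy equals coefficient error energy, and each
# coefficient is off by at most qstep / 2 after rounding.
print(sq_err <= 64 * (qstep / 2) ** 2 + 1e-6)  # True
```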

The motion compensation unit 322 reads the reference image stored in the memory 330 and performs motion compensation on it according to the motion vectors supplied from the motion vector detection unit 321, thereby generating a predicted image. This predicted image is supplied from the motion compensation unit 322 to the calculation units 323 and 329.

As described above, the calculation unit 323 obtains the residual image using the predicted image from the motion compensation unit 322, and the calculation unit 329 decodes the original image using that same predicted image.

Returning to FIG. 33, the encoded data output by the MPEG encoder 271 is supplied to the encoding characteristic information extraction unit 171 and the actual characteristic extraction unit 172.

The encoding characteristic information extraction unit 171 is composed of an inverse VLC unit 272, and the actual characteristic extraction unit 172 is composed of an MPEG decoder 273 and a correlation calculation unit 274. The inverse VLC unit 272, the MPEG decoder 273, and the correlation calculation unit 274 perform the same processing as the inverse VLC unit 231, the MPEG decoder 232, and the correlation calculation unit 233 of FIG. 22, respectively. As a result, the inverse VLC unit 272 supplies the DCT type of the block of interest, and the correlation calculation unit 274 supplies the frame line correlation and the field line correlation of the block of interest, to the determination unit 173.

The determination unit 173 is composed of a block characteristic determination unit 275 and a comparison unit 276. Using the DCT type, frame line correlation, and field line correlation of the block of interest supplied to them, these units perform the same processing as the block characteristic determination unit 234 and the comparison unit 235 of FIG. 22, respectively, thereby generating mismatch information for the teacher data currently treated as the teacher data of interest in the adaptive learning unit 160. This mismatch information is supplied from the comparison unit 276 to the adaptive learning unit 160.
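The comparison performed by the block characteristic determination unit 275 and the comparison unit 276 can be pictured as follows. The decision rule here, that a block "really" calls for field DCT when its field line correlation exceeds its frame line correlation, is a hypothetical stand-in for the processing of FIG. 22, which is not reproduced in this section; the string labels "frame" and "field" are likewise illustrative.

```python
def block_characteristic(frame_corr, field_corr):
    """Hypothetical rule: the DCT type the block calls for is the one whose
    line pairs correlate more strongly."""
    return "field" if field_corr > frame_corr else "frame"


def mismatch_info(encoded_dct_type, frame_corr, field_corr):
    """True when the DCT type carried in the encoded data (from the inverse VLC)
    disagrees with the characteristic measured from the decoded block."""
    return block_characteristic(frame_corr, field_corr) != encoded_dct_type


print(mismatch_info("frame", frame_corr=0.002, field_corr=5.0))  # True
print(mismatch_info("field", frame_corr=0.002, field_corr=5.0))  # False
```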

When the encoding characteristic information extraction unit 121, the actual characteristic extraction unit 122, and the determination unit 123 of the decoding device of FIG. 22 obtain the mismatch information as described with reference to FIGS. 27 to 32, the encoding characteristic information extraction unit 171, the actual characteristic extraction unit 172, and the determination unit 173 of the learning device of FIG. 33 obtain the mismatch information in the same way.

The inverse post-processing unit 161A reads the learning image data from the learning data storage unit 11 and outputs it unchanged to the adaptive learning unit 160 as teacher data. In the adaptive learning unit 160 (FIG. 15), the teacher data from the inverse post-processing unit 161A is stored in the teacher data storage unit 162.

The encoding unit 163A is composed of an MPEG encoder 277. Like the MPEG encoder 271, the MPEG encoder 277 reads the learning image data from the learning data storage unit 11, encodes it by the MPEG2 method, and outputs the resulting encoded data to the preprocessing unit 163B.

The preprocessing unit 163B is composed of an MPEG decoder 278 configured in the same manner as the MPEG decoder 232 of FIG. 23. The MPEG decoder 278 decodes the encoded data from the MPEG encoder 277 by the MPEG2 method and outputs the resulting decoded image data to the adaptive learning unit 160 as student data. In the adaptive learning unit 160 (FIG. 15), the student data from the MPEG decoder 278 is stored in the student data storage unit 164.

Here, the mismatch information is also supplied to the tap extraction units 165 and 166. Based on the mismatch information, the tap extraction units 165 and 166 form, for the teacher data of interest, a prediction tap and a class tap, respectively, with the same tap structures as those formed by the tap extraction units 151 and 152 (FIG. 13) of the class classification adaptive processing unit 132 described with reference to FIG. 22.

Accordingly, when, for example, the tap extraction units 151 and 152 also use the decoding control information to form the prediction tap and the class tap, as described with reference to FIG. 22, the tap extraction units 165 and 166 (FIG. 15) of the learning device of FIG. 33 likewise use the decoding control information to form the prediction tap and the class tap, respectively.

The class classification unit 167 (FIG. 15) then classifies the teacher data of interest in the same manner as the class classification unit 153 described with reference to FIG. 22, based on the class tap and the mismatch information for the teacher data of interest, and outputs the class code corresponding to the resulting class to the adding unit 168.

In the learning device of FIG. 33, if, for example, the number of pixels of the learning image data is thinned out to 1/N before the MPEG encoder 277 of the encoding unit 163A MPEG-encodes it, the adaptive learning unit 160 can obtain tap coefficients that convert MPEG-decoded image data into image data of high image quality with N times as many pixels (that is, with higher resolution).
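A minimal way to thin out the learning image before encoding is plain subsampling. The sketch below keeps every `factor`-th pixel horizontally and vertically, so the pixel count falls to 1/N with N = factor²; both that choice of N and the absence of an anti-aliasing low-pass filter are assumptions for illustration.

```python
def thin_out(image, factor):
    """Keep every `factor`-th pixel in both directions; pixel count drops to 1/factor**2."""
    return [row[::factor] for row in image[::factor]]


image = [[10 * y + x for x in range(8)] for y in range(8)]
small = thin_out(image, 2)  # N = 4 in the text's notation
print(len(small), len(small[0]))  # 4 4
print(small[1])                   # [20, 22, 24, 26]
```

The thinned image is what would be MPEG-encoded to produce student data, while the full-resolution image remains the teacher data.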

Next, FIG. 35 shows a second detailed configuration example of the decoding device of FIG. 12 for the case where the encoded data is image data encoded by the MPEG2 method. In the figure, portions corresponding to those in FIG. 22 are denoted by the same reference numerals, and their description is omitted below as appropriate.

In the embodiment of FIG. 35, the preprocessing unit 131 is composed of an inverse VLC unit 281, an inverse quantization unit 282, a calculation unit 283, an MPEG decoder 284, a memory 285, a motion compensation unit 286, and a DCT transform unit 287.

The inverse VLC unit 281, inverse quantization unit 282, calculation unit 283, MPEG decoder 284, memory 285, motion compensation unit 286, and DCT transform unit 287 are configured in the same manner as the inverse VLC unit 251, inverse quantization unit 252, calculation unit 253, MPEG decoder 254, memory 255, motion compensation unit 256, and DCT transform unit 257 of FIG. 28, respectively, and apply to the encoded data supplied to the preprocessing unit 131 the same processing as described with reference to FIG. 28. The preprocessing unit 131 thus obtains the two-dimensional DCT coefficients of the original image and supplies them to the class classification adaptive processing unit 132 as preprocessed data.

The class classification adaptive processing unit 132 performs class classification adaptive processing on the two-dimensional DCT coefficients output by the preprocessing unit 131, thereby obtaining (predicted values of) high-quality image data as adaptively processed data.

That is, in the class classification adaptive processing unit 132 (FIG. 13), the two-dimensional DCT coefficients output by the preprocessing unit 131 are supplied to the tap extraction units 151 and 152.

The tap extraction unit 151 takes, as the data of interest, a pixel of the high-quality image data that has not yet been treated as data of interest, and extracts, as a prediction tap, some of the two-dimensional DCT coefficients (the preprocessed data) used to predict that data of interest. The tap extraction unit 152 likewise extracts, as a class tap, some of the two-dimensional DCT coefficients used to classify the data of interest.

The tap extraction units 151 and 152 change the tap structure of the prediction tap and the class tap, respectively, based on the mismatch information for the data of interest.

That is, the tap extraction unit 151 forms a prediction tap by extracting, for example, all the two-dimensional DCT coefficients of the block containing the data of interest (the block of interest) together with, depending on the mismatch information, the two-dimensional DCT coefficients of the blocks adjacent to the block of interest above, below, to the left, and to the right. The tap extraction unit 152 forms a class tap in the same manner.
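The tap construction of the tap extraction unit 151 can be sketched as concatenating the 64 coefficients of the block of interest with those of selected neighboring blocks. This section does not say which neighbors are chosen for which mismatch information, so the sketch leaves that choice to the caller via a `directions` argument; the block dictionary layout and the policy of substituting the block of interest when a neighbor lies outside the picture are hypothetical.

```python
NEIGHBOR_OFFSETS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def build_prediction_tap(blocks, pos, directions):
    """Concatenate the block of interest's coefficients with those of chosen neighbors.

    blocks: dict mapping (block_row, block_col) -> flat list of 64 DCT coefficients.
    directions: subset of NEIGHBOR_OFFSETS keys; choosing it from the mismatch
    information is left to the caller (the selection rule is not specified here).
    """
    r, c = pos
    tap = list(blocks[pos])
    for name in directions:
        dr, dc = NEIGHBOR_OFFSETS[name]
        neighbor = blocks.get((r + dr, c + dc))
        # Hypothetical edge policy: reuse the block of interest when a neighbor is missing.
        tap.extend(neighbor if neighbor is not None else blocks[pos])
    return tap


blocks = {(r, c): [float(10 * r + c)] * 64 for r in range(3) for c in range(3)}
tap = build_prediction_tap(blocks, (1, 1), ["up", "down"])
print(len(tap), tap[64], tap[128])  # 192 1.0 21.0
```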

The prediction tap obtained by the tap extraction unit 151 is supplied to the prediction unit 154, and the class tap obtained by the tap extraction unit 152 is supplied to the class classification unit 153.

The class classification unit 153 classifies the data of interest based on the class tap and the mismatch information for the data of interest, in the same manner as described with reference to FIG. 22, and supplies the class code of the data of interest to the coefficient memory 141. The tap coefficients corresponding to that class code are read from the coefficient memory 141 and supplied to the prediction unit 154.

The post-processing unit 133 outputs the high-quality image data from the class classification adaptive processing unit 132 unchanged.

Thus, in the embodiment of FIG. 35, the class classification adaptive processing unit 132 converts two-dimensional DCT coefficients into high-quality image data.

Next, FIG. 36 shows a detailed configuration example of the learning device of FIG. 15 for learning the tap coefficients to be stored in the coefficient memory 141 of the decoding device of FIG. 35. In the figure, portions corresponding to those in FIG. 33 are denoted by the same reference numerals, and their description is omitted below as appropriate.

In the embodiment of FIG. 36, the preprocessing unit 163B is composed of an inverse VLC unit 291, an inverse quantization unit 292, a calculation unit 293, an MPEG decoder 294, a memory 295, a motion compensation unit 296, and a DCT transform unit 297, which are configured in the same manner as the inverse VLC unit 281 through the DCT transform unit 287 of FIG. 35, respectively.

Accordingly, the preprocessing unit 163B applies to the encoded data output by the MPEG encoder 277 of the encoding unit 163A the same processing as the preprocessing unit 131 of FIG. 35, and the resulting two-dimensional DCT coefficients are supplied to the adaptive learning unit 160 as student data.

In the adaptive learning unit 160 (FIG. 15), the two-dimensional DCT coefficients supplied from the preprocessing unit 163B are stored as student data in the student data storage unit 164. As in the case described with reference to FIG. 33, the teacher data and the student data are used to learn tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data, obtained by the linear prediction calculation of Equation (1) from the tap coefficients and the prediction taps extracted from the student data. Tap coefficients that convert the two-dimensional DCT coefficients serving as student data into high-quality image data are thereby obtained for each class.

In the embodiment of FIG. 36, however, the tap extraction units 165 and 166 of the adaptive learning unit 160 (FIG. 15) form, based on the mismatch information, prediction taps and class taps with the same tap structures as those formed by the tap extraction units 151 and 152 of the class classification adaptive processing unit 132 (FIG. 13) of FIG. 35, respectively. Furthermore, the class classification unit 167 of the adaptive learning unit 160 (FIG. 15) of FIG. 36 performs the same class classification as the class classification unit 153 of the class classification adaptive processing unit 132 (FIG. 13) of FIG. 35.

Next, FIG. 37 shows a third detailed configuration example of the decoding device of FIG. 12 for the case where the encoded data is image data encoded by the MPEG2 method. In the figure, portions corresponding to those in FIG. 35 are denoted by the same reference numerals, and their description is omitted below as appropriate.

The decoding device of FIG. 37 is configured in the same manner as in FIG. 35, except that the post-processing unit 133 is composed of an inverse DCT transform unit 301.

In the embodiment of FIG. 37, the class classification adaptive processing unit 132 performs class classification adaptive processing on the two-dimensional DCT coefficients output by the preprocessing unit 131, thereby obtaining, as adaptively processed data, (predicted values of) two-dimensional DCT coefficients that yield high-quality image data when subjected to a two-dimensional inverse DCT (hereinafter referred to as high-quality two-dimensional DCT coefficients, as appropriate).

That is, in the class classification adaptive processing unit 132 (FIG. 13), the two-dimensional DCT coefficients output by the preprocessing unit 131 as preprocessed data are supplied to the tap extraction units 151 and 152.

The tap extraction unit 151 takes, as the data of interest, a high-quality two-dimensional DCT coefficient that has not yet been treated as data of interest, and extracts, as a prediction tap, some of the two-dimensional DCT coefficients (the preprocessed data) used to predict that data of interest. That is, based on the mismatch information, the tap extraction unit 151 forms for the data of interest a prediction tap with the same tap structure as in FIG. 35. The tap extraction unit 152 likewise forms, based on the mismatch information, a class tap with the same tap structure as in FIG. 35.

The class classification unit 153 classifies the data of interest based on the class tap and the mismatch information for the data of interest, in the same manner as in FIG. 35, and supplies the class code of the data of interest to the coefficient memory 141. The tap coefficients corresponding to that class code are read from the coefficient memory 141 and supplied to the prediction unit 154.

The prediction unit 154 performs the linear prediction calculation of Equation (1) using the prediction tap output by the tap extraction unit 151 and the tap coefficients acquired from the coefficient memory 141. The prediction unit 154 thereby obtains (the predicted value of) the data of interest, that is, a high-quality two-dimensional DCT coefficient, and supplies it to the post-processing unit 133.
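Equation (1) is not reproduced in this section. Assuming it is the usual linear first-order prediction y = Σ_n w_n x_n over the prediction tap x and the tap coefficients w of the selected class, the step performed by the prediction unit 154 reduces to a dot product, as in this sketch. The dictionary used as a stand-in for the coefficient memory 141 is illustrative.

```python
def predict(prediction_tap, class_code, coefficient_memory):
    """Assumed form of Eq. (1): y = sum_n w_n * x_n with the tap coefficients of the class."""
    w = coefficient_memory[class_code]
    if len(w) != len(prediction_tap):
        raise ValueError("prediction tap length and tap coefficient count differ")
    return sum(wn * xn for wn, xn in zip(w, prediction_tap))


coefficient_memory = {0: [0.5, 0.5], 1: [1.0, -1.0]}  # class code -> tap coefficients
print(predict([4.0, 2.0], 0, coefficient_memory))  # 3.0
print(predict([4.0, 2.0], 1, coefficient_memory))  # 2.0
```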

In the post-processing unit 133, the inverse DCT transform unit 301 applies a two-dimensional inverse DCT to the high-quality two-dimensional DCT coefficients output by the class classification adaptive processing unit 132, thereby obtaining and outputting high-quality image data.

Next, FIG. 38 shows a detailed configuration example of the learning device of FIG. 15 for learning the tap coefficients to be stored in the coefficient memory 141 of the decoding device of FIG. 37. In the figure, portions corresponding to those in FIG. 36 are denoted by the same reference numerals, and their description is omitted below as appropriate.

The learning device of FIG. 38 is configured in the same manner as in FIG. 36, except that the inverse post-processing unit 161A is composed of a DCT transform unit 311.

Accordingly, in the inverse post-processing unit 161A, the DCT transform unit 311 applies a block-by-block two-dimensional DCT to the high-quality image data read from the learning data storage unit 11 as learning image data, and the resulting high-quality two-dimensional DCT coefficients are supplied to the adaptive learning unit 160 as teacher data.
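The block-by-block two-dimensional DCT of the DCT transform unit 311 (teacher data generation) and the matching two-dimensional inverse DCT of the inverse DCT transform unit 301 (post-processing) can be sketched as below. An 8 × 8 block size, an orthonormal DCT, and image dimensions that are multiples of 8 are assumptions of this sketch.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    n = len(x)
    return [_scale(k, n) * sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                               for m in range(n)) for k in range(n)]


def idct_1d(c):
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                for k in range(n)) for m in range(n)]


def transform_2d(f, block):
    """Separable 2-D transform: f on every row, then f on every column."""
    rows = [f(list(r)) for r in block]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def blockwise(image, f, size=8):
    """Apply the separable transform built from f to each size x size block independently.
    Assumes both image dimensions are multiples of size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, size):
        for bx in range(0, w, size):
            block = [row[bx:bx + size] for row in image[by:by + size]]
            coeffs = transform_2d(f, block)
            for y in range(size):
                out[by + y][bx:bx + size] = coeffs[y]
    return out


image = [[100.0 if x < 8 else float((x * y) % 50) for x in range(16)] for y in range(16)]
teacher = blockwise(image, dct_1d)      # teacher data: high-quality 2-D DCT coefficients
restored = blockwise(teacher, idct_1d)  # the inverse step performed in post-processing
print(round(teacher[0][0], 6))          # 800.0 (DC of a flat 8 x 8 block of 100s)
print(max(abs(a - b) for ra, rb in zip(image, restored) for a, b in zip(ra, rb)) < 1e-9)  # True
```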

In the adaptive learning unit 160 (FIG. 15), the high-quality two-dimensional DCT coefficients supplied from the inverse post-processing unit 161A are stored as teacher data in the teacher data storage unit 162. Using this teacher data and the two-dimensional DCT coefficients stored as student data in the student data storage unit 164, learning is performed to obtain tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data, obtained by the linear prediction calculation of Equation (1) from the tap coefficients and the prediction taps extracted from the student data. Tap coefficients that convert the two-dimensional DCT coefficients serving as student data into high-quality two-dimensional DCT coefficients are thereby obtained for each class.

That is, in this case the two-dimensional DCT coefficients used as student data are obtained from the encoded data in the pre-processing unit 163B and contain quantization error, so an image obtained by applying a two-dimensional inverse DCT to them has low image quality, with so-called block distortion and the like.
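Where the quantization error in the student data comes from can be sketched with a deliberately simplified stand-in for the MPEG encode/decode path of the encoding unit 163A and pre-processing unit 163B. The sketch assumes uniform scalar quantization with a single step size and ignores quantization matrices, VLC, and motion compensation; the function names are hypothetical.

```python
def quantize_dequantize(coeffs, step):
    """Uniform quantization followed by reconstruction (simplified, no matrix or VLC)."""
    return [[round(c / step) * step for c in row] for row in coeffs]

def make_training_pair(teacher_coeffs, step):
    """Pair quantized DCT coefficients (student) with the originals (teacher)."""
    return quantize_dequantize(teacher_coeffs, step), teacher_coeffs
```

Each reconstructed coefficient is off by at most half a step, and small high-frequency coefficients collapse to zero; these losses are what the learned tap coefficients try to undo.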

Therefore, as described above, the adaptive learning unit 160 performs learning to find the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data (the high-quality two-dimensional DCT coefficients obtained by applying a two-dimensional DCT to the learning image data) obtained by the linear prediction calculation of Expression (1). This yields, for each class, tap coefficients that convert the two-dimensional DCT coefficients used as student data into high-quality two-dimensional DCT coefficients.

In the embodiment of FIG. 38, the tap extraction units 165 and 166 of the adaptive learning unit 160 (FIG. 15) construct, based on the mismatch information, prediction taps and class taps, respectively, having the same tap structures as those constructed by the tap extraction units 151 and 152 in the class classification adaptive processing unit 132 (FIG. 13) of FIG. 37. Furthermore, the class classification unit 167 in the adaptive learning unit 160 (FIG. 15) of FIG. 38 performs the same class classification as the class classification unit 153 in the class classification adaptive processing unit 132 (FIG. 13) of FIG. 37.
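Selecting the tap structure from the mismatch information can be sketched as a table lookup followed by offset-based extraction. Everything specific here is an assumption: the table maps the four combinations of encoded and actual DCT type to patterns A to D in an arbitrary order, the offsets are invented for illustration, and the class code uses 1-bit ADRC, a common class classification method that this section does not name.

```python
# Hypothetical tap structure setting table: (encoded type, actual type) -> pattern.
TAP_TABLE = {
    ("frame", "frame"): "A",
    ("frame", "field"): "B",
    ("field", "frame"): "C",
    ("field", "field"): "D",
}

# Hypothetical (dy, dx) offsets per pattern; +/-2 rows stay within one field.
PATTERNS = {
    "A": [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)],
    "B": [(-2, 0), (0, -1), (0, 0), (0, 1), (2, 0)],
    "C": [(-1, 0), (-1, 1), (0, 0), (1, -1), (1, 0)],
    "D": [(-2, 0), (-1, 0), (0, 0), (1, 0), (2, 0)],
}

def extract_taps(data, y, x, mismatch):
    """Extract taps around (y, x) using the pattern selected by the mismatch info."""
    h, w = len(data), len(data[0])
    taps = []
    for dy, dx in PATTERNS[TAP_TABLE[mismatch]]:
        yy = min(max(y + dy, 0), h - 1)  # clamp at the borders
        xx = min(max(x + dx, 0), w - 1)
        taps.append(data[yy][xx])
    return taps

def adrc_class(taps):
    """1-bit ADRC: each tap becomes 1 if >= the midpoint of [min, max], packed MSB first."""
    lo, hi = min(taps), max(taps)
    mid = (lo + hi) / 2
    code = 0
    for t in taps:
        code = (code << 1) | (1 if t >= mid else 0)
    return code
```

Because the same table drives both learning and decoding, the taps used to learn a coefficient set match the taps it is later applied to.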

As described above, the correctness of the characteristic data included in the encoded data is determined, and the decoding of the encoded data, the learning of the tap coefficients used for that decoding, and so on are performed based on the mismatch information representing the determination result. Therefore, even if, for example, the characteristic data included in the encoded data does not correctly represent the characteristics of the original data, the encoded data can be decoded into high-quality data.
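Judging the correctness of the characteristic data can be sketched for the MPEG case as follows. The sketch assumes the characteristic data is the frame/field DCT type, estimates the actual characteristic by comparing frame line correlation (adjacent lines) with field line correlation (lines two apart) via mean absolute differences, and, for speech, assumes the characteristic is a pitch lag whose difference value serves as mismatch information. All names are hypothetical.

```python
def mean_line_diff(block, step):
    """Mean absolute difference between rows `step` apart (lower = more correlated)."""
    pairs = [(a, b) for r in range(len(block) - step)
             for a, b in zip(block[r], block[r + step])]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def actual_dct_type(block):
    """'frame' if adjacent lines correlate at least as well as same-field lines."""
    frame = mean_line_diff(block, 1)  # frame line correlation
    field = mean_line_diff(block, 2)  # field line correlation
    return "frame" if frame <= field else "field"

def mismatch_info(encoded_dct_type, block):
    """Image case: the combination of the encoded and the actual characteristic."""
    return (encoded_dct_type, actual_dct_type(block))

def pitch_mismatch(encoded_lag, actual_lag):
    """Speech case: difference between the encoded and the measured pitch lag."""
    return encoded_lag - actual_lag
```

A block of interlaced stripes is judged "field" even if the encoder signalled a frame DCT, and that combination is what selects the tap structure and coefficient set.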

Next, the series of processes described above can be performed by hardware or by software. When the series of processes is performed by software, a program constituting the software is installed on a general-purpose computer or the like.

FIG. 39 shows a configuration example of an embodiment of a computer on which a program for executing the series of processes described above is installed.

The program can be recorded in advance on the hard disk 405 or in the ROM 403, which serve as recording media built into the computer.

Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium 411 such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium 411 can be provided as so-called packaged software.

Besides being installed on the computer from the removable recording medium 411 described above, the program can be transferred to the computer wirelessly from a download site via an artificial satellite for digital satellite broadcasting, or by wire via a network such as a LAN (Local Area Network) or the Internet. The computer receives the program transferred in this way with the communication unit 408 and can install it on the built-in hard disk 405.

The computer incorporates a CPU (Central Processing Unit) 402. An input/output interface 410 is connected to the CPU 402 via a bus 401. When the user inputs a command via the input/output interface 410 by operating the input unit 407, which consists of a keyboard, a mouse, a microphone, and the like, the CPU 402 executes a program stored in the ROM (Read Only Memory) 403 accordingly. Alternatively, the CPU 402 loads into the RAM (Random Access Memory) 404 and executes a program stored on the hard disk 405; a program transferred from a satellite or a network, received by the communication unit 408, and installed on the hard disk 405; or a program read from the removable recording medium 411 mounted in the drive 409 and installed on the hard disk 405. The CPU 402 thereby performs the processing according to the flowcharts described above or the processing performed by the configurations of the block diagrams described above. Then, as necessary, the CPU 402, for example, outputs the processing result via the input/output interface 410 from the output unit 406, which consists of an LCD (Liquid Crystal Display), a speaker, and the like, transmits it from the communication unit 408, or records it on the hard disk 405.

In this specification, the processing steps describing the program that causes the computer to perform various kinds of processing need not be performed in time series in the order described in the flowcharts; they also include processing executed in parallel or individually (for example, parallel processing or object-based processing).

The program may be processed by a single computer or processed in a distributed manner by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed there.

In the present embodiment, the cases where image data is encoded by the MPEG method and where audio data is encoded by the CELP method have been described. However, the present invention is not limited to these encoding methods and can also be applied, for example, to encoded data obtained by encoding audio data with the MP3 (MPEG-1 Audio Layer 3) method.

According to the decoding device, decoding method, first program, and first recording medium to which the present invention is applied, the correctness of the characteristic data is determined, and mismatch information representing the determination result is output. The encoded data is then decoded based on the mismatch information. Therefore, the encoded data can be decoded into high-quality data.

Furthermore, according to the learning device, learning method, second program, and second recording medium to which the present invention is applied, teacher data serving as the teacher and student data serving as the student for learning the tap coefficients are generated from learning data and output. The learning data is also encoded, and learning encoded data including characteristic data for that data is output. The correctness of the characteristic data included in the learning encoded data is then determined, and the tap coefficients are learned using the teacher data and the student data based on the mismatch information representing the determination result. Therefore, with these tap coefficients, encoded data can be decoded into high-quality data.

FIG. 1 is a block diagram showing a configuration example of an embodiment of a decoding device to which the present invention is applied.
FIG. 2 is a flowchart explaining the processing of the decoding device.
FIG. 3 is a block diagram showing a configuration example of another embodiment of a decoding device to which the present invention is applied.
FIG. 4 is a block diagram showing a configuration example of an embodiment of a learning device to which the present invention is applied.
FIG. 5 is a flowchart explaining the processing of the learning device.
FIG. 6 is a block diagram showing a configuration example of a speech data processing device that converts speech data into high-quality speech data by class classification adaptive processing.
FIG. 7 is a block diagram showing a configuration example of a learning device that learns the tap coefficients stored in the coefficient memory 25.
FIG. 8 is a block diagram showing a configuration example of a VSELP encoding device that encodes speech data by the VSELP method.
FIG. 9 is a block diagram showing a configuration example of a VSELP decoding device that decodes encoded data by the VSELP method.
FIG. 10 is a block diagram showing a configuration example of a VSELP decoding device to which class classification adaptive processing is applied.
FIG. 11 is a block diagram showing a configuration example of a learning device that learns the tap coefficients stored in the coefficient memory 84.
FIG. 12 is a block diagram showing a more detailed configuration example of a decoding device to which the present invention is applied.
FIG. 13 is a block diagram showing a configuration example of the class classification adaptive processing unit 132.
FIG. 14 is a flowchart explaining the processing of the decoding device.
FIG. 15 is a block diagram showing a more detailed configuration example of a learning device to which the present invention is applied.
FIG. 16 is a flowchart explaining the processing of the learning device.
FIG. 17 is a block diagram showing a first configuration example of a decoding device that decodes encoded data encoded by the VSELP method.
FIG. 18 is a block diagram showing a first configuration example of a learning device that learns the tap coefficients used for decoding encoded data encoded by the VSELP method.
FIG. 19 is a block diagram showing a second configuration example of a decoding device that decodes encoded data encoded by the VSELP method.
FIG. 20 is a block diagram showing a second configuration example of a learning device that learns the tap coefficients used for decoding encoded data encoded by the VSELP method.
FIG. 21 is a block diagram showing a third configuration example of a learning device that learns the tap coefficients used for decoding encoded data encoded by the VSELP method.
FIG. 22 is a block diagram showing a first configuration example of a decoding device that decodes encoded data encoded by the MPEG method.
FIG. 23 is a block diagram showing a configuration example of the MPEG decoder 232.
FIG. 24 is a diagram explaining a method of obtaining frame line correlation and field line correlation from image data.
FIG. 25 is a diagram showing a tap structure setting table.
FIG. 26 is a diagram showing the tap structures of patterns A to D.
FIG. 27 is a diagram showing DCT coefficients whose bases are horizontal stripes.
FIG. 28 is a block diagram showing a configuration example of the actual characteristic extraction unit 122.
FIG. 29 is a diagram explaining one-dimensional DCT coefficients.
FIG. 30 is a halftone photograph, displayed on a display, explaining one-dimensional DCT coefficients.
FIG. 31 is a diagram explaining a method of obtaining frame line correlation and field line correlation from one-dimensional DCT coefficients.
FIG. 32 is a block diagram showing another configuration example of the actual characteristic extraction unit 122.
FIG. 33 is a block diagram showing a first configuration example of a learning device that learns the tap coefficients used for decoding encoded data encoded by the MPEG method.
FIG. 34 is a block diagram showing a configuration example of the MPEG encoder 271.
FIG. 35 is a block diagram showing a second configuration example of a decoding device that decodes encoded data encoded by the MPEG method.
FIG. 36 is a block diagram showing a second configuration example of a learning device that learns the tap coefficients used for decoding encoded data encoded by the MPEG method.
FIG. 37 is a block diagram showing a third configuration example of a decoding device that decodes encoded data encoded by the MPEG method.
FIG. 38 is a block diagram showing a third configuration example of a learning device that learns the tap coefficients used for decoding encoded data encoded by the MPEG method.
FIG. 39 is a block diagram showing a configuration example of an embodiment of a computer to which the present invention is applied.

Explanation of symbols

1 mismatch detection unit, 2 decoding processing unit, 3 parameter storage unit, 11 learning data storage unit, 12 encoding unit, 13 mismatch detection unit, 14 learning processing unit, 21 pitch detection unit, 22, 23 tap extraction units, 24 class classification unit, 25 coefficient memory, 26 prediction unit, 31 temporal decimation filter, 32 pitch detection unit, 33, 34 tap extraction units, 35 class classification unit, 36 addition unit, 37 tap coefficient calculation unit, 41 microphone, 42 A/D conversion unit, 43 arithmetic unit, 44 LPC analysis unit, 45 vector quantization unit, 46 speech synthesis filter, 47 squared error calculation unit, 48 minimum squared error determination unit, 49 adaptive codebook storage unit, 50 gain decoder, 51 excitation codebook storage unit, 52 to 54 arithmetic units, 55 code determination unit, 56 channel encoder, 61 channel decoder, 62 adaptive codebook storage unit, 63 gain decoder, 64 excitation codebook storage unit, 65 filter coefficient decoder, 66 to 68 arithmetic units, 69 speech synthesis filter, 81, 82 tap extraction units, 83 class classification unit, 84 coefficient memory, 85 prediction unit, 92 A/D conversion unit, 93 arithmetic unit, 94 LPC analysis unit, 95 vector quantization unit, 96 speech synthesis filter, 97 squared error calculation unit, 98 minimum squared error determination unit, 99 adaptive codebook storage unit, 100 gain decoder, 101 excitation codebook storage unit, 102 to 104 arithmetic units, 105 code determination unit, 111, 112 tap extraction units, 113 class classification unit, 114 addition unit, 115 tap coefficient calculation unit, 121 encoding characteristic information extraction unit, 122 actual characteristic extraction unit, 123 determination unit, 131 pre-processing unit, 132 class classification adaptive processing unit, 133 post-processing unit, 141 coefficient memory, 151, 152 tap extraction units, 153 class classification unit, 154 prediction unit, 160 adaptive learning unit, 161 teacher data generation unit, 161A inverse post-processing unit, 162 teacher data storage unit, 163 student data generation unit, 163A encoding unit, 163B pre-processing unit, 164 student data storage unit, 165, 166 tap extraction units, 167 class classification unit, 168 addition unit, 169 tap coefficient calculation unit, 171 encoding characteristic information extraction unit, 172 actual characteristic extraction unit, 173 determination unit, 181 channel decoder, 182 VSELP decoding device, 183 pitch detection unit, 184 difference calculation unit, 185 VSELP decoding device, 191 VSELP encoding device, 192 channel decoder, 193 VSELP decoding device, 194 pitch detection unit, 195 difference calculation unit, 196 VSELP encoding device, 197 VSELP decoding device, 201 speech synthesis filter, 211 LPC analysis unit, 212 prediction filter, 221 LPC analysis unit, 231 inverse VLC unit, 232 MPEG decoder, 233 correlation calculation unit, 234 block characteristic determination unit, 235 comparison unit, 236 MPEG decoder, 241 inverse VLC unit, 242 inverse quantization unit, 243 inverse DCT conversion unit, 244 arithmetic unit, 245 memory, 246 motion compensation unit, 247 picture selection unit, 251 inverse VLC unit, 252 inverse quantization unit, 253 arithmetic unit, 254 MPEG decoder, 255 memory, 256 motion compensation unit, 257 DCT conversion unit, 258 DCT coefficient difference calculation unit, 261 vertical one-dimensional inverse DCT conversion unit, 262 correlation calculation unit, 271 MPEG encoder, 272 inverse VLC unit, 273 MPEG decoder, 274 correlation calculation unit, 275 block characteristic determination unit, 276 comparison unit, 277 MPEG encoder, 278 MPEG decoder, 281 inverse VLC unit, 282 inverse quantization unit, 283 arithmetic unit, 284 MPEG decoder, 285 memory, 286 motion compensation unit, 287 DCT conversion unit, 291 inverse VLC unit, 292 inverse quantization unit, 293 arithmetic unit, 294 MPEG decoder, 295 memory, 296 motion compensation unit, 297 DCT conversion unit, 301 inverse DCT conversion unit, 311 DCT conversion unit, 321 motion vector detection unit, 322 motion compensation unit, 323 arithmetic unit, 324 DCT conversion unit, 325 quantization unit, 326 VLC unit, 327 inverse quantization unit, 328 inverse DCT conversion unit, 329 arithmetic unit, 330 memory, 401 bus, 402 CPU, 403 ROM, 404 RAM, 405 hard disk, 406 output unit, 407 input unit, 408 communication unit, 409 drive, 410 input/output interface, 411 removable recording medium

Claims

A learning device that learns tap coefficients used in a prediction calculation performed to decode encoded data, the encoded data being obtained by encoding data and including at least characteristic data representing a characteristic of the data, the learning device comprising:
teacher data generation means for generating, from learning data, teacher data serving as the teacher for learning the tap coefficients, and outputting the teacher data;
student data generation means for generating, from the learning data, student data serving as the student for learning the tap coefficients, and outputting the student data;
encoding means for encoding the learning data and outputting learning encoded data including the characteristic data for that data;
comparison output means for outputting mismatch information obtained by comparing the characteristic data included in the learning encoded data with an actual characteristic, which is the actual characteristic of the learning data corresponding to the learning encoded data; and
learning means for learning the tap coefficients using the teacher data and the student data based on the mismatch information,
wherein the learning means includes:
class tap extraction means for extracting from the student data, with a class tap extraction pattern associated in advance with the mismatch information, class taps used to classify teacher data of interest, which is the teacher data currently being processed, into one of a plurality of classes;
class classification means for classifying the teacher data of interest based on the class taps and outputting the class code of the corresponding class;
prediction tap extraction means for extracting from the student data, with a prediction tap extraction pattern associated in advance with the mismatch information, prediction taps used in the prediction calculation with the tap coefficients for the teacher data of interest; and
tap coefficient calculation means for obtaining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by performing the prediction calculation using the prediction taps and the tap coefficients;
wherein, when the learning data is speech data,
the encoding means encodes the learning speech data by a CELP (Code Excited Linear Prediction coding) method and outputs the learning encoded data;
the comparison output means outputs the mismatch information representing a difference value between the characteristic data included in the learning encoded data and the actual characteristic, which is the actual characteristic of the learning speech data corresponding to the learning encoded data; and
the teacher data generation means and the student data generation means operate such that:
the teacher data generation means outputs the learning speech data as-is as the teacher data, and the student data generation means encodes and decodes the learning speech data by the CELP method and outputs the decoding result as the student data;
the teacher data generation means performs linear prediction analysis on the learning speech data, drives with the learning speech data a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the resulting residual signal as the teacher data, and the student data generation means encodes and decodes the learning speech data by the CELP method and outputs, as the student data, the residual signal that drives the speech synthesis filter obtained as a result; or
the teacher data generation means performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as the teacher data, and the student data generation means encodes and decodes the learning speech data by the CELP method and outputs, as the student data, the linear prediction coefficients that are the filter coefficients of the speech synthesis filter obtained as a result; and
wherein, when the learning data is image data,
the encoding means encodes the learning image data by an MPEG (Moving Picture Experts Group) method and outputs the learning encoded data;
the comparison output means outputs the mismatch information representing a combination of the characteristic data included in the learning encoded data and the actual characteristic, which is the actual characteristic of the learning image data corresponding to the learning encoded data; and
the teacher data generation means and the student data generation means operate such that:
the teacher data generation means outputs the learning image data as-is as the teacher data, and the student data generation means encodes and decodes the learning image data by the MPEG method and outputs the decoding result as the student data;
the teacher data generation means outputs the learning image data as-is as the teacher data, and the student data generation means encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as the student data; or
the teacher data generation means applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as the teacher data, and the student data generation means encodes and decodes the learning image data by the MPEG method and outputs the resulting two-dimensional DCT coefficients as the student data.

Learning tap coefficients used for predictive calculation with encoded data, which is encoded data obtained by encoding data and is performed to decode at least encoded data including characteristic data representing the characteristics of the data In the learning method of the learning device to
A teacher data generation step for generating and outputting teacher data to be a teacher for learning the tap coefficient from the learning data;
A student data generation step of generating and outputting student data to be students of learning of the tap coefficient from the learning data;
An encoding step of encoding the learning data and outputting encoded learning data including the characteristic data for the data;
Outputs mismatch information obtained by comparing the characteristic data included in the learning encoded data and the actual characteristic that is an actual characteristic of the learning data corresponding to the learning encoded data. A comparison output step;
A learning step of learning the tap coefficient using the teacher data and student data based on the mismatch information;
The learning step includes
The class tap extraction pattern in which class taps used for classifying the teacher data of interest which is the teacher data of interest into a class of any of a plurality of classes are previously associated with the mismatch information And a class tap extraction step for extracting from the student data;
A class classification step of classifying the teacher data of interest based on the class tap and outputting a class code of a corresponding class;
A prediction tap extraction step of extracting a prediction tap used for the prediction calculation with the tap coefficient with respect to the attention teacher data from the student data with an extraction pattern of the prediction tap previously associated with the mismatch information;
A tap coefficient calculation step for determining, for each class, the tap coefficient that statistically minimizes the prediction error of the predicted value of the teacher data obtained by performing the prediction calculation using the prediction tap and the tap coefficient; Have
When the learning data is voice data,
In the encoding step, the speech data for learning is encoded by a CELP (Code Excited Liner Prediction coding) method, and the encoded data for learning is output.
The comparison output step calculates a difference value between the characteristic data included in the learning encoded data and an actual characteristic that is an actual characteristic of the learning speech data corresponding to the learning encoded data. Output the mismatch information that represents
The teacher data generation step and the student data generation step include:
The teacher data generation step outputs the learning voice data as it is as the teacher data, and the student data generation step encodes and decodes the learning voice data by a CELP method, The result is output as the student data,
The teacher data generation step performs linear prediction analysis on the learning speech data, and drives a prediction filter using a linear prediction coefficient obtained as a result as a filter coefficient by the learning speech data. And the residual signal is output as the teacher data, and the student data generation step encodes and decodes the learning speech data by a CELP method, and drives a speech synthesis filter obtained as a result The residual signal to be output as the student data,
Alternatively, the teacher data generation step performs linear prediction analysis on the learning voice data, and outputs a linear prediction coefficient obtained as a result thereof as the teacher data, and the student data generation step includes the learning voice data. Is encoded and decoded by the CELP method, and a linear prediction coefficient that is a filter coefficient of a speech synthesis filter obtained as a result is output as the student data,
And, when the learning data is image data:
In the encoding step, the learning image data is encoded by the MPEG (Moving Picture Experts Group) method, and the learning encoded data is output;
The comparison output step outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to that learning encoded data; and
The teacher data generation step and the student data generation step operate in one of the following ways:
The teacher data generation step outputs the learning image data as-is as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the decoded result as the student data;
The teacher data generation step outputs the learning image data as-is as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the two-dimensional DCT coefficients obtained as a result as the student data;
Alternatively, the teacher data generation step applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the two-dimensional DCT coefficients obtained as a result as the student data; the foregoing constituting the learning method.
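The learning step recited above (class tap extraction, class classification, prediction tap extraction and per-class tap coefficient calculation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the data are 1-D sample sequences, the mismatch information is reduced to a hypothetical 0/1 code per sample, the tap extraction patterns `CLASS_TAP_PATTERNS` and `PRED_TAP_PATTERNS` are invented for illustration, classification uses 1-bit ADRC (a common choice for class classification, assumed here), and "statistically minimize the prediction error" is taken to mean least squares, solved per class through the normal equations.

```python
from collections import defaultdict

# Hypothetical tap extraction patterns (sample offsets around the sample of
# interest), keyed by a hypothetical mismatch code (0 = match, 1 = mismatch).
CLASS_TAP_PATTERNS = {0: (-1, 0, 1), 1: (-2, 0, 2)}
PRED_TAP_PATTERNS = {0: (-2, -1, 0, 1, 2), 1: (-3, -1, 0, 1, 3)}
N_PRED = 5  # every prediction tap pattern has the same number of taps


def adrc_class_code(class_tap):
    """1-bit ADRC: each tap becomes 1 if it is at or above the midpoint of the tap's range."""
    mid = (min(class_tap) + max(class_tap)) / 2
    code = 0
    for v in class_tap:
        code = (code << 1) | (1 if v >= mid else 0)
    return code


def solve(a, b):
    """Solve a w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def learn_tap_coefficients(student, teacher, mismatch):
    """Per-class least-squares tap coefficients for 1-D student/teacher sequences."""
    a = defaultdict(lambda: [[0.0] * N_PRED for _ in range(N_PRED)])  # sum of x x^T
    b = defaultdict(lambda: [0.0] * N_PRED)  # sum of x * y
    margin = max(abs(o) for pats in (CLASS_TAP_PATTERNS, PRED_TAP_PATTERNS)
                 for offs in pats.values() for o in offs)
    for i in range(margin, len(student) - margin):
        # Tap extraction patterns are selected by the mismatch information.
        class_tap = [student[i + o] for o in CLASS_TAP_PATTERNS[mismatch[i]]]
        pred_tap = [student[i + o] for o in PRED_TAP_PATTERNS[mismatch[i]]]
        cls = adrc_class_code(class_tap)
        for j in range(N_PRED):
            b[cls][j] += pred_tap[j] * teacher[i]
            for k in range(N_PRED):
                a[cls][j][k] += pred_tap[j] * pred_tap[k]
    coeffs = {}
    for cls in a:
        try:
            coeffs[cls] = solve(a[cls], b[cls])
        except ValueError:
            pass  # degenerate class: too few or linearly dependent samples
    return coeffs
```

Applying the learned coefficients to a prediction tap `x` of class `c` would give the predicted teacher value `sum(w * xi for w, xi in zip(coeffs[c], x))`.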

A program for causing a computer to perform learning processing that learns tap coefficients used in a prediction calculation performed to decode encoded data, the encoded data being obtained by encoding data and including at least characteristic data representing a characteristic of the data, the program comprising:
A teacher data generation step of generating, from learning data, and outputting teacher data that serves as the teacher in learning the tap coefficients;
A student data generation step of generating, from the learning data, and outputting student data that serves as the student in learning the tap coefficients;
An encoding step of encoding the learning data and outputting learning encoded data that includes the characteristic data for the learning data;
A comparison output step of comparing the characteristic data included in the learning encoded data with the actual characteristic of the learning data corresponding to that learning encoded data, and outputting the resulting mismatch information; and
A learning step of learning the tap coefficients using the teacher data and the student data on the basis of the mismatch information;
The learning step including:
A class tap extraction step of extracting, from the student data, a class tap used to classify the teacher data of interest (the teacher data currently being processed) into one of a plurality of classes, using a class tap extraction pattern associated in advance with the mismatch information;
A class classification step of classifying the teacher data of interest on the basis of the class tap and outputting the class code of the corresponding class;
A prediction tap extraction step of extracting, from the student data, a prediction tap used in the prediction calculation with the tap coefficients for the teacher data of interest, using a prediction tap extraction pattern associated in advance with the mismatch information; and
A tap coefficient calculation step of determining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data, the predicted value being obtained by the prediction calculation using the prediction tap and the tap coefficients;
Wherein, when the learning data is speech data:
In the encoding step, the learning speech data is encoded by the CELP (Code Excited Linear Prediction) method, and the learning encoded data is output;
The comparison output step calculates a difference value between the characteristic data included in the learning encoded data and the actual characteristic of the learning speech data corresponding to that learning encoded data, and outputs mismatch information representing the difference value; and
The teacher data generation step and the student data generation step operate in one of the following ways:
The teacher data generation step outputs the learning speech data as-is as the teacher data, and the student data generation step encodes and decodes the learning speech data by the CELP method and outputs the decoded result as the student data;
The teacher data generation step performs linear prediction analysis on the learning speech data, uses the learning speech data to drive a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the resulting residual signal as the teacher data, and the student data generation step encodes and decodes the learning speech data by the CELP method and outputs, as the student data, the residual signal that drives the speech synthesis filter obtained by the decoding;
Alternatively, the teacher data generation step performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as the teacher data, and the student data generation step encodes and decodes the learning speech data by the CELP method and outputs, as the student data, the linear prediction coefficients that are the filter coefficients of the speech synthesis filter obtained by the decoding;
And, when the learning data is image data:
In the encoding step, the learning image data is encoded by the MPEG (Moving Picture Experts Group) method, and the learning encoded data is output;
The comparison output step outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to that learning encoded data; and
The teacher data generation step and the student data generation step operate in one of the following ways:
The teacher data generation step outputs the learning image data as-is as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the decoded result as the student data;
The teacher data generation step outputs the learning image data as-is as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the two-dimensional DCT coefficients obtained as a result as the student data;
Alternatively, the teacher data generation step applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the two-dimensional DCT coefficients obtained as a result as the student data; the foregoing constituting the program for causing a computer to perform the learning process.
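For the speech case, the teacher-side residual described above (linear prediction analysis, then driving the prediction filter with the learning speech data) might look like the following sketch. It assumes the autocorrelation method with Levinson-Durbin recursion for the linear prediction analysis and the sign convention A(z) = 1 + Σ a_i z^-i; the claim fixes neither choice, the function names are hypothetical, and the CELP encoding and decoding on the student side are not shown.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of x (autocorrelation method, no window)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return a = [1, a1, ..., ap] of A(z) = 1 + sum_i a_i z^-i from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new = a[:]
        for j in range(1, i):
            new[j] = a[j] + k * a[i - j]
        new[i] = k
        a = new
        err *= 1.0 - k * k  # remaining prediction error energy
    return a


def lpc_residual(x, a):
    """Drive the prediction (inverse) filter A(z) with x: e[n] = sum_j a_j x[n-j]."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(x))]
```

For a first-order autoregressive signal x[n] = 0.9 x[n-1] + noise, the estimated a1 should come out near -0.9 and the residual should be close to the driving noise, which is the whitened signal the claim uses as teacher data.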

A recording medium on which is recorded a program for causing a computer to perform learning processing that learns tap coefficients used in a prediction calculation performed to decode encoded data, the encoded data being obtained by encoding data and including at least characteristic data representing a characteristic of the data, the program comprising:
A teacher data generation step of generating, from learning data, and outputting teacher data that serves as the teacher in learning the tap coefficients;
A student data generation step of generating, from the learning data, and outputting student data that serves as the student in learning the tap coefficients;
An encoding step of encoding the learning data and outputting learning encoded data that includes the characteristic data for the learning data;
A comparison output step of comparing the characteristic data included in the learning encoded data with the actual characteristic of the learning data corresponding to that learning encoded data, and outputting the resulting mismatch information; and
A learning step of learning the tap coefficients using the teacher data and the student data on the basis of the mismatch information;
The learning step including:
A class tap extraction step of extracting, from the student data, a class tap used to classify the teacher data of interest (the teacher data currently being processed) into one of a plurality of classes, using a class tap extraction pattern associated in advance with the mismatch information;
A class classification step of classifying the teacher data of interest on the basis of the class tap and outputting the class code of the corresponding class;
A prediction tap extraction step of extracting, from the student data, a prediction tap used in the prediction calculation with the tap coefficients for the teacher data of interest, using a prediction tap extraction pattern associated in advance with the mismatch information; and
A tap coefficient calculation step of determining, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data, the predicted value being obtained by the prediction calculation using the prediction tap and the tap coefficients;
Wherein, when the learning data is speech data:
In the encoding step, the learning speech data is encoded by the CELP (Code Excited Linear Prediction) method, and the learning encoded data is output;
The comparison output step calculates a difference value between the characteristic data included in the learning encoded data and the actual characteristic of the learning speech data corresponding to that learning encoded data, and outputs mismatch information representing the difference value; and
The teacher data generation step and the student data generation step operate in one of the following ways:
The teacher data generation step outputs the learning speech data as-is as the teacher data, and the student data generation step encodes and decodes the learning speech data by the CELP method and outputs the decoded result as the student data;
The teacher data generation step performs linear prediction analysis on the learning speech data, uses the learning speech data to drive a prediction filter whose filter coefficients are the resulting linear prediction coefficients, and outputs the resulting residual signal as the teacher data, and the student data generation step encodes and decodes the learning speech data by the CELP method and outputs, as the student data, the residual signal that drives the speech synthesis filter obtained by the decoding;
Alternatively, the teacher data generation step performs linear prediction analysis on the learning speech data and outputs the resulting linear prediction coefficients as the teacher data, and the student data generation step encodes and decodes the learning speech data by the CELP method and outputs, as the student data, the linear prediction coefficients that are the filter coefficients of the speech synthesis filter obtained by the decoding;
And, when the learning data is image data:
In the encoding step, the learning image data is encoded by the MPEG (Moving Picture Experts Group) method, and the learning encoded data is output;
The comparison output step outputs mismatch information representing the combination of the characteristic data included in the learning encoded data and the actual characteristic of the learning image data corresponding to that learning encoded data; and
The teacher data generation step and the student data generation step operate in one of the following ways:
The teacher data generation step outputs the learning image data as-is as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the decoded result as the student data;
The teacher data generation step outputs the learning image data as-is as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the two-dimensional DCT coefficients obtained as a result as the student data;
Alternatively, the teacher data generation step applies a two-dimensional DCT to the learning image data and outputs the resulting two-dimensional DCT coefficients as the teacher data, and the student data generation step encodes and decodes the learning image data by the MPEG method and outputs the two-dimensional DCT coefficients obtained as a result as the student data; the foregoing constituting the program, recorded on the recording medium, for causing a computer to perform the learning process.
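For the image case, the last alternative uses two-dimensional DCT coefficients as teacher data. The sketch below computes an orthonormal 2-D DCT-II of an N x N block, the transform type MPEG applies to 8 x 8 blocks. The student side is only approximated by `quantize_dequantize`, a hypothetical uniform quantizer standing in for MPEG quantization and inverse quantization (real MPEG uses quantization matrices and scale factors), so treat this as an illustration of the teacher/student relationship under that assumption, not as an MPEG decoder.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (the transform family used by MPEG)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # Precompute cos((2x + 1) u pi / 2N) for every frequency u and position x.
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
                 for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += block[x][y] * cos_table[u][x] * cos_table[v][y]
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize_dequantize(coeffs, step):
    """Hypothetical uniform quantizer: round each coefficient to a multiple of step."""
    return [[round(c / step) * step for c in row] for row in coeffs]
```

A flat 8 x 8 block of value 128 transforms to a single DC coefficient of 1024 (the orthonormal scaling gives 128 × 64 / 8) with all AC coefficients zero; pairing `dct_2d(block)` as teacher data with `quantize_dequantize(dct_2d(block), step)` as student data mimics the coefficient-domain learning in the claim.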