JP4735375B2 - Image processing apparatus and moving image encoding method.

Info

Publication number: JP4735375B2
Application number: JP2006102638A
Authority: JP
Inventors: 昌史高橋; 智一村上; 勲軽部; 浩朗伊藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2006-04-04
Filing date: 2006-04-04
Publication date: 2011-07-27
Anticipated expiration: 2026-04-04
Also published as: JP2007281634A

Description

The present invention relates to a technique for encoding moving images.

In image compression coding, it is known to use a learning function to obtain the optimal compression parameters for the process. For example, Patent Document 1 discloses that an observer performs objective image quality evaluation while changing the compression parameters, that a neural network learns the statistical properties of images using the compression parameters of the images judged to have good quality in this evaluation, and that optimal compression parameters are thereby generated.

[Patent Document 1] Japanese Patent Laid-Open No. 7-184062

When a video stream is distributed over a network, or when a received television video signal is compressed and stored on a hard disk as it arrives, encoded images must be generated in real time, so the image encoding process must be fast. Such cases require both high image quality and high-speed encoding, but Patent Document 1 does not take this into account. Moreover, because Patent Document 1 learns from compression parameters changed by an observer, it cannot learn in real time while the apparatus is in actual use, which makes it difficult to obtain suitable parameters.

The present invention has been made in view of the above problems, and its object is to provide a technique suitable for obtaining high image quality while speeding up the encoding process.

To achieve the above object, the present invention comprises a first mode determination unit that determines an encoding mode for encoding an input image using a predetermined calculation model, and a second mode determination unit that learns statistical information on the correspondence between the encoding mode and the image state, using as a teacher signal the encoding mode determined by the first mode determination unit together with the information on the image state corresponding to it.

In such a configuration, for example, during initial use of an apparatus including the first and second mode determination units, the first mode determination unit is selected and the encoding mode is determined according to the predetermined calculation model (for example, the RD-Optimization method). Meanwhile, the second mode determination unit learns the statistics of the determination results in association with the image state at that time. Then, for example, if the second mode determination unit is selected after a predetermined time has elapsed and is used to determine the encoding mode, an encoding mode suited to images in various states can be determined quickly from the learning results, without using the predetermined calculation model. In doing so, the second mode determination unit preferably selects the encoding mode with the highest likelihood for each image state, as obtained through learning.

The selection between the first and second mode determination units may be made in response to an operation on a switching unit for switching the encoding mode.

According to the present invention, it is possible to obtain high image quality while speeding up the encoding process.

Embodiments of the present invention will be described below with reference to the drawings.

First, an example of an image processing apparatus to which the present invention can be applied will be described with reference to FIG. 12. In this embodiment, a television receiver incorporating a recording medium such as an HDD (hard disk drive) or a semiconductor memory (for example, a flash memory) is described as an example of the image processing apparatus, but the invention can be applied to any apparatus that performs encoding. For example, it can equally be applied to a video camera, a mobile phone with a built-in camera function, an HDD recorder, a DVD recorder, and the like.

In FIG. 12, the image capturing unit 1 is, for example, a tuner that receives a television signal, and the image captured here is supplied to the encoding unit 2. The encoding unit 2 compresses and encodes the input image and supplies the result to the recording unit 3, which includes a recording medium such as an HDD. The encoding unit 2 encodes images using a predetermined encoding method such as the MPEG-2 (Moving Picture Experts Group), MPEG-4, or H.264/AVC (Advanced Video Coding) standard. In this embodiment, encoding is performed according to the H.264/AVC standard, which can encode moving images at roughly twice the compression ratio of MPEG-4, but the invention is not limited to this. Other encoding methods can of course be used.

The recording unit 3 records the image compressed and encoded by the encoding unit 2. An image read from the recording unit 3 is decoded by the decoding unit 4 and supplied to the signal processing unit 5 as a video signal. The signal processing unit 5 applies predetermined signal processing, such as color correction, contrast correction, and gamma correction, to the video signal and supplies it to the display unit 6, which is configured with, for example, a PDP (plasma display panel) or a liquid crystal panel. The display unit 6 displays an image according to the supplied video signal. The control unit 7 is configured with, for example, a CPU, and controls each unit by supplying control signals to the image capturing unit 1, the encoding unit 2, the recording unit 3, the decoding unit 4, and the signal processing unit 5. The control unit 7 is also electrically connected to a command input terminal to which user commands are input, and is configured to control the encoding unit 2 according to those commands.

Because such an image processing apparatus compresses and encodes, for example, a television signal in real time and records it on a recording medium such as an HDD as it arrives, the encoding process in the encoding unit 2 must be fast. It is also desirable to perform the encoding so that degradation of the image displayed on the display unit 6 is suppressed. To this end, in this embodiment the encoding process in the encoding unit 2, and in particular the process of determining the encoding mode used for encoding, is given a learning function to speed up that determination. The encoding process according to this embodiment is described in detail below. Here, the encoding mode includes the intra-picture (Intra) encoding mode, the inter-picture skip (PSkip) encoding mode, the inter-picture (Inter) encoding mode, the type and size of the blocks used in block matching (for example, 8×8, 8×4, 4×8, and 4×4 blocks), and combinations of these.

FIG. 1 shows a specific example of the encoding unit 2 shown in FIG. 12. The image stream from the image capturing unit 1 is input to the original image memory (101) and stored as original image data. The original image read from the original image memory (101) is supplied to the block dividing unit (102). The block dividing unit (102) divides the original image into a plurality of blocks of several preset sizes and supplies them to the motion search unit (103), the predicted image generation unit (104), the learning mode determination unit (107), which is the first mode determination unit, and the high-speed operation mode determination unit (108), which is the second mode determination unit. The motion search unit (103) uses the block image from the block dividing unit (102) and the reference image held in the reference image memory (106) to search the reference image for the block most highly correlated with the block image (that is, the block whose difference from the block image is smallest). It then calculates the motion vector of the block image from the block image and the highly correlated block in the reference image. This motion vector is supplied to the predicted image generation unit (104) and the first switch unit (110). The predicted image generation unit (104) uses the motion vector information from the motion search unit (103) and other data to create a predicted image for each encoding mode, and supplies it to the first switch unit (110).
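
The search performed by the motion search unit (103) can be illustrated with a minimal full-search block-matching sketch. The sum of absolute differences (SAD) as the difference measure, the search range, and the names `sad` and `motion_search` are assumptions made for illustration; the description only requires that the reference block with the smallest difference be found.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(block: Frame, ref: Frame, top: int, left: int) -> int:
    """Sum of absolute differences between `block` and the same-sized
    region of `ref` whose top-left corner is (top, left)."""
    return sum(
        abs(block[y][x] - ref[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def motion_search(block: Frame, ref: Frame, top: int, left: int,
                  search_range: int = 2) -> Tuple[Tuple[int, int], Optional[int]]:
    """Test every displacement within +/-search_range around (top, left)
    and return ((dy, dx), best_sad) for the best-matching reference block."""
    h, w = len(block), len(block[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y0 < 0 or x0 < 0 or y0 + h > len(ref) or x0 + w > len(ref[0]):
                continue
            cost = sad(block, ref, y0, x0)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

The returned displacement plays the role of the motion vector that is passed on to the predicted image generation unit (104) and the first switch unit (110).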

In response to a control signal from the control unit 7, the first switch unit (110) switches so as to supply the block image from the block dividing unit (102), the motion vector from the motion search unit (103), and the predicted image from the predicted image generation unit (104) to either the learning mode determination unit (107) or the high-speed operation mode determination unit (108).

The learning mode determination unit (107) generates, according to the predetermined calculation model, a difference signal between the input predicted image and the block image, calculates the encoding cost of each mode using this difference signal and the input motion vector, and determines the optimal encoding mode. It then outputs learning data, which includes the determined encoding mode and information on the image state corresponding to that mode (that is, the information needed to determine the encoding mode), to the high-speed operation mode determination unit (108) as a teacher signal.

The high-speed operation mode determination unit (108) performs learning by using the teacher signal, that is, the learning data, to gather statistics on the correspondence between the image state information and the encoding mode. In other words, from the statistical information obtained with the teacher signal, it learns which encoding mode is most likely to be determined for each image state. When an image in a given state is then input to the high-speed operation mode determination unit (108), it selects the encoding mode with the highest likelihood for that image state according to the learning results.
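
As a rough illustration of learning "statistics of the correspondence", the following sketch counts how often each encoding mode was chosen for each image state and returns the most frequent one. Representing the image state as a hashable, already-quantized key and the class name `ModeStatistics` are assumptions for this example; the embodiment described later uses trained discriminators rather than a plain frequency table.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


class ModeStatistics:
    """Frequency table of encoding modes observed per image state."""

    def __init__(self) -> None:
        self._counts: Dict[Hashable, Counter] = {}

    def observe(self, state: Hashable, mode: str) -> None:
        """Record one teacher sample: `mode` was chosen for `state`."""
        self._counts.setdefault(state, Counter())[mode] += 1

    def likelihood(self, state: Hashable, mode: str) -> float:
        """Relative frequency of `mode` among samples seen for `state`."""
        counts = self._counts.get(state)
        if not counts:
            return 0.0
        return counts[mode] / sum(counts.values())

    def most_likely(self, state: Hashable) -> Optional[str]:
        """Mode with the highest observed frequency, or None if the
        state has never been seen during learning."""
        counts = self._counts.get(state)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

A state that was never observed returns `None` here; a practical design would need some fallback for that case, for example reverting to the calculation model.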

The second switch unit (111) is configured to select, in response to a control signal from the control unit 7, the encoding mode from either the learning mode determination unit (107) or the high-speed operation mode determination unit (108). That is, the second switch unit (111) switches so that the encoding mode from one of these two units is supplied to the encoding processing unit (109). The first switch unit (110) and the second switch unit (111) operate together: when the learning mode determination unit (107) is used, both select the learning mode determination unit (107), and when the high-speed operation mode determination unit (108) is used, both select the high-speed operation mode determination unit (108).
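
One way to guarantee that the two switches never disagree is to derive both from a single piece of state, as in the sketch below. Switching after a fixed number of blocks is only one possible criterion, assumed here for illustration; the description mentions a predetermined time or a user operation. The names `Determiner` and `SwitchController` are hypothetical.

```python
from enum import Enum


class Determiner(Enum):
    LEARNING = "learning mode determination unit (107)"
    FAST = "high-speed operation mode determination unit (108)"


class SwitchController:
    """Drives both switches from one state so they always agree."""

    def __init__(self, learning_blocks: int) -> None:
        # Assumed criterion: learn for this many blocks, then go fast.
        self.learning_blocks = learning_blocks
        self.blocks_seen = 0

    @property
    def selected(self) -> Determiner:
        if self.blocks_seen < self.learning_blocks:
            return Determiner.LEARNING
        return Determiner.FAST

    def first_switch(self) -> Determiner:
        """Switch (110): which unit receives block, vector and prediction."""
        return self.selected

    def second_switch(self) -> Determiner:
        """Switch (111): whose encoding mode reaches the encoder (109)."""
        return self.selected

    def block_done(self) -> None:
        """Call once per processed block."""
        self.blocks_seen += 1
```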

The encoding processing unit (109) generates an encoded stream using the encoding mode from the second switch unit (111) and supplies it to the recording unit 3 shown in FIG. 12. The encoded stream is also supplied to the decoding processing unit (105), where it is decoded. The image decoded by the decoding processing unit (105) is stored in the reference image memory (106) as the reference image.

Next, the learning mode determination unit (107) will be described. As the calculation model for determining the encoding mode in units of blocks, the learning mode determination unit (107) of this embodiment uses, for example, an algorithm called the RD-Optimization method, which is advantageous for obtaining a high compression ratio. The RD-Optimization method is introduced, for example, in Reference 1 below.
[Reference 1] G. Sullivan and T. Wiegand: "Rate-Distortion Optimization for Video Compression", IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90 (November 1998).
In the RD-Optimization method, the encoding mode is determined as follows. First, for every candidate mode, a predicted image of the macroblock is generated, a difference signal from the original image is produced, and DCT and quantization are applied to perform a provisional encoding. Next, the provisionally encoded data is decoded by applying inverse quantization and IDCT (Inverse DCT: inverse discrete cosine transform). The actual image distortion and the amount of generated code are then measured from the decoded data and optimized by the method of Lagrange multipliers. This determines the encoding mode that minimizes image quality degradation. This method is generally known to be capable of selecting an ideal mode.
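
The Lagrangian selection step is commonly written as minimizing J = D + lambda * R over the candidate modes, where D is the measured distortion and R the generated code amount. The sketch below assumes that D and R have already been measured for each candidate; the function names, the mode labels, and the numbers in the usage note are illustrative only.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode whose (distortion, rate_bits) pair gives the
    smallest Lagrangian cost for the given multiplier `lam`."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
```

For example, with candidates `{"Intra16x16": (400.0, 120.0), "Inter16x16": (250.0, 200.0), "PSkip": (900.0, 1.0)}`, a small multiplier favors the low-distortion mode, while a large multiplier favors the cheap skip mode.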

However, this method takes a great deal of time because provisional encoding and decoding require a very large amount of computation. For this reason, methods have also been proposed that increase speed by predicting the post-encoding image quality and selecting the mode without encoding and decoding the difference signal. For example, JM (Joint Model), the H.264 reference software, includes a fast method that does not use the RD-Optimization method. However, because the fast method in JM approximates the encoding cost used for mode determination mainly by a linear expression of the prediction error, its prediction accuracy is insufficient and image quality degrades. In addition, the amount of computation is still large because predicted images are generated for all candidate encoding modes.

Therefore, in this embodiment, the RD-Optimization method, which is suitable for maintaining high image quality, is used as the calculation model in the learning mode determination unit (107). A specific example of the learning mode determination unit (107) is shown in FIG. 2. In FIG. 2, the difference signal generation unit (201) uses the predicted image and the original image (block image) supplied via the first switch unit (110) to calculate a difference value for each pixel in the block. This difference value is supplied to the encoding processing unit (202) and the teacher signal generation unit (206). The encoding processing unit (202) applies block-wise DCT (Discrete Cosine Transformation) and quantization to the input difference signal at a designated size (for example, 8×8). The DCT coefficients and the quantization parameter obtained by this encoding are supplied to the decoding processing unit (203).

The decoding processing unit (203) applies inverse quantization and IDCT (Inverse DCT) to the encoded data, decodes it back into a difference signal, and outputs it to the cost calculation unit (204). The cost calculation unit (204) calculates the encoding cost of the mode in question from the decoded difference signal and the motion vector information input via the first switch unit (110), and supplies it to the mode determination unit (205). The mode determination unit (205) selects the encoding mode with the lowest encoding cost as the optimal mode and outputs the result to the second switch unit (111) and the teacher signal generation unit (206).
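
The trial encode and decode path through units (202) and (203) can be sketched as follows. This is a simplified illustration under several assumptions: an orthonormal floating-point DCT on a square block, a single uniform quantization step `qstep`, squared error as the distortion, and the count of nonzero quantized coefficients as a stand-in for the code amount. A real encoder would use the transform and entropy coding defined by its standard.

```python
import math
from typing import List, Tuple

Block = List[List[float]]


def _dct_matrix(n: int) -> Block:
    """Orthonormal DCT-II basis matrix (row k, column i)."""
    return [
        [math.sqrt((1 if k == 0 else 2) / n)
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]


def _matmul(a: Block, b: Block) -> Block:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def _transpose(a: Block) -> Block:
    return [list(col) for col in zip(*a)]


def dct2(block: Block) -> Block:
    """2-D DCT of a square block: C * X * C^T."""
    c = _dct_matrix(len(block))
    return _matmul(_matmul(c, block), _transpose(c))


def idct2(coeffs: Block) -> Block:
    """Inverse 2-D DCT: C^T * Y * C."""
    c = _dct_matrix(len(coeffs))
    return _matmul(_matmul(_transpose(c), coeffs), c)


def trial_encode(residual: Block, qstep: float) -> Tuple[float, int]:
    """Quantize the DCT of `residual`, reconstruct it, and return
    (squared-error distortion, number of nonzero quantized levels)."""
    levels = [[round(v / qstep) for v in row] for row in dct2(residual)]
    recon = idct2([[lv * qstep for lv in row] for row in levels])
    distortion = sum((r - o) ** 2
                     for rr, ro in zip(recon, residual)
                     for r, o in zip(rr, ro))
    nonzero = sum(1 for row in levels for lv in row if lv != 0)
    return distortion, nonzero
```

The two returned values correspond to the distortion and rate terms that the cost calculation unit (204) would combine, together with the motion vector cost, into a per-mode encoding cost.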

Based on the difference signal from the difference signal generation unit (201) and the encoding mode from the mode determination unit (205), the teacher signal generation unit (206) generates a teacher signal as the learning data used by the high-speed operation mode determination unit (108), and supplies it to that unit. That is, the high-speed operation mode determination unit (108) of this embodiment takes the RD-Optimization encoding mode determination as its model and learns the output (the encoding mode) for a given input (the image state, namely the difference between the block image and the predicted image, and the motion vector).
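
A teacher sample can be thought of as an input and output pair, as sketched below. Summarizing the difference signal as a single SAD value is an assumption made to keep the example short; the description states only that the difference signal and the motion vector represent the image state. The names `TeacherSample` and `make_teacher_sample` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TeacherSample:
    residual_sad: int               # summary of the difference signal (assumed)
    motion_vector: Tuple[int, int]  # (dy, dx) from the motion search
    mode: str                       # mode chosen by the calculation model


def make_teacher_sample(block: List[List[int]],
                        predicted: List[List[int]],
                        motion_vector: Tuple[int, int],
                        chosen_mode: str) -> TeacherSample:
    """Pair the image state (difference and motion vector) with the
    encoding mode that the learning mode determination unit selected."""
    sad = sum(abs(o - p)
              for row_o, row_p in zip(block, predicted)
              for o, p in zip(row_o, row_p))
    return TeacherSample(sad, motion_vector, chosen_mode)
```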

Next, the high-speed operation mode determination unit (108) of this embodiment will be described. FIG. 3 shows its basic configuration. The high-speed operation mode determination unit (108) includes a difference signal generation unit (301), which generates a difference signal between the predicted image and the original image, and determination units (302) to (306) connected in multiple stages. It operates as follows in each operation mode.
(1) Learning mode:
In this mode, learning is performed using the results of the encoding mode determination executed by the learning mode determination unit (107) according to the predetermined calculation model, together with the corresponding image state information. At this time, the first switch unit (110) selects the learning mode determination unit (107), so the block image from the block dividing unit (102), the motion vector from the motion search unit (103), and the predicted image from the predicted image generation unit (104) are supplied to the learning mode determination unit (107) and not to the high-speed operation mode determination unit (108). The second switch unit (111) also selects the learning mode determination unit (107), so the encoding mode calculated by the learning mode determination unit (107) is supplied to the encoding processing unit (109) via the second switch unit (111).

As described above, in the learning mode the block image, the motion vector, and the predicted image are not supplied to the high-speed operation mode determination unit (108), but the teacher signal from the learning mode determination unit (107) is. This teacher signal is input to each of the determination units (302) to (306) connected in multiple stages. Each of the determination units (302) to (306) gathers statistics on the correspondence between the encoding mode calculated by the learning mode determination unit (107) and the image state information. From this statistical information, each unit learns the relationship between the input to the learning mode determination unit (107) (the image state) and its output (the determined encoding mode). The first stage determination unit (302) makes a rough determination of the encoding mode, and, where necessary, the units connected to it, namely the second stage determination unit 1 (303), the second stage determination unit n (304), and/or the s-th stage determination unit 1 (305) and the s-th stage determination unit m (306), may make a more detailed determination. Here, n, s, and m are integers of 2 or more. That is, the second stage and the s-th stage may each consist of a plurality of determination units (n or m of them).
(2) High-speed operation mode:
In this mode, the encoding mode is determined at high speed (faster than the calculation in the learning mode determination unit (107)) using the learning results of the determination units in the high-speed operation mode determination unit (108). At this time, the first switch unit (110) selects the high-speed operation mode determination unit (108), so the block image from the block dividing unit (102), the motion vector from the motion search unit (103), and the predicted image from the predicted image generation unit (104) are supplied to the high-speed operation mode determination unit (108) and not to the learning mode determination unit (107). The second switch unit (111) also selects the high-speed operation mode determination unit (108), so the encoding mode determined by the high-speed operation mode determination unit (108) is supplied to the encoding processing unit (109) via the second switch unit (111).

The input block image and predicted image are fed to the difference signal generation unit (301). Treating the block image as the original image, the difference signal generation unit (301) calculates the difference between the original image and the predicted image for each pixel and supplies it to the first stage determination unit (302). The first stage determination unit (302) takes the difference signal from the difference signal generation unit (301) and the motion vector information as the image state, and uses the learned statistical information to select the corresponding encoding mode. The encoding mode selected here is the one that the predetermined calculation model would most likely have determined for the input image information. That is, through learning, the high-speed operation mode determination unit (108) grasps the likelihood of each encoding mode for each image state. When an image is input, selecting the encoding mode with the highest likelihood for its image state yields the optimal encoding mode without performing the calculation of the predetermined calculation model, for example the RD-Optimization method.

After making a rough determination of the encoding mode, the first stage determination unit (302) either ends the mode determination process or decides which of the second stage determination units (303), (304) to use for the next determination, depending on the result. This determination is repeated over multiple stages. At the final stage, the s-th stage determination units (305), (306) determine the optimal encoding mode almost with certainty, and the mode determination process ends. The optimal encoding mode determined by these determination units is supplied to the encoding processing unit (109) via the second switch unit (111) in place of the output of the learning mode determination unit (107).

These determination units may use any specific method as long as it involves a learning function. For example, a nonlinear discriminator using a neural network, a kernel-based SVM (Support Vector Machine), the k-nearest neighbor method, or the like may be used. A linear discriminator using an SVM, linear discriminant analysis, or the like may also be used. The likelihood of encoding with each candidate encoding mode may also be calculated. Furthermore, the determination rule itself may be learned by probabilistic means such as a Bayesian network, a hidden Markov model, or decision tree learning. A plurality of discriminators may also be combined, for example by boosting.
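
Of the options listed, the k-nearest neighbor method is the simplest to sketch. The example below classifies a feature vector by majority vote among the k closest teacher samples; the feature layout (for instance a residual measure and a motion vector magnitude), the Euclidean distance, and the function name `knn_predict` are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, encoding mode)


def knn_predict(samples: List[Sample], query: Sequence[float],
                k: int = 3) -> str:
    """Return the encoding mode that wins a majority vote among the
    k stored samples closest to `query` (Euclidean distance).
    `samples` must be non-empty."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(mode for _, mode in nearest)
    return votes.most_common(1)[0][0]
```

Because the stored samples come from the calculation model's own decisions, the vote approximates which mode that model would most likely have chosen for a similar image state.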

Next, a specific example of the high-speed operation mode determination unit (108) according to the present invention will be described with reference to FIGS. 4 to 7. In this example, each of the determination units (302) to (306) shown in FIG. 3 is configured as a neural network. For each 16×16-pixel macroblock, the optimal encoding mode is selected from a total of seven encoding modes: intra-picture (Intra) prediction with 4×4 or 16×16 blocks, inter-picture skip (PSkip) with 16×16 blocks, and inter-picture (Inter) prediction with 8×8, 8×16, 16×8, or 16×16 blocks. Here, the inter-picture skip (PSkip) method uses the image of the block in the reference image indicated by the motion vector as it is, without using difference information.

The high-speed operation mode determination unit (108) according to this example includes a difference signal generation unit (401), which generates a difference signal between the predicted image and the original image, and a first-stage determination unit consisting of neural network 1 (402), which selects the optimal encoding method from among the Intra, PSkip, and Inter methods. It further includes a second-stage determination unit consisting of two determination units: neural network 2 (403), which determines the optimal block size within the Intra method, and neural network 3 (404), which determines the optimal block size within the Inter method.

In the learning mode, as in the example shown in FIG. 3, each neural network receives a teacher signal from the learning mode determination unit (107). Using this teacher signal, it gathers statistics on the correspondence between information on the image state and the encoding mode, and performs learning.

In the high-speed operation mode, the block image from the first switch unit (110) and the predicted image are input to the difference signal generation unit (401). The difference signal generation unit (401) treats the input block image as the original image, generates the difference from the predicted image for each pixel, and supplies it to neural network 1 (402). Neural network 1 (402) selects the optimal encoding mode corresponding to the input difference signal from among the Intra, PSkip, and Inter methods. As in the example of FIG. 3, the method with the highest likelihood among the Intra, PSkip, and Inter methods is selected. When the PSkip mode is selected, the result is output to the second switch (111) and the encoding mode selection process ends. When the Intra method is selected, the determination process continues in neural network 2 (403) in order to determine the optimal block size for the encoding process. When the Inter method is selected, the determination process continues in neural network 3 (404) in order to determine the optimal block size for the encoding process. Neural network 2 (403) and neural network 3 (404) each select, for their encoding method, the optimal block size obtained by learning (that is, the block size with the highest likelihood for that encoding method). The result is output to the second switch (111) and the mode selection process ends.

Here, a neural network is a network in which a plurality of threshold logic units are arranged hierarchically from an input layer to an output layer. In a feed-forward network, connections between units exist only between adjacent layers and run in one direction, from the input layer toward the output layer. A connection weight is assigned to each pair of connected units, and the input to a unit in the upper layer is the sum of products of these weights and the values output by the units in the lower layer. During learning, these weights are adjusted so that the desired result is obtained at the output layer. By using a neural network to approximate the encoding mode selection rule in multiple dimensions, predictions can be made with higher accuracy, mainly with respect to the coding cost, than with the JM speed-up method, which calculates the prediction error with a linear expression.
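The layered sum-of-products computation described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the sigmoid activation, the bias terms, and the names `sigmoid` and `forward` are assumptions introduced here, since the text only specifies threshold logic units connected between adjacent layers.

```python
import math


def sigmoid(x):
    # Smooth stand-in for a threshold unit (assumption; not specified in the text).
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Propagate inputs through a feed-forward network.

    layers: list of (weights, biases), one entry per layer above the input layer.
    weights[j][i] is the weight from unit i of the lower layer to unit j of the
    upper layer, so each upper unit receives a sum of products of lower outputs.
    """
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations
```

With two `(weights, biases)` entries shaped 7×5 and 3×7, the same function evaluates a network with five inputs, seven intermediate units, and three outputs.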

FIG. 5 shows a specific example of neural network 1 (402) shown in FIG. 4. Neural network 1 (402) according to this example is a feed-forward network having five units in the input layer and three units in the output layer. The number of intermediate layers and the number of units in them are not particularly specified; for example, a single intermediate layer with seven units may be used. As learning data included in the teacher signal from the learning mode determination unit (107), neural network 1 receives, for example, the QP (Quantization Parameter), the spatial resolution, the prediction distortion (difference signal) when predicting in the 4×4 block Intra mode (I4x4), the prediction distortion when predicting in the PSkip mode, and the prediction distortion when predicting in the 8×8 block Inter mode. Here, the prediction distortion is the sum, over the pixels of a block, of the differences between the original image and the reference image, and is also referred to below as the difference signal.
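As a rough illustration of the prediction distortion and of the five inputs of neural network 1, consider the sketch below. It assumes the per-pixel difference is taken as an absolute value (the text says only "sum of differences"), and the function names and the order of the inputs are hypothetical.

```python
def prediction_distortion(original, predicted):
    """Sum of per-pixel absolute differences between two equally sized blocks.

    Blocks are given as lists of rows of pixel values. Using the absolute
    difference is an assumption made for this sketch.
    """
    return sum(
        abs(o - p)
        for row_o, row_p in zip(original, predicted)
        for o, p in zip(row_o, row_p)
    )


def build_nn1_input(qp, resolution, original, pred_i4x4, pred_pskip, pred_p8x8):
    # Five inputs: QP, spatial resolution, and one distortion per candidate mode.
    return [
        qp,
        resolution,
        prediction_distortion(original, pred_i4x4),
        prediction_distortion(original, pred_pskip),
        prediction_distortion(original, pred_p8x8),
    ]
```

In practice the inputs would probably be scaled to comparable ranges before being fed to the network; the text does not describe any such normalization.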

When this information and these signals are input, the likelihood of encoding in each of the Intra, PSkip, and Inter modes is output. Of the likelihoods for these three encoding modes, the mode with the highest likelihood is taken as the optimal encoding method. If, for the input shown in FIG. 5, the encoding method with the highest likelihood is, for example, the Inter mode, then the Inter mode is selected whenever a similar input occurs in the high-speed operation mode.

FIG. 6 shows a specific example of neural network 2 (403) shown in FIG. 4. Neural network 2 (403) is a feed-forward network having four units in the input layer and two units in the output layer. The number of intermediate layers and the number of units in them are not particularly specified; for example, a single intermediate layer with five units may be used. As learning data included in the teacher signal from the learning mode determination unit (107), neural network 2 receives, for example, the QP, the spatial resolution, the difference signal when predicting in the 4×4 block Intra mode (I4x4), and the difference signal when predicting in the 16×16 block Intra mode (I16x16). When this information and these signals are input, the likelihood of encoding in the I4x4 mode and in the I16x16 mode is output. Of the likelihoods for these two encoding modes, the mode with the highest likelihood is taken as the optimal encoding method. If, for the input shown in FIG. 6, the encoding method with the highest likelihood is, for example, the I4x4 mode, then the I4x4 mode is selected whenever a similar input occurs in the high-speed operation mode.

FIG. 7 shows a specific example of neural network 3 (404) shown in FIG. 4. Neural network 3 (404) is a feed-forward network having 11 units in the input layer and four units in the output layer. The number of intermediate layers and the number of units in them are not particularly specified; for example, a single intermediate layer with 10 units may be used. In this example, a 16×16 macroblock is further divided into four 8×8 blocks. As learning data included in the teacher signal from the learning mode determination unit (107), neural network 3 receives the QP, the spatial resolution, the vector difference information calculated for each 8×8 block, and the difference signal information calculated for each 8×8 block. When this information and these signals are input, the likelihood of encoding in the 8×8 block Inter mode (P8×8), the 8×16 block Inter mode (P8×16), the 16×8 block Inter mode (P16×8), and the 16×16 block Inter mode (P16×16) is output. If, for the input shown in FIG. 7, the encoding method with the highest likelihood is, for example, the P16×8 mode, then the P16×8 mode is selected whenever a similar input occurs in the high-speed operation mode.

一般的に、動画像を符号化する際の処理時間の大半が、画面間予測を行うための動き探索部(103)で消費されている。そのため、Inter予測モードにおける予測画像と原画像の差分信号を生成するための計算コストは非常に大きい。よって、Interモードのすべてのブロックサイズについて差分信号を生成し、ニューラルネットワーク３(404)に入力すると莫大な計算時間がかかる。そのため、本例では、8×8ブロックのInterモードに対してのみ動き探索を行って予測信号を生成することで高速化を実現している。一方、Intra予測モードにおいて、予測画像と原画像との差分信号を生成するための計算コストはそれほど大きくない。そのため、本例に係るニューラルネットワーク２(403)では、Intra予測モードのすべてのブロックサイズで差分信号を生成して入力することで、予測精度を高めている。このように、本実施例では、予測信号を生成する処理負荷の大きさに応じて入力パラメータを調整することで、画質と処理速度を向上するようにしている。 Generally, most of the processing time for encoding a moving image is consumed by the motion search unit (103) for performing inter-screen prediction. Therefore, the calculation cost for generating the difference signal between the predicted image and the original image in the Inter prediction mode is very high. Therefore, if differential signals are generated for all block sizes in the Inter mode and input to the neural network 3 (404), a huge amount of calculation time is required. For this reason, in this example, speedup is realized by generating a prediction signal by performing motion search only for the Inter mode of 8 × 8 blocks. On the other hand, in the intra prediction mode, the calculation cost for generating the difference signal between the predicted image and the original image is not so high. Therefore, in the neural network 2 (403) according to the present example, the prediction accuracy is improved by generating and inputting the difference signal in all block sizes in the intra prediction mode. Thus, in this embodiment, the image quality and the processing speed are improved by adjusting the input parameters according to the size of the processing load for generating the prediction signal.

上述した本実施例における高速動作モード判定処理部(108)の処理の流れを図８に示す。まずステップ802で、３種類の符号化モードI4x4、PSkip、P8×8で差分信号を生成し、ニューラルネットワーク１へ入力する。次にステップ803で、ニューラルネットワーク１(402)がPSkipモードを選択したか、つまりPSkipモードの尤度が最も高いかを判定する。その判定の結果、「yes」であるならば、該当マクロブロックの最適符号化モードはPSkipであるとしてステップ807へ進み、モード選択処理を終了する。また、ステップ803における判定がが「no」である場合はステップ804へ進む。 FIG. 8 shows a processing flow of the high-speed operation mode determination processing unit (108) in the present embodiment described above. First, in step 802, a differential signal is generated in three types of encoding modes I4x4, Pskip, and P8 × 8 and input to the neural network 1. In step 803, it is determined whether the neural network 1 (402) has selected the Pskip mode, that is, whether the likelihood of the Pskip mode is the highest. As a result of the determination, if “yes”, it is determined that the optimal encoding mode of the corresponding macroblock is Pskip, the process proceeds to step 807, and the mode selection process ends. If the determination in step 803 is “no”, the process proceeds to step 804.

In step 804, it is determined whether neural network 1 (402) has selected the Intra mode, that is, whether the likelihood of the Intra mode is the highest. If the result of the determination is "yes", the process proceeds to step 805. In step 805, a difference signal is generated for the other Intra block size, I16x16, and is input to neural network 2 (403) together with the I4x4 difference signal already generated in step 802. Then, the encoding mode with the highest likelihood at the output layer of neural network 2 (403) is selected as the optimal mode, and the process proceeds to step 807 to end the selection process.

一方、ステップ804での判定の結果、「no」である、すなわちニューラルネットワーク１がInterモードを選択したならば（つまりInterモードの尤度が最も高ければ）ステップ806へ進む。ステップ806では、P8×8の差分信号と動きベクトル情報をニューラルネットワーク３(404)へ入力する。そしてその出力層において最も尤度の高い符号化モードを最適モードとして選択し、ステップ807へ進んでモード選択処理を終了する。 On the other hand, if the result of determination in step 804 is “no”, that is, if the neural network 1 has selected the Inter mode (that is, if the likelihood of the Inter mode is the highest), the process proceeds to step 806. In step 806, the P8 × 8 difference signal and motion vector information are input to the neural network 3 (404). Then, the coding mode having the highest likelihood in the output layer is selected as the optimum mode, and the process proceeds to step 807 to end the mode selection process.
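The flow of steps 802 to 807 can be sketched as a two-stage decision. In this sketch the three networks are passed in as plain callables that return a list of likelihoods; the order of the output units, the function names, and the lazy callback `compute_d_i16x16` are assumptions made for illustration and do not come from the text.

```python
def argmax_label(likelihoods, labels):
    # Return the label whose likelihood is highest.
    best = max(range(len(labels)), key=lambda i: likelihoods[i])
    return labels[best]


def select_encoding_mode(nn1, nn2, nn3, qp, resolution,
                         d_i4x4, d_pskip, d_p8x8,
                         compute_d_i16x16, inter_features):
    """Two-stage mode selection following steps 802 to 807.

    compute_d_i16x16 is called only when Intra wins the first stage, so the
    extra I16x16 difference signal is generated only when it is needed.
    """
    # Steps 802 and 803: first-stage decision among the three prediction methods.
    first = argmax_label(
        nn1([qp, resolution, d_i4x4, d_pskip, d_p8x8]),
        ("Intra", "PSkip", "Inter"),
    )
    if first == "PSkip":
        return "PSkip"
    if first == "Intra":
        # Steps 804 and 805: choose the Intra block size.
        d_i16x16 = compute_d_i16x16()
        return argmax_label(
            nn2([qp, resolution, d_i4x4, d_i16x16]),
            ("I4x4", "I16x16"),
        )
    # Step 806: choose the Inter block size from the P8x8-based features.
    return argmax_label(
        nn3([qp, resolution] + list(inter_features)),
        ("P8x8", "P8x16", "P16x8", "P16x16"),
    )
```

Passing the I16x16 computation as a callback mirrors the point of the flow: the more expensive signals are produced only on the branch that uses them.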

The learning method used with the above neural networks is not particularly limited in this embodiment; for example, the error back-propagation method (BP method) is highly effective. The BP method is explained in detail in, for example, Chapter 3 of Reference 2 below.
[Reference 2] Kenichiro Ishii, Naonori Ueda, Eisaku Maeda, Hiroshi Murase: "わかりやすいパターン認識", Ohmsha, 1998.
In the above embodiment, the selection rule of the RD-Optimization method is learned, but the selection rule to be learned need not be based on such an objective judgment. For example, it may be based on a subjective judgment such as image quality evaluation by a person.
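As a rough sketch of how weights could be adjusted with error back-propagation, the following trains a network with one intermediate layer by gradient descent on the squared error. The sigmoid units, the bias handling, the learning rate, the random initialization, and all function names are assumptions made here; the text does not specify these details.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, n_out, seed=0):
    rng = random.Random(seed)

    def layer(rows, cols):
        # The extra last weight in each row acts as the bias of that unit.
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols + 1)] for _ in range(rows)]

    return layer(n_hidden, n_in), layer(n_out, n_hidden)


def layer_output(weights, inputs):
    extended = list(inputs) + [1.0]
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in weights]


def train_step(w_hidden, w_out, inputs, target, rate=0.5):
    """One back-propagation update; returns the squared error before the update."""
    hidden = layer_output(w_hidden, inputs)
    output = layer_output(w_out, hidden)
    # Error terms at the output layer (squared error, sigmoid units).
    delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    # Error terms at the intermediate layer, propagated back through w_out.
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(w_out):
        for j, x in enumerate(hidden + [1.0]):
            row[j] -= rate * delta_out[k] * x
    for j, row in enumerate(w_hidden):
        for i, x in enumerate(list(inputs) + [1.0]):
            row[i] -= rate * delta_hidden[j] * x
    return sum((o - t) ** 2 for o, t in zip(output, target)) / 2.0
```

Here the target vector would mark the mode chosen by the learning mode determination unit (for example `[0, 0, 1]` for the third output unit), and the inputs are assumed to be scaled to a small numeric range beforehand.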

Furthermore, the order in which the prediction signals used by the learning mode determination unit (107) and the high-speed operation mode determination unit (108) are generated does not matter. For example, the mode may be selected while searching for motion vectors macroblock by macroblock, or the mode may be selected after the vector search has been performed in one batch for a whole screen or slice.

また、高速動作モード判定部(108)において、符号化モードの選択を行うのと同時に動きベクトルの補正を行うことで、さらなる高速化が可能になる。例えば、動き探索部(103)に対して、時間をかけて小数精度まで探索を行う学習用動き探索部と、整数精度までの探索しか行わない高速動作動き探索部の２種類を用意して両者の結果の差分情報を学習するようにしてもよい。そして、高速動作モード判定部(108)で画面間予測方式が最適であると判断された場合には、学習された統計情報を基に動きベクトル量の補正を行ってもよい。これにより、高速で動き探索を行うことができるようになる。 Further, in the high-speed operation mode determination unit (108), it is possible to further increase the speed by correcting the motion vector at the same time as selecting the encoding mode. For example, for the motion search unit (103), two types of a motion search unit for learning that searches to decimal precision over time and a high-speed motion search unit that only searches to integer precision are prepared. The difference information as a result of the above may be learned. Then, when the high-speed operation mode determination unit (108) determines that the inter-screen prediction method is optimal, the motion vector amount may be corrected based on the learned statistical information. Thereby, a motion search can be performed at high speed.
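One simple way to picture the motion vector correction is shown below. It is only a sketch under a strong assumption: the learned "statistical information" is taken to be the mean offset between the fractional-precision vectors and the integer-precision vectors, which the text does not state. The class and method names are hypothetical.

```python
class MotionVectorCorrector:
    """Learns the average offset between precise and integer motion vectors."""

    def __init__(self):
        self.count = 0
        self.sum_dx = 0.0
        self.sum_dy = 0.0

    def learn(self, precise_mv, integer_mv):
        # Accumulate the difference between the two search results.
        self.sum_dx += precise_mv[0] - integer_mv[0]
        self.sum_dy += precise_mv[1] - integer_mv[1]
        self.count += 1

    def correct(self, integer_mv):
        # Without any learned statistics, return the vector unchanged.
        if self.count == 0:
            return (float(integer_mv[0]), float(integer_mv[1]))
        return (
            integer_mv[0] + self.sum_dx / self.count,
            integer_mv[1] + self.sum_dy / self.count,
        )
```

In the learning mode both search units would run and `learn` would be called for each block; in the high-speed operation mode only the integer search would run and `correct` would be applied when the inter-screen prediction method is selected.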

According to the configuration of this embodiment, a moving image can be encoded at high speed and with high image quality. In addition, by periodically repeating learning in the user's environment, the encoding mode suited to that environment can be determined and used for encoding, which further improves image quality. For example, when the image processing apparatus according to this embodiment is applied to a digital camera, shooting in the learning mode for a while after the camera is purchased adjusts the parameters, so that an encoding mode appropriate to the characteristics of the camera and the shooting environment can be learned. If the user then shoots in the high-speed operation mode, that mode reflects the results learned in the learning mode, and optimal encoding can be performed at high speed for a variety of scenes. Further, according to this embodiment, the longer the image processing apparatus is used, the more learning results for various images are accumulated, so the time required for encoding becomes shorter as the usage time increases.

Such mode switching may be performed manually by the user of the image processing apparatus. For example, a manual switch or a menu screen may be provided on the digital camera, and the encoding mode may be selected and switched in response to an operation on it. In response to this manually input command, the control unit 9 controls the first switch unit (110) and the second switch unit (111) described above. That is, when the user selects the learning mode, the control unit 9 controls the first switch unit (110) and the second switch unit (111) so that they select the learning mode determination unit (107). When the user selects the high-speed operation mode, the control unit 9 controls the first switch unit (110) and the second switch unit (111) so that they select the high-speed operation mode determination unit (108).

Switching between the learning mode determination unit (107) and the high-speed operation mode determination unit (108) may be performed manually according to a user instruction as described above, or it may be performed automatically. When it is performed automatically, the switch may be based on, for example, the operating time in the learning mode. For example, in a digital camera to which the image processing apparatus according to this embodiment is applied, the control unit 9 controls the first and second switch units so that the learning mode is selected automatically during the user's initial use (initial operation). The control unit 9 has a counter function and counts the operating time since the apparatus was first used, that is, the time operated in the learning mode. When the count exceeds 72 hours, the control unit 9 controls the first and second switch units so that the high-speed operation mode is selected automatically. In other words, in this example the 72 hours of learning mode operation serve as a so-called "run-in operation". After this run-in operation is completed, high-quality, high-speed encoding becomes possible.
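The counter-based switching with a manual override can be sketched as below. The class name, the method names, and the representation of the modes as strings are hypothetical; only the 72-hour run-in period comes from the text.

```python
LEARNING = "learning"
HIGH_SPEED = "high_speed"


class ModeController:
    """Chooses the mode from the accumulated learning time or a manual command."""

    def __init__(self, run_in_hours=72.0):
        self.run_in_seconds = run_in_hours * 3600.0
        self.learning_seconds = 0.0
        self.manual_mode = None  # None means automatic selection

    def add_learning_time(self, seconds):
        # Counter function: accumulate the time operated in the learning mode.
        self.learning_seconds += seconds

    def set_manual(self, mode):
        # A user command overrides the automatic choice; pass None to clear it.
        self.manual_mode = mode

    def current_mode(self):
        if self.manual_mode is not None:
            return self.manual_mode
        if self.learning_seconds > self.run_in_seconds:
            return HIGH_SPEED
        return LEARNING
```

The value returned by `current_mode` would then decide which determination unit the first and second switch units select.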

Naturally, manual and automatic mode switching may be combined. Even after the above run-in operation has finished, the learning mode may be selected manually each time the shooting scene changes.

上記自動的なモード切替において、例えば学習用モード判定部(107)は処理時間が大きいため、学習モードにおいてはフレームレートを落とすなどの工夫を行って処理速度を向上させてもよい。これにより、処理速度を落とすことなく両者を切り替えて利用することが可能になる。また、両者を並行して動作させる場合には、通常は一方のみが動作している状態で、一部の期間もしくは時間では両者が並行して動作するようにしても構わない。 In the automatic mode switching, for example, since the learning mode determination unit (107) has a long processing time, the processing speed may be improved by reducing the frame rate in the learning mode. As a result, it is possible to switch between the two without reducing the processing speed. When both are operated in parallel, normally only one of them may be operating, and both of them may operate in parallel for some period or time.

Furthermore, this embodiment makes it possible to build an encoding process specialized for the shooting environment. The encoding process according to this embodiment is therefore particularly effective when the shooting location and the nature of the video are limited to some extent, as in a video conference system or a surveillance system.

Next, a second embodiment of the present invention will be described with reference to FIGS. 9 to 11. In this embodiment, when the control unit 9 is constituted by a CPU, the learning mode and the high-speed operation mode are switched automatically according to the load on the CPU, that is, the CPU usage rate. This embodiment differs from the first embodiment in that a CPU usage rate measurement unit (190) is provided in place of, or within, the control unit 9 and controls the first switch (110) and the second switch (111). In other respects it is the same as the first embodiment.

The CPU usage rate measurement unit (190) according to this embodiment measures, for example, whether CPU resources are free. If they are, it controls the first and second switches so that the learning mode determination unit (107) operates in parallel with the high-speed operation mode determination unit (108). In other words, in this embodiment the high-speed operation mode determination unit (108) is always operating, and the learning mode determination unit (107) operates part of the time, with its results reflected in the high-speed operation mode determination unit (108).
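The load-dependent behavior can be pictured with the small sketch below. The usage threshold of 0.7, the function name, and the unit labels are assumptions for illustration; the text only says that the learning mode determination unit runs in parallel when CPU resources are free.

```python
def active_units(cpu_usage, threshold=0.7):
    """Return the determination units that should run at the given CPU usage.

    cpu_usage is a fraction between 0.0 and 1.0. The threshold is a
    hypothetical value; the text does not give one.
    """
    # The high-speed operation mode determination unit (108) always runs.
    units = ["high_speed_mode_determination_108"]
    # The learning mode determination unit (107) runs only when CPU is free.
    if cpu_usage < threshold:
        units.append("learning_mode_determination_107")
    return units
```

How the CPU usage itself is measured depends on the platform and is outside the scope of this sketch.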

With this configuration, the encoding efficiency improves as the image processing apparatus to which this embodiment is applied is used more often. FIG. 10 shows the relationship between the usage time of the image processing apparatus according to this embodiment and the amount of distortion when encoding is performed at a constant bit rate. FIG. 11 shows the relationship between the usage time of the image processing apparatus according to this embodiment and the bit rate. In each figure, the learning mode determination unit (107) is selected and operated in three intervals: from time t1 to time t2, from time t3 to time t4, and from time t5 to time t6. As is apparent from these figures, the encoding efficiency gradually improves during the periods in which the learning mode determination unit (107) operates in parallel with the high-speed operation mode determination unit (108). That is, FIG. 10 shows that learning improves image quality, and FIG. 11 shows that learning reduces the amount of generated code.

このように、本実施例によれば、利用回数に応じて符号化処理が高速化されるとともに、符号化に伴う画質劣化が低減され画質が向上される。 As described above, according to the present embodiment, the encoding process is speeded up according to the number of uses, and the image quality deterioration due to the encoding is reduced and the image quality is improved.

本発明は、例えばハードディスクレコーダ、携帯電話、デジタルカメラ、監視システム、テレビ会議システム等の動画像を符号化する機能を備えた画像処理装置に適用され得る。 The present invention can be applied to an image processing apparatus having a function of encoding a moving image, such as a hard disk recorder, a mobile phone, a digital camera, a surveillance system, a video conference system, and the like.

FIG. 1: Diagram showing a configuration example of the encoding unit according to the first embodiment.
FIG. 2: Diagram showing a specific example of the learning mode determination unit 107.
FIG. 3: Diagram showing an example of the basic configuration of the high-speed operation mode determination unit 108.
FIG. 4: Diagram showing a specific example of the high-speed operation mode determination unit 108.
FIG. 5: Diagram showing a specific example of neural network 1 (402) shown in FIG. 4.
FIG. 6: Diagram showing a specific example of neural network 2 (403) shown in FIG. 4.
FIG. 7: Diagram showing a specific example of neural network 3 (404) shown in FIG. 4.
FIG. 8: Diagram showing the flow of the encoding mode determination process in the high-speed operation mode determination unit 108.
FIG. 9: Diagram showing a configuration example of the second embodiment according to the present invention.
FIG. 10: Diagram showing the effect of the second embodiment.
FIG. 11: Diagram showing the effect of the second embodiment.
FIG. 12: Diagram showing a configuration example of an image processing apparatus to which the present invention is applied.

Explanation of symbols

101 ... Original image memory, 102 ... Block division unit, 103 ... Motion search unit, 104 ... Predicted image generation unit, 105 ... Decoding processing unit, 106 ... Reference image memory, 107 ... Learning mode determination unit, 108 ... High-speed operation mode determination unit, 109 ... Encoding processing unit, 110 ... First switch, 111 ... Second switch.

Claims

An image processing apparatus that selects an encoding mode, the encoding mode being defined by one of three prediction methods (inter-screen skip mode, intra mode, and inter mode) and by a predetermined block size applied when the intra mode or the inter mode is selected, and that encodes in the selected encoding mode, wherein
the apparatus has a learning mode and a high-speed operation mode,
in the learning mode, the apparatus
encodes a block image of a learning original image in each encoding mode,
obtains the coding cost in each encoding mode, and
trains a first determiner, a second determiner, and a third determiner, which are used in the high-speed operation mode, based on the correspondence among the spatial resolution, the QP, the encoding mode with the smallest code amount, and the difference signal for the encoding mode with the smallest code amount,
in the high-speed operation mode,
when encoding a block image of an image to be processed at high speed, the apparatus
obtains a difference signal for each of the inter-screen skip mode, the intra mode applied to one of the predetermined block sizes, and the inter mode applied to one of the predetermined block sizes, further obtains a difference vector signal for the inter mode, and inputs these, together with the spatial resolution and the QP of the image to be processed at high speed, to the first determiner, so that the likelihood of each mode is output, and selects the prediction method of the encoding mode with the highest likelihood, and
then performs one of the following steps (a) to (c):
(a) when the inter-screen skip mode is selected as the prediction method by the first determiner, prediction and encoding are performed in the inter-screen skip mode;
(b) when the intra mode is selected as the prediction method by the first determiner, a difference signal for the intra mode applied to another of the predetermined block sizes is obtained and input to the second determiner, and the spatial resolution and the QP of the image to be processed at high speed are also input to the second determiner, so that the likelihood of each encoding mode is output; the block size of the encoding mode with the highest likelihood is selected, and prediction and encoding are performed in the intra mode to which the selected block size is applied;
(c) when the inter mode is selected as the prediction method by the first determiner, the difference signal and the difference vector information for the inter mode applied to another of the predetermined block sizes, together with the spatial resolution, the QP, and the difference vector information of the image to be processed at high speed, are input to the third determiner, so that the likelihood of the inter mode applied to each of all the block sizes is output; the block size of the encoding mode with the highest likelihood is selected, and prediction and encoding are performed in the inter mode to which the selected block size is applied.