JP2017143425A

JP2017143425A - Image feature descriptor encoder, image feature descriptor decoder, image feature descriptor encoding method and image feature descriptor decoding method

Info

Publication number: JP2017143425A
Application number: JP2016023805A
Authority: JP
Inventors: Akira Minesawa; Yoshimi Moriya; Shunichi Sekiguchi; Akifumi Hattori; Kazuyuki Miyazawa; Tomoya Sawada; Naohiro Shibuya
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2016-02-10
Filing date: 2016-02-10
Publication date: 2017-08-17
Anticipated expiration: 2036-02-10
Also published as: JP6409792B2

Abstract

PROBLEM TO BE SOLVED: To provide an image feature descriptor encoder that improves prediction accuracy by exploiting correlation in the time direction and thereby achieves high encoding efficiency. SOLUTION: An encoder comprises: prediction means for setting, for each feature point in a block, an encoded descriptor signal on a reference picture to which a motion vector points as a prediction descriptor signal; transform means for generating a transform signal from the difference value between the descriptor signal and the prediction descriptor signal; and encoding means for encoding a descriptor presence flag, a prediction mode, a transform mode, and the transform signal of each feature point, and generating a bitstream in which the encoded data of the descriptor presence flag, the prediction mode, the transform mode, and the transform signals of the respective feature points are multiplexed. The prediction means selects a prediction vector from one or more encoded motion vectors around the block and generates a difference vector, which is the difference between the motion vector and the prediction vector. The encoding means encodes, for each feature point, the selection information of the prediction vector and the difference vector, and multiplexes them into the bitstream. SELECTED DRAWING: Figure 6

Description

The present invention relates to an image feature descriptor encoding apparatus and method for encoding feature descriptors of an image, and to an image feature descriptor decoding apparatus and method for decoding encoded feature descriptors.

Methods for realizing image processing such as image search and object tracking include techniques based on matching image feature descriptors such as HOG (Histogram of Oriented Gradients), SIFT (Scale Invariant Feature Transform), and SURF (Speeded Up Robust Features). For example, suppose that an image of a specific subject is to be retrieved from a server connected via a network, that the server stores feature descriptors of various images as a database, and that the search is performed by matching feature descriptors against this database. In such an application, either the query image itself must be transmitted so that the server can extract its feature descriptors, or the feature descriptors must be extracted on the user side and transmitted to the server. Therefore, when the data size of the feature descriptors is smaller than that of the image itself, the transmission load can be reduced by transmitting the feature descriptors instead of the image.

Non-Patent Document 1 describes a technique developed for such applications: a method for generating and encoding feature descriptors whose data size is smaller than that of the image itself. FIG. 1 shows its block diagram. In Non-Patent Document 1, an input image preprocessing unit 1 first reduces the input image as preprocessing. Because the reduced image (the preprocessed image) has fewer pixels than the original input image, the computational load of the subsequent processing and the number of feature points can both be reduced. Next, a feature descriptor generation unit 2 extracts feature points from the reduced image and calculates a feature descriptor for each feature point obtained. A descriptor information encoding unit 3 then encodes the position information of the feature points and the corresponding feature descriptors as descriptor information and outputs an encoded bitstream. A control unit 4 controls each process, for example whether the process is performed and which processing method is selected.

ISO/IEC 15938-13: Compact descriptors for visual search

Through the above processing, Non-Patent Document 1 transmits feature descriptors with a smaller data size than the image itself. However, when the input is a video consisting of one or more images (frames), Non-Patent Document 1 encodes mainly by using intra-frame correlation, and therefore cannot exploit the temporal correlation inherent in video.

The present invention has been made to solve the above problem. Its object is to obtain an image feature descriptor encoding apparatus and an image feature descriptor encoding method that achieve high encoding efficiency by encoding descriptor information with processing that exploits the correlation of feature descriptors in the time direction.
A further object of the present invention is to obtain an image feature descriptor decoding apparatus and an image feature descriptor decoding method that can correctly decode the encoded bitstream generated by such an image feature descriptor encoding apparatus and image feature descriptor encoding method.

An image feature descriptor encoding apparatus according to the present invention comprises:
block dividing means for dividing an input image into blocks;
prediction means for, in a block whose descriptor presence flag (a flag indicating that the block contains one or more descriptors) is true, when the prediction mode of the block is inter prediction, using, for each feature point in the block, the encoded descriptor signal on the reference picture pointed to by a motion vector as a prediction descriptor signal;
transform means for generating a transform signal by applying transform processing, according to the transform mode of the block, to the difference value between the descriptor signal and the prediction descriptor signal; and
encoding means for encoding the descriptor presence flag, the prediction mode, the transform mode, and the transform signal of each feature point, and generating a bitstream in which the encoded data of these items are multiplexed,
wherein the prediction means selects, for each feature point, a prediction vector from one or more encoded motion vectors around the block, which serve as prediction vector candidates, and generates a difference vector that is the difference between the motion vector and the prediction vector, and
the encoding means encodes the prediction vector selection information and the difference vector for each feature point, and multiplexes their encoded data into the bitstream.

According to the present invention, the descriptor information encoding means performs motion-compensated prediction on the feature descriptors and encodes the prediction difference information, and also predicts the motion vector used for motion-compensated prediction from surrounding vectors and encodes the resulting difference vector. Exploiting correlation in the time direction in this way raises prediction accuracy and achieves high encoding efficiency.

FIG. 1 is a block diagram showing an image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing an example of a feature descriptor according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing an example of the encoding order of descriptor information according to Embodiment 1 of the present invention.
FIG. 4 is an explanatory diagram showing an example of the encoding order of descriptor information according to Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing an example of the encoding order of descriptor information according to Embodiment 1 of the present invention.
FIG. 6 is a block diagram showing the descriptor information encoding unit according to Embodiment 1 of the present invention.
FIG. 7 is an explanatory diagram showing a raster scan, an example of the block-unit encoding order of the image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 8 is an explanatory diagram showing a zigzag scan, an example of the block-unit encoding order of the image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 9 is an explanatory diagram showing a diagonal scan, an example of the block-unit encoding order of the image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 10 is an explanatory diagram showing a spiral scan, an example of the block-unit encoding order of the image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 11 is an explanatory diagram showing a quadtree scan, an example of the block-unit encoding order of the image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 12 is another block diagram showing the descriptor information encoding unit according to Embodiment 1 of the present invention.
FIG. 13 is a block diagram showing an image feature descriptor decoding apparatus according to Embodiment 1 of the present invention.
FIG. 14 is another block diagram showing the image feature descriptor decoding apparatus according to Embodiment 1 of the present invention.
FIG. 15 is a flowchart showing the processing (image feature descriptor encoding method) of the image feature descriptor encoding apparatus according to Embodiment 1 of the present invention.
FIG. 16 is a flowchart showing the processing of "encoding of descriptor information" in FIG. 15.
FIG. 17 is a flowchart showing the processing of "encoding of descriptor information of the encoding target pixel" in FIG. 16.
FIG. 18 is an explanatory diagram showing the positions of prediction vector candidates according to Embodiment 1 of the present invention.
FIG. 19 is a flowchart showing the processing (image feature descriptor decoding method) of the image feature descriptor decoding apparatus according to Embodiment 1 of the present invention.
FIG. 20 is a flowchart showing the processing of "decoding of descriptor information of the decoding target pixel" in FIG. 19.
FIG. 21 is an explanatory diagram showing the positions of prediction vector candidates according to Embodiment 2 of the present invention.

Embodiment 1.
FIG. 1 is a block diagram showing an image feature descriptor encoding apparatus according to Embodiment 1 of the present invention. At the block-diagram level the configuration is the same as the conventional one, but the function of each element differs from the conventional one, as described later.
The video signal processed by the image feature descriptor encoding apparatus of Embodiment 1 may be any video signal whose frames consist of a two-dimensional (horizontal and vertical) array of digital samples (pixels): a color video signal in any color space, such as a YUV signal consisting of a luminance signal and two color-difference signals or an RGB signal output from a digital image sensor, as well as a monochrome image signal or an infrared image signal.
The gradation of each pixel may be 8 bits, or 10 bits, 12 bits, and so on.
The input signal may of course be a still image signal rather than a video signal, because a still image signal can be interpreted as a video signal consisting of a single frame.

In the following description, for convenience and unless otherwise specified, the input video signal is assumed to be in YUV 4:4:4 format, in which the two color-difference components U and V have the same number of samples as the luminance component Y. However, other 4:4:4 formats may be used, such as the RGB format consisting of the three primary color signals red (R), green (G), and blue (B), or the HSV format consisting of hue, saturation, and value. The YUV 4:2:0 and YUV 4:2:2 formats may also be used. The input image preprocessing unit 1 converts the format of the input video signal, where necessary, into the format used for generating image feature descriptors.

The format used for generating image feature descriptors may be encoded in an upper-level header as index information so that the image feature descriptor decoding apparatus can recognize it. In the case of a 4:4:4 format signal, each color signal may be regarded as a monochrome image signal (YUV 4:0:0) and encoded independently to generate a bitstream. In that case, information indicating which color signal each monochrome signal represents may be encoded as index information in an upper-level header, such as the sequence-level header described later, so that the decoding apparatus can recognize it. This makes it possible to identify which color signal a feature descriptor decoded by the image feature descriptor decoding apparatus belongs to.

The above describes the case where each color signal of a 4:4:4 format signal is regarded as a monochrome image signal and encoded independently, but monochrome (YUV 4:0:0) encoding of an actual monochrome image signal is of course also possible.
The above also describes the YUV and RGB signal formats, but other color signal formats (YCbCr, XYZ, and so on) can be encoded in the same way as the YUV format provided they are in 4:2:0, 4:2:2, or 4:4:4 format. Which color signal of the target format corresponds to which color signal of the YUV format is not restricted and can be set arbitrarily. As described above for the RGB format, the correspondence between color signals may of course be encoded as index information in an upper-level header so that the image feature descriptor decoding apparatus can recognize it.

The unit of processing data corresponding to each frame of the video is called a "picture". In Embodiment 1, a "picture" is described as the signal of a progressively scanned video frame. However, when the video signal is interlaced, a "picture" may be a field image signal, which is a unit constituting a video frame.

In FIG. 1, the input image preprocessing unit 1 receives the input image signal and performs preprocessing. Examples of preprocessing include resolution conversion, color format conversion, and filter processing such as noise removal. The control unit 4 issues the resolution conversion instruction, the color format conversion instruction, and the filter processing instruction. For example, as in Non-Patent Document 1, the control unit 4 may instruct a fixed process, such as reducing the vertical and horizontal resolution of the input image to preset sizes using a preset reduction method. Alternatively, the control unit 4 may issue the instructions dynamically according to a predetermined rule: for resolution conversion, the resolution conversion filter and the target size; for filter processing, the filter to be used.

The feature descriptor generation unit 2 extracts feature points from the preprocessed image signal input from the input image preprocessing unit 1 and generates a feature descriptor for each feature point.
An example of feature point extraction follows. First, a large number of filtered images used for extracting feature points are generated. Specifically, Gaussian filters with different smoothing strengths are prepared, and the image smoothed by each filter is generated as one of the filtered images. Images obtained by further downsampling these smoothed images are also generated as filtered images.

Next, FIG. 2 shows an example of feature descriptor generation. In FIG. 2, a 128-dimensional vector is used as the feature descriptor of a feature point. Specifically, the gradient direction at the feature point is first calculated from the preprocessed image signal. The preprocessed image signal is then rotated so that this gradient direction becomes horizontal (any fixed direction, such as vertical, may be used instead), and a predetermined region surrounding the feature point is divided into 16 square blocks. For each block, the gradient vector of each pixel is quantized into one of 8 directions and the magnitudes are accumulated per direction. The values h0 to h7 shown on the left of FIG. 2 are, for each direction, the sum of the magnitudes of the (8-direction quantized) gradient vectors in the block. Applying this processing to every block yields 16 blocks x 8 directions = 128 values, and this 128-dimensional vector is used as the feature descriptor of the feature point.
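
The descriptor layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure of the patent or of Non-Patent Document 1: the function name `descriptor_128`, the 16x16 patch size, the central-difference gradient, and the rounding rule used to quantize directions are all hypothetical choices, and the rotation step is assumed to have been applied already.

```python
import math

def descriptor_128(patch):
    """Build a 16-block x 8-direction gradient histogram (128 values).

    patch: square list of lists of sample values whose side is divisible
    by 4, assumed to be already rotated so that the gradient direction of
    the feature point is horizontal (the rotation itself is omitted).
    """
    size = len(patch)
    cell = size // 4                      # 4 x 4 grid -> 16 square blocks
    hist = [[0.0] * 8 for _ in range(16)]
    for y in range(size):
        for x in range(size):
            # Assumed gradient estimate: central differences, clamped at borders.
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Quantize the gradient direction to the nearest of 8 directions.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            direction = int(round(angle / (math.pi / 4))) % 8
            block = (y // cell) * 4 + (x // cell)
            hist[block][direction] += mag  # accumulates h0..h7 of this block
    # Concatenate h0..h7 of all 16 blocks into one 128-dimensional vector.
    return [h for block_hist in hist for h in block_hist]

ramp = [[x for x in range(16)] for _ in range(16)]  # purely horizontal gradient
d = descriptor_128(ramp)
print(len(d))
```

For the horizontal ramp used here, all of the gradient magnitude falls into direction 0 of each block, which matches the intuition that h0 to h7 summarize the dominant gradient directions inside a block.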

The feature descriptor generation unit 2 also selects among the extracted feature points based on control instructions from the control unit 4. For example, when the control unit 4 controls the transmission bit rate (or the data size) of the encoded bitstream output by the descriptor information encoding unit 3, the number of feature points is controlled so that the bit rate of the encoded bitstream obtained by encoding the descriptor information in the descriptor information encoding unit 3, described later, does not exceed the target bit rate. Alternatively, the number of feature points per picture is defined in the picture-level header described later, and that number of feature descriptors is selected.
The feature descriptor generation unit 2 outputs the position information of the extracted feature points and the generated feature descriptors to the descriptor information encoding unit 3 as descriptor information.

The descriptor information encoding unit 3 encodes the descriptor information of each picture input from the feature descriptor generation unit 2, and generates and outputs an encoded bitstream.
FIGS. 3 to 5 show examples of the picture encoding order. FIGS. 3 and 4 show coding structures in which the encoding order is the same as the display order (time order), and FIG. 5 shows a coding structure in which the encoding order differs from the display order.

These coding structures also define the reference structure of the prediction processing described later. FIG. 3 shows a coding structure that uses only I (Intra) pictures, which do not use prediction that refers to encoded pictures (pictures in which the prediction mode described later is the direct encoding mode only). An encoded bitstream generated with this structure can be decoded starting from any picture. FIG. 4 shows a coding structure that, in addition to I pictures, also uses P (Predictive) pictures, which can use prediction that refers to an encoded picture (pictures in which the prediction mode described later is the direct encoding mode or the inter prediction mode). An encoded bitstream generated with this structure can be decoded starting only from an I picture. Inter prediction in a P picture refers to one reference picture chosen from the one or more encoded pictures in the reference picture list described later. FIG. 5 shows a coding structure that, in addition to I and P pictures, also uses B (Bi-predictive) pictures, which can use prediction that refers to one or two reference pictures chosen from the one or more encoded pictures in the reference picture list. In this structure, reordering the encoding order relative to the display order introduces encoding delay, but B pictures can improve encoding efficiency.

The image feature descriptor decoding apparatus can determine the encoding order by decoding the picture type information (information indicating I picture, P picture, or B picture), which is part of the header information (picture-level header) of the encoded bitstream, and can determine the reference candidates for the prediction of each picture by decoding the reference picture list described later.

FIG. 6 shows the details of the descriptor information encoding unit 3. In FIG. 6, a black dot on an arrow marks a branch of that same arrow, whereas arrows that cross without a black dot are independent of each other.
For the picture to be encoded, the dividing unit 101 outputs the descriptor signals, which are the descriptor information extracted in the picture, and the corresponding descriptor position information, in a predetermined order, to the changeover switch 102, the descriptor information memory 104, and the encoding unit 107. Specifically, the descriptor signal is output to the changeover switch 102, and the descriptor position information is output to the descriptor information memory 104 and the encoding unit 107.

The predetermined order consists of a predetermined order in block units and a predetermined order in pixel units. First, the picture is divided into blocks, which are processed in the predetermined block-unit order. Each block has a descriptor presence flag, which is true (1) when the block contains one or more items of descriptor information and false (0) when it contains none. Then, only for blocks whose descriptor presence flag is true, each item of descriptor information in the block is processed in a predetermined pixel-unit order such as raster scan. FIGS. 7 to 11 show examples of the block-unit order. In each of FIGS. 7 to 11, the picture is divided into reference blocks and processed reference block by reference block; the arrows in each figure indicate the processing order of the reference blocks. FIG. 7 shows a raster scan, which proceeds horizontally from the upper left of the picture; FIG. 8 shows a zigzag scan, which proceeds diagonally in a zigzag from the upper left of the picture; FIG. 9 shows a diagonal scan, which proceeds in the lower-left diagonal direction; and FIG. 10 shows a spiral scan (circular scan), which proceeds clockwise from the center of the picture.
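
As a rough illustration of the two-level ordering described above, the following sketch derives block-level descriptor presence flags in raster-scan order and then lists the feature points of each flagged block in pixel raster order. The function names, the tuple layout, and the choice of raster scan at both levels are assumptions made for this example; the text equally allows zigzag, diagonal, or spiral block orders.

```python
def block_presence_flags(width, height, block_size, feature_points):
    """Return (bx, by, flag) for every block, in block raster-scan order.

    flag is 1 when at least one feature point lies in the block, else 0.
    feature_points is a list of (x, y) pixel positions.
    """
    occupied = {(x // block_size, y // block_size) for x, y in feature_points}
    blocks_x = (width + block_size - 1) // block_size
    blocks_y = (height + block_size - 1) // block_size
    return [(bx, by, int((bx, by) in occupied))
            for by in range(blocks_y) for bx in range(blocks_x)]

def points_in_coding_order(block_size, feature_points, flags):
    """Visit only flagged blocks; inside a block use pixel raster order."""
    order = []
    for bx, by, flag in flags:
        if not flag:
            continue  # blocks without descriptors carry only the flag
        inside = [(x, y) for x, y in feature_points
                  if x // block_size == bx and y // block_size == by]
        order.extend(sorted(inside, key=lambda p: (p[1], p[0])))
    return order

points = [(5, 1), (1, 6), (6, 0), (2, 5)]
flags = block_presence_flags(8, 8, 4, points)
print(flags)
print(points_in_coding_order(4, points, flags))
```

Swapping the list comprehension in `block_presence_flags` for another traversal would give the other block-unit orders without changing the per-block processing.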

FIG. 11 shows a quadtree scan. As in FIGS. 7 to 10, each reference block first has a descriptor presence flag. A block whose descriptor presence flag is true (1) is then divided into four blocks. Each of these four blocks has its own descriptor presence flag, and any block whose flag is true (1) is again divided into four blocks. A block containing no descriptor has its descriptor presence flag set to false (0) and is not divided further. This quadtree division proceeds hierarchically down to a preset minimum block size, and pixel-unit processing is performed on the minimum-size blocks whose descriptor presence flag is true.
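
The hierarchical flagging described above can be sketched as a short recursion. This is a minimal illustration, assuming a depth-first traversal and an upper-left, upper-right, lower-left, lower-right visiting order for the four sub-blocks; the source does not specify either, and the function name `quadtree_flags` is hypothetical.

```python
def quadtree_flags(x0, y0, size, min_size, feature_points):
    """Return descriptor presence flags for one reference block, depth first.

    A block with flag 1 is split into four sub-blocks until min_size is
    reached; a block with flag 0 is not split any further.
    """
    has_point = any(x0 <= x < x0 + size and y0 <= y < y0 + size
                    for x, y in feature_points)
    flags = [int(has_point)]
    if has_point and size > min_size:
        half = size // 2
        # Assumed sub-block order: upper-left, upper-right, lower-left, lower-right.
        for dy in (0, half):
            for dx in (0, half):
                flags += quadtree_flags(x0 + dx, y0 + dy, half,
                                        min_size, feature_points)
    return flags

# One feature point at (5, 1) in an 8x8 reference block, minimum block size 2.
print(quadtree_flags(0, 0, 8, 2, [(5, 1)]))
```

With a single feature point, three of the four quadrants terminate after one flag each, which shows why this scan can be economical when feature points are sparse.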

Pixel-unit processing within a block processes the per-pixel descriptor presence flags in a predetermined order such as raster scan.
The block-unit and pixel-unit descriptor presence flags together represent the descriptor position information.
The dividing unit 101 also separates the descriptor information into the descriptor signal and the descriptor position information and outputs them.

The changeover switch 102 switches according to the prediction mode information input from the control unit 4. Specifically, when the prediction mode information indicates the inter prediction mode, the descriptor signal is output to the subtraction unit 105 and the descriptor information memory 104. When the prediction mode information indicates the direct encoding mode, the descriptor signal is output directly to the transform unit, bypassing the subtraction unit 105.

The prediction unit 103 performs inter prediction with reference to the descriptor information memory 104. That is, for the descriptor to be encoded, it searches the descriptors of the reference pictures (encoded pictures that the prediction unit 103 can refer to) for a descriptor to use for prediction, and outputs the descriptor found by the search to the subtraction unit 105 as the prediction descriptor signal. The motion vector indicating the position of the prediction descriptor (the position of the prediction descriptor relative to the position of the descriptor to be encoded) is encoded by the encoding unit 107 as part of the inter prediction information, in the form of a difference vector from a prediction vector described later. The inter prediction information consists of reference picture information identifying the reference picture that contains the prediction descriptor, prediction vector information obtained from already encoded neighboring motion vectors, and the difference vector between the motion vector and the prediction vector.
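
The relationship between the motion vector, the prediction vector, and the difference vector can be illustrated with a short sketch. The function names, the rule of picking the neighboring vector with the smallest absolute difference, and the (0, 0) fallback when no neighbor is available are assumptions for illustration; the text only states that a prediction vector is selected from encoded neighboring motion vectors and that the selection information and the difference vector are encoded.

```python
def encode_motion_vector(mv, neighbour_mvs):
    """Return (selection index, difference vector) for motion vector mv."""
    candidates = neighbour_mvs or [(0, 0)]      # assumed fallback predictor

    def cost(i):
        px, py = candidates[i]
        return abs(mv[0] - px) + abs(mv[1] - py)

    idx = min(range(len(candidates)), key=cost)  # assumed selection rule
    px, py = candidates[idx]
    return idx, (mv[0] - px, mv[1] - py)


def decode_motion_vector(idx, diff, neighbour_mvs):
    """Rebuild the motion vector from the selection index and difference vector."""
    candidates = neighbour_mvs or [(0, 0)]
    px, py = candidates[idx]
    return (px + diff[0], py + diff[1])


neighbours = [(0, 0), (5, -2), (4, -3)]
idx, diff = encode_motion_vector((4, -2), neighbours)
restored = decode_motion_vector(idx, diff, neighbours)
```

Because the decoder sees the same already decoded neighboring vectors, transmitting only the index and the (usually small) difference vector is enough to recover the motion vector exactly.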

As a search method, when the descriptor to be encoded is, for example, the 128-dimensional vector shown in FIG. 2, the candidate that minimizes the sum of absolute differences between the vectors, or the candidate that minimizes the sum of squared differences between the vectors, may be taken as the prediction descriptor signal.
The prediction descriptor signal may also be generated from a plurality of descriptors. Specifically, two descriptors are selected from one or more reference pictures, and their weighted average is used as the prediction descriptor signal. The two descriptors may be selected from different pictures, or from different positions in the same reference picture. Using the weighted average of two descriptors as the prediction descriptor signal reduces noise in the predicted value and can improve prediction accuracy. Information indicating whether the prediction descriptor signal is generated from a plurality of descriptors or from a single descriptor is encoded as part of the inter prediction information.
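
The two search criteria and the two-descriptor prediction can be sketched as below. The helper names and the default weight of 0.5 are assumptions for the example; the text does not specify how the weights are chosen.

```python
def sad(a, b):
    """Sum of absolute differences between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of squared differences between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_prediction(target, candidates, cost=sad):
    """Return (index, descriptor) of the candidate with the lowest cost."""
    best = min(range(len(candidates)), key=lambda i: cost(target, candidates[i]))
    return best, candidates[best]


def weighted_average(d0, d1, w0=0.5):
    """Prediction built from two descriptors; w0 is an assumed weight."""
    return [w0 * x + (1.0 - w0) * y for x, y in zip(d0, d1)]


target = [4, 4, 4, 4]
candidates = [[0, 0, 0, 0], [4, 5, 4, 3], [8, 8, 8, 8]]
index, prediction = find_prediction(target, candidates, cost=ssd)
```

The same `find_prediction` call works for 128-dimensional descriptors; only the candidate list, which here stands in for the contents of the descriptor information memory 104, changes.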

The descriptor information memory 104 stores the descriptor signals of the reference pictures and the corresponding descriptor position information. The descriptor signal and descriptor position information of a reference picture are collectively called reference descriptor information. The reference pictures are controlled by the control unit 4. The control method may be fixed, for example always using the immediately preceding picture in encoding order, or may be dynamic on a picture-by-picture basis. For example, the picture numbers of reference candidates are managed as a reference picture list that is updated for each picture. The list update information is encoded by the encoding unit 107 and multiplexed into the encoded bitstream, so that the image feature descriptor decoding device can generate the same reference picture list by decoding the encoded bitstream.

The update information of the reference picture list can be expressed as differences in picture number from the reference picture list used when encoding the immediately preceding picture in encoding order. The number of reference pictures may vary from picture to picture, in which case picture numbers are newly encoded. The number of reference pictures for each picture is encoded per sequence or per picture, and the maximum number of reference pictures is encoded per sequence. Specifying the maximum number of reference pictures bounds the capacity of the descriptor information memory 104 required by the image feature descriptor encoding device (and likewise the capacity of the descriptor information memory 204 required by the image feature descriptor decoding device). For example, when keeping the capacity of the descriptor information memory 104 small matters more than prediction accuracy, setting the maximum number of reference pictures to a small value reduces the capacity that the descriptor information memory 104 requires.
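
One possible reading of this differential list update is sketched below. The exact syntax of the update information is not given in the text, so the entry-by-entry differencing, the handling of extra entries as newly coded picture numbers, and the function names are all assumptions for illustration.

```python
def make_list_update(prev_list, new_list):
    """Describe new_list relative to the previous picture's reference list.

    Returns (count, deltas, extra): the number of reference pictures, the
    per-entry picture-number differences, and picture numbers that have no
    counterpart in prev_list and are therefore coded as new values.
    """
    n = min(len(prev_list), len(new_list))
    deltas = [new_list[i] - prev_list[i] for i in range(n)]
    extra = list(new_list[n:])
    return len(new_list), deltas, extra


def apply_list_update(prev_list, count, deltas, extra, max_refs):
    """Rebuild the reference picture list on the decoding side."""
    if count > max_refs:
        raise ValueError("reference picture count exceeds the signalled maximum")
    rebuilt = [p + d for p, d in zip(prev_list, deltas)] + list(extra)
    assert len(rebuilt) == count
    return rebuilt


update = make_list_update([7, 6], [8, 7, 5])
rebuilt = apply_list_update([7, 6], *update, max_refs=4)
```

The `max_refs` check mirrors the role of the maximum number of reference pictures: it lets both sides size the descriptor information memory in advance.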

The subtraction unit 105 subtracts the prediction descriptor signal output from the prediction unit 103 from the descriptor signal output from the changeover switch 102, and outputs the result to the conversion unit 106 as the difference descriptor signal.

The conversion unit 106 refers to the conversion mode determined by the control unit 4, performs conversion processing on the difference descriptor signal output from the subtraction unit 105, and outputs the result to the encoding unit 107 as the conversion signal. That is, one or more conversion methods are prepared and one of them is selected according to the conversion mode. For example, the following conversion methods are prepared for the eight components h0, h1, ..., h7 belonging to one cell of the 128-dimensional vector of FIG. 2, and one is selected for each cell. In this case, the per-cell selection information corresponds to the conversion mode. Alternatively, the method used for each cell may be fixed in advance, in which case the conversion mode is unnecessary. For example, a cell adjacent to a cell that uses Transform A uses Transform B, and a cell adjacent to a cell that uses Transform B uses Transform A, so that the two methods alternate like the black and white squares of a checkerboard.

Transform A
v0 = (h2 - h6) / 2
v1 = (h3 - h7) / 2
v2 = (h0 - h1) / 2
v3 = (h2 - h3) / 2
v4 = (h4 - h5) / 2
v5 = (h6 - h7) / 2
v6 = ((h0 + h4) - (h2 + h6)) / 4
v7 = ((h0 + h2 + h4 + h6) - (h1 + h3 + h5 + h7)) / 8
Transform B
v0 = (h0 - h4) / 2
v1 = (h1 - h5) / 2
v2 = (h7 - h0) / 2
v3 = (h1 - h2) / 2
v4 = (h3 - h4) / 2
v5 = (h5 - h6) / 2
v6 = ((h1 + h5) - (h3 + h7)) / 4
v7 = ((h0 + h1 + h2 + h3) - (h4 + h5 + h6 + h7)) / 8
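
The two transforms and the fixed checkerboard assignment can be written out as a short sketch. Several details are assumptions made for the example rather than statements of the embodiment: the 128-dimensional descriptor is taken to be 4 x 4 cells of 8 components stored cell by cell in row-major order, the top-left cell is taken to use Transform A, and the divisions are done in floating point because the text does not specify rounding.

```python
def transform_a(h):
    """Transform A applied to the eight components h0..h7 of one cell."""
    h0, h1, h2, h3, h4, h5, h6, h7 = h
    return [
        (h2 - h6) / 2,
        (h3 - h7) / 2,
        (h0 - h1) / 2,
        (h2 - h3) / 2,
        (h4 - h5) / 2,
        (h6 - h7) / 2,
        ((h0 + h4) - (h2 + h6)) / 4,
        ((h0 + h2 + h4 + h6) - (h1 + h3 + h5 + h7)) / 8,
    ]


def transform_b(h):
    """Transform B applied to the eight components h0..h7 of one cell."""
    h0, h1, h2, h3, h4, h5, h6, h7 = h
    return [
        (h0 - h4) / 2,
        (h1 - h5) / 2,
        (h7 - h0) / 2,
        (h1 - h2) / 2,
        (h3 - h4) / 2,
        (h5 - h6) / 2,
        ((h1 + h5) - (h3 + h7)) / 4,
        ((h0 + h1 + h2 + h3) - (h4 + h5 + h6 + h7)) / 8,
    ]


def transform_descriptor(desc, grid=4, bins=8):
    """Alternate Transform A and B over the cells in a checkerboard pattern."""
    out = []
    for cell in range(grid * grid):
        row, col = divmod(cell, grid)          # assumed row-major cell order
        h = desc[cell * bins:(cell + 1) * bins]
        transform = transform_a if (row + col) % 2 == 0 else transform_b
        out.extend(transform(h))
    return out


converted = transform_descriptor(list(range(128)))
```

With the fixed pattern no per-cell conversion mode has to be transmitted, since the decoder can derive the transform of each cell from its position alone.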

In addition, "no conversion" may be provided as one of the conversion modes so that whether conversion is applied can be controlled. This makes it possible to apply the conversion selectively to regions where it concentrates signal power well, and to avoid the loss of coding efficiency that conversion would cause in regions where it concentrates signal power poorly.

The encoding unit 107 encodes, as descriptor data, the conversion signal output from the conversion unit 106, the descriptor position information output from the dividing unit 101, the prediction mode and conversion mode output from the control unit 4, and the inter prediction information output from the prediction unit 103, and generates an encoded bitstream.
Examples of coding methods for each parameter include entropy codes such as arithmetic codes and Huffman codes.
The encoding unit 107 also encodes the header information of the encoded bitstream. It encodes a sequence level header and a picture level header as header information and generates the encoded bitstream together with the descriptor data. The reference picture management information is encoded in the picture level header.

The sequence level header collects header information that is generally common to a whole sequence, such as the image size, the color signal format, and the maximum number of reference pictures for inter prediction.
The picture level header collects header information set for each picture, such as the index of the sequence level header to be referred to, picture type information, an initialization flag for the entropy coding probability tables, and the reference picture list described above.
The maximum number of feature points per picture is also carried in either the sequence level header or the picture level header. This makes it possible to control the maximum number of feature points that a picture can have. Reducing this parameter generally lowers the encoding and decoding load and lowers the bit rate of the encoded bitstream output by the image feature descriptor encoding device.
Furthermore, the picture level header carries flag information (a descriptor presence flag) indicating whether any feature descriptor (feature point) exists in the picture. The image feature descriptor encoding device can determine whether a descriptor exists from whether descriptor position information exists for the picture. The image feature descriptor decoding device can determine whether a feature descriptor exists in the picture by decoding this flag from the encoded bitstream.

Each piece of header information and the picture data described later are identified by NAL (Network Abstraction Layer) units. Specifically, the sequence level header, the picture level header, and the picture data, which groups the descriptor data for one picture, are each defined as a distinct NAL unit type and are encoded together with identification information (an index) of the NAL unit type. Supplemental information, when present, is likewise defined as a distinct NAL unit. An access unit is the unit of data access for each picture and groups the NAL units related to that picture, such as the NAL unit carrying the picture data and NAL units carrying supplemental information.
Furthermore, the picture data carries, as header information, the index of the picture level header that it refers to.

In the example of FIG. 1, the dividing unit 101, the changeover switch 102, the prediction unit 103, the subtraction unit 105, the conversion unit 106, and the encoding unit 107, which are the components of the image feature descriptor encoding device, are each assumed to be implemented in dedicated hardware (for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). However, the image feature descriptor encoding device may instead be configured by a computer.
When the image feature descriptor encoding device is configured by a computer, the descriptor information memory 104 is allocated in the memory of the computer, a program describing the processing of the dividing unit 101, the changeover switch 102, the prediction unit 103, the subtraction unit 105, the conversion unit 106, and the encoding unit 107 is stored in the memory of the computer, and the CPU of the computer executes the program stored in the memory.

FIG. 12 shows another configuration of the image feature descriptor encoding device. Compared with the configuration of FIG. 6, the prediction unit 103 is replaced by a prediction unit 108. The prediction unit 108 receives the prediction mode from the control unit 4 and switches the prediction method according to it; that is, it provides a function equivalent to the combination of the changeover switch 102 and the prediction unit 103. Specifically, when the prediction mode indicates the inter prediction mode, it performs inter prediction with reference to the descriptor information memory 104 and outputs the resulting prediction descriptor signal to the subtraction unit 105. The inter prediction information obtained during inter prediction is output to the encoding unit 107. When the prediction mode indicates the direct encoding mode, it outputs to the subtraction unit 105 a prediction descriptor signal whose values are all 0. In this case there is no inter prediction information, so none is output to the encoding unit 107.

FIG. 13 is a block diagram showing the image feature descriptor decoding device according to Embodiment 1 of the present invention.
In FIG. 13, when the decoding unit 201 receives the encoded bitstream generated by the image feature descriptor encoding device of FIG. 1, it decodes from that bitstream the header information, such as the sequence level header and the picture level header, and the descriptor data.

At this time, if the sequence level header contains, as header information, an indication that each signal of a YUV 4:4:4 format signal or an RGB 4:4:4 format signal has been regarded as a monochrome image signal and independently encoded as monochrome (YUV 4:0:0), the decoding process can be performed independently on the encoded bitstream of each color signal.

Each piece of header information and the picture data are identified by NAL units. Specifically, the sequence level header, the picture level header, and the picture data are each defined as a distinct NAL unit type and are identified by decoding the identification information (index) of the NAL unit type. Supplemental information, when present, is likewise identified as a distinct NAL unit. An access unit is identified as the unit that groups the NAL units related to a picture, such as the NAL units carrying its picture data and supplemental information.

The picture level header referred to by a piece of picture data is identified from the picture level header index in the header information of that picture data, and the sequence level header to be referred to is identified from the sequence level header index in that picture level header.
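
The two-step index lookup can be pictured with a small sketch. The dictionary layout and the field names `picture_header_index` and `sequence_header_index` are hypothetical, chosen only to show how picture data leads to its picture level header and from there to its sequence level header.

```python
def resolve_headers(picture_data, picture_headers, sequence_headers):
    """Follow the header indices from picture data up to the sequence level header.

    picture_headers and sequence_headers map an index to an already decoded
    header; all field names here are hypothetical.
    """
    pic_hdr = picture_headers[picture_data["picture_header_index"]]
    seq_hdr = sequence_headers[pic_hdr["sequence_header_index"]]
    return pic_hdr, seq_hdr


sequence_headers = {0: {"max_reference_pictures": 2}}
picture_headers = {
    0: {"sequence_header_index": 0, "descriptor_present": True},
    1: {"sequence_header_index": 0, "descriptor_present": False},
}
picture_data = {"picture_header_index": 1}
pic_hdr, seq_hdr = resolve_headers(picture_data, picture_headers, sequence_headers)
```

Because several pictures may point at the same picture level header, a header only needs to be transmitted once and can then be referenced by index.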

The decoding unit 201 decodes the reference picture management information in the picture level header and outputs it to the descriptor information memory 204.
Descriptor data exists for a picture only when the descriptor presence flag in its picture level header is true, and only then is the processing of units 202 to 206 performed. That is, when the presence flag is false, no descriptor data exists for the picture, and processing proceeds to the next picture in decoding order.

The decoding unit 201 also decodes the descriptor data to obtain the conversion signal, the descriptor position information, the prediction mode, the conversion mode, and the inter prediction information. It outputs the conversion signal to the inverse conversion unit 202, the descriptor position information to the descriptor information memory 204, the prediction mode to the changeover switch 205, the conversion mode to the inverse conversion unit 202, and the inter prediction information to the prediction unit 203, and also outputs the descriptor position information as an output of the image feature descriptor decoding device. The motion vector is obtained by a decoding process that uses, as the predicted value, the prediction vector obtained from already decoded motion vectors and the decoded prediction vector information.
As described above, the descriptor position information consists of block-level and pixel-level descriptor presence/absence flags. The position of each descriptor is determined from the descriptor position information decoded by the decoding unit 201, following the same block-level and pixel-level processing order as the image feature descriptor encoding device.

The inverse conversion unit 202 refers to the conversion mode decoded by the decoding unit 201, performs inverse conversion processing on the conversion signal decoded by the decoding unit 201, and calculates a difference descriptor signal identical to the difference descriptor signal of FIG. 6.

The prediction unit 203 performs the same inter prediction processing as the prediction unit 103 of the image feature descriptor encoding device. That is, it outputs to the addition unit 206, as the prediction descriptor signal, the descriptor stored in the descriptor information memory 204 that is indicated by the inter prediction information decoded by the decoding unit 201.

The descriptor information memory 204 manages the reference descriptor information, that is, the descriptor signals of the reference pictures and the corresponding descriptor position information, based on the reference picture management information decoded by the decoding unit 201. Specifically, it sets whether the reference descriptor information of each reference picture may be referred to according to the reference picture management information. It also manages (stores and deletes) the reference pictures held in the descriptor information memory 204 according to the reference picture management information and the maximum number of reference pictures for inter prediction, which is decoded as part of the sequence level header.

The changeover switch 205 switches its output with reference to the prediction mode decoded by the decoding unit 201. Specifically, when the prediction mode indicates the inter prediction mode, the difference descriptor signal is output to the addition unit 206. When the prediction mode indicates the direct encoding mode, the signal is not output to the addition unit 206 but is output as the descriptor signal, which is an output of the image feature descriptor decoding device.

The addition unit 206 adds the difference descriptor signal output from the changeover switch 205 and the prediction descriptor signal output from the prediction unit 203, and outputs the sum as the descriptor signal, which is an output of the image feature descriptor decoding device.

In the example of FIG. 13, the decoding unit 201, the inverse conversion unit 202, the prediction unit 203, the descriptor information memory 204, the changeover switch 205, and the addition unit 206, which are the components of the image feature descriptor decoding device, are each assumed to be implemented in dedicated hardware (the components other than the descriptor information memory 204 being, for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). However, the image feature descriptor decoding device may instead be configured by a computer.
When the image feature descriptor decoding device is configured by a computer, the descriptor information memory 204 is allocated in the memory of the computer, a program describing the processing of the decoding unit 201, the inverse conversion unit 202, the prediction unit 203, the changeover switch 205, and the addition unit 206 is stored in the memory of the computer, and the CPU of the computer executes the program stored in the memory.

FIG. 14 shows another configuration of the image feature descriptor decoding device. Compared with the configuration of FIG. 13, the prediction unit 203 is replaced by a prediction unit 207. The prediction unit 207 performs the same prediction processing as the prediction unit 108 of the image feature descriptor encoding device. It receives the prediction mode decoded by the decoding unit 201 and switches the prediction method according to it; that is, it provides a function equivalent to the combination of the changeover switch 205 and the prediction unit 203. Specifically, when the prediction mode indicates the inter prediction mode, it performs inter prediction with reference to the inter prediction information decoded by the decoding unit 201 and the descriptor information memory 204, and outputs the resulting prediction descriptor signal to the addition unit 206. When the prediction mode indicates the direct encoding mode, it outputs to the addition unit 206 a prediction descriptor signal whose values are all 0.
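
The equivalence of the two decoder configurations can be checked with a few lines of code. The function names and the mode constants are illustrative assumptions; the point is only that adding an all-zero predictor gives the same result as bypassing the adder.

```python
INTER, DIRECT = "inter", "direct"   # assumed labels for the two prediction modes


def reconstruct_with_switch(mode, residual, prediction):
    """FIG. 13 style: the switch bypasses the adder in direct encoding mode."""
    if mode == INTER:
        return [r + p for r, p in zip(residual, prediction)]
    return list(residual)


def reconstruct_with_zero_predictor(mode, residual, prediction):
    """FIG. 14 style: always add, using an all-zero predictor in direct mode."""
    pred = prediction if mode == INTER else [0] * len(residual)
    return [r + p for r, p in zip(residual, pred)]


residual = [1, -2, 3]
prediction = [10, 10, 10]
same = all(
    reconstruct_with_switch(m, residual, prediction)
    == reconstruct_with_zero_predictor(m, residual, prediction)
    for m in (INTER, DIRECT)
)
```

The second form trades a data-path switch for a trivial predictor, which can simplify the structure because every descriptor then flows through the same adder.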

Next, the operation will be described.
FIG. 15 is a flowchart showing the processing (image feature descriptor encoding method) of the image feature descriptor encoding device according to Embodiment 1 of the present invention.
Embodiment 1 describes an image feature descriptor encoding device that takes each frame image of a video as an input image, performs inter prediction using encoded pictures, applies conversion processing to the difference between the descriptor signal and the resulting prediction descriptor signal, and encodes the descriptor data, including the resulting conversion signal, together with header information to generate an encoded bitstream, as well as an image feature descriptor decoding device that decodes the encoded bitstream output from that encoding device.

The image feature descriptor encoding device of FIG. 1 is characterized in that it generates feature points and the feature descriptors at those points from a video signal and performs inter prediction encoding adapted to the temporal changes of the feature points.
In general, a video signal has high temporal correlation, and feature descriptors likewise tend to change little between adjacent pictures and to be highly correlated in time. Inter prediction can therefore achieve highly accurate prediction. On the other hand, there are also cases where temporal correlation is poor and inter prediction is inefficient, for example when feature points change or disappear because of occlusion, or when new feature points appear with a newly appearing object. It is therefore desirable to reduce the power and entropy of the prediction difference signal by switching inter prediction on and off according to such temporal changes of the feature points. This embodiment realizes an image feature descriptor encoding device that can adaptively switch inter prediction on and off in this way.

In Embodiment 1, in order to perform encoding adapted to these general properties of video signals, whether inter prediction processing is applied is adapted for each block obtained by a predetermined block division process.

First, the processing of the image feature descriptor encoding device of FIG. 1 will be described.
A video signal is input to the input image preprocessing unit 1 of the image feature descriptor encoding device (step ST1), and whether preprocessing needs to be applied to the input video signal is determined according to a predefined criterion (step ST2). When preprocessing is judged necessary, it is applied to the video signal (step ST3). As an example of the criterion, when the resolution is equal to or larger than a predefined size, reduction to the defined size is judged necessary as preprocessing, and in that case the reduction to the defined size is performed as the preprocessing. Other examples of preprocessing include color format conversion and filtering such as noise removal.

Next, the feature descriptor generation unit 2 extracts feature points for each picture from the preprocessed video signal (step ST4). If preprocessing was judged unnecessary in ST2, ST4 is performed on the video signal input in ST1. In the feature point extraction process, the extracted feature points are also selected based on control instructions from the control unit 4. For example, when the control unit 4 controls the transmission bit rate (or the data size) of the encoded bitstream output by the descriptor information encoding unit 3, the number of feature points is controlled so that the bit rate of the encoded bitstream obtained by encoding the descriptor information in the descriptor information encoding unit 3 does not exceed the control target bit rate. Specifically, an evaluation function that computes the importance of each feature point is first defined, and indices p (p = 1, 2, ..., P, where P is the number of feature points in the picture) are assigned to the feature points in descending order of importance according to the score given by the evaluation function. The predicted code amount Cp of each feature point is then calculated, and PMax satisfying the following expression is calculated. PMax becomes the number of feature points of the picture.

Here, T is the control target code amount for the picture, calculated from the control target bit rate and the code amount of the already encoded pictures. An example of calculating T is shown below.

Here, R is the control target bit rate, Fr is the frame rate, and M is the picture number (M = 1, 2, ..., Fr), a counter that is reset to 0 every second.
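The formulas for PMax and T are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes that PMax is the largest number of top-ranked feature points whose cumulative predicted code amount Cp does not exceed T, and it uses one plausible form of T (the bits allowed up to picture M of the current second, minus the bits already spent in that second). Both function names and the form of `target_code_amount` are hypothetical.

```python
from typing import List, Sequence


def target_code_amount(R: float, Fr: int, M: int, spent_bits: float) -> float:
    """Hypothetical T: budget up to picture M of the current second minus bits already used."""
    return R / Fr * M - spent_bits


def select_feature_points(scores: Sequence[float],
                          predicted_bits: Sequence[float],
                          T: float) -> List[int]:
    """Return the original indices of the retained feature points, most important first.

    Feature points are ranked by score (this ranking corresponds to p = 1..P),
    and the longest prefix whose cumulative predicted code amount stays
    within T is kept; its length plays the role of PMax.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    total = 0.0
    for i in order:
        if total + predicted_bits[i] > T:
            break
        total += predicted_bits[i]
        kept.append(i)
    return kept
```

With this sketch, `len(select_feature_points(...))` corresponds to PMax for the picture; a real rate controller may use a different budget rule.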

Then, the feature descriptor generation unit 2 likewise generates a feature descriptor for each feature point remaining after the selection process (step ST5). Examples of feature points are as described above. The position information of the feature points and the generated feature descriptors are output to the descriptor information encoding unit 3 as descriptor information.

The descriptor information encoding unit 3 encodes the descriptor information based on a control instruction from the control unit 4 to generate an encoded bit stream (step ST6). The specific encoding operation is described next.

FIG. 16 is a flowchart showing the detailed processing of step ST6. In step ST6, the processing shown in FIG. 16 is performed each time descriptor information for one picture is input.

First, the descriptor information memory 104 sets, under the control of the control unit 4, the reference pictures used for inter prediction of the encoding target picture and updates the reference picture list (step ST11). Next, the encoding unit 107 encodes the header information of the picture data of the encoding target picture and the picture level header referred to by that header information (step ST12). If the picture is at the head of the sequence, the sequence level header is also encoded. If the picture level header referred to by the encoding target picture has already been encoded, the picture level header is not encoded again.

Next, it is determined whether one or more feature descriptors exist in the encoding target picture, that is, whether the descriptor presence flag is “1 (true)” (step ST13). When the descriptor presence flag is “0 (false)”, the processing of FIG. 16, that is, the processing of step ST6 in FIG. 15, ends. When the descriptor presence flag is “1 (true)”, the picture is divided into blocks by one of the methods described above with reference to FIGS. 7 to 11 (step ST14). In the case of FIG. 11, however, only the division into reference blocks is performed at this point. Then, the first reference block to be encoded (encoding target block) is set (step ST15), and encoding is performed for each reference block in the processing order of FIGS. 7 to 11.

First, it is checked whether one or more descriptors exist in the encoding target block (step ST16). When one or more descriptors exist in the encoding target block, “1 (descriptor present)” is encoded as the block-level descriptor presence/absence flag (step ST17). When no descriptor exists in the encoding target block, “0 (no descriptor)” is encoded as the block-level descriptor presence/absence flag, and encoding of the descriptor information of that block ends (step ST18).

When one or more descriptors exist in the encoding target block, the prediction mode and the conversion mode of the encoding target block are determined under the control of the control unit 4 (step ST19). Then, the first encoding target pixel in the encoding target block is set (step ST20), and the descriptor information of the encoding target pixel is encoded (step ST21). The details of encoding the descriptor information of the encoding target pixel are described later. Descriptor information is encoded pixel by pixel in a predetermined processing order such as raster scan (step ST23) until the encoding of the last pixel in the encoding target block is completed (step ST22).
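The block-level flow of steps ST16 to ST23 can be sketched as a symbol list, as follows. This is an illustration under assumptions: the `encode_block` function and the symbol names are hypothetical, the mode decision of step ST19 and the actual descriptor payload of step ST21 are omitted, and only the per-pixel descriptor presence/absence flag is emitted for each pixel.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def encode_block(points: Set[Point], x: int, y: int, size: int) -> List[Tuple[str, int]]:
    """Emit the flags for one square block whose top-left corner is (x, y)."""
    inside = {(px, py) for (px, py) in points
              if x <= px < x + size and y <= py < y + size}
    # ST16 to ST18: block-level descriptor presence/absence flag.
    symbols: List[Tuple[str, int]] = [("block_flag", 1 if inside else 0)]
    if not inside:
        return symbols
    # ST19 (prediction mode and conversion mode decision) is omitted here.
    # ST20 to ST23: visit every pixel of the block in raster-scan order.
    for py in range(y, y + size):
        for px in range(x, x + size):
            symbols.append(("pixel_flag", 1 if (px, py) in inside else 0))
    return symbols
```

An empty block therefore costs a single flag, which is the reason for signalling presence at block level before scanning the pixels.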

After the descriptor information of all pixels in the encoding target block has been encoded, descriptor information is encoded for each reference block in a predetermined processing order (step ST25) until the encoding of the last block in the encoding target picture is completed (step ST24).

In the case of the processing order of FIG. 11, however, steps ST19 to ST23 are performed only when the encoding target block is a minimum block. That is, the reference block is divided hierarchically until the encoding target block becomes a minimum block, and for each block at each level of the hierarchy, a descriptor presence/absence flag indicating whether one or more descriptors exist in the block is encoded.
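The hierarchical flag coding can be illustrated with the sketch below. FIG. 11 is not reproduced here, so the sketch assumes a quadtree split into four equal sub-blocks visited in raster order, and it assumes that an empty block is not split further; the function name `encode_presence_flags` is hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def encode_presence_flags(points: Set[Point], x: int, y: int, size: int,
                          min_size: int, out: List[int]) -> None:
    """Append one descriptor presence/absence flag per visited block to `out`."""
    has_descriptor = any(x <= px < x + size and y <= py < y + size
                         for px, py in points)
    out.append(1 if has_descriptor else 0)
    if not has_descriptor:
        return  # nothing to code below an empty block (assumption)
    if size <= min_size:
        return  # minimum block: steps ST19 to ST23 would run here
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            encode_presence_flags(points, x + dx, y + dy, half, min_size, out)
```

For an 8 × 8 reference block with a single feature point at (5, 6) and a 4 × 4 minimum block, this emits one flag for the reference block followed by four flags for its sub-blocks.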

FIG. 17 is a flowchart showing the detailed processing of step ST21. First, it is determined whether descriptor information exists for the encoding target pixel, that is, whether the encoding target pixel is a feature point output from the feature descriptor generation unit 2 (step ST31). When descriptor information exists for the encoding target pixel (the pixel is a feature point), “1 (descriptor present)” is encoded as the pixel-level descriptor presence/absence flag (step ST32). When no descriptor information exists (the pixel is not a feature point), “0 (no descriptor)” is encoded as the pixel-level descriptor presence/absence flag (step ST33).

When descriptor information exists for the encoding target pixel, the prediction mode of the encoding target block to which the pixel belongs is referred to (step ST34). When that prediction mode is the inter prediction mode, the prediction unit 103 performs inter prediction for the pixel (step ST35). That is, for the encoding target descriptor, the descriptors held by the reference picture are searched for a descriptor to use for prediction, and the descriptor found by the search is output to the subtraction unit 105 as the prediction descriptor signal. The search is performed based on the search range and the evaluation function set by the control unit 4. When the encoding target descriptor is the 128-dimensional vector shown in FIG. 2, examples of the evaluation function include the sum of absolute differences and the sum of squared differences between the two vectors; the descriptor that minimizes the evaluation function is used as the prediction descriptor signal.
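The search of step ST35 can be sketched as an exhaustive comparison against the descriptors of the reference picture. The sketch makes several assumptions that are not stated in the description: the search range is a square window of ±`search_range` pixels around the target position, the motion vector is the displacement from the target position to the matched position, and the reference picture is modelled as a dictionary from pixel position to descriptor. All names are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Point = Tuple[int, int]
Vector = Tuple[int, int]


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two descriptors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def ssd(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of squared differences between two descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def search_prediction(target_pos: Point,
                      target_desc: Sequence[int],
                      reference: Dict[Point, Sequence[int]],
                      search_range: int,
                      cost: Callable[[Sequence[int], Sequence[int]], int] = sad
                      ) -> Optional[Tuple[Vector, Sequence[int]]]:
    """Return (motion vector, prediction descriptor), or None if nothing is in range."""
    best = None
    tx, ty = target_pos
    for (rx, ry), desc in reference.items():
        dx, dy = rx - tx, ry - ty
        if abs(dx) > search_range or abs(dy) > search_range:
            continue  # outside the assumed square search window
        c = cost(target_desc, desc)
        if best is None or c < best[0]:
            best = (c, (dx, dy), desc)
    return None if best is None else (best[1], best[2])
```

The example uses 4-dimensional descriptors for brevity; the same functions apply unchanged to 128-dimensional vectors.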

Next, a difference vector is obtained by subtracting the prediction vector, which is derived from surrounding encoded motion vectors, from the motion vector indicating the position of the determined prediction descriptor signal (step ST36). The prediction vector is described in detail next.

FIGS. 18(a) to 18(d) show examples of prediction vector candidates. In each example, the prediction vector candidates are generated from encoded motion vectors for each encoding target block to which the encoding target pixel belongs, and the encoding target block is 4 × 4 pixels. In FIG. 18(a), the prediction vector candidates are the motion vectors of the following pixels relative to the encoding target block: the pixel adjacent to the lower left (A0), the pixel adjacent to the upper right (B0), the pixel adjacent to the upper left (B2), the pixel immediately above A0 (A1), the pixel immediately to the left of B0 (B1), and the pixel (C) of the immediately preceding picture in encoding order located diagonally to the lower right of the center of the encoding target block. For any of these pixels that has no motion vector, a prediction vector candidate is set according to a predetermined procedure. Examples of the procedure include setting the encoded motion vector whose position is spatially closest to the pixel that has no motion vector, and setting a fixed vector such as the zero vector. Examples of the spatial distance include the Euclidean distance and the Manhattan distance. For C, the position adjacent to the lower right of the encoding target block may be used instead of the position described above.
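The fallback procedure for a candidate pixel without a motion vector can be sketched as follows. The function names and the dictionary model of encoded motion vectors are hypothetical; the sketch combines both procedures named above by using the spatially closest encoded motion vector when one exists and a fixed zero vector otherwise. The squared Euclidean distance is used because it yields the same ordering as the Euclidean distance.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]
Vector = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean_sq(a: Point, b: Point) -> int:
    # Squared distance: same nearest neighbour as the Euclidean distance.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def candidate_vector(pos: Point,
                     encoded_mvs: Dict[Point, Vector],
                     distance: Callable[[Point, Point], int] = euclidean_sq,
                     fixed: Vector = (0, 0)) -> Vector:
    """Prediction vector candidate for the pixel at `pos`."""
    if pos in encoded_mvs:
        return encoded_mvs[pos]          # the pixel has its own motion vector
    if not encoded_mvs:
        return fixed                     # nothing encoded yet: fixed vector
    nearest = min(encoded_mvs, key=lambda q: distance(pos, q))
    return encoded_mvs[nearest]          # spatially closest encoded vector
```

The encoder and the decoder would have to apply exactly the same rule, since the candidate list is not transmitted.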

After the prediction vector candidates are set, the prediction vector is selected based on the evaluation function set by the control unit 4. Where (dx, dy) is the difference vector between the motion vector to be encoded and a prediction vector candidate, examples of the evaluation function include the sum of the absolute values of the components, |dx| + |dy|, and the sum of the squares of the components, dx^2 + dy^2; the prediction vector candidate that minimizes the evaluation function is used as the prediction vector.

Then, an index is assigned to each prediction vector candidate (for example, in the order A0, B0, B2, C, A1, B1), and the index of the candidate selected as the prediction vector is set in the prediction vector information.
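The selection of the prediction vector and the resulting difference vector (step ST36) can be sketched as below. The function names are hypothetical, and the tie-breaking rule (the lowest index wins) is an assumption, since the description does not state one.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[int, int]


def abs_sum(d: Vector) -> int:
    """|dx| + |dy|"""
    return abs(d[0]) + abs(d[1])


def square_sum(d: Vector) -> int:
    """dx^2 + dy^2"""
    return d[0] ** 2 + d[1] ** 2


def select_prediction_vector(mv: Vector,
                             candidates: Sequence[Vector],
                             cost: Callable[[Vector], int] = abs_sum
                             ) -> Tuple[int, Vector]:
    """Return (candidate index, difference vector = motion vector - prediction vector)."""
    def diff(i: int) -> Vector:
        return mv[0] - candidates[i][0], mv[1] - candidates[i][1]

    # min() returns the first minimum, so ties go to the lowest index.
    index = min(range(len(candidates)), key=lambda i: cost(diff(i)))
    return index, diff(index)
```

Only the index and the difference vector need to be encoded; the candidate list itself is rebuilt on the decoding side.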

FIGS. 18(b) to 18(d) show configurations in which some of the prediction vector candidates of FIG. 18(a) are removed. The fewer the prediction vector candidates, the lower the motion vector prediction accuracy, but because the maximum index value becomes smaller, the code amount of the encoded data for the prediction vector information can be reduced.
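The trade-off can be made concrete with a small sketch. The description does not specify how the index is binarized, so the fixed-length code assumed here is purely illustrative, and `index_bits` is a hypothetical helper.

```python
def index_bits(num_candidates: int) -> int:
    """Bits needed for a fixed-length index over `num_candidates` candidates (assumed coding)."""
    if num_candidates < 1:
        raise ValueError("at least one candidate is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (num_candidates - 1).bit_length()
```

Under this assumption, the six candidates of FIG. 18(a) would need 3 bits per feature point, whereas a list of two candidates would need 1 bit and a single candidate would need none.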

As an example of prediction vector candidates other than those of FIG. 18, the M most recently encoded motion vectors in encoding order (M: an integer of 1 or more) may be used as the prediction vector candidates. Compared with the examples of FIG. 18, this method does not exploit strict spatial correlation, so the prediction efficiency is lower, but the candidates are simple to set and the processing load is low. The number M of prediction vector candidates may also be encoded in the sequence level header or the picture level header. This allows the number of candidates to be set according to temporal changes in the descriptor information, enabling encoding control that balances the code amount of the prediction vector information against the prediction efficiency of the motion vectors.
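A candidate list built from the M most recently encoded motion vectors can be sketched with a bounded queue. The class name `RecentMotionVectors` is hypothetical, and the ordering of the candidates (most recent first) is an assumption.

```python
from collections import deque
from typing import Deque, List, Tuple

Vector = Tuple[int, int]


class RecentMotionVectors:
    """Keeps the M most recently encoded motion vectors as prediction vector candidates."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("M must be an integer of 1 or more")
        # A deque with maxlen discards the oldest entry automatically.
        self._history: Deque[Vector] = deque(maxlen=m)

    def push(self, mv: Vector) -> None:
        """Record a motion vector once it has been encoded (or decoded)."""
        self._history.append(mv)

    def candidates(self) -> List[Vector]:
        """Candidates ordered from the most recent to the oldest (assumed order)."""
        return list(reversed(self._history))
```

Because no neighbouring positions have to be examined, updating the list costs a single append per feature point.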

After step ST36, the inter prediction information for the encoding target pixel is output to the encoding unit 107, which encodes it (step ST37). The inter prediction information consists of reference picture information identifying the reference picture in which the prediction descriptor exists, the prediction vector information, and the difference vector.

Then, the prediction descriptor signal generated by the prediction unit 103 is subtracted from the descriptor signal of the encoding target pixel to generate a difference descriptor signal (step ST38), which is output to the conversion unit 106. The conversion unit 106 converts the input difference descriptor signal (step ST39).

On the other hand, when the prediction mode referred to in step ST34 is the direct encoding mode, the descriptor signal of the encoding target pixel is output to the conversion unit 106 as it is, and the conversion unit 106 converts the input descriptor signal (step ST40).

The conversion unit 106 outputs the conversion signal generated in step ST39 or step ST40 to the encoding unit 107. The encoding unit 107 encodes the input conversion signal (step ST41).

In the above description, the prediction mode and the conversion mode are set for each block, but they may instead be switched for each pixel (feature point). In that case, step ST19 is not performed; after step ST32, the prediction mode and the conversion mode of the encoding target pixel are determined and encoded. This increases the code amount of the encoded data for the prediction mode and the conversion mode, but enables more accurate prediction and conversion, which reduces the code amount of the encoded data for the conversion signal.

Likewise, in the above description the prediction vector information is calculated and encoded for each pixel (feature point), but it may instead be calculated and encoded for each encoding target block. In that case, the prediction vector information is determined and encoded in step ST19. This lowers the prediction accuracy of the motion vector of each feature point but reduces the code amount of the encoded data for the prediction vector information.

Next, the operation of the image feature descriptor decoding apparatus is described. FIG. 19 is a flowchart showing the processing (image feature descriptor decoding method) of the image feature descriptor decoding apparatus according to Embodiment 1 of the present invention. The image feature descriptor decoding apparatus decodes the descriptor information by performing the processing shown in FIG. 19 for each picture.

First, the decoding unit 201 decodes the header information of the picture data of the decoding target picture and the picture level header referred to by that header information (step ST51). If the picture is at the head of the sequence, the sequence level header is also decoded. If the picture level header referred to by the decoding target picture has already been decoded, the picture level header is not decoded again.

Next, a reference picture list is generated based on the decoded header information (step ST52), and the reference picture list is passed to the descriptor information memory 204 as reference picture management information. Then, the descriptor presence flag, which is part of the decoded header information, is referred to (step ST53). When the descriptor presence flag is “1 (descriptor present)”, the picture is divided into blocks by one of the methods of FIGS. 7 to 11 described for the descriptor information encoding unit 3 (step ST54). In the case of FIG. 11, however, only the division into reference blocks is performed at this point. Then, the first reference block to be decoded (decoding target block) is set (step ST55), and decoding is performed for each reference block in the processing order of FIGS. 7 to 11. When the descriptor presence flag is “0 (no descriptor)” in step ST53, decoding of the descriptor information of the picture ends, and processing proceeds to the next encoded picture.

As the decoding process for each decoding target block, the decoding unit 201 decodes the descriptor data of the decoding target block (step ST56). The block-level descriptor data contains the block-level descriptor presence/absence flag and, when that flag is “1 (descriptor present)”, also the prediction mode and the conversion mode. When the descriptor presence/absence flag of the decoding target block is “1 (descriptor present)” (step ST57), the first pixel to be processed in the decoding target block is set as the decoding target pixel (step ST58), and the descriptor information of the decoding target pixel is decoded (step ST59). The details of decoding the descriptor information of the decoding target pixel are described later. Descriptor information is decoded pixel by pixel, in the same predetermined processing order as in the image feature descriptor encoding apparatus, such as raster scan (step ST61), until the decoding of the last pixel in the decoding target block is completed (step ST60).

After the descriptor information of all pixels in the decoding target block has been decoded, descriptor information is decoded for each reference block, in the same predetermined processing order as in the image feature descriptor encoding apparatus (step ST63), until the decoding of the last block in the decoding target picture is completed (step ST62). When the descriptor presence/absence flag of the decoding target block is “0 (no descriptor)” in step ST57, decoding of the descriptor information of that block ends, and processing proceeds to the next block.

In the case of the processing order of FIG. 11, however, steps ST58 to ST61 are performed only when the decoding target block is a minimum block. That is, the reference block is divided hierarchically until the decoding target block becomes a minimum block or no descriptor exists in the block, and for each block at each level of the hierarchy, the descriptor presence/absence flag indicating whether one or more descriptors exist in the block is decoded (step ST56).
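The hierarchical parsing on the decoding side can be sketched as follows. As FIG. 11 is not reproduced here, the sketch assumes a quadtree split into four equal sub-blocks visited in raster order, with one flag consumed per visited block; `decode_presence_flags` is a hypothetical name, and the flags are modelled as an already entropy-decoded sequence of 0 and 1 values.

```python
from typing import Iterator, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size)


def decode_presence_flags(flags: Iterator[int], x: int, y: int, size: int,
                          min_size: int, leaves: List[Block]) -> None:
    """Collect in `leaves` the minimum blocks that contain descriptors."""
    flag = next(flags)  # ST56: descriptor presence/absence flag of this block
    if flag == 0:
        return  # no descriptor in this block: stop splitting
    if size <= min_size:
        leaves.append((x, y, size))  # steps ST58 to ST61 would run on this block
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            decode_presence_flags(flags, x + dx, y + dy, half, min_size, leaves)
```

The traversal order has to mirror the encoder exactly, because the flags carry no position information of their own.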

FIG. 20 is a flowchart showing the detailed processing of step ST59. First, the descriptor data of the decoding target pixel is decoded (step ST71). The pixel-level descriptor data contains the pixel-level descriptor presence/absence flag and, when that flag is “1 (descriptor present)”, also the conversion signal and the inter prediction information. When the descriptor presence/absence flag of the decoding target pixel is “1 (descriptor present)” (step ST72), the inverse conversion unit 202 inversely converts the conversion signal based on the conversion mode of the decoding target block to which the decoding target pixel belongs, thereby obtaining the difference descriptor signal (step ST73).

When the decoded prediction mode of the decoding target pixel is the inter prediction mode (step ST74), the prediction unit 203 performs inter prediction using the decoded inter prediction information (step ST75). Specifically, prediction vector candidates are generated by the same method as in the image feature descriptor encoding apparatus of Embodiment 1 described above, and the prediction vector is identified from the index of the prediction vector candidate, which is the prediction vector information included in the inter prediction information. The difference vector, also included in the inter prediction information, is added to the prediction vector to generate the motion vector. Then, on the reference picture indicated by the reference picture information, also included in the inter prediction information, the feature descriptor at the position (pixel) pointed to by the motion vector is used as the prediction descriptor signal.

The addition unit 206 outputs, as the descriptor signal, the sum of the difference descriptor signal output from the inverse conversion unit 202 and the prediction descriptor signal output from the prediction unit 203 (step ST76).

On the other hand, when the decoded prediction mode of the decoding target pixel is the direct encoding mode in step ST74, the signal output from the inverse conversion unit 202 is output as the descriptor signal as it is.
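The reconstruction of steps ST74 to ST76 can be sketched as follows. The function names are hypothetical, the input `signal` is assumed to be the output of the inverse conversion, the reference picture is modelled as a dictionary from pixel position to descriptor, and the motion vector is assumed to be a displacement from the decoding target pixel to the position on the reference picture.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]
Vector = Tuple[int, int]


def rebuild_motion_vector(candidates: Sequence[Vector], index: int, diff: Vector) -> Vector:
    """Motion vector = prediction vector selected by the index + difference vector."""
    pvx, pvy = candidates[index]
    return pvx + diff[0], pvy + diff[1]


def rebuild_descriptor(pos: Point,
                       signal: Sequence[int],
                       inter: bool,
                       reference: Optional[Dict[Point, Sequence[int]]] = None,
                       motion_vector: Optional[Vector] = None) -> List[int]:
    """Return the descriptor signal of the decoding target pixel at `pos`."""
    if not inter:
        return list(signal)  # direct encoding mode: output as it is
    if reference is None or motion_vector is None:
        raise ValueError("inter prediction needs a reference picture and a motion vector")
    ref_pos = (pos[0] + motion_vector[0], pos[1] + motion_vector[1])
    prediction = reference[ref_pos]  # prediction descriptor signal
    # ST76: difference descriptor signal + prediction descriptor signal.
    return [d + p for d, p in zip(signal, prediction)]
```

With lossless conversion, adding the prediction back to the difference reproduces the descriptor that the encoder started from.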

When the descriptor presence/absence flag of the decoding target pixel is “0 (no descriptor)” in step ST72, decoding of the descriptor information of that pixel, that is, the processing of step ST59, ends.

As is apparent from the above, according to Embodiment 1, the prediction unit 103 of the image feature descriptor encoding apparatus performs motion-compensated prediction on each feature descriptor using correlation in the time direction, and also predicts the motion vector used for motion-compensated prediction from surrounding vectors to generate a difference vector. The subtraction unit 105 generates the difference descriptor signal using the prediction descriptor signal generated by the prediction unit 103, the conversion unit 106 converts the difference descriptor signal into a conversion signal, and the encoding unit 107 encodes the conversion signal and the difference vector. With this configuration, the motion vector information used for motion compensation is encoded efficiently, and the descriptor signal is predicted accurately so that its difference information is also encoded efficiently. As a result, highly efficient encoding of the descriptor information including the descriptor signal is achieved.

Further, according to Embodiment 1, an image feature descriptor decoding apparatus and an image feature descriptor decoding method are obtained that can correctly decode the encoded bit stream generated by the image feature descriptor encoding apparatus and the image feature descriptor encoding method having the above effects.

Embodiment 2.
In Embodiment 1, the prediction vector candidates used for predictive encoding of the motion vectors generated by inter prediction are generated for each encoding target block. In Embodiment 2, the prediction vector candidates are set for each pixel. FIG. 21 shows examples of candidate positions. The motion vectors of pixels adjacent to the encoding target pixel, as shown in FIG. 21, are used as the prediction vector candidates. For pixel positions that have not yet been encoded, a fixed vector is set.

For any of the prediction vector candidate pixels that has no motion vector, a prediction vector candidate is set according to a predetermined procedure. Examples of the procedure include setting the encoded motion vector whose position is spatially closest to the pixel that has no motion vector, and setting a fixed vector such as the zero vector. Examples of the spatial distance include the Euclidean distance and the Manhattan distance.

Making the prediction vector candidates updatable for each pixel in this way improves the accuracy of the prediction vector, so high encoding efficiency can be expected.

FIGS. 21(a) to 21(f) differ in the number of prediction vector candidates. The fewer the prediction vector candidates, the lower the motion vector prediction accuracy, but because the maximum index value becomes smaller, the code amount of the encoded data for the prediction vector information can be reduced.

Furthermore, when a prediction vector candidate refers to a position outside the encoding (decoding) target block, the encoded motion vector may be left unused, for example by setting a fixed vector instead. In this case, each encoding (decoding) target block can be encoded (decoded) independently. This enables parallel processing, so faster processing can be expected.
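The block-independent variant can be sketched as below. FIG. 21 is not reproduced here, so the neighbour offsets (left, above, upper left, upper right) are an assumption, as are the function name `pixel_candidates` and the use of the zero vector both for positions outside the block and for neighbours without a motion vector.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]
Vector = Tuple[int, int]
Block = Tuple[int, int, int]  # (x, y, size)

# Hypothetical neighbour layout: left, above, upper left, upper right.
DEFAULT_OFFSETS: Tuple[Point, ...] = ((-1, 0), (0, -1), (-1, -1), (1, -1))


def pixel_candidates(pos: Point,
                     block: Block,
                     encoded_mvs: Dict[Point, Vector],
                     offsets: Sequence[Point] = DEFAULT_OFFSETS,
                     fixed: Vector = (0, 0)) -> List[Vector]:
    """Per-pixel prediction vector candidates that never look outside `block`."""
    bx, by, size = block
    result: List[Vector] = []
    for dx, dy in offsets:
        q = (pos[0] + dx, pos[1] + dy)
        inside = bx <= q[0] < bx + size and by <= q[1] < by + size
        # Outside the block, or no motion vector at q: use the fixed vector.
        result.append(encoded_mvs.get(q, fixed) if inside else fixed)
    return result
```

Since no candidate depends on data from another block, blocks processed with this rule could be handled by separate workers without synchronization.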

Except for the matters described above, Embodiment 2 has the same configuration and performs the same processing as Embodiment 1.

Within the scope of the invention, the embodiments may be freely combined, and any component of each embodiment may be modified or omitted.

The image feature descriptor encoding apparatus, the image feature descriptor decoding apparatus, the image feature descriptor encoding method, and the image feature descriptor decoding method according to the present invention are applicable to image transmission systems and the like that achieve high encoding efficiency.

DESCRIPTION OF SYMBOLS: 1 input image preprocessing unit, 2 feature descriptor generation unit, 3 descriptor information encoding unit, 4 control unit, 101 division unit, 102 changeover switch, 103 prediction unit, 104 descriptor information memory, 105 subtraction unit, 106 conversion unit, 107 encoding unit, 108 prediction unit, 201 decoding unit, 202 inverse conversion unit, 203 prediction unit, 204 descriptor information memory, 205 changeover switch, 206 addition unit, 207 prediction unit

Claims

1. An image feature descriptor encoding device comprising:
block dividing means for dividing an input image into blocks;
prediction means for, in a block whose descriptor presence flag, which indicates that the block contains one or more descriptors, is true, and when the prediction mode of the block is inter prediction, setting, for each feature point in the block, the encoded descriptor signal on the reference picture indicated by a motion vector as a prediction descriptor signal;
conversion means for performing a conversion process, according to the conversion mode of the block, on the difference value between the descriptor signal and the prediction descriptor signal to generate a conversion signal; and
encoding means for encoding the descriptor presence flag, the prediction mode, the conversion mode, and the conversion signal of each feature point, and generating a bitstream in which the encoded data of the descriptor presence flag, the prediction mode, the conversion mode, and the conversion signal of each feature point are multiplexed,
wherein
the prediction means selects, for each feature point, a prediction vector from one or more encoded motion vectors around the block, which serve as prediction vector candidates, and generates a difference vector that is the difference between the motion vector and the prediction vector, and
the encoding means encodes the prediction vector selection information and the difference vector for each feature point, and multiplexes the encoded data of the prediction vector selection information and the difference vector into the bitstream.

2. The image feature descriptor encoding device according to claim 1, wherein the prediction means sets, as the prediction vector candidate, the encoded motion vector of a specific pixel among the pixels adjacent to the block.

3. The image feature descriptor encoding device according to claim 2, wherein, when no encoded motion vector exists at the specific pixel among the pixels adjacent to the block, the prediction means sets, as the prediction vector candidate, the encoded motion vector that is spatially closest to the specific pixel.

4. An image feature descriptor decoding device comprising:
decoding means for decoding, from encoded data multiplexed in a bitstream, the descriptor presence flag, the prediction mode, the conversion mode, and the conversion signal of each feature point for each divided block, and, when the prediction mode is inter prediction, decoding the prediction vector selection information and the difference vector of each feature point;
difference signal generating means for generating a difference descriptor signal from the conversion signal of the block decoded by the decoding means;
prediction means for, when the prediction mode is inter prediction, setting, for each feature point in the block, the encoded descriptor signal on the reference picture indicated by the motion vector calculated from the prediction vector selection information and the difference vector as a prediction descriptor signal; and
image feature descriptor generating means for generating a descriptor signal by adding the difference descriptor signal generated by the difference signal generating means and the prediction descriptor signal generated by the prediction means,
wherein
the prediction means selects, for each feature point, a prediction vector from one or more encoded motion vectors around the block, which serve as prediction vector candidates, based on the decoded prediction vector selection information, generates the motion vector by adding the prediction vector and the decoded difference vector, and sets the encoded descriptor signal on the reference picture indicated by the motion vector as the prediction descriptor signal.

5. An image feature descriptor encoding method comprising:
a block division step of dividing an input image into blocks;
a prediction step of, in a block whose descriptor presence flag, which indicates that the block contains one or more descriptors, is true, and when the prediction mode of the block is inter prediction, setting, for each feature point in the block, the encoded descriptor signal on the reference picture indicated by a motion vector as a prediction descriptor signal;
a conversion step of performing a conversion process, according to the conversion mode of the block, on the difference value between the descriptor signal and the prediction descriptor signal to generate a conversion signal; and
an encoding step of encoding the descriptor presence flag, the prediction mode, the conversion mode, and the conversion signal of each feature point, and generating a bitstream in which the encoded data of the descriptor presence flag, the prediction mode, the conversion mode, and the conversion signal of each feature point are multiplexed,
wherein
the prediction step selects, for each feature point, a prediction vector from one or more encoded motion vectors around the block, which serve as prediction vector candidates, and generates a difference vector that is the difference between the motion vector and the prediction vector, and
the encoding step encodes the prediction vector selection information and the difference vector for each feature point, and multiplexes the encoded data of the prediction vector selection information and the difference vector into the bitstream.

6. An image feature descriptor decoding method comprising:
a decoding step of decoding, from encoded data multiplexed in a bitstream, the descriptor presence flag, the prediction mode, the conversion mode, and the conversion signal of each feature point for each divided block, and, when the prediction mode is inter prediction, decoding the prediction vector selection information and the difference vector of each feature point;
a difference signal generation step of generating a difference descriptor signal from the conversion signal of the block decoded in the decoding step;
a prediction step of, when the prediction mode is inter prediction, setting, for each feature point in the block, the encoded descriptor signal on the reference picture indicated by the motion vector calculated from the prediction vector selection information and the difference vector as a prediction descriptor signal; and
an image feature descriptor generation step of generating a descriptor signal by adding the difference descriptor signal generated in the difference signal generation step and the prediction descriptor signal generated in the prediction step,
wherein
the prediction step selects, for each feature point, a prediction vector from one or more encoded motion vectors around the block, which serve as prediction vector candidates, based on the decoded prediction vector selection information, generates the motion vector by adding the prediction vector and the decoded difference vector, and sets the encoded descriptor signal on the reference picture indicated by the motion vector as the prediction descriptor signal.