JP2009212605A

JP2009212605A - Information processing method, information processor, and program

Info

Publication number: JP2009212605A
Application number: JP2008051154A
Authority: JP
Inventors: Noboru Murabayashi; 昇村林
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-02-29
Filing date: 2008-02-29
Publication date: 2009-09-17

Abstract

PROBLEM TO BE SOLVED: To classify the pixel data in image data with high accuracy and in a short time.

SOLUTION: As initial clustering processing, a recording and reproducing device decides, every time pixel data is input, whether the feature value of the input pixel data is within a prescribed threshold of the feature value of already input pixel data. When it is within the threshold, the device gives the pixel data the same ID as the already input pixel data, and thereby classifies the pixel data into a first number of clusters. After the initial clustering processing, as main clustering processing, the device classifies the pixel data classified into the first number of clusters into a second number (target number) of clusters by K-means or the like. Since the initial clustering processing reduces the number of clusters of pixel data that the main clustering processing has to handle, the main clustering processing is accelerated.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an information processing method, an information processing apparatus, and a program for classifying pixel data constituting image data into a plurality of clusters.

Processing that classifies the pixel data constituting image data based on predetermined feature data, such as color features and texture features (clustering processing), is known. Clustering is a typical technique for performing image segmentation, which extracts a plurality of regions from image data. Image segmentation is used as preprocessing in various fields of image processing, such as object coding and image retrieval.

Typical clustering methods include K-means and fuzzy c-means. These methods have the problem that it takes an enormous amount of time for the data to converge and the clustering processing to finish.

As one technique for solving this problem, Patent Document 1 below describes an image region dividing method. In that method, a first thinned image Ia, obtained by thinning the input image Ii to be divided at a first thinning rate Ra, is divided into regions by the K-means algorithm to estimate the approximate number and centers of the regions of the input image Ii. Using that estimate, a second thinned image Ib, obtained by thinning the input image Ii at a second thinning rate Rb smaller than the first thinning rate Ra, is divided into regions by the K-means algorithm to calculate the number and centers of the regions of the input image Ii. Finally, every pixel included in the input image Ii is allocated to the most appropriate of the calculated regions.
Japanese Patent Laying-Open No. 2005-011375 (paragraph [0012] etc.)

However, the technique described in Patent Document 1 uses the K-means algorithm to divide both the first thinned image and the second thinned image into regions, so performing region division twice takes time even though the images are thinned. Raising the thinning rate shortens the region division time but lowers its accuracy. Moreover, the divided regions generated from the thinned images determine the regions to which all pixels of the input image are allocated, so any loss of accuracy in the region division of the thinned images also affects the accuracy of subsequent image processing that uses the divided regions.

In view of the circumstances described above, an object of the present invention is to provide an information processing method, an information processing apparatus, and a program capable of classifying the pixel data in image data with high accuracy and in a short time.

To solve the above problem, an information processing method according to a main aspect of the present invention sequentially inputs a plurality of pixel data constituting image data; determines, each time pixel data is input, whether a predetermined feature value of the input pixel data is within a predetermined range of the feature value of already input pixel data; assigns, to pixel data whose feature value is determined to be within the predetermined range, the same first identification information as the already input pixel data; assigns, to pixel data whose feature value is determined not to be within the predetermined range, second identification information different from the first identification information; and thereby classifies the plurality of pixel data into a first number of clusters.
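The sequential ID assignment described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `initial_clustering`, the use of Euclidean distance as the range test, and the choice to compare each new pixel against the first member of every existing cluster are assumptions made for the example.

```python
from math import dist


def initial_clustering(features, threshold):
    """Assign an ID to each feature vector in input order.

    A pixel receives the ID of the first existing cluster whose
    representative lies within `threshold`; otherwise it opens a new
    cluster. Using the first member of a cluster as its representative
    is an assumption of this sketch.
    """
    representatives = []  # representatives[i] is the reference vector of ID i
    ids = []
    for feature in features:
        for cluster_id, rep in enumerate(representatives):
            if dist(feature, rep) <= threshold:
                ids.append(cluster_id)  # same ID as already input pixel data
                break
        else:
            # No existing cluster is close enough: issue a new ID.
            representatives.append(feature)
            ids.append(len(representatives) - 1)
    return ids, representatives


# Hypothetical RGB feature values for five pixels.
pixels = [(10, 10, 10), (12, 11, 10), (200, 200, 200), (11, 9, 10), (198, 201, 199)]
ids, reps = initial_clustering(pixels, threshold=5.0)
```

Each pixel is visited once and compared only with the current cluster representatives, so no iterative convergence is involved at this stage.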

Here, the predetermined feature value is, for example, the color feature or texture feature of each pixel expressed as multidimensional feature vector data.

According to this configuration, identification information is assigned to pixel data one by one depending on whether the feature value of the input pixel data is within the predetermined range, so the pixel data can be classified with high accuracy and in a shorter time than with a conventional clustering technique such as K-means.

The information processing apparatus may further regard the pixel data classified into the same cluster by the above classification as pixel data having the same feature value, and classify the classified pixel data, by a predetermined clustering technique, into a second number of clusters smaller than the first number.

Here, the predetermined clustering technique is, for example, K-means, fuzzy c-means, the entropy method, Ward's method, or self-organizing maps (SOM), but is not limited to these. The "same feature value" is, for example, the mean of the feature values of the pixel data belonging to the same cluster.

When there are a plurality of already input pixel data, the first identification information consists of a plurality of different pieces of identification information. The second identification information is assigned to pixel data when its feature value is not within the predetermined range of the feature value of any of the pixel data having those different pieces of identification information. In this case, the first number is three or more. In other words, the above configuration does not mean that there are only two clusters; it means that the pixel data is classified into at least two clusters.

According to this configuration, the input pixel data is first classified into the first number of clusters depending on whether the feature value is within the predetermined range, and pixel data classified into the same cluster is regarded as having the same feature value, which reduces the number of classes (kinds) of pixel data. Each pixel data is then classified into the second number of clusters. Because clustering is performed after the number of classes of pixel data has first been reduced, the number of operations and the time required for the clustering to converge are greatly reduced compared with conventional techniques that cluster a large amount of data directly. That is, the processing load of the clustering can be kept as light as possible.
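The second stage can be sketched in the same spirit. The example below takes the IDs from an initial classification, replaces each initial cluster by the mean of its members, and runs a plain K-means over those means only. Weighting each mean by its pixel count is one way to express "pixels in the same cluster share one feature value"; that weighting, the deterministic seeding with the first `k` means, and all function names are assumptions of this sketch, not details given in the source.

```python
from math import dist


def cluster_means(features, ids):
    """Return (sorted IDs, mean feature per ID, pixel count per ID)."""
    sums, counts = {}, {}
    for feature, cid in zip(features, ids):
        acc = sums.setdefault(cid, [0.0] * len(feature))
        for d, value in enumerate(feature):
            acc[d] += value
        counts[cid] = counts.get(cid, 0) + 1
    keys = sorted(sums)
    means = [tuple(v / counts[k] for v in sums[k]) for k in keys]
    return keys, means, [counts[k] for k in keys]


def kmeans(points, weights, k, iterations=50):
    """Weighted K-means; seeding with the first k points is an assumption."""
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            if members:
                total = sum(w for p, w in members)
                centers[c] = tuple(
                    sum(p[d] * w for p, w in members) / total
                    for d in range(len(centers[c]))
                )
    return labels


def main_clustering(features, ids, k):
    """Map every pixel to one of k final clusters via its initial cluster."""
    keys, means, counts = cluster_means(features, ids)
    final_of = dict(zip(keys, kmeans(means, counts, k)))
    return [final_of[cid] for cid in ids]
```

K-means here iterates over one point per initial cluster instead of one point per pixel, which is where the reduction in work comes from.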

In addition, because all the pixel data constituting the image data is used for the classification into the first number of clusters, the classification can be performed with higher accuracy, while still shortening the processing time, than when an image thinned at a predetermined thinning rate is used.

The information processing method may further extract, from the plurality of pixel data classified into the first number of clusters, a plurality of consecutive first pixel data lying along a predetermined direction, a plurality of consecutive second pixel data lying along the predetermined direction and different from the first pixel data, and at least one third pixel data lying between the first pixel data and the second pixel data along the predetermined direction. When the identification information of the first pixel data and that of the second pixel data are the same and the identification information of the third pixel data differs from them, the identification information of the third pixel data may be replaced with the identification information of the first and second pixel data.

Thus, when the identification information of the third pixel data lying between the first pixel data and the second pixel data differs from that of the first and second pixel data, the third pixel data is regarded as noise, and replacing its identification information removes the noise. This allows the classification processing to be performed more efficiently. Here, the predetermined direction is, for example, the X direction or the Y direction.
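The replacement rule above can be illustrated for a single row of IDs scanned in the X direction. In this sketch the parameters `min_run` (how many consecutive pixels count as "a plurality") and `max_gap` (how long the sandwiched run may be) are hypothetical; the source does not give concrete values. A gap made of several different IDs is not merged by this simple run-based version.

```python
def fill_isolated_ids(row, min_run=2, max_gap=1):
    """Replace a short run of IDs sandwiched between two runs of one ID."""
    # Run-length encode the row as [id, length] pairs.
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])

    out = []
    for i, (value, length) in enumerate(runs):
        sandwiched = (
            0 < i < len(runs) - 1
            and length <= max_gap
            and runs[i - 1][0] == runs[i + 1][0]  # first and second IDs match
            and runs[i - 1][1] >= min_run         # "plurality" of first pixels
            and runs[i + 1][1] >= min_run         # "plurality" of second pixels
        )
        # A sandwiched run is treated as noise and takes its neighbours' ID.
        out.extend([runs[i - 1][0] if sandwiched else value] * length)
    return out


print(fill_isolated_ids([0, 0, 0, 1, 0, 0, 2, 2]))  # [0, 0, 0, 0, 0, 0, 2, 2]
```

Applying the same function to the columns of the ID map would cover the Y direction.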

The information processing method may further remove high-frequency components from the input pixel data with a low-pass filter.

This also removes noise that the replacement processing could not remove, so the classification processing can be performed even more efficiently.

The information processing method may further divide the image data into the second number of arbitrarily shaped regions based on the plurality of pixel data classified into the second number of clusters.

This makes it possible to perform so-called image segmentation using the pixel data classified into the second number of clusters.

The information processing method may further detect, for each of the second number of divided regions, a motion vector between a plurality of the image data, and detect, based on the detected motion vectors, a predetermined video feature caused by camera operation in video data composed of the plurality of image data.

By detecting a motion vector for each of the second number of regions divided based on the second number of clusters, the video feature can be detected efficiently. Here, the predetermined video feature caused by camera operation is a motion feature such as pan, tilt, or zoom.

In the information processing method, the step of detecting the video feature may calculate the number of pixels of each of the second number of regions of the plurality of image data, and detect the predetermined video feature in the video data based on the motion vector detected between the regions having the largest number of pixels in the plurality of image data.

The motion vector is detected between the regions with the largest number of pixels (largest area) for the following reason. When image data is divided into the region of a moving object and the region of the background, the background generally has a larger area than the moving object, and the motion of the background is considered to represent the camera operation in the video data. Thus, even if the video data contains a moving object, the motion of that object can be ignored, and the motion vector and the video feature can be detected by focusing only on the camera motion.
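The idea of following only the largest region can be sketched as below. The source does not specify how the motion vector itself is computed, so this example substitutes the displacement of the region's centroid between two frames as a stand-in; that substitution, the 2-D list representation of the label maps, and the function names are all assumptions for illustration.

```python
from collections import Counter


def largest_region(label_map):
    """Return the region label that covers the most pixels."""
    counts = Counter(label for row in label_map for label in row)
    return counts.most_common(1)[0][0]


def centroid(label_map, label):
    """Mean (x, y) position of all pixels carrying `label`."""
    sum_x = sum_y = n = 0
    for y, row in enumerate(label_map):
        for x, value in enumerate(row):
            if value == label:
                sum_x += x
                sum_y += y
                n += 1
    return sum_x / n, sum_y / n


def dominant_motion(prev_labels, next_labels):
    """Displacement of the largest region between two segmented frames.

    Centroid displacement is only a stand-in for a real motion vector
    search (an assumption of this sketch).
    """
    x0, y0 = centroid(prev_labels, largest_region(prev_labels))
    x1, y1 = centroid(next_labels, largest_region(next_labels))
    return x1 - x0, y1 - y0


frame_a = [[1, 1, 1, 0],
           [1, 1, 1, 0]]
frame_b = [[0, 1, 1, 1],
           [0, 1, 1, 1]]
print(dominant_motion(frame_a, frame_b))  # (1.0, 0.0)
```

A horizontal displacement like the one above would be consistent with a pan, but mapping displacements to pan, tilt, or zoom is left open here because the source gives no rule for it at this point.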

The information processing method may further encode the pixel data for each of the second number of divided regions.

This allows object coding to be performed more efficiently.

The information processing method may further generate a feature vector for each of the second number of divided regions, and compare the generated feature vectors, cluster by cluster for the second number of clusters, between a plurality of the image data to judge the similarity between the plurality of image data.

This makes it possible to efficiently search for other image data similar to given image data.

In the information processing method, the step of judging the similarity may calculate the number of pixels of each of the second number of regions of the plurality of image data, and compare the feature vectors between the regions having the largest number of pixels across the plurality of image data to judge the similarity between the plurality of image data.

The feature vectors are compared between the regions with the largest number of pixels (largest area) because the region with the largest area is considered to have the greatest influence on the similarity between image data. This allows the similarity between image data to be judged more efficiently and quickly.

In the information processing method, the step of classifying into the first number of clusters may repeat the classification while varying the predetermined range until the first number reaches a predetermined number.

By varying the predetermined range according to the classification result and repeating the classification, the number of clusters can be reduced to an optimum number.
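A minimal sketch of this repetition is shown below. It widens the range by a fixed `step` after each pass until the number of clusters is at most `target`. Reading "reaches a predetermined number" as "falls to or below the target", the additive update, the `max_rounds` safety limit, and the compact `assign_ids` helper (Euclidean distance to the first member of each cluster) are all assumptions of the example.

```python
from math import dist


def assign_ids(features, threshold):
    """One pass of sequential ID assignment (first member as reference)."""
    representatives, ids = [], []
    for feature in features:
        for cluster_id, rep in enumerate(representatives):
            if dist(feature, rep) <= threshold:
                ids.append(cluster_id)
                break
        else:
            representatives.append(feature)
            ids.append(len(representatives) - 1)
    return ids


def classify_until(features, threshold, target, step, max_rounds=100):
    """Repeat the classification with a wider range until few enough clusters remain."""
    ids = []
    for _ in range(max_rounds):
        ids = assign_ids(features, threshold)
        if len(set(ids)) <= target:
            break  # the first number has reached the predetermined number
        threshold += step  # widen the range and classify again
    return ids, threshold


values = [(0,), (3,), (6,), (9,)]
print(classify_until(values, threshold=1.0, target=2, step=2.0))
# ([0, 0, 1, 1], 3.0)
```

If the limit `max_rounds` is hit first, the function simply returns the last classification, so a caller would need to check the cluster count itself.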

In the information processing method, the step of classifying into the first number of clusters may perform control so that the classification into the second number of clusters is not executed when the first number has reached the predetermined number.

Thus, when the first classification has already reduced the first number to a sufficiently small number, no further classification is performed, which shortens the processing time further.

An information processing apparatus according to another aspect of the present invention includes: input means for sequentially inputting a plurality of pixel data constituting image data; first classifying means for determining, each time pixel data is input, whether the feature value of the input pixel data is within a predetermined range of the feature value of already input pixel data, assigning to pixel data whose feature value is determined to be within the predetermined range the same first identification information as the already input pixel data, assigning to pixel data whose feature value is determined not to be within the predetermined range second identification information different from the first identification information, and thereby classifying the plurality of pixel data into a first number of clusters; and second classifying means for regarding the pixel data classified into the same cluster as pixel data having the same feature value and classifying the classified pixel data, by a predetermined clustering technique, into a second number of clusters smaller than the first number.

Here, the information processing apparatus is any of various electronic devices, for example a PC (Personal Computer), a recording and reproducing apparatus such as an HDD (Hard Disk Drive)/DVD/BD (Blu-ray Disc) recorder, a server apparatus, a television apparatus, a game machine, a digital camera, a digital video camera, or a mobile phone.

According to this configuration, the first classifying means first classifies the input pixel data into the first number of clusters depending on whether the feature value is within the predetermined range, and pixel data classified into the same cluster is regarded as having the same feature value, which reduces the number of classes (kinds) of pixel data. The second classifying means then classifies each pixel data into the second number of clusters. Because clustering is performed after the number of classes of pixel data has first been reduced, the number of operations and the time required for the clustering to converge are greatly reduced compared with conventional techniques that cluster a large amount of data directly. That is, the processing load of the clustering can be kept as light as possible. In addition, because all the pixel data constituting the image data is used for the classification into the first number of clusters, the classification can be performed with higher accuracy, while still shortening the processing time, than when an image thinned at a predetermined thinning rate is used.

A program according to still another aspect of the present invention causes an information processing apparatus to execute: a step of sequentially inputting a plurality of pixel data constituting image data; a step of determining, each time pixel data is input, whether a predetermined feature value of the input pixel data is within a predetermined range of the feature value of already input pixel data, assigning to pixel data whose feature value is determined to be within the predetermined range the same first identification information as the already input pixel data, assigning to pixel data whose feature value is determined not to be within the predetermined range second identification information different from the first identification information, and thereby classifying the plurality of pixel data into a first number of clusters; and a step of regarding the pixel data classified into the same cluster as pixel data having the same feature value and classifying the classified pixel data, by a predetermined clustering technique, into a second number of clusters smaller than the first number.

As described above, according to the present invention, the pixel data in image data can be classified with high accuracy and in a short time.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
First, a first embodiment of the present invention will be described. In this embodiment, a recording and reproducing apparatus is used as the information processing apparatus.

FIG. 1 is a diagram showing the configuration of a recording and reproducing apparatus 100 according to this embodiment.
As shown in the figure, the recording and reproducing apparatus 100 includes a CPU (Central Processing Unit) 1, a RAM (Random Access Memory) 2, an operation input unit 3, a segmentation processing unit 20, a video feature detection unit 4, a digital tuner 5, an IEEE1394 interface 6, an Ethernet (registered trademark)/wireless LAN (Local Area Network) interface 7, a USB (Universal Serial Bus) interface 8, a memory card interface 9, an HDD 10, an optical disk drive 11, a buffer controller 13, a selector 14, a demultiplexer 15, an AV (Audio/Video) decoder 16, an OSD (On Screen Display) 17, a video D/A (Digital/Analog) converter 18, and an audio D/A converter 19.

The CPU 1 accesses the RAM 2 and other components as necessary and controls all the blocks of the recording and reproducing apparatus 100. The RAM 2 is a memory that is used as a work area of the CPU 1 and temporarily holds the OS (Operating System), programs, processing data, and the like.

The operation input unit 3 consists of buttons, switches, keys, a touch panel, a light receiving unit for infrared signals transmitted from a remote controller (not shown), and the like. It accepts various setting values and commands entered by the user and outputs them to the CPU 1.

Under the control of the CPU 1, the digital tuner 5 receives broadcast signals of digital broadcast programs through an antenna (not shown), and selects and demodulates the broadcast signal of a specific channel. This broadcast signal is output to the demultiplexer 15 via the selector 14 and reproduced, recorded on the HDD 10 via the buffer controller 13, or recorded on the optical disk 12 inserted in the optical disk drive 11.

The IEEE1394 interface 6 can be connected to an external device such as a digital video camera. Video content shot and recorded by a digital video camera, for example, is reproduced or recorded on the HDD 10 or the optical disk 12 in the same way as the video content of a broadcast program received by the digital tuner 5.

The Ethernet (registered trademark)/wireless LAN interface 7 inputs video content recorded on, for example, a PC or another recording and reproducing apparatus via Ethernet (registered trademark) or a wireless LAN. This video content can also be reproduced and recorded on the HDD 10 or the optical disk 12.

The USB interface 8 inputs video content via USB from a device such as a digital camera or from an external storage device such as a so-called USB memory. This video content can also be reproduced and recorded on the HDD 10 or the optical disk 12.

The memory card interface 9 connects to, for example, a memory card with built-in flash memory and inputs the video content recorded on that memory card. This video content can also be reproduced and recorded on the HDD 10 or the optical disk 12.

The HDD 10 records various video content, received as broadcast signals or input from external devices, on its built-in hard disk, and during reproduction reads the content from the hard disk and outputs it to the buffer controller 13. The HDD 10 also stores the OS, programs for executing the image segmentation processing and video feature detection processing described later, and various other programs and data. The recording and reproducing apparatus 100 may store the OS and the various programs and data on another recording medium, such as a flash memory (not shown), instead of the HDD 10.

The optical disk drive 11 records the video content and the like on the optical disk 12, and during reproduction reads the content and outputs it to the buffer controller 13. The optical disk 12 is, for example, a DVD, a BD, or a CD.

The buffer controller 13 controls the timing and amount of data with which video content supplied continuously from, for example, the digital tuner 5 or the other interfaces is written to the HDD 10 or the optical disk 12, and writes the video content intermittently. The buffer controller 13 also controls the timing and amount of data with which video content recorded on the HDD 10 or the optical disk 12 is read, and continuously supplies the intermittently read video content to the demultiplexer 15.

The selector 14 selects the video content input from one of the digital tuner 5, the various interfaces, the HDD 10, and the optical disk drive 11, based on a control signal from the CPU 1.

The demultiplexer 15 separates the multiplexed video content input from the buffer controller 13 into a video signal and an audio signal, and outputs them to the AV decoder 16.

The AV decoder 16 decodes the video signal and the audio signal, which are encoded in a format such as MPEG (Moving Picture Experts Group)-2 or MPEG-4, and outputs the video signal to the OSD 17 and the audio signal to the audio D/A converter 19.

The OSD 17 generates graphics and the like to be displayed on a display (not shown), combines them with the video signal or switches between them, and outputs the processed video signal to the video D/A converter 18. The video D/A converter 18 converts the video signal processed by the OSD 17 into an NTSC (National Television System Committee) signal by D/A conversion and outputs it to the display (not shown) for display.

音声Ｄ／Ａコンバータ１９は、上記ＡＶデコーダ１６から入力された音声信号をＤ／Ａ変換して、図示しないスピーカに出力して再生させる。 The audio D / A converter 19 D / A converts the audio signal input from the AV decoder 16 and outputs it to a speaker (not shown) for reproduction.

セグメンテーション処理部２０は、ＡＶデコーダ１６によるデコード前の映像データ、または、デコード後の映像データを構成する各画像データから複数の領域（オブジェクト）を抽出する。このセグメンテーション処理の詳細は後述する。 The segmentation processing unit 20 extracts a plurality of areas (objects) from the video data before decoding by the AV decoder 16 or the image data constituting the video data after decoding. Details of this segmentation processing will be described later.

映像特徴検出部４は、セグメンテーション処理された映像データから、例えばパン、チルト、ズーム等のカメラ動作によって生じる映像特徴を検出する。この映像特徴検出処理の詳細も後述する。 The video feature detection unit 4 detects video features generated by camera operations such as panning, tilting, and zooming from the segmented video data. Details of this video feature detection processing will also be described later.

図２は、セグメンテーション処理の概要を示した図である。
セグメンテーション処理とは、同図（Ａ）に示すように、例えば建物２１、人物２２、道路２３等のオブジェクトが含まれる原画像から、同図（Ｂ）に示すように、建物２１、人物２２、道路２３にそれぞれ相当する領域２１ａ、２２ａ及び２３ａを抽出する処理（領域２１ａ、２２ａ及び２３ａに分割する処理）である。従来のセグメンテーション処理は、原画像を構成する各画素データを、例えばK-means等のクラスタリング手法を用いて複数のクラスタに分類することで実行される。しかし、この従来のセグメンテーション処理には、クラスタリング処理によりデータが収束するまでに時間を要し、システム全体の負担となっていた。そこで本実施形態においては、後述するように、クラスタリング処理を２段階で実施して処理時間の短縮を図っている。 FIG. 2 is a diagram showing an outline of the segmentation process.
As shown in part (A) of the figure, segmentation processing takes an original image containing objects such as a building 21, a person 22, and a road 23 and, as shown in part (B), extracts the regions 21a, 22a, and 23a corresponding to the building 21, the person 22, and the road 23, respectively (that is, it divides the image into the regions 21a, 22a, and 23a). Conventional segmentation processing is executed by classifying each pixel data item of the original image into a plurality of clusters using a clustering technique such as K-means. However, the conventional processing takes a long time for the clustering to converge, which burdens the entire system. In this embodiment, therefore, the clustering process is performed in two stages to shorten the processing time, as described later.

図３は、上記セグメンテーション処理部２０の構成を示した図である。
同図に示すように、セグメンテーション処理部２０は、ＬＰＦ（ローパスフィルタ）処理部３１、ＩＤ処理部３２、閾値設定部３３、補間処理部３４、K-means処理部３５を有する。 FIG. 3 is a diagram showing the configuration of the segmentation processing unit 20.
As shown in the figure, the segmentation processing unit 20 includes an LPF (low-pass filter) processing unit 31, an ID processing unit 32, a threshold setting unit 33, an interpolation processing unit 34, and a K-means processing unit 35.

ＬＰＦ処理部３１は、例えばＨＤＤ１０等から、映像データ中の各画像データを構成する画素データを逐次読み込み、各画素データ中の高周波成分（ノイズ）を除去してＩＤ処理部３２に出力する。例えば、ＬＰＦ処理部３１は、入力された画素データの値を、直前及び直後の画素データの平均値となるように補正する。 The LPF processing unit 31 sequentially reads pixel data constituting each image data in the video data, for example, from the HDD 10 or the like, removes high frequency components (noise) in each pixel data, and outputs them to the ID processing unit 32. For example, the LPF processing unit 31 corrects the value of the input pixel data so as to be the average value of the immediately preceding and immediately following pixel data.
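The averaging example above can be sketched as follows. This is only an illustrative sketch, not the LPF processing unit's actual implementation: the function name `smooth_neighbors` is hypothetical, pixel values are treated as plain numbers in a one-dimensional sequence, and leaving the first and last samples unchanged is an assumption, since the text does not say how borders are handled.

```python
def smooth_neighbors(values):
    """Replace each interior sample with the mean of its previous and next samples.

    Assumption: the first and last samples have only one neighbour and are
    left unchanged. The averages are computed from the original values,
    not from already smoothed ones.
    """
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i + 1]) / 2
    return out


# An isolated spike (200) between similar neighbours is flattened.
print(smooth_neighbors([10, 200, 12]))
```

Because an isolated outlier is replaced by the mean of its neighbours, single-pixel noise is suppressed before IDs are assigned.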

ＩＤ処理部３２は、ＬＰＦ処理部３１から逐次入力した画素データに対して、所定の閾値を基に、識別データ（ＩＤ）を付与することで、各画素データを所定数のクラスタに分類し、補間処理部３４へ出力する。このＩＤ処理部３２における処理の詳細については後述する。 The ID processing unit 32 classifies each pixel data into a predetermined number of clusters by giving identification data (ID) to the pixel data sequentially input from the LPF processing unit 31 based on a predetermined threshold value. The data is output to the interpolation processing unit 34. Details of the processing in the ID processing unit 32 will be described later.

閾値設定部３３は、上記ＩＤ処理部３２においてＩＤ付与の基準となる閾値（範囲）を設定する。 The threshold value setting unit 33 sets a threshold value (range) that serves as a reference for ID assignment in the ID processing unit 32.

補間処理部３４は、ＩＤ処理部３２によりＩＤを付与された各画素データに対して所定の条件を基にＩＤの置換処理を実行し、K-means処理部３５へ出力する。この補間処理の詳細については後述する。 The interpolation processing unit 34 performs ID replacement processing on each pixel data to which the ID is assigned by the ID processing unit 32 based on a predetermined condition, and outputs the result to the K-means processing unit 35. Details of this interpolation processing will be described later.

K-means処理部３５は、補間処理部３４から入力された画素データを、K-means法に基づくクラスタリング処理により複数のクラスタに分類し、その分類結果を、領域抽出結果として出力する。 The K-means processing unit 35 classifies the pixel data input from the interpolation processing unit 34 into a plurality of clusters by clustering processing based on the K-means method, and outputs the classification result as a region extraction result.

本実施形態において処理される各画像データは例えばＶＧＡ（Video Graphic Array）サイズ（640×480ドット）である。したがって、上記ＬＰＦ処理部３１には、１つの画像データにつき、640×480＝307,200個の画素データが逐次入力されることとなる。つまり、この入力時点において、１つの画像データを構成する画素データのクラスタ数は、307,200である。 Each image data processed in this embodiment is, for example, VGA (Video Graphic Array) size (640 × 480 dots). Therefore, 640 × 480 = 307,200 pieces of pixel data are sequentially input to the LPF processing unit 31 for each image data. That is, at this input time, the number of clusters of pixel data constituting one image data is 307,200.

本実施形態において、画素データは、色特徴データとテクスチャ特徴データとから構成される。色特徴データは、例えば画像データのヒストグラムを基に抽出され、例えばＲＧＢの各成分が８ビット（０〜２５５）で表された３次元のデータである。もちろん、色特徴データは、色差信号（Ｙ／Ｃｂ／Ｃｒ）を基に表されてもよい。テクスチャ特徴データは、例えば、画像データ中の各画素にウェーブレット変換処理が施されることで、各画素データの周波数成分として抽出される。テクスチャ特徴データも、上記色特徴データに対応して、各８ビットの３次元のデータに正規化される。 In the present embodiment, the pixel data is composed of color feature data and texture feature data. The color feature data is, for example, extracted based on a histogram of image data, and is, for example, three-dimensional data in which each component of RGB is represented by 8 bits (0 to 255). Of course, the color feature data may be expressed based on the color difference signal (Y / Cb / Cr). The texture feature data is extracted as a frequency component of each pixel data, for example, by performing wavelet transform processing on each pixel in the image data. Texture feature data is also normalized to 8-bit three-dimensional data corresponding to the color feature data.

上記閾値設定部３３により設定される閾値は、上記色特徴データとテクスチャ特徴データにそれぞれ設定される。色特徴データの閾値は、例えば±２程度に設定され、テクスチャ特徴データの閾値は、例えば±３程度に設定される。 The threshold values set by the threshold value setting unit 33 are set in the color feature data and texture feature data, respectively. The threshold value of the color feature data is set to about ± 2, for example, and the threshold value of the texture feature data is set to about ± 3, for example.
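As a rough illustration of how the two thresholds might be applied, the sketch below treats one pixel data item as a 6-tuple (three color components followed by three texture components). The names `within_threshold`, `COLOR_DTH` and `TEXTURE_DTH` are hypothetical, and comparing component by component and requiring both feature groups to pass are assumptions; the text only states that a threshold is set for each kind of feature data.

```python
COLOR_DTH = 2    # example value from the text (about +/-2)
TEXTURE_DTH = 3  # example value from the text (about +/-3)


def within_threshold(p, q, color_dth=COLOR_DTH, texture_dth=TEXTURE_DTH):
    """Return True if pixel data p is within the thresholds of pixel data q.

    p and q are 6-tuples: (c1, c2, c3, t1, t2, t3), each component 0..255.
    Assumption: every color component must differ by at most color_dth and
    every texture component by at most texture_dth.
    """
    color_ok = all(abs(a - b) <= color_dth for a, b in zip(p[:3], q[:3]))
    texture_ok = all(abs(a - b) <= texture_dth for a, b in zip(p[3:], q[3:]))
    return color_ok and texture_ok


print(within_threshold((10, 10, 10, 5, 5, 5), (12, 9, 10, 8, 5, 2)))
```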

本実施形態においては、上記ＬＰＦ処理部３１から補間処理部３４により、画素データの初期クラスタリング処理が実行され、上記K-means処理部３５により、本クラスタリング処理が実行される。上述のように入力時に307,200あった画素データのクラスタ数は、初期クラスタリング処理により、数１０程度にまで削減され、本クラスタリング処理により、５〜２０程度まで削減される。 In this embodiment, the units from the LPF processing unit 31 through the interpolation processing unit 34 execute the initial clustering process on the pixel data, and the K-means processing unit 35 executes the main clustering process. The number of clusters of pixel data, 307,200 at input as described above, is reduced to several tens by the initial clustering process and to about 5 to 20 by the main clustering process.

図４は、上記ＩＤ処理部３２の構成を示した図である。
同図に示すように、ＩＤ処理部３２は、データメモリ部４１、比較判定部４２、スイッチ４３、ＩＤメモリ部４４、最大ＩＤメモリ部４５及びアドレスカウンタ部４６を有する。 FIG. 4 is a diagram showing the configuration of the ID processing unit 32.
As shown in the figure, the ID processing unit 32 includes a data memory unit 41, a comparison / determination unit 42, a switch 43, an ID memory unit 44, a maximum ID memory unit 45, and an address counter unit 46.

データメモリ部４１は、画素データを逐次入力して一時的に記憶し、新たな画素データが入力された場合には、その画素データとの比較判定対象としての画素データを比較判定部４２へ出力する。 The data memory unit 41 sequentially receives and temporarily stores pixel data. When new pixel data is input, it outputs the stored pixel data to be compared against the new pixel data to the comparison determination unit 42.

比較判定部４２は、上記閾値設定部３３により設定される閾値を基に、入力された画素データと、データメモリ部４１に記憶された画素データとを比較して、入力された画素データが、データメモリ部４１に記憶された画素データから閾値の範囲内にあるかを判定し、その比較判定結果をスイッチ４３へ出力する。 Based on the threshold set by the threshold setting unit 33, the comparison determination unit 42 compares the input pixel data with the pixel data stored in the data memory unit 41, determines whether the input pixel data is within the threshold range of the stored pixel data, and outputs the result to the switch 43.

スイッチ４３は、比較判定部４２による比較判定結果に応じて、比較判定対象の画素データへ、ＩＤメモリ部４４に記憶されたＩＤと同一のＩＤまたは新たなＩＤを付与してＩＤメモリ部４４及びＣＰＵ１へ出力する。 Depending on the comparison result from the comparison determination unit 42, the switch 43 assigns to the pixel data under comparison either the same ID as one stored in the ID memory unit 44 or a new ID, and outputs the ID to the ID memory unit 44 and the CPU 1.

ＩＤメモリ部４４は、上記比較判定部４２により付与された各画素データのＩＤを記憶し、各ＩＤのうち最大ＩＤ値を最大ＩＤメモリ部４５に供給する。 The ID memory unit 44 stores the ID assigned to each pixel data item by the comparison determination unit 42 and supplies the largest ID value among them to the maximum ID memory unit 45.

最大ＩＤメモリ部４５は、上記最大ＩＤ値を記憶し、上記比較判定部４２により、入力された画素データが閾値の範囲外であると判定された場合に、最大ＩＤ値を１インクリメントして、スイッチ４３へ出力する。この場合スイッチ４３は、この新たな最大ＩＤ値を、入力された画素データへ付与して、ＩＤメモリ部４４へ出力する。 The maximum ID memory unit 45 stores the maximum ID value, and when the comparison determination unit 42 determines that the input pixel data is outside the threshold range, the maximum ID value is incremented by 1, Output to the switch 43. In this case, the switch 43 gives the new maximum ID value to the input pixel data and outputs it to the ID memory unit 44.

アドレスカウンタ部４６は、上記データメモリ部４１に記憶されている画素データの記憶領域のアドレスと、ＩＤメモリ部４４に記憶されている各画素データのＩＤの記憶領域のアドレスとの対応関係を管理する。 The address counter unit 46 manages the correspondence between the addresses of the storage areas holding the pixel data in the data memory unit 41 and the addresses of the storage areas holding the ID of each pixel data item in the ID memory unit 44.

上記データメモリ部４１とＩＤメモリ部４４とは、物理的に別々のメモリ素子としてもよいし、１つのメモリ素子を各メモリ用に分割するようにしても構わない。 The data memory unit 41 and the ID memory unit 44 may be physically separate memory elements, or one memory element may be divided for each memory.

次に、以上のように構成された記録再生装置１００の動作について説明する。 Next, the operation of the recording / reproducing apparatus 100 configured as described above will be described.

図５は、本実施形態における記録再生装置１００の動作の概略的な流れを示したフローチャートである。
同図に示すように、記録再生装置１００のセグメンテーション処理部２０は、画像データを構成する画素データを逐次入力し（ステップ５１）、当該各画素データの入力毎に初期クラスタリング処理を実行する（ステップ５２）。次いで、セグメンテーション処理部２０は、K-meansにより本クラスタリング処理を実行し（ステップ５３）、当該クラスタリング処理結果を基に各画像データから複数の領域を抽出する（ステップ５４）。 FIG. 5 is a flowchart showing a schematic flow of the operation of the recording / reproducing apparatus 100 in the present embodiment.
As shown in the figure, the segmentation processing unit 20 of the recording/reproducing apparatus 100 sequentially receives the pixel data constituting the image data (step 51) and executes the initial clustering process each time a pixel data item is input (step 52). Next, the segmentation processing unit 20 executes the main clustering process using K-means (step 53) and extracts a plurality of regions from each image data item based on the clustering result (step 54).

そして、記録再生装置１００の映像特徴検出部４は、１つの映像コンテンツを構成する複数の画像データ間で、上記抽出された領域毎に、動きベクトルを検出することで、映像コンテンツ中のカメラ特徴を検出する（ステップ５５）。 The video feature detection unit 4 of the recording/reproducing apparatus 100 then detects camera features in the video content by detecting a motion vector for each extracted region across the plurality of image data items constituting one video content (step 55).

図６は、上記ステップ５２における初期クラスタリング処理の流れを示したフローチャートである。
同図に示すように、セグメンテーション処理部２０のＩＤ処理部３２は、上記ＬＰＦ処理部３１から画素データを入力すると、まず、初期化処理を実行する（ステップ６１）。すなわち、ＩＤ処理部３２は、画素データ数カウント用のｎ、ＩＤ値を示すｉｄ（ｎ）及び初期クラスタリング処理が終了したことを判定するための処理済みフラグｆｌｇ（ｎ）を、それぞれｎ＝０、ｉｄ（ｎ）＝０、ｆｌｇ（ｎ）＝０に設定する。 FIG. 6 is a flowchart showing the flow of the initial clustering process in step 52 described above.
As shown in the figure, when the ID processing unit 32 of the segmentation processing unit 20 receives pixel data from the LPF processing unit 31, it first executes an initialization process (step 61). That is, the ID processing unit 32 sets n = 0 for counting the number of pixel data, id (n) indicating the ID value, and a processed flag flg (n) for determining that the initial clustering process is completed, respectively. , Id (n) = 0 and flg (n) = 0.

続いて、ＩＤ処理部３２は、画素データｄ（ｎ）を読み込み（ステップ６２）、読み込んだ画素データを上記データメモリ部４１に書き込んで記憶する（ステップ６３）。 Subsequently, the ID processing unit 32 reads the pixel data d (n) (step 62), and writes and stores the read pixel data in the data memory unit 41 (step 63).

続いて、ＩＤ処理部３２は、上記ステップ６２で読み込んだ画素データと、既にデータメモリ部４１に記憶されている画素データとを比較し、上記読み込んだ画素データが、既に記憶されている画素データから閾値範囲内であるか否かを判定する（ステップ６４）。 Subsequently, the ID processing unit 32 compares the pixel data read in step 62 with the pixel data already stored in the data memory unit 41 and determines whether the read pixel data is within the threshold range of the stored pixel data (step 64).

ここで、この比較判定処理及びＩＤ割り当て処理の詳細について説明する。図７は、当該各処理の詳細な流れを示したフローチャートである。
同図に示すように、まず、ＩＤ処理部３２は、読み込まれた画素データの比較対象である、既にデータメモリ部４１に記憶してある画素データのカウンタ値ｋをｋ＝０に初期化する（ステップ７１）。続いて、ＩＤ処理部３２は、上記既に記憶されている画素データｄ（ｋ）の処理済みフラグｆｌｇ（ｋ）がｆｌｇ（ｋ）＝１であるか否か、すなわち、画素データｄ（ｋ）へのＩＤ割り当て処理が済んでいるか否かを確認する（ステップ７２）。 Here, details of the comparison determination process and the ID assignment process will be described. FIG. 7 is a flowchart showing a detailed flow of each process.
As shown in the figure, first, the ID processing unit 32 initializes a counter value k of pixel data, which is a comparison target of the read pixel data, already stored in the data memory unit 41 to k = 0. (Step 71). Subsequently, the ID processing unit 32 determines whether or not the processed flag flg (k) of the already stored pixel data d (k) is flg (k) = 1, that is, the pixel data d (k). It is confirmed whether or not the ID assignment process has been completed (step 72).

ｆｌｇ（ｋ）＝１である場合（Ｙｅｓ）、すなわち画素データｄ（ｋ）へのＩＤ割り当て処理が済んでいる場合には、ＩＤ処理部３２は、当該画素データｄ（ｋ）をデータメモリ部４１から読み出す（ステップ７３）。 If flg(k) = 1 (Yes), that is, if ID assignment for the pixel data d(k) has been completed, the ID processing unit 32 reads the pixel data d(k) from the data memory unit 41 (step 73).

続いて、ＩＤ処理部３２は、比較判定部４２により、上記読み込まれた画素データｄ（ｎ）の値が、既に記憶されている画素データｄ（ｋ）の値から閾値ｄｔｈの範囲内にあるか否かを判定する（ステップ７４）。上述したように、閾値ｄｔｈは、画素データｄ（ｋ）の色特徴データとテクスチャ特徴データにそれぞれ設定されるため、ＩＤ処理部３２は、これら色特徴データとテクスチャ特徴データのそれぞれについて閾値ｄｔｈを判定する。 Subsequently, the ID processing unit 32 uses the comparison determination unit 42 to determine whether the value of the read pixel data d(n) is within the threshold dth of the value of the already stored pixel data d(k) (step 74). As described above, the threshold dth is set separately for the color feature data and the texture feature data of the pixel data d(k), so the ID processing unit 32 performs the threshold determination for each of the color feature data and the texture feature data.

閾値判定の結果、読み込まれた画素データｄ（ｎ）が、記憶された画素データｄ（ｋ）から閾値以内にあると判定された場合、すなわち、｜ｄ（ｎ）−ｄ（ｋ）｜≦ｄｔｈであると判定された場合（Ｙｅｓ）には、ＩＤ処理部３２は、画素データｄ（ｎ）に対して画素データｄ（ｋ）と同一のＩＤを割り当てる（ステップ７５）。 If the threshold determination finds that the read pixel data d(n) is within the threshold of the stored pixel data d(k), that is, |d(n) − d(k)| ≤ dth (Yes), the ID processing unit 32 assigns the pixel data d(n) the same ID as the pixel data d(k) (step 75).

閾値判定の結果、読み込まれた画素データｄ（ｎ）が、記憶された画素データｄ（ｋ）から閾値以内にないと判定された場合、すなわち、｜ｄ（ｎ）−ｄ（ｋ）｜＞ｄｔｈであると判定された場合（Ｎｏ）には、ＩＤ処理部３２は、上記カウンタ値ｋを１インクリメントする（ステップ７７）。そして、ＩＤ処理部３２は、このインクリメントしたｋが画素データ数ｎよりも大きいか否か、すなわち、割り当てるべきＩＤがもうなくなったか否かを判定する（ステップ７８）。 If the threshold determination finds that the read pixel data d(n) is not within the threshold of the stored pixel data d(k), that is, |d(n) − d(k)| > dth (No), the ID processing unit 32 increments the counter value k by 1 (step 77). The ID processing unit 32 then determines whether the incremented k is larger than the pixel data count n, that is, whether no candidate IDs remain to be assigned (step 78).

ｋ＞ｎであると判定された場合、ＩＤ処理部３２は、最大ＩＤメモリ部４５から、最大ＩＤ値ｉｄｍａｘを検出し、当該最大ＩＤ値ｉｄｍａｘを１インクリメントした値を、上記画素データｄ（ｎ）のＩＤとして割り当てる（ステップ８０）。 If k > n, the ID processing unit 32 obtains the maximum ID value idmax from the maximum ID memory unit 45 and assigns idmax incremented by 1 as the ID of the pixel data d(n) (step 80).

ｋ≦ｎであると判定された場合、ＩＤ処理部３２は、上記ステップ７２以降の処理、すなわち、１インクリメントされた画素データｄ（ｋ）と画素データｄ（ｎ）との比較判定処理を実行する。 If k ≤ n, the ID processing unit 32 repeats the processing from step 72 onward, that is, the comparison between the pixel data d(k), with k incremented by 1, and the pixel data d(n).

ＩＤの割り当て処理が終了した場合には、ＩＤ処理部３２は、上記画素データｄ（ｎ）についての処理済みフラグｆｌｇ（ｎ）をｆｌｇ（ｎ）＝１に設定する（ステップ７６）。 When the ID assignment process is completed, the ID processing unit 32 sets the processed flag flg (n) for the pixel data d (n) to flg (n) = 1 (step 76).

また、上記ステップ７２において、ｆｌｇ（ｋ）＝０である場合（Ｎｏ）には、ＩＤ処理部３２は、ステップ７７へ進み、ｋを１インクリメントして、ステップ７８以降の処理を実行する。 If flg (k) = 0 in Step 72 (No), the ID processing unit 32 proceeds to Step 77, increments k by 1, and executes the processing after Step 78.

図６に戻り、ＩＤの割り当て処理が終了すると、ＩＤ処理部３２は、上記画素データｄ（ｎ）のカウンタ値ｎを１インクリメントし（ステップ６５）、インクリメント後のｎがデータ数の閾値ｎｔｈを越えたか否かを判定する（ステップ６６）。ｎ≦ｎｔｈの場合（Ｎｏ）には、ＩＤ処理部３２は、上記ステップ６２へ戻り、以降の処理を繰り返す。ｎ＞ｎｔｈの場合、すなわち、所定の画像データ中の、全ての画素データｄ（ｎ）へのＩＤの割り当てが終了した場合には、ＩＤ処理部３２は、上記補間処理部３４によるＩＤ補間処理を実行して（ステップ６７）、本クラスタリング処理へ移行する。 Returning to FIG. 6, when the ID allocation process is completed, the ID processing unit 32 increments the counter value n of the pixel data d (n) by 1 (step 65), and the incremented n sets the threshold value nth of the number of data. It is determined whether or not it has been exceeded (step 66). If n ≦ nth (No), the ID processing unit 32 returns to step 62 and repeats the subsequent processing. When n> nth, that is, when the assignment of IDs to all the pixel data d (n) in the predetermined image data is completed, the ID processing unit 32 performs the ID interpolation processing by the interpolation processing unit 34. Is executed (step 67), and the process proceeds to the clustering process.
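The loops of FIG. 6 and FIG. 7 described above can be condensed into the following sketch. It is a simplified reading, not the apparatus's implementation: the function name `assign_ids` and the `within` callback are hypothetical, the processed flags are implicit (every earlier element already has an ID), and the example uses scalar values with a single threshold instead of the six-component feature data.

```python
def assign_ids(pixels, within):
    """Assign an ID to each pixel data item in input order.

    For item n, earlier items k = 0..n-1 are scanned in order (the k loop
    of FIG. 7). The first earlier item within the threshold donates its ID;
    if none matches, the current maximum ID is incremented and assigned.
    """
    ids = []
    max_id = 0  # plays the role of the maximum ID memory
    for n, d_n in enumerate(pixels):
        assigned = None
        for k in range(n):
            if within(d_n, pixels[k]):
                assigned = ids[k]
                break
        if assigned is None:
            max_id += 1
            assigned = max_id
        ids.append(assigned)
    return ids


# Scalar example with dth = 2 (assumed values).
print(assign_ids([10, 11, 20, 30, 12], lambda a, b: abs(a - b) <= 2))
```

Each new item is compared with at most all earlier items, so the cost grows with the number of items already stored; the sketch favours clarity over speed.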

図８及び図９は、上記ＩＤ割り当て処理を概念的に示した図である。
図８に示すように、ＩＤ処理部３２は、各画素データｄに、既に記憶されている画素データから上記閾値ｄｔｈの範囲内にあるか否かを基に、ＩＤを割り当てることにより、各画素データｄを複数のクラスタ（クラスタＡ〜Ｂ）に分類する（初期クラスタリング）。各クラスタに属する画素データは、それぞれの値は異なっていても、初期クラスタリング処理により、同一の値を有する画素データ、すなわち、例えば各クラスタに属する複数の画素データの平均値を有する画素データと見なされる。 8 and 9 are diagrams conceptually showing the ID assignment process.
As shown in FIG. 8, the ID processing unit 32 assigns an ID to each pixel data item d according to whether it is within the threshold dth of already stored pixel data, thereby classifying the pixel data d into a plurality of clusters (clusters A to B) (initial clustering). Although the pixel data belonging to a cluster may differ in value, after the initial clustering they are regarded as pixel data having the same value, for example the average value of the pixel data belonging to that cluster.

図９に示すように、逐次読み込まれる画素データは、先にＩＤを付与された各クラスタに属する画素データのうち、最初に読み込まれた画素データと順に比較される。 As shown in FIG. 9, sequentially read pixel data is sequentially compared with the first read pixel data among the pixel data belonging to each cluster previously given an ID.

すなわち、同図において、２番目に読み込まれた画素データｄ（２）は、先にＩＤ＝１を付与された画素データｄ（１）と比較され、画素データｄ（２）が当該画素データｄ（１）から閾値ｄｔｈ以内にあるため、画素データｄ（２）にＩＤ＝１が付与される。 That is, in the figure, the pixel data d (2) read second is compared with the pixel data d (1) previously assigned ID = 1, and the pixel data d (2) is compared with the pixel data d. Since it is within the threshold value dth from (1), ID = 1 is assigned to the pixel data d (2).

３番目に読み込まれた画素データｄ（３）は、画素データｄ（１）と比較され、当該画素データｄ（３）は画素データｄ（１）から閾値ｄｔｈ以内にないため、画素データｄ（３）には、ＩＤ＝１から１インクリメントされたＩＤ＝２が付与される。 The pixel data d (3) read third is compared with the pixel data d (1). Since the pixel data d (3) is not within the threshold value dth from the pixel data d (1), the pixel data d ( 3) is given ID = 2 which is incremented by 1 from ID = 1.

４番目に読み込まれた画素データｄ（４）は、ＩＤ＝１を有する画素データｄ（１）及びＩＤ＝２を有する画素データｄ（３）と順次比較され、当該画素データｄ（４）は、画素データｄ（１）及び画素データｄ（３）のいずれからも閾値ｄｔｈ以内にないため、画素データｄ（４）には、ＩＤ＝２から１インクリメントされたＩＤ＝３が付与される。 The pixel data d(4), read fourth, is compared in turn with the pixel data d(1) having ID = 1 and the pixel data d(3) having ID = 2. Since d(4) is within the threshold dth of neither d(1) nor d(3), it is given ID = 3, which is ID = 2 incremented by 1.

５番目に読み込まれた画素データｄ（５）は、ＩＤ＝１を有する画素データｄ（１）及びＩＤ＝３を有する画素データｄ（４）のいずれからも閾値ｄｔｈ以内にある（閾値ｄｔｈの範囲がオーバーラップしている）が、この場合は、先に付与された（ＩＤのカウント値が小さい）ＩＤ＝１が画素データｄ（５）に付与される。もちろん、閾値ｄｔｈの範囲がオーバーラップする場合に、後に付与された（ＩＤのカウント値が大きい）ＩＤ（この場合ＩＤ＝３）が付与されても構わない。 The pixel data d(5), read fifth, is within the threshold dth of both the pixel data d(1) having ID = 1 and the pixel data d(4) having ID = 3 (the threshold ranges overlap). In this case, the ID assigned earlier (the smaller ID value), ID = 1, is given to d(5). Of course, when the threshold ranges overlap, the ID assigned later (the larger ID value, here ID = 3) may be given instead.
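The comparison against the first-read pixel data of each cluster, together with the choice between the earlier and the later ID when threshold ranges overlap, might look like the sketch below. The function name `assign_with_representatives` and the `prefer` parameter are hypothetical, and the scalar values are made-up illustration data rather than the values of FIG. 9.

```python
def assign_with_representatives(values, dth, prefer="first"):
    """Assign IDs by comparing each value with one representative per cluster.

    reps[i] is the first value that received ID i + 1. When a value lies
    within dth of several representatives, prefer="first" picks the smallest
    matching ID and prefer="last" picks the largest.
    """
    reps = []
    ids = []
    for v in values:
        matches = [i + 1 for i, r in enumerate(reps) if abs(v - r) <= dth]
        if matches:
            ids.append(matches[0] if prefer == "first" else matches[-1])
        else:
            reps.append(v)        # v starts a new cluster
            ids.append(len(reps))
    return ids


data = [10, 11, 14, 20, 12]  # 12 is within 2 of both 10 and 14
print(assign_with_representatives(data, 2, prefer="first"))
print(assign_with_representatives(data, 2, prefer="last"))
```

Keeping one representative per cluster bounds the number of comparisons per input by the current cluster count rather than by the number of pixels read so far.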

次に、上記図６のステップ６７におけるＩＤ補間処理の詳細について説明する。 Next, details of the ID interpolation process in step 67 of FIG. 6 will be described.

図１０は、ＩＤ補間処理の概略的な流れを示したフローチャートである。図１１は、ＩＤ補間処理を概念的に示した図である。
図１０に示すように、ＩＤ補間処理は、画像データ中のＸ方向におけるＩＤ置換処理（ステップ１０１）と、Ｙ方向におけるＩＤ置換処理（ステップ１０２）からなる。 FIG. 10 is a flowchart showing a schematic flow of the ID interpolation process. FIG. 11 is a diagram conceptually showing the ID interpolation processing.
As shown in FIG. 10, the ID interpolation process includes an ID replacement process (step 101) in the X direction in the image data and an ID replacement process (step 102) in the Y direction.

すなわち、図１１（Ａ）及び（Ｂ）に示すように、補間処理部３４は、画像データ中の、Ｘ方向及びＹ方向の各方向の画素データについて、同一のＩＤ（ＩＤ＝ｎ）を有する複数の画素データの間に存在する、異なるＩＤ（ＩＤ＝ｍ）を有する画素データのＩＤを、その両隣に存在する画素データのＩＤ＝ｎに置換する。これにより、画像データ中のノイズ成分が除去されることとなる。すなわち、補間処理部３４は、画素毎に大きくデータ値が変化することはほとんどなく、類似の画素データの間に存在する画素データも類似の画素データであるという前提の下、間に存在する画素データのＩＤが異なる場合には、当該画素データはノイズであると見なすことにしている。 That is, as shown in FIGS. 11A and 11B, the interpolation processing unit 34 has the same ID (ID = n) for the pixel data in the X direction and the Y direction in the image data. The ID of pixel data having different IDs (ID = m) existing between a plurality of pixel data is replaced with ID = n of pixel data existing on both sides thereof. As a result, the noise component in the image data is removed. In other words, the interpolation processing unit 34 hardly changes the data value for each pixel, and the pixel data existing between similar pixel data is also assumed to be similar pixel data. When the data IDs are different, the pixel data is regarded as noise.

図１２は、上記Ｘ方向におけるＩＤ置換処理の流れを示したフローチャートであり、図１３は、上記Ｙ方向におけるＩＤ置換処理の流れを示したフローチャートである。 FIG. 12 is a flowchart showing the flow of the ID replacement process in the X direction, and FIG. 13 is a flowchart showing the flow of the ID replacement process in the Y direction.

両図において、補間処理部３４は、例えば１つの画像データの左上端部を原点と見なして、当該原点からＸ方向（右方向）及びＹ方向（下方向）に順に画素データを処理することとしている。また、Ｘ方向、Ｙ方向の各画素データの最大座標を、それぞれｘｍａｘ、ｙｍａｘとする。画像データがＶＧＡサイズである場合、Ｘ方向に６４０、Ｙ方向に４８０の画素データが存在するため、ｘｍａｘ＝６４０、ｙｍａｘ＝４８０となる。 In both figures, the interpolation processing unit 34 considers, for example, the upper left end of one piece of image data as the origin, and processes pixel data sequentially from the origin in the X direction (right direction) and the Y direction (down direction). Yes. Further, the maximum coordinates of the pixel data in the X direction and the Y direction are assumed to be xmax and ymax, respectively. When the image data is VGA size, since there are 640 pixel data in the X direction and 480 in the Y direction, xmax = 640 and ymax = 480.

本実施形態においては、例えばＸ方向及びＹ方向において、連続する５つの画素データに着目する。この５つの画素データをｄ（ｍ）、ｄ（ｍ＋１）、ｄ（ｍ＋２）、ｄ（ｍ＋３）及びｄ（ｍ＋４）とし（ｍ≧０）、各画素データのＩＤをＩＤ（ｍ）、ＩＤ（ｍ＋１）、ＩＤ（ｍ＋２）、ＩＤ（ｍ＋３）及びＩＤ（ｍ＋４）とした場合、補間処理部３４は、前後２つのデータに挟まれた画素データｄ（ｍ＋２）のＩＤ（ｍ＋２）を置換すべきか否かを判断する。以下、処理の詳細を示す。 In the present embodiment, attention is paid to, for example, five consecutive pixel data items in the X direction and in the Y direction. Let these five items be d(m), d(m+1), d(m+2), d(m+3), and d(m+4) (m ≥ 0), with IDs ID(m), ID(m+1), ID(m+2), ID(m+3), and ID(m+4). The interpolation processing unit 34 determines whether ID(m+2) of the pixel data d(m+2), which is sandwiched between two items on each side, should be replaced. Details of the processing are described below.

図１２に示すように、まず、補間処理部３４は、１つの画像データ中のＹ方向の画素データの座標カウンタ値をｙ＝０に初期化する（ステップ１２１）。続いて、補間処理部３４は、画像データ中のＸ方向の画素データの座標カウンタ値をｘ＝４に初期化する（ステップ１２２）。 As shown in FIG. 12, the interpolation processing unit 34 first initializes the coordinate counter for pixel data in the Y direction of one image data item to y = 0 (step 121). It then initializes the coordinate counter for pixel data in the X direction to x = 4 (step 122).

続いて、補間処理部３４は、上記５つの画素データが、Ｘ方向における置換条件を満たすか否かを判断する（ステップ１２３）。Ｘ方向の置換条件は、以下の式で表される。 Subsequently, the interpolation processing unit 34 determines whether or not the five pixel data satisfy the replacement condition in the X direction (step 123). The substitution condition in the X direction is expressed by the following formula.

ID((x−4) + y·xmax) = ID((x−3) + y·xmax) = ID((x−1) + y·xmax) = ID(x + y·xmax)
and
ID((x−2) + y·xmax) ≠ ID(x + y·xmax)

すなわち、５つの画素データのうち、１番目、２番目、４番目及び５番目の各画素データの各ＩＤが全て同一で、かつ、それらが３番目の画素データのＩＤと異なる、という条件が満たされているか否かが判定される。 That is, among the five pieces of pixel data, the first, second, fourth, and fifth pixel data have the same ID and are different from the third pixel data ID. It is determined whether or not it has been done.

当該置換条件が満たされていると判定された場合には、補間処理部３４は、３番目の画素データのＩＤを他の４つの画素データのＩＤへ置換する（ステップ１２４）。置換条件が満たされてないと判定された場合には、補間処理部３４は、ＩＤ置換処理は実行しない。 If it is determined that the replacement condition is satisfied, the interpolation processing unit 34 replaces the ID of the third pixel data with the IDs of the other four pixel data (step 124). When it is determined that the replacement condition is not satisfied, the interpolation processing unit 34 does not execute the ID replacement process.

その後、補間処理部３４は、Ｘ座標のカウンタ値を１インクリメントして（ステップ１２５）、当該インクリメントしたＸ座標がｘｍａｘであるか否かを判断し（ステップ１２６）、Ｘ座標がｘｍａｘでないと判断された場合には（Ｎｏ）、Ｘ座標がｘｍａｘとなるまでＸ座標を右方向へ移動させて、上記ステップ１２３及びステップ１２４の処理を繰り返す。 Thereafter, the interpolation processing unit 34 increments the X coordinate counter value by 1 (step 125), determines whether or not the incremented X coordinate is xmax (step 126), and determines that the X coordinate is not xmax. If it is determined (No), the X coordinate is moved to the right until the X coordinate reaches xmax, and the processing in steps 123 and 124 is repeated.

Ｘ座標がｘｍａｘであると判断された場合には（Ｙｅｓ）、Ｙ座標のカウンタ値を１インクリメントして（ステップ１２７）、当該インクリメントしたＹ座標がｙｍａｘであるか否かを判断し（ステップ１２８）、Ｙ座標がｙｍａｘでないと判断された場合には（Ｎｏ）、Ｙ座標がｙｍａｘとなるまでＹ座標を下方向へ移動させて、上記ステップ１２２〜ステップ１２８の処理を繰り返す。 If it is determined that the X coordinate is xmax (Yes), the counter value of the Y coordinate is incremented by 1 (step 127), and it is determined whether or not the incremented Y coordinate is ymax (step 128). ), When it is determined that the Y coordinate is not ymax (No), the Y coordinate is moved downward until the Y coordinate becomes ymax, and the processing of step 122 to step 128 is repeated.

以上により、画像データを構成する全ての画素データの、Ｘ方向についてのＩＤ置換処理が終了する。 As described above, the ID replacement process for all the pixel data constituting the image data in the X direction is completed.
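The X-direction pass of steps 121 to 128 can be sketched as follows, assuming the IDs are held in a flat row-major list so that the ID at (x, y) is `ids[x + y * xmax]`, as in the condition above. The function name `replace_ids_x` is hypothetical, and updating the list while scanning (so later windows see earlier replacements) is an assumption.

```python
def replace_ids_x(ids, xmax, ymax):
    """Replace an isolated ID that is flanked by two equal IDs on each side.

    ids is a flat row-major list of length xmax * ymax. For each row y and
    each x from 4 to xmax - 1, the window x-4..x is tested; if positions
    x-4, x-3, x-1 and x share one ID and position x-2 differs, the ID at
    x-2 is overwritten with the shared ID.
    """
    out = list(ids)
    for y in range(ymax):
        row = y * xmax
        for x in range(4, xmax):
            a, b, c, d, e = (out[row + x - j] for j in (4, 3, 2, 1, 0))
            if a == b == d == e and c != e:
                out[row + x - 2] = e
    return out


# Two rows of six pixels; each row contains one isolated ID.
grid = [1, 1, 2, 1, 1, 1,
        3, 3, 3, 4, 3, 3]
print(replace_ids_x(grid, xmax=6, ymax=2))
```

Two adjacent outliers, such as `[1, 1, 2, 2, 1, 1]`, do not satisfy the condition and are left untouched.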

図１３に示すように、Ｙ方向についての置換処理も、図１２に示したＩＤ置換処理と同様に実行することができる。Ｙ方向の置換条件は、以下の式で表される。 As shown in FIG. 13, the replacement process in the Y direction can be executed in the same manner as the ID replacement process shown in FIG. 12. The replacement condition in the Y direction is expressed by the following formula.

ID(x + (y−4)·xmax) = ID(x + (y−3)·xmax) = ID(x + (y−1)·xmax) = ID(x + y·xmax)
and
ID(x + (y−2)·xmax) ≠ ID(x + y·xmax)

以上のＸ方向、Ｙ方向の各置換処理により、画像データ中のノイズ成分が除去される。これにより、初期クラスタリングにより生成されるクラスタ数が削減され、その後の本クラスタリング処理の高速化を図ることができる。 The noise components in the image data are removed by the above replacement processing in the X direction and the Y direction. As a result, the number of clusters generated by the initial clustering is reduced, and the subsequent clustering process can be speeded up.

なお、画像データによっては、ノイズである画素データが、連続的に含まれていたり、例えば１画素データ置きに断続的に含まれていたりする可能性もある。そのような場合には、上述のように両隣を２つずつの画素データに挟まれた１つの画素データのみについてＩＤ置換処理を実行していては、ノイズが除去できない可能性がある。しかしながら、本実施形態においては、ＩＤ処理部３２によるＩＤ処理の前に、ＬＰＦ処理部３１がノイズを除去しているため、そのようなノイズも除去することが可能となっている。もちろん、補間処理部３４は、上記連続的または断続的なノイズ除去のために、上記図１２のステップ１２３や図１３のステップ１３３のような置換条件を用いてＩＤ置換処理を実行しても構わない。 Depending on the image data, noisy pixel data may occur consecutively or intermittently, for example at every other pixel. In such cases, executing the ID replacement process only for a single pixel data item sandwiched between two items on each side, as described above, may fail to remove the noise. In the present embodiment, however, the LPF processing unit 31 removes noise before the ID processing by the ID processing unit 32, so such noise can also be removed. Of course, the interpolation processing unit 34 may also execute the ID replacement process using replacement conditions like those of step 123 in FIG. 12 and step 133 in FIG. 13 in order to remove such consecutive or intermittent noise.

図１４は、上記初期クラスタリング処理により生成されるクラスタ数と、閾値との関係を示したグラフである。
同図に示すように、初期クラスタ数と閾値とは、反比例の関係にある。閾値が大きすぎると、画素データの分類精度が低くなり、閾値が小さすぎると、その後の本クラスタリング処理の収束時間が長くなってしまう。したがって、初期クラスタ数の許容範囲（ｎ１〜ｎ２）は、上記K-meansによる本クラスタリング処理の目標クラスタ数（ｎ０〜ｎ１）の範囲（例えば５〜２０クラスタ）に応じて、当該本クラスタリング処理の負担とならず、かつ、分類の効果を得られるような範囲（例えば２０〜５０クラスタ）に設定され、それに応じて閾値の範囲（Ｔｈ１〜Ｔｈ２）も設定される。 FIG. 14 is a graph showing the relationship between the number of clusters generated by the initial clustering process and the threshold value.
As shown in the figure, the initial cluster number and the threshold value are in an inversely proportional relationship. If the threshold is too large, the classification accuracy of the pixel data will be low, and if the threshold is too small, the convergence time of the subsequent clustering process will be long. Therefore, the permissible range (n1 to n2) of the initial number of clusters depends on the range of the target cluster number (n0 to n1) of the main clustering process by the K-means (for example, 5 to 20 clusters). A range (for example, 20 to 50 clusters) that does not cause a burden and can obtain the effect of classification is set, and a threshold range (Th1 to Th2) is also set accordingly.

上記閾値設定部３３は、予め閾値を固定せずに、初期クラスタリング処理の結果に応じて、初期クラスタ数が上記許容範囲（ｎ１〜ｎ２）に収まるまで閾値を可変して初期クラスタリング処理を繰り返すことで、閾値の設定許容範囲を学習するようにしても構わない。 The threshold value setting unit 33 repeats the initial clustering process by changing the threshold value until the number of initial clusters falls within the allowable range (n1 to n2) according to the result of the initial clustering process without fixing the threshold value in advance. Thus, the threshold setting allowable range may be learned.
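One way such a threshold search could work is sketched below. This is an assumed procedure, not the one used by the threshold setting unit 33: the names `count_clusters` and `tune_threshold`, the fixed step size, the iteration cap, and the use of scalar values are all illustrative choices.

```python
def count_clusters(values, dth):
    """Count initial clusters using one representative per cluster."""
    reps = []
    for v in values:
        if not any(abs(v - r) <= dth for r in reps):
            reps.append(v)
    return len(reps)


def tune_threshold(values, n_min, n_max, dth=1, step=1, max_iter=100):
    """Adjust dth until the cluster count falls inside [n_min, n_max].

    Too many clusters: widen the threshold. Too few: narrow it, as long as
    it stays positive. Returns (dth, cluster_count); the count may still be
    out of range if max_iter is reached or dth cannot be narrowed further.
    """
    n = count_clusters(values, dth)
    for _ in range(max_iter):
        if n > n_max:
            dth += step
        elif n < n_min and dth > step:
            dth -= step
        else:
            break
        n = count_clusters(values, dth)
    return dth, n


print(tune_threshold(list(range(100)), n_min=5, n_max=10))
```

A larger threshold yields fewer initial clusters, which is the inverse relationship shown in FIG. 14; the loop simply walks along that curve until the count is acceptable.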

以上のように、初期クラスタリング処理（ＩＤ割り当て処理）により第１の数のクラスタに分類された（クラスタ数を削減された）画素データは、上記ＩＤ置換処理を経て、上記K-means処理部３５に供給され、本クラスタリング処理により、第１の数よりも少ない第２の数（目標クラスタ数）のクラスタに分類される。 As described above, the pixel data classified into the first number of clusters by the initial clustering process (ID assignment process), and thus reduced in cluster count, passes through the ID replacement process and is supplied to the K-means processing unit 35, where the main clustering process classifies it into a second number of clusters (the target number of clusters) that is smaller than the first number.

そして、セグメンテーション処理部２０は、当該本クラスタリング処理による分類されたクラスタ毎に、画像データから上記第２の数の領域を抽出する。すなわち、セグメンテーション処理部２０は、画像データを、任意形状の第２の数の領域に分割する。セグメンテーション処理部２０は、このセグメンテーション処理を、各映像コンテンツを構成する全ての画像データについて実行し、その結果を上記ＨＤＤ１０やフラッシュメモリ等に記憶する。 Then, the segmentation processing unit 20 extracts the second number of areas from the image data for each cluster classified by the main clustering process. That is, the segmentation processing unit 20 divides the image data into a second number of regions having an arbitrary shape. The segmentation processing unit 20 executes this segmentation processing for all the image data constituting each video content, and stores the result in the HDD 10 or the flash memory.

FIG. 15 is a conceptual diagram comparing the two-stage clustering process of the present embodiment described above with a conventional clustering process. FIGS. 15(A) to 15(C) conceptually show the distribution of feature vector data in a multi-dimensional (six-dimensional) feature vector space, with the pixel data (feature data) treated as feature vector data.
In the conventional clustering process, a technique such as K-means is applied directly to the I pieces (for example, 307,200) of initial pixel data in FIG. 15(A), reducing them to K2 (for example, 5 to 20) clusters as shown in FIG. 15(C). However, when the initial data is processed directly by a technique such as K-means in this way, it takes a long time for the number of clusters to converge to K2.

Therefore, in the present embodiment, as shown in FIG. 15(B), the pixel data is first classified into K1 (for example, 20 to 50) clusters by the initial clustering process based on ID assignment, and is then classified into the K2 clusters shown in FIG. 15(C) by the main clustering process using K-means.
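The two-stage flow from I samples to K1 initial clusters to K2 final clusters can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes Euclidean distance, compares each sample with the first member of each existing cluster, runs an unweighted K-means over the K1 cluster means, and seeds the centers naively. The function names are hypothetical.

```python
import numpy as np


def initial_clustering(features, threshold):
    """Sequential ID assignment: reuse an ID when within threshold of its first member."""
    reps, ids = [], np.empty(len(features), dtype=int)
    for i, f in enumerate(features):
        for cid, r in enumerate(reps):
            if np.linalg.norm(f - r) <= threshold:
                ids[i] = cid          # same ID as an already input sample
                break
        else:
            ids[i] = len(reps)        # new ID for a sample outside every range
            reps.append(f)
    return ids, len(reps)


def two_stage_clustering(features, threshold, k2, iters=20):
    """Reduce I samples to K1 initial clusters, then K-means the K1 means into K2."""
    ids, k1 = initial_clustering(features, threshold)
    if k1 <= k2:                      # target count already reached: skip K-means
        return ids
    means = np.array([features[ids == c].mean(axis=0) for c in range(k1)])
    centers = means[:k2].copy()       # naive seeding (assumption)
    for _ in range(iters):
        dist = np.linalg.norm(means[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)  # nearest center for each initial cluster
        for c in range(k2):
            if np.any(assign == c):
                centers[c] = means[assign == c].mean(axis=0)
    return assign[ids]                # final cluster label of every sample
```

Because K-means iterates over K1 items instead of I items, each iteration touches tens of points rather than hundreds of thousands, which is the source of the speed-up the text describes.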

以上説明したように、初期クラスタリング処理によりクラスタ数を削減することで、目標クラスタ数に画素データを分類するまでの時間を大幅に短縮し、記録再生装置１００への負荷も軽減することができる。 As described above, by reducing the number of clusters by the initial clustering process, the time until the pixel data is classified into the target number of clusters can be greatly shortened, and the load on the recording / reproducing apparatus 100 can be reduced.

次に、以上のようにセグメンテーション処理された画像データを用いた、上記図５のステップ５５におけるカメラ特徴の検出処理について説明する。 Next, the camera feature detection process in step 55 of FIG. 5 using the image data segmented as described above will be described.

FIG. 16 is a diagram schematically showing the relationship between camera operation between image data and motion vectors.
As shown in FIGS. 16(A) and 16(B), assume that the image segmentation process extracts three regions from an image F1 at time t and an image F2 at time t2 in one piece of video data: region A, a house that is a stationary object; region B, a car that is a moving object; and region C, the background.

For example, when a pan operation to the right is performed from image F1 to image F2, as shown in FIG. 16(C), the pan operation is indicated not by the motion vector of region B, the moving object, but by the motion vectors of regions A and C, which are stationary. In addition, the background region generally has a larger number of pixels (area) than the region of a moving object.

Therefore, the video feature detection unit 4 detects the motion vector of the extraction region having the largest number of pixels among the extraction regions as the representative motion vector between the image data, and determines the camera feature based on that motion vector.

図１７は、映像特徴検出部４によるカメラ特徴検出処理の流れを示したフローチャートである。
同図に示すように、まず、映像特徴検出部４は、上記セグメンテーション処理により抽出された抽出領域毎に、１つの映像コンテンツ中の複数の画像データを入力する（ステップ１７１）。続いて、映像特徴検出部４は、各画像データ毎に、複数の抽出領域の画素数を算出し、最大画素数を有する抽出領域を選択する（ステップ１７２）。上記図１６においては、領域Ａ〜Ｃの各画素数は、Ｃ＞Ａ≒Ｂであるため、領域Ｃが選択される。 FIG. 17 is a flowchart showing the flow of camera feature detection processing by the video feature detection unit 4.
As shown in the figure, first, the video feature detection unit 4 inputs a plurality of image data in one video content for each extraction region extracted by the segmentation process (step 171). Subsequently, the video feature detection unit 4 calculates the number of pixels of the plurality of extraction regions for each image data, and selects the extraction region having the maximum number of pixels (step 172). In FIG. 16, since the number of pixels in the areas A to C is C> A≈B, the area C is selected.
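Selecting the extraction region with the most pixels (step 172) amounts to counting labels in the segmentation result. The sketch below assumes the segmentation output is an integer label map with one cluster ID per pixel; that representation and the name `largest_region` are illustrative assumptions.

```python
import numpy as np


def largest_region(label_map):
    """Return (label, pixel_count) of the region covering the most pixels."""
    labels, counts = np.unique(label_map, return_counts=True)
    best = counts.argmax()
    return int(labels[best]), int(counts[best])


# Labels 0, 1, 2 stand in for regions A, B, C of FIG. 16 (C > A ≈ B).
frame = np.array([[2, 2, 2, 2],
                  [0, 1, 2, 2],
                  [0, 1, 2, 2]])
print(largest_region(frame))
```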

Subsequently, the video feature detection unit 4 performs block matching between the selected extraction regions of the plurality of image data and detects a motion vector (step 173). In the case of FIG. 16, this process detects only the motion vector V3.
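Block matching itself is not detailed in the text. A common formulation, assumed here, is an exhaustive search that minimizes the sum of absolute differences (SAD) within a search radius. The function name and parameters are hypothetical, and restricting the search to the selected region is left out for brevity.

```python
import numpy as np


def block_match(ref, search, top, left, size, radius):
    """Return the (dy, dx) displacement minimising SAD for one block of ref."""
    block = ref[top:top + size, left:left + size].astype(np.int64)
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the search frame.
            if y < 0 or x < 0 or y + size > search.shape[0] or x + size > search.shape[1]:
                continue
            cand = search[y:y + size, x:x + size].astype(np.int64)
            cost = np.abs(block - cand).sum()
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```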

Subsequently, the video feature detection unit 4 performs multiple regression analysis on the detected motion vector data (step 174) and calculates the camera feature coefficients (affine coefficients) for pan, tilt, and zoom (step 175).

そして、映像特徴検出部４は、選択された抽出領域のカメラ特徴係数を基に、画像データ間のカメラ特徴を判定し、その結果を出力する（ステップ１７６）。上記図１６の場合、算出されたパン係数を基に、パンが行われたことが判定される。 Then, the video feature detection unit 4 determines the camera feature between the image data based on the camera feature coefficient of the selected extraction region, and outputs the result (step 176). In the case of FIG. 16, it is determined that panning has been performed based on the calculated pan coefficient.

FIG. 18 is a flowchart showing details of the motion vector detection process in step 173.
As shown in the figure, the video feature detection unit 4 executes block matching between the extraction region selected in one piece of image data (hereinafter referred to as the reference frame) and the extraction regions selected in the frames located 1, 10, 20, and 30 frames away from the reference frame, and detects motion vector data for each interval (steps 181 to 184). The frames at each interval are held, for example, in a frame memory (not shown) provided for each frame interval, and are read from that frame memory each time block matching with the reference frame is performed.

続いて、映像特徴検出部４は、各フレーム間隔について検出した動きベクトルデータを基に、所定フレーム間隔（例えば４０フレーム間隔）置いたフレーム（以下、探索フレームと称する）における動きベクトルデータを推定し（ステップ１８５）、この推定された動きベクトルデータを最終的な動きベクトルデータとして出力する（ステップ１８６）。この推定処理は、例えば、各フレーム間隔における動きベクトルデータの勾配を算出し、当該各勾配の平均値に、推定すべきフレーム間隔（例えば４０）を乗ずることで実現できる。 Subsequently, the video feature detection unit 4 estimates motion vector data in a frame (hereinafter referred to as a search frame) at a predetermined frame interval (for example, 40 frame interval) based on the motion vector data detected for each frame interval. (Step 185), the estimated motion vector data is output as final motion vector data (step 186). This estimation process can be realized, for example, by calculating the gradient of motion vector data at each frame interval and multiplying the average value of each gradient by the frame interval to be estimated (for example, 40).
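The estimation in step 185 can be illustrated numerically. The sketch assumes that the "gradient" at each interval is the detected displacement divided by the frame interval, which is one plausible reading of the text; the function name and the dictionary layout are hypothetical.

```python
def extrapolate_motion(vectors_by_interval, target_interval=40):
    """Average the per-frame gradients and scale them to the target interval."""
    grads = [(dx / n, dy / n) for n, (dx, dy) in vectors_by_interval.items()]
    gx = sum(g[0] for g in grads) / len(grads)
    gy = sum(g[1] for g in grads) / len(grads)
    return gx * target_interval, gy * target_interval


# Motion vectors detected at 1, 10, 20, and 30 frame intervals (illustrative values).
detected = {1: (0.5, 0.0), 10: (5.0, 0.0), 20: (10.0, 0.0), 30: (15.0, 0.0)}
estimate = extrapolate_motion(detected)
```

With a steady pan of 0.5 pixels per frame, every gradient is 0.5 and the estimate at 40 frames is a horizontal displacement of 20 pixels.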

そして、映像特徴検出部４は、１つの映像コンテンツを構成する全てのフレームの各選択された抽出領域について動きベクトルデータを出力したか否かを判断し、動きベクトルを検出すべきフレームがなくなるまで上記各ステップの処理を繰り返す（ステップ１８７）。 Then, the video feature detection unit 4 determines whether or not motion vector data has been output for each selected extraction region of all the frames constituting one video content, and until there is no frame for which a motion vector should be detected. The above steps are repeated (step 187).

次に、上記図１７のステップ１７４における重回帰分析処理によりアフィン係数を算出するためのアフィン変換モデルについて説明する。 Next, an affine transformation model for calculating affine coefficients by the multiple regression analysis process in step 174 of FIG. 17 will be described.

図１９は、アフィン変換モデルを示した図である。アフィン変換モデルは、３次元オブジェクトの平行移動、拡大／縮小、回転を、行列を用いた座標変換処理として記述するためのモデルである。上記パン、チルト、ズームといったカメラ特徴は、上記基準フレーム内の物体の平行移動、拡大／縮小であると考えられるため、アフィン変換モデルを用いることで、カメラ特徴を記述することが可能となる。 FIG. 19 is a diagram showing an affine transformation model. The affine transformation model is a model for describing translation, enlargement / reduction, and rotation of a three-dimensional object as coordinate transformation processing using a matrix. Since the camera features such as pan, tilt, and zoom are considered to be parallel movement and enlargement / reduction of an object in the reference frame, it is possible to describe the camera features by using an affine transformation model.

Here, when the frame interval in the video content is not large, the rotation angle θ can be assumed to be small, and the following approximations can be applied to the rotation feature.
sin θ ≈ θ
cos θ ≈ 1

Accordingly, the affine transformation model can be rewritten as shown in FIG. 19. The camera features can then be detected by obtaining each coefficient from the detected motion vectors using this affine transformation model. That is, predetermined thresholds Pth, Tth, and Zth are set for the pan, tilt, and zoom camera features, and each camera feature is detected by comparing these thresholds with the affine coefficients computed from the detected motion vectors.
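For reference, the small-angle simplification can be written out for a generic two-dimensional affine model. FIG. 19 is not reproduced in this text, so the form below, with scale $s$, rotation angle $\theta$, and translation $(t_x, t_y)$, is an assumed textbook parameterization rather than the model of the figure; these symbols do not appear in the original.

```latex
\begin{aligned}
x' &= s\,(x\cos\theta - y\sin\theta) + t_x \;\approx\; s\,x - s\theta\,y + t_x,\\
y' &= s\,(x\sin\theta + y\cos\theta) + t_y \;\approx\; s\theta\,x + s\,y + t_y.
\end{aligned}
```

Under this reading, $t_x$ and $t_y$ would correspond to pan and tilt and $s$ to zoom, and the approximated model is linear in its coefficients, which is what makes a regression fit possible.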

FIG. 20 is a diagram showing the process of obtaining the affine coefficients by multiple regression analysis. As shown in the figure, the video feature detection unit 4 performs multiple regression analysis with the x and y coordinates (xn, yn) of the detection target point (Pn) in the extraction region of the reference frame as the explanatory variables, and the x and y coordinates (xm, ym) of the motion vector detection position (Pm) in the extraction region of the search frame as the dependent (objective) variables, and obtains the pan, tilt, and zoom coefficients Px, Py, and Zx.

The video feature detection unit 4 then compares the coefficients Px, Py, and Zx with the thresholds Pth, Tth, and Zth, and when a coefficient exceeds its threshold, regards the corresponding camera feature as detected and outputs the detection result.
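The regression and threshold comparison can be sketched with an ordinary least-squares fit. The exact model of FIG. 20 is not given in the text, so the sketch makes several assumptions: a full linear model from (xn, yn, 1) to (xm, ym), pan and tilt read from the translation terms, zoom read as the deviation of the x-scale from 1, and magnitudes compared against the thresholds. All function and variable names are hypothetical.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares fit of dst ≈ [x, y, 1] @ coef; coef has shape (3, 2)."""
    design = np.hstack([src, np.ones((len(src), 1))])
    coef, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coef


def camera_features(coef, p_th, t_th, z_th):
    """Compare assumed pan, tilt, and zoom coefficients with their thresholds."""
    px, py = coef[2, 0], coef[2, 1]   # translation terms (assumed pan and tilt)
    zx = coef[0, 0] - 1.0             # deviation of x-scale from identity (assumed zoom)
    return {"pan": abs(px) > p_th, "tilt": abs(py) > t_th, "zoom": abs(zx) > z_th}


# Points in the reference frame and their positions after a 5-pixel shift in x.
pn = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 3.]])
pm = pn + np.array([5., 0.])
result = camera_features(fit_affine(pn, pm), p_th=1.0, t_th=1.0, z_th=0.1)
```

The sign of each fitted coefficient is preserved in `coef`, so a direction (for example left versus right pan) could be read from it in the same way.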

Note that the video feature detection unit 4 may detect the pan, tilt, and zoom camera features while distinguishing left pan from right pan, left tilt from right tilt, and zoom-in from zoom-out. This distinction can be made easily by referring to the sign of the corresponding affine coefficient.

As described above, by detecting the motion vector between image data based on the extraction region having the largest number of pixels (area) among the regions extracted by image segmentation, the motion vector can be detected with high accuracy without being affected by the motion of moving objects, and camera features can be detected more accurately.

The recording/reproducing apparatus 100 performs processing such as digest playback (highlight scene playback) based on the detected camera features. That is, the recording/reproducing apparatus 100 regards frames of the video content in which a camera feature is detected as scenes the photographer was paying attention to, extracts such frames from the video content, and can provide the user with a digest video.

[Second Embodiment]
Next, a second embodiment of the present invention will be described.

図２１は、本実施形態に係る記録再生装置２００の構成を示した図である。本実施形態において、記録再生装置２００は、上記画像セグメンテーション処理によって抽出された領域毎に、複数の画像データ間で類似性を判断することで、画像検索処理を実行することができる。 FIG. 21 is a diagram showing a configuration of the recording / reproducing apparatus 200 according to the present embodiment. In the present embodiment, the recording / reproducing apparatus 200 can execute the image search process by determining the similarity between a plurality of image data for each region extracted by the image segmentation process.

同図に示すように、記録再生装置２００は、上記第１の実施形態の図１で示した記録再生装置１００の映像特徴検出部４に代えて、画像検索部３０を有する。その他の各部については、上記第１実施形態における記録再生装置１００と同様であるため、説明を省略する。 As shown in the figure, the recording / reproducing apparatus 200 includes an image search unit 30 instead of the video feature detection unit 4 of the recording / reproducing apparatus 100 shown in FIG. 1 of the first embodiment. Since other parts are the same as those of the recording / reproducing apparatus 100 in the first embodiment, the description thereof is omitted.

FIG. 22 is a flowchart showing the flow of operation of the recording/reproducing apparatus 200 according to the present embodiment.
As shown in the figure, the recording/reproducing apparatus 200 uses the segmentation processing unit 20 to execute the initial clustering process, the main clustering process, and the region extraction process on each piece of pixel data of the plurality of input image data (steps 221 to 224). These processes are the same as in the first embodiment, so their description is omitted. Subsequently, the recording/reproducing apparatus 200 executes the image search process using the image search unit 30 (step 225).

図２３は、上記画像検索処理の詳細な流れを示したフローチャートである。また、図２４は、当該画像検索処理を概念的に示した図である。
両図に示すように、まず、画像検索部３０は、上記セグメンテーション処理により抽出された抽出領域の画素数を算出し、最大画素数を有する抽出領域を選択する（ステップ２３１）。 FIG. 23 is a flowchart showing a detailed flow of the image search process. FIG. 24 is a diagram conceptually showing the image search process.
As shown in both figures, first, the image search unit 30 calculates the number of pixels in the extraction region extracted by the segmentation process, and selects the extraction region having the maximum number of pixels (step 231).

Subsequently, the image search unit 30 generates a feature vector for each selected extraction region (step 232). That is, as shown in FIG. 24, the image search unit 30 creates six-dimensional feature vector data Vn = (dnr, dng, dnb, dnll, dnlh, dnhl) from the three-dimensional color feature data cn = (dnr, dng, dnb) and the three-dimensional texture feature data tn = (dnll, dnlh, dnhl) of the pixel data constituting each image data.

続いて、画像検索部３０は、上記生成された特徴ベクトルデータのうち、１つの画像データ（基準画像）の特徴ベクトルデータを検索キーとして、基準画像の特徴ベクトルデータと、他の画像データ（探索画像）の特徴ベクトルデータとの間でベクトル間距離演算を実行することで、基準画像と探索画像との類似性を判断する（ステップ２３３）。 Subsequently, the image search unit 30 uses the feature vector data of one image data (reference image) of the generated feature vector data as a search key, and the feature vector data of the reference image and other image data (search The similarity between the reference image and the search image is determined by executing the inter-vector distance calculation with the feature vector data of the image) (step 233).

そして、画像検索部３０は、上記判断結果を、画像検索結果として出力する（ステップ２３４）。例えば、画像検索部３０は、上記ベクトル間距離に所定の閾値を設けておき、基準画像の特徴ベクトルと探索画像の特徴ベクトルとのベクトル間距離が当該閾値以内であれば、当該探索画像を類似画像として出力する。 Then, the image search unit 30 outputs the determination result as an image search result (step 234). For example, the image search unit 30 sets a predetermined threshold for the above-described vector distance, and if the inter-vector distance between the feature vector of the reference image and the feature vector of the search image is within the threshold, the search image is similar Output as an image.
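The similarity decision of steps 233 and 234 reduces to a distance test between feature vectors. The sketch below assumes Euclidean distance, since the text only says "inter-vector distance"; the function name, the dictionary of candidates, and the sample values are illustrative.

```python
import numpy as np


def find_similar(key_vector, candidates, threshold):
    """Return names of search images whose feature vector is within threshold of the key."""
    key = np.asarray(key_vector, dtype=float)
    hits = []
    for name, vec in candidates.items():
        distance = np.linalg.norm(key - np.asarray(vec, dtype=float))
        if distance <= threshold:     # within the threshold: treat as a similar image
            hits.append(name)
    return hits


# Six-dimensional vectors (3 color + 3 texture components), illustrative values only.
reference = [0.0] * 6
search_images = {"img_a": [0.1] * 6, "img_b": [5.0] * 6}
similar = find_similar(reference, search_images, threshold=1.0)
```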

以上説明したように、本実施形態においては、上記高速化された画像セグメンテーション処理により抽出された領域の特徴ベクトルデータを用いて画像検索処理を実行することで、特徴ベクトル生成処理の高速化を図ることができ、その結果、画像検索処理の高速化を図ることができる。 As described above, in the present embodiment, the speed of the feature vector generation process is increased by executing the image search process using the feature vector data of the region extracted by the accelerated image segmentation process. As a result, the image search process can be speeded up.

[Third Embodiment]
Next, a third embodiment of the present invention will be described.

本実施形態において、記録再生装置は、上記画像セグメンテーション処理による領域抽出結果を、映像データのエンコード処理におけるオブジェクト符号化処理に応用している。 In the present embodiment, the recording / reproducing apparatus applies the region extraction result by the image segmentation process to the object encoding process in the video data encoding process.

FIG. 25 is a diagram showing the configuration of the recording/reproducing apparatus 300 according to the present embodiment.
As shown in the figure, the recording/reproducing apparatus 300 omits the video feature detection unit 4 and the image search unit 30 found in the recording/reproducing apparatuses 100 and 200 of the first and second embodiments. In addition, in place of the AV decoder 16 of the recording/reproducing apparatuses 100 and 200, an AV codec 251 capable of encoding and decoding video data in the MPEG format is provided. The AV codec 251 is responsible for the object encoding process. The other units are the same as in the recording/reproducing apparatus 100 of the first embodiment, so their description is omitted.

FIG. 26 is a flowchart showing the flow of the object encoding process in the present embodiment.
As shown in the figure, the recording/reproducing apparatus 300 uses the segmentation processing unit 20 to execute the initial clustering process, the main clustering process, and the region extraction process on each piece of pixel data of the plurality of input image data (steps 261 to 264). These processes are the same as in the first and second embodiments, so their description is omitted. Subsequently, the recording/reproducing apparatus 300 executes the encoding process of the video data using the AV codec 251 (step 265).

図２７は、上記ステップ２６５におけるエンコード処理の流れを示したフローチャートである。
同図に示すように、ＡＶコーデック２５１は、上記画像セグメンテーション処理により抽出された領域毎に映像データを入力する（ステップ２７１）。 FIG. 27 is a flowchart showing the flow of the encoding process in step 265.
As shown in the figure, the AV codec 251 inputs video data for each area extracted by the image segmentation process (step 271).

続いて、ＡＶコーデック２５１は、上記抽出領域毎に動き予測を行う（ステップ２７２）。具体的には、ＡＶコーデック２５１は、入力された映像データの各画像データ（フレーム）と、フレームメモリに格納されている予測用の参照画像データとから、上記抽出領域毎に動きベクトルを検出する。次に、ＡＶコーデック２５１は、当該動きベクトルにより当該予測用参照画像データを動き補償して、予測画像を生成する。さらに、ＡＶコーデック２５１は、入力画像と予測画像との差分を、上記抽出領域毎に算出する。 Subsequently, the AV codec 251 performs motion prediction for each extraction region (step 272). Specifically, the AV codec 251 detects a motion vector for each extraction region from each image data (frame) of the input video data and prediction reference image data stored in the frame memory. . Next, the AV codec 251 performs motion compensation on the prediction reference image data using the motion vector, and generates a predicted image. Further, the AV codec 251 calculates the difference between the input image and the predicted image for each extraction region.

続いて、ＡＶコーデック２５１は、上記抽出領域毎の差分データに、ＤＣＴ変換（離散コサイン変換）処理を施し（ステップ２７３）、さらに当該ＤＣＴ変換後の差分データを量子化する（ステップ２７４）。 Subsequently, the AV codec 251 performs DCT transformation (discrete cosine transformation) processing on the difference data for each extraction region (step 273), and further quantizes the difference data after the DCT transformation (step 274).
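Steps 273 and 274 can be illustrated on a single square block of difference data. The sketch uses an orthonormal DCT-II built from its definition and a single uniform quantization step; real MPEG encoders use per-coefficient quantization matrices, so the uniform step, the block size, and the function names are simplifying assumptions.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]         # frequency index (rows)
    i = np.arange(n)[None, :]         # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different normalisation
    return m


def dct_quantize(block, step=16):
    """Apply a 2-D DCT to a square block, then quantize with a uniform step."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T
    return np.round(coeffs / step).astype(int)


# A flat block has energy only in the DC coefficient.
flat = np.full((8, 8), 16.0)
quantized = dct_quantize(flat)
```

For the flat block, only the top-left (DC) entry of `quantized` is nonzero, which is why smooth difference data compresses well after this step.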

Subsequently, the AV codec 251 assigns two-dimensional Huffman codes by variable length coding (VLC) to the quantized difference data of each extraction region, and likewise assigns two-dimensional Huffman codes to the motion vector data detected for each extraction region (step 275).

そして、ＡＶコーデック２５１は、ハフマン符号を割り当てられた差分データ及び動きベクトルデータを多重化して、符号化データとして出力する（ステップ２７６）。 Then, the AV codec 251 multiplexes the difference data and motion vector data to which the Huffman code is assigned, and outputs it as encoded data (step 276).

以上説明したように、本実施形態においては、上記高速化された画像セグメンテーション処理により抽出された領域を、オブジェクト符号化処理に利用することで、映像エンコード処理の高速化を図ることができる。 As described above, in the present embodiment, the speed of the video encoding process can be increased by using the region extracted by the accelerated image segmentation process for the object encoding process.

本発明は上述の実施形態にのみ限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々変更を加え得ることは勿論である。 The present invention is not limited to the above-described embodiment, and various modifications can be made without departing from the scope of the present invention.

上記第１の実施形態において、映像特徴検出部４は、最大画素数（面積）を有する抽出領域のみの動きベクトルを検出していた。しかし、映像特徴検出部４は、複数の抽出領域の動きベクトルをそれぞれ検出し、各動きベクトルのうち、最も多く検出された動きベクトルを画像データ全体間における動きベクトルとして検出してもよい。 In the first embodiment, the video feature detection unit 4 detects only the motion vector of the extraction region having the maximum number of pixels (area). However, the video feature detection unit 4 may detect the motion vectors of the plurality of extraction regions, and may detect the motion vector detected most frequently as a motion vector between the entire image data.
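Picking the most frequently detected motion vector is a mode computation. The sketch assumes the per-region vectors have been quantized to integer (dx, dy) tuples so that equal motions compare equal; the function name is hypothetical.

```python
from collections import Counter


def dominant_motion(vectors):
    """Return the motion vector that occurs most often among the regions."""
    return Counter(vectors).most_common(1)[0][0]


# Two stationary regions share the pan-induced vector; the moving object differs.
region_vectors = [(3, 0), (3, 0), (-1, 2)]
print(dominant_motion(region_vectors))
```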

また、映像特徴検出部４は、複数の抽出領域の動きベクトルをそれぞれ検出し、各動きベクトルに抽出領域の画素数（面積）に応じた重みを付与して、画像データ全体としての動きベクトルを検出してもよい。すなわち、映像特徴検出部４は、抽出領域の画素数が大きい程高く重み付けが可能な評価関数を用いて、各抽出領域の動きベクトルデータを重み付けすることで、画像データ全体としての動きベクトルの評価値を算出しても構わない。これにより、全ての抽出領域の画素数を反映したより正確な動きベクトル検出が可能となる。 In addition, the video feature detection unit 4 detects motion vectors of a plurality of extraction regions, assigns a weight according to the number of pixels (area) of the extraction region to each motion vector, and determines a motion vector as the entire image data. It may be detected. In other words, the video feature detection unit 4 weights the motion vector data of each extraction region using an evaluation function that can be weighted higher as the number of pixels in the extraction region is larger, thereby evaluating the motion vector of the entire image data. The value may be calculated. As a result, more accurate motion vector detection reflecting the number of pixels in all extraction regions is possible.
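One simple evaluation function that weights larger regions more heavily is an area-weighted mean. The text does not specify the function, so the linear weighting and the name `weighted_motion` are assumptions.

```python
def weighted_motion(regions):
    """Area-weighted mean motion vector; regions is a list of (pixel_count, (dx, dy))."""
    total = sum(area for area, _ in regions)
    vx = sum(area * v[0] for area, v in regions) / total
    vy = sum(area * v[1] for area, v in regions) / total
    return vx, vy


# A 300-pixel background moving by (4, 0) and a 100-pixel object that is not moving.
overall = weighted_motion([(300, (4.0, 0.0)), (100, (0.0, 0.0))])
```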

さらに、映像特徴検出部４は、複数の抽出領域のうち、画素数（面積）が最小となる抽出領域間の動きベクトルを画像データ全体としての動きベクトルとしても構わない。これにより、動物体に着目した動きベクトル検出が可能となる。 Further, the video feature detection unit 4 may use a motion vector between extraction regions having the smallest number of pixels (area) among a plurality of extraction regions as a motion vector of the entire image data. As a result, motion vector detection focusing on the moving object can be performed.

上記第１の実施形態において、映像特徴検出部４は、上記画像セグメンテーション処理及び動きベクトル検出処理により、カメラ特徴を検出していた。しかし、映像特徴検出部４は、この画像セグメンテーション処理及び動きベクトル検出処理を、例えば動物体を検出する処理にも応用することができる。 In the first embodiment, the video feature detection unit 4 detects camera features by the image segmentation process and the motion vector detection process. However, the video feature detection unit 4 can also apply this image segmentation process and motion vector detection process to, for example, a process of detecting a moving object.

すなわち、映像特徴検出部４は、上記画像セグメンテーション処理及び抽出領域毎の動きベクトル検出処理により、画像データ内の物体が動いているのか、静止しているのかを判定することで、動物体が含まれる画像データ（フレーム）を抽出することができる。 That is, the video feature detection unit 4 includes the moving object by determining whether the object in the image data is moving or stationary by the image segmentation process and the motion vector detection process for each extraction region. Image data (frame) can be extracted.

例えば、映像特徴検出部４は、領域の占める割合が大きく、かつ、動いている物体が含まれるフレームを抽出して、そのフレーム区間をダイジェスト再生させることができる。大きい領域は、撮影者がズーム等により注目している領域、動いている領域は、アクティブな領域としてそれぞれ捉えることができるからである。 For example, the video feature detection unit 4 can extract a frame in which a region occupies a large proportion and includes a moving object, and can perform digest playback of the frame section. This is because a large area can be regarded as an area that the photographer is paying attention to by zooming or the like, and a moving area can be regarded as an active area.

上記第２の実施形態においては、画像検索部３０は、最大画素数（面積）を有する抽出領域のみの画素データ（特徴データ）から特徴ベクトルデータを生成していた。しかし、画像検索部３０は、複数の抽出領域の画素データから、それぞれ特徴ベクトルデータを生成し、各特徴ベクトルのベクトル間距離演算結果に、抽出領域の画素数（面積）に応じた重みを付与して、両画像の類似性を判断しても構わない。すなわち、画像検索部３０は、基準画像と探索画像との間での抽出領域毎のベクトル間距離演算を行い、その演算結果を、抽出領域の画素数が大きい程高く重み付けが可能な評価関数を用いて重み付けして類似性の評価値を算出しても構わない。 In the second embodiment, the image search unit 30 generates feature vector data from pixel data (feature data) of only an extraction region having the maximum number of pixels (area). However, the image search unit 30 generates feature vector data from the pixel data of a plurality of extraction regions, and assigns a weight according to the number of pixels (area) of the extraction region to the inter-vector distance calculation result of each feature vector. Then, the similarity between both images may be determined. That is, the image search unit 30 performs an inter-vector distance calculation for each extraction region between the reference image and the search image, and calculates an evaluation function that can be weighted higher as the number of pixels in the extraction region is larger. The evaluation value of similarity may be calculated by weighting using them.
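The weighted similarity evaluation can be sketched in the same spirit. It assumes that regions of the reference image and the search image have already been paired, that the weight is the pixel count of the region, and that the distance is Euclidean; none of these details are fixed by the text, and the function name is hypothetical.

```python
import math


def weighted_distance(pairs):
    """Pixel-count-weighted mean of per-region feature distances.

    pairs: list of (pixel_count, reference_vector, search_vector) per matched region.
    """
    total = sum(n for n, _, _ in pairs)
    return sum(n * math.dist(a, b) for n, a, b in pairs) / total


# A large region that differs by distance 5 and a small region that matches exactly.
score = weighted_distance([(300, (0, 0), (3, 4)), (100, (0, 0), (0, 0))])
```

A smaller score means the two images are more alike, so the result can be compared with a threshold in the same way as the single-region distance.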

上記第１乃至第３の実施形態において、セグメンテーション処理部２０は、ＡＶデコーダ１６（またはＡＶコーデック２５１）によるデコード後のベースバンド信号から色特徴及びテクスチャ特徴を抽出していた。しかし、セグメンテーション処理部２０は、デコード前の画像データから色特徴及びテクスチャ特徴を抽出しても構わない。この場合、セグメンテーション処理部２０は、ＤＣＴ係数のＤＣ係数を基に各特徴を抽出可能である。 In the first to third embodiments, the segmentation processing unit 20 extracts color features and texture features from the baseband signal decoded by the AV decoder 16 (or AV codec 251). However, the segmentation processing unit 20 may extract color features and texture features from the image data before decoding. In this case, the segmentation processing unit 20 can extract each feature based on the DC coefficient of the DCT coefficient.

In the first to third embodiments, the segmentation processing unit 20 executes a two-stage clustering process consisting of the initial clustering process by ID assignment and the main clustering process by K-means. However, the segmentation processing unit 20 may omit the K-means processing unit 35 and complete the image segmentation process with the initial clustering process alone, without performing the main clustering process. Further, even when the segmentation processing unit 20 has the K-means processing unit 35, if the initial clustering process has already classified the pixel data into the target number of clusters, the image segmentation process may be ended at that point without performing the main clustering process.

上記第１乃至第２の実施形態において、記録再生装置は、本クラスタリング処理のクラスタリング手法として、K-meansを用いていた。しかし、クラスタリング手法としては、ファジィc-means、エントロピー法、ウォード法、ＳＯＭ等、他の手法が用いられても構わない。 In the first and second embodiments, the recording / reproducing apparatus uses K-means as the clustering method of the clustering process. However, as the clustering method, other methods such as fuzzy c-means, entropy method, Ward method, SOM, etc. may be used.

上記第１乃至第３の実施形態においては、本発明を記録再生装置に適用した例を示した。しかし、本発明は、ＰＣ、サーバ装置、テレビジョン装置、ゲーム機器、デジタルカメラ、デジタルビデオカメラ、携帯電話機等のその他の各種電子機器にも同様に適用することができる。 In the first to third embodiments, an example in which the present invention is applied to a recording / reproducing apparatus has been described. However, the present invention can be similarly applied to other various electronic devices such as a PC, a server device, a television device, a game device, a digital camera, a digital video camera, and a mobile phone.

FIG. 1 is a diagram showing the configuration of the recording/reproducing apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an outline of the segmentation process.
FIG. 3 is a diagram showing the configuration of the segmentation processing unit in the first embodiment of the present invention.
FIG. 4 is a diagram showing the configuration of the ID processing unit in the first embodiment of the present invention.
FIG. 5 is a flowchart showing the schematic flow of the operation of the recording/reproducing apparatus in the first embodiment of the present invention.
FIG. 6 is a flowchart showing the flow of the initial clustering process in the first embodiment of the present invention.
FIG. 7 is a flowchart showing the detailed flow of the comparison determination process between pixel data and the ID assignment process within the initial clustering process in the first embodiment of the present invention.
FIG. 8 is a diagram conceptually showing the ID assignment process in the first embodiment of the present invention.
FIG. 9 is a diagram conceptually showing the ID assignment process in the first embodiment of the present invention.
FIG. 10 is a flowchart showing the schematic flow of the ID interpolation process in the first embodiment of the present invention.
FIG. 11 is a diagram conceptually showing the ID interpolation process in the first embodiment of the present invention.
FIG. 12 is a flowchart showing the flow of the ID replacement process in the X direction in the first embodiment of the present invention.
FIG. 13 is a flowchart showing the flow of the ID replacement process in the Y direction in the first embodiment of the present invention.
FIG. 14 is a graph showing the relationship between the number of clusters generated by the initial clustering process and the threshold value in the first embodiment of the present invention.
FIG. 15 is a conceptual diagram comparing the two-stage clustering process in the first embodiment of the present invention with a conventional clustering process.
FIG. 16 is a diagram schematically showing the relationship between camera operation between image data and motion vectors.
FIG. 17 is a flowchart showing the flow of the camera feature detection process by the video feature detection unit in the first embodiment of the present invention.
FIG. 18 is a flowchart showing details of the motion vector detection process in the first embodiment of the present invention.
FIG. 19 is a diagram showing the affine transformation model.
FIG. 20 is a diagram showing the process of obtaining affine coefficients by multiple regression analysis.
FIG. 21 is a diagram showing the configuration of the recording/reproducing apparatus according to the second embodiment of the present invention.
FIG. 22 is a flowchart showing the flow of operation of the recording/reproducing apparatus according to the second embodiment of the present invention.
FIG. 23 is a flowchart showing the detailed flow of the image search process in the second embodiment of the present invention.
FIG. 24 is a diagram conceptually showing the image search process in the second embodiment of the present invention.
FIG. 25 is a diagram showing the configuration of the recording/reproducing apparatus according to the third embodiment of the present invention.
FIG. 26 is a flowchart showing the flow of the object encoding process in the third embodiment of the present invention.
FIG. 27 is a flowchart showing the flow of the encoding process in the third embodiment of the present invention.

Explanation of symbols

1 ... CPU
4 ... Video feature detection unit
10 ... HDD
20 ... Segmentation processing unit
30 ... Image search unit
31 ... LPF processing unit
32 ... ID processing unit
33 ... Threshold setting unit
34 ... Interpolation processing unit
35 ... K-means processing unit
40 ... Image search unit
41 ... Data memory unit
42 ... Comparison determination unit
44 ... ID memory unit
45 ... Maximum ID value memory unit
46 ... Address counter unit
100, 200, 300 ... Recording/reproducing apparatus
251 ... AV codec

Claims

An information processing method comprising:
sequentially inputting a plurality of pixel data constituting image data; and
determining, on each input, whether a predetermined feature value of the input pixel data is within a predetermined range value of the feature value of already input pixel data, assigning to pixel data whose feature value is determined to be within the predetermined range value first identification information identical to that of the already input pixel data, and assigning to pixel data whose feature value is determined not to be within the predetermined range value second identification information different from the first identification information, thereby classifying the plurality of pixel data into a first number of clusters.
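As an illustration of the classification step recited above, the following is a minimal Python sketch under stated assumptions. The function name `initial_clustering`, the use of a scalar feature value, the absolute-difference test, and the choice to compare each new pixel against the first pixel that received each ID are all assumptions made for illustration and are not taken from the specification.

```python
def initial_clustering(features, threshold):
    """Assign a cluster ID to each scalar feature value in input order.

    Assumption: a new pixel is compared with the first pixel that received
    each existing ID (its "representative"), using an absolute difference.
    """
    representatives = []  # feature value of the first pixel given each ID
    ids = []
    for value in features:
        for cluster_id, rep in enumerate(representatives):
            if abs(value - rep) <= threshold:
                # Within the range value: reuse the existing ID.
                ids.append(cluster_id)
                break
        else:
            # Not within the range value of any earlier pixel: issue a new ID.
            representatives.append(value)
            ids.append(len(representatives) - 1)
    return ids, representatives


ids, reps = initial_clustering([10, 12, 50, 11, 52, 90], threshold=5)
print(ids)   # [0, 0, 1, 0, 1, 2]
print(reps)  # [10, 50, 90]
```

In this sketch the number of distinct IDs, `len(reps)`, plays the role of the first number of clusters.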

The information processing method according to claim 1, further comprising:
regarding the pixel data classified into the same cluster by the classification as pixel data having the same feature value, and classifying the classified plurality of pixel data into a second number of clusters, smaller than the first number, by a predetermined clustering method.
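The second classification step can be sketched as follows, taking K-means (named in the abstract as one possible method) as the predetermined clustering method. Each first-stage cluster is reduced to one representative feature value weighted by its pixel count, so the loop runs over clusters rather than pixels. The function name, the scalar feature, the deterministic initialisation, and the iteration limit are assumptions for illustration only.

```python
def kmeans_on_representatives(reps, counts, k, iterations=20):
    """Weighted 1-D k-means over first-stage cluster representatives.

    reps[i]   : feature value standing in for every pixel of first-stage cluster i
    counts[i] : number of pixels in first-stage cluster i
    Assumption: centres start at values spread evenly over the sorted reps.
    """
    ordered = sorted(reps)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(reps)
    for _ in range(iterations):
        # Assignment step: nearest centre for each representative.
        labels = [min(range(k), key=lambda c: abs(r - centers[c])) for r in reps]
        # Update step: pixel-count-weighted mean of each centre's members.
        new_centers = []
        for c in range(k):
            weight = sum(n for n, lab in zip(counts, labels) if lab == c)
            if weight == 0:
                new_centers.append(centers[c])  # keep an empty centre in place
            else:
                total = sum(r * n for r, n, lab in zip(reps, counts, labels) if lab == c)
                new_centers.append(total / weight)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans_on_representatives([10, 14, 50, 54, 90], [3, 1, 2, 2, 4], k=3)
print(labels)   # [0, 0, 1, 1, 2]
print(centers)  # [11.0, 52.0, 90.0]
```

Because only five representatives are processed here instead of twelve pixels, each iteration touches fewer items, which is the source of the speed-up described in the abstract.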

The information processing method according to claim 2, further comprising:
when, among the plurality of pixel data classified into the first number of clusters, there exist a plurality of continuous first pixel data in a predetermined direction, a plurality of continuous second pixel data in the predetermined direction that are different from the first pixel data, and at least one third pixel data located between the first pixel data and the second pixel data in the predetermined direction,
and the identification information of the first pixel data and the second pixel data is the same while the identification information of the third pixel data differs from that of the first and second pixel data, replacing the identification information of the third pixel data with the identification information of the first and second pixel data.
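The ID replacement recited above can be sketched for a single scan line as follows. The function name and the parameters `min_run` (how many continuous pixels count as a "plurality") and `max_gap` (how many intervening pixels may be replaced) are hypothetical; the claim itself fixes neither value.

```python
from itertools import groupby


def replace_isolated_ids(row, min_run=2, max_gap=1):
    """Replace short runs of IDs that are sandwiched between two runs sharing one ID.

    Assumptions: min_run and max_gap are illustrative parameters.
    """
    # Collapse the line into (id, run_length) pairs.
    runs = [(key, len(list(group))) for key, group in groupby(row)]
    out = []
    for idx, (run_id, length) in enumerate(runs):
        new_id = run_id
        if 0 < idx < len(runs) - 1:
            left_id, left_len = runs[idx - 1]
            right_id, right_len = runs[idx + 1]
            if (left_id == right_id and left_len >= min_run
                    and right_len >= min_run and length <= max_gap):
                # The neighbours agree with each other, so the middle run takes their ID.
                new_id = left_id
        out.extend([new_id] * length)
    return out


print(replace_isolated_ids([1, 1, 1, 7, 1, 1, 2, 2]))  # [1, 1, 1, 1, 1, 1, 2, 2]
```

Applying the function to each row of the ID map would correspond to the X direction, and applying it to each column to the Y direction.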

The information processing method according to claim 3, further comprising:
removing high-frequency components from the input pixel data with a low-pass filter.
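One simple way to realise such a low-pass filter is a moving average over neighbouring pixels. The sketch below is an assumption for illustration: the claim does not specify the filter type, and the function name, the box kernel, the `radius` parameter, and the edge replication are all choices made here.

```python
def box_lowpass(row, radius=1):
    """Smooth one scan line with a moving-average (box) filter.

    Assumptions: box kernel of width 2 * radius + 1, with edge pixels replicated.
    """
    n = len(row)
    out = []
    for i in range(n):
        # Clamp indices so the window never leaves the line.
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


print(box_lowpass([10, 10, 40, 10, 10]))  # [10.0, 20.0, 20.0, 20.0, 10.0]
```

Smoothing an isolated spike in this way reduces the number of pixels that would otherwise receive their own ID during the initial classification.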

The information processing method according to claim 3, further comprising:
dividing the image data into the second number of regions of arbitrary shape based on the plurality of pixel data classified into the second number of clusters.

The information processing method according to claim 5, further comprising:
detecting, for each of the divided second number of regions, a motion vector between a plurality of the image data; and
detecting, based on the detected motion vector, a predetermined video feature caused by a camera operation in video data composed of the plurality of image data.

The information processing method according to claim 6,
wherein the step of detecting the video feature comprises:
calculating the number of pixels in each of the second number of regions of the plurality of image data; and
detecting the predetermined video feature in the video data based on the motion vector detected between the regions having the largest number of pixels in the plurality of image data.
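The pixel-counting step above can be sketched as follows. The function name, the representation of a segmented image as a nested list of integer region labels, and the tie-breaking rule are assumptions for illustration.

```python
from collections import Counter


def largest_region(label_map):
    """Return (label, pixel_count) of the region covering the most pixels.

    Assumption: label_map is a list of rows of integer region labels;
    ties are broken in favour of the smallest label.
    """
    counts = Counter(label for row in label_map for label in row)
    label, size = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return label, size


print(largest_region([[0, 0, 1],
                      [0, 2, 1],
                      [2, 2, 2]]))  # (2, 4)
```

The region selected in each image would then be the one between which the motion vector is evaluated.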

The information processing method according to claim 5, further comprising:
encoding the pixel data for each of the divided second number of regions.

The information processing method according to claim 3, further comprising:
generating a feature vector for each of the divided second number of regions; and
comparing the generated feature vectors, for each of the second number of clusters, among a plurality of the image data to determine similarity between the plurality of image data.

The information processing method according to claim 9,
wherein the step of determining the similarity comprises:
calculating the number of pixels in each of the second number of regions of the plurality of image data; and
determining the similarity between the plurality of image data by comparing the feature vectors between the regions having the largest number of pixels among the plurality of pixel data.

The information processing method according to claim 3,
wherein the step of classifying into the first number of clusters repeats the classification while changing the predetermined range value until the first number reaches a predetermined number.
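The repetition recited above can be sketched as a loop that widens the range value until the cluster count is small enough. The function names, the start value, the step size, the upper bound, and the reading of "reaches a predetermined number" as "is at most the target" are assumptions for illustration.

```python
def count_initial_clusters(features, threshold):
    """Count the IDs issued by a threshold-based first-stage classification.

    Assumption: a value joins an existing cluster if it lies within `threshold`
    of the first value that received that cluster's ID.
    """
    reps = []
    for value in features:
        if not any(abs(value - rep) <= threshold for rep in reps):
            reps.append(value)
    return len(reps)


def tune_threshold(features, target, start=1, step=1, max_threshold=255):
    """Widen the threshold until the cluster count is at most `target`."""
    threshold = start
    while threshold <= max_threshold:
        n = count_initial_clusters(features, threshold)
        if n <= target:
            return threshold, n
        threshold += step
    # Give up at the upper bound and report what it yields.
    return max_threshold, count_initial_clusters(features, max_threshold)


print(tune_threshold([10, 12, 50, 11, 52, 90, 93], target=3))  # (3, 3)
```

In the example, thresholds 1 and 2 yield 6 and 4 clusters respectively, and threshold 3 is the first to yield 3.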

The information processing method according to claim 3,
wherein the step of classifying into the first number of clusters performs control such that the classification into the second number of clusters is not executed when the first number reaches a predetermined number.

An information processing apparatus comprising:
input means for sequentially inputting a plurality of pixel data constituting image data;
first classification means for determining, on each input, whether a feature value of the input pixel data is within a predetermined range value of the feature value of already input pixel data, assigning to pixel data whose feature value is determined to be within the predetermined range value first identification information identical to that of the already input pixel data, and assigning to pixel data whose feature value is determined not to be within the predetermined range value second identification information different from the first identification information, thereby classifying the plurality of pixel data into a first number of clusters; and
second classification means for regarding the pixel data classified into the same cluster as pixel data having the same feature value and classifying the classified pixel data into a second number of clusters, smaller than the first number, by a predetermined clustering method.

A program for causing an information processing apparatus to execute the steps of:
sequentially inputting a plurality of pixel data constituting image data;
determining, on each input, whether a predetermined feature value of the input pixel data is within a predetermined range value of the feature value of already input pixel data, assigning to pixel data whose feature value is determined to be within the predetermined range value first identification information identical to that of the already input pixel data, and assigning to pixel data whose feature value is determined not to be within the predetermined range value second identification information different from the first identification information, thereby classifying the plurality of pixel data into a first number of clusters; and
regarding the pixel data classified into the same cluster as pixel data having the same feature value and classifying the classified pixel data into a second number of clusters, smaller than the first number, by a predetermined clustering method.