JP2010067014A - Image classification device and image classification method

Info

Publication number: JP2010067014A
Application number: JP2008232793A
Authority: JP
Inventors: Hirohisa Inamoto; 浩久稲本; Yuka Kihara; 酉華木原
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2008-09-11
Filing date: 2008-09-11
Publication date: 2010-03-25

Abstract

PROBLEM TO BE SOLVED: To provide a customizable image classification device and method that classify images using whole images while keeping the user's labor to a minimum.

SOLUTION: The image classification device 10 includes an image registration device 20. The image registration device 20 includes: a first similarity calculation means 21 for calculating the similarity between an input image and a registered image in a local region of the image; a first image DB 22 for storing data of images tagged with keywords that are concrete for the user; a first similarity determination means 23 for making a first similarity determination; a second similarity calculation means 24 for calculating the whole-image similarity for the input image; a second image DB 25 for storing data of images tagged with ambiguous keywords; and a second similarity determination means 26 for making a second similarity determination.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an image classification device and an image classification method that compare and classify images when storing images taken with, for example, a digital still camera.

In recent years, the explosive spread of digital still cameras and the growth of their storage capacity have markedly changed how people take photographs. With a conventional silver halide camera, only about 20 shots could be taken per roll of film and a shot could not be retaken, so users weighed the scene, the subject, and the timing and pressed the shutter only at the decisive moment, obtaining only carefully selected photographs.

With a digital still camera, by contrast, hundreds of images can be shot, each image can be checked on the LCD monitor right after shooting, and unwanted images can easily be deleted. The prevailing practice today is therefore to release the shutter many times in all kinds of scenes and to examine and select the images afterwards, so the number of images a user holds keeps growing. Moreover, the capacity of the storage devices that hold these images, such as PC hard disks and optical discs, increases year by year, and users can now keep an astronomical number of images. As a result, many users simply keep a large number of images in storage without selecting among them.

However, when such an enormous number of images is held in the storage area of a PC, finding a needed image can take a very long time. One example of needing to find particular images is selecting and printing photographs, taken on the day of an athletic meet, in which one's child appears together with a friend, and giving them to that friend. In such a case, a user typically searches for the images as follows.

First, the images are narrowed down using the information attached to them (first narrowing). The remaining images are then displayed side by side as thumbnails, and those that appear to show the friend are picked out (second narrowing). Finally, the images are enlarged one by one and checked to find the desired ones (third narrowing). Looking back on the past while checking images one by one is one of the real pleasures of photography, but when searching for images to give to others, as in the example above, the work needs to be efficient.

If the images can be narrowed down sufficiently, particularly in the first narrowing, the burden on the user is not great. However, of the additional information used in the first narrowing, the information that is attached automatically without human intervention is generally only indirectly related to the image. For example, the most common additional information is the shooting date and time, but few users remember the exact date and time of an event, and for such users the shooting date and time is merely indirect information linking the event to the image. Other common additional information, such as the shooting mode, is even more indirect. For example, even if the fact that the flash was used is stored as additional information, the user still has to infer from it whether the picture was taken indoors, at night, or in bad weather. Consequently, obtaining effective additional information has required manual classification by the user. Such classification falls broadly into two types: classification by directory structure, and annotation, in which tags are attached to images by some means.

For these reasons, various techniques have been proposed to automate the user's manual classification (see, for example, Patent Document 1). Patent Document 1 proposes a technique that calculates a feature amount of the entire image and automatically classifies the image according to that feature amount. Because images are classified automatically, this technique greatly reduces the user's manual work.

To classify photographic images with this technique, an identification rule must be determined that specifies which feature amounts are classified into which category. There are various ways to determine the rule. One conceivable method is to assign the input image to the category of the pre-classified image that is most similar to it. However, a photograph usually consists of various elements such as a background, people, and specific objects, and the image feature amount changes greatly even when only the positional relationship of those elements changes. As a result, even just separating landscapes from portraits requires preparing images in a wide variety of patterns. If, instead of judging the similarity to each individual image, the tendencies of each category are learned in advance with a learning and classification machine that has high generalization performance, such as the SVM (Support Vector Machine) described in Non-Patent Document 1, the number of images to be prepared can be reduced, but a large number of images is still required. Having the user prepare so many images places a heavy burden on the user.

This problem can be avoided by preparing a variety of images in advance, for example at the time of shipment when the service is provided as software, but then user customization can no longer be supported flexibly. For example, even among portraits, those showing the user or the user's relatives have a special meaning for the user, and it is easy to imagine that the user will want to classify such images separately. Yet it is impossible to register the user's face at the time the software is shipped. Thus, it is difficult to classify images flexibly using the entire image.

Meanwhile, a method that classifies images by focusing on local parts of the image has been proposed (see, for example, Patent Document 2). In the technique of Patent Document 2, a specific object is extracted from an image, and once an object name has been given to the extracted object, the same object name is automatically attached to any subsequently captured image containing a similar object. Because individual objects are extracted, the number of combination patterns is dramatically smaller than in the whole-image comparison described above, and classification is possible with relatively few images. However, the faces of the user and the user's relatives are likely to appear in a large proportion of the photographs the user holds, so the method of Patent Document 2 alone cannot narrow down the images sufficiently.

Patent Document 1: Japanese Patent No. 4036009
Patent Document 2: JP 2006-333443 A
Non-Patent Document 1: C. Cortes and V. N. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, pp. 273-297, 1995

The present invention has been made in view of the circumstances described above, and its object is to provide an image classification device and an image classification method that are customizable and can perform classification using the entire image while minimizing the burden on the user.

The image classification device of the present invention classifies an input image by comparing the similarity between the input image and images to which keywords have been assigned. The device comprises: first and second image data storage means for storing data of first and second images, respectively, each registered with a keyword assigned; local region extraction means for extracting a local region of a predetermined size from the input image and the first image; first similarity calculation means for calculating the similarity between the input image and the first image in the extracted local region; second similarity calculation means for calculating the similarity between the entire input image and the entire second image; and keyword assignment means for assigning a keyword to the input image based on the similarities calculated by the first and second similarity calculation means.

With this configuration, the image classification device of the present invention calculates similarities both in local regions and over the entire image and can therefore classify images in fine detail. It is thus customizable and can perform classification using the entire image while minimizing the burden on the user.
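
The configuration above can be illustrated with a minimal sketch. It assumes that local regions and whole images have already been reduced to numeric feature vectors and that similarity is measured by cosine similarity with fixed thresholds; the function names, the thresholds, and the toy vectors are illustrative assumptions and are not taken from the specification.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_keywords(local_feats, global_feat, first_db, second_db,
                    local_thresh=0.9, global_thresh=0.8):
    """Return the keywords whose registered image is similar to the input.

    local_feats: feature vectors of local regions of the input image.
    global_feat: one feature vector describing the whole input image.
    first_db / second_db: lists of (keyword, feature_vector) pairs.
    Thresholds are hypothetical values chosen for this sketch.
    """
    keywords = []
    # First similarity: best match between any local region of the input
    # and each user-registered image (concrete keywords).
    for keyword, feat in first_db:
        best = max((cosine(f, feat) for f in local_feats), default=0.0)
        if best >= local_thresh and keyword not in keywords:
            keywords.append(keyword)
    # Second similarity: whole-image comparison against the images
    # registered with ambiguous keywords.
    for keyword, feat in second_db:
        if cosine(global_feat, feat) >= global_thresh and keyword not in keywords:
            keywords.append(keyword)
    return keywords

first_db = [("eldest son's face", [1.0, 0.0, 0.0])]
second_db = [("portrait", [0.0, 1.0, 0.0]), ("landscape", [0.0, 0.0, 1.0])]
result = assign_keywords(
    local_feats=[[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]],
    global_feat=[0.1, 0.9, 0.1],
    first_db=first_db,
    second_db=second_db,
)
print(result)
```

In this toy run one local region matches the user-registered face and the whole image matches the "portrait" entry, so both keywords are attached to the input image.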

In the image classification device of the present invention, the first image data storage means stores, as the data of the first image, data of an image that the user has registered with a concrete keyword assigned.

With this configuration, the image classification device of the present invention has the user assign a concrete keyword to the input image, which makes customization possible.

In the image classification device of the present invention, the second image data storage means stores, as the data of the second image, data of an image to which a predetermined ambiguous keyword is attached.

With this configuration, the image classification device of the present invention can calculate whole-image similarity for ambiguous keywords. In addition, if the data of the second image is registered when the device is shipped from the factory, the user is spared the trouble of registering second images, which reduces the burden on the user.

In the image classification device of the present invention, the local region extraction means comprises: an extreme pixel detection unit that detects extreme pixels, that is, pixels at which an image feature amount takes an extreme value; feature amount calculation means for calculating the image feature amount in the vicinity of the extreme pixels in each of the input image and the first image; a feature amount comparison unit that compares the calculated feature amounts; an extreme pixel selection unit that selects extreme pixels based on the positional relationship of the extreme pixels in each of the input image and the first image; and a corresponding region extraction unit that extracts, from the input image, an image region corresponding to the first image based on the number of extreme pixels selected by the extreme pixel selection unit.

With this configuration, the image classification device of the present invention can extract an image region corresponding to the first image from the input image based on the number of extreme pixels selected by the extreme pixel selection unit.
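
As a rough illustration of what detecting extreme pixels can look like, the sketch below marks interior pixels of a two-dimensional response map that are strictly larger or strictly smaller than all eight neighbours. This is a deliberately simplified assumption: it works on a single map rather than a scale space, and the function name and sample data are hypothetical.

```python
def find_extreme_pixels(response, min_abs=0.0):
    """Return (row, col) of interior pixels that are strict local extrema.

    response: 2D list of numbers (for example, a filtered image).
    min_abs: hypothetical contrast threshold; weaker extrema are ignored.
    """
    extrema = []
    rows, cols = len(response), len(response[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            v = response[r][c]
            neighbours = [response[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            is_max = all(v > n for n in neighbours)
            is_min = all(v < n for n in neighbours)
            if (is_max or is_min) and abs(v) > min_abs:
                extrema.append((r, c))
    return extrema

textured = [
    [0, 0, 0, 0, 0],
    [0, 5, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, -4, 0],
    [0, 0, 0, 0, 0],
]
flat = [[1] * 5 for _ in range(5)]

print(find_extreme_pixels(textured))  # one maximum and one minimum
print(find_extreme_pixels(flat))      # a flat region yields no extreme pixels
```

A flat region produces no extreme pixels at all, which is why the number of detected extreme pixels can serve as an indicator of how distinctive a region is.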

In the image classification device of the present invention, the first similarity calculation means comprises image comparison means for calculating the similarity by comparing the image of the image region extracted by the corresponding region extraction unit with the first image.

With this configuration, the image classification device of the present invention calculates the similarity based on the feature amounts in the extracted image region, which improves the accuracy of the similarity calculation.

In the image classification device of the present invention, the extreme pixel detection unit detects the extreme pixels of an image when the user registers it as the first image, and the device comprises warning means for warning the user when the number of extreme pixels detected by the extreme pixel detection unit is equal to or less than a predetermined number.

With this configuration, the image classification device of the present invention can prevent the same keyword from being assigned to a large number of registered images, which improves convenience for the user.
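
The registration-time check can be sketched as follows. The sketch assumes that a detector function returning a list of extreme pixels is supplied by the caller, and it cancels the registration when the warning is issued, as the embodiment describes later. The threshold value, the stand-in detector, and the record layout are hypothetical.

```python
WARNING = "The selected area cannot be identified well."

def register_template(db, keyword, region, detect, min_points=3):
    """Register (keyword, region) unless the region has too few extreme pixels.

    Returns None on success, or the warning text when registration is cancelled.
    min_points is a hypothetical stand-in for the predetermined number.
    """
    points = detect(region)
    if len(points) <= min_points:
        return WARNING
    db.append({"keyword": keyword, "region": region, "points": points})
    return None

def nonzero_pixels(region):
    """Hypothetical stand-in detector: treats every non-zero pixel as extreme."""
    return [(r, c) for r, row in enumerate(region)
            for c, value in enumerate(row) if value != 0]

first_db = []
flat_region = [[0, 0, 0], [0, 0, 0]]
face_region = [[3, 0, 7], [0, 9, 2]]

flat_result = register_template(first_db, "background", flat_region, nonzero_pixels)
face_result = register_template(first_db, "eldest son's face", face_region, nonzero_pixels)
print(flat_result, face_result, len(first_db))
```

Only the textured region is stored; the flat region is rejected with the warning, so it can never become a template that matches nearly every image.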

The image classification device of the present invention further comprises region designation means for designating a specific region of the input image, and the first image data storage means stores the image data of the region designated by the region designation means.

With this configuration, only a partial region of an image, rather than the entire image, can be designated when the first image is registered. The user therefore does not need to prepare an image showing only the subject to which a keyword is to be assigned, which improves convenience for the user.

The image classification device of the present invention further comprises similarity determination means for determining, based on the similarity calculated by the first similarity calculation means, whether the input image and the first image are similar, and the second similarity calculation means calculates the whole-image similarity only when the input image and the first image are not similar.

With this configuration, when one keyword is assigned per image and the input image is similar to the first image, the processing by the second similarity calculation means can be omitted. This shortens the time the user has to wait when registering an image and improves convenience for the user.
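
The conditional evaluation described above, in which the whole-image similarity is calculated only when no first image is judged similar, can be sketched as a simple cascade. The similarity functions below are toy stand-ins operating on strings, and the thresholds and names are assumptions made for this illustration.

```python
def classify(input_image, first_db, second_db, local_sim, global_sim,
             local_thresh=0.8, global_thresh=0.8):
    """Return a single keyword for the input image, or None.

    The whole-image stage runs only if no first image is judged similar.
    """
    for keyword, registered in first_db:
        if local_sim(input_image, registered) >= local_thresh:
            return keyword  # similar: the whole-image stage is skipped
    best_keyword, best_score = None, global_thresh
    for keyword, registered in second_db:
        score = global_sim(input_image, registered)
        if score >= best_score:
            best_keyword, best_score = keyword, score
    return best_keyword

calls = {"global": 0}  # counts how often the second stage is evaluated

def local_sim(image, registered):
    """Toy stand-in: 1.0 if the registered patch occurs inside the image."""
    return 1.0 if registered in image else 0.0

def global_sim(image, registered):
    """Toy stand-in: 1.0 if the image begins with the registered scene label."""
    calls["global"] += 1
    return 1.0 if image.startswith(registered) else 0.0

first_db = [("friend A", "faceA")]
second_db = [("landscape", "mountain"), ("portrait", "person")]

first = classify("person+faceA", first_db, second_db, local_sim, global_sim)
calls_after_first = calls["global"]
second = classify("mountain+lake", first_db, second_db, local_sim, global_sim)
print(first, calls_after_first, second, calls["global"])
```

For the first input the local match succeeds and the whole-image comparison is never called; for the second input both whole-image comparisons run.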

The image classification method of the present invention classifies an input image by comparing the similarity between the input image and images to which keywords have been assigned. The method includes the steps of: storing data of first and second images, each registered with a keyword assigned; extracting a local region of a predetermined size from the input image and the first image; calculating the similarity between the input image and the first image in the extracted local region; calculating the similarity between the entire input image and the entire second image; and assigning a keyword to the input image based on the similarities calculated for the first and second images.

With this configuration, the image classification method of the present invention calculates similarities both in local regions and over the entire image and can therefore classify images in fine detail. It is thus customizable and can perform classification using the entire image while minimizing the burden on the user.

The present invention can provide an image classification device and an image classification method that are customizable and can perform classification using the entire image while minimizing the burden on the user.

Embodiments of the present invention are described below with reference to the drawings. The embodiments describe an example in which the image classification device according to the present invention is applied to a web service system to which image data can be uploaded for management. This web service system includes a server PC and a plurality of client PCs. In the following description, an image to be uploaded from a client PC to the server PC is referred to as an input image.

(First embodiment)
First, the configuration of the image classification device according to the first embodiment of the present invention will be described.

As shown in FIG. 1, the image classification device 10 of this embodiment includes an image registration device 20 that registers input images and a keyword registration device 30 that mainly registers keywords.

The image registration device 20 includes a first similarity calculation means 21 that calculates a first similarity, a first image database (DB) 22 that stores data of first search target images, a first similarity determination means 23 that performs a first similarity determination, a second similarity calculation means 24 that calculates a second similarity, a second image DB 25 that stores data of second search target images, and a second similarity determination means 26 that performs a second similarity determination. The first image DB 22 and the second image DB 25 constitute the first and second image data storage means according to the present invention, respectively. The first similarity determination means 23 and the second similarity determination means 26 constitute the keyword assignment means according to the present invention.

The keyword registration device 30 includes a region designation means 31 that designates a specific region in the input image, an extreme pixel detection means 32 that detects extreme pixels, a registration means 33 that registers keywords and images, and a warning means 34 that warns the user.

A first search target image is an image retrieved from the first image DB 22 in order to calculate its similarity to the input image in a local region of the image; it is an image that the user has registered and accumulated. For example, a first search target image is an image tagged with a keyword that is concrete for the user, such as "eldest son's face" or "face of the child's friend A". A second search target image is an image retrieved from the second image DB 25 in order to calculate its whole-image similarity to the input image; it is, for example, an image registered and accumulated by the manufacturer when the system is shipped. Specifically, a second search target image is an image tagged with an ambiguous (in other words, abstract) keyword such as "portrait", "landscape", "animal", or "group photo".

Note that some sample images may be registered as first search target images when the system is shipped. Likewise, the second search target images may be images that the user prepares and accumulates in order to attach ambiguous keyword tags.

As shown in FIG. 2, the first similarity calculation means 21 includes a local region extraction means 40 that extracts local regions and an image comparison means 50 that compares images. The local region extraction means 40 includes a SIFT calculation unit 41 that calculates the SIFT (Scale-Invariant Feature Transform) of the input image, a SIFT calculation unit 42 that calculates the SIFT of the first search target image, a SIFT comparison unit 43 that compares the SIFTs, a corresponding point selection unit 44 that selects corresponding points, and a corresponding region extraction unit 45 that extracts a corresponding region. The SIFT calculation units 41 and 42 constitute the extreme pixel detection unit and the feature amount calculation means according to the present invention. The SIFT comparison unit 43 constitutes the feature amount comparison unit according to the present invention. The corresponding point selection unit 44 constitutes the extreme pixel selection unit according to the present invention. The corresponding region extraction unit 45 constitutes the corresponding region extraction unit according to the present invention.
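
The flow from descriptor comparison to corresponding-region extraction can be sketched as below. This is not the SIFT algorithm itself: keypoints are assumed to be given already as (position, descriptor) pairs, matching is a plain nearest-neighbour search with a Euclidean distance cutoff, the positional-consistency check performed by the corresponding point selection unit is omitted, and the corresponding region is taken to be the bounding box of the matched points. All names and thresholds are hypothetical.

```python
import math

def match_descriptors(input_kps, template_kps, max_dist=0.5):
    """Pair each template keypoint with its nearest input keypoint.

    Each keypoint is ((x, y), descriptor). Returns a list of
    (input_xy, template_xy) pairs whose descriptor distance is below max_dist.
    """
    matches = []
    for t_xy, t_desc in template_kps:
        best_xy, best_d = None, max_dist
        for i_xy, i_desc in input_kps:
            d = math.dist(t_desc, i_desc)
            if d < best_d:
                best_xy, best_d = i_xy, d
        if best_xy is not None:
            matches.append((best_xy, t_xy))
    return matches

def corresponding_region(matches, min_matches=2):
    """Bounding box (x0, y0, x1, y1) of the matched input points, or None
    when too few points were matched to trust the correspondence."""
    if len(matches) < min_matches:
        return None
    xs = [m[0][0] for m in matches]
    ys = [m[0][1] for m in matches]
    return (min(xs), min(ys), max(xs), max(ys))

template_kps = [((0, 0), [1.0, 0.0]), ((4, 2), [0.0, 1.0])]
input_kps = [((10, 10), [0.9, 0.1]), ((14, 12), [0.1, 0.9]), ((50, 50), [5.0, 5.0])]

matches = match_descriptors(input_kps, template_kps)
region = corresponding_region(matches)
print(matches, region)
```

Requiring a minimum number of matches before a region is returned mirrors the idea that the region is extracted based on the number of selected extreme pixels.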

FIG. 3 is a block diagram of a computer 60 used as the server PC and as a client PC in the web service system of this embodiment. The image classification device 10 of this embodiment is realized by the computer 60 and a program loaded onto the computer 60.

In FIG. 3, the CPU 61 is a central processing unit that performs overall control and arithmetic processing for the computer 60. The ROM 62 is a read-only memory with a storage area for information such as the system startup program. The RAM 63 is a random access memory with a data storage area. The operating system, device drivers, applications such as a web browser, and programs such as communication control are loaded into the RAM 63 and executed by the CPU 61. The input/output unit 64 consists of input/output devices such as a keyboard and a mouse and passes the information the user enters on those devices to the CPU 61. The display unit 65 includes, for example, a liquid crystal display and a display control unit. The HDD 66 is a hard disk device that stores the search target image data, the web browser program files, and the like. The communication unit 67 performs network communication control and can communicate with other computers and peripheral devices connected to the network. The data bus 68 serves as the data path between these components. In this embodiment, the first search target images and the second search target images are assumed to be stored on the hard disk device of the server PC.

With this configuration, the user can upload images from each client PC to the server PC and view the uploaded images from the client PC. Each image is accompanied by keywords, automatically assigned by the server PC or the client PC, as tag information. An image file may have more than one keyword.

Next, the operation of the image classification device 10 of this embodiment will be described.

(System operation as seen by the user)
First, the operation of the system as seen by the user will be described.

First, the user interface will be described. The user accesses the server PC from a client PC through, for example, a web browser. Specifically, when the user starts the web browser and enters the address of the server PC, a dialog screen 70 such as that shown in FIG. 4 appears on the display. The dialog screen 70 shown in FIG. 4 has a keyword input box 71, a search button 72, an image registration button 73, a keyword registration button 74, and an image display area 75.

When the user enters a desired keyword in the keyword input box 71 and presses the search button 72, thumbnails of those image files held on the client PC that carry the keyword as a tag are displayed, arranged in rows, in the image display area 75.
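
The keyword search behind the search button can be pictured as a simple tag lookup. The sketch below assumes the tags are available as a mapping from file name to a list of keywords; the file names and the function name are hypothetical, and thumbnail rendering is left out.

```python
def search_by_keyword(tagged_images, keyword):
    """Return the names of the images whose tag list contains the keyword,
    sorted so that the thumbnails can be laid out in a stable order."""
    return sorted(name for name, tags in tagged_images.items()
                  if keyword in tags)

tagged_images = {
    "IMG_0001.jpg": ["eldest son's face", "portrait"],
    "IMG_0002.jpg": ["landscape"],
    "IMG_0003.jpg": ["portrait"],
}

hits = search_by_keyword(tagged_images, "portrait")
print(hits)
```

Because an image file may carry several keywords, the same image can appear in the results for more than one search term.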

When the user presses the image registration button 73, a file selection dialog screen is displayed. When an image file held on the client PC is specified in this dialog, the specified image file is uploaded to the server PC. Tags are assigned automatically to the image uploaded to the server PC (the input image). How the tags are assigned will be described later.

ここで、サーバＰＣに既に登録されている画像と類似する画像を含む画像領域が入力画像に存在するか否かを判定し、存在する場合には、サーバＰＣに登録されている画像に関連付けられたキーワードをタグとして、アップロードされた画像に付与する。また、該ダイアログ画面のプログラムは画像に新規のキーワードをタグとして付与するためのインターフェース手段を備えている。 Here, it is determined whether the input image contains an image area that includes an image similar to one already registered in the server PC. If such an area exists, the keyword associated with the registered image is assigned to the uploaded image as a tag. The program for the dialog screen also includes interface means for assigning a new keyword to an image as a tag.

次に、ユーザがキーワード登録ボタン７４を押すと、図５に示すようなダイアログ画面８０がディスプレイに表示される。ダイアログ画面８０は、画像選択ボタン８１、キーワード入力ボックス８２、登録ボタン８３、表示領域８４、警告表示領域８５を有する。 Next, when the user presses the keyword registration button 74, a dialog screen 80 as shown in FIG. 5 is displayed on the display. The dialog screen 80 includes an image selection button 81, a keyword input box 82, a registration button 83, a display area 84, and a warning display area 85.

画像選択ボタン８１を押すことにより、ファイル選択ダイアログ画面が表示される。ここでクライアントＰＣに保持される画像ファイルを指定すると、当該画像ファイルに係る画像が表示領域８４に表示される。表示領域８４上でユーザがマウスをドラッグすると、図５の画像中に示したような矩形が描画される。 By pressing the image selection button 81, a file selection dialog screen is displayed. When an image file held in the client PC is designated here, an image related to the image file is displayed in the display area 84. When the user drags the mouse on the display area 84, a rectangle as shown in the image of FIG. 5 is drawn.

ユーザは、キーワード入力ボックス８２に、画像に付けたいタグ情報をキーワードとして入力することができる。登録ボタン８３が押されると、キーワード入力ボックス８２に入力したキーワードが、表示領域８４において矩形で囲まれた領域の画像とともに、第１の画像ＤＢ２２に登録される。この時、サーバＰＣは、登録される画像が識別に適するか否かを判定し、識別に適さない場合、警告表示領域８５に、例えば「選択領域は、上手く識別できません」といった警告を表示し、画像の登録をキャンセルするものとする。ここまでの処理は、以降入力される画像が、登録された画像に類似するか否かを判定し、類似している場合には自動的にキーワードを付与するためのテンプレートを作る作業である。しかし、平坦な画像領域（画面の広い範囲にわたり画素間の濃度変化の少ない画像領域）、例えば図５に示した人物画像の背景のような領域が指定された場合、様々な画像において類似しているとの判定が発生し、同じキーワードが殆どの画像に対して付与されてしまうことになる。そこで、本実施形態では、画像の登録の時点で、このような不具合が発生しそうな領域の指定に対して判定を行い、ユーザに警告を表示することにより、同じキーワードが殆どの画像に付与されるという問題を解消できる。なお、どのように、識別に適するか否かを判定するかについては後述する。 The user can enter the tag information to be attached to the image as a keyword in the keyword input box 82. When the registration button 83 is pressed, the keyword entered in the keyword input box 82 is registered in the first image DB 22 together with the image of the area enclosed by the rectangle in the display area 84. At this time, the server PC determines whether the image to be registered is suitable for identification. If it is not, the server PC displays a warning such as "The selected area cannot be identified well" in the warning display area 85 and cancels the registration of the image. The processing up to this point creates a template that is used to determine whether a subsequently input image is similar to the registered image and, if so, to assign the keyword automatically. However, when a flat image area (an image area in which the density change between pixels is small over a wide range of the screen) is designated, for example the background of the person image shown in FIG. 5, many different images are judged to be similar, and the same keyword ends up being assigned to almost every image. In the present embodiment, therefore, the designation of an area likely to cause this problem is checked at the time of image registration and a warning is displayed to the user, which prevents the same keyword from being assigned to almost every image. How suitability for identification is determined will be described later.

（サーバＰＣから見たシステムの動作） (System operation as seen from the server PC)
次に、サーバＰＣから見たシステムの動作について説明する。 Next, the operation of the system as seen from the server PC will be described.

まず、画像登録時のフローについて図１及び図６を用いて説明する。図６は、画像登録時のフローチャートである。 First, the flow at the time of image registration will be described with reference to FIGS. 1 and 6. FIG. 6 is a flowchart of image registration.

第１の類似度算出手段２１は、入力画像及び第１の検索対象画像のデータを入力する（ステップＳ１１、１２）。また、第１の類似度算出手段２１は、第１の検索対象画像に含まれる画像領域と類似する画像領域が入力画像内に存在するかを判定し、存在する場合には類似度を算出し（ステップＳ１３）、類似度を示すデータを第１の類似判定手段２３に転送する。なお、類似する領域が存在しない場合には、類似度０を算出し、第１の類似判定手段２３に転送する。 The first similarity calculation unit 21 receives the data of the input image and of the first search target image (steps S11 and S12). The first similarity calculation unit 21 then determines whether the input image contains an image area similar to an image area included in the first search target image, calculates the similarity if it does (step S13), and transfers data indicating the similarity to the first similarity determination unit 23. If no similar area exists, a similarity of 0 is calculated and transferred to the first similarity determination unit 23.

ここで、第１の検索対象画像は、前述のようにユーザがキーワード登録した画像であり、画像には対応付けられたタグが付与されているものとする。また、後述するように、第１の類似度算出手段２１は類似領域が存在するか否かを判定する際に、入力画像と検索対象画像とで様々な極値画素（Keypoint）を検出し、それぞれの極値画素周辺の情報を用いて、両画像間で対応する（類似する）極値を検出し、検出した極値画素の個数情報も同時に第１の類似判定手段２３に転送するものとする。 Here, the first search target image is an image for which the user has registered a keyword as described above, and the corresponding tag is attached to it. As will be described later, when determining whether a similar region exists, the first similarity calculation unit 21 detects extreme pixels (keypoints) in both the input image and the search target image and, using the information around each extreme pixel, detects extreme pixels that correspond (are similar) between the two images. The number of corresponding extreme pixels detected is also transferred to the first similarity determination unit 23.

第１の類似判定手段２３は、入力画像と第１の検索対象画像とが類似しているか否かを判定する（ステップＳ１４）。ステップＳ１４において、第１の類似判定手段２３は、入力画像と第１の検索対象画像とが類似している場合は、検索対象画像に付与されたタグを出力し（ステップＳ１８）、入力画像と第１の検索対象画像とが類似していない場合は、タグを出力しない。なお、第１の類似判定手段２３の詳細な動作については後述する。 The first similarity determination unit 23 determines whether or not the input image and the first search target image are similar (step S14). In step S14, if the input image and the first search target image are similar to each other, the first similarity determination unit 23 outputs a tag given to the search target image (step S18). If the first search target image is not similar, no tag is output. The detailed operation of the first similarity determination unit 23 will be described later.

続いて、第２の類似度算出手段２４では入力画像の全領域と、第２の検索対象画像に保持される様々な画像の全領域との類似度を算出し、類似度を示すデータを第２の類似判定手段２６に出力する。なお、第２の類似度算出手段の詳細な動作については後述する。 Subsequently, the second similarity calculation unit 24 calculates the similarity between the entire area of the input image and the entire area of each of the various images held as second search target images, and outputs data indicating the similarity to the second similarity determination unit 26. The detailed operation of the second similarity calculation unit 24 will be described later.

第２の類似判定手段２６は、第２の類似度算出手段２４が算出した類似度が所定の閾値（例えば０．７）以上か否かに基づき、入力画像と第２の検索対象画像とが画像全体で類似しているか否かを判定する（ステップＳ１７）。ステップＳ１７において、第２の類似判定手段２６は、類似度の閾値以上となる第２の検索対象画像があった場合、当該第２の検索対象画像に付与されていたタグを出力する（ステップＳ１８）。一方、第２の類似判定手段２６は、類似度の閾値以上となる第２の検索対象画像がない場合はタグの出力は行わない。なお、類似度の閾値は、例えば予め実験を行って取得したデータを基に決定するのが好ましい。 The second similarity determination unit 26 determines whether the input image and a second search target image are similar as whole images, based on whether the similarity calculated by the second similarity calculation unit 24 is equal to or greater than a predetermined threshold (for example, 0.7) (step S17). In step S17, if there is a second search target image whose similarity is equal to or greater than the threshold, the second similarity determination unit 26 outputs the tag attached to that second search target image (step S18). If there is no such second search target image, the second similarity determination unit 26 outputs no tag. The similarity threshold is preferably determined based on, for example, data obtained through experiments in advance.
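The threshold decision of step S17 can be sketched as follows. This is only an illustration: the function name `tags_for_whole_image` and the list-of-pairs input are assumptions, and only the example threshold 0.7 comes from the text.

```python
def tags_for_whole_image(similarities, threshold=0.7):
    """Return the tags of second search target images that pass the threshold.

    similarities: list of (tag, similarity) pairs, one per second search
    target image (hypothetical data layout, not taken from the patent).
    """
    tags = []
    for tag, score in similarities:
        # "Equal to or greater than" the threshold, as in step S17.
        if score >= threshold and tag not in tags:
            tags.append(tag)
    return tags
```

If no image reaches the threshold, the returned list is empty, which corresponds to outputting no tag.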

以上の処理により、入力画像には自動的に複数のタグが付与される。前述のとおり、風景や肖像画といったタグは曖昧で、人物や背景など、複数の構成要素により構成されている。画像上においてそれら要素の位置関係が変わると、画像全体の特徴量が大きく変わってしまう。そのため、画像全体の特徴量からこれらのタグを判定するためには、非常に大量の画像を用意しなければならない。これを登録する作業はユーザにとって、非常に高い負荷になる。一方で、ユーザ自身や家族の顔、富士山といった具体的なオブジェクトの場合、単数の構成要素により構成されている場合が多い。また、複数の構成要素であっても、構成要素の位置関係が画像上で固定されている場合が多い。このような具体的なオブジェクトの画像に関しては、少ない画像を登録するだけで、高い精度で類似度を判定することができる。本実施形態のように、大量の画像との比較を行わなければならない曖昧なタグに関してはシステム出荷時に登録しておき、少数の画像との比較を行うだけでよい具体的なタグに関してはユーザに登録させるという構成によって、ユーザの負担を低減して様々なタグを付けられるとともに、ユーザによるカスタマイズが可能となる。 Through the above processing, a plurality of tags are automatically assigned to the input image. As described above, tags such as "landscape" and "portrait" are ambiguous, and the images they describe are composed of a plurality of components such as a person and a background. If the positional relationship of these components changes within the image, the feature amount of the entire image changes greatly. Therefore, a very large number of images must be prepared in order to determine these tags from the feature amount of the entire image, and registering them would place a very heavy load on the user. On the other hand, a specific object such as the face of the user or a family member, or Mt. Fuji, is often composed of a single component. Even when it has a plurality of components, their positional relationship is often fixed within the image. For images of such specific objects, the similarity can be determined with high accuracy by registering only a small number of images. In the present embodiment, ambiguous tags that require comparison with a large number of images are registered at the time of system shipment, and specific tags that require comparison with only a small number of images are registered by the user. This configuration allows various tags to be attached while reducing the burden on the user, and also allows customization by the user.

（第１の類似度算出手段２１の動作） (Operation of the first similarity calculation means 21)
次に、第１の類似度算出手段２１において、どのように第１の検索対象画像に含まれる画像（以下検索対象画像）と類似する領域が入力画像内に存在するかを判定するかについて図２及び図７に基づき述べる。図７は、第１の類似度算出手段２１の詳細な動作を示すフローチャートである。 Next, how the first similarity calculation unit 21 determines whether a region similar to an image included in the first search target images (hereinafter, the search target image) exists in the input image will be described with reference to FIG. 2 and FIG. 7. FIG. 7 is a flowchart showing the detailed operation of the first similarity calculation unit 21.

ＳＩＦＴ算出部４１は、登録対象である入力画像のデータを入力し（ステップＳ２１）、入力画像のＳＩＦＴを算出する（ステップＳ２２）。また、ＳＩＦＴ算出部４２は、第１の検索対象画像のデータを入力し（ステップＳ２３）、第１の検索対象画像のＳＩＦＴを算出する（ステップＳ２４）。 The SIFT calculation unit 41 inputs data of an input image to be registered (step S21), and calculates the SIFT of the input image (step S22). Also, the SIFT calculation unit 42 receives the data of the first search target image (step S23), and calculates the SIFT of the first search target image (step S24).

ここでＳＩＦＴとは、文献１（David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004）に記載された技術で、画像内で特徴的な画素を複数検出し、それぞれの画素について、周辺領域の情報から特徴量を算出する技術である。 SIFT here refers to the technique described in Document 1 (David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004), which detects a plurality of characteristic pixels in an image and calculates, for each of them, a feature amount from information on its peripheral region.

ここでいう特徴的な画素とは、一言で言うと周辺に対して極大又は極小、即ち極値である画素のことをいう。但し、単純に画像内で極値というわけでない。ＳＩＦＴでいう極値画素とは複数の、且つ連続したサイズ（分散）のガウシアンフィルタを画像に施し、複数枚のぼかした画像を作成し、サイズの順番に並べた後、連続に並んだ画像の差分画像（Difference of Gaussian：ＤｏＧ）を作成したものである。特徴的な画素は、注目画素が同じＤｏＧの画像内で極値なだけでなく、１つサイズが小さいＤｏＧ及び１つサイズが大きいＤｏＧの注目画素と対応する画素に対して極値である場合に、極値として検出される。 In short, a characteristic pixel here is a pixel that is a local maximum or minimum, that is, an extreme value, relative to its surroundings. It is not, however, simply an extreme value within the image. In SIFT, Gaussian filters of a plurality of successive sizes (variances) are applied to the image to create a plurality of blurred images, the blurred images are arranged in order of size, and difference images (Difference of Gaussian: DoG) are created between adjacent images. A pixel of interest is detected as an extreme value only when it is an extreme value not only within its own DoG image but also with respect to the corresponding pixels in the DoG one size smaller and the DoG one size larger.
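The extremum test across neighbouring DoG layers can be sketched as follows. The sketch assumes, following Document 1, that the pixel of interest is compared with its 8 neighbours in the same DoG layer and the 9 pixels in each adjacent layer. The function name and the nested-list layout are hypothetical.

```python
def is_scale_space_extremum(dog, s, y, x):
    """Check whether dog[s][y][x] is an extremum among its 26 neighbours.

    dog: list of DoG layers ordered by filter size; each layer is a 2-D
    list of equal dimensions. The caller must pass interior indices
    (1 <= s, y, x <= last index - 1).
    """
    center = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if not (ds == 0 and dy == 0 and dx == 0)
    ]
    # Strict maximum or strict minimum over the same and adjacent layers.
    return all(center > n for n in neighbours) or all(center < n for n in neighbours)
```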

このようにすることにより、極値を構成する山、あるいは谷がどのサイズのガウシアンに最も合致するかを知ることができる。以降、そのサイズのガウシアンによって得られたＤｏＧを用いて周辺領域の特徴量を算出すれば、入力画像と、検索対象画像とのサイズが異なっていても、対応する点において類似する特徴量が得られる。つまり、ＳＩＦＴは画像のスケールに対して不変量を算出することができることになる。 In this way, it is possible to know which size of Gaussian best matches the peak or valley that forms the extreme value. If the feature amount of the peripheral region is then calculated using the DoG obtained with the Gaussian of that size, similar feature amounts are obtained at corresponding points even if the input image and the search target image differ in size. In other words, SIFT can calculate quantities that are invariant to the scale of the image.

文献１では、これ以降の処理として、得られた特徴的な画素がエッジ上の点であるか否か、周辺画素のコントラストは閾値以上か、と言った処理によって検出された特徴的な画素の選定、及びパラボラフィッティングによる特徴的な画素の詳細な位置推定を行っているが、本発明の本質と離れるため、説明を省略する。 In Document 1, the subsequent processing screens the detected characteristic pixels by checking, for example, whether each one lies on an edge and whether the contrast of its surrounding pixels is at or above a threshold, and refines the position of each characteristic pixel by parabola fitting. Because these steps are not essential to the present invention, their description is omitted.

続いて、ＳＩＦＴにおける特徴的な画素周辺の特徴量算出方法について述べる。ＳＩＦＴでは検索対象画像が入力画像内で回転して存在していても、対応する特徴的な画素が検出できるように、特徴的な画素のオリエンテーション推定を行う。 Subsequently, a characteristic amount calculation method around a characteristic pixel in SIFT will be described. In SIFT, characteristic pixel orientation estimation is performed so that a corresponding characteristic pixel can be detected even if the search target image exists in a rotated manner in the input image.

そのためにまずは、勾配強度及び勾配方向を示すデータを算出する。画像の画素値Ｌ（ｕ，ｖ）、勾配強度ｍ（ｕ，ｖ）、勾配方向θ（ｕ，ｖ）とすると、以下のように算出できる。 For this purpose, first, data indicating the gradient strength and gradient direction is calculated. Assuming that the pixel value L (u, v) of the image, the gradient intensity m (u, v), and the gradient direction θ (u, v), it can be calculated as follows.
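The equations themselves do not appear in this text. As a point of reference, Document 1 defines the gradient magnitude and direction from pixel differences as shown below. Writing them with the symbols L(u, v), m(u, v) and θ(u, v) used above is an assumption about the omitted formulas, not a reproduction of them.

```latex
\begin{aligned}
m(u,v) &= \sqrt{\bigl(L(u+1,v) - L(u-1,v)\bigr)^{2} + \bigl(L(u,v+1) - L(u,v-1)\bigr)^{2}} \\
\theta(u,v) &= \tan^{-1}\!\left(\frac{L(u,v+1) - L(u,v-1)}{L(u+1,v) - L(u-1,v)}\right)
\end{aligned}
```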

その後、勾配方向を１０度ずつ、３６方向に離散化したヒストグラムを用意する。該ヒストグラムには、勾配強度に対し、注目画素を中心とするガウシアンを掛け合わせた値を加算していく。該ヒストグラムにおいて最も大きな値を示す方向が特徴的な画素のオリエンテーションとなる。 Thereafter, a histogram is prepared by discretizing the gradient direction by 10 degrees in 36 directions. In the histogram, a value obtained by multiplying the gradient intensity by Gaussian centered on the target pixel is added. The direction of the largest value in the histogram is the characteristic pixel orientation.
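The 36-bin orientation histogram can be sketched as follows. This is a minimal illustration using central pixel differences for the gradient. The function name, the patch layout, and the Gaussian width `sigma` are assumptions not given in the text.

```python
import math

def dominant_orientation(patch, sigma=1.5):
    """Return the centre angle (degrees) of the strongest 10-degree bin.

    patch: square 2-D list of pixel values L[v][u], centred on the
    characteristic pixel (hypothetical layout).
    """
    size = len(patch)
    c = (size - 1) / 2.0
    hist = [0.0] * 36
    for v in range(1, size - 1):
        for u in range(1, size - 1):
            dx = patch[v][u + 1] - patch[v][u - 1]
            dy = patch[v + 1][u] - patch[v - 1][u]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Gaussian weight centred on the pixel of interest.
            weight = math.exp(-((u - c) ** 2 + (v - c) ** 2) / (2 * sigma ** 2))
            hist[int(angle // 10) % 36] += magnitude * weight
    best = max(range(36), key=lambda b: hist[b])
    return best * 10 + 5
```

The returned value is the centre of the winning bin, so the estimate has a resolution of 10 degrees in this sketch.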

次に、特徴的な画素のオリエンテーションが画像の上方に向くよう画像を回転させる。その後、特徴的な画素の周辺領域を一辺４ブロックの計１６ブロックに分割する。ブロックごとに４５度ずつ、８方向の勾配ヒストグラムを作成することにより、４×４×８＝１２８次元の特徴量が得られる。このように特徴的な画素周辺の画素を、推定した特徴的な画素のオリエンテーションが画像の上方に向くよう正規化した後に、特徴量を算出するため、得られた特徴量は画像の回転に対して不変な特徴量になる。 Next, the image is rotated so that the characteristic pixel orientation is directed upward. Thereafter, the peripheral area of the characteristic pixel is divided into 16 blocks, 4 blocks on each side. By creating a gradient histogram in 8 directions at 45 degrees for each block, 4 × 4 × 8 = 128-dimensional feature values can be obtained. After normalizing the pixels around the characteristic pixels in this way so that the estimated orientation of the characteristic pixels is directed upward, the calculated feature values are calculated with respect to the rotation of the image. And invariant features.

以上述べたように、ＳＩＦＴ算出部４１及び４２は、ＳＩＦＴ特徴量を用いることにより、検索対象画像を入力画像内で探索する際に、両者のサイズの違いや回転に対して極めて安定して探索することができる。 As described above, by using SIFT feature amounts, the SIFT calculation units 41 and 42 make the search for the search target image within the input image extremely stable against differences in size and rotation between the two.

続いて、ＳＩＦＴ比較部４３は、検索対象画像に含まれる特徴的な画素と近い特徴量を持つ特徴的な画素（対応点）が入力画像内に存在するか否かを１つ１つ、総当たりで調べる（ステップＳ２５）。ここでいう近い特徴量とは、特徴的な画素同士のユークリッド距離が所定の閾値（例えば３００）以下であるものを指す。 Subsequently, the SIFT comparison unit 43 checks exhaustively, one by one, whether the input image contains a characteristic pixel (corresponding point) whose feature amount is close to that of each characteristic pixel in the search target image (step S25). A close feature amount here means that the Euclidean distance between the feature amounts of the two characteristic pixels is equal to or less than a predetermined threshold (for example, 300).
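The brute-force comparison of step S25 can be sketched as follows. Keeping only the nearest input descriptor for each search-target descriptor is an assumption made for this sketch. The function name is hypothetical, and the example threshold of 300 comes from the text.

```python
import math

def match_descriptors(query_descs, input_descs, max_distance=300.0):
    """Return (query_index, input_index, distance) for each accepted match.

    query_descs: feature vectors of the search target image.
    input_descs: feature vectors of the input image.
    """
    matches = []
    for qi, q in enumerate(query_descs):
        best_index, best_dist = None, float("inf")
        for ii, d in enumerate(input_descs):
            dist = math.dist(q, d)  # Euclidean distance between descriptors
            if dist < best_dist:
                best_index, best_dist = ii, dist
        if best_index is not None and best_dist <= max_distance:
            matches.append((qi, best_index, best_dist))
    return matches
```

The cost grows with the product of the two descriptor counts, which is what "exhaustively" implies.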

ＳＩＦＴは局所領域だけを参照して得られる特徴量なので、得られた対応点は必ずしも同じ画像に属するものではない。そこで、対応点選定部４４は、対応点の位置関係から、対応点を選定する（ステップＳ２６）。選定するために、文献１ではハフ（Hough）変換を用いている。即ち、１つの対応点のサイズ及びオリエンテーションから検索対象画像の平面が入力画像内にて、どのような姿勢をとっているかを推定することができる。この推定値を量子化し、それらの分布を見ることによって、対応点の選定を行う。同じ姿勢を示す対応点が多数有る場合、それらは信頼性の高い対応点であり、同じ姿勢を示す対応点が少数で有る場合、それらは信頼性の低い対応点であると言える。なお、本実施形態では、同じ姿勢を示す点の数を対応点数と呼ぶ。 Since SIFT is a feature quantity obtained by referring to only a local region, the obtained corresponding points do not necessarily belong to the same image. Therefore, the corresponding point selection unit 44 selects corresponding points from the positional relationship of the corresponding points (step S26). In order to make the selection, Hough transform is used in Document 1. That is, it is possible to estimate what posture the plane of the search target image takes in the input image from the size and orientation of one corresponding point. Corresponding points are selected by quantizing the estimated values and looking at their distribution. When there are a large number of corresponding points indicating the same posture, they are highly reliable corresponding points, and when there are a small number of corresponding points indicating the same posture, they can be said to be corresponding points with low reliability. In the present embodiment, the number of points indicating the same posture is referred to as the corresponding number of points.

対応点選定部４４は、対応点数が３以上有る場合は、検索対象画像が入力画像内に存在していると判定し、対応点数が２以下である場合は、検索対象画像が入力画像内に存在しないとして類似度を０と算出する（ステップＳ２７）。 The corresponding point selection unit 44 determines that the search target image exists in the input image when the number of corresponding points is 3 or more, and when the number of corresponding points is 2 or less, the search target image is included in the input image. The similarity is calculated as 0 because it does not exist (step S27).
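The selection of corresponding points by voting, followed by the "3 or more" decision of step S27, can be sketched as follows. This is a simplified illustration: it votes only on scale ratio and rotation and ignores position. The bin widths (a factor of 2 in scale, 30 degrees in rotation) and the function names are assumptions.

```python
import math
from collections import Counter

def count_consistent_matches(matches, angle_bin=30.0):
    """Return the largest number of matches that vote for the same pose bin.

    matches: list of (scale_ratio, rotation_deg) pairs, the pose implied
    by each corresponding point (hypothetical data layout).
    """
    votes = Counter()
    for scale_ratio, rotation_deg in matches:
        scale_bin = round(math.log2(scale_ratio))  # one bin per factor of 2
        rot_bin = int((rotation_deg % 360.0) // angle_bin)
        votes[(scale_bin, rot_bin)] += 1
    return max(votes.values()) if votes else 0

def target_present(matches, min_points=3):
    """Step S27: the search target image is present with 3 or more points."""
    return count_consistent_matches(matches) >= min_points
```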

検索対象画像が入力画像内に存在する場合、対応領域抽出部４５は、入力画像から、検索対象画像と対応する領域を抽出する（ステップＳ２８）。検索対象画像が入力画像内に存在する場合、前述のとおり、検索対象画像の平面が、入力画像内でどのような姿勢をとっているかを推定することができるので、該姿勢の平面を入力画像から抽出すればよい。 When the search target image exists in the input image, the corresponding region extraction unit 45 extracts from the input image the region corresponding to the search target image (step S28). In this case, as described above, the posture that the plane of the search target image takes in the input image can be estimated, so the plane with that posture can simply be extracted from the input image.

続いて、画像比較手段５０は、抽出した画像と検索対象画像とを比較して、両者の類似度を算出する（ステップＳ２９）。なお、画像比較手段５０の詳細な動作については後述する。 Subsequently, the image comparison means 50 compares the extracted image with the search target image and calculates the similarity between the two (step S29). The detailed operation of the image comparison unit 50 will be described later.

以上のように、第１の類似度算出手段２１は、検索対象画像が入力画像内に存在するか否かを判定し、存在すると判定した場合に類似度を算出することができる。 As described above, the first similarity calculation unit 21 determines whether or not the search target image exists in the input image, and can calculate the similarity when it is determined that the search target image exists.

（画像比較手段５０の動作） (Operation of the image comparison means 50)
画像比較手段５０は、２つの画像から画像の特徴を示す特徴量を算出し、それらを比較することによって類似度を算出する。本実施形態では色、エッジ及び模様の３種類の特徴量を算出するものとする。 The image comparison unit 50 calculates a feature amount indicating the feature of the image from the two images, and calculates the similarity by comparing them. In the present embodiment, it is assumed that three types of feature amounts of color, edge, and pattern are calculated.

前提として、画像内の各画素の色情報は、Ｒ（レッド）、Ｇ（グリーン）、Ｂ（ブルー）の３原色の階調をそれぞれ０〜２５５の２５６階調で示しているものとする。３原色の階調が何れも０である場合には、その画素の色は黒となる。また、３原色の階調が何れも２５５である場合には、その画素の色は白となる。このように、各画素には、ｓＲＧＢ表色系の３次元の色情報が割り当てられているものとする。 As a premise, the color information of each pixel in the image is assumed to indicate the gradations of the three primary colors R (red), G (green), and B (blue) with 256 gradations of 0 to 255, respectively. When the gradations of the three primary colors are all 0, the color of the pixel is black. When the gradations of the three primary colors are all 255, the color of the pixel is white. As described above, it is assumed that three-dimensional color information of the sRGB color system is assigned to each pixel.

最初にエッジ特徴量の算出方法について述べる。まず、画像内の画素マトリクスの各画素に対し、図８に示すような３画素×３画素のフィルタリングマトリクスを用いて、畳み込み積分を施す。画像内の注目画素に対して、図示のフィルタリングマトリクスにおける中心画素の値"４"を割り当てるとともに、その注目画素の周囲に存在する画素に対して、フィルタリングマトリクスの中心画素の周囲に存在する画素の値を割り当てるのである。このような畳み込み積分を画像全体に施して、エッジ画像を得る。その後、所定の閾値（例えば１２８）を用いて画像を２値化する。次に、画像全体を例えば１０×１０の同サイズのブロックに等分して、それぞれのブロック中で２値化の閾値を超える画素をカウントする。以上の処理により１００次元のベクトルが得られる。また、ブロックに含まれる全ての画素数で除算することで正規化し、ベクトルの各要素の値を０〜１に正規化しておく。 First, a method for calculating the edge feature amount will be described. First, convolution integration is performed on each pixel of the pixel matrix in the image using a 3 × 3 filtering matrix as shown in FIG. The central pixel value “4” in the illustrated filtering matrix is assigned to the target pixel in the image, and the pixels existing around the central pixel of the filtering matrix are assigned to the pixels existing around the target pixel. Assign a value. Such convolution integration is performed on the entire image to obtain an edge image. Thereafter, the image is binarized using a predetermined threshold (for example, 128). Next, the entire image is equally divided into, for example, 10 × 10 blocks of the same size, and the pixels exceeding the binarization threshold in each block are counted. A 100-dimensional vector is obtained by the above processing. Also, normalization is performed by dividing by the number of all pixels included in the block, and the value of each element of the vector is normalized to 0-1.
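The edge feature calculation can be sketched as follows. FIG. 8 is not reproduced in this text, so the kernel below (centre 4, four direct neighbours -1) is an assumed layout that only matches the stated centre value. Leaving border pixels at 0 and requiring image dimensions divisible by the block count are further simplifications.

```python
def edge_feature(gray, blocks=10, threshold=128):
    """Return a blocks*blocks list of edge-pixel ratios in the range 0 to 1.

    gray: 2-D list of 0-255 values whose height and width are divisible
    by `blocks` (assumption made to keep the sketch short).
    """
    h, w = len(gray), len(gray[0])
    kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]  # assumed FIG. 8 layout
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            response = sum(
                kernel[j][i] * gray[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
            edges[y][x] = 1 if response > threshold else 0  # binarize
    bh, bw = h // blocks, w // blocks
    feature = []
    for by in range(blocks):
        for bx in range(blocks):
            count = sum(
                edges[y][x]
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw)
            )
            feature.append(count / (bh * bw))  # normalize by block size
    return feature
```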

次に、色特徴量の算出方法について述べる。まず、画像に含まれる画素を全て２５５で除算し、正規化しておく。下記の数３〜数９に基づいて画像をｓＲＧＢ表色系からＬａｂ表色系の色表現に変換する。 Next, a method for calculating the color feature amount will be described. First, all pixels included in the image are divided by 255 and normalized. The image is converted from the sRGB color system to the Lab color system based on the following equations 3 to 9.

光源としてＤ６５光源を想定した場合、Ｘｎ＝０．９５、Ｙｎ＝１．００、Ｚｎ＝１．０９となる。このようにしてＬａｂ表色系に変換したら、次に、エッジ特徴量の算出と同様にして、画像を１０×１０のブロックに等分し、それぞれのブロックで平均Ｌａｂを得る。更に得られたＬａｂは以下の式で０〜１の値に正規化したＬ'ａ'ｂ'に変換しておく。この結果、１００×３＝３００次元のベクトルが得られる。 When a D65 light source is assumed as the light source, Xn = 0.95, Yn = 1.00, and Zn = 1.09. After conversion to the Lab color system in this way, the image is then equally divided into 10 × 10 blocks in the same manner as the calculation of the edge feature amount, and an average Lab is obtained for each block. Further, the obtained Lab is converted into L′ a′b ′ normalized to a value of 0 to 1 by the following formula. As a result, a 100 × 3 = 300-dimensional vector is obtained.
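Equations 3 to 9 are not reproduced in this text. As an illustration only, the sketch below uses the standard sRGB to XYZ to CIE Lab conversion with the white point values quoted above. Whether the omitted equations include the sRGB gamma linearization step is not known, so that step is an assumption, as is the function name. The subsequent normalization to L'a'b' is not sketched because its formula is also omitted.

```python
def srgb_to_lab(r, g, b, white=(0.95, 1.00, 1.09)):
    """Convert one 8-bit sRGB pixel to (L, a, b).

    white: (Xn, Yn, Zn) for the D65 light source, as quoted in the text.
    """
    def linearize(c):
        c = c / 255.0  # normalize to 0-1 first
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Standard sRGB (D65) to XYZ matrix.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / white[0]), f(y / white[1]), f(z / white[2])
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)
```

Averaging the resulting values over each of the 10 × 10 blocks then gives the 100 × 3 block-wise colour values described above.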

次に、模様特徴量の算出方法について述べる。模様特徴量の算出には、周知の濃度共起行列が用いられる。濃度共起行列は、ある小領域において図９に示すように濃淡画像の明るさがｋの画素からδ（ｒ，θ）で示される相対位置に１の画素が出現する頻度をｐ_δ（ｒ，θ）とする。ｓＲＧＢ表色系の画像をグレー画像に変換し、その後、各画素を１６で割って余りを捨てることで１６の階調に量子化する。その後、１６の階調数をｍとして、数１３に基づいて１６×１６次元の濃度共起行列を得る。なお、得られたマトリクスの値を画像に含まれる画素数で割り、０〜１の値に正規化しておく。 Next, a method for calculating the pattern feature amount will be described. The well-known density co-occurrence matrix is used for this calculation. For a given small region, as shown in FIG. 9, the density co-occurrence matrix records the frequency p_δ(r, θ) with which a pixel of brightness l appears at the relative position δ(r, θ) from a pixel of brightness k in the grayscale image. The sRGB image is converted to a gray image, and each pixel is then divided by 16 with the remainder discarded, which quantizes it to 16 gradations. With the number of gradations, 16, denoted m, a 16 × 16 density co-occurrence matrix is obtained based on Equation 13. The values of the resulting matrix are divided by the number of pixels in the image so that they are normalized to values from 0 to 1.

本実施形態では、δ（１，０）、δ（１，４５）及びδ（１，９０）の３種類の濃度共起行列を得ているため、最終的には２５６×３＝７６８次元の特徴量ベクトルが得られる。 In this embodiment, since three types of density co-occurrence matrices of δ (1,0), δ (1,45), and δ (1,90) are obtained, the final result is 256 × 3 = 768 dimensions. A feature vector is obtained.
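The three co-occurrence matrices and the resulting 768-dimensional vector can be sketched as follows. Equation 13 is not reproduced in this text, so the sketch counts ordered pixel pairs in one direction only. That choice, the mapping of the angles to pixel offsets, and the function names are assumptions.

```python
def cooccurrence(gray, offset):
    """Return a 16x16 co-occurrence matrix normalized by the pixel count.

    gray: 2-D list of 0-255 gray values; offset: (dx, dy) displacement.
    """
    h, w = len(gray), len(gray[0])
    dx, dy = offset
    # Divide by 16 and discard the remainder: 16 gradations.
    q = [[p // 16 for p in row] for row in gray]
    matrix = [[0.0] * 16 for _ in range(16)]
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                matrix[q[y][x]][q[y2][x2]] += 1
    total = h * w
    return [[v / total for v in row] for row in matrix]

# Assumed pixel offsets for delta(1, 0), delta(1, 45) and delta(1, 90),
# with the image y axis pointing downward.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1)}

def pattern_feature(gray):
    """Concatenate the three matrices into a 256 * 3 = 768-element list."""
    feature = []
    for angle in (0, 45, 90):
        for row in cooccurrence(gray, OFFSETS[angle]):
            feature.extend(row)
    return feature
```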

濃度共起行列は画像の周波数情報の概略を示す特徴量であるため、テクスチャ特徴量の算出に用いることが可能である。なお、周波数情報の取得にはフーリエ変換を用いることが可能である。また、ＭＦＰ（複合機）に記憶されている画像データファイルがＪＰＥＧ方式で圧縮されているものであれば離散コサイン変換（Discrete Cosine Transform）を用いることで、容易に周波数情報を得ることができる。以上のようにして、最終的には１００＋３００＋７６８＝１１６８次元の特徴量ベクトルが得られる。 Since the density co-occurrence matrix is a feature amount indicating an outline of the frequency information of the image, it can be used for calculating the texture feature amount. In addition, it is possible to use Fourier transformation for acquisition of frequency information. Further, if the image data file stored in the MFP (multifunction machine) is compressed by the JPEG method, the frequency information can be easily obtained by using the Discrete Cosine Transform. As described above, a 100 + 300 + 768 = 1168-dimensional feature vector is finally obtained.

以上の特徴量を２つの画像でそれぞれ算出し、それらのユークリッド距離を算出することで画像の類似度を算出することができる。特徴量は全て０〜１に正規化されているため、１からユークリッド距離を差し引けば０〜１の類似度が得られる。 The above feature quantities are calculated for two images, respectively, and the Euclidean distance between them is calculated to calculate the image similarity. Since the feature amounts are all normalized to 0 to 1, if the Euclidean distance is subtracted from 1, the similarity of 0 to 1 can be obtained.
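The similarity calculation can be sketched as follows. Clamping the result at 0 is an assumption added here so that the output stays within 0 to 1 even when two long feature vectors are far apart. The text itself states only that the Euclidean distance is subtracted from 1.

```python
import math

def similarity(features_a, features_b):
    """Return 1 minus the Euclidean distance, clamped to be non-negative.

    features_a, features_b: equal-length lists of values in 0 to 1, for
    example the 1168-dimensional vectors described above.
    """
    distance = math.dist(features_a, features_b)
    return max(0.0, 1.0 - distance)  # clamp is an assumption of this sketch
```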

以上のように、画像比較手段５０が、ＳＩＦＴの対応点だけでなく、画像の特徴量から類似度を算出することにより、第１の類似度算出手段２１は、更に高精度に類似度を算出することができる。 As described above, when the image comparison unit 50 calculates the similarity based not only on the corresponding points of SIFT but also on the feature amount of the image, the first similarity calculation unit 21 calculates the similarity with higher accuracy. can do.

（第１の類似判定手段２３の動作） (Operation of the first similarity determination means 23)
次に、第１の類似判定手段２３の動作について述べる。第１の類似判定手段２３は、第１の類似度算出手段２１が算出した類似度が所定の閾値を超えているか否かで、第１の検索対象画像に付与されたタグを入力画像に付けるか否かを決定する。ここで、通常の閾値は０．８とするが、対応点の数が例えば１０よりも多い場合には、閾値を０．７とするのが好ましい。一般に、複数の対応点が同じ姿勢であると、誤って判定される可能性は低い。そのため、図７のステップＳ２７では"３"という極めて少ない対応点の数が存在した場合に、検索対象画像が入力画像内に存在すると判定している。したがって、対応点が十分に多い場合は類似度が高い可能性が高いので、この場合に閾値を下げている。これにより、本来類似している画像を非類似と誤判定する可能性が低くなり、正しいタグ付けが行える可能性が高くなる。 Next, the operation of the first similarity determination unit 23 will be described. The first similarity determination means 23 attaches a tag given to the first search target image to the input image depending on whether or not the similarity calculated by the first similarity calculation means 21 exceeds a predetermined threshold. Here, the normal threshold value is 0.8, but when the number of corresponding points is more than 10, for example, the threshold value is preferably 0.7. In general, when a plurality of corresponding points have the same posture, there is a low possibility of being erroneously determined. Therefore, in step S27 of FIG. 7, when there is a very small number of corresponding points of "3", it is determined that the search target image exists in the input image. Therefore, when there are a sufficient number of corresponding points, there is a high possibility that the degree of similarity is high. In this case, the threshold value is lowered. As a result, the possibility that an image that is originally similar is erroneously determined to be dissimilar is reduced, and the possibility that correct tagging can be performed is increased.
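The threshold switching described above can be sketched as follows. The values 0.8, 0.7 and 10 are the examples given in the text. The function and parameter names are hypothetical.

```python
def should_attach_tag(similarity, corresponding_points,
                      normal_threshold=0.8, relaxed_threshold=0.7,
                      many_points=10):
    """Decide whether the tag of the first search target image is attached.

    The threshold is lowered when the number of corresponding points
    exceeds `many_points`, because many consistent points already make a
    true match likely.
    """
    if corresponding_points > many_points:
        threshold = relaxed_threshold
    else:
        threshold = normal_threshold
    return similarity > threshold
```

For example, a similarity of 0.75 is accepted with 12 corresponding points but rejected with 10.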

（第２の類似度算出手段２４の動作） (Operation of the second similarity calculation means 24)
第２の類似度算出手段２４の動作は、図７のステップＳ２９における処理と同様であり、入力画像と第２の検索対象画像に含まれる画像との類似度を比較する。但し、ここでの比較は、画像の局所領域における比較を行うのではなく、画像全体同士を比較する。 The operation of the second similarity calculation unit 24 is the same as the processing in step S29 in FIG. 7: it calculates the similarity between the input image and each image included in the second search target images. Here, however, the comparison is made between the whole images rather than between local regions of the images.

（キーワード登録時の動作） (Operation at the time of keyword registration)
次に、キーワード登録時のサーバＰＣの動作を図１及び図１０に基づいて説明する。図１０は、キーワード登録時のサーバＰＣの動作を示すフローチャートである。 Next, the operation of the server PC at the time of keyword registration will be described with reference to FIG. 1 and FIG. 10. FIG. 10 is a flowchart showing the operation of the server PC at the time of keyword registration.

領域指定手段３１は入力画像のデータを入力し（ステップＳ３１）、領域が指定された入力画像のデータを極値画素検出手段３２に出力する。この入力画像に対し、極値画素検出手段３２は、前述のＳＩＦＴ算出（図７ステップＳ２２）と同様の処理を行う（ステップＳ３２）。但し、極値画素検出手段３２は、入力画像内で特徴的な画素の検出を行うが、特徴量算出は行わない。極値画素検出手段３２は、この処理により特徴的な画素の数を算出し、特徴的な画素の数が例えば２０以下の場合は、領域が指定された入力画像は識別に不適としてユーザに警告を出す（ステップＳ３３「Ｎｏ」）。一方、特徴的な画素の数が２０よりも大きい場合は、画像とともにキーワードを登録する（ステップＳ３３「Ｙｅｓ」）。なお、ＳＩＦＴを用いた類似領域が存在するか否かを判定する手法において、対応点の選定を行うため、十分な数の特徴的な画素が存在しないと、類似領域が存在すると判定できる可能性が低くなる。ここで、十分な数の特徴的な画素が存在しない画像は、模様の少ない一様な画像であることが多い。一様な画像は前述のとおり様々な画像に存在しうるので識別には適さない。 The area designating unit 31 receives the input image data (step S31) and outputs the data of the input image in which an area has been designated to the extreme pixel detection unit 32. The extreme pixel detection unit 32 applies to this input image the same processing as the SIFT calculation described above (step S22 in FIG. 7) (step S32), except that it only detects the characteristic pixels in the input image and does not calculate feature amounts. From this processing the extreme pixel detection unit 32 obtains the number of characteristic pixels. If the number is, for example, 20 or less, the input image with the designated area is judged unsuitable for identification and a warning is issued to the user (step S33 "No"). If the number is greater than 20, the keyword is registered together with the image (step S33 "Yes"). Because the SIFT-based method of determining whether a similar region exists selects among corresponding points, a similar region is unlikely to be found unless a sufficient number of characteristic pixels exist. Moreover, an image without a sufficient number of characteristic pixels is often a uniform image with little pattern, and, as described above, a uniform image can appear in many different images and is therefore not suitable for identification.
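The decision of step S33 can be sketched as follows. The limit of 20 is the example given in the text, and the warning wording follows the example shown for the warning display area 85. The function name and the returned pair are hypothetical.

```python
def check_registration(keypoint_count, minimum=20):
    """Return (accepted, warning) for a designated image area.

    keypoint_count: number of characteristic pixels detected in the area.
    The area is accepted only when the count is greater than `minimum`.
    """
    if keypoint_count > minimum:
        return True, ""
    return False, "The selected area cannot be identified well"
```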

以上のように、本実施形態における画像分類装置１０によれば、第１の類似度算出手段２１は、ユーザにとって具体的なキーワードがタグとして付された第１の検索対象画像と入力画像との局所領域における類似度を算出し、第２の類似度算出手段２４は、曖昧なキーワードのタグが付された第２の検索対象画像と入力画像との画像全体における類似度を算出する構成としたので、ユーザの負担を最小限に抑えつつ、カスタマイズ可能で、且つ画像全体を用いた分類を行うことができる。 As described above, according to the image classification device 10 of the present embodiment, the first similarity calculation unit 21 calculates the first search target image and the input image that are tagged with a specific keyword for the user. The similarity in the local region is calculated, and the second similarity calculation means 24 is configured to calculate the similarity in the entire image between the second search target image tagged with the ambiguous keyword tag and the input image. Therefore, it is possible to perform customization while minimizing the burden on the user and perform classification using the entire image.

なお、前述の実施形態において、第１の検索対象画像をサーバＰＣのハードディスク装置に格納する構成を例に挙げて説明したが、本発明はこれに限定されるものではなく、第１の検索対象画像をクライアントＰＣのハードディスク装置に格納する構成としてもよい。また、例えば、サーバＰＣのハードディスク装置にユーザ毎のフォルダを設け、各ユーザと第１の検索対象画像とを関連付けて格納する構成としてもよい。 In the above-described embodiment, the configuration in which the first search target image is stored in the hard disk device of the server PC has been described as an example. However, the present invention is not limited to this, and the first search target image is stored. The image may be stored in the hard disk device of the client PC. Further, for example, a folder for each user may be provided in the hard disk device of the server PC, and each user and the first search target image may be stored in association with each other.

(Second Embodiment)
First, the configuration of the image classification device according to the second embodiment of the present invention will be described.

As shown in FIG. 11, the image classification device according to the present embodiment includes an image registration device 90 for registering images. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

The image registration device 90 includes a first similarity calculation means 21 that calculates a first similarity, a first image DB 22 that stores data of first search target images, a first similarity determination means 91 that performs a first similarity determination, a second similarity calculation means 92 that calculates a second similarity, a feature amount DB 93 that stores feature amount data, and a second similarity determination means 94 that performs a second similarity determination.

As shown in FIG. 11, the image registration device 90 of the present embodiment has substantially the same configuration as that of the first embodiment, but differs from it in two respects.

First, in the first embodiment the input image was compared individually with the first and second search target images. In the present embodiment, the input image is not compared with each second search target image individually. Instead, feature amounts are calculated from various images bearing the same tag (the feature amount calculation method is the same as in the image comparison means 50 described above), and identification is performed on them using the SVM described in Non-Patent Document 1. The SVM is a binary classifier that can classify unknown inputs relatively accurately, that is, it has high generalization performance. Using an SVM therefore reduces the number of images that must be prepared in advance. Furthermore, because the SVM has a noise-removing effect and disregards similar data, it allows more accurate and faster identification than comparing against each image one by one. The SVM method is described later.

Second, in the present embodiment images are not tagged; instead, they are classified and saved into folders associated with predetermined tags. In this case multiple tags cannot be assigned to one image, and the choice must be narrowed down to one. In general, specific information is expected to take priority over ambiguous information. Therefore, in the present embodiment, when the first search target image is contained in the input image, the tag assigned to that search target image is assigned to the input image, and the processing of the second similarity calculation means 92 and the second similarity determination means 94 is not performed. This reduces the amount of computation, shortens the user's waiting time during image registration, and improves convenience for the user.

(Method for generating classification rules by SVM)
Next, a method for generating classification rules by SVM will be described. As shown in Equation 14, the SVM is a discriminator that outputs y = 1 if the inner product of the input vector and the weight vector ω exceeds a specific threshold, and outputs y = −1 otherwise. An output of y = 1 is taken to mean that the input image is a document image, and an output of y = −1 that it is a photographic image. In other words, SVM learning is the task of determining the weight vector ω and the threshold h. A detailed description of SVM learning is given in Document 1 mentioned above; an outline follows.
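The decision rule described in words above can be sketched as a short function. Equation 14 itself is not reproduced in this text, so the form below (compare the inner product with the threshold h) is an assumption based on the prose, and the name `svm_decide` is hypothetical.

```python
def svm_decide(x, omega, h):
    """Linear decision rule as described in the text (assumed form of Equation 14).

    Returns 1 when the inner product of the input vector x and the weight
    vector omega exceeds the threshold h, and -1 otherwise.
    """
    inner_product = sum(w_k * x_k for w_k, x_k in zip(omega, x))
    return 1 if inner_product > h else -1
```

With this rule, learning amounts to choosing `omega` and `h`, as the paragraph states.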

FIG. 13 shows an outline of the operation of the SVM. As a precondition, assume that there are two kinds of vector groups, represented by circles and crosses as shown on the left of the figure. The SVM can be regarded as an algorithm for determining the hyperplane (see the right of FIG. 13) that optimally separates these two classes. In an SVM, dividing the two vector groups optimally is equivalent to maximizing the ability to handle an unknown input vector, that is, the generalization ability. To achieve this, the vectors lying at the boundary between the two vector groups (support vectors) are found, and the hyperplane is set so that the distance between these vectors and the hyperplane is maximized.

In practice the training data also contain erroneous samples, so a parameter that sets the tolerance for errors (the soft margin) must be chosen. The description so far concerns the linear SVM, but actual training data are not necessarily linearly separable vector groups. Nonlinear problems can nevertheless be handled by projecting the feature vectors into a higher-dimensional space (the kernel trick) and finding the hyperplane in that space.

To realize the above, the training data (x_i, y_i) are used to find the Lagrange multipliers α_i that maximize Equation 16 under the conditions of Equation 15. The hyperplane parameters ω and h are then obtained (Equations 17 and 18) from the training data set S corresponding to the nonzero Lagrange multipliers (these samples are the support vectors) and any one training sample (x_0, y_0) from that set.
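Equations 14 to 18 are not reproduced in this text. For orientation only, the standard textbook soft-margin formulation is shown below, written with the symbols used in the description (γ as the soft-margin parameter, h as the threshold, S as the support vector set). That the patent's equations take exactly this form is an assumption. The expression for ω is written for the linear case; with a kernel, the inner product of ω and x is replaced by the sum of α_i y_i K(x_i, x) over S. The expression for h assumes that (x_0, y_0) lies exactly on the margin.

```latex
% Assumed standard forms; the patent's own Equations 14-18 may differ in detail.
\begin{align}
y &= \operatorname{sign}\left(\omega^{\top} x - h\right)
  && \text{(decision rule, cf. Equation 14)} \\
0 &\le \alpha_i \le \gamma, \qquad \sum_{i} \alpha_i y_i = 0
  && \text{(constraints, cf. Equation 15)} \\
L(\alpha) &= \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
  && \text{(objective to maximize, cf. Equation 16)} \\
\omega &= \sum_{i \in S} \alpha_i y_i x_i
  && \text{(cf. Equation 17)} \\
h &= \omega^{\top} x_0 - y_0
  && \text{(cf. Equation 18)}
\end{align}
```

The last line follows from the margin condition y_0(ω^T x_0 − h) = 1 together with y_0 = ±1.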

In Equation 16, K(x, y) denotes the kernel function that realizes the kernel trick. Various kernel functions have been devised; this embodiment uses the RBF (Radial Basis Function). The RBF is the function given by Equation 19, where C is an arbitrary numerical value. As described above, learning with an SVM requires setting the parameter γ, which sets the soft-margin tolerance, and C, which determines the RBF kernel function. As described in Document 2 (Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001), it is preferable to fix the ranges of C and γ and their step widths in advance, compute the identification rate for every combination, and choose the C and γ that give the best identification rate.
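The exhaustive search over C and γ can be sketched as below. Several points are assumptions: Equation 19 is not reproduced in this text, so the RBF is written in one common form, exp(−‖x − y‖² / C); `evaluate` is a hypothetical callback that trains an SVM with the given parameters and returns the identification rate (for example by cross-validation); and the function and argument names are illustrative. Note that this description uses C for the RBF parameter and γ for the soft margin, whereas LIBSVM itself names the cost parameter C and the kernel parameter gamma.

```python
import math
from itertools import product


def rbf_kernel(x, y, rbf_c):
    """RBF kernel in an assumed form: exp(-||x - y||^2 / rbf_c)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / rbf_c)


def grid_search(evaluate, rbf_c_values, soft_margin_gamma_values):
    """Try every (C, gamma) pair and keep the one with the best rate.

    evaluate(rbf_c, soft_margin_gamma) is a caller-supplied (hypothetical)
    function returning the identification rate for that parameter pair.
    Returns (best_rate, best_rbf_c, best_soft_margin_gamma); on ties the
    first pair encountered is kept.
    """
    best = None
    for rbf_c, gamma in product(rbf_c_values, soft_margin_gamma_values):
        rate = evaluate(rbf_c, gamma)
        if best is None or rate > best[0]:
            best = (rate, rbf_c, gamma)
    return best
```

The candidate lists passed to `grid_search` correspond to the predetermined ranges and step widths mentioned in the text.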

Next, the operation of the image classification device according to the present embodiment will be described with reference to FIGS. 11 and 12. FIG. 12 is a flowchart showing the operation of the image classification device according to the present embodiment.

The first similarity determination means 23 determines, based on a predetermined threshold, whether the input image and the first search target image are similar (step S41). If they are similar, the data of the input image is moved to the folder associated with the tag assigned to the first search target image (step S45).

On the other hand, if the input image and the first search target image are not similar in step S41, the second similarity calculation means 92 reads predetermined feature amount data from the feature amount DB 93 (step S42) and calculates the second similarity using the SVM (step S43). The calculated second similarity data is sent to the second similarity determination means 94.

Based on the similarity calculated by the second similarity calculation means 92, the second similarity determination means 94 determines whether the input image and the second search target image are similar over the entire image (step S44).

In step S44, if the input image and the second search target image are similar over the entire image, the second similarity determination means 94 moves the data of the input image to the folder associated with the tag assigned to the second search target image (step S45). If they are not similar over the entire image, the processing ends.
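The flow of steps S41 to S45 can be sketched as a two-stage cascade. The function and parameter names are hypothetical, the similarity and feature functions are supplied by the caller, and two details are assumptions not stated in the text: a score above the threshold counts as similar, and when several tag classifiers respond positively the highest score wins.

```python
def classify_into_folder(image, first_entries, local_similarity, threshold,
                         extract_features, tag_classifiers):
    """Return the tag of the destination folder, or None if nothing matches.

    first_entries:    list of (tag, registered_image) pairs (first image DB).
    local_similarity: function (image, registered_image) -> float.
    extract_features: function image -> feature vector for the whole image.
    tag_classifiers:  dict mapping tag -> decision function (features -> score),
                      standing in for the SVM built from the feature amount DB.
    """
    # Step S41: local-region comparison with each first search target image.
    for tag, registered_image in first_entries:
        if local_similarity(image, registered_image) > threshold:
            # Step S45: a specific tag matched; steps S42 to S44 are skipped.
            return tag

    # Steps S42 and S43: whole-image scoring, only reached when S41 failed.
    features = extract_features(image)
    best_tag, best_score = None, 0.0
    for tag, decide in tag_classifiers.items():
        score = decide(features)
        if score > best_score:
            best_tag, best_score = tag, score

    # Step S44: None means the input resembles no second search target image.
    return best_tag
```

Returning early from the first stage is what avoids the whole-image computation for images that already match a specific keyword.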

As described above, in the image classification device of the present embodiment, the second similarity calculation means 92 calculates the similarity only when the input image and the first search target image are not similar. When each image is assigned to a single class, unnecessary processing is therefore eliminated and image classification is performed at high speed, which shortens the user's waiting time during image registration and improves convenience for the user.

As described above, the image classification device and image classification method according to the present invention can be customized and can classify using the entire image while keeping the burden on the user to a minimum, and are useful as software for printers, copiers, digital cameras, PCs, servers, and the like.

FIG. 1 is a block diagram of the image classification device according to the first embodiment of the present invention.
FIG. 2 is a block diagram of the first similarity calculation means in the first embodiment of the present invention.
FIG. 3 is a block diagram of a computer that implements the image classification device according to the first embodiment of the present invention.
FIG. 4 is a diagram showing the dialog screen displayed when the web browser is started in the image classification device according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the dialog screen for keyword registration in the image classification device according to the first embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of the image classification device during image registration in the first embodiment of the present invention.
FIG. 7 is a flowchart showing the detailed operation of the first similarity calculation means in the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of the filtering matrix of the image classification device in the first embodiment of the present invention.
FIG. 9 is an explanatory diagram of the method of calculating the pattern feature amount in the first embodiment of the present invention.
FIG. 10 is a flowchart showing the operation of the server PC during keyword registration in the first embodiment of the present invention.
FIG. 11 is a block diagram of the image registration device in the second embodiment of the present invention.
FIG. 12 is a flowchart showing the operation of the image classification device in the second embodiment of the present invention.
FIG. 13 is a diagram showing an outline of the operation of the SVM in the second embodiment of the present invention.

Explanation of symbols

10 Image classification device
20 Image registration device
21 First similarity calculation means
22 First image DB (first image data storage means)
23, 91 First similarity determination means (similarity determination means, keyword assigning means)
24, 92 Second similarity calculation means
25 Second image DB (second image data storage means)
26, 94 Second similarity determination means (keyword assigning means)
30 Keyword registration device
31 Area designating means
32 Extreme pixel detection means
33 Registration means
34 Warning means
40 Local area extraction means
41, 42 SIFT calculation unit (extreme pixel detection unit, feature amount calculation unit)
43 SIFT comparison unit (feature amount comparison unit)
44 Corresponding point selection unit (extreme pixel selection unit)
45 Corresponding region extraction unit
50 Image comparison means
71 Keyword input box
72 Search button
73 Image registration button
74 Keyword registration button
75 Image display area
80 Dialog screen
81 Image selection button
82 Keyword input box
83 Registration button
84 Display area
85 Warning display area
93 Feature amount DB

Claims

1. An image classification device for classifying an input image by comparing the similarity between an image to which a keyword is assigned and the input image, the device comprising:
first and second image data storage means for storing data of first and second images registered with keywords assigned thereto; local area extraction means for extracting a local area of a predetermined size from the input image and the first image; first similarity calculation means for calculating the similarity between the input image and the first image in the extracted local area; second similarity calculation means for calculating the similarity between the entire input image and the entire second image; and keyword assigning means for assigning a keyword to the input image based on the similarities calculated by the first and second similarity calculation means.

2. The image classification device according to claim 1, wherein the first image data storage means stores, as the data of the first image, data of an image registered by a user with a specific keyword assigned.

3. The image classification device described above, wherein the second image data storage means stores, as the data of the second image, image data to which a predetermined ambiguous keyword has been assigned.

4. The image classification device according to any one of claims 1 to 3, wherein the local area extraction means includes: an extreme pixel detection unit that detects, in each of the input image and the first image, extreme pixels whose feature value is an extreme value; a feature amount calculation unit that calculates an image feature amount in the vicinity of each extreme pixel; a feature amount comparison unit that compares the calculated feature amounts; an extreme pixel selection unit that selects extreme pixels based on the positional relationship of the extreme pixels in each of the input image and the first image; and a corresponding region extraction unit that extracts, from the input image, an image region corresponding to the first image based on the number of extreme pixels selected by the extreme pixel selection unit.

5. The image classification device according to claim 4, wherein the first similarity calculation means includes image comparison means for calculating the similarity by comparing the image of the image region extracted by the corresponding region extraction unit with the first image.

6. The image classification device according to claim 4, wherein the extreme pixel detection unit detects extreme pixels of an image when a user registers the image as the first image,
the device further comprising warning means for giving a warning to the user when the number of extreme pixels detected by the extreme pixel detection unit is equal to or less than a predetermined number.

7. The image classification device according to any one of claims 1 to 6, further comprising area designating means for designating a specific area of the input image,
wherein the first image data storage means stores image data of the area designated by the area designating means.

8. The image classification device according to claim 1, further comprising similarity determination means for determining whether the input image and the first image are similar based on the similarity calculated by the first similarity calculation means,
wherein the second similarity calculation means calculates the similarity for the entire image only when the input image and the first image are not similar.

9. An image classification method for classifying an input image by comparing the similarity between an image to which a keyword is assigned and the input image, the method comprising:
a step of storing data of first and second images registered with keywords assigned thereto; a step of extracting a local area of a predetermined size from the input image and the first image; a step of calculating the similarity between the input image and the first image in the extracted local area; a step of calculating the similarity between the entire input image and the entire second image; and a step of assigning a keyword to the input image based on the similarities calculated in the two calculating steps.