JP2014197412A

JP2014197412A - System and method for similarity search of images

Info

Publication number: JP2014197412A
Application number: JP2014121801A
Authority: JP
Inventors: Dong-Qing Zhang; Rajan Joshi; Ana B. Benitez; Ying Luo; Ju Guo
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2014-06-12
Filing date: 2014-06-12
Publication date: 2014-10-16

Abstract

PROBLEM TO BE SOLVED: To provide a system and method for an efficient semantic similarity search of images with a classification structure. SOLUTION: A system and method comprising: building a semantic classification-search tree for a plurality of images (202), the classification tree including at least two image categories representing subsets of the plurality of images; receiving a query image (204); classifying the query image to select one of the at least two image categories (208); and restricting the search for an image of interest, using the query image, to the selected one of the at least two image categories (210).

Description

The present invention relates to computer graphics processing and display systems, and more particularly to similarity search of images.

Detecting and retrieving images similar to a query image is very useful in a variety of real-world applications. The techniques described in this disclosure address the problem of querying an image database to find images that are similar to the query image, preferably at a semantic level (i.e., images that contain the same objects and background, possibly with some variations). This problem arises in various applications. One example is a location-aware service on a mobile device: the user takes a picture of a landmark, and the device then tells the user the location and a description of the landmark. In another application, the user takes pictures of products in a store, and the mobile device returns web pages listing the same products, with their prices, as offered by different retailers. In copyright infringement detection, potential infringements can be identified by searching the Internet for unauthorized uses of images. In multimedia content management, duplicates and near-duplicates can help link stories across videos, news reports, and web articles from multiple sources.

Although the techniques described in this disclosure can be applied to general image or video search, the disclosure focuses on searching images or video at a semantic level rather than on visual search based on low-level features such as color and texture. Search based on low-level features has been well studied, and highly efficient search algorithms are available for large-scale databases. Searching at a semantic level is much more difficult than low-level feature search because it involves comparing the objects contained in the images or video. For many real-world applications, such as those described above, low-level feature search is generally not sufficient, since images containing different objects do not have similar colors or textures.

Image or video search at a semantic level requires comparing the objects in the images. Similar images, as defined here, should contain the same objects and background but may differ in some respects, such as object motion or lighting changes. The problem is very difficult because it is hard for a computer or other computing device to understand or represent an image at a semantic level. Some early work has performed image and video search at a semantic level. For example, D. Q. Zhang and S. F. Chang, "Detecting Image Near-Duplicate by Stochastic Attributed Relational Graph Matching with Learning," ACM Multimedia, New York, USA, October 2004, describes a part-based similarity measure that uses machine learning for accurate near-duplicate detection and retrieval. The similarity measure described by Zhang et al. actually compares the objects in the images and can achieve highly accurate results. However, the method is very slow compared with conventional search methods that use low-level features (e.g., color histograms) and is difficult to apply to real-world applications.

Accordingly, there is a need for techniques for efficiently searching images at a semantic level.

A system and method are provided for efficient semantic similarity search of images using a classification structure. The system and method make it possible to query an image database and find images that are similar to a query image at a semantic level, i.e., images that contain the same objects and background as the query image, with some variations. The techniques of this disclosure greatly reduce the similarity computation by restricting the semantic similarity search to images within a particular class or category. First, a classification search tree is built for all images in the database. Then, each input query image is classified into one or more categories (typically semantic categories such as people, indoor, outdoor, etc.). A category represents a subset of the entire image space, i.e., of the database of images. The image similarity computation is then restricted to that subset.

According to one aspect of the present disclosure, a method for searching a plurality of images for an image of interest is provided. The method builds a classification structure for the plurality of images. The classification structure includes at least two image categories, each of which represents a subset of the plurality of images. The method then receives a query image and restricts the search for the image of interest to a selected one of the at least two image categories.

According to another aspect, a system for searching a plurality of images for an image of interest includes a database containing a plurality of images organized into at least two semantic categories. Each semantic category represents a subset of the plurality of images. The system also includes means for acquiring at least one image, an image classification module that classifies the query image to select one of the at least two semantic categories, and an image search module that searches for the image of interest using the query image, wherein the search is restricted to the selected one of the at least two semantic categories.

According to yet another aspect, a machine-readable program storage device is provided that tangibly embodies a program of instructions executable by the machine to perform method steps for searching a plurality of images for an image of interest. The method includes building a classification structure for the plurality of images. The classification structure includes at least two image categories, each of which represents a subset of the plurality of images. The method also includes receiving a query image, classifying the query image to select one of the at least two image categories, and restricting the search for the image of interest to the selected one of the at least two image categories.

In the drawings, like reference numerals denote similar elements throughout the views.
FIG. 1 is an exemplary illustration of a system for similarity search of images according to an aspect of the present disclosure.
FIG. 2 is a flow diagram of an exemplary method for similarity search of images according to an aspect of the present disclosure.
FIG. 3 illustrates a classification search tree according to the present disclosure.
FIG. 4 illustrates a simple search performed in a classification search tree according to the present disclosure.
FIG. 5 illustrates a redundant search performed in a classification search tree according to the present disclosure.
FIG. 6 illustrates a method for building or generating a classification search tree according to an aspect of the present disclosure.
FIG. 7 shows a feature vector for an image with tagged keywords.
FIG. 8 illustrates a method for adding a new image to a classification search database according to an aspect of the present disclosure.

These and other aspects, features, and advantages of the present disclosure will be described in, or become apparent from, the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.

It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software, or combinations thereof. These elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory, and input/output interfaces.

The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes, to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Such equivalents are intended to include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, those skilled in the art will appreciate that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, any flowcharts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes that may be substantially represented in computer-readable media and so executed by a computer or processor, whether or not such a computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random-access memory (RAM), and non-volatile storage.

Other hardware, conventional or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through dedicated logic, through the operation of program logic, through the interaction of program control and dedicated logic, or manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function, including, for example, (a) a combination of circuit elements that performs that function, or (b) software in any form, including firmware, microcode, or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner the claims call for. Any means that can provide those functionalities are therefore regarded as equivalent to those shown herein.

Detecting and retrieving images similar to a query image is very useful in a variety of real-world applications. The problem is to efficiently find images that are similar to the query image on a semantic basis (i.e., images taken of the same scene and containing the same objects). Some prior techniques have proposed highly accurate algorithms for semantic image search, but they are slow. Efficiency is especially important when the image database is very large. Typically, the time to search an image database is linearly proportional to the size of the database. The system and method of the present disclosure speed up the search by exploiting the structure of the image database and the semantics of the images.

A system and method are provided for efficiently searching images or video through hierarchical processing. It is assumed that a high-quality image/video similarity algorithm or function is already available, but that it is slower than conventional feature-based similarity computation algorithms. The system and method of the present disclosure therefore provide a speed-up process that accelerates semantic search of image or video databases. For brevity, this disclosure focuses on image search, although it is also applicable to video, i.e., sequences of images. The system and method speed up the search algorithm by exploiting the structure of the content space of the images. The techniques of this disclosure restrict the visual similarity search to images within a particular class or category, which greatly reduces the similarity computation. First, a classification structure, such as but not limited to a classification tree, is built for all images in the database. Then, each input query image is classified into one or more categories (typical semantic categories are people, indoor, outdoor, etc.), each of which represents a subset of the entire image space. The image similarity computation is then restricted to that subset.

Referring now to the figures, FIG. 1 shows exemplary system components 100 according to an embodiment of the present disclosure. A scanning device 103 scans film prints 104, e.g., camera-original film negatives, into a digital format, e.g., Cineon format or SMPTE (Society of Motion Picture and Television Engineers) DPX (Digital Picture Exchange) files. The scanning device 103 may include, e.g., a telecine or any device that generates a video output from film, such as an Arri LocPro(R) with video output. Alternatively, files from the post-production process or digital cinema 106 (e.g., files already in computer-readable form) can be used directly. Sources of computer-readable files include AVID(R) editors, DPX files, and D5 tapes.

The digital images or scanned film prints are input to a post-processing device 102, e.g., a computer. The computer is implemented on any of the various known computer platforms having hardware such as one or more central processing units (CPUs), memory 110 such as RAM and ROM, and input/output (I/O) user interfaces 112 such as a keyboard, a cursor control device (e.g., a mouse or joystick), and a display device. The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may be part of the microinstruction code, part of a software application program executed via the operating system, or a combination of the two. In one embodiment, the software application program is tangibly embodied on a program storage device, and may be uploaded to and executed by any suitable machine, such as the post-processing device 102. In addition, various other peripheral devices may be connected to the computer platform through various interfaces and bus structures, such as a parallel port, a serial port, or a universal serial bus (USB). Other peripheral devices may include additional storage devices 124 and a printer 128.

Alternatively, files or film prints already in computer-readable form 106 (e.g., digital cinema, which may be stored on the external hard drive 124) may be input directly into the computer 102.

The software program includes a similarity search module 114, stored in the memory 110, for efficiently searching for an image of interest based on a query image. The similarity search module 114 further includes an image classification module 116, which generates a plurality of classifiers and sub-classifiers for classifying the query image into at least one category. A feature extractor 118 extracts features from images. Feature extractors are known in the art and extract features such as, but not limited to, texture, line direction, and edges. In one embodiment, a classifier includes a pattern recognition function that classifies the query image based on the extracted features.

The similarity search module 114 further includes an image search module 119 comprising a plurality of image searchers, each configured to search a respective image subset of the image database 122. Each image searcher uses a similarity measure to determine the image of interest from the query image.

A keyword tagger 120 tags each image in the database with features. In one embodiment, the keyword tagger 120 includes a dictionary of N keywords and is used to generate feature vectors from the keywords. The tagged features can be used to store the images in a plurality of subsets. In addition, in one embodiment, the image classification module 116 uses the keywords to generate classifiers.
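As a rough illustration of how a dictionary of N keywords could yield a feature vector, the sketch below maps an image's tags to an N-dimensional 0/1 vector. The dictionary entries, the function name `keyword_feature_vector`, and the binary encoding itself are assumptions made for illustration; the text above does not specify how the keyword tagger encodes its vectors.

```python
from typing import Iterable, List

# Hypothetical dictionary of N keywords; the actual entries are not given in the text.
KEYWORD_DICTIONARY: List[str] = ["people", "indoor", "outdoor", "landmark", "product"]


def keyword_feature_vector(tags: Iterable[str],
                           dictionary: List[str] = KEYWORD_DICTIONARY) -> List[int]:
    """Return an N-dimensional 0/1 vector: entry i is 1 if dictionary[i] is among the tags."""
    tag_set = {t.strip().lower() for t in tags}
    return [1 if word in tag_set else 0 for word in dictionary]


# Tags that are not in the dictionary are simply ignored.
print(keyword_feature_vector(["Outdoor", "landmark", "sunset"]))
```

Vectors of this kind could then be grouped into subsets or fed to a classifier, which is the role the text assigns to the tagged keywords.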

Furthermore, the similarity search module 114 includes an object recognizer 121 for recognizing objects in the images of the database. Using the recognized objects, the image classification module 116 can learn from them and build classifiers based on them.

FIG. 2 is a flow diagram of an exemplary method for similarity search of images using a classification data structure, such as, but not limited to, a classification search tree, according to an aspect of the present disclosure. First, in step 202, a classification search tree, described in detail below, is built. Next, in step 204, the post-processing device 102 acquires at least one two-dimensional (2D) image, e.g., a query image. The post-processing device 102 can acquire the query image by obtaining a digital image file in a computer-readable format, e.g., via a consumer-grade camera. Although the techniques of this disclosure are described in terms of images, sequences of images, e.g., moving pictures such as video, can also make use of them. A digital video file can be acquired by capturing a temporal sequence of moving images with a digital camera. Alternatively, the video sequence can be captured by a conventional film-type camera, in which case the film is scanned via the scanning device 103. In step 206, the query image is classified by a classifier, and in step 208 it is then classified by sub-classifiers until the lowest level or branch of the tree is reached. In step 210, the search is performed by a searcher within an image subset of the database 122 rather than over the entire image space or database. Details of building or generating the classification search tree, and of searching within the tree, are given below.

The system and method of the present disclosure employ a tree-based search to restrict image comparison to a small subset of the database. The tree-based search is based on the image classification described below. The classification tree is built by tagging images with keywords, either automatically or manually.

The system and method of the present disclosure speed up the search process by restricting the search for the image of interest to the branches of the classification search tree. The search is assumed to use a high-accuracy similarity measure S(I_q, I_d), where I_q is the query image and I_d is an image in the database. A similarity measure is a number indicating how similar two images are; for example, 1.0 means the two images are identical and 0.0 means they are completely different. Distance is usually regarded as the inverse of similarity. One example of a similarity measure is the inverse of the distance between the color histograms of the two images. Similarity measures are known in the art. A similarity measure can also be "learned" for a particular image category, so that similarity search is optimized within that category, or it can be designed manually for a particular image category. In either case, the similarity measure adapted to image category C is denoted S_C(I_q, I_d).
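A small sketch may help make the 0.0 to 1.0 scale concrete. It assumes color histograms as the feature (a low-level feature the background section mentions) and turns their L1 distance into a similarity via S = 1 - d/2, so that identical histograms score 1.0 and non-overlapping ones score 0.0. This particular mapping and the function names are illustrative assumptions; the text only says that similarity behaves as the inverse of distance.

```python
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def histogram_similarity(hist_q: Sequence[float], hist_d: Sequence[float]) -> float:
    """Similarity in [0, 1] between a query histogram and a database histogram.

    The L1 distance between two normalized histograms lies in [0, 2],
    so 1 - distance / 2 is 1.0 for identical and 0.0 for disjoint histograms.
    """
    if len(hist_q) != len(hist_d):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(hist_q), normalize(hist_d)
    l1_distance = sum(abs(a - b) for a, b in zip(p, q))
    return 1.0 - l1_distance / 2.0


print(histogram_similarity([4, 2, 2], [8, 4, 4]))  # same color distribution
print(histogram_similarity([4, 0, 0], [0, 2, 2]))  # no overlapping bins
```

A category-specific measure S_C could be modeled the same way, for instance as a mapping from category name to a function like this one with category-tuned bins or weights.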

A classification search tree is a tree in which each intermediate node uses a classifier to detect or classify one or more categories for an image. Each branch of the tree represents a category. Only the branches of the detected categories are then traversed. As shown in FIG. 3, each leaf node 302, 304, 306, 308, 310 of the tree represents the images corresponding to a particular category. A classification search tree can have multiple layers or levels; for example, the tree in FIG. 3 has three levels. As FIG. 3 also shows, a classification search tree contains both classifiers and searchers.

A classifier is used to classify the query image into categories. In one embodiment, a classifier is a pattern recognition or machine learning algorithm or function based on automatically extracted features such as color and texture. The general classification procedure is as follows. First, a feature vector is extracted from the image. Then a pattern recognition algorithm or function takes the feature vector and outputs one or more class labels, each representing a certain image category, together with confidence scores (e.g., class ID and score). In general, a pattern recognition algorithm takes a feature vector as input and outputs an integer indicating a class ID; alternatively, a pattern recognition function compares the extracted vector with stored vectors. Other pattern recognition algorithms and functions are known to those skilled in the art. A classifier can also be binary, in which case it outputs a yes or no label indicating whether the image belongs to a certain category. A classifier can be designed manually or built automatically from example data.
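The procedure above (feature vector in, class IDs and scores out) can be sketched with a nearest-centroid rule, which is one simple way of comparing an extracted vector with stored vectors. The centroid values, the `classify` name, and the 1/(1 + distance) confidence score are assumptions for illustration only, not the classifier used in the disclosure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def classify(feature: Sequence[float],
             centroids: Dict[int, Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (class_id, confidence) pairs, most confident first.

    Each class is represented by one stored vector (its centroid); the
    confidence is 1 / (1 + Euclidean distance), so it is 1.0 for an exact match.
    """
    scored = []
    for class_id, centroid in centroids.items():
        distance = math.dist(feature, centroid)
        scored.append((class_id, 1.0 / (1.0 + distance)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stored vectors for two categories, e.g. 0 = indoor, 1 = outdoor.
stored = {0: [0.0, 0.0], 1: [1.0, 1.0]}
ranked = classify([0.9, 1.0], stored)
print(ranked[0][0])           # class ID with the highest confidence
print(ranked[0][0] == 1)      # binary (yes/no) use of the same classifier
```

Taking only the top entry gives the single-class-ID output the text describes; thresholding one class's score gives a binary yes/no classifier.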

A searcher is a program that computes image similarity and is used to find the images of interest that are most similar to the query image.

In a simple classification search, the query image is classified into one and only one category at each level; let the resulting leaf category be category C. After classification is complete, i.e., after the query image has reached the bottom (leaf layer) of the classification search tree, the similarity measure S_C(I_q, I_d) is computed to search the images within the subset of the database corresponding to image category C, as shown in FIG. 4. In FIG. 4 and the remaining figures, branches and leaf nodes that are traversed during the search are drawn with solid lines, and classifiers and searchers that are not traversed are drawn with dashed lines. For example, in FIG. 4 a query image is received and passed to classifier 0. Classifier 0 determines whether the image is to be further classified by classifier 0.1, e.g., a sub-classifier. From classifier 0.1, the query image is passed to classifier 0.1.1, which determines whether searcher 0.1.1.2 is to be used to search image subset 0.1.1.2 for images similar to the query image. It will be appreciated that restricting the search for the image of interest to image subset 0.1.1.2 makes the search more efficient and faster.
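The simple search can be sketched as a descent that follows exactly one branch per level and then runs the similarity measure only over the leaf's image subset. Everything concrete below is hypothetical: the `Node` class, the toy "faces"/"brightness" features, the threshold classifiers, and the brightness-based similarity stand in for whatever classifiers and searchers a real tree would contain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Image = Dict[str, float]  # hypothetical stand-in for extracted image features


@dataclass
class Node:
    name: str
    classify: Optional[Callable[[Image], str]] = None   # intermediate node: picks one child key
    children: Dict[str, "Node"] = field(default_factory=dict)
    images: List[Tuple[str, Image]] = field(default_factory=list)  # leaf: (image_id, features)


def simple_search(root: Node, query: Image,
                  similarity: Callable[[Image, Image], float]) -> Tuple[str, Optional[str]]:
    """Follow exactly one branch per level, then search only the leaf's image subset."""
    node = root
    while node.children:
        node = node.children[node.classify(query)]
    if not node.images:
        return node.name, None
    best_id, _ = max(node.images, key=lambda item: similarity(query, item[1]))
    return node.name, best_id


def brightness_similarity(a: Image, b: Image) -> float:
    return 1.0 - abs(a["brightness"] - b["brightness"])


outdoor = Node("outdoor", images=[("beach", {"faces": 0, "brightness": 0.9}),
                                  ("forest", {"faces": 0, "brightness": 0.6})])
indoor = Node("indoor", images=[("kitchen", {"faces": 0, "brightness": 0.3})])
people = Node("people", images=[("portrait", {"faces": 1, "brightness": 0.5})])
scene = Node("scene", classify=lambda img: "outdoor" if img["brightness"] >= 0.5 else "indoor",
             children={"indoor": indoor, "outdoor": outdoor})
root = Node("root", classify=lambda img: "people" if img["faces"] > 0 else "scene",
            children={"people": people, "scene": scene})

query = {"faces": 0, "brightness": 0.8}
print(simple_search(root, query, brightness_similarity))
```

In this toy tree only the two outdoor images are compared against the query; the indoor and people subsets are never touched, which is where the saving in similarity computations comes from.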

In this case, the output of a classifier can be binary or n-ary. For a binary classifier, the output indicates whether or not the query image belongs to a category. Similarly, for an n-ary classifier, the output can be an integer value indicating which category the query image belongs to. If all of the classifiers in the classification search tree are binary, the tree is a binary tree; otherwise it is a non-binary classification search tree.

One problem with simple classification search is that if the classification is wrong, the query image is routed to a completely wrong category, which can produce wrong search results. This problem can be solved by redundant search, in which several categories are searched rather than only one.

Referring to FIG. 5, in redundant classification search the query image is classified into one or more leaf categories, for example by way of classifier 0.1 and classifier 0.2. After classification is finished, i.e., after the query image reaches its respective categories at the bottom (leaf layer) of the classification search tree, e.g., classifier 0.1.1 and classifier 0.2, the similarity measure S_C(I_q, I_d) is computed to search for images within the subsets of the database corresponding to the selected image categories C. For example, as shown in FIG. 5, searcher 0.1.1.2 searches image subset 0.1.1.2 and searcher 0.2.1 searches image subset 0.2.1.

To realize redundant classification search, the output of a classifier needs to be a list of class labels together with a list of floating-point values representing the confidence that the corresponding category is present in the query image. A thresholding procedure is then applied to obtain the list of categories whose classifier output exceeds a threshold. The query image is considered to belong to each category in the resulting list. After the bottom level of the tree is reached, a similarity score is determined for each image in the listed categories, and the image with the highest similarity score is selected as the image of interest.
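The thresholding step and the final selection can be sketched as follows. For brevity the toy tree collapses the intermediate levels so that the root classifier scores the two leaf subsets of FIG. 5 directly; the `Node` class, the confidence values, the threshold of 0.3, and the similarity measure are all assumptions for illustration.

```python
class Node:
    def __init__(self, name, classify=None, children=None, images=None):
        self.name = name
        self.classify = classify        # query -> {child name: confidence}
        self.children = children or {}
        self.images = images or {}      # leaf: image id -> feature vector

def similarity(a, b):
    """Assumed similarity measure: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def select_leaves(node, query, threshold):
    """Follow every branch whose confidence exceeds the threshold."""
    if not node.children:
        return [node]
    leaves = []
    for label, confidence in node.classify(query).items():
        if confidence > threshold:
            leaves.extend(select_leaves(node.children[label], query, threshold))
    return leaves

def redundant_search(root, query, threshold=0.3):
    """Score every image in the selected subsets and keep the best one."""
    candidates = [(similarity(query, vec), image_id, leaf.name)
                  for leaf in select_leaves(root, query, threshold)
                  for image_id, vec in leaf.images.items()]
    if not candidates:
        return None
    score, image_id, leaf_name = max(candidates)
    return image_id, leaf_name, score

# Toy data: the classifier slightly prefers the wrong subset for this query.
root = Node("0",
            classify=lambda q: {"0.1.1.2": 1.0 - q[0], "0.2.1": q[0]},
            children={
                "0.1.1.2": Node("0.1.1.2", images={"img1": [0.1, 0.5]}),
                "0.2.1": Node("0.2.1", images={"img2": [0.5, 0.5]}),
            })

query = [0.45, 0.5]
redundant = redundant_search(root, query, threshold=0.3)  # both subsets searched
strict = redundant_search(root, query, threshold=0.5)     # only one subset
```

With the lower threshold both subsets are searched and the closer image in the second subset is found; with the higher threshold the search behaves like simple classification search and misses it.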

To enable efficient search over images, a classification search tree is built to structure the image space so that not every image has to be searched each time. Referring to FIG. 6, building or generating the classification search tree involves two stages. In the first stage, all branches of the tree are built: all classifiers are built and, if the classification search tree has multiple layers, organized into a tree. In the second stage, the images in the database are classified into categories to form subsets of the images in the database. In addition, a searcher is defined for searching within each of the image subsets.

To build the classification search tree, the classifiers at the intermediate nodes of the tree must be built first. Each classifier corresponds to one semantic class (e.g., outdoor scene, tree, human face). The semantic classes can be determined manually by humans or automatically by a clustering function or algorithm. The relationships among the classifiers (i.e., the tree structure) can be defined by human design.

Once the semantic classes are defined, semantic classifiers are built for the intermediate nodes, e.g., sub-classifiers 304, 306, 308, 310. Each classifier or sub-classifier can be built one by one using a different procedure. In one embodiment, a "generic" classifier is provided, and the generic classifier then learns from example images of each image category. This procedure allows the system and method of the present disclosure to build a large number of semantic classifiers without designing each classifier specifically. This type of classifier is called learning-based scene or object recognition. An exemplary learning-based scene or object recognition approach is disclosed in R. Fergus, P. Perona, and A. Zisserman, "Object Class Recognition by Unsupervised Scale-Invariant Learning", Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2003. The Fergus et al. paper describes a method to learn and recognize object class models from unlabeled and unsegmented cluttered scenes in a scale-invariant manner. In this method, objects are modeled as flexible constellations of parts. A probabilistic representation is used for all aspects of the object: shape, appearance, occlusion and relative scale. An entropy-based feature detector is used to select regions and their scale within the image. In learning, the parameters of the scale-invariant object model are estimated using expectation-maximization in a maximum-likelihood setting. In recognition, the model is used in a Bayesian manner to classify images.

Another way to define and build classifiers is to use "keyword tagging" by image users. With keyword tagging, image users manually assign keywords such as "tree", "face", and "blue sky" to an image. These manually tagged keywords can be regarded as a kind of image feature and can therefore be used for classification. For example, a keyword-spotting classifier can be built that classifies an image into a certain class once it finds certain keywords. In a more sophisticated approach, the tagged keywords are treated as a kind of feature and converted into a feature vector. This is done with a technique used in image retrieval called the "term vector". Basically, a dictionary with N keywords is built, and each keyword-tagged image is assigned an N-dimensional keyword feature vector. If the image is tagged with the i-th keyword in the dictionary, the i-th element of the term vector is set to 1; otherwise it is set to 0. As a result, each image has a term vector that represents its semantic meaning. This term vector can be concatenated with the feature vector described above to form a new feature vector for image classification, as shown in FIG. 7.
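The term-vector construction and the concatenation with a visual feature vector can be sketched in a few lines. The four-keyword `DICTIONARY`, the two-element visual vector, and the function names are hypothetical examples.

```python
# Hypothetical dictionary with N = 4 keywords.
DICTIONARY = ["tree", "face", "blue sky", "building"]

def term_vector(tags, dictionary=DICTIONARY):
    """N-dimensional vector: element i is 1 if the i-th keyword was tagged."""
    tagged = set(tags)
    return [1.0 if keyword in tagged else 0.0 for keyword in dictionary]

def combined_feature(visual_features, tags, dictionary=DICTIONARY):
    """Concatenate the visual feature vector with the term vector."""
    return list(visual_features) + term_vector(tags, dictionary)

tv = term_vector(["face", "tree"])
combined = combined_feature([0.2, 0.7], ["blue sky"])
```

Tags that are not in the dictionary are simply ignored in this sketch; a real system would need a policy for extending the dictionary.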

For each image subset, an image searcher is manually designed or learned. The image searcher is used to perform similarity search within that subset of the database.

After the classifiers are defined and built, each image in the database is classified into a subset. The procedure for building the image subsets is very similar to the classification-search process. When an image is entered into the database, it is automatically classified down the classification tree until it reaches the bottom level, where it is placed in the image pool corresponding to one of the bottom-level classifiers, as shown in FIG. 8.

A potential problem is that an image may contain more than one object, e.g., an image containing both people and trees. If the classification tree has two semantic classes such as "people" and "trees", there is ambiguity in classifying the image into a single class. This problem can be solved by the redundant classification described above; that is, the input image is classified into both subsets.
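Populating the image pools, including the redundant placement of an image that contains more than one object, might look like the sketch below. It uses a single-level tree for brevity, and the toy classifier (which simply reads two pre-computed detector scores), the threshold of 0.5, and the fallback to the best-scoring class are assumptions rather than details from the disclosure.

```python
from collections import defaultdict

def classify(features):
    """Toy classifier (assumed): features = [people_score, tree_score]."""
    return {"people": features[0], "trees": features[1]}

def ingest(pools, image_id, features, threshold=0.5):
    """Place the image in every pool whose confidence exceeds the threshold.

    If no class passes the threshold, fall back to the single best class so
    that every image lands in at least one subset.
    """
    scores = classify(features)
    chosen = [label for label, score in scores.items() if score > threshold]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    for label in chosen:
        pools[label].append(image_id)
    return chosen

pools = defaultdict(list)
portrait_pools = ingest(pools, "portrait", [0.9, 0.1])
picnic_pools = ingest(pools, "picnic", [0.8, 0.7])    # people and trees
blurry_pools = ingest(pools, "blurry", [0.2, 0.3])    # nothing confident
```

The "picnic" image ends up in both pools, so a later query routed to either category can still find it.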

Although embodiments incorporating the teachings of the present disclosure have been described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings. Having described preferred embodiments of a system and method for efficient semantic similarity search of images with a classification search tree (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope of the disclosure as set forth in the appended claims.

Claims

A method for searching a plurality of images for an image of interest, the method comprising:
building a classification structure for the plurality of images, the classification structure including at least two image categories representing subsets of the plurality of images;
receiving a query image;
classifying the query image to select one of the at least two image categories; and
limiting the search for the image of interest to the selected one of the at least two image categories.

The method of claim 1, wherein the classification structure is a semantic classification search tree.

The method of claim 1, wherein the step of classifying the query image includes:
extracting features from the query image; and
identifying one of the at least two image categories based on the extracted features.

The method of claim 1, wherein the step of classifying the query image is performed by a pattern recognition function.

The method of claim 1, wherein building the classification structure includes determining a classifier for each of the image categories, and wherein the classifier classifies images into one of the at least two image categories.

The method of claim 5, wherein the step of determining the classifier is performed by using a clustering function for the plurality of images.

The method of claim 5, further comprising determining at least one sub-classifier for each of the determined classifiers.

The method of claim 5, further comprising:
classifying each of the plurality of images based on the determined classifier; and
storing each of the plurality of images in at least one of the subsets of the plurality of images.

Building the classification structure comprises:
Tagging each image of the plurality of images with a feature keyword;
Storing each of the plurality of images in at least one of the subsets of the plurality of images based on the feature keyword.

The method of claim 9, further comprising determining a classifier for each of the image categories based on the feature keywords.

The method of claim 1, wherein building the classification structure further includes:
recognizing an object from each of the plurality of images of the at least two image categories; and
determining a classifier for each of the image categories based on the recognized object of each image,
wherein the classifier classifies images into one of the at least two categories.

The method of claim 1, wherein the search for the image of interest is performed by similarity measurement.

The method of claim 1, further comprising:
classifying the query image into at least two of the at least two image categories;
searching for the image of interest, using the query image, in the at least two image categories;
determining a similarity score for each of the images found in each of the at least two categories; and
selecting the image with the highest similarity score as the image of interest.

A system for searching a plurality of images for an image of interest, the system comprising:
a database comprising the plurality of images structured into at least two image semantic categories representing subsets of the plurality of images;
means for obtaining at least one query image;
an image classification module for classifying the query image and selecting one of the at least two image semantic categories; and
an image search module for searching for the image of interest using the query image,
wherein the search is limited to the selected one of the at least two image semantic categories.

The system of claim 14, wherein the image processing module further comprises a feature extraction unit that extracts features from the query image, and the image classification module determines one of the at least two image semantic categories based on the extracted features.

The system of claim 14, wherein the image classification module includes a pattern recognition function.

The system of claim 14, further comprising means for constructing a semantic classification search tree including a classifier for each of the image semantic categories, wherein the classifier classifies images into one of the at least two image semantic categories.

The system of claim 17, wherein the image classification module determines the classifier by applying a clustering function to the plurality of images.

The system of claim 17, wherein the image classification module determines a sub-classifier for each determined classifier.

The system of claim 17, wherein the image classification module classifies each of the plurality of images based on the determined classifier and stores each of the plurality of images in a subset of the plurality of images in the database.

The system of claim 17, further comprising a keyword tagging unit that tags each image of the plurality of images with a feature keyword and stores each of the plurality of images in a subset of the plurality of images of the database based on the feature keyword.

The system of claim 21, wherein the image classification module determines the classifier for each of the image semantic categories based on the feature keywords.

The system of claim 17, further comprising an object identification unit for identifying an object from each of the plurality of images of the at least two image semantic categories, wherein the image classification module determines a classifier for the image semantic category based on the recognized objects of each image.

The system of claim 14, wherein the image retrieval module includes a similarity measure.

The system of claim 14, wherein the image classification module classifies the query image into at least two of the at least two image semantic categories, and the image search module uses the query image to search for the image of interest in the at least two image semantic categories, determines a similarity score for each image found in each of the at least two image semantic categories, and selects the image with the highest similarity score as the image of interest.

A machine-readable program storage device tangibly embodying a program of instructions executable by the machine to perform method steps for searching a plurality of images for an image of interest, the method steps comprising:
building a classification structure for the plurality of images, the classification structure including at least two image categories representing subsets of the plurality of images;
receiving a query image;
classifying the query image to select one of the at least two image categories; and
limiting the search for the image of interest to the selected one of the at least two image categories.