JP2009282883A

JP2009282883A - Image retrieval system, crawling device, and image retrieval device

Info

Publication number: JP2009282883A
Application number: JP2008136318A
Authority: JP
Inventors: Arihito Asai; 有人浅井
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2008-05-26
Filing date: 2008-05-26
Publication date: 2009-12-03

Abstract

PROBLEM TO BE SOLVED: To allow a user to easily find out where a desired image is posted on the Internet.

SOLUTION: A crawling engine 11 connects to a web page on the Internet 40 and acquires the URL of the web page, the URL of each image posted on the web page, and the image data. An image hash calculating device 12 calculates a hash value of each collected image, and the hash value is stored in an image DB 30 in association with the URL of the web page from which the image was acquired, the URL of the image, and the like. A retrieval device 23 checks whether the image DB 30 contains a hash value identical to the one output by an image hash calculating device 22, acquires from the image DB 30 the URL of the web page and the URL of the image associated with that hash value, and displays them on a retrieval result display device 24.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an image search system, a crawling device, and an image search device, and more particularly to an image search system, a crawling device, and an image search device capable of searching the Internet for a desired image.

Patent Document 1 discloses an invention in which keywords are extracted from a search condition written in natural language, information similar to the keywords is extracted from a database in which information collected from the Internet is registered, and the results are displayed in descending order of similarity to the keywords.

Patent Document 2 discloses an invention in which a key consisting of text data is extracted from media data such as images and music, and a database is constructed that associates the media data, the key, sites on a network, and the like.
Patent Document 1: Japanese Patent Application Laid-Open No. 2000-231569
Patent Document 2: Japanese Patent Application Laid-Open No. 2000-76300

Search methods that use text data, such as the inventions described in Patent Documents 1 and 2, are widely known in the technical field of Internet search. Searching for images on the Internet from text information, as in image search services, is also common.

However, the conventional techniques described above cannot determine on which page on the Internet an image already at hand was posted. Consequently, no method has yet been provided by which a copyright holder can efficiently check for illegal reproduction of images that he or she owns.

The present invention has been made in view of these circumstances, and an object thereof is to provide an image search system, a crawling device, and an image search device with which it is possible to easily find out where a desired image is posted on the Internet.

The image search system according to claim 1 is an image search system comprising a crawling device and an image search device. The crawling device comprises: crawling means for collecting, from the Internet, the URL of a web page and an image included in the web page; first identification information calculating means for calculating identification information of the image based on the image data of the image collected by the crawling means; and a database that stores the identification information calculated by the first identification information calculating means in association with the URL, collected by the crawling means, of the web page on which the image underlying the identification information was posted. The image search device comprises: search image acquiring means for acquiring an image to be searched for; second identification information calculating means for calculating identification information of the image acquired by the search image acquiring means; search means for searching whether the database contains identification information identical to that calculated by the second identification information calculating means and, if it does, acquiring from the database the URL of the web page associated with that identification information; and output means for outputting the URL acquired by the search means.

According to the image search system of claim 1, the URL of a web page and an image included in the web page are collected from the Internet, identification information of the image is calculated from the collected image, and the calculated identification information is stored in the database in association with the URL of the web page on which the underlying image was posted. An image to be searched for is then acquired, its identification information is calculated, and the database is searched for identical identification information; if the database contains it, the URL of the web page associated with that identification information is acquired from the database. This makes it possible to search for the web pages on which an image is posted using the image itself as the search query. The source of an image at hand can therefore be determined after the fact.

The image search system according to claim 2 is the image search system according to claim 1, wherein the crawling means acquires, together with the image included in the web page, the URL of the image; the database stores the URL of the image acquired by the crawling means in association with the identification information of the image; and the search means acquires from the database the URL of the web page and the URL of the image associated with the identification information.

According to the image search system of claim 2, the URL of the image is stored in the database in association with the identification information, and if the database contains identification information identical to that of the image being searched for, the URL of the web page and the URL of the image associated with that identification information are acquired from the database. This makes it possible to learn how the retrieved image is being used and reproduced on the Internet.

The image search system according to claim 3 is the image search system according to claim 1 or 2, wherein the crawling device comprises digital watermark detecting means for detecting whether a digital watermark is embedded in the image collected by the crawling means and, if so, detecting information on the digital watermark; the image search device comprises digital watermark information acquiring means for acquiring information on a digital watermark to be searched for; the database stores the information on the digital watermark detected by the digital watermark detecting means in association with the identification information of the image from which it was detected; and the search means searches whether the database contains information identical to the information on the digital watermark acquired by the digital watermark information acquiring means and, if it does, acquires from the database the URL of the web page and the identification information associated with that information on the digital watermark.

According to the image search system of claim 3, the information on the digital watermark detected from an image is stored in the database in association with the identification information of that image. Information on a digital watermark to be searched for is acquired, and if the database contains identical information, the URL of the web page and the identification information associated with that information on the digital watermark are acquired from the database. As a result, even when the image has been processed, for example by overwriting text on it or cropping it, the web pages on which the image is posted can be found. The source of an image at hand can therefore be determined after the fact.

The image search system according to claim 4 is the image search system according to any one of claims 1 to 3, wherein the identification information is a hash value calculated by applying a hash function to the image. This makes it possible to reliably find web pages on which the identical image is posted.

The image search system according to claim 5 is the image search system according to any one of claims 1 to 3, wherein the identification information is a feature amount indicating a characteristic unique to the image. This makes it possible to reliably find not only web pages carrying the identical image, but also web pages carrying images that differ as data, because of processing such as enlargement, reduction, or a change of storage format (for example, from JPEG to BMP), yet are indistinguishable to the user.

The image search system according to claim 6 is an image search system comprising a crawling device and an image search device. The crawling device comprises: crawling means for collecting, from the Internet, the URL of a web page and a still image included in the web page; digital watermark detecting means for detecting whether a digital watermark is embedded in the still image collected by the crawling means and, if so, detecting information on the digital watermark; and a database that stores the URL of the web page collected by the crawling means in association with the information on the digital watermark detected by the digital watermark detecting means. The image search device comprises: digital watermark information acquiring means for acquiring information on a digital watermark to be searched for; search means for searching whether the database contains information identical to the information on the digital watermark acquired by the digital watermark information acquiring means and, if it does, acquiring the URL of the web page associated with that information on the digital watermark; and output means for outputting the URL of the web page acquired by the search means.

According to the image search system of claim 6, the URL of a web page and a still image included in the web page are collected from the Internet; whether a digital watermark is embedded in the collected still image is detected and, if so, information on the digital watermark is detected; and the detected information on the digital watermark is stored in the database in association with the URL of the collected web page. Information on a digital watermark to be searched for is then acquired, the database is searched for identical information, and if the database contains it, the URL of the web page associated with that information on the digital watermark is acquired. As a result, even when the image has been processed, for example by overwriting text on it or cropping it, the web pages on which the image is posted can be found. The source of an image at hand can therefore be determined after the fact.

The image search system according to claim 7 is the image search system according to claim 6, wherein the crawling means acquires, together with the image included in the web page, the URL of the image; the database stores the URL of the image acquired by the crawling means in association with the information on the digital watermark of the image; and the search means acquires from the database the URL of the web page and the URL of the image associated with the information on the digital watermark of the image.

According to the image search system of claim 7, the URL of an image is stored in the database in association with the information on the digital watermark detected from that image. Information on a digital watermark is acquired as the search target, and if the database contains identical information, the URL of the web page and the URL of the image associated with that information on the digital watermark are acquired from the database. This makes it possible to learn how the retrieved image is being used and reproduced on the Internet.

The crawling device according to claim 8 constitutes the image search system according to any one of claims 1 to 7.

The image search device according to claim 9 constitutes the image search system according to any one of claims 1 to 7.

According to the present invention, it is possible to easily find out where a desired image is posted on the Internet.

<First Embodiment>
FIG. 1 is a schematic diagram of the overall structure of an image search system 1 according to the first embodiment. The image search system 1 mainly comprises a crawling unit 10, a search unit 20, and an image DB (database) 30.

The crawling unit 10 mainly comprises a crawling engine 11 and an image hash calculation device 12. The search unit 20 mainly comprises an image input device 21, an image hash calculation device 22, a search device 23, and a search result display device 24. The crawling engine 11 is connected to the Internet 40. The image DB 30 is connected to the crawling engine 11 and to the search device 23.

First, the crawling unit 10 will be described.

The crawling engine 11 connects to a web page on the Internet 40 and acquires the URL of that web page. The crawling engine 11 also follows link information described in HTML (HyperText Markup Language) or the like, and acquires the URL of each image posted on the web page together with its image data.
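The link-following step can be illustrated with a minimal sketch that pulls image URLs out of one page's HTML. It uses only Python's standard library; the class and function names (`ImageLinkParser`, `extract_image_urls`) are hypothetical, and the restriction to `<img src=...>` elements is an assumption, since the patent does not say which HTML constructs the crawling engine 11 inspects.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects absolute URLs of <img> elements found in one HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the URL of the page.
                self.image_urls.append(urljoin(self.page_url, src))


def extract_image_urls(page_url, html_text):
    """Return the image URLs referenced by the page, in document order."""
    parser = ImageLinkParser(page_url)
    parser.feed(html_text)
    return parser.image_urls


# Hypothetical page content; fetching the page itself is omitted here.
sample_html = '<html><body><img src="img.jpeg"><img src="http://hoge/img2.jpeg"></body></html>'
found = extract_image_urls("http://foo/index.html", sample_html)
print(found)
```

Each (page URL, image URL) pair produced this way is what the crawler would later associate with the identification information of the downloaded image.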

Each image collected by the crawling engine 11 is output to the image hash calculation device 12, which calculates the hash value of the image using a hash function such as MD5 or SHA-1. The hash value is then returned from the image hash calculation device 12 to the crawling engine 11.
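As a rough illustration of the hash calculation, the following sketch hashes the raw bytes of an image file with Python's standard `hashlib` module. The function name `image_hash` is hypothetical, and hashing the encoded file bytes (rather than decoded pixels) is an assumption; the patent only names MD5 and SHA-1 as example hash functions.

```python
import hashlib


def image_hash(image_bytes, algorithm="sha1"):
    """Return the hex digest of the image data using the named hash function."""
    return hashlib.new(algorithm, image_bytes).hexdigest()


# Stand-in byte strings; a real crawler would pass the downloaded file content.
original = b"example image data"
copy = bytes(original)               # byte-for-byte copy
altered = b"example image data!"     # any change to the data

print(image_hash(original) == image_hash(copy))     # identical data, identical hash
print(image_hash(original) == image_hash(altered))  # different data, different hash
print(image_hash(original, "md5"))
```

Because the digest depends on every byte, this kind of identifier matches only exact copies of the data, which is the limitation the feature-amount approach of the second embodiment addresses.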

The crawling engine 11 stores the hash value of the image in the image DB (database) 30 in association with the URL of the web page from which the image was acquired, the URL of the image, and the like. The crawling engine 11 automatically crawls the Internet 40 and performs this processing recursively, so that the image DB 30 grows continuously.

FIG. 2 shows the data structure of the image DB 30. The image DB 30 stores, in association with one another, the URL of the web page on which an image is posted, the URL of the image posted on that web page, and the hash value of the image.

The hash value is an index for detecting whether images are identical: the same image always yields the same hash value. In the case shown in FIG. 2, the same hash value "12345678" is stored in the first, fourth, and seventh rows, indicating that these three entries are the same image.

Next, the search unit 20 will be described.

The image input device 21 acquires the image to be searched for and outputs it to the image hash calculation device 22.

The image hash calculation device 22 receives the image output from the image input device 21 and calculates the hash value of the image using a hash function such as MD5 or SHA-1.

The search device 23 searches whether the image DB 30 contains a hash value identical to the hash value output from the image hash calculation device 22, and acquires from the image DB 30 the URL of the web page and the URL of the image associated with that hash value.

For example, suppose that the image hash calculation device 22 calculates the hash value "123456789" from the image acquired by the image input device 21 and inputs it to the search device 23. As shown in FIG. 2, the hash value "123456789" is stored in the first, fourth, and seventh rows of the image DB 30, so the search device 23 acquires from the image DB 30 the URL of the web page and the URL of the image stored in the first, fourth, and seventh rows.
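The table layout and the exact-match lookup described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`image_db`, `page_url`, `image_url`, `hash_value`) and the function `search_by_hash` are hypothetical, the use of SQLite is an assumption, and the row marked as hypothetical is a filler entry rather than content of FIG. 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (web page, image) pair, keyed for lookup by hash value.
conn.execute("CREATE TABLE image_db (page_url TEXT, image_url TEXT, hash_value TEXT)")
conn.execute("CREATE INDEX idx_hash ON image_db (hash_value)")

rows = [
    ("http://foo/index.html", "http://foo/img.jpeg", "123456789"),
    # Hypothetical unrelated entry, not taken from the patent.
    ("http://example.invalid/a.html", "http://example.invalid/a.png", "000000001"),
    ("http://bar/index.html", "http://foo/img.jpeg", "123456789"),
    ("http://hoge/img.jpeg", "http://hoge/img2.jpeg", "123456789"),
]
conn.executemany("INSERT INTO image_db VALUES (?, ?, ?)", rows)


def search_by_hash(conn, hash_value):
    """Return (page_url, image_url) pairs whose stored hash equals hash_value."""
    cur = conn.execute(
        "SELECT page_url, image_url FROM image_db WHERE hash_value = ? ORDER BY rowid",
        (hash_value,),
    )
    return cur.fetchall()


results = search_by_hash(conn, "123456789")
for page_url, image_url in results:
    print(page_url, image_url)
```

An index on the hash column keeps the lookup cheap as the database grows, which matters because the crawler keeps adding rows.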

The search device 23 outputs the search results to the search result display device 24 in list form. The search result display device 24 is a liquid crystal display capable of color display, and displays the search results output from the search device 23.

FIG. 3 shows an example of the search results displayed on the search result display device 24. The search result display device 24 displays, as a list, the web page URL "http://foo/index.html" and the image URL "http://foo/img.jpeg" stored in the first row of the image DB 30, the web page URL "http://bar/index.html" and the image URL "http://foo/img.jpeg" stored in the fourth row, and the web page URL "http://hoge/img.jpeg" and the image URL "http://hoge/img2.jpeg" stored in the seventh row.

This shows that the image acquired by the image input device 21 is posted on three web pages. It also shows that the images posted on the web pages "http://foo/index.html" and "http://bar/index.html" are links to the same image URL "http://foo/img.jpeg", whereas the image posted on the web page "http://hoge/img.jpeg" is a reposted copy of the image data itself.
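The distinction between pages that link to one image URL and pages that host a separate copy can be made mechanical by grouping the search hits by image URL. The sketch below is one possible way to do this; the function name `group_by_image_url` is hypothetical, and the hit list simply restates the example rows from the search results.

```python
from collections import defaultdict


def group_by_image_url(hits):
    """Map each image URL to the list of web pages that reference it."""
    groups = defaultdict(list)
    for page_url, image_url in hits:
        groups[image_url].append(page_url)
    return dict(groups)


hits = [
    ("http://foo/index.html", "http://foo/img.jpeg"),
    ("http://bar/index.html", "http://foo/img.jpeg"),
    ("http://hoge/img.jpeg", "http://hoge/img2.jpeg"),
]
groups = group_by_image_url(hits)

for image_url, pages in groups.items():
    # Several pages under one image URL share a link to the same file;
    # each additional image URL with the same hash is a separate copy of the data.
    print(image_url, "<-", pages)
```

With these hits, two groups result: one image URL referenced by two pages, and a second image URL that holds a copy of the same data.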

According to the present embodiment, the web pages on which an image is posted can be searched for using the image itself, rather than a keyword, as the search query. The source of an image at hand can therefore be determined after the fact. It is also possible to learn how the image is being used and reproduced on the Internet.

Further, according to the present embodiment, since the image itself is used as the search query, web pages carrying different images are prevented from being returned. In addition, since the user does not need to enter a keyword or the like, usability is improved.

In the present embodiment, the URL of the web page and the URL of the image are displayed on the search result display device 24 as the search results. However, displaying the URL of the image is not essential, and only the URL of the web page may be displayed on the search result display device 24. In that case, it suffices to store the URL of the web page on which the image is posted in the image DB 30 in association with the hash value of the image.

The present embodiment can be applied not only to still images but also to moving images, and to various kinds of content other than images, such as music.

<Second Embodiment>
In the first embodiment, the web pages on which an image is posted are searched for based on the hash value. However, this is not the only way to search for such web pages using the image itself as the search query.

In the second embodiment, the web pages on which an image is posted are searched for based on a feature amount indicating characteristics unique to the image, such as its pattern, color, and brightness. The image search system 2 according to the second embodiment is described below. Parts identical to those of the first embodiment are denoted by the same reference numerals, and their detailed description is omitted.

As shown in FIG. 4, the image search system 2 mainly comprises a crawling unit 10-1, a search unit 20-1, and an image DB 31. The crawling unit 10-1 mainly comprises a crawling engine 13 and an image feature amount calculation device 14. The search unit 20-1 mainly comprises an image input device 21, an image feature amount calculation device 25, a similarity search device 26, and a search result display device 24. The crawling engine 13 is connected to the Internet 40. The image DB 31 is connected to the crawling engine 13 and to the similarity search device 26.

First, the crawling unit 10-1 will be described.

The crawling engine 13 connects to an electronic document (web page) on the Internet 40 and acquires the URL of that web page. The crawling engine 13 also follows link information described in HTML or the like, and acquires the URL of each image posted on the web page together with its image data. The images collected by the crawling engine 13 are output to the image feature amount calculation device 14.

The image feature amount calculation device 14 calculates the feature amount of the image input from the crawling engine 13. The feature amount is a value representing characteristics unique to the image, such as its pattern, color, and brightness, and is given, for example, as a multidimensional vector. For example, as shown in FIG. 5, the image feature amount calculation device 14 calculates the average color value (for example, of the Y component) of five regions: the upper-left region (region 1), the lower-left region (region 2), the upper-right region (region 3), and the lower-right region (region 4) obtained by dividing the image into four, plus the region at the center of the image (region 5). It then calculates, as the feature amount, a five-dimensional vector consisting of the average values of regions 1 to 5.

[Equation 1]
Feature amount = (average value of region 1, average value of region 2, average value of region 3, average value of region 4, average value of region 5)

As a result, images whose pattern, color, brightness, and so on are identical, or differ only to a degree that the user cannot distinguish (similar images), yield identical or substantially identical feature amounts (with a slight error on the order of ±1 digit). That is, even if an image has been enlarged or reduced, or its storage format has been changed (for example, from JPEG to BMP), identical or substantially identical feature amounts are calculated as long as the user cannot tell the images apart.
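The five-region average of Equation 1 can be sketched in plain Python on a grid of Y (luma) values. The function names are hypothetical, and the extent of the central region 5 (the middle half of the image in each direction) is an assumption, since the text only says "the region at the center of the image". The grid must be at least 2 x 2.

```python
def region_mean(luma, top, bottom, left, right):
    """Average of luma[y][x] over rows top..bottom-1 and columns left..right-1."""
    values = [luma[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(values) / len(values)


def five_region_feature(luma):
    """Return the 5-dimensional feature vector of Equation 1 for a luma grid."""
    h, w = len(luma), len(luma[0])
    my, mx = h // 2, w // 2
    return (
        region_mean(luma, 0, my, 0, mx),   # region 1: upper left
        region_mean(luma, my, h, 0, mx),   # region 2: lower left
        region_mean(luma, 0, my, mx, w),   # region 3: upper right
        region_mean(luma, my, h, mx, w),   # region 4: lower right
        # region 5: center (assumed to be the middle half in each direction)
        region_mean(luma, h // 4, h - h // 4, w // 4, w - w // 4),
    )


small = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
# Enlarge the image 2x by repeating every pixel horizontally and vertically.
enlarged = [[v for v in row for _ in range(2)] for row in small for _ in range(2)]

print(five_region_feature(small))
print(five_region_feature(enlarged))
```

In this toy case the enlarged image produces the same vector as the original, which illustrates why region averages tolerate resizing where a byte-level hash does not. Real resampling and lossy re-encoding would generally shift the averages slightly rather than leave them exactly equal.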

The feature amount calculated by the image feature amount calculation device 14 is output to the crawling engine 13. The crawling engine 13 stores the feature amount of the image in the image DB 31 in association with the URL of the web page from which the image was acquired, the URL of the image, and the like. The crawling engine 13 automatically crawls the Internet 40 and repeats this process recursively, thereby continuously growing the image DB 31.

The data structure of the image DB 31 is shown in FIG. 6. The image DB 31 stores the URL of the web page on which an image is posted, the URL of the image posted on that web page, and the feature amount of the image in association with one another.
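One possible way to hold the three associated items is a single relational table. The sketch below uses Python's standard `sqlite3` module; the table name `images`, the column names, and the example URLs are assumptions made for illustration and are not taken from FIG. 6.

```python
import sqlite3


def create_image_db(conn):
    """Create a table associating page URL, image URL, and a 5-D feature amount."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " page_url TEXT NOT NULL,"
        " image_url TEXT NOT NULL,"
        " f1 REAL, f2 REAL, f3 REAL, f4 REAL, f5 REAL)"
    )


def store_image(conn, page_url, image_url, feature):
    """Store one crawled image; `feature` is a sequence of five numbers."""
    conn.execute(
        "INSERT INTO images VALUES (?, ?, ?, ?, ?, ?, ?)",
        (page_url, image_url, *feature),
    )


conn = sqlite3.connect(":memory:")
create_image_db(conn)
# Hypothetical URLs, used only to show the association.
store_image(
    conn,
    "http://example.com/page1.html",
    "http://example.com/img/a.jpg",
    (10.0, 20.0, 30.0, 40.0, 25.0),
)
rows = conn.execute("SELECT page_url, image_url, f1, f5 FROM images").fetchall()
```

Storing each vector component in its own column is a design choice of this sketch; it would allow a tolerance search to be expressed later as range conditions on `f1` to `f5`.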

Next, the search unit 20-1 will be described.

The image feature amount calculation device 25 calculates a feature amount from the image output from the image input device 21 by the same method as the image feature amount calculation device 14.

The similarity search device 26 searches the image DB 31 for a feature amount that is the same or substantially the same as the feature amount output from the image feature amount calculation device 25, and acquires from the image DB 31 the URL of the web page and the URL of the image associated with that feature amount.

For example, suppose that the image feature amount calculation device 25 calculates the feature amount "12345" from the image acquired by the image input device 21 and inputs it to the similarity search device 26. As shown in FIG. 6, the feature amount "12345" is stored in the first, fourth, and seventh rows of the image DB 31, so the similarity search device 26 acquires from the image DB 31 the URLs of the web pages and the URLs of the images stored in the first, fourth, and seventh rows.

At this time, a slight error of about ±1 digit is tolerated, and the URLs of the web pages and the URLs of the images associated with the same or substantially the same feature amounts are acquired. This makes it possible to reliably acquire the URL of the web page and the URL of the image associated with a similar image that has been processed, for example enlarged or reduced, but that the user cannot distinguish from the original image.
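The tolerant lookup described above can be sketched as a comparison of each vector component within a fixed tolerance. The text does not define precisely what "about ±1 digit" means, so the per-component tolerance of 1.0 used here is an assumption, as are the names `ImageRecord`, `is_similar`, and `similarity_search` and the example URLs.

```python
from typing import NamedTuple, Tuple


class ImageRecord(NamedTuple):
    page_url: str
    image_url: str
    feature: Tuple[float, ...]


def is_similar(a, b, tolerance=1.0):
    """True if every component of `a` is within `tolerance` of `b` (assumed rule)."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


def similarity_search(db, query, tolerance=1.0):
    """Return (page URL, image URL) pairs whose feature matches `query`."""
    return [
        (rec.page_url, rec.image_url)
        for rec in db
        if is_similar(rec.feature, query, tolerance)
    ]


# Hypothetical database contents.
db = [
    ImageRecord("http://example.com/p1.html", "http://example.com/a.jpg",
                (10.0, 20.0, 30.0, 40.0, 25.0)),
    ImageRecord("http://example.com/p2.html", "http://example.com/a_small.bmp",
                (10.5, 19.5, 30.0, 41.0, 25.0)),
    ImageRecord("http://example.com/p3.html", "http://example.com/other.jpg",
                (90.0, 80.0, 70.0, 60.0, 50.0)),
]
hits = similarity_search(db, (10.0, 20.0, 30.0, 40.0, 25.0))
```

Here `hits` contains the first two records: the exact match and the slightly different copy. A linear scan is used only for clarity; a large database would more plausibly rely on an index over the vector components.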

The similarity search device 26 outputs the search results to the search result display device 24 in list form. The search result display device 24 is a liquid crystal display capable of color display, and it displays the search results output from the similarity search device 26.

In addition, according to the present embodiment, the search is based on a feature amount indicating the features of the image itself. Therefore, even if processing such as enlargement or reduction of the image or a change of storage format (for example, from jpeg format to bmp format) has been performed, web pages carrying images that the user cannot distinguish from the original but that differ as data can also be reliably found.

In the present embodiment, a multidimensional vector is used as the feature amount. However, the feature amount is not limited to this, and values in various formats consisting of about 5 to 100 characters can be used.

In the present embodiment, the URL of the web page and the URL of the image are output as the search results, but only the URL of the web page may be output.

Note that the present embodiment can be applied not only to still images but also to key frames of moving images.

<Third Embodiment>
In the first and second embodiments, web pages on which an image is posted are searched for based on identification information calculated from the image data, namely a hash value or a feature amount. These methods are simple and reliable, but they cannot handle cases where the image has been processed, for example by overwriting characters on it.

The third embodiment searches for web pages on which an image is posted based on a digital watermark embedded in the image. The image search system 3 according to the third embodiment is described below. Parts identical to those of the first embodiment are denoted by the same reference numerals, and their detailed description is omitted.

As shown in FIG. 7, the image search system 3 mainly includes a crawling unit 10-2, a search unit 20-2, and an image DB 32. The crawling unit 10-2 mainly includes a crawling engine 15 and a watermark detection device 16. The search unit 20-2 mainly includes a watermark information input device 27, a search device 28, and a search result display device 24. The crawling engine 15 is connected to the Internet 40 and to the image DB 32, and the image DB 32 is connected to the search device 28.

First, the crawling unit 10-2 will be described.

The crawling engine 15 connects to an electronic document (web page) on the Internet 40 and acquires the URL of that web page. The crawling engine 15 also follows link information described in HTML or the like to acquire the URL of each image posted on the web page together with its image data. The images collected by the crawling engine 15 are output to the watermark detection device 16.

The watermark detection device 16 detects whether a digital watermark is embedded in the image input from the crawling engine 15 and, if so, what digital watermark is embedded. Various methods for detecting a digital watermark are publicly known, so their description is omitted. When the watermark detection device 16 detects a digital watermark, the character string of the detected digital watermark is output to the crawling engine 15.

When the watermark detection device 16 detects a digital watermark, the crawling engine 15 stores the character string of the detected digital watermark in the image DB 32 in association with the URL of the web page from which the image was acquired, the URL of the image, and the like. When the watermark detection device 16 does not detect a digital watermark, the crawling engine 15 stores the URL of the web page from which the image was acquired and the URL of the image in the image DB 32 in association with each other. The crawling engine 15 automatically crawls the Internet 40 and repeats this process recursively, thereby continuously growing the image DB 32.

The data structure of the image DB 32 is shown in FIG. 8. The image DB 32 stores the URL of the web page on which an image is posted, the URL of the image posted on that web page, and the character string of the digital watermark embedded in the image in association with one another. In the case shown in FIG. 8, the character string "Copyright fujifilm" is stored in the first, second, fourth, sixth, and seventh rows, indicating that a digital watermark consisting of the character string "Copyright fujifilm" was detected in these five images. The third and fifth rows store no digital watermark character string, indicating that no digital watermark was detected in these two images.

Next, the search unit 20-2 will be described.

The watermark information input device 27 is used to input information about the digital watermark embedded in the image to be detected, for example the character string of the digital watermark.

The search device 28 searches the image DB 32 for a character string identical to the character string output from the watermark information input device 27, and acquires from the image DB 32 the URL of the web page and the URL of the image associated with that character string.

For example, suppose that the character string "Copyright fujifilm" is input from the watermark information input device 27 to the search device 28. As shown in FIG. 8, the character string "Copyright fujifilm" is stored in the first, second, fourth, sixth, and seventh rows of the image DB 32, so the search device 28 acquires the URLs of the web pages and the URLs of the images stored in the first, second, fourth, sixth, and seventh rows.
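The watermark lookup in this example can be sketched as an exact string match over stored records. The record layout, the function name `search_by_watermark`, and the URLs are hypothetical; only the row pattern (rows 1, 2, 4, 6, and 7 carrying "Copyright fujifilm", rows 3 and 5 carrying no watermark) follows the description of FIG. 8.

```python
from typing import NamedTuple, Optional


class WatermarkRecord(NamedTuple):
    page_url: str
    image_url: str
    watermark: Optional[str]  # None when no digital watermark was detected


def search_by_watermark(db, text):
    """Return (row number, page URL, image URL) for rows whose watermark equals `text`."""
    return [
        (row_no, rec.page_url, rec.image_url)
        for row_no, rec in enumerate(db, start=1)
        if rec.watermark == text
    ]


MARK = "Copyright fujifilm"
# Seven rows mirroring the pattern described for FIG. 8 (URLs are made up).
watermarks = [MARK, MARK, None, MARK, None, MARK, MARK]
db = [
    WatermarkRecord(
        f"http://example.com/page{i}.html",
        f"http://example.com/img{i}.jpg",
        mark,
    )
    for i, mark in enumerate(watermarks, start=1)
]
hits = search_by_watermark(db, MARK)
matched_rows = [row_no for row_no, _, _ in hits]
```

With this data `matched_rows` lists rows 1, 2, 4, 6, and 7. Rows without a watermark are kept in the database but never match a non-empty query string.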

The search device 28 outputs the search results to the search result display device 24 in list form. The search result display device 24 is a liquid crystal display capable of color display, and it displays the search results output from the search device 28.

According to the present embodiment, using a digital watermark makes it possible to search for the web pages on which an image is posted even when the image has been processed, for example by overwriting characters on it or trimming it. Therefore, the place where an image at hand is posted can be found after the fact. The state of use and reproduction of the image on the Internet can also be ascertained.

In the present embodiment, images in which no digital watermark was detected are also stored in the image DB. However, only images in which a digital watermark was detected may be stored in the image DB.

Note that more accurate detection can be achieved by using the present embodiment in combination with the first embodiment or the second embodiment.

Note that the present invention may be provided as a single apparatus that includes the crawling unit, the search unit, and the image DB, or as a system consisting of two or more apparatuses. The apparatus may be a PC or a mobile terminal. The present invention is also not limited to an apparatus and can be provided as a program applied to an apparatus.

FIG. 1 is a schematic diagram of an image search system 1 to which the present invention is applied.
FIG. 2 is an example of the data stored in the image DB 30 of the image search system 1.
FIG. 3 is a display example of the search results of the image search system 1.
FIG. 4 is a schematic diagram of an image search system 2 to which the present invention is applied.
FIG. 5 is a diagram explaining the calculation of the feature amount.
FIG. 6 is an example of the data stored in the image DB 31 of the image search system 2.
FIG. 7 is a schematic diagram of an image search system 3 to which the present invention is applied.
FIG. 8 is an example of the data stored in the image DB 32 of the image search system 3.

Explanation of symbols

1, 2, 3: image search system; 10, 10-1, 10-2: crawling unit; 11, 13, 15: crawling engine; 12: image hash calculation device; 14: image feature amount calculation device; 15: input unit; 16: watermark detection device; 20, 20-1, 20-2: search unit; 21: image input device; 22: image hash calculation device; 23, 28: search device; 24: search result display device; 25: image feature amount calculation device; 26: similarity search device; 27: watermark information input device; 30, 31, 32: image DB; 40: Internet

Claims

An image search system composed of a crawling device and an image search device,
The crawling device is
A crawling means for collecting a URL of a web page and an image included in the web page from the Internet;
First identification information calculation means for calculating identification information of the image based on image data of the image collected by the crawling means;
A database that stores the identification information calculated by the first identification information calculation means in association with the URL, collected by the crawling means, of the web page on which the image serving as the basis of the identification information was posted,
The image search device includes:
Search image acquisition means for acquiring an image to be searched;
Second identification information calculation means for calculating identification information of the image from the image acquired by the search image acquisition means;
Search means for searching whether the same identification information as the identification information calculated by the second identification information calculation means is included in the database and, if it is included, acquiring from the database the URL of the web page associated with the identification information;
Output means for outputting the URL acquired by the search means;
An image search system comprising:

The crawling means acquires the URL of the image together with the image included in the web page,
The database stores the URL of the image acquired by the crawling means in association with the identification information of the image,
The image search system according to claim 1, wherein the search means acquires the URL of a web page and the URL of an image associated with the identification information from the database.

The crawling device includes digital watermark detection means that detects whether or not a digital watermark is inserted in the image collected by the crawling means, and detects information related to the digital watermark when the digital watermark is inserted,
The image search device includes digital watermark information acquisition means for acquiring information related to a digital watermark to be searched,
The database stores information related to the digital watermark detected by the digital watermark detection means in association with identification information of an image in which the information related to the digital watermark is detected,
The image search system according to claim 1, wherein the search means searches whether the same information as the information related to the digital watermark acquired by the digital watermark information acquisition means is included in the database and, when that information is included in the database, acquires from the database the URL of the web page and the identification information associated with the information related to the digital watermark.

The image search system according to claim 1, wherein the identification information is a hash value calculated by applying a hash function to the image.

The image search system according to any one of claims 1 to 3, wherein the identification information is a feature amount indicating a feature unique to the image.

An image search system composed of a crawling device and an image search device,
The crawling device is
Crawling means for collecting a URL of a web page from the Internet and a still image included in the web page;
Digital watermark detection means for detecting whether or not a digital watermark is inserted in the still image collected by the crawling means and, if a digital watermark is inserted, detecting information relating to the digital watermark;
A database that stores URLs of web pages collected by the crawling means in association with information about the digital watermark detected by the digital watermark detection means, and
The image search device includes:
Digital watermark information acquisition means for acquiring information related to a digital watermark to be searched;
Search means for searching whether or not the same information as the digital watermark information acquired by the digital watermark information acquisition means is included in the database and, if it is included, acquiring the URL of the web page associated with the information related to the digital watermark;
Output means for outputting the URL of the web page acquired by the search means;
An image search system comprising:

The crawling means acquires the URL of the image together with the image included in the web page,
The database stores the URL of the image acquired by the crawling means in association with information on the digital watermark of the image,
The image search system according to claim 6, wherein the search means acquires the URL of the web page and the URL of the image associated with the information related to the digital watermark of the image from the database.

A crawling device constituting the image search system according to claim 1.

An image search apparatus constituting the image search system according to claim 1.