JP2023130409A

JP2023130409A - Information processing device, information processing method, and program

Info

Publication number: JP2023130409A
Application number: JP2023106533A
Authority: JP
Inventors: 侑吾西川; Yugo Nishikawa; 拓也生駒; Takuya Ikoma; 昌希内田; Masaki Uchida; 直之伊藤; Naoyuki Ito
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2019-02-28
Filing date: 2023-06-28
Publication date: 2023-09-20
Also published as: JP2020140488A

Abstract

To provide an information processing device, a method, and a program that can appropriately extract an image containing an object. SOLUTION: In an image extraction system, an information processing device 1 comprises: a search unit 10a that searches for images related to an object by using text via a network; an extraction unit 10b that extracts a feature value of each image found; a classification unit 10c that clusters the feature values of the images; an image extraction unit 10d that extracts images according to the number of images constituting each cluster of feature values; and a detection unit 10f that detects an object contained in the image. SELECTED DRAWING: Figure 15

Description

The present invention relates to an information processing device, an information processing method, and a program.

There are techniques that use image recognition to extract desired images from a large number of images. For example, Patent Document 1 discloses a photo ordering system in which an information processing terminal such as a tablet accesses a server via the Internet to order desired photos. In this system, the user registers a specific person in advance, and photos in which that person appears are extracted from the photos obtained from the server and displayed on the user terminal.

Patent Document 1: Japanese Patent Application Publication No. 2013-161394

However, the invention according to Patent Document 1 may also extract photos of people whose features resemble those of the specific person.

One aspect aims to provide an information processing device and the like that can appropriately extract images containing a target object.

An information processing device according to one aspect comprises: a search unit that searches, via a network and using text, for images related to a target object; an extraction unit that extracts a feature value of each retrieved image; a classification unit that clusters the feature values of the images; an image extraction unit that extracts images according to the number of images constituting each cluster of feature values; and a detection unit that detects objects contained in the images, wherein the image extraction unit extracts the images according to the type of each detected object.

According to one aspect, images containing a target object can be extracted appropriately.

FIG. 1 is an explanatory diagram showing an overview of an image extraction system.
FIG. 2 is a block diagram showing a configuration example of a server.
FIG. 3 is an explanatory diagram showing an example of the record layout of a search history DB.
FIG. 4 is an explanatory diagram showing an example of the record layout of an extracted image DB.
FIG. 5 is an explanatory diagram illustrating the operation of extracting image feature values.
FIG. 6 is an explanatory diagram showing clustering processing based on image feature values.
FIG. 7 is a flowchart showing a processing procedure for extracting desired images by cluster classification based on image feature values.
FIG. 8 is a block diagram showing a configuration example of a server according to Embodiment 2.
FIG. 9 is an explanatory diagram showing an example of the record layout of a frequency aggregation DB.
FIG. 10 is an explanatory diagram illustrating the operation of extracting desired images according to the appearance frequency of each type of object.
FIG. 11 is a flowchart showing a processing procedure for extracting desired images according to the appearance frequency of each type of object.
FIG. 12 is a block diagram showing a configuration example of a server according to Embodiment 3.
FIG. 13 is an explanatory diagram showing an example of the record layout of a thesaurus.
FIG. 14 is a flowchart showing a processing procedure for extracting desired images using synonyms.
FIG. 15 is a functional block diagram showing the operation of the server of the embodiments described above.

The present invention is described in detail below with reference to the drawings showing its embodiments.

(Embodiment 1)
Embodiment 1 relates to a mode in which desired images are extracted from a large number of images by cluster classification based on image feature values. FIG. 1 is an explanatory diagram showing an overview of the image extraction system. The system of this embodiment includes an information processing device 1 and an information sharing server 2, and the devices exchange information via a network N such as the Internet.

The information processing device 1 processes, stores, transmits, and receives various kinds of information. It is, for example, a server device or a personal computer. In this embodiment the information processing device 1 is assumed to be a server device and is referred to below as the server 1 for brevity.

The information sharing server 2 is a server device that manages an SNS (Social Networking Service) or a server device that functions as a web search engine. In this embodiment the information sharing server 2 is assumed to be a server device that manages an SNS and is referred to below as the SNS server 2 for brevity. The SNS server 2 is an information processing device that processes, stores, transmits, and receives various kinds of information, including registering and managing users and managing the text and images that users post.

The server 1 according to this embodiment collects images that contain a certain target object as a subject from the images included in posts that an unspecified number of people have uploaded to the SNS server 2 via the network N. For example, with a marketing application in mind, the server 1 searches with a product name (the name of the target object) as the search query and collects images in which that product was photographed.

On the other hand, even an image search that uses the product name as the search query may return images of subjects unrelated to the product, for example when a word with a different meaning is written the same way as the product name, or when a post merely contains the search text. In such cases, one conceivable approach is to apply image recognition based on deep learning, pattern matching, or the like, recognize the target product in each image, and extract the desired images.

However, recognizing and extracting only a specific target object (an individual instance) from a group of images collected from an SNS is difficult. For example, when extracting images of one specific dog from an unspecified large group of images, it is easy to extract images that contain dog-like subjects but not easy to pick out the images of that particular dog from among them. Doing so would require preparing in advance either a model that has learned the features of only that dog through deep learning or the like, or a pattern matching model (rule set) for recognizing that dog, which is difficult.

In this embodiment, therefore, the group of images retrieved from the SNS is clustered, and the images presumed to contain the target product are extracted based on the clustering result. Specifically, the server 1 extracts feature values from each image using a model built by machine learning, and divides the extracted feature values into a plurality of clusters. The feature extraction process is described in detail later.

The server 1 then tallies the number of images in each cluster, counting multiple feature values that come from the same source image and fall in the same cluster as one image. Based on the tallied image count of each cluster, the server 1 extracts the group of images classified into one of the clusters as the group of images containing the target object. For example, the server 1 extracts the group of images classified into the cluster with the largest number of images.

As described above, the server 1 performs the image search with the name of the target object (the product name) as the search query. Among the retrieved images, those that contain the target object as a subject are therefore presumed to outnumber those that contain other objects. For this reason, this embodiment clusters the images as described above and extracts the group of images belonging to the cluster with the largest number of images. This yields a group of images presumed to contain the target object without recognizing the target object itself.
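The counting rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names and the `(image_id, cluster_id)` input format are hypothetical, and the cluster assignments are taken as already computed.

```python
from collections import defaultdict


def images_per_cluster(assignments):
    """Group image IDs by cluster.

    `assignments` holds one (image_id, cluster_id) pair per extracted feature
    (hypothetical format). Storing IDs in a set means that several features
    from the same image in the same cluster count as a single image.
    """
    grouped = defaultdict(set)
    for image_id, cluster_id in assignments:
        grouped[cluster_id].add(image_id)
    return grouped


def select_largest_cluster(assignments):
    """Return (cluster_id, sorted image IDs) for the cluster with the most images."""
    grouped = images_per_cluster(assignments)
    best = max(grouped, key=lambda c: len(grouped[c]))
    return best, sorted(grouped[best])
```

In this sketch a cluster holding three features from a single image still counts as one image, so it loses to a cluster holding features from two different images.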

FIG. 2 is a block diagram showing a configuration example of the server 1. The server 1 includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, a display unit 15, a reading unit 16, and a large-capacity storage unit 17. These components are connected by a bus B.

The control unit 11 includes an arithmetic processing device such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or a GPU (Graphics Processing Unit), and performs the various information processing and control processing of the server 1 by reading and executing the control program 1P stored in the storage unit 12. Although FIG. 2 shows the control unit 11 as a single processor, it may be a multiprocessor.

The storage unit 12 includes memory elements such as a RAM (Random Access Memory) and a ROM (Read Only Memory), and stores the control program 1P and the data that the control unit 11 needs to execute processing. The storage unit 12 also temporarily stores data that the control unit 11 needs for arithmetic processing. The communication unit 13 is a communication module that handles communication processing, and exchanges information with the SNS server 2 via the network N.

The input unit 14 is an input device such as a mouse, keyboard, touch panel, or buttons, and outputs the operation information it receives to the control unit 11. The display unit 15 is a liquid crystal display, an organic EL (electroluminescence) display, or the like, and displays various information according to instructions from the control unit 11.

The reading unit 16 reads a portable storage medium 1a such as a CD (Compact Disc)-ROM or a DVD (Digital Versatile Disc)-ROM. The control unit 11 may read the control program 1P from the portable storage medium 1a via the reading unit 16 and store it in the large-capacity storage unit 17. The control unit 11 may also download the control program 1P from another computer via the network N or the like and store it in the large-capacity storage unit 17. Furthermore, the control unit 11 may read the control program 1P from a semiconductor memory 1b.

The large-capacity storage unit 17 is a large-capacity storage device including, for example, a hard disk. It contains a search history DB 171, an extracted image DB 172, and an object detection model 173. The search history DB 171 stores information on the images related to the target object that were retrieved from the SNS server 2. The extracted image DB 172 stores information on those retrieved images that were extracted as containing the target object. The object detection model 173 is a detector that detects objects in images, and is a trained model generated by machine learning.

In this embodiment, the storage unit 12 and the large-capacity storage unit 17 may be configured as a single storage device. The large-capacity storage unit 17 may also consist of a plurality of storage devices, or may be an external storage device connected to the server 1.

Although this embodiment describes the server 1 as a single information processing device, the processing may be distributed across a plurality of devices, or the server 1 may be implemented as a virtual machine.

FIG. 3 is an explanatory diagram showing an example of the record layout of the search history DB 171. The search history DB 171 includes a search ID column, a search query column, a search date and time column, and a search image column. The search ID column stores a unique ID that identifies the history data of each search. The search query column stores the search query used to search for the target object. A search query is, for example, a word used in a keyword search or a hashtag search. The search date and time column stores the date and time at which images related to the target object were searched for. The search image column stores the retrieved images related to the target object.

FIG. 4 is an explanatory diagram showing an example of the record layout of the extracted image DB 172. The extracted image DB 172 includes a search ID column and an extracted image column. The search ID column stores the ID of the history data of the search for images related to the target object. The extracted image column stores the images to be collected, that is, the retrieved images with those outside the collection target removed.
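As a rough illustration of the two record layouts, the following Python sketch models each record as a dataclass. The field names, the use of file names to stand in for stored images, and the helper `make_extracted_record` are assumptions made for this example; the text only specifies which columns exist.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SearchHistoryRecord:
    """One row of the search history DB 171 (field names are hypothetical)."""
    search_id: str
    query: str
    searched_at: datetime
    images: List[str] = field(default_factory=list)  # image references, assumed to be file names


@dataclass
class ExtractedImageRecord:
    """One row of the extracted image DB 172, linked to a search by its ID."""
    search_id: str
    images: List[str] = field(default_factory=list)


def make_extracted_record(history, keep):
    """Keep only the retrieved images judged to contain the target object."""
    kept = [image for image in history.images if image in keep]
    return ExtractedImageRecord(history.search_id, kept)
```

Sharing `search_id` between the two records is what lets an extracted image set be traced back to the query and time of the search that produced it.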

FIG. 5 is an explanatory diagram illustrating the operation of extracting image feature values. The control unit 11 of the server 1 receives, through the input unit 14, a search query representing the target object (for example, the proper name of the target object). Based on the received search query, the control unit 11 uses the communication unit 13 to search the images included in posts that an unspecified number of people have uploaded to the SNS server 2 via the network N, and retrieves a group of images related to the target object.

The control unit 11 stores the retrieved group of images in the search history DB 171 of the large-capacity storage unit 17. Specifically, the control unit 11 assigns a search ID and stores the search query, the search date and time, and the retrieved group of images related to the target object as one record in the search history DB 171.

For each retrieved image, the control unit 11 extracts a feature value for the image region corresponding to each subject (object) contained in the image. For example, the control unit 11 extracts the image feature values using part of the object detection model 173, a model built by deep learning that detects objects in images.

The control unit 11 of the server 1 builds (generates) the object detection model 173 by performing deep learning that learns the feature values of training images from predetermined training data. For example, the object detection model 173 is a CNN (Convolutional Neural Network) that has an input layer that accepts an image, an output layer that outputs the result of detecting objects in the image, and intermediate layers that extract the feature value of the image region corresponding to each object.

The input layer has a plurality of neurons that accept the pixel values of each image in the retrieved group of images, and passes the input pixel values to the intermediate layers. The intermediate layers have a plurality of neurons that extract image feature values, and pass the extracted feature values of each image region to the output layer. When the object detection model 173 is a CNN, for example, the intermediate layers consist of convolution layers, which convolve the pixel values received from the input layer, alternating with pooling layers, which map the convolved pixel values. The intermediate layers compress the pixel information of each region and finally extract the image feature values. The output layer outputs the result of detecting objects in the image based on the image feature values output from the intermediate layers.

For example, the control unit 11 builds, as the object detection model 173, a neural network such as R-CNN (Regions with CNN), which is a type of CNN, semantic segmentation, YOLO (You Only Look Once), or SSD (Single Shot MultiBox Detector). R-CNN, semantic segmentation, and the like are all neural networks that locate the image region of each object contained in an image and identify what the object in each located region is. As shown in FIG. 5, the object detection model 173 is configured so that, when the input layer receives an image, the intermediate layers locate the image region corresponding to each object in the image and extract a feature value for each region, and the output layer outputs an identification result stating what object each image region contains.

However, as already noted, it is difficult to build an object detection model 173 that can detect a specific target object (an individual instance), and the object detection model 173 can only detect the general name (type) of an object in an image. In this embodiment, therefore, the object detection model 173 is not used to detect the target object directly. Instead, the output layer is removed from the object detection model 173 and only the input layer and intermediate layers are used, so that part of the model serves as an extraction model for the feature value of the image region corresponding to each object (see the lower part of FIG. 5). Using the input layer and intermediate layers of the object detection model 173, the control unit 11 extracts, from each image retrieved from the SNS, the feature value of the image region corresponding to each object contained in the image.
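The idea of reusing a detector without its output layer can be illustrated with a toy model in plain Python. This is only a sketch under simplifying assumptions: `ToyDetector`, `dense_relu`, and `classifier` are hypothetical names, the input is a short vector rather than an image, and a single fully connected layer stands in for the convolution and pooling layers of a real CNN.

```python
def dense_relu(weights):
    """Return a hidden layer that computes ReLU(W x) for a fixed weight matrix."""
    def forward(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return forward


def classifier(labels):
    """Return an output layer that maps the strongest activation to a type label."""
    def forward(x):
        return labels[max(range(len(x)), key=lambda i: x[i])]
    return forward


class ToyDetector:
    """Stand-in for a detection model: hidden layers followed by an output layer."""

    def __init__(self, hidden_layers, output_layer):
        self.hidden_layers = hidden_layers
        self.output_layer = output_layer

    def features(self, x):
        # Run only the input and intermediate layers; the output layer is skipped.
        for layer in self.hidden_layers:
            x = layer(x)
        return x

    def detect(self, x):
        # Full model: features followed by the output layer (general type only).
        return self.output_layer(self.features(x))
```

Calling `features` returns the intermediate activations that would be passed to clustering, while `detect` returns only a general type label, which mirrors the distinction the text draws between the truncated and the full model.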

In this embodiment the control unit 11 extracts image feature values using the object detection model 173 built by deep learning, but it may instead use a local feature extraction method such as A-KAZE (Accelerated KAZE), SIFT (Scale Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB (Oriented FAST and Rotated BRIEF), or HOG (Histograms of Oriented Gradients). In other words, the control unit 11 only needs to be able to extract a feature value from each retrieved image, and the extraction method is not particularly limited.

FIG. 6 is an explanatory diagram showing clustering processing based on image feature values. The control unit 11 of the server 1 performs cluster classification (clustering) according to the extracted feature value of each image region. The clustering may use, for example, the K-means method or the X-means method. K-means is a non-hierarchical clustering algorithm that classifies data into a predetermined number of clusters k. X-means is a variant of K-means that automatically estimates the optimal number of clusters k. The control unit 11 performs this classification for each image region corresponding to an object, assigning each image region to one of the clusters.
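For readers unfamiliar with K-means, a minimal pure-Python version of the standard iteration (Lloyd's algorithm) is sketched below. It is illustrative only: the function names are hypothetical, the first `k` points are used as initial centroids to keep the result deterministic (practical implementations usually use random or k-means++ initialization), and it assumes at least `k` points are supplied.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Cluster `points` into `k` groups; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simplistic initialization (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Here `k` is fixed by the caller, which corresponds to the K-means case in the text; estimating `k` automatically, as X-means does, is outside the scope of this sketch.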

The control unit 11 tallies the number of image regions classified into each cluster. Based on the tallied number of image regions in each cluster, the control unit 11 then selects one of the clusters. For example, the control unit 11 may select the cluster with the largest number of images. The control unit 11 extracts the group of images that have image regions classified into the selected cluster.

FIG. 6 illustrates the clustering of image A, image B, image C, and image D. For example, the control unit 11 of the server 1 performs cluster classification by the clustering processing described above and classifies the image regions into "cluster 1", "cluster 2", and "cluster 3". The control unit 11 tallies the number of image regions classified into each of the three clusters and selects a cluster with a large number of image regions. For example, if cluster 1 has the most image regions, the control unit 11 presumes that the image-region feature values of the target object are concentrated in cluster 1 and selects cluster 1. The control unit 11 then extracts the group of images that have image regions classified into the selected cluster 1 (image A, image B, and image C).
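The FIG. 6 example can be traced with a few lines of Python. The per-region cluster assignments below are hypothetical values chosen to reproduce the outcome stated in the text (cluster 1 selected; images A, B, and C extracted); the figure itself is not reproduced here.

```python
from collections import Counter

# Hypothetical cluster number of each detected image region, keyed by image.
region_clusters = {
    "A": [1, 2],
    "B": [1],
    "C": [1, 3],
    "D": [2],
}

# Tally the image regions per cluster.
region_counts = Counter(c for clusters in region_clusters.values() for c in clusters)

# Select the cluster with the most image regions.
selected = max(region_counts, key=region_counts.get)

# Extract every image that has at least one region in the selected cluster.
extracted = sorted(img for img, clusters in region_clusters.items() if selected in clusters)
```

Image D is dropped because none of its regions fall in the selected cluster, even though it was returned by the same search.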

The control unit 11 stores the extracted group of images of the target object in the extracted image DB 172 of the large-capacity storage unit 17 in association with the search ID. The control unit 11 also generates training data for learning images of the target object by labeling the extracted group of images with information on the target object. Specifically, in each extracted image, the control unit 11 associates the proper name of the target object (for example, the product name) with the image regions classified into the selected cluster ("cluster 1" in the example above) to generate the training data. The control unit 11 can then, for example, retrain the object detection model 173 with this training data to build a model capable of detecting the specific target object (individual instance).
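The labeling step might look like the following Python sketch. The region dictionary format (`image_id`, `bbox`, `cluster`) and the function name are assumptions for illustration; the text specifies only that regions in the selected cluster are associated with the proper name of the target object.

```python
def build_training_data(regions, selected_cluster, object_name):
    """Label every region in the selected cluster with the target object's name.

    `regions` is a list of dicts with the hypothetical keys "image_id",
    "bbox" (x, y, width, height), and "cluster".
    """
    return [
        {"image_id": r["image_id"], "bbox": r["bbox"], "label": object_name}
        for r in regions
        if r["cluster"] == selected_cluster
    ]
```

Regions outside the selected cluster receive no label in this sketch, so an image with several objects contributes only the regions presumed to show the target object.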

FIG. 7 is a flowchart showing the processing procedure for extracting desired images by cluster classification based on image feature values. The control unit 11 of the server 1 receives, through the input unit 14, a search query representing the target object (for example, the proper name of the target object) (step S101). Based on the received search query, the control unit 11 uses the communication unit 13 to search the images included in posts that an unspecified number of people have uploaded to the SNS server 2 via the network N, and retrieves a group of images related to the target object (step S102). The control unit 11 stores the retrieved group of images in the search history DB 171 of the large-capacity storage unit 17 (step S103).

For each retrieved image, the control unit 11 extracts the feature value of the image region corresponding to each object contained in the image (step S104). The control unit 11 performs cluster classification according to the extracted feature value of each image region (step S105). The control unit 11 tallies the number of image regions classified into each cluster (step S106). Based on the tallied number of image regions in each cluster, the control unit 11 selects one of the clusters (step S107). For example, the control unit 11 may select the cluster with the largest number of image regions. The control unit 11 extracts the group of images that have image regions classified into the selected cluster (step S108).

The control unit 11 stores the extracted group of images in the extracted image DB 172 of the large-capacity storage unit 17 in association with the search ID (step S109). For the group of images extracted in step S108, the control unit 11 generates training data that associates the image regions of the cluster selected in step S107 with information on the target object (step S110).
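Steps S101 to S110 can be strung together as in the following Python sketch. The search, feature extraction, and clustering stages are passed in as callables because the text leaves their implementations open; all parameter names and return formats here are hypothetical, and database writes are reduced to returned values.

```python
from collections import Counter


def extract_target_images(query, search, extract_region_features, cluster):
    """Run the S101-S110 flow for one query (received as `query`, S101)."""
    images = search(query)                                # S102: retrieve images
    history = {"query": query, "images": list(images)}    # S103: search history record
    regions = [                                           # S104: one entry per object region
        (image_id, feature)
        for image_id in images
        for feature in extract_region_features(image_id)
    ]
    if not regions:
        return history, [], []
    labels = cluster([feature for _, feature in regions])  # S105: cluster the features
    counts = Counter(labels)                               # S106: regions per cluster
    selected = counts.most_common(1)[0][0]                 # S107: largest cluster
    hits = [r for r, label in zip(regions, labels) if label == selected]
    extracted = sorted({image_id for image_id, _ in hits})  # S108 (stored in S109)
    training = [(image_id, feature, query) for image_id, feature in hits]  # S110
    return history, extracted, training
```

Passing the stages as arguments keeps the sketch independent of any particular SNS API, detection model, or clustering library, none of which the text fixes.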

According to this embodiment, clustering of image feature values makes it possible to remove images outside the collection target and extract images of the target object. Because images unrelated to the target object are excluded, extraction accuracy can be improved.

（実施形態２） (Embodiment 2)
実施形態２は、物体検出モデル１７３によって、画像内の物体の種類（一般名称）を検出し、ＳＮＳから検索された画像群全体での各種類の物体の出現頻度に基づき、所望画像を抽出する形態に関する。なお、実施形態１と重複する内容については説明を省略する。 The second embodiment relates to a form in which the object detection model 173 detects the type (common name) of each object in an image, and a desired image is extracted based on the frequency of appearance of each type of object across the entire group of images retrieved from the SNS. Note that descriptions of contents that overlap with those of Embodiment 1 will be omitted.

図８は、実施形態２のサーバ１の構成例を示すブロック図である。なお、図２と重複する内容については同一の符号を付して説明を省略する。大容量記憶部１７は、頻度集計ＤＢ１７４を含む。頻度集計ＤＢ１７４は、複数の画像の中から検出した各種類の物体の出現頻度を記憶している。 FIG. 8 is a block diagram showing a configuration example of the server 1 according to the second embodiment. Note that the same reference numerals are given to the same contents as those in FIG. 2, and the description thereof will be omitted. The large-capacity storage unit 17 includes a frequency aggregation DB 174. The frequency aggregation DB 174 stores the frequency of appearance of each type of object detected from a plurality of images.

図９は、頻度集計ＤＢ１７４のレコードレイアウトの一例を示す説明図である。頻度集計ＤＢ１７４は、検索ＩＤ列、種類列及び頻度列を含む。検索ＩＤ列は、画像を検索した履歴データのＩＤを記憶している。種類列は、画像の中から検出された物体の種類を記憶している。頻度列は、種類ごとの物体の出現回数を記憶している。 FIG. 9 is an explanatory diagram showing an example of the record layout of the frequency aggregation DB 174. The frequency aggregation DB 174 includes a search ID column, a type column, and a frequency column. The search ID column stores IDs of historical data for which images have been searched. The type column stores the types of objects detected in the image. The frequency column stores the number of times each type of object appears.
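The record layout above maps naturally onto a three-column table. The following is a minimal sketch using Python's standard `sqlite3` module; the table name, the column names, and the sample rows are made up for illustration and do not come from the publication.

```python
import sqlite3

# In-memory database standing in for the frequency aggregation DB 174.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frequency_aggregation ("
    " search_id TEXT NOT NULL,"      # ID of the search history record
    " object_type TEXT NOT NULL,"    # type of object detected in the images
    " frequency INTEGER NOT NULL,"   # number of appearances of that type
    " PRIMARY KEY (search_id, object_type))"
)
# Illustrative rows only; the counts are invented.
conn.executemany(
    "INSERT INTO frequency_aggregation VALUES (?, ?, ?)",
    [("S001", "bottle", 42), ("S001", "cup", 7), ("S001", "person", 15)],
)
# Look up the most frequent object type for one search.
top_type, top_frequency = conn.execute(
    "SELECT object_type, frequency FROM frequency_aggregation"
    " WHERE search_id = ? ORDER BY frequency DESC LIMIT 1",
    ("S001",),
).fetchone()
```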

図１０は、各種類の物体の出現頻度に応じて所望画像を抽出する動作を説明する説明図である。図１０に基づき、本実施形態の概要を説明する。 FIG. 10 is an explanatory diagram illustrating the operation of extracting a desired image according to the appearance frequency of each type of object. An overview of this embodiment will be explained based on FIG. 10.

本実施形態でサーバ１の制御部１１は、物体検出モデル１７３から出力層を除去せず、本来の物体検出用のモデルとして機能させる。制御部１１は、ＳＮＳサーバ２から取得した画像群を物体検出モデル１７３に入力し、各画像に含まれる物体の普通名称、たとえば物体の種類を示す検出結果を物体検出モデル１７３から取得する。これにより、図１０に示すように、各物体の画像領域ごとに物体の種類を識別した結果が出力される。 In this embodiment, the control unit 11 of the server 1 does not remove the output layer from the object detection model 173, and makes it function as a model for its original purpose of object detection. The control unit 11 inputs the image group acquired from the SNS server 2 to the object detection model 173, and acquires from the object detection model 173 a detection result indicating the common name of the object included in each image, for example, the type of the object. As a result, as shown in FIG. 10, the result of identifying the type of object for each image area of each object is output.

制御部１１は、各画像の検出結果に基づき、ＳＮＳサーバ２から取得した画像群全体での各種類の物体の出現頻度を集計する。制御部１１は、集計した各種類の物体の出現頻度を検索ＩＤに対応付け、検索ＩＤ、種類及び出現頻度を一つのレコードとして頻度集計ＤＢ１７４に記憶する。制御部１１は、頻度が一番高い物体の種類を判定し、判定した種類の物体を含む画像群を、対象物を含む画像群として取得する。 Based on the detection result of each image, the control unit 11 totals the appearance frequency of each type of object in the entire image group acquired from the SNS server 2. The control unit 11 associates the totaled appearance frequency of each type of object with a search ID, and stores the search ID, type, and appearance frequency as one record in the frequency aggregation DB 174. The control unit 11 determines the type of object with the highest frequency, and acquires the group of images including objects of the determined type as the group of images including the target object.

図１１は、各種類の物体の出現頻度に応じて所望画像を抽出する際の処理手順を示すフローチャートである。なお、図７と重複する内容については同一の符号を付して説明を省略する。制御部１１は、物体検出モデル１７３を用いて、ＳＮＳサーバ２から検索した対象物に関する画像群から各画像内の物体を検出する（ステップＳ１３１）。制御部１１は、検出した各画像に含まれる物体の種類に応じて、各物体の種類の出現頻度を集計する（ステップＳ１３２）。制御部１１は、集計した種類ごとの出現頻度を検索ＩＤに対応付けて頻度集計ＤＢ１７４に記憶する（ステップＳ１３３）。制御部１１は、頻度が一番高い物体の種類を取得し、取得した種類に基づき、物体検出モデル１７３から出力された該種類に対応付けた画像群を抽出する（ステップＳ１３４）。制御部１１は、抽出した画像群を大容量記憶部１７の抽出画像ＤＢ１７２に記憶する（ステップＳ１３５）。 FIG. 11 is a flowchart showing a processing procedure for extracting a desired image according to the appearance frequency of each type of object. Note that the same reference numerals are given to the same contents as those in FIG. 7, and the description thereof will be omitted. The control unit 11 uses the object detection model 173 to detect the objects in each image of the group of images related to the target object retrieved from the SNS server 2 (step S131). The control unit 11 totals the appearance frequency of each object type according to the types of the objects detected in each image (step S132). The control unit 11 stores the totaled appearance frequency of each type in the frequency aggregation DB 174 in association with the search ID (step S133). The control unit 11 acquires the type of object with the highest frequency and, based on the acquired type, extracts the group of images that the object detection model 173 associated with that type (step S134). The control unit 11 stores the extracted image group in the extracted image DB 172 of the mass storage unit 17 (step S135).
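Steps S132 and S134 can be sketched as follows. This is an illustration under stated assumptions rather than the publication's implementation: the detection results of step S131 are assumed to be available as a mapping from image ID to the list of detected type names, and the function name `extract_by_most_frequent_type` is hypothetical.

```python
from collections import Counter


def extract_by_most_frequent_type(
    detections: dict[str, list[str]],
) -> tuple[str, list[str], Counter]:
    """Pick the most frequent object type and the images that contain it.

    `detections` maps an image ID to the type names detected in that image
    (an assumed shape for the output of step S131).
    """
    # Step S132: count every detected object across the whole image group.
    frequency = Counter(t for types in detections.values() for t in types)
    if not frequency:
        raise ValueError("no objects were detected")
    # Step S134: take the type with the highest frequency ...
    top_type, _ = frequency.most_common(1)[0]
    # ... and extract the images in which that type was detected.
    images = [image_id for image_id, types in detections.items() if top_type in types]
    return top_type, images, frequency


# Made-up detection results for four images.
detections = {
    "img1": ["bottle", "person"],
    "img2": ["bottle"],
    "img3": ["cup", "person"],
    "img4": ["bottle", "bottle"],
}
top_type, extracted, frequency = extract_by_most_frequent_type(detections)
```

The returned `frequency` counter holds the per-type totals that step S133 would store together with the search ID.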

本実施形態によると、各種類の物体の出現頻度を集計し、集計した頻度に応じて画像を抽出する。これによって、例えば実施形態１の処理によって、対象物と異なる物体が誤って最多のクラスタに分類されたような場合でも、誤抽出を防止することができ、画像抽出の精度を向上することが可能となる。 According to this embodiment, the appearance frequency of each type of object is totaled, and images are extracted according to the totaled frequency. As a result, even if, for example, the processing of Embodiment 1 mistakenly classifies an object different from the target object into the largest cluster, incorrect extraction can be prevented and the accuracy of image extraction can be improved.

（実施形態３） (Embodiment 3)
実施形態３は、物体検出モデル１７３で検出した物体の普通名称と、対象物の名称とが類似するか否かを判定することで、所望画像を抽出する形態に関する。なお、実施形態２と重複する内容については説明を省略する。 The third embodiment relates to a form in which a desired image is extracted by determining whether the common name of an object detected by the object detection model 173 and the name of the target object are similar. Note that descriptions of contents that overlap with those of Embodiment 2 will be omitted.

サーバ１は、対象物に関する画像群を検索した際の検索クエリ（例えば、対象物の名称）の類語を抽出し、抽出した類語の類似度が所定閾値以上である場合、該類語に対応する画像を抽出する。類語の抽出処理に関しては、類語辞書またはＷｏｒｄ２Ｖｅｃ等のベクトル化されたデータを利用して抽出しても良い。なお、本実施形態では、類語辞書を用いた例をあげて説明する。 The server 1 extracts synonyms of the search query (for example, the name of the object) used when searching for the group of images related to the object, and if the degree of similarity of an extracted synonym is equal to or higher than a predetermined threshold, extracts the images corresponding to that synonym. The synonyms may be extracted using a thesaurus dictionary or vectorized data such as Word2Vec. Note that this embodiment will be described using an example that uses a thesaurus dictionary.
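For the vector-based alternative mentioned above, one common way to score the similarity of two words is the cosine similarity of their vectors. The sketch below uses hand-written three-dimensional toy vectors in place of real Word2Vec embeddings; the vectors, the threshold, and the function names are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for learned word embeddings (values are made up).
embeddings = {
    "bottle": [0.9, 0.1, 0.0],
    "flask": [0.8, 0.2, 0.1],
    "dog": [0.0, 0.1, 0.9],
}


def similar_words(query: str, threshold: float) -> list[str]:
    """Return the words whose similarity to `query` is at or above `threshold`."""
    query_vector = embeddings[query]
    return [
        word
        for word, vector in embeddings.items()
        if word != query and cosine_similarity(query_vector, vector) >= threshold
    ]


synonyms = similar_words("bottle", threshold=0.8)
```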

図１２は、実施形態３のサーバ１の構成例を示すブロック図である。なお、図８と重複する内容については同一の符号を付して説明を省略する。大容量記憶部１７は、類語辞書１７５を含む。類語辞書１７５は、同じような意味を持つ言葉をまとめた辞書である。 FIG. 12 is a block diagram showing a configuration example of the server 1 according to the third embodiment. Note that the same reference numerals are given to the same content as in FIG. 8, and the explanation thereof will be omitted. The mass storage unit 17 includes a thesaurus dictionary 175. The thesaurus dictionary 175 is a dictionary that compiles words with similar meanings.

図１３は、類語辞書１７５のレコードレイアウトの一例を示す説明図である。類語辞書１７５は、テキスト列及び類似度列を含む。テキスト列は、物体の名称（一般名称または固有名称）に相当するテキストを記憶している。類似度列は、各テキストの類似度を記憶している。 FIG. 13 is an explanatory diagram showing an example of the record layout of the thesaurus dictionary 175. The thesaurus dictionary 175 includes a text column and a similarity column. The text column stores text corresponding to the name (common name or proper name) of an object. The similarity column stores the similarity of each text.

図１４は、類語を用いて所望画像を抽出する際の処理手順を示すフローチャートである。なお、図１１と重複する内容については同一の符号を付して説明を省略する。制御部１１は、ステップＳ１３１で検出した物体の種類の名称と、対象物に関する画像群を検索した際の検索クエリ（対象物の名称）との類似度を類語辞書１７５から取得する（ステップＳ１４１）。制御部１１は、取得した類似度に応じて、ステップＳ１０２で検索した画像群から対象物を含む画像を抽出する（ステップＳ１４２）。例えば、制御部１１は、取得した各物体の種類の名称と対象物の名称との類似度が所定の閾値以上であると判定した場合、該当物体を含む画像を抽出する。 FIG. 14 is a flowchart showing the processing procedure when extracting a desired image using synonyms. Note that the same reference numerals are given to the same contents as those in FIG. 11, and the description thereof will be omitted. The control unit 11 acquires, from the thesaurus dictionary 175, the degree of similarity between the name of the object type detected in step S131 and the search query (object name) used when searching for the group of images related to the object (step S141). The control unit 11 extracts images including the object from the image group searched in step S102 according to the acquired similarity (step S142). For example, when the control unit 11 determines that the degree of similarity between the acquired name of an object type and the name of the target object is greater than or equal to a predetermined threshold, the control unit 11 extracts the images that include that object.
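Steps S141 and S142 can be sketched as follows. This is a minimal illustration under stated assumptions: the thesaurus is modelled as a mapping from a (query, name) pair to a similarity score, which is one possible reading of the text and similarity columns, and the function name, sample entries, and threshold are hypothetical.

```python
def extract_by_name_similarity(
    query: str,
    detections: dict[str, list[str]],
    thesaurus: dict[tuple[str, str], float],
    threshold: float,
) -> list[str]:
    """Return the IDs of images with a detected type name similar to the query."""
    extracted = []
    for image_id, type_names in detections.items():
        # Step S141: look up the similarity between each detected type name
        # and the search query; pairs missing from the thesaurus count as 0.0.
        best = max(
            (thesaurus.get((query, name), 0.0) for name in type_names),
            default=0.0,
        )
        # Step S142: keep the image when the similarity reaches the threshold.
        if best >= threshold:
            extracted.append(image_id)
    return extracted


# Made-up thesaurus entries and detection results.
thesaurus = {
    ("water bottle", "bottle"): 0.9,
    ("water bottle", "cup"): 0.4,
}
detections = {
    "img1": ["bottle", "person"],
    "img2": ["cup"],
    "img3": [],
}
extracted = extract_by_name_similarity("water bottle", detections, thesaurus, 0.7)
```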

本実施形態によると、検出された画像内の物体の種類の名称と対象物の名称との類似度により画像を抽出することで、画像抽出の精度を向上することが可能となる。 According to this embodiment, by extracting an image based on the degree of similarity between the name of the object type in the detected image and the name of the target object, it is possible to improve the accuracy of image extraction.

（実施形態４） (Embodiment 4)
図１５は、上述した形態のサーバ１の動作を示す機能ブロック図である。制御部１１が制御プログラム１Ｐを実行することにより、サーバ１は以下のように動作する。 FIG. 15 is a functional block diagram showing the operation of the server 1 of the above-described form. When the control unit 11 executes the control program 1P, the server 1 operates as follows.

検索部１０ａは、ネットワークＮを介して対象物に関連する画像群を検索する。抽出部１０ｂは、検索された各画像の特徴量を抽出する。分類部１０ｃは、画像特徴量に基づき、各画像を複数のクラスタのいずれかに分類する。画像抽出部１０ｄは、クラスタに分類された画像領域を有する画像を抽出する。生成部１０ｅは、抽出部１０ｂが抽出した画像に対し、該画像領域に対象物の情報を関連付けたデータを生成する。検出部１０ｆは、画像に含まれる物体を検出する。 The search unit 10a searches through the network N for a group of images related to the object. The extraction unit 10b extracts the feature amount of each searched image. The classification unit 10c classifies each image into one of a plurality of clusters based on the image feature amount. The image extraction unit 10d extracts an image having image regions classified into clusters. The generation unit 10e generates, for the image extracted by the extraction unit 10b, data in which object information is associated with the image area. The detection unit 10f detects an object included in the image.

本実施の形態４は以上の如きであり、その他は実施の形態１から３と同様であるので、対応する部分には同一の符号を付してその詳細な説明を省略する。 This Embodiment 4 is as described above, and other aspects are the same as Embodiments 1 to 3, so corresponding parts are given the same reference numerals and detailed explanation thereof will be omitted.

今回開示された実施形態はすべての点で例示であって、制限的なものではないと考えられるべきである。本発明の範囲は、上記した意味ではなく、特許請求の範囲によって示され、特許請求の範囲と均等の意味及び範囲内でのすべての変更が含まれることが意図される。 The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is indicated by the claims rather than by the meanings described above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

１ 情報処理装置（サーバ）
１１ 制御部
１２ 記憶部
１３ 通信部
１４ 入力部
１５ 表示部
１６ 読取部
１７ 大容量記憶部
１７１ 検索履歴ＤＢ
１７２ 抽出画像ＤＢ
１７３ 物体検出モデル
１７４ 頻度集計ＤＢ
１７５ 類語辞書
１ａ 可搬型記憶媒体
１ｂ 半導体メモリ
１Ｐ 制御プログラム
２ 情報共有サーバ（ＳＮＳサーバ）
１０ａ 検索部
１０ｂ 抽出部
１０ｃ 分類部
１０ｄ 画像抽出部
１０ｅ 生成部
１０ｆ 検出部

1 Information processing device (server)
11 Control unit
12 Storage unit
13 Communication unit
14 Input unit
15 Display unit
16 Reading unit
17 Mass storage unit
171 Search history DB
172 Extracted image DB
173 Object detection model
174 Frequency aggregation DB
175 Thesaurus dictionary
1a Portable storage medium
1b Semiconductor memory
1P Control program
2 Information sharing server (SNS server)
10a Search unit
10b Extraction unit
10c Classification unit
10d Image extraction unit
10e Generation unit
10f Detection unit

Claims

1. An information processing device comprising:
a search unit that uses text to search for images related to an object via a network;
an extraction unit that extracts a feature amount of each searched image;
a classification unit that clusters the feature amounts of the images;
an image extraction unit that extracts the images according to the number of images forming each cluster of the feature amounts; and
a detection unit that detects an object included in each image,
wherein the image extraction unit extracts the images according to the type of the detected object.

2. The information processing device according to claim 1, wherein
the extraction unit extracts the feature amount of each image region in the image,
the classification unit performs clustering for each image region, and
the image extraction unit
selects one of a plurality of clusters according to the number of the image regions forming each cluster of the feature amounts, and
extracts the images having the image regions that form the selected cluster.

3. The information processing device according to claim 1 or 2, wherein the image extraction unit extracts the images according to the appearance frequency of each type of object in all the images searched by the search unit.

4. The information processing device according to any one of claims 1 to 3, wherein
the detection unit specifies the name of the type of the object, and
the image extraction unit extracts the images according to the degree of similarity between the name of the object and the name of the type of the object.

5. An information processing method that causes a computer to execute processing of:
searching for images related to an object via a network;
extracting a feature amount of each searched image;
clustering the images based on the feature amounts;
extracting the images according to the number of images forming each cluster of the feature amounts;
detecting an object included in each image; and
extracting the images according to the type of the detected object.

6. A program that causes a computer to execute processing of:
searching for images related to an object via a network;
extracting a feature amount of each searched image;
clustering the images based on the feature amounts;
extracting the images according to the number of images forming each cluster of the feature amounts;
detecting an object included in each image; and
extracting the images according to the type of the detected object.

7. An information processing device comprising:
a search unit that uses text to search for images related to an object via a network;
an extraction unit that extracts a feature amount of each searched image;
a classification unit that clusters the feature amounts of the images;
an image extraction unit that extracts the images according to the number of images forming each cluster of the feature amounts; and
a detection unit that detects an object included in each image,
wherein the detection unit specifies the name of the type of the object, and
the image extraction unit extracts the images according to the degree of similarity between the name of the object and the name of the type of the object.