JP7179705B2

JP7179705B2 - Information processing device, information processing method and information processing program

Info

Publication number: JP7179705B2
Application number: JP2019164021A
Authority: JP
Inventors: 雅二郎岩崎; 修平西村; 拓明田口
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2019-09-09
Filing date: 2019-09-09
Publication date: 2022-11-29
Anticipated expiration: 2039-09-09
Also published as: JP2021043603A

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Conventionally, techniques for extracting image features with neural networks have been provided. For example, a technique has been provided for generating a saliency map of an image with a convolutional neural network (CNN).

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", International Conference on Learning Representations (ICLR), Apr 14-16, 2014, Banff, Canada.
Misha Denil, Alban Demiraj, Nando de Freitas, "Extraction of Salient Sentences from Labelled Documents", International Conference on Learning Representations (ICLR), Apr 14-16, 2015, San Diego, USA.

However, the conventional techniques described above cannot always extract, with high accuracy, low-level features that correspond to the object associated with a classification result.

The present application has been made in view of the above, and its object is to provide an information processing device, an information processing method, and an information processing program capable of extracting, with high accuracy, low-level features that correspond to the object associated with a classification result.

An information processing device according to the present application includes a specifying unit that specifies, in an image to be processed, a useful region that is useful for class classification, and an extraction unit that extracts low-level features from the portion of the image to be processed that is included in the useful region specified by the specifying unit.

According to one aspect of the embodiment, low-level features that correspond to the object associated with a classification result can be extracted with high accuracy.

FIG. 1 is a diagram illustrating an overview of information processing according to the embodiment.
FIG. 2 is a diagram illustrating an example of information processing according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of an information processing system according to the embodiment.
FIG. 4 is a diagram illustrating a configuration example of an information processing device according to the embodiment.
FIG. 5 is a diagram illustrating an example of an image information storage unit according to the embodiment.
FIG. 6 is a diagram illustrating an example of a feature information storage unit according to the embodiment.
FIG. 7 is a flowchart illustrating a learning processing procedure according to the embodiment.
FIG. 8 is a hardware configuration diagram illustrating an example of a computer that implements the functions of the information processing device.

Hereinafter, modes for implementing the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments") will be described with reference to the drawings. The information processing device, information processing method, and information processing program according to the present application are not limited by these embodiments. In the following embodiments, identical parts are denoted by identical reference numerals, and overlapping descriptions are omitted.

[1. Overview of information processing]
First, an overview of information processing according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an overview of information processing according to the embodiment. The information processing according to the embodiment is performed by the information processing device 100.

Before describing FIG. 1, the information processing system according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram illustrating a configuration example of the information processing system 1 according to the embodiment. As shown in FIG. 3, the information processing system 1 includes a terminal device 10, a provider device 20, and an information processing device 100. The terminal device 10, the provider device 20, and the information processing device 100 are communicably connected via a network N, by wire or wirelessly. The information processing system 1 shown in FIG. 3 may include a plurality of terminal devices 10, a plurality of provider devices 20, and a plurality of information processing devices 100.

The terminal device 10 is an information processing terminal used by a user, for example, a user who wants to search for products. The terminal device 10 is, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant).

For example, the terminal device 10 transmits a search query to the information processing device 100 in response to a user operation. Suppose the user wants to search for products that resemble or match an object shown in a given image. In that case, the user inputs the image into the terminal device 10 and presses a search button. The terminal device 10 then transmits the image to the information processing device 100 as a search query. In such a case, the image can be called a search query image.

The provider device 20 is an information processing terminal used by a provider who supplies images that can become search results for a search query (search query image). In this embodiment, the provider is assumed to be a store that lists products on a shopping service (hereinafter sometimes referred to as "service SH") operated by the business operator of the information processing device 100. In that case, an image that can become a search result in effect shows (introduces) the provider's product, so such an image can be called a product image.

Here, the premises underlying the information processing according to the embodiment will be described. For example, assume that a user wants to purchase an object OBx1 shown in an image IMGx1 and tries to search for whether the object OBx1 is sold as a product. In such a case, one conceivable approach is to extract in advance, from each product image submitted by a provider (store), the low-level features of the product included in that product image, and then to search for product images showing a product whose low-level features correspond to the object OBx1.

However, low-level features cannot always be extracted with high accuracy. If a product image is an abstract, simple image, such as a product shown against a single-color background, it may be possible to extract the low-level features of the product with high accuracy. Most product images, however, are not in such a convenient form. A product image often contains not only the product but also various other objects, people, complex backgrounds, and complex color schemes, and in such cases the low-level features of the product cannot be extracted with high accuracy.

To extract low-level features from a product image, one could remove the extraneous information other than the product (for example, other objects and the background) so as to isolate the product, and then extract the low-level features of the isolated product. One such approach is to train a model capable of pixel-level segmentation or rectangle-level object detection and to use that model to remove the extraneous information from the product image to be processed. However, training such a model and running computations with it are costly.

In other words, the composition of product images prevents the low-level features of a product from being extracted with high accuracy, and although the model-based approach described above could achieve this, it is not practical because of cost and other issues. In light of these premises and problems, the present embodiment aims to extract the low-level features of an object (for example, a product) in an image with high accuracy while keeping costs down by making use of an existing learner (model) that classifies objects (for example, products) in images.

Specifically, the information processing device 100 according to the embodiment performs the following processing in order to extract the low-level features of an object (for example, a product) in an image with high accuracy while keeping costs down by making use of an existing learner (model). The information processing device 100 specifies, in the image to be processed, a useful region that is useful for class classification, and extracts low-level features from the portion of the image to be processed that is included in the specified useful region. More specifically, the information processing device 100 inputs the image to be processed into a model, which is a neural network that outputs a classification result for an input image, and acquires predetermined information from an intermediate layer of the model. The information processing device 100 then specifies the useful region based on the predetermined information.

For example, based on the predetermined information, the information processing device 100 specifies, as the useful region, a region of the image to be processed whose degree of contribution to the classification result is determined to be higher than a predetermined value. As the predetermined information, the information processing device 100 acquires, for example, a feature map that is obtained in the intermediate layer and that reflects the degree of contribution to the classification result for the image to be processed. The information processing device 100 then pools the feature map obtained in the intermediate layer, maps it to the classification result for the image to be processed, compares the mapped feature map with the image to be processed, and, based on the comparison result, specifies as the useful region a region whose degree of contribution to the classification result is determined to be higher than the predetermined value. Further, for example, where the image to be processed is divided into a grid based on the feature map, the information processing device 100 specifies as the useful region a region whose degree of contribution is determined to be higher than the predetermined value, based on the degree of contribution calculated for each grid cell.
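The grid-based variant can be illustrated with a minimal sketch. It assumes that a per-cell contribution score is already available as a small matrix and that each pixel is assigned to the grid cell covering it; the function name `useful_mask` and the concrete threshold are hypothetical and do not come from the specification.

```python
from typing import List

def useful_mask(contribution: List[List[float]],
                image_height: int,
                image_width: int,
                threshold: float) -> List[List[bool]]:
    """Return a pixel-level mask that is True where the covering grid cell's
    contribution exceeds the threshold (assumed nearest-cell mapping)."""
    rows, cols = len(contribution), len(contribution[0])
    mask = []
    for y in range(image_height):
        r = y * rows // image_height          # grid row covering pixel row y
        mask.append([contribution[r][x * cols // image_width] > threshold
                     for x in range(image_width)])
    return mask

# A 2x2 contribution grid laid over a 4x4 image: only the right-hand
# cells exceed the threshold, so only the right half is marked useful.
grid = [[0.1, 0.9],
        [0.2, 0.8]]
mask = useful_mask(grid, image_height=4, image_width=4, threshold=0.5)
```

The resulting mask can then be used to restrict any later feature extraction to the pixels of the useful region.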

An example of the information processing according to the above embodiment is as follows. Suppose an existing learner determines that a predetermined object included in the image to be processed belongs to class CL. The information processing device 100 according to the embodiment then acquires, from an intermediate layer of the model, the information that led to that determination, that is, the information that was useful for determining that the object belongs to class CL. By applying the acquired information to the image to be processed, the information processing device 100 specifies the useful region of the image that was useful for the determination of class CL.

In the following, an example of the overall flow of the information processing according to the embodiment is first described with reference to FIG. 1. An example of the information processing according to the embodiment is then described with reference to FIG. 2.

In the example of FIG. 1, the provider device 20 is used by a provider T1, and the terminal device 10 is used by a user U1. In the example of FIG. 1, the provider device 20 submits a product image IM11 to the information processing device 100 in response to an operation by the provider T1 (step S11). The product image IM11 shows a product handled by the provider T1, that is, an object OB11 to be traded. In the example of FIG. 1, the object OB11 is a T-shirt.

Because the example of FIG. 1 shows the information processing device 100 receiving a product image only from the provider T1, the information processing according to the embodiment is described with a focus on the product image IM11. In practice, however, the information processing device 100 receives product images from many providers other than the provider T1, and it performs the information processing according to the embodiment on each product image supplied by each provider.

Returning to the description of FIG. 1, upon accepting the submission of the product image IM11, the information processing device 100 performs a series of specifying processes for specifying a useful region that is useful for class classification in the product image IM11 (step S12). Here, as learning information of an existing learner LE, a model M is taken as an example; the model M is a neural network that outputs a classification result for a predetermined object included in an input image. Assume that when the product image IM11 is input to the existing learner LE, which holds the model M as learning information, the class "T-shirt" is output as the class (category) to which the object OB11 to be traded belongs. In that case, in step S12, the information processing device 100 performs a series of specifying processes for specifying the useful region of the product image IM11 that was useful for classifying the object OB11 into the class "T-shirt".

Next, the information processing device 100 extracts low-level features of a predetermined object from the portion of the product image IM11 that is included in the useful region specified in step S12 (step S13). For example, the information processing device 100 extracts low-level features of the object OB11 to be traded. Details of the specifying process in step S12 and the extraction process in step S13 are described later with reference to FIG. 2. Through the processing up to this point, the information processing device 100 obtains, for each product image, low-level features of the predetermined object (the object to be traded) included in that product image.
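As an illustration of step S13, the sketch below computes a coarse color histogram over only the pixels inside the useful region. The choice of a color histogram as the low-level feature is an assumption made for this example (this passage does not fix the feature type), and the function name `masked_color_histogram` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255

def masked_color_histogram(image: List[List[Pixel]],
                           mask: List[List[bool]],
                           bins: int = 4) -> List[float]:
    """Per-channel histogram of the masked pixels, normalised by the number
    of masked pixels. Layout: [R bins..., G bins..., B bins...]."""
    hist = [0.0] * (3 * bins)
    count = 0
    for image_row, mask_row in zip(image, mask):
        for pixel, keep in zip(image_row, mask_row):
            if not keep:
                continue  # pixel lies outside the useful region
            count += 1
            for channel, value in enumerate(pixel):
                index = min(value * bins // 256, bins - 1)
                hist[channel * bins + index] += 1.0
    if count == 0:
        return hist  # empty region: all-zero feature
    return [h / count for h in hist]

# One red pixel inside the region, one blue pixel outside it.
image = [[(255, 0, 0), (0, 0, 255)]]
mask = [[True, False]]
feature = masked_color_histogram(image, mask, bins=2)
```

Because the blue pixel is masked out, the feature describes only the red pixel, which is the intended effect of restricting extraction to the useful region.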

Suppose that, in this state, the information processing device 100 receives a search query image QE21 from the terminal device 10 of the user U1 (step S14). The information processing device 100 then performs a similar-image search, searching the product images submitted by the providers for images similar to the search query image QE21 (step S15). Specifically, the information processing device 100 searches for product images in which the object to be traded is similar to a designated object, that is, an object that is included in the search query image QE21 and is designated, for example, by the user U1. For example, the information processing device 100 matches the designated object against the low-level features extracted from each product image, thereby finding product images whose object to be traded is similar to the designated object.
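The matching in step S15 can be sketched as a nearest-neighbor ranking over feature vectors. Cosine similarity is assumed here as the matching criterion, and the catalog entries other than IM11 are hypothetical; the specification does not name a particular similarity measure in this passage.

```python
import math
from typing import Dict, List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_similar(query: Sequence[float],
                   catalog: Dict[str, Sequence[float]],
                   top_k: int = 3) -> List[str]:
    """Return the IDs of the top_k product images most similar to the query."""
    ranked = sorted(catalog,
                    key=lambda image_id: cosine_similarity(query, catalog[image_id]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical catalog of low-level features keyed by product image ID.
catalog = {
    "IM11": [1.0, 0.0, 0.0],
    "IM12": [0.0, 1.0, 0.0],
    "IM13": [0.9, 0.1, 0.0],
}
results = search_similar([1.0, 0.0, 0.0], catalog, top_k=2)
```

A linear scan like this is only practical for small catalogs; a production system would more likely use an approximate nearest-neighbor index.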

The information processing device 100 then transmits the product images obtained by the search in step S15 (search result images) to the terminal device 10 (step S16).

[2. Example of information processing]

An example of the information processing according to the embodiment will now be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of information processing according to the embodiment. FIG. 2 is used to describe the details of the specifying process in step S12 and the extraction process in step S13 shown in FIG. 1.

In the example of FIG. 1, the information processing device 100 has accepted the submission of the product image IM11, so it inputs the product image IM11 (an example of an image to be processed) into the learner LE (step S121).

The learner used by the information processing device 100 is described here. The learner LE is, for example, a learner in which a plurality of nodes, each outputting the result of an operation on its input data, are connected in multiple layers, and which has learned abstracted image features through supervised learning. For example, the learner LE is a neural network in which layers each having a plurality of nodes are connected in multiple stages, and it may be a DNN (Deep Neural Network) realized by so-called deep learning. Image features here are a concept that covers not only concrete features that appear in an image, such as the presence or absence of text, color, and composition, but also abstracted (meta-level) features, such as what the imaged object is, what kind of users would like the image, and the atmosphere of the image.

For example, the learner LE is generated by deep learning using the following training method. The connection coefficients between the nodes of the learner LE are initialized, and images having various features are input. The learner LE is then generated by processing such as backpropagation, which corrects the parameters (connection coefficients) so as to reduce the error between the output of the learner LE and the input image. For example, the learner LE is generated by performing processing such as backpropagation so as to minimize a predetermined loss function. By repeating this processing, the learner LE becomes able to produce an output that better reproduces the input image, that is, to output the features of the input image.

The training method for the learner LE is not limited to the one described above, and any known technique can be applied. The images used to train the learner may come from a data set of various images, such as images that contain a T-shirt and images that do not. Any method may be used for how images are input to the learner LE, for the format of the data output by the learner LE, and for the features the learner LE is explicitly trained on. In other words, the information processing device 100 can use any learner as long as it can calculate a feature quantity representing abstracted features of an image.

In FIGS. 1 and 2, the information processing device 100 uses a learner LE based on a so-called convolutional neural network, which repeatedly applies convolution and pooling to local regions of the input image. Hereinafter, a convolutional neural network may be referred to as a CNN. In addition to extracting and outputting features from an image, a CNN-based learner LE produces output that is invariant to positional variation of the text, imaged subject, and so on contained in the image. The learner LE can therefore calculate the abstracted features of an image accurately. For example, the learner LE calculates, as a classification result representing an abstracted feature of the image, the class to which a predetermined object included in the input image belongs.
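The positional invariance attributed to pooling can be illustrated with a small sketch. It assumes non-overlapping 2x2 max pooling, which is one common choice and is not specified in the text; the function name `max_pool_2x2` is hypothetical.

```python
from typing import List

def max_pool_2x2(matrix: List[List[float]]) -> List[List[float]]:
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    rows = len(matrix) // 2 * 2
    cols = len(matrix[0]) // 2 * 2
    return [[max(matrix[r][c], matrix[r][c + 1],
                 matrix[r + 1][c], matrix[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

# The same activation shifted by one pixel within a pooling window
# yields an identical pooled output.
original = [[1, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
```

The invariance holds only for shifts that stay inside one pooling window; stacking several convolution and pooling stages widens the range of shifts that are tolerated.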

Specifically, in FIG. 2, the information processing device 100 uses a learner LE that classifies each of a plurality of objects included in the image to be processed and thereby identifies, from among those objects, the object classified into the class "T-shirt".

Accordingly, in FIG. 2, the learner LE that has received the product image IM11 identifies whether the product image IM11 contains a T-shirt (step S122). In FIG. 2, the product image IM11 contains the object OB11, which is a T-shirt. The learner LE therefore classifies each of the plurality of objects included in the product image IM11 and identifies, from among them, the object OB11 classified into the class "T-shirt". As a result, in step S122, the learner LE generates identification information IR indicating that the product image IM11 shows, as the object to be traded, the object OB11 classified into the class "T-shirt". Step S122 is included to explain the operation of the learner LE and need not be performed.

The learner LE is a learner generated with a CNN and includes a plurality of intermediate layers A to C and so on. When the product image IM11 is input to the learner LE, the information processing device 100 acquires the information in a predetermined intermediate layer (hereinafter referred to as an "intermediate image"). In FIG. 2, the information processing device 100 acquires the intermediate images in intermediate layer B when the product image IM11 is input to the learner LE (step S123). Specifically, the information processing device 100 acquires an intermediate image group MG10 that includes intermediate images MM11 to MM19. In FIG. 2, regions of the intermediate images MM11 to MM19 that exhibit features are drawn in a darker color. For example, the intermediate image MM12 contains a feature-exhibiting region in its central portion, whereas the intermediate image MM16 contains almost no feature-exhibiting region.

The intermediate images are now described in more detail. For example, the learner LE obtains from the product image IM11 feature intensities represented as a two-dimensional matrix. An intermediate image is this matrix rendered as a heat map, that is, a so-called feature map. In this embodiment, therefore, the term "intermediate image" is interchangeable with "feature map".
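Rendering a feature-intensity matrix as a heat map can be sketched as a min-max rescaling to 8-bit gray levels. The rescaling rule and the function name `to_heatmap` are assumptions for illustration; the text only states that the matrix is rendered as a heat map.

```python
from typing import List

def to_heatmap(matrix: List[List[float]]) -> List[List[int]]:
    """Rescale feature intensities linearly to integer levels 0..255.
    A constant matrix maps to all zeros."""
    values = [v for row in matrix for v in row]
    low, high = min(values), max(values)
    if high == low:
        return [[0 for _ in row] for row in matrix]
    return [[round(255 * (v - low) / (high - low)) for v in row]
            for row in matrix]

# The strongest response (5.0) maps to 255, the weakest (0.0) to 0.
heatmap = to_heatmap([[0.0, 1.0],
                      [5.0, 2.0]])
```

Whether a high level is drawn dark (as in FIG. 2) or bright is purely a display convention applied after this rescaling.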

The information processing device 100 then extracts (acquires), from the intermediate image group MG10, the intermediate images that contribute to improving the recognition rate of a predetermined target (step S124). In FIG. 2, the information processing device 100 extracts from the intermediate image group MG10 the intermediate images that contribute to improving the recognition rate of the T-shirt corresponding to the object OB11 to be traded. That is, the information processing device 100 extracts the intermediate images whose degree of contribution to the determination that the object OB11 belongs to the class "T-shirt" is higher than a predetermined value. Put differently, the information processing device 100 extracts the intermediate images for which the probability that the object OB11 to be traded is determined to be of the class "T-shirt" is higher than a predetermined value. Each intermediate image is a reduced-size version of the original product image IM11 and is a coarse grid with lower resolution than the product image IM11. Because the probability of being determined to be of the class "T-shirt" is calculated for each grid cell, for example, the information processing device 100 extracts from the intermediate image group MG10 the intermediate images that correspond to regions made up of grid cells whose probability of being determined to be of the class "T-shirt" is higher than the predetermined value.

図２の例では、情報処理装置１００は、オブジェクトＯＢ１１がクラス「Ｔシャツ」に属すると判断するきっかけとなった貢献度が所定値より高い中間画像として、中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を抽出したとする。なお、ここでいうクラス「Ｔシャツ」は、処理対象の画像である商品画像ＩＭ１１に対する分類結果の一例である。 In the example of FIG. 2, it is assumed that the information processing apparatus 100 has extracted the intermediate images MM12, MM14, MM17, and MM18 as the intermediate images whose degree of contribution to the determination that the object OB11 belongs to the class "T-shirt" is higher than the predetermined value. Note that the class "T-shirt" here is an example of the classification result for the product image IM11, which is the image to be processed.

なお、情報処理装置１００は、中間画像群ＭＧ１０に含まれる中間画像ＭＭ１１～ＭＭ１９の各々への加工に応じたＴシャツの認識率の変化に基づいて、中間画像を抽出してもよい。ここでいう中間画像の加工とは、中間画像の輝度を所定の値だけ増加させること等、目的に応じて種々の手段により行われてもよい。なお、加工によりＴシャツの認識率の変化を生じさせる中間画像は、Ｔシャツの認識に影響を持つ中間画像であることが推定される。そのため、情報処理装置１００は、加工によりＴシャツの認識率の変化を生じさせる中間画像を抽出する。 Information processing apparatus 100 may extract an intermediate image based on a change in the T-shirt recognition rate according to processing of each of intermediate images MM11 to MM19 included in intermediate image group MG10. Processing of the intermediate image referred to here may be performed by various means depending on the purpose, such as increasing the brightness of the intermediate image by a predetermined value. It is presumed that an intermediate image that causes a change in the recognition rate of the T-shirt due to processing is an intermediate image that affects the recognition of the T-shirt. Therefore, the information processing apparatus 100 extracts an intermediate image that causes a change in the T-shirt recognition rate by processing.
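
The perturbation-based selection described above can be illustrated with a minimal sketch. It assumes the intermediate images are held as a NumPy array of shape (K, H, W) and that the rest of the network is available as a callable returning a score for the target class. The names `select_by_perturbation`, `class_score`, `delta`, and `min_change` are hypothetical and do not appear in the specification.

```python
import numpy as np

def select_by_perturbation(feature_maps, class_score, delta=0.1, min_change=0.01):
    """Return indices of feature maps whose perturbation changes the class score.

    feature_maps: array of shape (K, H, W), one intermediate image per index
                  (assumed layout).
    class_score:  callable mapping a (K, H, W) array to a scalar score for the
                  target class (hypothetical stand-in for the later layers).
    """
    baseline = class_score(feature_maps)
    selected = []
    for k in range(feature_maps.shape[0]):
        perturbed = feature_maps.copy()
        perturbed[k] += delta  # raise the "brightness" of map k by a fixed value
        change = abs(class_score(perturbed) - baseline)
        if change > min_change:
            selected.append(k)
    return selected

# Toy stand-in for the classifier: only maps 1 and 3 influence the score.
weights = np.array([0.0, 1.0, 0.0, 2.0])

def toy_score(maps):
    return float(np.dot(weights, maps.mean(axis=(1, 2))))

maps = np.zeros((4, 3, 3))
print(select_by_perturbation(maps, toy_score))  # prints [1, 3]
```

In this toy case only the maps that the stand-in classifier actually uses are selected, which mirrors the presumption that a map whose processing changes the recognition rate is a map that affects recognition.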

そして、情報処理装置１００は、ステップＳ１２４で抽出した中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を合成する（ステップＳ１２５）。図２では、情報処理装置１００は、中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を合成することにより、合成画像ＣＭ１１を生成する。具体的には、情報処理装置１００は、中間層Ｂから取得した中間画像ＭＭ１１～ＭＭ１９のうち、中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８をプーリングする。そして、情報処理装置１００は、プーリングした中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を、オブジェクトＯＢ１１に対して分類した分類結果であるクラス「Ｔシャツ」にマッピングすることで、合成画像ＣＭ１１を生成する。なお、合成画像ＣＭ１１も特徴マップであることには変わりない。また、合成画像ＣＭ１１も、元画像である商品画像ＩＭ１１を縮小したサイズであるとともに、商品画像ＩＭ１１よりも画質が荒いグリッド状である。 Then, the information processing apparatus 100 combines the intermediate images MM12, MM14, MM17, and MM18 extracted in step S124 (step S125). In FIG. 2, the information processing apparatus 100 generates the composite image CM11 by combining the intermediate images MM12, MM14, MM17, and MM18. Specifically, the information processing apparatus 100 pools the intermediate images MM12, MM14, MM17, and MM18 out of the intermediate images MM11 to MM19 acquired from the intermediate layer B. Then, the information processing apparatus 100 generates the composite image CM11 by mapping the pooled intermediate images MM12, MM14, MM17, and MM18 to the class "T-shirt", which is the classification result of classifying the object OB11. Note that the composite image CM11 is also a feature map. Like the intermediate images, the composite image CM11 is a reduced-size version of the product image IM11, which is the original image, and is a grid with coarser image quality than the product image IM11.
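
The specification does not give a formula for combining the selected intermediate images. One plausible reading of "pool, then map to the class" is a class-weighted sum in the style of class activation mapping, and the sketch below assumes exactly that. The function `composite_map` and the per-map `class_weights` (for example, the weights connecting each pooled map to the "T-shirt" output) are hypothetical.

```python
import numpy as np

def composite_map(feature_maps, selected, class_weights):
    """Combine the selected feature maps into one map scaled to [0, 1].

    feature_maps:  array of shape (K, H, W).
    selected:      indices of the maps that contributed to the class.
    class_weights: length-K weights tying each map to the target class
                   (assumed to come from the classifier; not specified
                   in the source text).
    """
    combined = np.zeros(feature_maps.shape[1:], dtype=float)
    for k in selected:
        combined += class_weights[k] * feature_maps[k]
    combined = np.maximum(combined, 0.0)  # keep only positive evidence
    peak = combined.max()
    return combined / peak if peak > 0 else combined
```

The result has the same grid size as the intermediate images, which is consistent with the statement that the composite image is itself a feature map.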

そして、情報処理装置１００は、図２に示すように、クラス「Ｔシャツ」に分類されるオブジェクトであるオブジェクトＯＢ１１の特徴を示す領域を含む合成画像ＣＭ１１を生成する。 Then, as shown in FIG. 2, the information processing apparatus 100 generates a composite image CM11 including an area representing the characteristics of the object OB11, which is an object classified into the class "T-shirt".

次に、情報処理装置１００は、商品画像ＩＭ１１において、オブジェクトＯＢ１１がクラス「Ｔシャツ」に分類されるうえで有用となった領域である有用領域を特定する（ステップＳ１２６）。例えば、情報処理装置１００は、合成画像ＣＭ１１を商品画像ＩＭ１１と同一のサイズにまで拡大し、拡大した合成画像と、商品画像ＩＭ１１とを比較することにより、オブジェクトＯＢ１１を含む領域を特定し、特定した領域を有用領域と定める。この点について、より詳細に説明する。上記説明したように、合成画像ＣＭ１１は、クラス「Ｔシャツ」と判定された確率が所定値より高いグリッドで構成される領域に対応する中間画像である。このため、情報処理装置１００は、合成画像ＣＭ１１を商品画像ＩＭ１１に戻すことで、グリッド状に分割された商品画像ＩＭ１１を得たうえで、このグリッド毎に算出された確率（貢献度）に基づいて、確率が所定値より高いと判定された領域を有用領域と定める。 Next, the information processing apparatus 100 identifies, in the product image IM11, a useful region, which is a region that was useful in classifying the object OB11 into the class "T-shirt" (step S126). For example, the information processing apparatus 100 enlarges the composite image CM11 to the same size as the product image IM11, compares the enlarged composite image with the product image IM11 to identify a region including the object OB11, and defines the identified region as the useful region. This point will be described in more detail. As described above, the composite image CM11 is an intermediate image corresponding to a region composed of grid cells whose probability of being determined to be the class "T-shirt" is higher than a predetermined value. Therefore, the information processing apparatus 100 maps the composite image CM11 back onto the product image IM11 to obtain the product image IM11 divided into a grid, and then, based on the probability (degree of contribution) calculated for each grid cell, defines the region determined to have a probability higher than the predetermined value as the useful region.
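
As a rough illustration of step S126, the sketch below enlarges a composite map to the image size by nearest-neighbor repetition, thresholds it, and returns the bounding box of the cells above the threshold. Representing the useful region as a rectangle, the nearest-neighbor enlargement, and the names `useful_region` and `threshold` are all assumptions made for the example.

```python
import numpy as np

def useful_region(composite, image_shape, threshold=0.5):
    """Return (top, left, bottom, right) in pixels, or None if nothing qualifies.

    composite:   2-D array of per-cell probabilities (contributions).
    image_shape: (height, width) of the original image.
    The bottom and right bounds are exclusive.
    """
    img_h, img_w = image_shape
    grid_h, grid_w = composite.shape
    # Nearest-neighbor enlargement: every pixel takes the value of its grid cell.
    rows = np.arange(img_h) * grid_h // img_h
    cols = np.arange(img_w) * grid_w // img_w
    enlarged = composite[np.ix_(rows, cols)]
    mask = enlarged > threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

cm = np.array([[0.1, 0.9],
               [0.2, 0.8]])
print(useful_region(cm, (4, 6)))  # prints (0, 3, 4, 6)
```

Here the right-hand column of grid cells exceeds the threshold, so the region covers the right half of the 4 x 6 image.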

図２の例では、情報処理装置１００は、領域Ｒ１１を有用領域として特定している。図２に示す商品画像ＩＭ１１は、オブジェクトＯＢ１１の他にも花瓶やイス、複雑な背景が映されているが、情報処理装置１００は、このような商品画像ＩＭ１１から、クラス「Ｔシャツ」に分類されるオブジェクトＯＢ１１のみを精度よく抽出（検出）している。 In the example of FIG. 2, the information processing apparatus 100 identifies the region R11 as the useful region. The product image IM11 shown in FIG. 2 shows a vase, a chair, and a complicated background in addition to the object OB11, but the information processing apparatus 100 accurately extracts (detects) only the object OB11, which is classified into the class "T-shirt", from such a product image IM11.

次に、情報処理装置１００は、領域Ｒ１１で抽出されるオブジェクト（すなわちオブジェクトＯＢ１１）の低次特徴を抽出する（ステップＳ１２７）。ここでいう低次特徴とは、領域Ｒ１１で抽出されるオブジェクトの色相、エッジ、フロー等である。例えば、情報処理装置１００は、商品画像ＩＭ１１のうち領域Ｒ１１で抽出される部分の画像を解析することにより、領域Ｒ１１で抽出されるオブジェクトの低次特徴を抽出する。図２の例では、情報処理装置１００が低次特徴として、低次特徴ＦＡ１１を抽出した例を示す。また、抽出された低次特徴は、特徴情報記憶部１２２に格納される。 Next, the information processing apparatus 100 extracts low-order features of the object extracted from the region R11 (that is, the object OB11) (step S127). The low-level features referred to here are the hue, edge, flow, etc. of the object extracted in the region R11. For example, the information processing apparatus 100 extracts the low-order features of the object extracted in the region R11 by analyzing the image of the portion extracted in the region R11 of the product image IM11. The example of FIG. 2 shows an example in which the information processing apparatus 100 extracts a low-level feature FA11 as a low-level feature. Also, the extracted low-order features are stored in the feature information storage unit 122 .
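
The specification names hue, edges, and flow as examples of low-order features but does not fix how they are computed. The following sketch assumes an RGB image stored as a uint8 NumPy array and computes a normalized hue histogram plus a mean gradient magnitude for the cropped region. The helper names `hue_channel` and `low_level_features`, the bin count, and the choice of these two descriptors are assumptions.

```python
import numpy as np

def hue_channel(rgb):
    """Hue in [0, 1) for an array of shape (H, W, 3) with values in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)
    delta = mx - rgb.min(axis=-1)
    safe = np.where(delta == 0, 1.0, delta)  # avoid division by zero on gray pixels
    h = np.where(mx == r, ((g - b) / safe) % 6.0,
                 np.where(mx == g, (b - r) / safe + 2.0, (r - g) / safe + 4.0))
    return np.where(delta == 0, 0.0, h) / 6.0

def low_level_features(image, region, hue_bins=6):
    """Hue histogram followed by mean edge strength for a non-empty region.

    image:  uint8 array of shape (H, W, 3).
    region: (top, left, bottom, right), bottom/right exclusive, at least 2x2.
    """
    top, left, bottom, right = region
    crop = image[top:bottom, left:right].astype(float) / 255.0
    hist, _ = np.histogram(hue_channel(crop), bins=hue_bins, range=(0.0, 1.0))
    hue_hist = hist / hist.sum()
    gy, gx = np.gradient(crop.mean(axis=-1))  # gradients of the grayscale crop
    edge_strength = float(np.hypot(gx, gy).mean())
    return np.concatenate([hue_hist, [edge_strength]])
```

Because only the pixels inside the region are analyzed, background objects outside the region do not influence the resulting feature values.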

さて、図１および図２を用いて説明してきたように、実施形態にかかる情報処理装置１００は、処理対象の画像においてクラス分類のために有用な有用領域を特定し、処理対象の画像のうち、有用領域に含まれる画像から低次特徴（低次特徴量）を抽出する。例えば、情報処理装置１００は、上述したように、ニューラルネットワーク（例えば、ＣＮＮ）における情報から、所定のクラスに分類されるオブジェクトを適切に認識するために用いる情報を生成する。図１および図２の例では、情報処理装置１００は、ＣＮＮの中間層における中間画像群ＭＧ１０から、オブジェクトＯＢ１１がＴシャツに分類されることに貢献した中間画像を抽出する。そして、情報処理装置１００は、抽出した中間画像を合成することにより、合成画像を生成する。すなわち、情報処理装置１００は、オブジェクトＯＢ１１がＴシャツに分類されることに貢献した中間画像のみを用いて合成画像を生成する。したがって、情報処理装置１００は、商品画像ＩＭ１１からオブジェクトＯＢ１１の特徴領域を精度よく示す合成画像を生成することができるため、オブジェクトＯＢ１１以外にも様々なオブジェクトや背景を含む商品画像ＩＭ１１からオブジェクトＯＢ１１のみを精度よく抽出することができる。この結果、情報処理装置１００は、クラス「Ｔシャツ」という分類結果に対応する対象物、すなわちオブジェクトＯＢ１１に応じた低次特徴を高精度に抽出することができる。 As described with reference to FIGS. 1 and 2, the information processing apparatus 100 according to the embodiment identifies, in an image to be processed, a useful region that is useful for class classification, and extracts low-order features (low-order feature amounts) from the part of the image to be processed that is included in the useful region. For example, as described above, the information processing apparatus 100 generates, from information in a neural network (for example, a CNN), information used for appropriately recognizing an object classified into a predetermined class. In the example of FIGS. 1 and 2, the information processing apparatus 100 extracts, from the intermediate image group MG10 in the intermediate layer of the CNN, the intermediate images that contributed to the classification of the object OB11 as a T-shirt. Then, the information processing apparatus 100 generates a composite image by combining the extracted intermediate images. That is, the information processing apparatus 100 generates the composite image using only the intermediate images that contributed to the classification of the object OB11 as a T-shirt. Therefore, the information processing apparatus 100 can generate, from the product image IM11, a composite image that accurately indicates the characteristic region of the object OB11, and can thus accurately extract only the object OB11 from the product image IM11, which contains various objects and a background in addition to the object OB11. As a result, the information processing apparatus 100 can extract, with high accuracy, the low-order features of the target corresponding to the classification result "T-shirt", that is, the object OB11.

また、実施形態にかかる情報処理装置１００は、上記のように抽出した低次特徴を図で説明した類似画像検索で用いることができるため、例えばユーザから受け付けた検索クエリ画像の特徴に対してより近しい特徴を有する商品画像を精度良く検索することができる。この結果、情報処理装置１００は、類似画像検索におけるユーザ満足度を高めることができる。 Further, since the information processing apparatus 100 according to the embodiment can use the low-order features extracted as described above in the similar image search described with reference to the drawings, it can, for example, accurately search for product images whose features are closer to the features of a search query image received from a user. As a result, the information processing apparatus 100 can improve user satisfaction in similar image search.

また、実施形態にかかる情報処理装置１００は、図１および図２で説明した情報処理によれば、コストのかかるモデルを用いることなく、既存の学習器を活用しながらも低次特徴を高精度に抽出することができるため、コストを抑えつつも低次特徴を高精度に抽出することができる。 Further, according to the information processing described with reference to FIGS. 1 and 2, the information processing apparatus 100 according to the embodiment can obtain low-order features with high accuracy while utilizing existing learning devices without using costly models. Therefore, low-order features can be extracted with high accuracy while suppressing cost.

〔３．情報処理装置の構成〕
次に、図４を用いて、実施形態にかかる情報処理装置１００について説明する。図４は、実施形態にかかる情報処理装置１００の構成例を示す図である。図４に示すように、情報処理装置１００は、通信部１１０と、記憶部１２０と、制御部１３０とを有する。例えば、情報処理装置１００は、図１および図２で説明した情報処理を行うサーバ装置である。 [3. Configuration of Information Processing Device]
Next, the information processing apparatus 100 according to the embodiment will be described with reference to FIG. 4. FIG. 4 is a diagram illustrating a configuration example of the information processing apparatus 100 according to the embodiment. As shown in FIG. 4, the information processing apparatus 100 has a communication unit 110, a storage unit 120, and a control unit 130. For example, the information processing apparatus 100 is a server device that performs the information processing described with reference to FIGS. 1 and 2.

（通信部１１０について）
通信部１１０は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。そして、通信部１１０は、ネットワークＮと有線または無線で接続され、例えば、端末装置１０や提供者装置２０との間で情報の送受信を行う。 (Regarding communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card) or the like. The communication unit 110 is connected to the network N by wire or wirelessly, and transmits and receives information to and from the terminal device 10 and the provider device 20, for example.

（記憶部１２０について）
記憶部１２０は、例えば、ＲＡＭ（Random Access Memory)、フラッシュメモリ等の半導体メモリ素子またはハードディスク、光ディスク等の記憶装置によって実現される。記憶部１２０は、画像情報記憶部１２１と、特徴情報記憶部１２２とを有する。 (Regarding storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 120 has an image information storage unit 121 and a feature information storage unit 122.

（画像情報記憶部１２１について）
画像情報記憶部１２１は、画像に関する各種情報を記憶する。図１および図２の例では、画像情報記憶部１２１は、提供者より入稿された商品画像に関する情報を記憶する。ここで、図５に実施形態にかかる画像情報記憶部１２１の一例を示す。図５の例では、画像情報記憶部１２１は、「画像ＩＤ」、「画像データ」といった項目を有する。 (Regarding the image information storage unit 121)
The image information storage unit 121 stores various information about images. In the examples of FIGS. 1 and 2, the image information storage unit 121 stores information about product images submitted by the provider. Here, FIG. 5 shows an example of the image information storage unit 121 according to the embodiment. In the example of FIG. 5, the image information storage unit 121 has items such as "image ID" and "image data".

「画像ＩＤ」は、提供者より入稿された商品画像を識別する識別情報を示す。「画像データ」は、「画像ＩＤ」によって識別される商品画像そのもの、つまり画像情報を示す。 "Image ID" indicates identification information for identifying the product image submitted by the provider. "Image data" indicates the product image itself identified by the "image ID", that is, image information.

すなわち、図５の例では、情報処理装置１００が、画像ＩＤ「ＩＭ１１」によって識別される商品画像（商品画像ＩＭ１１）の入稿を受け付けている例を示す。なお、図５では不図示であるが、提供者を識別する提供者ＩＤがさらに紐付けられてもよい。 That is, the example of FIG. 5 shows an example in which the information processing apparatus 100 receives submission of a product image (product image IM11) identified by the image ID “IM11”. Although not shown in FIG. 5, a provider ID that identifies the provider may be further linked.

（特徴情報記憶部１２２について）
特徴情報記憶部１２２は、低次特徴に関する情報を記憶する。ここで、図６に実施形態にかかる特徴情報記憶部１２２の一例を示す。図６の例では、特徴情報記憶部１２２は、「画像ＩＤ」、「領域情報」、「低次特徴」といった項目を有する。 (Regarding the feature information storage unit 122)
The feature information storage unit 122 stores information on low-order features. Here, FIG. 6 shows an example of the feature information storage unit 122 according to the embodiment. In the example of FIG. 6, the feature information storage unit 122 has items such as "image ID", "area information", and "lower-level feature".

「画像ＩＤ」は、図５の例と同様に、提供者より入稿された商品画像を識別する識別情報を示す。「領域情報」は、クラス分類のために有用な有用領域を示す情報である。「領域情報」は、例えば、「画像ＩＤ」によって識別される商品画像（元画像）であって、有用領域が示された商品画像として記憶されてもよい。「低次特徴」は、「領域情報」が示す有用領域によって囲まれるオブジェクトから抽出された低次特徴を示す。 "Image ID" indicates identification information for identifying the product image submitted by the provider, as in the example of FIG. "Region information" is information indicating a useful region useful for class classification. The “area information” may be, for example, a product image (original image) identified by the “image ID” and stored as a product image indicating the useful area. "Low-order features" indicate low-order features extracted from the object surrounded by the useful region indicated by the "region information".

すなわち、図６の例では、画像ＩＤ「ＩＭ１１」によって識別される商品画像（商品画像ＩＭ１１）では、領域Ｒ１１が有用領域として特定されたことにより、領域Ｒ１１内のオブジェクトの低次特徴として低次特徴ＦＡ１１が抽出された例を示す。なお、係る例は、図２で示した例に対応する。また、図５では不図示であるが、有用領域で抽出されるオブジェクトを示す情報がさらに紐付けられてもよい。 That is, the example of FIG. 6 shows a case in which, in the product image identified by the image ID "IM11" (product image IM11), the region R11 has been identified as the useful region and the low-order feature FA11 has consequently been extracted as the low-order feature of the object in the region R11. Note that this example corresponds to the example shown in FIG. 2. Further, although not shown in FIG. 5, information indicating the object extracted in the useful region may be further linked.
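
The items held by the feature information storage unit 122 can be pictured as a simple keyed record. The sketch below is only illustrative: the specification says that region information may be stored as a product image with the useful region marked, whereas this example assumes a bounding box, and the class names `FeatureRecord` and `FeatureStore` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class FeatureRecord:
    image_id: str                       # "image ID", e.g. "IM11"
    region: Tuple[int, int, int, int]   # "region information" as a box (assumed form)
    low_level_features: List[float]     # "low-order features", e.g. FA11 as numbers

class FeatureStore:
    """In-memory stand-in for the feature information storage unit."""

    def __init__(self) -> None:
        self._records: Dict[str, FeatureRecord] = {}

    def put(self, record: FeatureRecord) -> None:
        self._records[record.image_id] = record

    def get(self, image_id: str) -> Optional[FeatureRecord]:
        return self._records.get(image_id)
```

Keying the records by image ID keeps them aligned with the image information storage unit 121, which uses the same identifier.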

（制御部１３０について）
図４に戻り、制御部１３０は、ＣＰＵやＭＰＵ等によって、情報処理装置１００内部の記憶装置に記憶されている各種プログラムがＲＡＭを作業領域として実行されることにより実現される。また、制御部１３０は、例えば、ＡＳＩＣやＦＰＧＡ等の集積回路により実現される。 (Regarding the control unit 130)
Returning to FIG. 4, the control unit 130 is realized by executing various programs stored in the storage device inside the information processing apparatus 100 using the RAM as a work area by the CPU, MPU, or the like. Also, the control unit 130 is realized by an integrated circuit such as an ASIC or FPGA, for example.

図３に示すように、制御部１３０は、取得部１３１と、入力部１３２と、抽出部１３３と、生成部１３４と、特定部１３５と、特徴抽出部１３６と、検索処理部１３７と、提示部１３８とを有し、以下に説明する情報処理の機能や作用を実現または実行する。なお、制御部１３０の内部構成は、図３に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。また、制御部１３０が有する各処理部の接続関係は、図３に示した接続関係に限られず、他の接続関係であってもよい。 As shown in FIG. 3, the control unit 130 includes an acquisition unit 131, an input unit 132, an extraction unit 133, a generation unit 134, an identification unit 135, a feature extraction unit 136, a search processing unit 137, and a presentation unit 138, and implements or executes the information processing functions and operations described below. Note that the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3 and may be any other configuration that performs the information processing described later. Likewise, the connection relationship between the processing units of the control unit 130 is not limited to the connection relationship shown in FIG. 3 and may be another connection relationship.

（取得部１３１について）
取得部１３１は、画像を取得する。例えば、取得部１３１は、提供者から商品画像を取得する。すなわち、取得部１３１は、提供者から商品画像の入稿を受け付ける。図１および図２の例では、取得部１３１は、提供者Ｔ１の提供者装置２０から商品画像ＩＭ１１を取得する。すなわち、取得部１３１は、提供者Ｔ１の提供者装置２０から商品画像ＩＭ１１の入稿を受け付ける。また、取得部１３１は、取得した商品画像を画像情報記憶部１２１に格納する。 (Regarding the acquisition unit 131)
The acquisition unit 131 acquires an image. For example, the acquisition unit 131 acquires product images from the provider. That is, the acquisition unit 131 receives a product image submission from the provider. In the example of FIGS. 1 and 2, the acquisition unit 131 acquires the product image IM11 from the provider device 20 of the provider T1. That is, the acquisition unit 131 receives the submission of the product image IM11 from the provider device 20 of the provider T1. The acquisition unit 131 also stores the acquired product image in the image information storage unit 121.

また、取得部１３１は、各処理部によって処理が行われる際に、その処理に必要な情報を記憶部から取得し、対応する処理部に出力する処理も行うことができる。例えば、取得部１３１は、入力部１３２によって学習器に処理対象の画像が入力される際に、画像情報記憶部１２１から処理対象の画像を取得し、取得した処理対象の画像を入力部１３２に送信する。 In addition, when processing is performed by each processing unit, the acquisition unit 131 can also acquire the information necessary for that processing from the storage unit and output it to the corresponding processing unit. For example, when the input unit 132 inputs an image to be processed to the learning device, the acquisition unit 131 acquires the image to be processed from the image information storage unit 121 and sends it to the input unit 132.

（入力部１３２について）
入力部１３２は、入力画像に含まれる所定のオブジェクト対してクラス分類した分類結果を出力するニューラルネットワークであるモデルに、処理対象の画像を入力する。図２の例では、入力部１３２は、学習器ＬＥに商品画像ＩＭ１１（処理対象の画像の一例）を入力する。 (Regarding the input unit 132)
The input unit 132 inputs an image to be processed to a model, which is a neural network that outputs a classification result of classifying a predetermined object included in the input image. In the example of FIG. 2, the input unit 132 inputs the product image IM11 (an example of the image to be processed) to the learning device LE.

（抽出部１３３について）
抽出部１３３は、ニューラルネットワークの中間層における所定の情報を抽出（取得）する。例えば、抽出部１３３は、所定の情報として、中間層で得られた特徴マップであって、処理対象の画像に対する分類結果への貢献度に応じた特徴マップを抽出（取得）する。例えば、抽出部１３３は、画像中の所定のオブジェクトを認識するニューラルネットワークの中間層における中間画像群から、所定のオブジェクトの認識率向上に寄与する中間画像（特徴マップの一例）を抽出する。例えば、抽出部１３３は、処理対象の画像に含まれる複数のオブジェクトそれぞれに対してクラス分類を行うことで、複数のオブジェクトの中からクラス「Ｔシャツ」に分類されるオブジェクトを識別（認識）するニューラルネットワークの中間層における中間画像群から、Ｔシャツの認識率向上に寄与する中間画像を抽出する。言い換えると、抽出部１３３は、中間画像群からクラス「Ｔシャツ」に属すると判断するきっかとになった貢献度が所定値より高い中間画像を抽出する。 (Regarding the extraction unit 133)
The extraction unit 133 extracts (obtains) predetermined information in the intermediate layer of the neural network. For example, the extraction unit 133 extracts (obtains), as the predetermined information, a feature map obtained in the intermediate layer and corresponding to the degree of contribution to the classification result for the image to be processed. For example, the extraction unit 133 extracts an intermediate image (an example of a feature map) that contributes to improving the recognition rate of a given object from a group of intermediate images in the intermediate layer of the neural network that recognizes the given object in the image. For example, the extraction unit 133 classifies each of a plurality of objects included in the image to be processed, thereby identifying (recognizing) an object classified into the class “T-shirt” from among the plurality of objects. An intermediate image that contributes to an improvement in the T-shirt recognition rate is extracted from the group of intermediate images in the intermediate layer of the neural network. In other words, the extraction unit 133 extracts from the group of intermediate images an intermediate image whose degree of contribution is higher than a predetermined value and which has caused the determination that the image belongs to the class “T-shirt”.

また、例えば、抽出部１３３は、畳み込み処理及びプーリング処理を行うニューラルネットワークの中間層における中間画像群から、所定のオブジェクトの認識率向上に寄与する中間画像を抽出する。例えば、抽出部１３３は、ＣＮＮの中間層における中間画像群から、所定のオブジェクトの認識率向上に寄与する中間画像を抽出する。図１では、抽出部１３３は、ＣＮＮの中間層における中間画像群ＭＧ１０から、Ｔシャツの認識率向上に寄与する中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を抽出する。 Also, for example, the extraction unit 133 extracts an intermediate image that contributes to an improvement in the recognition rate of a predetermined object from an intermediate image group in an intermediate layer of a neural network that performs convolution processing and pooling processing. For example, the extraction unit 133 extracts an intermediate image that contributes to an improvement in the recognition rate of a predetermined object from intermediate images in the CNN intermediate layer. In FIG. 1, the extraction unit 133 extracts intermediate images MM12, MM14, MM17, and MM18 that contribute to improving the T-shirt recognition rate from the intermediate image group MG10 in the intermediate layer of the CNN.

（生成部１３４について）
生成部１３４は、抽出部１３３により抽出された中間画像を合成した合成画像を生成する。図２では、生成部１３４は、中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を合成することにより、合成画像ＣＭ１１を生成する。具体的には、生成部１３４は、中間層Ｂから取得された中間画像ＭＭ１１～ＭＭ１９のうち、中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８をプーリングする。そして、生成部１３４は、プーリングした中間画像ＭＭ１２、ＭＭ１４、ＭＭ１７、ＭＭ１８を、オブジェクトＯＢ１１（所定の対象）に対して分類した分類結果であるクラス「Ｔシャツ」にマッピングすることで、合成画像ＣＭ１１を生成する。 (Regarding the generation unit 134)
The generation unit 134 generates a composite image by combining the intermediate images extracted by the extraction unit 133. In FIG. 2, the generation unit 134 generates the composite image CM11 by combining the intermediate images MM12, MM14, MM17, and MM18. Specifically, the generation unit 134 pools the intermediate images MM12, MM14, MM17, and MM18 out of the intermediate images MM11 to MM19 acquired from the intermediate layer B. Then, the generation unit 134 generates the composite image CM11 by mapping the pooled intermediate images MM12, MM14, MM17, and MM18 to the class "T-shirt", which is the classification result of classifying the object OB11 (the predetermined target).

（特定部１３５について）
特定部１３５は、処理対象の画像においてクラス分類のために有用な有用領域を特定する。例えば、特定部１３５は、中間層で得られた特徴マップであって、処理対象の画像に対する分類結果への貢献度に応じた特徴マップ基づいて、処理対象の画像において分類結果への貢献度が所定値より高い領域を判定（特定）する。そして、特定部１３５は、処理対象の画像において分類結果への貢献度が所定値より高いと判定された領域を有用領域として特定する。 (Regarding the identification unit 135)
The identification unit 135 identifies, in the image to be processed, a useful region that is useful for class classification. For example, based on a feature map that was obtained in the intermediate layer and that corresponds to the degree of contribution to the classification result for the image to be processed, the identification unit 135 determines (identifies) a region of the image to be processed whose degree of contribution to the classification result is higher than a predetermined value. Then, the identification unit 135 identifies, as the useful region, the region of the image to be processed whose degree of contribution to the classification result has been determined to be higher than the predetermined value.

例えば、特定部１３５は、中間層で得られた特徴マップをプーリングした状態で、処理対象の画像に対する分類結果にマッピングし、マッピングした後の特徴マップと、処理対象の画像とを比較した比較結果に基づいて、処理対象の画像において分類結果への貢献度が所定値より高いと判定された領域を有用領域として特定する。一例を示すと、特定部１３５は、処理対象の画像として、特徴マップに基づきグリッド状に分割された処理対象の画像において、グリッド毎に算出された貢献度に基づき処理対象の画像において貢献度が所定値より高いと判定された領域を有用領域として特定する。 For example, the identification unit 135 pools the feature maps obtained in the intermediate layer, maps them to the classification result for the image to be processed, and, based on the result of comparing the mapped feature map with the image to be processed, identifies as the useful region the region of the image to be processed whose degree of contribution to the classification result is determined to be higher than a predetermined value. As one example, in the image to be processed divided into a grid based on the feature map, the identification unit 135 identifies as the useful region the region whose degree of contribution is determined to be higher than the predetermined value, based on the degree of contribution calculated for each grid cell.

図２の例では、特定部１３５は、合成画像ＣＭ１１を商品画像ＩＭ１１と同一のサイズにまで拡大し、拡大した合成画像と、商品画像ＩＭ１１とを比較することにより、オブジェクトＯＢ１１を含む領域を特定する。そして、特定部１３５は、特定した領域を有用領域と定める。ここで、合成画像ＣＭ１１は、クラス「Ｔシャツ」と判定された確率が所定値より高いグリッドで構成される領域に対応する中間画像である。このため、特定部１３５は、合成画像ＣＭ１１を商品画像ＩＭ１１に戻すことで、グリッド状に分割された商品画像ＩＭ１１を得たうえで、このグリッド毎に算出された確率（貢献度）に基づいて、確率が所定値より高い領域を判定（特定）する。そして、特徴抽出部１３６は、確率が所定値より高いと判定した領域を有用領域と定める。図２の例では、特徴抽出部１３６は、領域Ｒ１１を有用領域として特定する。 In the example of FIG. 2, the identification unit 135 enlarges the composite image CM11 to the same size as the product image IM11 and compares the enlarged composite image with the product image IM11 to identify the region including the object OB11. Then, the identification unit 135 defines the identified region as the useful region. Here, the composite image CM11 is an intermediate image corresponding to a region composed of grid cells whose probability of being determined to be the class "T-shirt" is higher than a predetermined value. Therefore, the identification unit 135 maps the composite image CM11 back onto the product image IM11 to obtain the product image IM11 divided into a grid, and then, based on the probability (degree of contribution) calculated for each grid cell, determines (identifies) the region whose probability is higher than the predetermined value. Then, the feature extraction unit 136 defines the region determined to have a probability higher than the predetermined value as the useful region. In the example of FIG. 2, the feature extraction unit 136 identifies the region R11 as the useful region.

（特徴抽出部１３６について）
特徴抽出部１３６は、処理対象の画像のうち、特定部１３５により特定された有用領域に含まれる画像から低次特徴を抽出する。例えば、特徴抽出部１３６は、商品画像ＩＭ１１のうち領域Ｒ１１で抽出される部分の画像を解析することにより、領域Ｒ１１で抽出されるオブジェクトの低次特徴を抽出する。また、特徴抽出部１３６は、抽出した低次特徴を特徴情報記憶部１２２に格納する。 (Regarding the feature extraction unit 136)
The feature extraction unit 136 extracts low-level features from the image included in the useful region identified by the identification unit 135, among the images to be processed. For example, the feature extraction unit 136 extracts low-order features of the object extracted in the region R11 by analyzing the image of the portion extracted in the region R11 of the product image IM11. The feature extraction unit 136 also stores the extracted low-order features in the feature information storage unit 122 .

（検索処理部１３７について）
検索処理部１３７は、検索クエリに対応する画像を検索する検索処理を行う。例えば、検索処理部１３７は、検索クエリ画像に関連（類似または一致）する類似画像を検索する類似画像検索を行う。より具体的には、検索処理部１３７は、検索クエリ画像に含まれるオブジェクトであって、例えばユーザにより指定されたオブジェクトである指定オブジェクトに類似するオブジェクトが取引対象として写された商品画像を検索する。例えば、検索処理部１３７は、指定オブジェクトと、各商品画像から抽出された低次特徴とのマッチングを行うことで、指定オブジェクトに類似するオブジェクトを取引対象とする商品画像を検索する。 (Regarding the search processing unit 137)
The search processing unit 137 performs search processing for searching for images corresponding to a search query. For example, the search processing unit 137 performs a similar image search that searches for similar images related to (similar to or matching) a search query image. More specifically, the search processing unit 137 searches for product images in which an object similar to a designated object, that is, an object included in the search query image and designated by, for example, the user, appears as the transaction target. For example, the search processing unit 137 searches for product images whose transaction target is an object similar to the designated object by matching the designated object against the low-order features extracted from each product image.
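
The specification speaks only of "matching" the designated object against stored low-order features. As one possible realization, the sketch below ranks stored images by cosine similarity between feature vectors. The similarity measure, the function name `search_similar`, and the image IDs other than IM11 are assumptions made for illustration.

```python
import numpy as np

def search_similar(query, catalog, top_k=3):
    """Return up to top_k image IDs ranked by cosine similarity to the query.

    query:   feature vector of the designated object.
    catalog: dict mapping image ID -> stored feature vector of equal length.
    """
    q = np.asarray(query, dtype=float)
    scored = []
    for image_id, feat in catalog.items():
        f = np.asarray(feat, dtype=float)
        denom = np.linalg.norm(q) * np.linalg.norm(f)
        score = float(q @ f / denom) if denom > 0 else 0.0
        scored.append((score, image_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]

catalog = {
    "IM11": [1.0, 0.0, 0.0],
    "IM12": [0.0, 1.0, 0.0],   # hypothetical ID
    "IM13": [0.9, 0.1, 0.0],   # hypothetical ID
}
print(search_similar([1.0, 0.0, 0.0], catalog, top_k=2))  # prints ['IM11', 'IM13']
```

A linear scan is adequate for a sketch; a production system with many product images would more likely rely on an approximate nearest-neighbor index.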

（提示部１３８について）
提示部１３８は、アクセス元のユーザに対して、所定の情報を提示する。例えば、提示部１３８は、検索処理部１３７による検索処理で得られた検索結果を、検索クエリ送信元のユーザに提示（送信）する。例えば、提示部１３８は、検索処理部１３７による検索処理で得られた検索結果としての商品画像を、検索クエリ画像を送信したユーザに提示（送信）する。 (Regarding presentation unit 138)
The presentation unit 138 presents predetermined information to the access source user. For example, the presentation unit 138 presents (transmits) search results obtained by the search processing performed by the search processing unit 137 to the user who sent the search query. For example, the presentation unit 138 presents (transmits) a product image as a search result obtained by the search processing by the search processing unit 137 to the user who sent the search query image.

〔４．処理手順〕
次に、図７を用いて、実施形態にかかる情報処理の手順について説明する。図７は、実施形態にかかる学習処理手順を示すフローチャートである。 [4. Processing procedure]
Next, the procedure of information processing according to the embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating a learning processing procedure according to the embodiment.

まず、取得部１３１は、処理対象の画像を取得する（ステップＳ２０１）。例えば、取得部１３１は、提供者により入稿された商品画像を取得する。次に、入力部１３２は、取得部１３１により取得された処理対象の画像を学習器に入力する（ステップＳ２０２）。例えば、入力部１３２は、入力画像に含まれる所定のオブジェクト対してクラス分類した分類結果を出力するニューラルネットワークであるモデルに処理対象の画像を入力する。 First, the acquisition unit 131 acquires an image to be processed (step S201). For example, the acquisition unit 131 acquires a product image submitted by a provider. Next, the input unit 132 inputs the processing target image acquired by the acquisition unit 131 to the learning device (step S202). For example, the input unit 132 inputs the image to be processed to a model, which is a neural network that outputs a result of classifying a predetermined object included in the input image.

次に、抽出部１３３は、ニューラルネットワークの中間層における中間画像群を取得する（ステップＳ２０３）。そして、抽出部１３３は、ステップＳ２０３で取得した中間画像群から、所定のオブジェクト（例えば、取引対象のオブジェクト）の認識率向上に寄与する中間画像を抽出する（ステップＳ２０４）。例えば、抽出部１３３は、認識率（貢献度）向上に寄与する中間画像として、中間層で得られた特徴マップであって、処理対象の画像に対する分類結果への貢献度に応じた特徴マップを抽出（取得）する。 Next, the extraction unit 133 acquires an intermediate image group in the intermediate layer of the neural network (step S203). Then, the extraction unit 133 extracts an intermediate image that contributes to an improvement in the recognition rate of a predetermined object (for example, an object to be traded) from the group of intermediate images acquired in step S203 (step S204). For example, the extraction unit 133 extracts, as an intermediate image that contributes to an improvement in the recognition rate (contribution), a feature map obtained in an intermediate layer that corresponds to the contribution to the classification result for the image to be processed. Extract (obtain).

次に、生成部１３４は、抽出部１３３により抽出された中間画像を合成した合成画像を生成する（ステップＳ２０５）。例えば、生成部１３４は、ステップＳ２０５で抽出した特徴マップをプーリングする。そして、生成部１３４は、プーリングした特徴マップを所定のオブジェクトに対して分類した分類結果にマッピングすることで合成画像を生成する。 Next, the generation unit 134 generates a composite image by combining the intermediate images extracted by the extraction unit 133 (step S205). For example, the generator 134 pools the feature maps extracted in step S205. Then, the generation unit 134 generates a composite image by mapping the pooled feature map to the classification result of classifying a predetermined object.

次に、特定部１３５は、処理対象の画像においてクラス分類のために有用な有用領域を特定する（ステップＳ２０６）。例えば、特定部１３５は、ステップＳ２０６で生成された合成画像と、処理対象の画像とを比較することにより、所定のオブジェクトを含む領域を特定し、特定した領域を有用領域と定める。合成画像は、所定のオブジェクトが分類結果によって示されるクラスに属すると判定された確率が所定値より高いグリッドで構成される領域に対応する中間画像（特徴マップ）である。このため、特定部１３５は、合成画像を処理対象の画像に戻すことで、グリッド状に分割された処理対象の商品画像を得たうえで、このグリッド毎に算出された確率（貢献度）に基づいて、確率が所定値より高いと判定された領域を有用領域と定める。 Next, the specifying unit 135 specifies a useful region useful for class classification in the image to be processed (step S206). For example, the identification unit 135 identifies an area including a predetermined object by comparing the synthesized image generated in step S206 and the image to be processed, and defines the identified area as the useful area. The synthesized image is an intermediate image (feature map) corresponding to an area formed by grids in which the probability that a given object has been determined to belong to the class indicated by the classification result is higher than a given value. For this reason, the identifying unit 135 obtains the product image to be processed divided into grids by returning the composite image to the image to be processed, and then the probability (contribution) calculated for each grid. Based on this, a region determined to have a probability higher than a predetermined value is defined as a useful region.

そして、特徴抽出部１３６は、処理対象の画像のうち、特定部１３５により特定された有用領域に含まれる画像から低次特徴を抽出する。有用領域では、所定のオブジェクトが他のオブジェクトや背景等から精度よく分離されているため、特徴抽出部１３６は、実質、所定のオブジェクトのみを含む画像から、所定のオブジェクトを示す低次特徴を抽出する。 Then, the feature extraction unit 136 extracts low-order features from the part of the image to be processed that is included in the useful region identified by the identification unit 135. In the useful region, the predetermined object is accurately separated from other objects, the background, and the like, so the feature extraction unit 136 extracts the low-order features indicating the predetermined object from an image that substantially contains only the predetermined object.

[5. Modifications]
The information processing apparatus 100 according to the above embodiment may be implemented in various forms other than the above embodiment. Other embodiments of the information processing apparatus 100 are therefore described below.

[5-1. Feature vectors]

The above embodiment showed an example in which the feature extraction unit 136 extracts low-level features from the image included in the useful region identified by the identification unit 135. However, the feature extraction unit 136 may instead extract low-level features for each grid cell, from the image corresponding to that cell, in the product image to be processed that has been divided into a grid. The feature extraction unit 136 then converts the feature values of the extracted low-level features into feature vectors. The feature extraction unit 136 also performs control so that the converted feature vectors become the information used in a similar-image search that uses the image to be processed.

For example, each grid cell contains a part of the predetermined object (for example, the object being traded). The feature extraction unit 136 therefore extracts low-level features for each part of the predetermined object and converts each extracted low-level feature into a feature vector. Then, for example, the feature extraction unit 136 stores the image ID identifying the image to be processed, together with the feature vectors obtained from that image, in the feature information storage unit 122, thereby performing control so that these feature vectors are used for the similar-image search in place of the low-level features.

In this way, the information processing apparatus 100 according to the embodiment extracts low-level features from each grid image obtained by dividing the product image to be processed into a grid, and converts each extracted low-level feature into a feature vector. The information processing apparatus 100 also performs control so that each feature vector is used for targeting in the similar-image search. As a result, the information processing apparatus 100 can, for example, accurately retrieve images in which the object shown as the transaction target is closer to the object specified in the search query image.
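
A similar-image search over per-grid feature vectors could look roughly like the sketch below. It assumes that the stored data is a dictionary from image ID to a non-empty list of per-grid feature vectors, that similarity is measured by cosine similarity, and that an image is scored by its best-matching pair of vectors. None of these choices is stated in the embodiment, and `cosine_similarity` and `search` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vectors, index, top_k=3):
    """Rank stored images by their best per-grid match to the query.

    index: dict mapping image ID -> non-empty list of per-grid feature vectors.
    The best-pair scoring rule is an assumption made for this sketch.
    """
    scored = []
    for image_id, vectors in index.items():
        best = max(cosine_similarity(q, v)
                   for q in query_vectors for v in vectors)
        scored.append((best, image_id))
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]
```

For instance, with `index = {"IMG1": [[1, 0], [0, 1]], "IMG2": [[1, 1]]}` and the query `[[1, 0]]`, the image `IMG1` ranks first because one of its grid vectors matches the query exactly.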

[5-2. Processing for increasing resolution]
The identification unit 135 may also identify a useful region useful for class classification in each of a plurality of consecutive images to be processed (for example, a plurality of identical images to be processed). The feature extraction unit 136 then extracts low-level features from a single superimposed image obtained by superimposing the images that are included in the useful regions identified by the identification unit 135 and that correspond to the respective consecutive images to be processed. By superimposing these images, the information processing apparatus 100 according to the embodiment can obtain an image with higher resolution. As a result, the information processing apparatus 100 can extract the low-level features corresponding to the target object more accurately.
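
The embodiment does not describe how the images are superimposed. The sketch below shows only the simplest possibility, a pixel-wise average of equally sized, already aligned crops of the useful region; this is an assumption, `superimpose` is a hypothetical name, and any alignment or resolution-enhancing step is left out.

```python
def superimpose(crops):
    """Pixel-wise mean of equally sized 2-D crops (assumed to be pre-aligned)."""
    if not crops:
        raise ValueError("at least one crop is required")
    rows, cols = len(crops[0]), len(crops[0][0])
    for crop in crops:
        if len(crop) != rows or any(len(row) != cols for row in crop):
            raise ValueError("all crops must have the same shape")
    return [[sum(crop[r][c] for crop in crops) / len(crops)
             for c in range(cols)]
            for r in range(rows)]
```

The averaged image can then be passed to the same feature extraction that is applied to a single useful region.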

[6. Hardware configuration]
The information processing apparatus 100 according to the above embodiment is implemented by, for example, a computer 1000 configured as shown in FIG. 8. FIG. 8 is a hardware configuration diagram showing an example of the computer 1000 that implements the functions of the information processing apparatus 100. The computer 1000 has a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores the programs executed by the CPU 1100, the data used by those programs, and the like. The communication interface 1500 receives data from other devices via the communication network 50 and passes it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the communication network 50.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600. The CPU 1100 also outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing apparatus 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the control unit 130 by executing the programs loaded onto the RAM 1200. The data in the storage unit 120 is stored in the HDD 1400. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them, but as another example, it may acquire these programs from another device via the communication network 50.

[7. Others]
Each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. That is, the specific form in which the devices are distributed or integrated is not limited to the illustrated one, and all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

Several embodiments of the present application have been described in detail above with reference to the drawings, but these are examples. The present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

The term "unit" (section, module, unit) used above can be read as "means", "circuit", or the like. For example, the acquisition unit can be read as acquisition means or an acquisition circuit.

1 Information processing system
10 Terminal device
20 Provider device
100 Information processing apparatus
120 Storage unit
121 Image information storage unit
122 Feature information storage unit
130 Control unit
131 Acquisition unit
132 Input unit
133 Extraction unit
134 Generation unit
135 Identification unit
136 Feature extraction unit
137 Search processing unit
138 Presentation unit

Claims

1. An information processing apparatus comprising:
an acquisition unit that acquires, from a learner trained as a neural network for recognizing a predetermined object included in an image to be processed, an intermediate image that contributes to improving the recognition rate of the predetermined object;
an identification unit that identifies, as a useful region, a region of the image to be processed whose degree of contribution to the classification result is determined to be higher than a predetermined value, based on a result of comparing the intermediate image mapped to the classification result for the image to be processed with the image to be processed; and
an extraction unit that extracts low-level features from the portion of the image to be processed that is included in the useful region identified by the identification unit.

2. The information processing apparatus according to claim 1, wherein the acquisition unit acquires, as the intermediate image, a feature map corresponding to the degree of contribution to the classification result for the image to be processed.

3. The information processing apparatus according to claim 2, wherein the identification unit maps the pooled feature map to the classification result for the image to be processed, compares the mapped feature map with the image to be processed, and thereby identifies, as the useful region, a region of the image to be processed whose degree of contribution to the classification result is determined to be higher than a predetermined value.

4. The information processing apparatus according to claim 3, wherein the identification unit identifies, as the useful region, a region of the image to be processed whose degree of contribution is determined to be higher than a predetermined value, based on the degree of contribution calculated for each grid cell of the image to be processed, the image being divided into a grid based on the feature map.

5. The information processing apparatus according to claim 4, wherein the extraction unit extracts low-level features from each grid cell, and performs control such that a feature vector converted from the feature values of the extracted low-level features becomes information used in a similar-image search using the image to be processed.

6. The information processing apparatus according to any one of claims 1 to 5, wherein
the identification unit identifies, in each of a plurality of consecutive images to be processed, a useful region useful for class classification, and
the extraction unit extracts low-level features from one superimposed image obtained by superimposing the images that are included in the useful regions identified by the identification unit and that correspond to the respective consecutive images to be processed.

7. The information processing apparatus according to [...], wherein the extraction unit extracts, as the low-level features, low-level features that serve as information used in a similar-image search using the image to be processed.

8. An information processing method executed by an information processing apparatus, the method comprising:
an acquisition step of acquiring, from a learner trained as a neural network for recognizing a predetermined object included in an image to be processed, an intermediate image that contributes to improving the recognition rate of the predetermined object;
an identification step of identifying, as a useful region, a region of the image to be processed whose degree of contribution to the classification result is determined to be higher than a predetermined value, based on a result of comparing the intermediate image mapped to the classification result for the image to be processed with the image to be processed; and
an extraction step of extracting low-level features from the portion of the image to be processed that is included in the useful region identified in the identification step.

9. An information processing program for causing a computer to execute:
an acquisition procedure of acquiring, from a learner trained as a neural network for recognizing a predetermined object included in an image to be processed, an intermediate image that contributes to improving the recognition rate of the predetermined object;
an identification procedure of identifying, as a useful region, a region of the image to be processed whose degree of contribution to the classification result is determined to be higher than a predetermined value, based on a result of comparing the intermediate image mapped to the classification result for the image to be processed with the image to be processed; and
an extraction procedure of extracting low-level features from the portion of the image to be processed that is included in the useful region identified by the identification procedure.