JP6826389B2

JP6826389B2 - Estimator, estimation method, and estimation program

Info

Publication number: JP6826389B2
Application number: JP2016149641A
Authority: JP
Inventors: 智大田中; 直晃山下
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-07-29
Filing date: 2016-07-29
Publication date: 2021-02-03
Anticipated expiration: 2036-07-29
Also published as: JP2018018384A

Description

The present invention relates to an estimation device, an estimation method, and an estimation program.

Techniques for extracting image features with neural networks have conventionally been provided. For example, a technique has been provided that identifies an object contained in an image using a convolutional neural network.

Japanese Unexamined Patent Publication No. 2016-33806

However, with the above conventional technique it is difficult to appropriately estimate the manner in which an object is included in an image. For example, merely being able to determine that a certain object is included in an image does not mean that the manner in which the object is included in that image is appropriately estimated.

The present application has been made in view of the above, and its object is to provide an estimation device, an estimation method, and an estimation program that appropriately estimate the manner in which an object is included in an image.

The estimation device according to the present application includes: an acquisition unit that acquires an image; and an estimation unit that estimates the occupancy rate of a predetermined object in the image acquired by the acquisition unit, based on a learner that outputs, in response to an input image, the occupancy rate of the predetermined object in that input image, and on the image acquired by the acquisition unit.

According to one aspect of the embodiment, the manner in which an object is included in an image can be appropriately estimated.

FIG. 1 is a diagram showing an example of the search process according to the embodiment.
FIG. 2 is a diagram showing an example of the search process according to the embodiment.
FIG. 3 is a diagram showing a configuration example of the search device according to the embodiment.
FIG. 4 is a diagram showing an example of the learning information storage unit according to the embodiment.
FIG. 5 is a diagram showing an example of the image information storage unit according to the embodiment.
FIG. 6 is a diagram showing an example of the list information storage unit according to the embodiment.
FIG. 7 is a flowchart showing an example of estimating the occupancy rate of an object in an image according to the embodiment.
FIG. 8 is a flowchart showing an example of determining the ranking according to the embodiment.
FIG. 9 is a diagram showing an example of the learning process according to the embodiment.
FIG. 10 is a diagram showing an example of the learning process according to the embodiment.
FIG. 11 is a flowchart showing an example of the learning process according to the embodiment.
FIG. 12 is a hardware configuration diagram showing an example of a computer that realizes the functions of the search device.

Hereinafter, modes for carrying out the estimation device, the estimation method, and the estimation program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the estimation device, the estimation method, or the estimation program according to the present application. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate descriptions are omitted.

(Embodiment)
[1. Search process]
An example of the search process according to the embodiment will be described with reference to FIGS. 1 and 2. FIGS. 1 and 2 are diagrams showing an example of the search process according to the embodiment. Specifically, FIG. 1 shows an example of estimating the occupancy rate of an object in each image of the image group to be searched, which is described later. The search device 100 shown in FIG. 1 uses a learner (model) that outputs (estimates) the occupancy rate of an object in an image to estimate the occupancy rate of the object in each image of the image group to be searched.

FIG. 2 shows an example of providing search results according to the occupancy rate of the object in a query image when the query image is acquired. The search device 100 shown in FIG. 2 uses the learner LE, which outputs (estimates) the occupancy rate of an object in an image, to estimate the occupancy rate of the object in the query image. The search device 100 then provides search results to the user according to that occupancy rate. The query image referred to here means an image indicating the conditions specified for the search, and may include a processing request (query) for retrieving images that satisfy a predetermined condition with respect to the query image. The learner LE is generated using input images and correct answer information indicating the occupancy rates of objects in those input images; the details are described later. In the examples shown in FIGS. 1 and 2, the object whose occupancy rate is estimated is a cat, but the object is not limited to cats and may be another living thing such as a dog or a human, a plant, or any of various physical objects such as a car. The objects referred to here may include anything that can be identified, including various phenomena such as fire or ocean waves.

Here, the learner used by the search device 100 is briefly described. The learner used by the search device 100 is, for example, a learner in which a plurality of nodes, each outputting a calculation result for its input data, are connected in multiple layers, and which has learned abstracted image features through supervised learning. For example, the learner is a neural network in which layers each having a plurality of nodes are connected in multiple stages, and may be a DNN (Deep Neural Network) realized by so-called deep learning techniques. Image features here are a concept that includes not only concrete features appearing in the image, such as the presence or absence of text, colors, and composition, but also abstracted (meta-level) features, such as what the imaged object is, what kind of user would like the image, and the atmosphere of the image.

For example, the learner is generated with deep learning techniques by a learning method such as the following. The connection coefficients between the nodes of the learner are initialized, and images having various features are input. The learner is then generated by processing such as backpropagation (the error backpropagation method), which corrects the parameters (connection coefficients) so that the error between the learner's output and the input image becomes smaller. For example, the learner is generated by performing processing such as backpropagation so as to minimize a predetermined loss function, such as an error function. By repeating this processing, the learner becomes able to produce an output that better reproduces the input image, that is, to output the features of the input image.
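The parameter correction described above can be illustrated with the simplest possible case: a single node with one connection coefficient and one bias, updated by gradient descent on a squared-error loss. This is only a sketch under stated assumptions. The function names, the learning rate, and the toy data are hypothetical, and the sketch fits scalar targets rather than reproducing an input image; an actual learner applies the same kind of error-driven correction layer by layer through backpropagation.

```python
# Minimal sketch (hypothetical names and data): one linear node trained by
# gradient descent so that a squared-error loss becomes smaller.

def train(samples, lr=0.1, epochs=200):
    w, b = 0.0, 0.0  # connection coefficient and bias, initialized
    for _ in range(epochs):
        for x, target in samples:
            y = w * x + b        # forward pass
            err = y - target     # derivative of 0.5 * err**2 with respect to y
            w -= lr * err * x    # correct each parameter against its gradient
            b -= lr * err
    return w, b


def loss(samples, w, b):
    # Sum of squared errors over all samples (the loss being minimized).
    return sum(0.5 * (w * x + b - t) ** 2 for x, t in samples)


# Toy (input, target) pairs, assumed purely for illustration.
samples = [(0.0, 0.1), (0.5, 0.45), (1.0, 0.8)]
initial_loss = loss(samples, 0.0, 0.0)
w, b = train(samples)
final_loss = loss(samples, w, b)
print(initial_loss, final_loss)
```

Repeating the update shrinks the loss, which mirrors the statement that repeating the processing yields a learner whose output has a smaller error.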

The learning method for the learner is not limited to the method described above, and any known technique can be applied. The information used to train the learner may be any of various image data sets, such as images paired with information indicating the objects they contain. For example, the training information may include a set of images containing one object together with information indicating that there is one object, a set of images containing multiple (for example, two) objects together with information indicating that there are multiple (for example, two) objects, and a set of images containing no object together with information indicating that no object is included (that is, zero). Any method can be applied to how images are input to the learner, the format of the data the learner outputs, the content of the features the learner is explicitly trained on, and so on. In other words, the search device 100 can use any learner as long as it can calculate feature amounts representing abstracted features from an image.

In FIG. 1, the search device 100 is assumed to use a learner LE based on a so-called convolutional neural network (CNN), which repeats convolution and pooling over local regions of the input image. Hereinafter, a convolutional neural network may be referred to as a CNN. For example, in addition to extracting and outputting features from an image, the CNN-based learner LE has an output that is invariant to positional variations of text, imaged objects, and the like contained in the image. For this reason, the learner LE can accurately calculate the abstracted features of an image.
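As a rough illustration of why convolution followed by pooling tolerates small positional shifts, the following sketch applies one convolution and one max-pooling step to two tiny images that differ only by a one-pixel shift. The kernel, the image size, and the function names are assumptions chosen for illustration; they are not taken from the embodiment, and a real CNN stacks many such layers with learned kernels.

```python
# Minimal sketch (hypothetical kernel and images): convolution over local
# regions followed by max pooling, in plain Python.

def conv2d(img, kernel):
    # "Valid" convolution (no padding): slide the kernel over every local region.
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(fmap, size=2):
    # Keep only the largest response in each non-overlapping size x size window.
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def image_with_pixel(row, col, n=5):
    # An n x n image that is zero everywhere except one bright pixel.
    return [[1 if (r, c) == (row, col) else 0 for c in range(n)] for r in range(n)]


kernel = [[1, 1], [1, 1]]
pooled_a = max_pool(conv2d(image_with_pixel(1, 1), kernel))
pooled_b = max_pool(conv2d(image_with_pixel(0, 1), kernel))  # shifted up by one pixel
print(pooled_a, pooled_b)
```

In this toy case both images produce the same pooled map, because the shift stays within one pooling window. Larger shifts would change the output, which is one reason practical networks repeat the convolution and pooling steps.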

For example, in FIG. 1, the search device 100 uses the learner LE, a classifier (model) that outputs information on the occupancy rate of the object (cat) in an image. That is, in FIG. 1, the search device 100 uses a learner LE that has already been generated by a predetermined learning process such as the one described above. Although FIG. 1 shows the case where the search device 100 uses a learner LE that identifies objects contained in an image, the search device 100 may use any learner as long as it can estimate the occupancy rate of a predetermined object. A predetermined loss function, correct answer information, and the like are used when generating (training) the learner LE; the details are described later.

As shown in FIG. 2, the search system 1 includes a terminal device 10 and a search device 100. The terminal device 10 and the search device 100 are communicably connected, by wire or wirelessly, via a predetermined communication network (not shown). The search system 1 shown in FIG. 2 may include a plurality of terminal devices 10 and a plurality of search devices 100.

The terminal device 10 is an information processing device used by a user. The terminal device 10 is realized by, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant). The example in FIG. 2 shows the case where the terminal device 10 is a smartphone. In the following, the terminal device 10 may be referred to as the user; that is, "user" may be read as "terminal device 10".

The search device 100 is an estimation device that estimates the occupancy rate of a predetermined object in an acquired image by using the learner LE, which outputs the occupancy rate of the predetermined object in an input image. The search device 100 is also an estimation device that acquires a query image from the terminal device 10 and, based on the acquired query image, extracts from a predetermined image group the search result images that satisfy a condition on the occupancy rate of the search target.

First, the estimation of the occupancy rate of the object (cat) in each image to be searched by the search device 100 is described with reference to FIG. 1. For example, the images IM11 to IM15 and so on shown in FIG. 1 are stored in the image information storage unit 122 (FIG. 5). In the following, the images IM11 to IM15 and so on may be referred to simply as "image IM" when they are described without distinction.

For example, the search device 100 estimates the occupancy rate of the cat in an image IM by inputting the image IM into the learner LE. Specifically, the learner LE that receives the image IM outputs information indicating the occupancy rate of the cat in that image, and the search device 100 estimates the occupancy rate of the cat in the image IM based on that information. The learner LE outputs information indicating the occupancy rates of various objects in the input image IM, not only the cat. To simplify the explanation, FIGS. 1 and 2 show the occupancy rates of only two classes, cat and background, but the learner LE may also output information indicating the occupancy rates of other classes. A class here is a class in the machine learning sense, and may be a category into which the learner classifies its input. For example, the learner LE may output information indicating the occupancy rates for 20 classes. In that case, in addition to the two classes corresponding to cat and background, the learner LE may output information indicating the occupancy rates for classes such as dog, airplane, and bicycle. Which objects (classes) the learner LE outputs occupancy information for depends on the training; the details are described later.

The search device 100 may estimate the occupancy rate of the cat by inputting every image IM to be searched into the learner LE, or it may input only the images IM that are expected to contain the target cat and obtain information indicating the occupancy rate of the cat in those images. For example, among the images IM in the image information storage unit 122 (FIG. 5), the search device 100 may input only the images IM associated with the tag "cat" into the learner LE. For example, when the tag "cat" is associated with the images IM11 to IM15 and so on shown in FIG. 1, the search device 100 may input those images into the learner LE to obtain information indicating the occupancy rate of the cat in each image IM.

In the example of FIG. 1, the search device 100 inputs the image IM11 into the learner LE (step S11-1). For example, the search device 100 acquires the image IM11 from the image information storage unit 122 (FIG. 5) and inputs it into the learner LE. The search device 100 then estimates the occupancy rate of the cat in the image IM11 based on the output of the learner LE (step S12-1). For example, as shown in the estimation information OC11, the search device 100 estimates, based on the output of the learner LE, that the occupancy rate of the cat in the image IM11 is 80% (0.8) and that the occupancy rate of the background is 18% (0.18). For example, the search device 100 estimates the occupancy rate of each object (class) so that the occupancy rates total 100%. Here, the occupancy rates of the objects (classes) other than the cat and the background are estimated to total 2%. In this embodiment, occupancy rates are written as percentages (for example, 80%), but they may also be written as decimals (for example, 0.8). For example, the learner LE may output a value in the range 0 to 1 indicating the occupancy rate of each object (class).
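One common way to obtain per-class values in the range 0 to 1 that total 100% is to pass the raw class scores of the final layer through a softmax. The embodiment does not state how the normalization is done, so the use of softmax here, the function name, and the raw scores are all assumptions made for illustration.

```python
import math

# Minimal sketch (softmax is an assumption, not stated in the source):
# turn raw per-class scores into occupancy rates in [0, 1] that sum to 1.

def occupancy_from_scores(scores):
    # Subtract the maximum score first for numerical stability.
    top = max(scores.values())
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


# Hypothetical raw scores for three classes.
raw_scores = {"cat": 3.0, "background": 1.5, "dog": -1.0}
occupancy = occupancy_from_scores(raw_scores)
for cls, rate in occupancy.items():
    print(f"{cls}: {rate:.1%}")
```

Whatever normalization is used, the property relied on in the description is the same: each class receives a value between 0 and 1, and the values across all classes total 1 (100%).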

Similarly, in the example of FIG. 1, the search device 100 inputs the image IM12 into the learner LE (step S11-2). For example, the search device 100 acquires the image IM12 from the image information storage unit 122 (FIG. 5) and inputs it into the learner LE. The search device 100 then estimates the occupancy rate of the cat in the image IM12 based on the output of the learner LE (step S12-2). For example, as shown in the estimation information OC12, the search device 100 estimates that the occupancy rate of the cat in the image IM12 is 60% and that the occupancy rate of the background is 36%.

The search device 100 also inputs the image IM13 into the learner LE (step S11-3). For example, the search device 100 acquires the image IM13 from the image information storage unit 122 (FIG. 5) and inputs it into the learner LE. The search device 100 then estimates the occupancy rate of the cat in the image IM13 based on the output of the learner LE (step S12-3). For example, as shown in the estimation information OC13, the search device 100 estimates that the occupancy rate of the cat in the image IM13 is 20% and that the occupancy rate of the background is 75%.

The search device 100 also inputs the image IM14 into the learner LE (step S11-4). For example, the search device 100 acquires the image IM14 from the image information storage unit 122 (FIG. 5) and inputs it into the learner LE. The search device 100 then estimates the occupancy rate of the cat in the image IM14 based on the output of the learner LE (step S12-4). For example, as shown in the estimation information OC14, the search device 100 estimates that the occupancy rate of the cat in the image IM14 is 90% and that the occupancy rate of the background is 9%.

The search device 100 also inputs the image IM15 into the learner LE (step S11-5). For example, the search device 100 acquires the image IM15 from the image information storage unit 122 (FIG. 5) and inputs it into the learner LE. The search device 100 then estimates the occupancy rate of the cat in the image IM15 based on the output of the learner LE (step S12-5). For example, as shown in the estimation information OC15, the search device 100 estimates that the occupancy rate of the cat in the image IM15 is 70% and that the occupancy rate of the background is 27%.

The search device 100 likewise inputs the other images IM associated with the tag "cat" into the learner LE and estimates the occupancy rate of the cat in each image IM. The search device 100 then generates list information indicating the estimated occupancy rate of the cat in each image IM (step S13). In the example of FIG. 1, the search device 100 generates list information LT11 indicating the occupancy rate of the cat in the images IM11 to IM15 and so on. Although the example of FIG. 1 has the search device 100 generate the list information LT11 for the sake of explanation, the information indicating the occupancy rate of the cat in each image IM estimated in steps S12-1 to S12-5 and so on may instead be stored in the image information storage unit 122 (FIG. 5) in association with each image.
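Steps S11 to S13 can be sketched as a loop that skips images without the tag "cat", runs the learner on the rest, and collects the cat occupancy rates into a list. The record layout, the function names, and the stub standing in for the learner LE are hypothetical; the occupancy values are the ones given for FIG. 1, and the image "IM99" tagged "dog" is an invented entry used only to show the tag filter.

```python
# Minimal sketch (hypothetical data layout): build list information such as
# LT11 by running a learner over images tagged with the target.

# Stand-in for the trained learner LE: returns the FIG. 1 estimates by lookup.
FIG1_OUTPUTS = {
    "IM11": {"cat": 0.80, "background": 0.18},
    "IM12": {"cat": 0.60, "background": 0.36},
    "IM13": {"cat": 0.20, "background": 0.75},
    "IM14": {"cat": 0.90, "background": 0.09},
    "IM15": {"cat": 0.70, "background": 0.27},
}


def stub_learner(image_data):
    return FIG1_OUTPUTS[image_data]


def build_occupancy_list(images, learner, target="cat"):
    rows = []
    for image_id, record in images.items():
        if target not in record["tags"]:
            continue  # only images expected to contain the target are processed
        estimate = learner(record["data"])       # steps S11-x and S12-x
        rows.append({"image_id": image_id, target: estimate[target]})
    return rows                                   # step S13


images = {image_id: {"tags": {"cat"}, "data": image_id} for image_id in FIG1_OUTPUTS}
images["IM99"] = {"tags": {"dog"}, "data": "IM99"}  # invented non-cat image

list_info = build_occupancy_list(images, stub_learner)
for row in list_info:
    print(row)
```

Storing the estimate alongside each image record instead of returning a separate list would correspond to the alternative mentioned at the end of the paragraph above.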

Next, the provision of search results by the search device 100 when a query image is acquired is described with reference to FIG. 2. First, the search device 100 acquires the query image IM10 from the terminal device 10 used by the user U1 (step S14). In the example of FIG. 2, the search device 100 acquires from the terminal device 10 a query image IM10 that contains a cat.

The search device 100 then inputs the acquired image IM10 into the learner LE (step S15) and estimates the occupancy rate of the cat in the image IM10 based on the output of the learner LE (step S16). For example, as shown in the estimation information OC10, the search device 100 estimates that the occupancy rate of the cat in the image IM10 is 68% and that the occupancy rate of the background is 30%. In this way, in the example of FIG. 2, the search device 100 acquires from the terminal device 10 an image IM10 indicating that the search target is "cat" and that the condition on the search target is an occupancy rate of "68%". That is, by acquiring the query image (image IM10), the search device 100 acquires designation information indicating that the search target is "cat" and that the condition on the search target is an occupancy rate of "68%".

Although the example of FIG. 2 shows the search device 100 acquiring the designation information by acquiring a query image (image IM10), the search device 100 may instead acquire the designation information as text. For example, when the user U1 enters the search target "cat" and the occupancy rate "68%" on the terminal device 10, the search device 100 may acquire the entered information as the designation information. In this case, the search device 100 may provide the terminal device 10 with an input interface, such as a screen, on which the user U1 enters the designation information as text.

The search device 100 then extracts search result images for which the search target is "cat" and which satisfy the condition of an occupancy rate of "68%" (step S17). For example, from the image group included in the list information LT11, which indicates the occupancy rate of the cat, the search device 100 extracts the images IM whose cat occupancy rate is within a predetermined range of the condition "68%". In the example of FIG. 2, the search device 100 extracts the images IM whose cat occupancy rate is within ±25% of the condition "68%", that is, the images IM whose cat occupancy rate is in the range of 43% to 93%. Specifically, as shown in the extraction information LT12, the search device 100 extracts the image IM11, the image IM12, the image IM14, the image IM15, and so on. In this way, in the example of FIG. 2, the search device 100 extracts the images IM other than those that do not satisfy the condition, such as the image IM13, whose cat occupancy rate of 20% falls outside the range.
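The extraction in step S17 amounts to a range filter on the cat occupancy rate. The sketch below uses the values given for FIG. 1 and the ±25 percentage-point range of FIG. 2; the function name and the choice to treat both ends of the range as inclusive are assumptions.

```python
# Minimal sketch (hypothetical function name; inclusive bounds are assumed):
# keep images whose cat occupancy is within +/-25 points of the query's 68%.

def extract_within_range(occupancy_by_image, target_percent, tolerance=25):
    low, high = target_percent - tolerance, target_percent + tolerance
    return [image_id for image_id, percent in occupancy_by_image.items()
            if low <= percent <= high]


# Cat occupancy rates (in percent) estimated for FIG. 1.
cat_occupancy = {"IM11": 80, "IM12": 60, "IM13": 20, "IM14": 90, "IM15": 70}

extracted = extract_within_range(cat_occupancy, target_percent=68)
print(extracted)  # IM13 (20%) falls outside 43% to 93% and is dropped
```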

The search device 100 then determines the ranking of the extracted images IM (step S18). For example, as shown in the ranking information LT13, the search device 100 determines the ranking so that images IM whose cat occupancy rate differs less from the condition "68%" are ranked higher. Specifically, the search device 100 ranks the image IM15, which has the smallest error of "2 (= 70 − 68)", first; the image IM19, which has the next smallest error, second; the image IM12, which has the next smallest error after the image IM19, third; and the image IM17, which has the next smallest error after the image IM12, fourth.
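The ranking in step S18 can be sketched as a sort by absolute difference from the queried occupancy rate. The occupancy rates for the images IM11, IM12, IM14, and IM15 come from FIG. 1. The text does not give occupancy rates for the images IM17 and IM19, so the values 78% and 65% below are assumptions, chosen only so that the resulting order matches the ranking described above.

```python
# Minimal sketch: rank extracted images by |occupancy - target|, smallest first.

def rank_by_error(occupancy_by_image, target_percent):
    # sorted() is stable, so images with equal error keep their original order.
    return sorted(occupancy_by_image,
                  key=lambda image_id: abs(occupancy_by_image[image_id] - target_percent))


extracted_occupancy = {
    "IM11": 80,
    "IM12": 60,
    "IM14": 90,
    "IM15": 70,
    "IM17": 78,  # assumed value, not stated in the source
    "IM19": 65,  # assumed value, not stated in the source
}

ranking = rank_by_error(extracted_occupancy, target_percent=68)
for position, image_id in enumerate(ranking, start=1):
    error = abs(extracted_occupancy[image_id] - 68)
    print(position, image_id, error)
```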

The search device 100 then provides the user U1 with search results based on the determined ranking (step S19). In the example of FIG. 2, the search device 100 provides the user U1 with search results displayed in an order based on the ranking in which the image IM15 is first, the image IM19 is second, the image IM12 is third, and the image IM17 is fourth. For example, the search device 100 transmits the search results based on the determined ranking to the terminal device 10 used by the user U1.

As described above, by using a learner that outputs the occupancy rate of a target in an image, the search device 100 can appropriately estimate the manner in which the target is included in the image, that is, what proportion of the image the target occupies. Further, when the search device 100 acquires a query image, it can appropriately extract images corresponding to the query image by extracting images according to the occupancy rate of the target in the query image. As shown in FIG. 2, by extracting images that include the target using the occupancy rate of the target in the query image as a condition, the search device 100 can provide the user with search results that more appropriately reflect the user's intention.

In the above example, the search device 100 extracts images according to the difference in cat occupancy rate, but it may instead extract images based on a degree of similarity. For example, the search device 100 may extract images based on the distribution of the occupancy rates of the plurality of targets included in each image. For example, the search device 100 may extract images based on an index value such as the KL divergence. For example, the search device 100 may extract, as search result images, those images in a predetermined image group for which the KL divergence between the distribution of the occupancy rates of the targets included in the image and the distribution of the occupancy rates of the targets included in the query image is within a predetermined threshold. The search device 100 is not limited to the KL divergence and may extract images using any suitable index value based on the similarity of the distributions in the images.
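The KL divergence comparison can be illustrated as below. Several choices here are assumptions of this sketch and are not specified in the description: the direction KL(query || candidate), renormalizing the occupancy rates so that they sum to 1, the small smoothing constant `eps` for labels missing from one image, and the threshold value `0.02`. The occupancy rates are those stated for the query image IM10 and for images IM11 and IM12.

```python
import math


def normalize(occupancy, labels, eps=1e-9):
    """Turn occupancy rates (%) into a probability vector over labels.
    eps avoids zero probabilities for labels absent from an image."""
    raw = [occupancy.get(label, 0.0) + eps for label in labels]
    total = sum(raw)
    return [value / total for value in raw]


def kl_divergence(query, candidate, eps=1e-9):
    """KL(query || candidate) over the union of labels in both images."""
    labels = sorted(set(query) | set(candidate))
    p = normalize(query, labels, eps)
    q = normalize(candidate, labels, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


query = {"cat": 68.0, "background": 30.0}              # image IM10
candidates = {
    "IM11": {"cat": 80.0, "background": 18.0},
    "IM12": {"cat": 60.0, "background": 36.0},
}

threshold = 0.02  # hypothetical value for illustration only
results = [image_id for image_id, occ in candidates.items()
           if kl_divergence(query, occ) <= threshold]
print(results)
```

Because the KL divergence is asymmetric, swapping the arguments generally gives a different value; a symmetric index could be substituted if that matters for the application.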

In the example described above, the search device 100 uses a single learner LE, but the search device 100 may use a different learner for each target. For example, the search device 100 may use a learner for dogs separately from a learner for cats.

[2. Search device configuration]
Next, the configuration of the search device 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the search device 100 according to the embodiment. As shown in FIG. 3, the search device 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The search device 100 may also have an input unit (for example, a keyboard or a mouse) that receives various operations from an administrator of the search device 100 or the like, and a display unit (for example, a liquid crystal display) for displaying various information.

(Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to a network by wire or wirelessly and transmits and receives information to and from, for example, the terminal device 10 included in the search system 1.

(Storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 3, the storage unit 120 according to the embodiment includes a learning information storage unit 121, an image information storage unit 122, and a list information storage unit 123.

(Learning information storage unit 121)
The learning information storage unit 121 according to the embodiment stores various information related to learning. For example, in FIG. 4, the learning information storage unit 121 stores learning information (a model) relating to the learner LE generated by a predetermined learning process. FIG. 4 shows an example of the learning information storage unit 121 according to the embodiment. The learning information storage unit 121 shown in FIG. 4 stores "weights (w_ij)".

For example, in the example shown in FIG. 4, the "weight (w_11)" is "0.2" and the "weight (w_12)" is "−0.3". Likewise, the "weight (w_21)" is "0.5" and the "weight (w_22)" is "1.3".

The "weight (w_ij)" may be, for example, the synaptic connection coefficient from neuron y_i to neuron x_j in the learner LE. The learning information storage unit 121 is not limited to the above and may store various information depending on the purpose. When the search device 100 uses a different learner for each target, it may store information on a plurality of learners. For example, when the search device 100 uses a learner for dogs separately from a learner for cats, it may also store information on the learner for dogs.

(Image information storage unit 122)
The image information storage unit 122 according to the embodiment stores various information related to images. FIG. 5 shows an example of the image information storage unit 122 according to the embodiment. For example, the image information storage unit 122 stores the images to be searched. The image information storage unit 122 shown in FIG. 5 has items such as "image ID", "image", and "tag".

"Image ID" indicates identification information for identifying an image. "Image" indicates image information; specifically, it indicates an image to be searched. For the purpose of explanation, FIG. 5 depicts the image identified by each image ID, but a file path name or the like indicating the storage location of the image may be stored as the "image". "Tag" is information attached in association with an image and indicates a target included in the image.

For example, in the example shown in FIG. 5, the image identified by the image ID "IM11" is image IM11, which includes a cat, and it is stored in association with the tag "cat". The tag attached to each image may be attached by an administrator or the like who registers the image, or may be attached automatically by identifying the target using any suitable image recognition technique.

The image information storage unit 122 is not limited to the above and may store various information depending on the purpose. For example, the image information storage unit 122 may store information on the date and time at which an image was generated. It may also store information on the targets included in an image, or the acquired original image.

(List information storage unit 123)
The list information storage unit 123 according to the embodiment stores various information on the occupancy rates of targets in images and the like. For example, the list information storage unit 123 stores, for each target, various information on the occupancy rate of that target in images. FIG. 6 is a diagram showing an example of the list information storage unit according to the embodiment. In the example shown in FIG. 6, the list information storage unit 123 stores information (a table) for each included target, such as the list information LT11 and the list information LT21. For example, the list information LT11 and the list information LT21 have items such as "image ID" and "occupancy rate (%)".

"Image ID" indicates identification information for identifying an image. "Image" indicates image information. The "occupancy rate (%)" of the list information LT11 includes items such as "cat" and "background". The "occupancy rate (%)" of the list information LT21 includes items such as "dog" and "background".

For example, as shown in the list information LT11 in FIG. 6, in the image identified by the image ID "IM11" (IM11), the occupancy rate of the target "cat" is "80"% and the occupancy rate of the target "background" is "18"%. Likewise, in the image identified by the image ID "IM12" (IM12), the occupancy rate of the target "cat" is "60"% and the occupancy rate of the target "background" is "36"%.

The list information storage unit 123 is not limited to the above and may store various information depending on the purpose. For example, the list information storage unit 123 may store information on the position of the target in an image, such as whether the target is located at the center, top, bottom, right, or left of the image or spans the whole image. The list information storage unit 123 may also store the image corresponding to each image ID, a file path name indicating the storage location of the image, and the like. Further, although the example of FIG. 6 shows a case where the list information storage unit 123 stores a separate table for each target, such as the list information LT11 and the list information LT21, it may store the various information on target occupancy rates in a single table. For example, the list information storage unit 123 may store the information in a single table by including items for all targets in "occupancy rate (%)".
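The difference between per-target tables and a single combined table can be sketched with plain dictionaries. The layout and the helper `merge_tables` are assumptions for illustration. The rows of LT11 use the values stated for IM11 and IM12; the description gives no row values for LT21, so the single dog row below (`IM_DOG_EXAMPLE`) is a made-up placeholder.

```python
def merge_tables(tables):
    """Merge per-target tables of the form {image_id: {label: rate}}
    into one table whose occupancy columns cover all targets."""
    merged = {}
    for table in tables:
        for image_id, rates in table.items():
            merged.setdefault(image_id, {}).update(rates)
    return merged


# List information LT11 (cat), with values stated in the description.
lt11 = {
    "IM11": {"cat": 80, "background": 18},
    "IM12": {"cat": 60, "background": 36},
}
# List information LT21 (dog); this row is a hypothetical placeholder.
lt21 = {
    "IM_DOG_EXAMPLE": {"dog": 50, "background": 50},
}

single_table = merge_tables([lt11, lt21])
for image_id, rates in single_table.items():
    print(image_id, rates)
```

A single table makes it easier to compare occupancy distributions across several targets, while per-target tables keep each lookup small.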

(Control unit 130)
Returning to the description of FIG. 3, the control unit 130 is a controller and is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of the estimation program) stored in a storage device inside the search device 100, using the RAM as a work area. Alternatively, the control unit 130 is a controller realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 3, the control unit 130 includes an acquisition unit 131, a learning unit 132, an estimation unit 133, an extraction unit 134, a determination unit 135, and a providing unit 136, and realizes or executes the information processing functions and operations described below.

(Acquisition unit 131)
The acquisition unit 131 acquires various information. For example, the acquisition unit 131 acquires images. For example, the acquisition unit 131 acquires images from the image information storage unit 122. In FIG. 1, the acquisition unit 131 acquires images IM11 to IM15 and so on from the image information storage unit 122. The acquisition unit 131 may instead acquire images IM11 to IM15 and so on from an external information processing device.

The acquisition unit 131 also acquires information on the occupancy rate of the search target based on designation information in an image search. For example, the acquisition unit 131 acquires, as the designation information, information on the occupancy rate of the search target estimated from a query image. In FIG. 2, the acquisition unit 131 acquires the query image IM10 from the terminal device 10 used by the user U1. In FIG. 2, the acquisition unit 131 acquires, from the terminal device 10, the image IM10 indicating that the search target is "cat" and that the condition on the search target is an occupancy rate of "68%". For example, the acquisition unit 131 acquires the query image (image IM10) and, through the occupancy rate estimation by the estimation unit 133 described later, acquires designation information indicating that the search target is "cat" and that the condition on the search target is an occupancy rate of "68%". Alternatively, for example, when the user U1 enters the search target "cat" and the occupancy rate "68%" on the terminal device 10, the acquisition unit 131 may acquire the entered information as the designation information.

(Learning unit 132)
The learning unit 132 learns various information and generates various information through learning. For example, the learning unit 132 trains a learner (model); in other words, the learning unit 132 generates a learner (model) by performing learning. For example, the learning unit 132 trains the learner LE. For example, the learning unit 132 trains the learner using combinations of an image and the occupancy rate of a predetermined target in that image. The learning unit 132 also trains the learner so as to minimize a predetermined evaluation function. Details of the learning process performed by the learning unit 132 will be described later.

(Estimation unit 133)
The estimation unit 133 estimates various information. For example, the estimation unit 133 estimates the occupancy rate of a predetermined target in an image acquired by the acquisition unit 131, based on that image and a learner that outputs the occupancy rate of the predetermined target in an input image in response to the input image. For example, the estimation unit 133 makes this estimate based on a learner that is a neural network trained with predetermined data (teacher data). For example, the estimation unit 133 makes this estimate based on a learner that is a neural network trained with combinations of an image and the occupancy rate of the predetermined target in that image. For example, the estimation unit 133 makes this estimate based on a learner that is a neural network performing convolution processing and pooling processing.
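To make the phrase "a neural network that performs convolution processing and pooling processing" concrete, the following is a toy forward pass in pure Python. It is a sketch under heavy assumptions and is not the learner LE of the embodiment: the layer sizes, the random (untrained) weights, the global-average step, and the softmax output that forces the rates to sum to 100% are all choices made for illustration, and every function name is hypothetical.

```python
import math
import random


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a 2-D list with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_occupancy(image, kernels, dense_weights, labels):
    """conv -> ReLU -> max pool -> global average -> dense -> softmax (%)."""
    features = []
    for kernel in kernels:
        pooled = max_pool(relu(conv2d(image, kernel)))
        values = [v for row in pooled for v in row]
        features.append(sum(values) / len(values))
    scores = [sum(w * f for w, f in zip(row, features)) for row in dense_weights]
    return {label: 100.0 * p for label, p in zip(labels, softmax(scores))}


random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(8)]  # stand-in image
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]                                     # untrained
dense_weights = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
occupancy = estimate_occupancy(image, kernels, dense_weights, ["cat", "background"])
print(occupancy)
```

Because the weights are random, the printed rates carry no meaning; in practice the weights would come from training on pairs of images and occupancy rates.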

In FIG. 1, the estimation unit 133 estimates the cat occupancy rate in an image IM by inputting the image IM into the learner LE. Specifically, the learner LE to which the image IM is input outputs information indicating the cat occupancy rate in the input image IM, and the estimation unit 133 estimates the cat occupancy rate in the image IM based on that output. For example, the estimation unit 133 estimates the cat occupancy rate in image IM11 based on the output of the learner LE. In FIG. 1, as shown in the estimation information OC11, the estimation unit 133 estimates, based on the output of the learner LE, that the cat occupancy rate in image IM11 is 80% and that the background occupancy rate in image IM11 is 18%. In FIG. 1, the estimation unit 133 likewise estimates the cat occupancy rate in images IM12 to IM15 and so on based on the output of the learner LE. For example, the estimation unit 133 estimates the cat occupancy rate in each image IM by inputting the images IM associated with the tag "cat" into the learner LE.

In FIG. 2, the estimation unit 133 estimates the cat occupancy rate in image IM10 based on the output of the learner LE to which image IM10 is input. For example, as shown in the estimation information OC10, the estimation unit 133 estimates, based on the output of the learner LE, that the cat occupancy rate in image IM10 is 68% and that the background occupancy rate in image IM10 is 30%.

(Extraction unit 134)
The extraction unit 134 extracts various information. For example, the extraction unit 134 extracts information on images from the image information storage unit 122 and the list information storage unit 123. For example, the extraction unit 134 extracts images satisfying a condition from a predetermined image group stored in the image information storage unit 122 or the list information storage unit 123. For example, the extraction unit 134 extracts, from the predetermined image group, search result images satisfying a condition on the occupancy rate of the search target. For example, the extraction unit 134 extracts search result images based on the similarity between information on the occupancy rates of targets, including the search target, in the query image and information on the occupancy rates of targets, including the search target, in each image of the predetermined image group.

In FIG. 2, the extraction unit 134 extracts search result images for which the search target is "cat" and the occupancy rate condition "68%" is satisfied. For example, the extraction unit 134 extracts, from the image group included in the list information LT11 indicating cat occupancy rates, the images IM whose cat occupancy rate is within a predetermined range of the condition "68%". The extraction unit 134 extracts the images IM whose cat occupancy rate is within ±25 percentage points of the condition "68%", that is, in the range of 43% to 93%. Specifically, as shown in the extraction information LT12, the extraction unit 134 extracts image IM11, image IM12, image IM14, image IM15, and so on. In this way, in the example of FIG. 2, the extraction unit 134 excludes the images IM that do not satisfy the condition, such as image IM13, whose cat occupancy rate of 20% is outside the range, and extracts the remaining images IM.

(Determination unit 135)
The determination unit 135 determines various information. For example, the determination unit 135 determines the ranking of a plurality of images based on the occupancy rate of the predetermined target in each of the images as estimated by the estimation unit 133. For example, the determination unit 135 determines the ranking of the search result images extracted by the extraction unit 134 according to information on the occupancy rate of the search target in each of them.

In FIG. 2, the determination unit 135 determines the ranking of the extracted images IM. For example, as shown in the ranking information LT13 in FIG. 2, the determination unit 135 determines the ranking so that an image IM whose cat occupancy rate has a smaller error from the condition "68%" is ranked higher. Specifically, the determination unit 135 ranks image IM15, which has the smallest error of "2 (= 70 − 68)", first; image IM19, which has the next smallest error after image IM15, second; image IM12, which has the next smallest error after image IM19, third; and image IM17, which has the next smallest error after image IM12, fourth. The determination unit 135 may also determine the ranking based on the similarity between the distribution of the occupancy rates of the plurality of targets included in each extracted image IM and the distribution of the occupancy rates of the plurality of targets included in the query image. For example, the determination unit 135 may rank an extracted image IM higher as the KL divergence between these two distributions becomes smaller.

For example, the determination unit 135 may determine the ranking of the search result images according to the differences in occupancy rate between the search result images in the ranking. For example, the determination unit 135 may determine the ranking so that the difference in occupancy rate between search result images at consecutive ranks satisfies a predetermined condition. For example, the determination unit 135 may determine the ranking so that the difference in target occupancy rate between the images IM at consecutive ranks is 5% or more. For example, when the cat occupancy rate in the first-ranked image IM is "70"%, the determination unit 135 may select the second-ranked image IM so that its cat occupancy rate is greater than "75"% or less than "65"%.
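One way to realize a ranking with a minimum occupancy gap between consecutive ranks is a greedy selection, sketched below. The description does not specify the procedure, so everything here is an assumption: the helper name `rank_with_min_gap`, preferring the smallest error from the condition at each step, treating a gap of exactly 5 points as acceptable ("5% or more"), and dropping images that can no longer satisfy the gap. The image IDs and rates are hypothetical.

```python
def rank_with_min_gap(occupancy_by_image, target_rate, min_gap=5.0):
    """Greedy ranking: at each rank, take the remaining image with the
    smallest error from target_rate whose occupancy differs from the
    previous rank by at least min_gap. Stops when no image qualifies."""
    remaining = sorted(
        occupancy_by_image,
        key=lambda image_id: abs(occupancy_by_image[image_id] - target_rate),
    )
    ranking = []
    while remaining:
        previous = occupancy_by_image[ranking[-1]] if ranking else None
        choice = next(
            (image_id for image_id in remaining
             if previous is None
             or abs(occupancy_by_image[image_id] - previous) >= min_gap),
            None,
        )
        if choice is None:
            break  # remaining images are too close to the previous rank
        ranking.append(choice)
        remaining.remove(choice)
    return ranking


# Hypothetical image IDs and cat occupancy rates (%), for illustration only.
rates = {"A": 70.0, "B": 69.0, "C": 64.0, "D": 75.0}
ranking = rank_with_min_gap(rates, target_rate=68.0)
print(ranking)
```

Spacing out the occupancy rates in this way keeps near-duplicate compositions from appearing next to each other in the results, at the cost of not being strictly ordered by error.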

For example, the determination unit 135 may determine the ranking of the search result images depending on whether each of them includes a target other than the search target. For example, when the search target is "cat" and an image includes a target other than a cat (for example, "dog"), the determination unit 135 may determine the ranking so that the image is ranked lower.

For example, the determination unit 135 may determine the ranking of the search result images depending on whether each of them includes character information. For example, the determination unit 135 may determine the ranking so that search result images that include character information are ranked lower. In this case, for example, the determination unit 135 may determine whether each search result image includes character information by using any suitable conventional character recognition technique.

For example, the determination unit 135 may determine the ranking of the search result images according to the position of the search target in each of them. For example, when the target is located on the right in the query image, the determination unit 135 may determine the ranking so that search result images in which the target is located on the right are ranked higher. In this case, for example, the determination unit 135 may determine the position of the search target in each search result image based on information, acquired by the acquisition unit 131, indicating the position of the target in each search result image. Alternatively, for example, the determination unit 135 may determine the position based on information indicating the position of the target in each search result image that is output from a learner that retains position information. The above is only an example, and the determination unit 135 may use any information that makes it possible to determine the position of the search target in each search result image.

(Providing unit 136)
The providing unit 136 provides various information to an external information processing device. For example, the providing unit 136 provides information based on the ranking of the plurality of images (search result images) determined by the determination unit 135. In FIG. 2, the providing unit 136 provides the user U1 with search results based on the determined ranking. For example, the providing unit 136 provides the terminal device 10 used by the user U1 with search results displayed in an order based on the ranking in which image IM15 is first, image IM19 is second, image IM12 is third, and image IM17 is fourth. For example, the providing unit 136 transmits the search results based on the determined ranking to the terminal device 10.

[3. Flow of occupancy rate estimation processing]
Here, the procedure of the occupancy rate estimation processing performed by the search device 100 according to the embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of estimating the occupancy rate of a target in an image according to the embodiment.

As shown in FIG. 7, the search device 100 acquires images (step S101). In FIG. 1, the search device 100 acquires the images IM11 to IM15 and so on. The search device 100 then inputs the images acquired in step S101 into the learner (step S102). In FIG. 1, the search device 100 inputs the acquired images IM11 to IM15 and so on into the learner LE.

The search device 100 then estimates the occupancy rate of the target in each input image based on the output of the learner (step S103). In FIG. 1, the search device 100 estimates the occupancy rate of the cat in each of the input images IM11 to IM15 and so on based on the output of the learner LE.

The search device 100 then generates list information of the images that includes the occupancy rates (step S104). In FIG. 1, the search device 100 generates list information LT11 indicating the occupancy rate of the cat in each of the images IM11 to IM15 and so on.
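Steps S101 to S104 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the function `build_occupancy_list`, the `toy_learner` stand-in (which merely counts pixels with value 1 instead of running a trained CNN), and the image identifiers are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a learner maps an image to per-class occupancy rates in [0, 1].
Image = Sequence[Sequence[int]]
Learner = Callable[[Image], Dict[str, float]]


def build_occupancy_list(images: Dict[str, Image],
                         learner: Learner,
                         target: str) -> List[dict]:
    """Run each acquired image through the learner and list the target's occupancy."""
    listing = []
    for image_id, pixels in images.items():   # images acquired in S101
        output = learner(pixels)               # S102: input the image to the learner
        rate = output.get(target, 0.0)         # S103: estimate the occupancy rate
        listing.append({"image_id": image_id, "occupancy": rate})
    return listing                             # S104: list information


def toy_learner(pixels: Image) -> Dict[str, float]:
    """Stand-in for a trained CNN: treats pixel value 1 as "cat" (an assumption)."""
    total = sum(len(row) for row in pixels)
    cat = sum(value == 1 for row in pixels for value in row)
    return {"cat": cat / total, "background": 1 - cat / total}


# Hypothetical 2x2 images used only for illustration.
images = {"img_a": [[1, 1], [0, 0]], "img_b": [[1, 0], [0, 0]]}
listing = build_occupancy_list(images, toy_learner, "cat")
```

Passing the learner as a callable keeps the listing step independent of how the occupancy rates are produced, so a real trained model could be substituted without changing the flow.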

[4. Flow of ranking determination processing]
Next, the procedure of the ranking determination processing performed by the search device 100 according to the embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of determining a ranking according to the embodiment.

As shown in FIG. 8, the search device 100 acquires a query image (step S201). In FIG. 2, the search device 100 acquires the image IM10 as the query image. Note that in step S201 the search device 100 may acquire text information instead of a query image. In that case, the search device 100 need not perform the processing of step S202.

The search device 100 then inputs the query image into the learner (step S202). In FIG. 2, the search device 100 inputs the image IM10 into the learner LE.

The search device 100 then estimates the occupancy rate of the target in the query image based on the output of the learner (step S203). In FIG. 2, the search device 100 estimates the occupancy rate of the cat in the input image IM10 based on the output of the learner LE.

The search device 100 then extracts images from the list information based on the occupancy rate of the search target in each image in the list information and the occupancy rate of the search target in the query image (step S204). In the example of FIG. 2, the search device 100 extracts the images IM whose cat occupancy rate is within ±25% of the condition value "68%". For example, as shown in the extraction information LT12, the search device 100 extracts the image IM11, the image IM12, the image IM14, the image IM15, and so on.
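The extraction in step S204 can be illustrated with a short sketch. It assumes that "within ±25%" means an absolute difference of at most 25 percentage points from the query's occupancy rate; the function name `extract_candidates`, the image identifiers, and the occupancy values are hypothetical and are not taken from FIG. 2.

```python
def extract_candidates(listing, query_rate, tolerance=0.25):
    """Keep images whose occupancy rate is within `tolerance` of the query rate."""
    return [entry for entry in listing
            if abs(entry["occupancy"] - query_rate) <= tolerance]


# Hypothetical occupancy values, chosen only for illustration.
listing = [
    {"image_id": "img_a", "occupancy": 0.90},
    {"image_id": "img_b", "occupancy": 0.75},
    {"image_id": "img_c", "occupancy": 0.10},
    {"image_id": "img_d", "occupancy": 0.60},
]

# Query occupancy of 68%, as in the example condition.
extracted = extract_candidates(listing, query_rate=0.68)
extracted_ids = [entry["image_id"] for entry in extracted]
```

Here `img_c` is dropped because its occupancy differs from the query by 58 points, while the other three fall inside the window.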

The search device 100 then determines the ranking of the extracted images (step S205). For example, as shown in the ranking information LT13, the search device 100 determines the ranking so that images IM whose cat occupancy rate deviates less from the condition value "68%" are ranked higher.
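The ranking rule in step S205 amounts to sorting by the absolute difference from the query's occupancy rate. The following is a minimal sketch under that reading; `rank_by_error` and the sample entries are hypothetical and do not reproduce the values of the ranking information LT13.

```python
def rank_by_error(entries, query_rate):
    """Order images so that the smallest |occupancy - query_rate| comes first."""
    return sorted(entries, key=lambda e: abs(e["occupancy"] - query_rate))


# Hypothetical extracted images and occupancy rates.
entries = [
    {"image_id": "img_a", "occupancy": 0.90},
    {"image_id": "img_b", "occupancy": 0.75},
    {"image_id": "img_d", "occupancy": 0.60},
]

ranking = rank_by_error(entries, query_rate=0.68)
ranked_ids = [entry["image_id"] for entry in ranking]
```

Because `sorted` is stable, images with identical errors keep their original relative order; the description does not specify a tie-breaking rule.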

The search device 100 then provides search results based on the determined ranking (step S206). In the example of FIG. 2, the search device 100 provides the terminal device 10 used by the user U1 with search results whose display order is based on a ranking in which the image IM15 is first, the image IM19 is second, the image IM12 is third, and the image IM17 is fourth.

[5. Learning processing]
Here, the learning processing in the learning unit 132 of the search device 100 will be described with reference to FIGS. 9 and 10. FIGS. 9 and 10 are diagrams showing an example of the learning processing according to the embodiment.

First, a description is given with reference to FIG. 9. In the example shown in FIG. 9, the search device 100 acquires, as teacher data, a combination of an image IM21 containing a cat and information RO21 indicating the occupancy rates of the cat and other classes in the image (hereinafter sometimes referred to as "correct answer information RO21") (step S21). To simplify the explanation, FIG. 9 shows only the occupancy rates of two classes, cat and background, in the correct answer information RO21, but the correct answer information RO21 may include an occupancy rate for each class that the learner LE outputs. For example, when the learner LE outputs information indicating occupancy rates for 20 classes, the correct answer information RO21 may include information indicating the occupancy rates of the other classes in addition to those of the cat and background classes. When the learner LE outputs information indicating occupancy rates for classes such as dog, airplane, and bicycle in addition to the two classes corresponding to cat and background, the correct answer information RO21 may include information indicating a dog occupancy rate of "0", an airplane occupancy rate of "0", and a bicycle occupancy rate of "0".

The image IM21 containing the cat is then input to the learner LE (step S22). After that, information indicating the occupancy rates of the targets, as shown in the output information OC21-1, is output from the learner LE (step S23). To simplify the explanation, FIG. 9 shows only the occupancy rates of the two classes cat and background, but the learner LE may output information indicating occupancy rates for other classes. For example, when the learner LE outputs information indicating occupancy rates for 20 classes, the learner LE may output occupancy rates for classes such as dog, airplane, and bicycle in addition to the two classes corresponding to cat and background.

As described above, for example, the learning unit 132 trains and generates the learner LE using deep learning techniques. For example, the learning unit 132 uses combinations of an image and the occupancy rate of a predetermined target in that image as teacher data. For example, the learning unit 132 trains the learner LE by performing processing such as backpropagation (error backpropagation), which corrects the parameters (connection coefficients) so as to reduce the error between the output of the learner LE and the occupancy rate of the predetermined target included in the teacher data. For example, the learning unit 132 generates the learner LE by performing processing such as backpropagation so as to minimize a predetermined error (loss) function.

For example, the learning unit 132 uses an error function L such as that shown in equation (1) below. As shown in equation (1), the learning unit 132 uses the cross entropy as the error function in the case of, for example, an N-class classification problem. Note that the error function L may be any function that represents the confidence of the classification result. For example, the error function L may be an entropy obtained from the classification probabilities. Also, for example, the error function L may be any function that indicates the recognition accuracy of the learner LE.
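The equation itself is not reproduced in this text. Assuming the standard cross-entropy form, which is consistent with the description's definitions of the target occupancy rate t_n(x) and the learner's output p_n(x) over N classes, equation (1) can be reconstructed as follows. This is a reconstruction from the surrounding text, not the published rendering.

```latex
L(x) = -\sum_{n=1}^{N} t_n(x) \, \log p_n(x) \tag{1}
```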

Here, "x" in equation (1) above and equations (2) to (4) below denotes an image. For example, in the example shown in FIG. 9, "x" in equation (1) above and equations (2) to (4) below corresponds to the image IM. The values 1 to N substituted for the variable "n" correspond to the classes that the learner LE identifies (classifies). For example, the learner LE corresponding to equation (1) above identifies N classes. For example, "cat", "background", and so on each correspond to one of the classes.

Further, "t_n(x)" in equation (1) above and equations (3) and (4) below denotes the occupancy rate of the target corresponding to class n (any of 1 to N) in the image IM21. For example, "t_n(x)" in equation (1) above denotes the occupancy rate of the target corresponding to class n as shown in the correct answer information RO21. In this case, for example, if the target corresponding to class 1 is "cat", "t_1(x)" is "0.53 (53%)". Alternatively, for example, "t_n(x)" in equation (1) above may be defined to take the value 1 only for the correct class and 0 otherwise.

Further, "p_n(x)" in equation (1) above and equations (2) and (3) below denotes the occupancy rate, based on the output of the learner LE, of the target corresponding to class n (any of 1 to N) in the image IM21. For example, "p_n(x)" in equation (1) above denotes the occupancy rate of the target corresponding to class n that the learner LE outputs, as shown in the output information OC21-1. In this case, for example, if the target corresponding to class 1 is "cat", "p_1(x)" is "0.64 (64%)".

Further, "p_n(x)" in equation (1) above is the probability of class n for x, and is defined by the softmax function shown in equation (2) below.
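The equation is not reproduced in this text. Assuming the usual softmax definition over the class scores f_n(x; θ) output by the learner, equation (2) can be reconstructed as follows; the summation index n' is notation introduced here.

```latex
p_n(x) = \frac{\exp\bigl(f_n(x;\theta)\bigr)}{\sum_{n'=1}^{N} \exp\bigl(f_{n'}(x;\theta)\bigr)} \tag{2}
```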

The function "f_n" in equation (2) above is the score of class n output by the CNN (learner LE). "θ" denotes the parameters of the CNN (learner LE). The function "exp" is the exponential function. In this case, the gradient of the error function L shown in equation (1) above is calculated by equation (3) below.
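The equation is not reproduced in this text. Assuming the error function is the cross entropy with softmax outputs and that the target occupancy rates sum to 1 over the N classes, the gradient with respect to the class score f_n takes the familiar form below. The intermediate step and the index m are added here for clarity and are not part of the original.

```latex
\frac{\partial L(x)}{\partial f_n}
  = -\sum_{m=1}^{N} t_m(x) \bigl(\delta_{m,n} - p_n(x)\bigr)
  = p_n(x) \sum_{m=1}^{N} t_m(x) - t_n(x)
  = p_n(x) - t_n(x) \tag{3}
```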

As shown in equation (3) above, when p_n(x) = t_n(x) for all classes 1 to N, the gradient of the error function L(x) becomes 0 and L(x) takes an extreme value. For example, the learning unit 132 performs feedback processing so that the gradient of the error function L(x) becomes 0 (step S24). For example, by having the learning unit 132 repeat the processing described above, the learner LE becomes able to appropriately output the occupancy rate of the target in an input image. Note that FIG. 9 is intended to show visually that the processing is repeated so as to minimize the error function L and the like in order to bring the output of the learner LE closer to the correct answer information RO21, and this processing may be performed automatically within the learner LE.
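The claim that the gradient vanishes when p_n(x) = t_n(x) can be checked numerically. The sketch below assumes the error function is the softmax cross entropy with occupancy rates as soft targets, so that the gradient with respect to each class score is p_n(x) - t_n(x). The cat values 0.53 and 0.64 come from the example above; the background values 0.47 and 0.36 are assumed here as the two-class remainders.

```python
import math


def softmax(scores):
    """Convert class scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, targets):
    """Cross entropy between soft targets and softmax(scores)."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))


def analytic_gradient(scores, targets):
    """Gradient with respect to each score: p_n - t_n (valid when targets sum to 1)."""
    return [p - t for p, t in zip(softmax(scores), targets)]


def numeric_gradient(scores, targets, eps=1e-6):
    """Central finite differences, used only to check the analytic form."""
    grads = []
    for i in range(len(scores)):
        up, down = list(scores), list(scores)
        up[i] += eps
        down[i] -= eps
        grads.append((cross_entropy(up, targets) - cross_entropy(down, targets)) / (2 * eps))
    return grads


targets = [0.53, 0.47]                           # cat from the example, background assumed
scores = [math.log(0.64), math.log(0.36)]        # chosen so that softmax gives [0.64, 0.36]

analytic = analytic_gradient(scores, targets)
numeric = numeric_gradient(scores, targets)
at_optimum = analytic_gradient([math.log(t) for t in targets], targets)
```

With these values the cat component of the gradient is about 0.11, which pushes the cat score down, and the gradient at scores whose softmax equals the targets is zero up to rounding.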

As described above, when "t_n(x)" is defined as the occupancy rate of class n with the entire image taken as 1, it is expressed by, for example, equation (4) below.
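The equation is not reproduced in this text. Based on the description's statement that the numerator counts the pixels labeled with class n and the denominator counts all pixels of the image, equation (4) can be reconstructed as follows, where the sums over p run over the pixels of the image x. The index n' and the notation p ∈ x are introduced here for the reconstruction.

```latex
t_n(x) = \frac{\sum_{p \in x} \delta_{n,\,j_p}}{\sum_{n'=1}^{N} \sum_{p \in x} \delta_{n',\,j_p}} \tag{4}
```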

"δ_{i,j}" is the Kronecker delta, and "j_p" is the label of pixel p. For example, "j_p" is a label corresponding to one of a plurality of classes such as cat and background. For example, the label of each pixel indicates which target that pixel belongs to. If the label of a pixel is the label corresponding to cat, the pixel belongs to the target "cat". For example, the denominator of equation (4) above is the total count over all pixels of the image "x", and the numerator of equation (4) above is the number of pixels in the image "x" labeled with class n. Thus "t_n(x)" in equation (4) above takes a value from 0 to 1. Equation (4) above gives the occupancy rate of the target corresponding to each class in the image. Note that the labeling of each pixel may be performed by, for example, an administrator of the search device 100 or the owner of the image.
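The pixel-counting definition of the occupancy rate can be sketched in a few lines. The function name `occupancy_from_labels`, the string labels, and the tiny 2x4 label map are hypothetical; the sketch assumes every pixel carries exactly one label from the listed classes.

```python
from collections import Counter


def occupancy_from_labels(label_map, classes):
    """Return the fraction of pixels labeled with each class (0 for absent classes)."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts[c] for c in classes)      # total number of labeled pixels
    return {c: counts[c] / total for c in classes}


# Hypothetical per-pixel labels for a 2x4 image.
label_map = [
    ["cat", "cat", "background", "background"],
    ["cat", "cat", "background", "background"],
]

targets = occupancy_from_labels(label_map, ["cat", "background", "dog"])
```

The resulting rates sum to 1, and a class with no pixels in the image, such as "dog" here, gets an occupancy rate of 0, matching the description of correct answer information for absent classes.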

Next, a description is given with reference to FIG. 10. FIG. 10 shows a case where the search device 100 trains a learner LE31 that is different from the learner LE. In the example shown in FIG. 10, the search device 100 acquires, as teacher data, a combination of an image IM31 containing a person and a bottle and information RO31 indicating the occupancy rates of the person, the bottle, and so on in the image (hereinafter sometimes referred to as "correct answer information RO31") (step S31). The image IM31 containing the person and the bottle is then input to the learner LE31 (step S32). After that, information indicating the occupancy rates of the targets, as shown in the output information OC31-1, is output from the learner LE31 (step S33). In the example shown in FIG. 10, as in the example shown in FIG. 9, the learning unit 132 uses equations (1) to (3) above to perform feedback processing so that the gradient of the error function L(x) becomes 0 (step S34). For example, by having the learning unit 132 repeat the processing described above, the learner LE31 becomes able to appropriately output the occupancy rates of the targets in an input image even when there are a plurality of targets. Note that FIG. 10 is intended to show visually that the processing is repeated so as to minimize the error function L and the like in order to bring the output of the learner LE31 closer to the correct answer information RO31, and this processing may be performed automatically within the learner LE31.

[6. Flow of learning processing]
Here, the procedure of the learning processing performed by the search device 100 according to the embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart showing an example of the learning processing according to the embodiment.

As shown in FIG. 11, the search device 100 acquires an image and correct answer information on the occupancy rate of each target in the image (step S301). In FIG. 9, the search device 100 acquires the image IM21 containing the cat and the correct answer information RO21 indicating the occupancy rates of the cat and so on in the image. The search device 100 then inputs the image acquired in step S301 into the learner (step S302). In FIG. 9, the search device 100 inputs the acquired image IM21 into the learner LE.

The search device 100 then performs learning so that the error between the occupancy rates based on the output of the learner and the occupancy rates in the correct answer information becomes smaller (step S303). In FIG. 9, the search device 100 performs learning based on the occupancy rate of each target shown in the output information OC21-1, which is based on the output of the learner LE, and the occupancy rate of each target shown in the correct answer information RO21.

After that, when a predetermined condition is satisfied (step S304: Yes), the search device 100 ends the processing. For example, the search device 100 may determine that the predetermined condition is satisfied and end the processing when the error between the occupancy rates based on the output of the learner and the occupancy rates in the correct answer information is within a predetermined threshold, or when a predetermined time has elapsed since learning started. When the predetermined condition is not satisfied (step S304: No), the search device 100 repeats the processing of step S303. For example, the search device 100 may determine that the predetermined condition is not satisfied and repeat the processing of step S303 when the error between the occupancy rates based on the output of the learner and the occupancy rates in the correct answer information is larger than the predetermined threshold, or when the predetermined time has not elapsed since learning started. Note that the above learning processing is an example, and the search device 100 may perform learning by various procedures.
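The loop of steps S303 and S304 can be sketched with a deliberately tiny stand-in for the learner. This is an illustration under stated assumptions, not the embodiment's training procedure: the "learner" here has one adjustable score per class rather than CNN parameters, the update uses the softmax cross-entropy gradient p_n - t_n, and the names `train`, `threshold`, and `time_limit_s` as well as their default values are hypothetical.

```python
import math
import time


def softmax(scores):
    """Convert class scores into occupancy-like rates that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(targets, learning_rate=0.5, threshold=1e-3, time_limit_s=5.0):
    """Repeat S303 until the error is within the threshold or time runs out (S304)."""
    scores = [0.0] * len(targets)                # toy parameters: one score per class
    start = time.monotonic()
    steps = 0
    while True:
        probs = softmax(scores)
        error = max(abs(p - t) for p, t in zip(probs, targets))
        # S304: stop when the error is small enough or the time limit has elapsed.
        if error <= threshold or time.monotonic() - start >= time_limit_s:
            return probs, steps
        # S303: move each score against the gradient p_n - t_n.
        scores = [s - learning_rate * (p - t)
                  for s, p, t in zip(scores, probs, targets)]
        steps += 1


# Cat occupancy 0.53 from the example; background 0.47 assumed as the remainder.
targets = [0.53, 0.47]
probs, steps = train(targets)
final_error = max(abs(p - t) for p, t in zip(probs, targets))
```

Checking both the error threshold and the elapsed time in one place mirrors the two example stopping conditions given for step S304; in a real system the error would typically be evaluated over many training images rather than a single one.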

[7. Effects]
As described above, the search device 100 according to the embodiment includes the acquisition unit 131 and the estimation unit 133. The acquisition unit 131 acquires an image. The estimation unit 133 estimates the occupancy rate of a predetermined target (a "cat" in the embodiment; the same applies below) in the image acquired by the acquisition unit 131, based on that image and on a learner (the "learner LE" in the embodiment; the same applies below) that outputs, for an input image, the occupancy rate of the predetermined target in that input image.

As a result, by using a learner that outputs the occupancy rate of a target in an image, the search device 100 according to the embodiment can appropriately estimate the manner in which the target is included in the image, that is, what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

In the search device 100 according to the embodiment, the estimation unit 133 estimates the occupancy rate of the predetermined target in the image acquired by the acquisition unit 131 based on a learner that is a neural network trained with predetermined data.

As a result, by relying on a learner that is a neural network trained with predetermined data, the search device 100 according to the embodiment can appropriately estimate what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

In the search device 100 according to the embodiment, the estimation unit 133 estimates the occupancy rate of the predetermined target in the image acquired by the acquisition unit 131 based on a learner that is a neural network trained with combinations of an image and the occupancy rate of the predetermined target in that image.

As a result, by relying on a learner that is a neural network trained with combinations of an image and the occupancy rate of the predetermined target in that image, the search device 100 according to the embodiment can appropriately estimate what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

In the search device 100 according to the embodiment, the estimation unit 133 estimates the occupancy rate of the predetermined target in the image acquired by the acquisition unit 131 based on a learner that is a neural network performing convolution processing and pooling processing.

As a result, by relying on a learner that is a neural network performing convolution processing and pooling processing, the search device 100 according to the embodiment can appropriately estimate what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

The search device 100 according to the embodiment also includes the learning unit 132. The learning unit 132 trains the learner.

As a result, by relying on the trained learner, the search device 100 according to the embodiment can appropriately estimate what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

The search device 100 according to the embodiment also includes the learning unit 132. The learning unit 132 trains the learner with combinations of an image and the occupancy rate of the predetermined target in that image.

As a result, by relying on a learner trained with combinations of an image and the occupancy rate of the predetermined target in that image, the search device 100 according to the embodiment can appropriately estimate what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

The search device 100 according to the embodiment also includes the learning unit 132. The learning unit 132 trains the learner so as to minimize a predetermined evaluation function.

As a result, by relying on a learner trained so as to minimize a predetermined evaluation function, the search device 100 according to the embodiment can appropriately estimate what proportion of the image the target occupies. In this way, the search device 100 can appropriately estimate the manner in which a target is included in an image.

The search device 100 according to the embodiment also includes the determination unit 135. The determination unit 135 determines the ranking of a plurality of images based on the occupancy rate of the predetermined target in each of the plurality of images estimated by the estimation unit 133.

As a result, by determining the ranking of the plurality of images based on the estimated occupancy rate of the predetermined target in each of the images, the search device 100 according to the embodiment can appropriately determine the ranking of the plurality of images.

The search device 100 according to the embodiment also includes the providing unit 136. The providing unit 136 provides information based on the ranking of the plurality of images determined by the determination unit 135.

As a result, by providing information based on the ranking of the plurality of images determined using the occupancy rate of the target, the search device 100 according to the embodiment can provide more appropriate information to the user.

[8. Hardware configuration]
The search device 100 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 12. FIG. 12 is a hardware configuration diagram showing an example of a computer that realizes the functions of the search device. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD (Hard Disk Drive) 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via the network N and sends it to the CPU 1100, and provides data generated by the CPU 1100 to other devices via the network N.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600. The CPU 1100 also outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the search device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 by executing the programs loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them, but as another example, it may acquire these programs from another device via the network N.

Embodiments of the present application have been described above in detail with reference to the drawings. These are examples, and the present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

[9. Others]
Among the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above can be combined as appropriate as long as the processing contents do not contradict each other.

The "units" (sections, modules) described above can be read as "means", "circuits", or the like. For example, the acquisition unit can be read as an acquisition means or an acquisition circuit.

1 Search system
100 Search device (estimation device)
121 Learning information storage unit
122 Image information storage unit
123 List information storage unit
130 Control unit
131 Acquisition unit
132 Learning unit
133 Estimation unit
134 Extraction unit
135 Decision unit
136 Providing unit
10 Terminal device
N Network

Claims

An estimation device comprising:
an acquisition unit that acquires images;
an estimation unit that estimates an occupancy rate of a predetermined object in an image acquired by the acquisition unit, based on the acquired image and a learner that outputs, for an input image, the occupancy rate of the predetermined object in that input image; and
a decision unit that, using the occupancy rate of the predetermined object estimated by the estimation unit for each of a plurality of images, determines a ranking of the plurality of images according to the difference in occupancy rate between images whose ranks are consecutive in the ranking.
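As an illustration only, the following Python sketch shows one possible reading of ranking "according to the difference in occupancy rate between images whose ranks are consecutive". The claim does not fix a concrete rule, so the greedy policy, the function names `consecutive_gaps` and `rank_with_min_gap`, and the `min_gap` threshold are all assumptions introduced here. The sketch prefers, at each rank, the highest-occupancy remaining image whose occupancy rate differs from the previously ranked image by at least `min_gap`.

```python
from typing import Dict, List


def consecutive_gaps(ranking: List[str], occupancy: Dict[str, float]) -> List[float]:
    """Absolute occupancy difference between each pair of adjacent ranks."""
    return [abs(occupancy[a] - occupancy[b]) for a, b in zip(ranking, ranking[1:])]


def rank_with_min_gap(occupancy: Dict[str, float], min_gap: float = 0.1) -> List[str]:
    """Greedy ranking (hypothetical policy, not taken from the patent).

    Start from the image with the highest occupancy rate. For each following
    rank, take the highest-occupancy remaining image whose occupancy differs
    from the previously ranked image by at least `min_gap`; if no such image
    exists, fall back to the highest-occupancy remaining image.
    """
    if not occupancy:
        return []
    remaining = sorted(occupancy, key=lambda k: occupancy[k], reverse=True)
    ranking = [remaining.pop(0)]
    while remaining:
        prev = occupancy[ranking[-1]]
        nxt = next(
            (k for k in remaining if abs(occupancy[k] - prev) >= min_gap),
            remaining[0],
        )
        remaining.remove(nxt)
        ranking.append(nxt)
    return ranking


# Estimated occupancy rates per image ID (made-up values for illustration).
estimated = {"a": 0.9, "b": 0.88, "c": 0.5, "d": 0.48}
order = rank_with_min_gap(estimated, min_gap=0.1)
gaps = consecutive_gaps(order, estimated)
```

With this policy, near-duplicate occupancy rates such as those of `a` and `b` are not placed at adjacent ranks when an alternative exists. A policy that instead keeps adjacent occupancy rates close would be an equally valid reading of the wording.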

An estimation device comprising:
an acquisition unit that acquires images;
an estimation unit that estimates an occupancy rate of a predetermined object in an image acquired by the acquisition unit, based on the acquired image and a learner that outputs, for an input image, the occupancy rate of the predetermined object in that input image; and
a decision unit that, based on the occupancy rate of the predetermined object estimated by the estimation unit for each of a plurality of images, determines a ranking of the plurality of images according to whether each of the plurality of images contains character information.
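As an illustration only, the following Python sketch shows one way a ranking could combine the estimated occupancy rate with whether an image contains character information. The claim does not say how the two are combined or how text is detected, so the `ScoredImage` type, the precomputed `has_text` flag, and the rule "images without text first, then descending occupancy rate" are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoredImage:
    image_id: str
    occupancy: float  # estimated occupancy rate of the target, in [0, 1]
    has_text: bool    # assumed to be supplied by a separate text detector


def rank_images(images: List[ScoredImage]) -> List[ScoredImage]:
    """Hypothetical policy: images without text first, then by occupancy.

    False sorts before True, so text-free images come first; negating the
    occupancy rate sorts each group in descending order of occupancy.
    """
    return sorted(images, key=lambda im: (im.has_text, -im.occupancy))


candidates = [
    ScoredImage("A", 0.3, False),
    ScoredImage("B", 0.9, True),
    ScoredImage("C", 0.7, False),
]
ranked_ids = [im.image_id for im in rank_images(candidates)]
```

Here image `B` has the highest occupancy rate but is ranked last because it contains text. Reversing the first key component would give the opposite preference.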

The estimation device according to claim 1 or 2, wherein the estimation unit estimates the occupancy rate of the predetermined object in the image acquired by the acquisition unit based on the learner, the learner being a neural network trained on predetermined data.

The estimation device according to claim 3, wherein the estimation unit estimates the occupancy rate of the predetermined object in the image acquired by the acquisition unit based on the learner, the learner being a neural network trained on combinations of an image and the occupancy rate of the predetermined object in that image.

The estimation device according to claim 4, wherein the estimation unit estimates the occupancy rate of the predetermined object in the image acquired by the acquisition unit based on the learner, the learner being the neural network that performs convolution processing and pooling processing.

The estimation device according to any one of claims 1 to 5, further comprising a learning unit that trains the learner.

The estimation device according to claim 6, wherein the learning unit trains the learner on combinations of an image and the occupancy rate of the predetermined object in that image.

The estimation device according to claim 6 or 7, wherein the learning unit trains the learner so as to minimize a predetermined evaluation function.
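As an illustration only, the following Python sketch shows what training a learner on (input, occupancy rate) pairs "so as to minimize a predetermined evaluation function" can look like in the simplest case. It is not the neural network of the claims: a single sigmoid unit over a small feature vector stands in for the learner, mean squared error stands in for the evaluation function, and the names `predict`, `evaluation`, and `train`, the learning rate, and the toy data are all assumptions introduced here.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target occupancy rate)


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Sigmoid output, so the predicted occupancy rate lies in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def evaluation(weights: Sequence[float], bias: float, data: List[Sample]) -> float:
    """Assumed evaluation function: mean squared error over the data."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def train(data: List[Sample], lr: float = 0.5, epochs: int = 500):
    """Full-batch gradient descent on the mean squared error."""
    n = len(data[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in data:
            p = predict(weights, bias, x)
            # Chain rule: d/dz (p - y)^2 = 2 (p - y) p (1 - p)
            g = 2.0 * (p - y) * p * (1.0 - p) / len(data)
            for i, xi in enumerate(x):
                grad_w[i] += g * xi
            grad_b += g
        weights = [w - lr * gw for w, gw in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy pairs: one feature per "image" and its labelled occupancy rate.
toy_data: List[Sample] = [([0.1], 0.1), ([0.5], 0.5), ([0.9], 0.9)]
loss_before = evaluation([0.0], 0.0, toy_data)
trained_w, trained_b = train(toy_data)
loss_after = evaluation(trained_w, trained_b, toy_data)
```

In a convolutional neural network the same loop applies in principle, with the gradient computed by backpropagation through the convolution and pooling layers instead of the single unit used here.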

The estimation device according to any one of claims 1 to 8, further comprising a providing unit that provides information based on the ranking of the plurality of images determined by the decision unit.

An estimation method performed by a computer, the method comprising:
an acquisition step of acquiring images;
an estimation step of estimating an occupancy rate of a predetermined object in an image acquired in the acquisition step, based on the acquired image and a learner that outputs, for an input image, the occupancy rate of the predetermined object in that input image; and
a decision step of determining, using the occupancy rate of the predetermined object estimated in the estimation step for each of a plurality of images, a ranking of the plurality of images according to the difference in occupancy rate between images whose ranks are consecutive in the ranking.

An estimation program that causes a computer to execute:
an acquisition procedure of acquiring images;
an estimation procedure of estimating an occupancy rate of a predetermined object in an image acquired by the acquisition procedure, based on the acquired image and a learner that outputs, for an input image, the occupancy rate of the predetermined object in that input image; and
a decision procedure of determining, using the occupancy rate of the predetermined object estimated by the estimation procedure for each of a plurality of images, a ranking of the plurality of images according to the difference in occupancy rate between images whose ranks are consecutive in the ranking.

An estimation method performed by a computer, the method comprising:
an acquisition step of acquiring images;
an estimation step of estimating an occupancy rate of a predetermined object in an image acquired in the acquisition step, based on the acquired image and a learner that outputs, for an input image, the occupancy rate of the predetermined object in that input image; and
a decision step of determining, based on the occupancy rate of the predetermined object estimated in the estimation step for each of a plurality of images, a ranking of the plurality of images according to whether each of the plurality of images contains character information.

An estimation program that causes a computer to execute:
an acquisition procedure of acquiring images;
an estimation procedure of estimating an occupancy rate of a predetermined object in an image acquired by the acquisition procedure, based on the acquired image and a learner that outputs, for an input image, the occupancy rate of the predetermined object in that input image; and
a decision procedure of determining, based on the occupancy rate of the predetermined object estimated by the estimation procedure for each of a plurality of images, a ranking of the plurality of images according to whether each of the plurality of images contains character information.