JP7054603B2

JP7054603B2 - Judgment device, judgment method, and judgment program

Info

Publication number: JP7054603B2
Application number: JP2016152924A
Authority: JP
Inventors: 智大田中; 直晃山下
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-08-03
Filing date: 2016-08-03
Publication date: 2022-04-14
Anticipated expiration: 2036-08-03
Also published as: JP2018022332A

Description

The present invention relates to a determination device, a determination method, and a determination program.

Conventionally, techniques for extracting image features with a neural network have been provided. For example, there is a technique for generating a saliency map of an image with a convolutional neural network. There is also a technique for identifying a predetermined object contained in an image with a neural network.

Non-Patent Document 1: Min Lin, Qiang Chen, Shuicheng Yan, "Network In Network", arXiv preprint arXiv:1312.4400 (2013) & ICLR (International Conference on Learning Representations) 2014.
Non-Patent Document 2: Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech, "Salient Object Subitizing", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4045-4054.

However, the conventional techniques described above do not always determine the number of objects included in an image appropriately. For example, merely identifying a predetermined object included in an image does not necessarily allow the number of such objects to be determined appropriately.

The present application has been made in view of the above, and its object is to provide a determination device, a determination method, and a determination program that appropriately determine the number of objects included in an image by using information in a neural network.

The determination device according to the present application comprises: an acquisition unit that acquires a plurality of pieces of feature information based on an input image input to a neural network that identifies the number of objects in an image, the pieces of feature information corresponding respectively to the numbers identified by the neural network; and a determination unit that determines the number of objects included in the input image based on the plurality of pieces of feature information acquired by the acquisition unit.

According to one aspect of the embodiment, the number of objects included in an image can be appropriately determined by using information in a neural network.

FIG. 1 is a diagram showing an example of a determination process according to an embodiment.
FIG. 2 is a diagram showing a configuration example of the determination device according to the embodiment.
FIG. 3 is a diagram showing an example of a learning information storage unit according to the embodiment.
FIG. 4 is a diagram showing an example of an image information storage unit according to the embodiment.
FIG. 5 is a flowchart showing an example of determining the number of objects according to the embodiment.
FIG. 6 is a diagram showing an example of generating a processed image according to the embodiment.
FIG. 7 is a flowchart showing an example of generating a processed image according to the embodiment.
FIG. 8 is a diagram showing an example of a learning process according to the embodiment.
FIG. 9 is a diagram showing an example of the learning process according to the embodiment.
FIG. 10 is a flowchart showing an example of the learning process according to the embodiment.
FIG. 11 is a diagram showing an example of the determination process according to the embodiment.
FIG. 12 is a hardware configuration diagram showing an example of a computer that implements the functions of the determination device.

Hereinafter, modes for carrying out the determination device, determination method, and determination program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the determination device, determination method, and determination program according to the present application. In the following embodiments, the same parts are denoted by the same reference numerals, and duplicate descriptions are omitted.

(Embodiment)
[1. Determination process]
An example of the determination process according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of a determination process according to an embodiment. The determination device 100 shown in FIG. 1 determines the number of objects included in an image by using feature information, corresponding to each number, acquired from a learner LE that recognizes the number of objects included in an image. Specifically, the determination device 100 calculates a score for each piece of feature information corresponding to each number acquired from the learner LE, and determines the number of objects included in the image based on the scores. The learner LE shown in FIG. 1 identifies which of five types (classes) the number of objects included in the image falls into: 0, 1, 2, 3, or 4 or more. In this case, the learner LE classifies images into five classes (numbers).

In the example shown in FIG. 1, the learner LE that recognizes the number of objects included in an image may be generated by appropriately using various conventional techniques, such as count detection with NIN (Network In Network) as described in Non-Patent Document 1. For example, the learner LE in FIG. 1 may be a convolutional neural network in which the fully connected layers are replaced with convolutional layers. Hereinafter, a convolutional neural network may be referred to as a CNN. For example, the learner LE in FIG. 1 may be a CNN that contains no fully connected layer, that is, an FCN (Fully Convolutional Neural Network). In an ordinary CNN, the spatial information of the image is lost when it passes through the fully connected layers. In an FCN, by contrast, replacing the fully connected layers with convolutional layers enables estimation that preserves spatial information (also referred to as the "receptive field"). The learner LE in FIG. 1 may be any learner as long as feature information corresponding to each class (number) can be acquired from it. For example, the learner LE in FIG. 1 may be an NIN trained to identify the number of objects.
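To make "one feature map per class" concrete, here is a minimal pure-Python sketch in the spirit of NIN: a final 1×1 convolution maps a C-channel feature tensor to one H×W map per count class, so the spatial layout is kept rather than flattened by a fully connected layer. The text above only says that fully connected layers are replaced by convolutional layers; the 1×1 kernel, the function name `conv1x1`, and the toy sizes and weights are assumptions for illustration.

```python
from typing import List

# A feature tensor is stored channel-first: features[c][y][x].
Tensor3D = List[List[List[float]]]


def conv1x1(features: Tensor3D, weights: List[List[float]], bias: List[float]) -> Tensor3D:
    """Apply a 1x1 convolution: each output map k is a per-pixel weighted
    sum over the input channels, so the H x W layout is preserved.

    weights[k][c] is the (hypothetical) weight from input channel c to class k.
    """
    num_in = len(features)
    height = len(features[0])
    width = len(features[0][0])
    out: Tensor3D = []
    for k, w_k in enumerate(weights):
        class_map = [
            [
                sum(w_k[c] * features[c][y][x] for c in range(num_in)) + bias[k]
                for x in range(width)
            ]
            for y in range(height)
        ]
        out.append(class_map)
    return out


# Toy input: 2 channels on a 2x3 grid; 5 output classes (0, 1, 2, 3, 4 or more).
features = [
    [[0.0, 1.0, 0.0], [0.0, 0.5, 0.0]],
    [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
]
weights = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.1], [0.0, 0.0], [1.0, 1.0]]
bias = [0.0] * 5
class_maps = conv1x1(features, weights, bias)
print(len(class_maps), len(class_maps[0]), len(class_maps[0][0]))  # 5 2 3
```

Each of the five outputs has the same 2×3 grid as the input, which is what lets a per-class map be superimposed on the image later.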

The learner LE in FIG. 1 identifies the number of human faces included in an image. For example, the learner LE classifies an image into one of a plurality of classes, each corresponding to a number, according to the number of human faces the image contains. Although FIG. 1 shows a case where the objects to be counted are human faces, the objects are not limited to human faces and may be other living things such as dogs and cats, plants, or various objects such as cars. The objects referred to here may include anything identifiable, including various phenomena such as fire and ocean waves.

As described above, in FIG. 1 the determination device 100 uses the learner LE, a discriminator (model) that outputs information on the number of human faces included in an image. In the example of FIG. 1, the determination device 100 uses a learner LE that has already been generated by a predetermined learning process described later. The determination device 100 may use any learner as long as it can determine the number of predetermined objects. Generating (training) the learner LE uses a predetermined loss function, correct answer information, and the like; the details are described later.

The process by which the determination device 100 determines the number of objects included in an image is now described with reference to FIG. 1. As shown in FIG. 1, an image IM11 is input to the determination device 100 (step S10). For example, the determination device 100 acquires an image IM11 showing five people, that is, five faces, as objects. Having acquired the image IM11, the determination device 100 inputs it to a predetermined learner (step S11). In FIG. 1, for example, the determination device 100 inputs the image IM11 to the learner LE, a discriminator (model) that identifies the number of objects included in an image.

The learner LE receiving the image IM11 performs a process of identifying the number of objects included in the image IM11. In the course of this process, the learner LE generates feature information corresponding to each class (number). In the example of FIG. 1, the learner LE generates feature information corresponding to each of the five face-count classes: 0, 1, 2, 3, and 4 or more.

The determination device 100 therefore acquires each piece of feature information generated while the learner LE identifies the number of human faces included in the image. In the example of FIG. 1, the determination device 100 acquires from the learner LE the feature information FM10 corresponding to the class in which the number of human faces is 0 (step S12-0).

The feature information FM10 indicates, for example, a feature amount for each pixel of the image IM11, where a feature amount is a numerical value. Specifically, each point (pixel) of the feature information FM10 corresponds to the position it overlaps in the image IM11 when FM10 is superimposed on the image IM11, and FM10 indicates the feature amount of the corresponding pixel in the image IM11. In FIG. 1, regions of the feature information FM10 that exhibit features are drawn in dark colors: the larger the feature amount, the darker the color. Specifically, in the feature information FM10 in FIG. 1, the regions where human faces are located in the image IM11 are shown in dark colors. The same applies to the other feature information.

Similarly, in the example of FIG. 1, the determination device 100 acquires from the learner LE the feature information FM11 corresponding to the class with one human face (step S12-1), the feature information FM12 corresponding to the class with two faces (step S12-2), the feature information FM13 corresponding to the class with three faces (step S12-3), and the feature information FM14 corresponding to the class with four or more faces (step S12-4). In this way, the determination device 100 acquires from the learner LE the feature information FM10 to FM14 corresponding to the five face-count classes 0, 1, 2, 3, and 4 or more. For simplicity, this example acquires the five pieces of feature information from a single learner LE; alternatively, the feature information corresponding to each number may be acquired from five learners that respectively identify 0, 1, 2, 3, and 4 or more human faces. Hereinafter, steps S12-0 to S12-4 may be collectively referred to as step S12 when they need not be distinguished.

The determination device 100 then calculates a score for each of the feature information FM10 to FM14 acquired in step S12. For example, the determination device 100 may use as the score the average of the feature amounts in each of FM10 to FM14, that average multiplied by a predetermined coefficient, the sum of the feature amounts, or that sum multiplied by a predetermined coefficient.
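The four scoring options listed above can be sketched as a single reduction over a feature map. This is a minimal sketch, assuming each feature map is a list of rows of floats; the function name `score` and its `method`/`coef` parameters are hypothetical. The "mean" option is the same reduction as global average pooling.

```python
from typing import List

FeatureMap = List[List[float]]


def score(feature_map: FeatureMap, method: str = "mean", coef: float = 1.0) -> float:
    """Reduce one per-class feature map to a scalar score.

    method="mean": average feature amount; method="sum": total feature amount.
    The result is multiplied by coef (coef=1.0 leaves it unchanged).
    """
    values = [v for row in feature_map for v in row]
    if not values:
        raise ValueError("feature map is empty")
    if method == "mean":
        base = sum(values) / len(values)
    elif method == "sum":
        base = sum(values)
    else:
        raise ValueError(f"unknown method: {method}")
    return coef * base


fm = [[0.0, 0.25], [0.75, 0.0]]
print(score(fm))               # 0.25 (mean)
print(score(fm, "sum"))        # 1.0  (sum)
print(score(fm, "mean", 2.0))  # 0.5  (2 x mean)
```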

For example, the determination device 100 calculates the score of the feature information FM10 based on its feature amounts (step S13-0). In the example of FIG. 1, as shown in the score information SC10, the determination device 100 calculates the score for the face count "0" as "0.12". Likewise, it calculates the score of FM11 for the face count "1" as "0.01" (step S13-1, score information SC11), the score of FM12 for "2" as "0.03" (step S13-2, score information SC12), the score of FM13 for "3" as "0.06" (step S13-3, score information SC13), and the score of FM14 for "4 or more" as "0.75" (step S13-4, score information SC14). Hereinafter, steps S13-0 to S13-4 may be collectively referred to as step S13 when they need not be distinguished.

The determination device 100 then determines the number of human faces included in the image IM11 (step S14). In the example of FIG. 1, the determination device 100 makes this determination based on the scores of the feature information FM10 to FM14 calculated in step S13. For example, the determination device 100 determines that the class (number) corresponding to FM14, which has the highest score among FM10 to FM14, is the number of human faces included in the image IM11. Specifically, as shown in the number determination information AN11, the determination device 100 determines that the image IM11 contains four or more human faces.
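Step S14 amounts to picking the class with the largest score. A minimal sketch using the scores from the FIG. 1 example; the dictionary layout and the function name `determine_count` are illustrative assumptions. When two classes tie, Python's `max` returns the first one encountered.

```python
from typing import Dict

# Scores from the FIG. 1 example (score information SC10 to SC14).
scores: Dict[str, float] = {
    "0": 0.12,
    "1": 0.01,
    "2": 0.03,
    "3": 0.06,
    "4 or more": 0.75,
}


def determine_count(class_scores: Dict[str, float]) -> str:
    """Return the class (count label) whose score is largest."""
    if not class_scores:
        raise ValueError("no scores given")
    return max(class_scores, key=class_scores.get)


print(determine_count(scores))  # 4 or more
```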

The determination device 100 also determines the positions of the human faces included in the image IM11 (step S15). In the example of FIG. 1, the determination device 100 determines these positions based on the feature information FM14, which has the highest score. For example, the determination device 100 determines that a human face is contained in a region where feature amounts equal to or greater than a predetermined threshold are located. In the example of FIG. 1, the determination device 100 determines that the region of the image IM11 corresponding to the region AR11 of the feature information FM14 contains human faces. Specifically, as shown in the position determination information AP11, the determination device 100 determines that the human faces are located in the upper part of the image IM11. When the learner LE is an FCN, the determination device 100 may estimate the positions of human faces without using the feature information. Processing of the image based on the determined face positions is described later.
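Step S15 can be sketched as thresholding the winning class's feature map. The text only says that faces are judged to lie where feature amounts reach a predetermined threshold; summarizing those cells as one bounding box, the function name `locate`, the toy map, and the threshold of 0.5 are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

FeatureMap = List[List[float]]


def locate(feature_map: FeatureMap, threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive bounding box (top, left, bottom, right) of all
    cells whose feature amount is >= threshold, or None if none qualify."""
    hits = [
        (y, x)
        for y, row in enumerate(feature_map)
        for x, v in enumerate(row)
        if v >= threshold
    ]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return min(ys), min(xs), max(ys), max(xs)


# Toy 4x4 map for the winning class: high values in the upper rows.
fm14 = [
    [0.1, 0.9, 0.8, 0.2],
    [0.0, 0.7, 0.9, 0.1],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(locate(fm14, threshold=0.5))  # (0, 1, 1, 2)
```

The result covers rows 0 to 1 only, which corresponds to "the upper part of the image" in the FIG. 1 description; a real system would also map map coordinates back to image pixels.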

As described above, the determination device 100 appropriately determines the number of objects included in an image by using information in a neural network (an FCN in FIG. 1). In FIG. 1, the determination device 100 acquires from the learner LE the feature information FM10 to FM14 corresponding to the five face-count classes 0, 1, 2, 3, and 4 or more, and determines the number of human faces included in the image IM11 based on the score of each piece of acquired feature information. The determination device 100 can thereby appropriately determine the number of objects included in an image. The determination device 100 then determines the positions of the human faces included in the image IM11 based on the feature information FM14 corresponding to the determined number of faces, and can thereby appropriately determine the positions of the objects included in an image.

The above example shows the learner LE classifying the number of objects included in an image into one of five types (classes): 0, 1, 2, 3, or 4 or more. However, the classes used by the learner LE are not limited to these and may be chosen according to the purpose. For example, the learner LE may use 11 classes that distinguish each count from 0 to 9 individually, plus 10 or more. As another example, the learner LE may use two classes that distinguish fewer than 5 from 5 or more.
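All three class schemes above share one shape: a sorted list of each class's smallest count, with the last class open-ended. A minimal sketch of mapping a count to a class index under that assumption, using the standard-library `bisect` module; the function name `count_to_class` and the `lower_bounds` representation are hypothetical.

```python
from bisect import bisect_right
from typing import List


def count_to_class(count: int, lower_bounds: List[int]) -> int:
    """Map an object count to a class index.

    lower_bounds is the sorted list of each class's smallest count; the last
    class is open-ended. For example, [0, 1, 2, 3, 4] gives the classes
    0, 1, 2, 3, and "4 or more".
    """
    if count < lower_bounds[0]:
        raise ValueError("count below the first class")
    return bisect_right(lower_bounds, count) - 1


five_classes = [0, 1, 2, 3, 4]    # 0, 1, 2, 3, 4 or more
eleven_classes = list(range(11))  # 0 to 9 individually, 10 or more
two_classes = [0, 5]              # fewer than 5, 5 or more

print(count_to_class(5, five_classes))     # 4  -> "4 or more"
print(count_to_class(12, eleven_classes))  # 10 -> "10 or more"
print(count_to_class(3, two_classes))      # 0  -> "fewer than 5"
```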

[2. Configuration of determination device]
Next, the configuration of the determination device 100 according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing a configuration example of the determination device 100 according to the embodiment. As shown in FIG. 2, the determination device 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The determination device 100 may also include an input unit (for example, a keyboard or mouse) that receives various operations from an administrator of the determination device 100, and a display unit (for example, a liquid crystal display) for displaying various information.

(Communication unit 110)
The communication unit 110 is implemented by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to a network by wire or wirelessly, and transmits and receives information to and from, for example, a terminal device.

(Storage unit 120)
The storage unit 120 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk. As shown in FIG. 2, the storage unit 120 according to the embodiment includes a learning information storage unit 121 and an image information storage unit 122.

(Learning information storage unit 121)
The learning information storage unit 121 according to the embodiment stores various kinds of information related to learning. For example, in FIG. 3, the learning information storage unit 121 stores learning information (a model) for the learner LE generated by a predetermined learning process. FIG. 3 shows an example of the learning information storage unit 121 according to the embodiment. The learning information storage unit 121 shown in FIG. 3 stores "weights (w_ij)".

For example, in FIG. 3, the weight w_11 is 0.2 and the weight w_12 is -0.3. Likewise, the weight w_21 is 0.5 and the weight w_22 is 1.3.

The weight w_ij may be, for example, the synaptic connection coefficient from neuron y_i to neuron x_j in the learner LE. The learning information storage unit 121 is not limited to the above and may store various kinds of information depending on the purpose.
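To illustrate what a connection coefficient w_ij from y_i to x_j does, the following sketch stores the four FIG. 3 weights keyed by (i, j) and computes each x_j as a weighted sum of the y_i. This is an assumption for illustration: the text does not specify the neuron model, so bias terms and activation functions are omitted, and the function name `forward` and the input values are hypothetical.

```python
from typing import Dict, List, Tuple

# Weights from the FIG. 3 example, keyed by (i, j): connection y_i -> x_j.
weights: Dict[Tuple[int, int], float] = {
    (1, 1): 0.2,
    (1, 2): -0.3,
    (2, 1): 0.5,
    (2, 2): 1.3,
}


def forward(y: List[float], w: Dict[Tuple[int, int], float], num_out: int) -> List[float]:
    """Compute x_j = sum_i w_ij * y_i for j = 1..num_out (1-indexed neurons).

    Bias terms and activation functions are deliberately omitted.
    """
    x = []
    for j in range(1, num_out + 1):
        x.append(sum(w.get((i, j), 0.0) * y_i for i, y_i in enumerate(y, start=1)))
    return x


print(forward([1.0, 2.0], weights, num_out=2))  # approximately [1.2, 2.3]
```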

(Image information storage unit 122)
The image information storage unit 122 according to the embodiment stores various kinds of information related to images. FIG. 4 shows an example of the image information storage unit 122 according to the embodiment. The image information storage unit 122 shown in FIG. 4 has items such as "image ID" and "image".

"Image ID" indicates identification information for identifying an image. "Image" indicates image information; specifically, the image input to the learner LE. Although FIG. 4 depicts the image identified by each image ID for the sake of explanation, the "image" item may instead store, for example, a file path indicating where the image is stored.

The image information storage unit 122 is not limited to the above and may store various kinds of information depending on the purpose. For example, it may store information on the date and time the image was generated, or information on the objects included in the image; in FIG. 4, for instance, it may store information indicating that the image identified by image ID "IM11" contains five human faces. It may also store the original image that was acquired.
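One record of the image information storage unit, including the optional items mentioned above, could be modeled as a small dataclass. This is a minimal in-memory sketch only; the class name `ImageRecord`, its field names, and the file path are hypothetical and not taken from FIG. 4.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class ImageRecord:
    """One row of a hypothetical image information store."""
    image_id: str                          # e.g. "IM11"
    image_path: str                        # where the image file is stored
    created_at: Optional[datetime] = None  # when the image was generated
    object_count: Optional[int] = None     # e.g. number of faces in the image


store: Dict[str, ImageRecord] = {}
rec = ImageRecord("IM11", "/images/IM11.jpg", object_count=5)  # hypothetical path
store[rec.image_id] = rec
print(store["IM11"].object_count)  # 5
```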

(Control unit 130)
Returning to FIG. 2, the control unit 130 is a controller and is implemented, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs (an example of the determination program) stored in a storage device inside the determination device 100, using the RAM as a work area. The control unit 130 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 130 includes an acquisition unit 131, a learning unit 132, a calculation unit 133, a determination unit 134, a generation unit 135, and a providing unit 136, and implements or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to that shown in FIG. 2 and may be any other configuration that performs the information processing described later.

(Acquisition unit 131)
The acquisition unit 131 acquires various kinds of information, including images. For example, the acquisition unit 131 acquires an image from an external information processing device; in FIG. 1, it acquires the image IM11 from an external information processing device. The acquisition unit 131 may also acquire an image (for example, the image IM11) from the image information storage unit 122.

The acquisition unit 131 also acquires a plurality of pieces of feature information based on an input image. The input image is fed to a neural network that identifies the number of objects in an image, and each piece of feature information corresponds to one of the numbers identified by that network. For example, the acquisition unit 131 acquires a plurality of pieces of feature information based on an input image fed to a neural network that identifies the number of human faces in an image. The neural network may be one that performs convolution and pooling processing, one that does not include a fully connected layer, or an FCN. In each case, the acquisition unit 131 acquires the pieces of feature information corresponding to the respective numbers identified by that network.

In the example of FIG. 1, the acquisition unit 131 acquires a plurality of pieces of feature information corresponding to the respective numbers identified by the learner LE. For example, the acquisition unit 131 acquires from the learner LE the feature information FM10 to FM14, corresponding to five face-count classes: 0, 1, 2, 3, and 4 or more. The acquisition unit 131 may instead acquire the feature information FM10 to FM14 from an external information processing device.
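The text does not specify which layer produces one feature map per count class. The sketch below assumes, as in the Network In Network design listed among the prior-art documents, that the last layer of the fully convolutional network is a 1x1 convolution with one output channel per class. Under that assumption, each class map (FM10 to FM14) keeps the spatial layout of the image. The name `class_feature_maps` and the plain-list representation are illustrative only.

```python
from typing import List

Grid = List[List[float]]  # one 2-D map, rows x columns

def class_feature_maps(features: List[Grid], weights: List[List[float]],
                       biases: List[float]) -> List[Grid]:
    """Hypothetical 1x1 convolution: C channel maps -> K class maps.

    features: C channel maps of identical size H x W
    weights:  K rows of C weights, one row per count class
    biases:   K biases, one per count class
    """
    height, width = len(features[0]), len(features[0][0])
    class_maps = []
    for w_row, bias in zip(weights, biases):
        # Each output cell mixes the C channel values at the same (i, j) position,
        # so the class map stays spatially aligned with the input.
        class_map = [[bias + sum(w * channel[i][j] for w, channel in zip(w_row, features))
                      for j in range(width)]
                     for i in range(height)]
        class_maps.append(class_map)
    return class_maps
```

With five weight rows (0, 1, 2, 3, and 4 or more faces), the function returns five maps that play the role of FM10 to FM14.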

(Learning unit 132)
The learning unit 132 learns various types of information and generates various types of information through learning. For example, the learning unit 132 trains a learner (model); in other words, it generates a learner (model) by performing learning. For example, the learning unit 132 trains the learner LE using pairs of an image and the number of objects contained in that image, so as to minimize a predetermined evaluation function. Details of the learning process performed by the learning unit 132 are described later. When the determination device 100 acquires the pieces of feature information for the respective numbers from an external information processing device, it need not include the learning unit 132.

(Calculation unit 133)
The calculation unit 133 calculates various types of information. For example, the calculation unit 133 calculates scores based on the plurality of pieces of feature information. Specifically, for each number (class), it calculates a score based on the feature information corresponding to that number.

In the example of FIG. 1, the calculation unit 133 calculates a score for each of the feature information FM10 to FM14, for example as the average of the feature values in that piece of feature information. The resulting scores are as follows:
"0.12" for the face count "0" (feature information FM10)
"0.01" for "1" (FM11)
"0.03" for "2" (FM12)
"0.06" for "3" (FM13)
"0.75" for "4 or more" (FM14)
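Averaging every value of a class map, as in the example above, amounts to global average pooling of each map. The sketch below is a minimal plain-Python version. The name `class_scores` is hypothetical, and each map is assumed to be a non-empty list of equal-length rows.

```python
from typing import List

Grid = List[List[float]]

def class_scores(feature_maps: List[Grid]) -> List[float]:
    """Score of each class = mean of all values in that class's feature map."""
    scores = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]  # flatten the 2-D map
        scores.append(sum(values) / len(values))
    return scores
```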

(Determination unit 134)
The determination unit 134 determines various types of information. For example, it determines the number of objects contained in the input image based on the plurality of pieces of feature information acquired by the acquisition unit 131. More specifically, it makes this determination based on the score for each number calculated by the calculation unit 133.

In the example of FIG. 1, the determination unit 134 determines the number of human faces contained in the image IM11 based on the scores of the feature information FM10 to FM14. The feature information FM14 has the highest score among FM10 to FM14. The determination unit 134 therefore takes the class (number) corresponding to FM14 as the number of human faces in the image IM11, and concludes that the image contains four or more human faces.
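Selecting the class with the highest score is an argmax over the per-class scores. The sketch below assumes the class labels are listed in the same order as FM10 to FM14; the labels and the function name are illustrative.

```python
from typing import Sequence

# Hypothetical labels for the classes of FM10 to FM14, in order.
CLASS_LABELS = ("0", "1", "2", "3", "4 or more")

def determine_count(scores: Sequence[float],
                    labels: Sequence[str] = CLASS_LABELS) -> str:
    """Return the label of the class with the highest score (first one on ties)."""
    if len(scores) != len(labels):
        raise ValueError("one score per class is required")
    best = max(range(len(scores)), key=lambda n: scores[n])
    return labels[best]

print(determine_count([0.12, 0.01, 0.03, 0.06, 0.75]))  # prints: 4 or more
```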

The determination unit 134 also determines the positions of the objects in the input image. It does so based on information about the feature values in the feature information corresponding to the determined number of objects. In the example of FIG. 1, the determination unit 134 determines the positions of the human faces in the image IM11 based on the feature information FM14, which has the highest score. For example, the determination unit 134 determines that human faces are contained in a region where feature values equal to or greater than a predetermined threshold are located. Here, it determines that human faces are contained in the region of the image IM11 corresponding to the region AR11 of the feature information FM14 (see FIG. 1). As a result, as shown in the position determination information AP11, the determination unit 134 determines that the human faces are located in the upper part of the image IM11.
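The threshold rule above can be sketched as follows: take the bounding box of all map cells at or above the threshold, scale it to image pixels, and label its vertical position. Several details are assumptions not given in the text:
- the threshold value;
- the feature map covering the whole image uniformly;
- the "thirds" rule used to produce a label like "upper part".

`feature_region` and `vertical_position` are hypothetical names.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels

def feature_region(fmap: Grid, threshold: float,
                   image_w: int, image_h: int) -> Optional[Box]:
    """Bounding box, in image pixels, of all map cells with value >= threshold.

    Assumes the feature map covers the whole image, so each cell maps to an
    (image_w / cols) x (image_h / rows) block of pixels.
    """
    rows, cols = len(fmap), len(fmap[0])
    hits = [(i, j) for i in range(rows) for j in range(cols) if fmap[i][j] >= threshold]
    if not hits:
        return None  # no region reaches the threshold
    top = min(i for i, _ in hits)
    bottom = max(i for i, _ in hits) + 1
    left = min(j for _, j in hits)
    right = max(j for _, j in hits) + 1
    sx, sy = image_w / cols, image_h / rows
    return (int(left * sx), int(top * sy), int(right * sx), int(bottom * sy))

def vertical_position(box: Box, image_h: int) -> str:
    """Coarse label (as in AP11) from the box centre relative to thirds of the image."""
    center = (box[1] + box[3]) / 2
    if center < image_h / 3:
        return "upper"
    if center < 2 * image_h / 3:
        return "middle"
    return "lower"
```

For a 3x4 map whose high values sit in the top row, a 400x300 image yields the box (100, 0, 300, 100), which is labelled "upper".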

The determination unit 134 may also determine the positions of the objects contained in the input image based on information about the feature values of each of a plurality of regions included in each piece of feature information. This case is described in detail later.

(Generation unit 135)
The generation unit 135 generates various types of information. For example, it generates an image based on:
- the plurality of pieces of feature information acquired by the acquisition unit 131, and
- the number, positions, and the like of the objects determined by the determination unit 134.

For example, the generation unit 135 crops the image IM11 to generate a processed image (hereinafter also referred to as a "cropped image"). Cropping here means cutting out a predetermined region from an image. The cropped image is distributed, for example, as an image in predetermined content; details are described later. When the determination device 100 only determines the number and positions of objects in an image, it need not include the generation unit 135.

(Providing unit 136)
The providing unit 136 provides (transmits) various types of information to an external information processing device. For example, it provides the processed images generated by the generation unit 135, such as the processed images IM12 to IM14 (see FIG. 6). It also provides information on the number and positions of the objects determined by the determination unit 134.

[3. Flow of the Process for Determining the Number of Objects in an Image]
Next, FIG. 5 is used to describe how the determination device 100 according to the embodiment determines the number of objects contained in an image. FIG. 5 is a flowchart showing an example of determining the number of objects according to the embodiment.

As shown in FIG. 5, the determination device 100 acquires an image (step S101); in FIG. 1, it acquires the image IM11. The determination device 100 then inputs the image acquired in step S101 to the learner (step S102); in FIG. 1, it inputs the acquired image IM11 to the learner LE.

Next, the determination device 100 acquires feature information corresponding to each of a plurality of classes (numbers) (step S103). In FIG. 1, the determination device 100 acquires from the learner LE the feature information FM10 to FM14, corresponding to the five face-count classes 0, 1, 2, 3, and 4 or more.

Next, the determination device 100 calculates a score for each class (number) based on the feature information corresponding to that class (step S104). In FIG. 1, it calculates scores for the five face-count classes 0, 1, 2, 3, and 4 or more from the feature information FM10 to FM14, respectively.

The determination device 100 then determines the number of objects contained in the image based on the score for each class (number) (step S105). In FIG. 1, it determines the number of human faces contained in the image IM11 based on the scores of the feature information FM10 to FM14.
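Steps S101 to S105 can be strung together as in the sketch below. It rests on three assumptions:
- the learner is modelled as a hypothetical callable that returns one 2-D map per class;
- `stub_learner` is a stand-in invented only for this demo;
- the score is the mean value, as in the FIG. 1 example.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]
# Hypothetical learner interface: takes an image, returns one feature map per class.
Learner = Callable[[object], List[Grid]]

def count_objects(image: object, learner: Learner, labels: Sequence[str]) -> str:
    # S101: `image` is the acquired image.
    feature_maps = learner(image)                    # S102-S103: input image, get per-class maps
    scores = [sum(v for row in m for v in row) / sum(len(row) for row in m)
              for m in feature_maps]                 # S104: mean value per class
    best = max(range(len(scores)), key=scores.__getitem__)  # S105: class with highest score
    return labels[best]

def stub_learner(image: object) -> List[Grid]:
    """Stand-in for the learner LE: fixed 1x2 maps for three classes."""
    return [[[0.0, 0.5]], [[0.25, 0.25]], [[1.0, 0.5]]]

print(count_objects("IM11", stub_learner, ["0", "1", "2"]))  # prints: 2
```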

[4. Generation of Processed Images]
Here, FIG. 6 is used to describe how the determination device 100 generates processed images. FIG. 6 is a diagram showing an example of generating processed images according to the embodiment. It shows image cropping based on the positions of human faces determined by the determination unit 134 of the determination device 100. Such cropping is performed, for example, by the generation unit 135 of the determination device 100.

The image IM11 shown in FIG. 6 is assumed to be the same as the image IM11 shown in FIG. 1. That is, for the image IM11 in FIG. 6, two determinations have already been made:
- human faces are contained in the region corresponding to the region AR11 of the feature information FM14 (see FIG. 1), hereinafter also referred to as the "feature region";
- the human faces are located in the upper part of the image IM11.

The example of FIG. 6 shows a case where the determination device 100 generates processed images IM12 to IM14 from the image IM11 for three aspect ratios. In each case, it crops the image IM11 so that the result has the requested aspect ratio and contains the feature region of the image IM11:
- for "1:1", the processed image IM12;
- for "2:1", the processed image IM13;
- for "4:3", the processed image IM14.

In this way, the determination device 100 can generate processed images IM12 to IM14 and the like for various aspect ratios. For example, when an aspect ratio is specified, the determination device 100 may generate from the image IM11 a processed image satisfying that aspect ratio.
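One way to obtain crops like IM12 to IM14 is sketched below, under assumptions the text does not fix:
- the crop is the smallest rectangle with the requested ratio that contains the feature region;
- the crop is centred on the region and shifted to stay inside the image;
- if the crop cannot fit, it is shrunk uniformly, in which case the region may be cut.

`aspect_crop` and the (left, top, right, bottom) pixel-box convention are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels

def aspect_crop(image_w: int, image_h: int, region: Box,
                aspect_w: int, aspect_h: int) -> Box:
    """Smallest aspect_w:aspect_h crop containing `region`, kept inside the image."""
    left, top, right, bottom = region
    if right <= left or bottom <= top:
        raise ValueError("region must have positive width and height")
    ratio = aspect_w / aspect_h
    w, h = right - left, bottom - top
    # Grow whichever side is too short to reach the target ratio.
    if w / h < ratio:
        w = h * ratio
    else:
        h = w / ratio
    # Shrink uniformly if the crop would not fit inside the image.
    scale = min(1.0, image_w / w, image_h / h)
    w, h = w * scale, h * scale
    # Centre on the region, then shift back inside the image bounds.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - w / 2, 0), image_w - w)
    y0 = min(max(cy - h / 2, 0), image_h - h)
    return (round(x0), round(y0), round(x0 + w), round(y0 + h))
```

Take a 400x300 image with the feature region (100, 0, 300, 100). The 2:1, 1:1, and 4:3 requests return (100, 0, 300, 100), (100, 0, 300, 200), and (100, 0, 300, 150). All three keep the upper region in frame.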

The processed images IM12 to IM14 shown in FIG. 6 are merely examples; the determination device 100 may generate any processed image as long as it satisfies the aspect ratio. For example, when an aspect ratio is specified, the determination device 100 may take a cutting frame that satisfies the aspect ratio and enlarge, reduce, and move it within the feature information FM14. It then crops the region where the average of the feature values inside the frame is maximized, producing a processed image with the specified aspect ratio. More generally, the determination device 100 may generate any processed image that contains the feature region of the image IM11 and satisfies the aspect ratio.
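The frame search described above can be sketched by brute force over map cells: enlarge, reduce, and move a frame of fixed aspect ratio over FM14, and keep the frame with the largest mean. Frame sizes are taken to be integer multiples of the ratio, measured in map cells; this simplification is an assumption. `best_window` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]

def best_window(fmap: Grid, aspect_w: int,
                aspect_h: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), in map cells, of the aspect_w:aspect_h
    frame with the largest mean feature value, over all scales and positions."""
    rows, cols = len(fmap), len(fmap[0])
    best, best_mean = None, float("-inf")
    k = 1
    while k * aspect_w <= cols and k * aspect_h <= rows:  # enlarge the frame
        w, h = k * aspect_w, k * aspect_h
        for top in range(rows - h + 1):                    # move the frame
            for left in range(cols - w + 1):
                total = sum(fmap[i][j] for i in range(top, top + h)
                            for j in range(left, left + w))
                mean = total / (w * h)
                if mean > best_mean:
                    best, best_mean = (left, top, left + w, top + h), mean
        k += 1
    return best  # None if no frame of this ratio fits in the map
```

A plain mean tends to favour small frames centred on peaks. A practical version would therefore probably impose a minimum frame size, for example one covering the whole feature region; that constraint is an assumption, not something the text specifies.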

The example shown in FIG. 6 also shows a case where the processed image IM13, which has an aspect ratio of "2:1" and was generated by the determination device 100, is displayed on the terminal device 10 used by the user U1. In this example, the terminal device 10 displays contents CT11 to CT14 distributed from a predetermined content distribution device. When the determination device 100 itself distributes content, the contents CT11 to CT14 may be transmitted from the determination device 100 to the terminal device 10.

In the example shown in FIG. 6, the processed image IM13 is used as the image of the content CT14. In this way, the processed image IM13 may serve as the image of content displayed side by side in the scroll direction on the terminal device 10. Processed images generated by the determination device 100 are displayed on various terminal devices 10 such as smartphones. As described above, the determination device 100 can generate processed images for various aspect ratios, so it can generate appropriate processed images regardless of the type of terminal device 10.

[5. Flow of Image Processing]
Next, FIG. 7 is used to describe the image processing procedure of the determination device 100 according to the embodiment. FIG. 7 is a flowchart showing an example of generating a processed image according to the embodiment.

As shown in FIG. 7, the determination device 100 acquires an image and the feature information corresponding to the number of objects in the image (step S201). In FIG. 1, the determination device 100 acquires the image IM11 and the feature information FM14 corresponding to the number of objects (4 or more) in the image IM11.

Next, the determination device 100 acquires an aspect ratio, for example "2:1" (step S202). It then generates a cropped image that contains the objects in the image, based on the aspect ratio and the feature information (step S203). For example, from the aspect ratio "2:1" and the feature information FM14, it generates the processed image IM13 containing the feature region of the image IM11.

[6. Learning Process]
Here, FIGS. 8 and 9 are used to describe the learning process in the learning unit 132 of the determination device 100. FIGS. 8 and 9 are diagrams showing an example of the learning process according to the embodiment.

First, the learner LE used by the determination device 100 is briefly described. The learner LE is, for example, a learner in which a plurality of nodes are connected in multiple layers, each node outputting a calculation result for its input data. It has learned abstracted features of images through supervised learning. For example, the learner LE is a neural network in which layers, each with a plurality of nodes, are connected in multiple stages. It may be a DNN (Deep Neural Network) realized by so-called deep learning techniques. "Image features" here may include concrete features that appear in an image, such as the presence or absence of characters, colors, and composition. They may also include abstracted (meta-level) features, such as what the captured object is, what kind of users would like the image, and the atmosphere of the image.

For example, the learner LE is generated by deep learning techniques using the following method:
1. The connection coefficients between the nodes of the learner are initialized, and images with various features are input.
2. Processing such as backpropagation (the error backpropagation method) corrects the parameters (connection coefficients) so as to reduce the error between the output of the learner and the input image. For example, backpropagation is performed so as to minimize a predetermined loss function, such as an error function.
3. This processing is repeated.

Through this repetition, the learner becomes able to produce an output that better reproduces the input image, that is, an output representing the features of the input image.

The learning method is not limited to the method described above, and any known technique may be applied. The information used to train the learner may be any of various image data sets, such as images paired with the number of objects they contain. Such information may include, for example:
- an image containing one object, with information indicating that there is one object;
- an image containing a plurality of objects (for example, two), with information indicating that number (for example, two);
- an image containing no object, with information indicating that no object (zero objects) is contained.

Any method may likewise be used for inputting images to the learner, for the format of the data it outputs, and for the content of the features it is explicitly made to learn. That is, the determination device 100 may use any learner that can calculate, from an image, feature values indicating abstracted features.
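The teacher data described above pairs each image with its object count. The sketch below shows one way to turn such pairs into training targets for the five classes used in FIG. 1 (0, 1, 2, 3, 4 or more). Two choices are assumptions: merging every count of four or more into the last class, and using one-hot target vectors. The function names and image identifiers are hypothetical.

```python
from typing import List, Tuple

def count_to_class(count: int, max_class: int = 4) -> int:
    """Map an object count to a class index; counts >= max_class share the last class."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return min(count, max_class)

def one_hot(class_index: int, num_classes: int = 5) -> List[float]:
    """Target distribution with probability 1 on the correct class."""
    return [1.0 if n == class_index else 0.0 for n in range(num_classes)]

def make_teacher_data(samples: List[Tuple[str, int]]) -> List[Tuple[str, List[float]]]:
    """(image_id, face_count) pairs -> (image_id, target distribution) pairs."""
    return [(image_id, one_hot(count_to_class(c))) for image_id, c in samples]
```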

In FIG. 1, the determination device 100 uses a learner LE based on a so-called convolutional neural network (CNN), which repeats convolution and pooling over local regions of the input image. Besides extracting and outputting features from an image, a CNN-based learner LE has output invariance with respect to positional variation of characters, imaged objects, and the like in the image. The learner LE can therefore calculate abstracted features of the image with high accuracy.
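The sketch below is a minimal plain-Python illustration of one convolution-plus-pooling stage. It is not the learner LE itself, whose layers and weights are not given. It does show the positional invariance mentioned above: shifting a pattern by one pixel within a pooling window leaves the pooled output unchanged. The function names, the 2x2 kernel, and the pooling size are illustrative assumptions.

```python
from typing import List

Grid = List[List[float]]

def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """2-D 'valid' convolution, computed as cross-correlation as in most CNN libraries."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap: Grid, size: int = 2) -> Grid:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
```

For a 5x5 image with a single bright pixel at (0, 0) or at (1, 1), convolving with a 2x2 kernel of ones and then max pooling gives the same 2x2 output in both cases.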

First, FIG. 8 is described. In the example shown in FIG. 8, the determination device 100 acquires a combination of two items as teacher data (step S21): an image IM21 containing two human faces, and information RO21 indicating the number of faces in the image IM21 (hereinafter sometimes referred to as "correct answer information RO21"). In FIG. 8, the correct answer information RO21 includes information indicating that the image IM21 contains two faces. Specifically, because the image IM21 contains two faces, RO21 states that the probability of the image containing two faces is "1 (100%)".

The image IM21 containing two human faces is then input to the learner LE (step S22). The learner LE outputs information indicating the probability of each number of objects, as shown in the output information OC21-1 (step S23). In FIG. 8, the output information OC21-1 gives the following probabilities for the number of faces in the image IM21:
- 0 faces: "0.05 (5%)"
- 1 face: "0.3 (30%)"
- 2 faces: "0.5 (50%)"
- 3 faces: "0.1 (10%)"
- 4 or more faces: "0.05 (5%)"

As described above, the learning unit 132 trains and generates the learner LE by, for example, deep learning techniques. The learning unit 132 uses combinations of an image and the number of predetermined objects in that image as teacher data. It trains the learner LE by processing such as backpropagation, which corrects the parameters (connection coefficients) to reduce the error between two quantities: the output of the learner LE and the probability of each number of objects given in the teacher data. For example, the learning unit 132 generates the learner LE by performing backpropagation or similar processing so as to minimize a predetermined error (loss) function.

For example, the learning unit 132 uses an error function L as shown in the following equation (1). For an N-class classification problem, for example, the learning unit 132 uses cross entropy as the error function, as in equation (1). The error function L may be any function that represents the confidence of the identification result. For example, it may be the entropy obtained from the identification probabilities. More generally, it may be any function that indicates the recognition accuracy of the learner LE.
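The equation images for (1) to (3) are not reproduced in this text. The form below is a reconstruction based on the surrounding definitions, not the original figure. It assumes the standard cross entropy over the classes n = 0, ..., N, where t_n(x) is the target probability and p_n(x) is the learner's probability.

```latex
L(x) = -\sum_{n=0}^{N} t_n(x) \log p_n(x) \qquad (1)
```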

Here, "x" in equation (1) above and in equations (2) and (3) below denotes an image; in the example shown in FIG. 8, "x" corresponds to the image IM21. The values 0 to N assigned to the variable "n" correspond to the classes identified (classified) by the learner LE. For example, the learner LE corresponding to equation (1) identifies N classes. Each class corresponds to a number of objects, such as "1" or "2".

In equations (1) and (3), t_n(x) denotes the probability of the number of objects corresponding to class n (any of 0 to N) for the image IM21. For example, t_n(x) in equation (1) is that probability as given in the correct answer information RO21. If class 0 corresponds to "0" objects, t_0(x) is "0 (0%)". If class 2 corresponds to "2" objects, t_2(x) is "1 (100%)".

In equations (1), (2), and (3), p_n(x) denotes the probability, based on the output of the learner LE, of the number of objects corresponding to class n (any of 0 to N) for the image IM21. For example, p_n(x) in equation (1) is that probability as output by the learner LE, shown in the output information OC21-1. If class 1 corresponds to "1" object, p_1(x) is "0.3 (30%)".
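Plugging the FIG. 8 values into a cross-entropy error gives a concrete number. The target is the one-hot correct answer information RO21 (class 2) and the output is OC21-1. Only the class-2 term is non-zero, so L = -log 0.5, about 0.693. The sketch below assumes the natural logarithm, since the text does not state the base.

```python
import math
from typing import Sequence

def cross_entropy(t: Sequence[float], p: Sequence[float]) -> float:
    """L(x) = -sum_n t_n(x) * log p_n(x); classes with t_n(x) = 0 contribute nothing."""
    return -sum(tn * math.log(pn) for tn, pn in zip(t, p) if tn > 0)

t = [0.0, 0.0, 1.0, 0.0, 0.0]    # correct answer information RO21: two faces
p = [0.05, 0.3, 0.5, 0.1, 0.05]  # output information OC21-1
loss = cross_entropy(t, p)       # equals -log(0.5) = log(2)
```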

p_n(x) in equation (1) is the probability of class n for x. It is defined by the softmax function shown in the following equation (2).
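The form below is reconstructed under the same assumption. It is a standard softmax over the class scores f_n output by the CNN with parameters θ; the exact notation of the original figure may differ.

```latex
p_n(x) = \frac{\exp\bigl(f_n(x;\theta)\bigr)}{\sum_{m=0}^{N} \exp\bigl(f_m(x;\theta)\bigr)} \qquad (2)
```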

The function f_n in equation (2) is the score of class n output by the CNN (learner LE), θ denotes the parameters of the CNN, and exp is the exponential function. The gradient of the error function L in equation (1) is then calculated by equation (3).
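A minimal sketch of equation (2) in Python follows. It assumes the class scores f_n(x; θ) are already available as a plain list. The function name softmax and the example scores are illustrative only; they are not taken from the patent.

```python
import math

def softmax(scores):
    """Convert class scores f_0..f_N into probabilities p_0..p_N (equation (2))."""
    # Subtracting the maximum score before exponentiating avoids overflow.
    # The shift cancels in the ratio, so the probabilities are unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the classes 0, 1, 2, 3 and "4 or more" faces.
probs = softmax([0.1, 1.2, 2.5, 0.3, -1.0])
print([round(p, 3) for p in probs])
```

The outputs are non-negative and sum to 1, and the class with the highest score receives the highest probability.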

As equation (3) shows, when p_n(x) = t_n(x) for every class n, the gradient of the error function L(x) is 0 and L(x) reaches an extremum. The learning unit 132 therefore performs feedback processing that drives the gradient of L(x) toward 0 (step S24). By repeating this processing, the learning unit 132 trains the learner LE to output appropriate information about the number of objects in an input image. FIG. 8 merely visualizes the repeated minimization of the error function L that brings the output of the learner LE closer to the correct answer information RO21; in practice, this may be performed automatically inside the learner LE.
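Equation (1) is not reproduced in this excerpt. The sketch below therefore assumes it is the usual softmax cross-entropy, L(x) = -Σ_n t_n(x) log p_n(x). Under that assumption, the gradient with respect to each score f_n is p_n(x) - t_n(x). That gradient is zero exactly when p_n(x) = t_n(x) for all n, which matches the condition stated above. All function names are hypothetical, and the central-difference check only confirms the algebra.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    # Assumed form of error function (1): L = -sum_n t_n * log p_n.
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def grad_wrt_scores(target, scores):
    # For softmax followed by cross-entropy, dL/df_n = p_n - t_n.
    probs = softmax(scores)
    return [p - t for p, t in zip(probs, target)]

def numeric_grad(target, scores, h=1e-6):
    # Central differences, used only to cross-check the analytic gradient.
    grads = []
    for i in range(len(scores)):
        up = list(scores)
        up[i] += h
        down = list(scores)
        down[i] -= h
        diff = cross_entropy(target, softmax(up)) - cross_entropy(target, softmax(down))
        grads.append(diff / (2 * h))
    return grads

scores = [0.1, 1.2, 2.5, 0.3, -1.0]
target = [0.0, 0.0, 1.0, 0.0, 0.0]  # e.g. RO21: exactly two faces
print(grad_wrt_scores(target, scores))
```

With a one-hot target such as RO21, finite scores can only approach the target, so in practice the gradient tends toward zero rather than reaching it.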

Next, FIG. 9 will be described. In the example of FIG. 9, the determination device 100 uses as teacher data an image IM31 that contains two whole human faces and one half face; that is, the image IM31 contains 2.5 faces.

In the example of FIG. 9, the determination device 100 acquires as teacher data the combination of the image IM31 and information RO31 indicating the number of faces in the image IM31 (hereinafter also referred to as "correct answer information RO31") (step S31). In FIG. 9, the correct answer information RO31 indicates that the image IM31 contains 2.5 faces. Specifically, RO31 assigns a probability of 0.5 (50%) to the image containing two faces and a probability of 0.5 (50%) to it containing three faces.

This distribution satisfies 2.5 (faces) = 2 (faces) × 0.5 + 3 (faces) × 0.5. The device learns from such teacher data. As a result, it can appropriately estimate the number of objects in an image from the probability of each count class and the count itself, even when a face is only partially contained in the image.
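The patent gives only the single example 2.5 → (0.5 on "2", 0.5 on "3"). The sketch below generalizes that example under an assumption of ours: a fractional count is split linearly between its two neighbouring integer classes. The helper names soft_count_target and expected_count are hypothetical. An integer count, as in RO21, yields a one-hot target.

```python
import math

def soft_count_target(count, num_classes):
    """Return t_0..t_{num_classes-1} for a possibly fractional object count.

    The last class stands for "num_classes - 1 or more", so larger counts
    are clipped to it. A fractional count is split linearly between its
    two neighbouring integers, e.g. 2.5 -> 0.5 on class 2 and 0.5 on class 3.
    """
    top = num_classes - 1
    count = min(max(count, 0.0), float(top))
    low = math.floor(count)
    frac = count - low
    target = [0.0] * num_classes
    target[low] += 1.0 - frac
    if frac > 0:
        target[low + 1] += frac
    return target

def expected_count(target):
    # Sum of (class count) x (probability), as in 2.5 = 2 x 0.5 + 3 x 0.5.
    return sum(n * p for n, p in enumerate(target))

ro31 = soft_count_target(2.5, 5)  # classes 0, 1, 2, 3, "4 or more"
print(ro31, expected_count(ro31))
```

Linear splitting is the simplest rule that keeps the expected count equal to the annotated count. Other splits would also be consistent with the patent's single example.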

The image IM31, containing 2.5 faces, is then input to the learner LE (step S32). The learner LE outputs the probability of each number of objects, shown as the output information OC31-1 (step S33). In FIG. 9, OC31-1 gives the following probabilities for the number of faces in the image IM31:
- 0 faces: 0 (0%)
- 1 face: 0.1 (10%)
- 2 faces: 0.4 (40%)
- 3 faces: 0.4 (40%)
- 4 or more faces: 0.1 (10%)
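The comparison between OC31-1 and RO31 can be worked through numerically. This sketch rests on three assumptions:
- Equation (1) is a cross-entropy.
- OC31-1 is the softmax output of the learner.
- The class "4 or more" is counted as 4 when computing an expected count.

The values of RO31 and OC31-1 are taken from FIG. 9.

```python
import math

# RO31 (ground truth) and OC31-1 (learner output) for classes 0, 1, 2, 3, "4 or more".
target = [0.0, 0.0, 0.5, 0.5, 0.0]
output = [0.0, 0.1, 0.4, 0.4, 0.1]

def cross_entropy(t, p, eps=1e-12):
    # Assumed form of error function (1): L = -sum_n t_n * log p_n.
    return -sum(tn * math.log(pn + eps) for tn, pn in zip(t, p))

def expected_count(p, counts=(0, 1, 2, 3, 4)):
    # "4 or more" is counted as 4 (an assumption for this illustration).
    return sum(pn * n for pn, n in zip(p, counts))

loss = cross_entropy(target, output)
# Gradient of the assumed loss w.r.t. the class scores f_n, if output = softmax(f).
grad = [pn - tn for pn, tn in zip(output, target)]
print(round(loss, 4), [round(g, 2) for g in grad], round(expected_count(output), 2))
```

OC31-1 already has an expected count of 2.5, yet the gradient is not zero. The feedback in step S34 still moves probability mass from the classes "1" and "4 or more" into "2" and "3". With a soft target the loss cannot fall to 0; its minimum is the entropy of RO31, ln 2.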

In the example of FIG. 9, as in FIG. 8, the learning unit 132 uses equations (1) to (3) to perform feedback processing that drives the gradient of the error function L(x) toward 0 (step S34). By repeating this processing, the learner LE can appropriately output the number of objects in an input image even when there are multiple objects. FIG. 9 likewise merely visualizes the repeated minimization of the error function L that brings the output of the learner LE closer to the correct answer information RO31; in practice, this may be performed automatically inside the learner LE.

[7. Learning process flow]
Next, the procedure of the learning process performed by the determination device 100 according to the embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the learning process according to the embodiment.

As shown in FIG. 10, the determination device 100 acquires an image and correct answer information about the number of objects in the image (step S301). In FIG. 8, the determination device 100 acquires the image IM21, which contains two human faces, and the correct answer information RO21 indicating the number of faces in that image. The determination device 100 then inputs the image acquired in step S301 into the learner (step S302); in FIG. 8, it inputs the image IM21 into the learner LE.

The determination device 100 then learns so as to reduce the error between two sets of probabilities: the probability of each number of faces given by the learner's output, and the corresponding probability in the correct answer information (step S303). In FIG. 8, it learns from the probabilities in the output information OC21-1 and those in the correct answer information RO21.

When a predetermined condition is satisfied (step S304: Yes), the determination device 100 ends the process. For example, the condition may be regarded as satisfied in either of two cases:
- The error between the output-based probabilities and the correct-answer probabilities is within a predetermined threshold.
- A predetermined time has elapsed since learning started.

When the condition is not satisfied (step S304: No), the determination device 100 repeats step S303. This happens, for example, when the error exceeds the threshold and the predetermined time has not yet elapsed. The above learning process is one example, and the determination device 100 may learn by various other procedures.
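Steps S303 and S304 can be sketched as a loop with the two stopping conditions described above: an error threshold and an elapsed-time budget. This is a toy under explicit assumptions:
- The learnable parameters are the class scores themselves, standing in for the CNN parameters θ.
- The update uses the gradient p - t of an assumed softmax cross-entropy.
- The stopping error is the L1 distance between the probability vectors, a hypothetical choice since the patent does not name the measure.

The threshold, learning rate and time budget are illustrative values. The max_iters cap is an extra safeguard that is not in the flowchart.

```python
import math
import time

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def l1_error(p, t):
    # Hypothetical error measure used for the stopping test (step S304).
    return sum(abs(a - b) for a, b in zip(p, t))

def train(target, lr=1.0, threshold=0.05, time_budget_s=5.0, max_iters=10_000):
    """Toy version of steps S303-S304 on directly learnable class scores."""
    scores = [0.0] * len(target)
    start = time.monotonic()
    for _ in range(max_iters):
        probs = softmax(scores)
        err = l1_error(probs, target)
        if err <= threshold:                          # S304: Yes (error small enough)
            return scores, err, "converged"
        if time.monotonic() - start > time_budget_s:  # S304: Yes (time elapsed)
            return scores, err, "time_budget"
        # S304: No -> repeat S303 with one gradient step on the assumed loss.
        scores = [s - lr * (p - t) for s, p, t in zip(scores, probs, target)]
    return scores, l1_error(softmax(scores), target), "max_iters"

scores, err, reason = train([0.0, 0.0, 0.5, 0.5, 0.0])  # RO31 as the target
print(reason, round(err, 3))
```

In a real system the gradient would be back-propagated into the CNN parameters θ rather than applied to the scores directly.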

[8. Image division]
In the examples above, the determination device 100 processes the entire image, but it may instead divide the image into a plurality of ranges and process each range. This is described with reference to FIG. 11, which shows an example of the determination process according to the embodiment.

The process by which the determination device 100 determines the number of objects in an image is described below with reference to FIG. 11. As shown in FIG. 11, an image IM11 is input to the determination device 100 (step S40). For example, the determination device 100 acquires an image IM11 showing five people, that is, five faces, as targets. The determination device 100 then inputs the image IM11 into a predetermined learner (step S41).

Here, the determination device 100 divides the image IM11 into four ranges before inputting it into the learner LE:
- an image IM11-1 for the upper-left range,
- an image IM11-2 for the upper-right range,
- an image IM11-3 for the lower-left range,
- an image IM11-4 for the lower-right range.

Although FIG. 11 shows this as a single step S41, the images IM11-1 to IM11-4 may be input to the learner LE individually.
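The division in step S41 can be sketched as a grid split. The sketch assumes the image is given as a list of pixel rows. Boxes are (top, left, bottom, right) with exclusive bottom and right edges, listed row by row, so a 2×2 grid yields upper left, upper right, lower left and lower right. The function names are hypothetical.

```python
def grid_regions(height, width, rows=2, cols=2):
    """Split a height x width image into rows x cols boxes (top, left, bottom, right).

    Integer division spreads any leftover pixels so that the boxes tile the
    image exactly, even when the size is not divisible by rows or cols.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((top, left, bottom, right))
    return boxes

def crop(image, box):
    # image is a list of pixel rows; returns the sub-image inside box.
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

image = [[r * 10 + c for c in range(4)] for r in range(4)]
im11_1, im11_2, im11_3, im11_4 = (crop(image, b) for b in grid_regions(4, 4))
print(im11_2)  # upper-right quarter
```

Calling grid_regions with rows=cols=3 or rows=cols=4 gives the 9- and 16-area divisions mentioned later in this section.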

The learner LE, given the images IM11-1 to IM11-4, performs processing to identify the number of objects in each of them. In the course of that processing it generates feature information for each class (number). In the example of FIG. 11, the learner LE generates, for each of the images IM11-1 to IM11-4, feature information for each of the five face-count classes: 0, 1, 2, 3, and 4 or more.

The determination device 100 therefore acquires, for each of the images IM11-1 to IM11-4, the feature information generated while the learner LE identifies the number of faces in that image. In the example of FIG. 11, for the class of 0 faces, it acquires feature information FM40-1, FM40-2, FM40-3, and FM40-4 from the learner LE for the images IM11-1, IM11-2, IM11-3, and IM11-4, respectively. Hereinafter, FM40-1 to FM40-4 may be collectively referred to as feature information FM40. In this way, the determination device 100 acquires from the learner LE the feature information FM40 of the image IM11 for the class of 0 faces (step S42-0).

The feature information FM40 gives a feature amount, a numerical value, for each pixel of the image IM11. When FM40 is superimposed on the image IM11, each of its points corresponds to the pixel it overlaps, and its value is that pixel's feature amount. In FIG. 11, larger feature amounts are drawn darker, so the regions of FM40 where faces are located in the image IM11 appear dark. The same applies to the other feature information.

Similarly, for the class of 1 face, the determination device 100 acquires feature information FM41-1 to FM41-4 from the learner LE for the images IM11-1 to IM11-4, respectively (collectively, feature information FM41) (step S42-1).

For the class of 2 faces, the determination device 100 acquires feature information FM42-1 to FM42-4 from the learner LE for the images IM11-1 to IM11-4, respectively (collectively, feature information FM42) (step S42-2).

For the class of 3 faces, the determination device 100 acquires feature information FM43-1 to FM43-4 from the learner LE for the images IM11-1 to IM11-4, respectively (collectively, feature information FM43) (step S42-3).

For the class of 4 or more faces, the determination device 100 acquires feature information FM44-1 to FM44-4 from the learner LE for the images IM11-1 to IM11-4, respectively (collectively, feature information FM44) (step S42-4). Hereinafter, steps S42-0 to S42-4 may be referred to collectively as step S42.

The determination device 100 then calculates a score for each item of the feature information FM40 to FM44 acquired in step S42. For the feature information FM40, it calculates the score of each of FM40-1 to FM40-4 from the feature amounts in that item (step S43-0).
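The patent leaves open how a score is computed from the feature amounts. One plausible choice is global average pooling, as popularized by the Network In Network paper cited in the background. The sketch below uses it as an assumption, not as the patent's stated method. It is not claimed to reproduce the exact scores in FIG. 11. The function names are hypothetical, and feature maps are plain lists of rows.

```python
def global_average_pool(feature_map):
    """Average all feature amounts in a 2-D feature map (list of rows)."""
    total = sum(sum(row) for row in feature_map)
    count = sum(len(row) for row in feature_map)
    return total / count if count else 0.0

def region_scores(class_maps):
    """Score each class map of one region, e.g. FM40-1 ... FM44-1 for IM11-1."""
    return [global_average_pool(m) for m in class_maps]

# Hypothetical 2x2 maps for two classes of one region.
print(region_scores([[[0.0, 0.0], [0.0, 0.0]], [[0.2, 0.6], [1.0, 0.2]]]))
```

A real system might additionally normalize the pooled values across classes, for example with a softmax, so that they can be read as probabilities.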

In the example of FIG. 11, as shown in the score information SC40, the scores for the class of 0 faces are 0 for FM40-1, 0 for FM40-2, 0.9 for FM40-3, and 0.8 for FM40-4.

Likewise, the determination device 100 calculates the scores of FM41-1 to FM41-4 from their feature amounts (step S43-1). As shown in the score information SC41, the scores for the class of 1 face are 0, 0, 0.05, and 0.1, respectively.

It calculates the scores of FM42-1 to FM42-4 from their feature amounts (step S43-2). As shown in the score information SC42, the scores for the class of 2 faces are 0.2, 0.8, 0, and 0, respectively.

It calculates the scores of FM43-1 to FM43-4 from their feature amounts (step S43-3). As shown in the score information SC43, the scores for the class of 3 faces are 0.8, 0.2, 0, and 0, respectively.

It calculates the scores of FM44-1 to FM44-4 from their feature amounts (step S43-4). As shown in the score information SC44, the scores for the class of 4 or more faces are 0, 0, 0, and 0, respectively.

The determination device 100 then determines the number of faces in each range using equation (4):

$$\mathrm{Num} = \sum_{c} p_c \, n_c \qquad (4)$$

In equation (4), Num is the number of faces to be determined, p_c is the score (probability) of each number, and n_c is the number corresponding to each class. The index c ranges over the five face-count classes: 0, 1, 2, 3, and 4 or more.
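Equation (4) is a probability-weighted sum and translates directly into code. Following equation (5), the open-ended class "4 or more" is counted as 4. With that convention, the result is a lower bound whenever that class carries weight. The names CLASS_COUNTS and expected_count are illustrative.

```python
# Class counts n_c for the five classes "0", "1", "2", "3" and "4 or more".
# As in equation (5), the open-ended class "4 or more" is counted as 4.
CLASS_COUNTS = [0, 1, 2, 3, 4]

def expected_count(scores, counts=CLASS_COUNTS):
    """Num = sum_c p_c * n_c (equation (4))."""
    if len(scores) != len(counts):
        raise ValueError("one score per class is required")
    return sum(p * n for p, n in zip(scores, counts))

# Scores of IM11-1 (upper left) and IM11-2 (upper right) from FIG. 11.
print(round(expected_count([0, 0, 0.2, 0.8, 0]), 6))  # 2.8
print(round(expected_count([0, 0, 0.8, 0.2, 0]), 6))  # 2.2
```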

Consider first the image IM11-1, corresponding to the upper-left range of the image IM11. Applying equation (4), the number of faces in IM11-1 is calculated as in equation (5):

$$2.8 = 0 \times 0 + 0 \times 1 + 0.2 \times 2 + 0.8 \times 3 + 0 \times 4 \qquad (5)$$

Each term on the right-hand side of equation (5) corresponds to one class and multiplies that class's score by its number of faces:
- the first term is the class of 0 faces (score 0),
- the second is 1 face (score 0),
- the third is 2 faces (score 0.2),
- the fourth is 3 faces (score 0.8),
- the fifth is 4 or more faces, counted as 4 (score 0).

From equation (5), the determination device 100 determines that the image IM11-1 contains 2.8 faces.

Next, for the image IM11-2, corresponding to the upper-right range of the image IM11, equation (4) gives the number of faces as in equation (6):

$$2.2 = 0 \times 0 + 0 \times 1 + 0.8 \times 2 + 0.2 \times 3 + 0 \times 4 \qquad (6)$$

From equation (6), the determination device 100 determines that the image IM11-2 contains 2.2 faces. For the images IM11-3 and IM11-4, the scores of the feature information for one or more faces are below a predetermined threshold, so the determination device 100 need not apply equation (4) to them.

Thus, in the example of FIG. 11, the determination device 100 determines that the upper-left range of the image IM11 contains 2.8 faces and the upper-right range contains 2.2 faces. The determination device 100 can therefore appropriately determine where in the image IM11 the faces are located.

In addition, the determination device 100 adds the 2.8 faces from equation (5) to the 2.2 faces from equation (6) and determines that the image IM11 as a whole contains 5 faces. In this way, the number of faces in the entire image IM11 can be determined from the numbers determined for the images IM11-1 to IM11-4 corresponding to the individual ranges.
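The whole region-level computation can be put together in one sketch: apply equation (4) per range, skip ranges unlikely to contain faces, and add the results. The scores are the values of SC40 to SC44 in FIG. 11. We read the skip rule as "the total score for one or more faces is below the threshold". That reading and the threshold value 0.5 are assumptions. The name count_faces is hypothetical.

```python
CLASS_COUNTS = [0, 1, 2, 3, 4]  # "4 or more" counted as 4, as in equation (5)

# Per-range scores from FIG. 11, in the class order 0, 1, 2, 3, "4 or more".
SCORES = {
    "IM11-1 (upper left)": [0.0, 0.0, 0.2, 0.8, 0.0],
    "IM11-2 (upper right)": [0.0, 0.0, 0.8, 0.2, 0.0],
    "IM11-3 (lower left)": [0.9, 0.05, 0.0, 0.0, 0.0],
    "IM11-4 (lower right)": [0.8, 0.1, 0.0, 0.0, 0.0],
}

def count_faces(scores_by_region, threshold=0.5):
    """Apply equation (4) per range and sum the results over the image.

    A range whose total score for one or more faces is below `threshold`
    is not evaluated and contributes 0 (threshold value is hypothetical).
    """
    per_region = {}
    for name, scores in scores_by_region.items():
        if sum(scores[1:]) < threshold:
            per_region[name] = 0.0
            continue
        per_region[name] = sum(p * n for p, n in zip(scores, CLASS_COUNTS))
    return per_region, sum(per_region.values())

per_region, total = count_faces(SCORES)
for name, num in per_region.items():
    print(f"{name}: {num:.1f}")
print(f"total: {total:.1f}")
```

This prints 2.8 and 2.2 for the upper ranges, 0.0 for the lower ranges, and a total of 5.0, matching the worked example. The per-range values also indicate where the faces are, namely in the upper half of IM11.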

The example of FIG. 11 divides the image into four areas (upper left, upper right, lower left, and lower right), but the image may be divided into other numbers of areas, such as 9 or 16.
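A minimal sketch of dividing an image into an n × n grid of areas, assuming each area is described as a pixel rectangle; the function name `grid_tiles` is illustrative and does not come from the source. With n = 2 the areas come out in the order upper left, upper right, lower left, lower right; n = 3 and n = 4 give the 9-area and 16-area variants.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def grid_tiles(width: int, height: int, n: int) -> List[Box]:
    """Split a width x height image into an n x n grid, listed row by row
    starting at the upper left. Integer boundaries make the tiles cover the
    image exactly even when width or height is not divisible by n."""
    xs = [width * i // n for i in range(n + 1)]
    ys = [height * j // n for j in range(n + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1]) for j in range(n) for i in range(n)]


for box in grid_tiles(640, 480, 2):
    print(box)
```

Each area can then be scored independently, as in the per-range counting described above.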

[9. Effects]
As described above, the determination device 100 according to the embodiment includes an acquisition unit 131 and a determination unit 134. The acquisition unit 131 acquires a plurality of pieces of feature information based on an input image (the "image IM11" in the embodiment; the same applies hereinafter) input to a neural network (the "learner LE" in the embodiment; the same applies hereinafter) that identifies the number of objects in an image, the pieces of feature information corresponding to the respective numbers identified by the neural network. The determination unit 134 determines the number of objects included in the input image based on the plurality of pieces of feature information acquired by the acquisition unit 131.

As a result, the determination device 100 according to the embodiment can appropriately determine the number of objects included in an image by using information in the neural network.

Further, in the determination device 100 according to the embodiment, the determination unit 134 determines the position of the objects in the input image based on information on the feature amounts in the feature information corresponding to the determined number of objects.

As a result, the determination device 100 according to the embodiment determines the positions of the human faces included in the input image based on the feature information corresponding to the determined number of faces. The determination device 100 can therefore appropriately determine the positions of objects included in an image.
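One way to read "position based on the feature information corresponding to the determined number" is sketched below: take the feature map for the determined count and report the extent of the cells whose feature amount reaches a threshold. The map layout, the threshold, and the name `locate` are assumptions for illustration; the text does not fix how the position is derived.

```python
from typing import List, Optional, Sequence, Tuple

FeatureMap = Sequence[Sequence[float]]  # H x W grid of feature amounts


def locate(feature_maps: Sequence[FeatureMap], count: int,
           threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top_row, left_col, bottom_row, right_col), inclusive, of the
    cells whose feature amount is >= threshold in the map for `count`.
    Returns None when no cell reaches the threshold."""
    fmap = feature_maps[count]
    cells: List[Tuple[int, int]] = [
        (r, c) for r, row in enumerate(fmap) for c, v in enumerate(row) if v >= threshold
    ]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical 3x3 feature maps for face counts 0, 1 and 2.
maps = [
    [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
    [[0.2, 0.3, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.1]],
    [[0.9, 0.8, 0.0], [0.6, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
determined_count = 2  # assumed to come from the determination unit
print(locate(maps, determined_count, threshold=0.5))
```

The box is in feature-map cells; mapping it back to input pixels requires knowing how much of the image each cell covers.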

Further, in the determination device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of feature information based on an input image input to a neural network that identifies the number of human faces in an image.

As a result, the determination device 100 according to the embodiment can appropriately determine the number of human faces included in an image by using information in the neural network.

Further, the determination device 100 according to the embodiment includes a calculation unit 133. The calculation unit 133 calculates scores based on the plurality of pieces of feature information. The determination unit 134 determines the number of objects included in the input image based on the scores calculated by the calculation unit 133.

As a result, by determining the number of objects included in an image based on the calculated scores, the determination device 100 according to the embodiment can appropriately determine that number using information in the neural network.

Further, in the determination device 100 according to the embodiment, the calculation unit 133 calculates a score for each number based on the feature information corresponding to that number. The determination unit 134 determines the number of objects included in the input image based on the per-number scores calculated by the calculation unit 133.

As a result, by determining the number of objects included in an image based on the calculated score for each number, the determination device 100 according to the embodiment can appropriately determine that number using information in the neural network.
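A sketch of per-number scoring, assuming each number has one feature map and that its score is the map's global average followed by a softmax across numbers, in the manner of the global average pooling described in the cited "Network In Network" paper. The source does not state this exact formula; the maps and function names are hypothetical.

```python
import math
from typing import List, Sequence

FeatureMap = Sequence[Sequence[float]]  # H x W grid of feature amounts


def count_scores(feature_maps: Sequence[FeatureMap]) -> List[float]:
    """One score per count: global average of each map, then softmax across counts."""
    pooled = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
              for fmap in feature_maps]
    peak = max(pooled)  # subtract the maximum for numerical stability
    exps = [math.exp(p - peak) for p in pooled]
    total = sum(exps)
    return [e / total for e in exps]


def determine_count(scores: Sequence[float]) -> int:
    """The count whose score is largest."""
    return max(range(len(scores)), key=lambda k: scores[k])


maps = [
    [[0.0, 0.0], [0.0, 0.0]],  # count 0
    [[2.0, 2.0], [2.0, 2.0]],  # count 1
    [[1.0, 1.0], [1.0, 1.0]],  # count 2
]
scores = count_scores(maps)
print([round(s, 3) for s in scores], determine_count(scores))
```

Taking the argmax gives a single integer count; the score-weighted sum used in formula (6) is an alternative that yields fractional counts.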

Further, in the determination device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of feature information corresponding to the respective numbers identified by a neural network that performs convolution and pooling.

As a result, the determination device 100 according to the embodiment can appropriately determine the number of human faces included in an image by using information in a neural network that performs convolution and pooling.

Further, in the determination device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of feature information corresponding to the respective numbers identified by a neural network that does not include a fully connected layer.

As a result, the determination device 100 according to the embodiment can appropriately determine the number of human faces included in an image by using information in a neural network that does not include a fully connected layer (a fully convolutional network, FCN).
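A network head built only from convolution and pooling can be illustrated with a 1 × 1 convolution that turns C feature channels into one map per number, followed by global average pooling. This is a pure-Python sketch with made-up weights, not the learner LE itself; it shows why such a head accepts inputs of any spatial size, since no layer has a weight matrix tied to the input size.

```python
from typing import List, Sequence

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def conv1x1(x: Tensor3, weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> Tensor3:
    """1x1 convolution: output map k at (i, j) is bias[k] + sum_c weights[k][c] * x[c][i][j]."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [[bias[k] + sum(weights[k][c] * x[c][i][j] for c in range(channels))
          for j in range(width)]
         for i in range(height)]
        for k in range(len(weights))
    ]


def global_average_pool(maps: Tensor3) -> List[float]:
    """Average each map over all positions; works for any spatial size."""
    return [sum(sum(row) for row in m) / (len(m) * len(m[0])) for m in maps]


# Hypothetical weights: 2 input channels -> 3 count maps (counts 0, 1, 2).
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B = [0.0, 0.0, 0.0]

for h, w in [(3, 5), (7, 2)]:  # two input sizes, same weights
    # Channel 0 is filled with 1.0 and channel 1 with 2.0.
    x = [[[float(c + 1)] * w for _ in range(h)] for c in range(2)]
    count_maps = conv1x1(x, W, B)
    print((h, w), len(count_maps), global_average_pool(count_maps))
```

The pooled vector has one entry per number regardless of the input size, which is what lets the per-number maps double as position information.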

Further, in the determination device 100 according to the embodiment, the determination unit 134 determines the position of the objects included in the input image based on information on the feature amounts of each of a plurality of regions included in each piece of feature information.

As a result, by determining the position of the objects included in the input image based on information on the feature amounts of each of the plurality of regions included in each piece of feature information, the determination device 100 according to the embodiment appropriately determines the number of objects included in the image using information in the neural network. The determination device 100 can therefore appropriately determine the positions of objects included in an image.
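To turn per-region feature amounts into object positions, one option (an assumption; the text does not specify a grouping rule) is to group adjacent above-threshold cells and scale each group back to input-image pixels, assuming each feature-map cell covers an equal-sized block of the image. The function name `regions_to_boxes` is hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def regions_to_boxes(fmap: Sequence[Sequence[float]], threshold: float,
                     image_w: int, image_h: int) -> List[Tuple[int, int, int, int]]:
    """Group 4-connected cells whose feature amount is >= threshold and return one
    pixel box (left, top, right, bottom) per group, scaled from map cells to the image."""
    rows, cols = len(fmap), len(fmap[0])
    sx, sy = image_w / cols, image_h / rows  # pixels per cell (assumed uniform)
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or fmap[r0][c0] < threshold:
                continue
            # Breadth-first search over one connected group of strong cells.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            r_min, r_max, c_min, c_max = r0, r0, c0, c0
            while queue:
                r, c = queue.popleft()
                r_min, r_max = min(r_min, r), max(r_max, r)
                c_min, c_max = min(c_min, c), max(c_max, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and fmap[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((round(c_min * sx), round(r_min * sy),
                          round((c_max + 1) * sx), round((r_max + 1) * sy)))
    return boxes


# Hypothetical 4x4 map with two separate strong regions.
fmap = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
]
print(regions_to_boxes(fmap, 0.5, 400, 400))
```

Each returned box is a candidate object position; the number of boxes need not equal the determined count, since adjacent objects can merge into one group.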

[10. Hardware configuration]
The determination device 100 according to the embodiment described above is implemented by, for example, a computer 1000 configured as shown in FIG. 12. FIG. 12 is a hardware configuration diagram showing an example of a computer that implements the functions of the determination device. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via the network N and sends it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the network N.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600 and outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the determination device 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the control unit 130 by executing a program loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via the network N.

The embodiments of the present application have been described above in detail with reference to the drawings. These embodiments are examples, and the present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

[11. Others]
Of the processes described in the above embodiment, all or part of a process described as being performed automatically may be performed manually, and all or part of a process described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified. For example, the various kinds of information shown in each figure are not limited to the illustrated information.

Each component of each illustrated device is functional and conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above may also be combined as appropriate to the extent that their processing contents do not contradict each other.

The term "unit" (section, module, unit) used above may be read as "means", "circuit", or the like. For example, the acquisition unit may be read as an acquisition means or an acquisition circuit.

100 Determination device
121 Learning information storage unit
122 Image information storage unit
130 Control unit
131 Acquisition unit
132 Learning unit
133 Calculation unit
134 Determination unit
135 Generation unit
136 Providing unit

Claims

A determination device comprising:
an acquisition unit that acquires a plurality of pieces of feature information based on an input image input to a neural network that identifies the number of objects in an image, the pieces of feature information corresponding to the respective numbers identified by the neural network;
a determination unit that determines the number of objects included in the input image based on the plurality of pieces of feature information acquired by the acquisition unit;
a calculation unit that calculates a score corresponding to each number based on the feature information corresponding to that number; and
a generation unit that generates a plurality of processed images cropped at mutually different aspect ratios, based on the region in which feature amounts above a predetermined threshold are located in the feature information having the maximum score among the scores calculated by the calculation unit.

The determination device according to claim 1, wherein the determination unit determines the position of the objects in the input image based on information on the feature amounts in the feature information corresponding to the determined number of objects.

The determination device according to claim 1 or 2, wherein the acquisition unit acquires the plurality of pieces of feature information based on an input image input to a neural network that identifies the number of human faces in an image.

The determination device according to claim 3, wherein the determination unit determines the number of the objects included in the input image based on the scores calculated by the calculation unit.

The determination device according to claim 4, wherein the determination unit determines the number of the objects included in the input image based on the score corresponding to each number calculated by the calculation unit.

The determination device according to any one of claims 1 to 5, wherein the acquisition unit acquires a plurality of pieces of feature information corresponding to the respective numbers identified by the neural network, the neural network performing convolution and pooling.

The determination device according to claim 6, wherein the acquisition unit acquires a plurality of pieces of feature information corresponding to the respective numbers identified by the neural network, the neural network not including a fully connected layer.

The determination device according to any one of claims 1 to 7, wherein the determination unit determines the position of the objects included in the input image based on information on the feature amounts of each of a plurality of regions included in the feature information.

A determination method executed by a computer, the method comprising:
an acquisition step of acquiring a plurality of pieces of feature information based on an input image input to a neural network that identifies the number of objects in an image, the pieces of feature information corresponding to the respective numbers identified by the neural network;
a determination step of determining the number of the objects included in the input image based on the plurality of pieces of feature information acquired in the acquisition step;
a calculation step of calculating a score corresponding to each number based on the feature information corresponding to that number; and
a generation step of generating a plurality of processed images cropped at mutually different aspect ratios, based on the region in which feature amounts above a predetermined threshold are located in the feature information having the maximum score among the scores calculated in the calculation step.

A determination program causing a computer to execute:
an acquisition procedure of acquiring a plurality of pieces of feature information based on an input image input to a neural network that identifies the number of objects in an image, the pieces of feature information corresponding to the respective numbers identified by the neural network;
a determination procedure of determining the number of the objects included in the input image based on the plurality of pieces of feature information acquired by the acquisition procedure;
a calculation procedure of calculating a score corresponding to each number based on the feature information corresponding to that number; and
a generation procedure of generating a plurality of processed images cropped at mutually different aspect ratios, based on the region in which feature amounts above a predetermined threshold are located in the feature information having the maximum score among the scores calculated by the calculation procedure.
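Claim 1 also recites a generation unit (135 in the reference signs list) that crops processed images at several aspect ratios around the region of above-threshold feature amounts. A minimal sketch, assuming the region has already been mapped to pixel coordinates and that each crop is the smallest rectangle of the target ratio that covers the region, centered on it and shifted to stay inside the image; the ratios and the name `aspect_crops` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def aspect_crops(region: Box, image_w: int, image_h: int,
                 ratios: List[Tuple[int, int]]) -> List[Box]:
    """For each target aspect ratio (w:h), return the smallest crop with that ratio
    that contains `region`, centered on it and shifted to stay inside the image.
    If such a crop is larger than the image, it is shrunk to the largest size with
    that ratio that fits, and may then no longer contain the whole region."""
    left, top, right, bottom = region
    rw, rh = right - left, bottom - top
    cx, cy = (left + right) / 2, (top + bottom) / 2
    crops = []
    for aw, ah in ratios:
        # Grow one side so the crop covers the region at ratio aw:ah.
        w = max(rw, rh * aw / ah)
        h = w * ah / aw
        # Shrink uniformly if the crop is larger than the image.
        scale = min(1.0, image_w / w, image_h / h)
        w, h = w * scale, h * scale
        # Center on the region, then clamp inside the image.
        x0 = min(max(cx - w / 2, 0), image_w - w)
        y0 = min(max(cy - h / 2, 0), image_h - h)
        crops.append((round(x0), round(y0), round(x0 + w), round(y0 + h)))
    return crops


region = (40, 40, 60, 60)  # hypothetical above-threshold region, in pixels
for crop in aspect_crops(region, 200, 100, [(1, 1), (16, 9), (3, 4)]):
    print(crop)
```

Rounding to whole pixels makes the achieved ratios approximate; a real implementation might instead fix one side and derive the other.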