JP2020160921A

JP2020160921A - Image recognition device, method for image recognition, and program

Info

Publication number: JP2020160921A
Application number: JP2019061039A
Authority: JP
Inventors: Takeshi Hayakawa; Junichi Kiyamura; Yasutoshi Fukaya; Yuji Kurita
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2020-10-01
Anticipated expiration: 2039-03-27
Also published as: JP7287650B2

Abstract

To provide an image recognition device, an image recognition method, and a program that can increase processing efficiency in image recognition using a machine learning model. SOLUTION: An image recognition device 10 includes: a feature amount map generation unit 11 that generates a feature amount map of an image targeted for image recognition by using a convolution layer of a machine learning model for detecting a specific object from the image; and a score calculation unit 12 that sets a virtual window on the feature amount map, inputs the region of the feature amount map inside the window into a fully connected layer of the machine learning model at a plurality of predetermined positions while sliding the window by a set amount, and calculates, for each predetermined position, a score indicating the possibility that the specific object exists in the region. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image recognition device and an image recognition method for recognizing an object in an image, and further relates to a program for realizing them.

In recent years, image recognition devices that perform image recognition using a machine learning model have been developed (see, for example, Patent Documents 1 and 2). Such a device can detect, in an image, people, animals, automobiles, and other objects it has been trained on in advance. For this reason, image recognition devices are used in video surveillance systems, vehicle-mounted accident prevention systems, and the like.

Here, the processing in a conventional image recognition device will be described with reference to FIG. 11. FIG. 11 is a flow chart showing processing performed by a conventional image recognition device. This image recognition device includes a machine learning model that recognizes a specific object. The machine learning model is built by deep learning.

As shown in FIG. 11, the image recognition device first acquires image data from an external imaging device or storage device (step S1). The acquired image data is stored in a memory or the like mounted on the image recognition device.

Next, the image recognition device designates the range of the acquired image that may contain the detection target (step S2). Specifically, in step S2, the image recognition device designates the range by specifying the upper-left coordinates on the screen, the width of the area, and the height of the area. In step S2, the entire image may also be designated instead of a part of the acquired image.

Next, the image recognition device sets, within the range designated in step S2, a rectangular area whose horizontal and vertical resolutions are preset, and cuts out the image in that area (step S3). Step S3 is executed repeatedly, as described later. Each time it is executed, the rectangular area is set at a position slid by a set number of pixels. This method is called the sliding method, and the rectangular area is called a sliding window.

In this method, the sliding window starts at the upper-left corner of the designated range and is first slid horizontally by the set number of pixels at a time. When it reaches the upper-right corner, it is shifted vertically by the set number of pixels and then slid again from the left end to the right end. The set number of pixels used as the slide amount is chosen so that the edges of positionally adjacent sliding windows overlap each other.

Next, the image recognition device inputs the image cut out in step S3 into the machine learning model, executes inference on the object in the image, and calculates a score indicating the certainty that the object is the specific object (step S4).

Next, the image recognition device compares the score calculated in step S4 with the score previously calculated in step S4 for another sliding window. The image recognition device then stores the higher of the two scores, together with the coordinates and the image identification number of the sliding window that produced it (step S5).

Next, the image recognition device determines whether steps S3 to S5 have been executed for the entire range designated in step S2 (step S6). If the determination in step S6 shows that steps S3 to S5 have not yet been executed for the entire range, the image recognition device slides the sliding window as described above and executes step S3 again.

On the other hand, if the determination in step S6 shows that steps S3 to S5 have been executed for the entire range designated in step S2, the image recognition device outputs the stored score, coordinates, and image identification number to the outside.

As described above, the conventional image recognition device carries out image recognition by performing inference with the learning model for each sliding window.
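The conventional flow of steps S3 to S6 can be sketched roughly as follows. This is only an illustrative sketch: the function names `window_positions` and `conventional_scan` are hypothetical, the image is a plain nested list, and `score_fn` is a stand-in for inference with the machine learning model, which the source does not specify in detail.

```python
def window_positions(range_w, range_h, win_w, win_h, step):
    """Yield the top-left (x, y) of each sliding window, left to right, top to bottom."""
    for y in range(0, range_h - win_h + 1, step):
        for x in range(0, range_w - win_w + 1, step):
            yield x, y


def conventional_scan(image, win_w, win_h, step, score_fn):
    """Steps S3 to S6: crop each window, score it, and keep the best one."""
    best = None  # (score, (x, y), window index)
    positions = window_positions(len(image[0]), len(image), win_w, win_h, step)
    for index, (x, y) in enumerate(positions):
        crop = [row[x:x + win_w] for row in image[y:y + win_h]]  # step S3
        score = score_fn(crop)                                   # step S4 (stand-in)
        if best is None or score > best[0]:                      # step S5
            best = (score, (x, y), index)
    return best  # reported once the whole range is covered (step S6)


# Toy data: a bright 2x2 patch in the lower-right corner of a 4x4 image.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Toy scoring function (assumption): the sum of the pixel values in the crop.
best = conventional_scan(image, 2, 2, 1, lambda crop: sum(map(sum, crop)))
```

Note that `score_fn` is called once per window, so the whole model runs again for every position, including on pixels shared with neighboring windows.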

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-243155
Patent Document 2: Japanese Unexamined Patent Publication No. 2014-041427

However, the conventional image recognition device has low processing efficiency, which makes it difficult to improve processing speed. Specifically, the conventional image recognition device executes inference for each sliding window, as described above, and each sliding window is set so as to overlap adjacent sliding windows. The device therefore executes inference more than once on the overlapping portions, which is wasted processing. This causes the problem described above.
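The amount of duplicated work can be estimated with a small calculation. The sketch below is an assumption-laden illustration, not a figure from the source: `redundancy_factor` is a hypothetical helper that compares the number of pixels fed to the model across all windows with the number of pixels in the designated range.

```python
def redundancy_factor(range_w, range_h, win_w, win_h, step):
    """Ratio of pixels fed to the model to pixels in the designated range."""
    windows_x = (range_w - win_w) // step + 1
    windows_y = (range_h - win_h) // step + 1
    pixels_processed = windows_x * windows_y * win_w * win_h
    return pixels_processed / (range_w * range_h)


# 8x8 range, 4x4 window, slide amount 2: adjacent windows overlap by half.
overlapping = redundancy_factor(8, 8, 4, 4, 2)
# Same range and window, slide amount 4: windows do not overlap.
non_overlapping = redundancy_factor(8, 8, 4, 4, 4)
```

With a slide amount of 2, nine windows of 16 pixels each are processed for a 64-pixel range, so each pixel is handled 2.25 times on average; with no overlap the factor is 1.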

An example of an object of the present invention is to solve the above problem and to provide an image recognition device, an image recognition method, and a program that can improve processing efficiency in image recognition using a machine learning model.

In order to achieve the above object, an image recognition device according to one aspect of the present invention includes:
a feature amount map generation unit that generates a feature amount map of an image targeted for image recognition by using a convolution layer of a machine learning model for detecting a specific object from an image; and
a score calculation unit that sets a virtual window on the feature amount map, inputs the region of the feature amount map inside the window into a fully connected layer of the machine learning model at a plurality of predetermined positions while sliding the window by a set amount, and calculates, for each predetermined position, a score indicating the possibility that the specific object exists in the region.

Further, in order to achieve the above object, an image recognition method according to one aspect of the present invention includes:
(a) a step of generating a feature amount map of an image targeted for image recognition by using a convolution layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature amount map, inputting the region of the feature amount map inside the window into a fully connected layer of the machine learning model at a plurality of predetermined positions while sliding the window by a set amount, and calculating, for each predetermined position, a score indicating the possibility that the specific object exists in the region.

Furthermore, in order to achieve the above object, a program according to one aspect of the present invention causes a computer to execute:
(a) a step of generating a feature amount map of an image targeted for image recognition by using a convolution layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature amount map, inputting the region of the feature amount map inside the window into a fully connected layer of the machine learning model at a plurality of predetermined positions while sliding the window by a set amount, and calculating, for each predetermined position, a score indicating the possibility that the specific object exists in the region.

As described above, according to the present invention, processing efficiency can be improved in image recognition using a machine learning model.

FIG. 1 is a block diagram showing a schematic configuration of an image recognition device according to the first embodiment of the present invention.
FIG. 2 is a block diagram specifically showing the configuration of the image recognition device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of a feature amount map obtained in the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of a classifier (machine learning model) used in the first embodiment of the present invention.
FIG. 5 is a flow chart showing the operation of the image recognition device according to the first embodiment of the present invention.
FIG. 6 is a block diagram specifically showing the configuration of the image recognition device according to the second embodiment of the present invention.
FIG. 7 is a diagram showing an example of a classifier (machine learning model) according to the second embodiment of the present invention.
FIG. 8 is a diagram showing an example of data output by the convolution layer of the classifier in the second embodiment of the present invention.
FIG. 9 is a flow chart showing the operation of the image recognition device according to the second embodiment of the present invention.
FIG. 10 is a block diagram showing an example of a computer that realizes the image recognition device according to the embodiments of the present invention.
FIG. 11 is a flow chart showing processing performed by a conventional image recognition device.

(Embodiment 1)
Hereinafter, the image recognition device, image recognition method, and program according to the first embodiment of the present invention will be described with reference to FIGS. 1 to 5.

[Device configuration]
First, the schematic configuration of the image recognition device according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a schematic configuration of the image recognition device according to the first embodiment of the present invention.

The image recognition device 10 according to the first embodiment, shown in FIG. 1, is a device that detects a specific object from an image. As shown in FIG. 1, the image recognition device 10 includes a feature amount map generation unit 11 and a score calculation unit 12.

The feature amount map generation unit 11 generates a feature amount map of the image targeted for image recognition by using a convolution layer of a machine learning model for detecting a specific object from an image.

The score calculation unit 12 sets a virtual window on the feature amount map and, while sliding this window by a set amount, inputs the region of the feature amount map inside the window into the fully connected layer of the machine learning model at a plurality of predetermined positions. From the result of this input processing, the score calculation unit 12 calculates, for each predetermined position, a score indicating the possibility that the specific object exists in the region inside the window.

As described above, in the first embodiment, a feature amount map is first generated using the convolution layer of the machine learning model, and sliding-window processing is then performed on this feature amount map. Because this greatly reduces the amount of duplicated processing compared with the conventional approach, the first embodiment can improve processing efficiency in image recognition using a machine learning model.
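Why the order of operations can be swapped is easiest to see on a toy case. The sketch below assumes a single convolution layer without padding and with stride 1, which is a simplification of the classifier described in this document; `conv_valid` and `crop` are hypothetical helper names. Under those assumptions, convolving a cropped window gives the same values as cropping the corresponding region from a feature map computed once for the whole image.

```python
def conv_valid(data, kernel):
    """2-D convolution (cross-correlation form, as in CNNs), no padding, stride 1."""
    k = len(kernel)
    out_h = len(data) - k + 1
    out_w = len(data[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * data[y + i][x + j] for i in range(k) for j in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def crop(data, x, y, w, h):
    """Return the w-by-h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in data[y:y + h]]


# Arbitrary toy image (8 wide, 6 tall) and a 3x3 kernel.
image = [[(3 * r + 5 * c) % 7 for c in range(8)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Conventional order: cut a 5x4 window out of the image, then convolve it.
per_window = conv_valid(crop(image, 2, 1, 5, 4), kernel)

# Order used here: convolve the whole image once, then cut the window out of
# the feature map (the region shrinks by K - 1 = 2 in each direction).
feature_map = conv_valid(image, kernel)
from_map = crop(feature_map, 2, 1, 3, 2)
```

In this toy setting `per_window` and `from_map` are identical, so the convolution only has to be evaluated once per pixel neighborhood rather than once per overlapping window.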

Next, the configuration and functions of the image recognition device 10 according to the first embodiment will be described in detail with reference to FIGS. 2 to 4. FIG. 2 is a block diagram specifically showing the configuration of the image recognition device according to the first embodiment of the present invention. FIG. 3 is a diagram showing an example of a feature amount map obtained in the first embodiment of the present invention. FIG. 4 is a diagram showing an example of a classifier (machine learning model) used in the first embodiment of the present invention.

As shown in FIG. 2, in the present embodiment, the image recognition device 10 further includes, in addition to the feature amount map generation unit 11 and the score calculation unit 12 described above, a target image setting unit 13, a feature amount map storage unit 14, a recognition processing unit 15, and a classifier 20, which is a machine learning model.

In the example of FIG. 2, the classifier 20 is provided inside the image recognition device 10, but the configuration is not limited to this example. The classifier 20 may be provided in a device other than the image recognition device 10.

The target image setting unit 13 first acquires the image data of the image targeted for image recognition. It then sets, in the image specified by the acquired image data, the range targeted for image recognition.

Specifically, the target image setting unit 13 identifies a range of the image that may contain the recognition target and sets that range. The range is identified, for example, by detecting the contours of objects in the image and identifying the range in which the detected contours exist. The range is set by specifying its upper-left coordinates, width, and height. The target image setting unit 13 can also set the entire image of the image data as the range targeted for image recognition.

In the present embodiment, the feature amount map generation unit 11 generates a feature amount map for the range set by the target image setting unit 13 by using the convolution layers of the classifier 20 (see FIG. 4). The feature amount map generation unit 11 also stores the generated feature amount map in the feature amount map storage unit 14. The feature amount map storage unit 14 is, for example, a memory, and holds the feature amount map in its storage area.

Specifically, as shown in FIG. 3, the feature amount map generation unit 11 first extracts, N lines at a time, rows (lines) made up of all pixels in the horizontal direction from the image data in the range set by the target image setting unit 13 (N: any natural number). The feature amount map generation unit 11 then inputs the extracted N lines of image data, in order, into the convolution layers of the classifier 20. As a result, a feature amount map is generated for every N lines. FIG. 3 shows N lines of image data and the feature amount map generated from them. In the feature amount map, the grid cells represent pixels and the filled squares (■) represent feature amounts.

In the present embodiment, the score calculation unit 12 first retrieves the feature amount map for N lines from the feature amount map storage unit 14. Then, while sliding the virtual window it has set by the set amount over the retrieved feature amount map, the score calculation unit 12 inputs the region of the feature amount map inside the window into the fully connected layer of the classifier 20 (see FIG. 4) at a plurality of predetermined positions. The score calculation unit 12 takes the output of the fully connected layer at each predetermined position as the score for that position. In FIG. 3, the broken-line rectangle indicates the virtual window.

As shown in FIG. 4, the classifier 20 includes convolution layers 21 to 24 and a fully connected layer 25. In the example of FIG. 4, the image data is first input to the convolution layer 21, and the output of the convolution layer 21 is input to both the convolution layer 22 and the convolution layer 24. The convolution layer 24 performs size conversion on its input data and outputs the size-converted data. The output of the convolution layer 22 is input to the convolution layer 23, and the output of the convolution layer 23 and the output of the convolution layer 24 are combined to form the feature amount map.
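The data flow through convolution layers 21 to 24 can be written down as a small wiring sketch. Everything here is illustrative: `feature_extractor` is a hypothetical helper, the four layers are passed in as arbitrary callables, and the `combine` operation is left as a parameter because the source only says the two outputs are "combined" without stating how. The element-wise addition used in the example is an assumption.

```python
def feature_extractor(conv21, conv22, conv23, conv24, combine):
    """Wire four layer callables into the two-branch topology of FIG. 4."""
    def forward(image_data):
        shared = conv21(image_data)            # layer 21 feeds both branches
        main_branch = conv23(conv22(shared))   # layer 22 followed by layer 23
        side_branch = conv24(shared)           # size-converting branch (layer 24)
        return combine(main_branch, side_branch)
    return forward


# Toy stand-ins operating on flat lists (assumptions, not real convolutions).
forward = feature_extractor(
    conv21=lambda v: [a + 1 for a in v],
    conv22=lambda v: [a * 2 for a in v],
    conv23=lambda v: [a - 3 for a in v],
    conv24=lambda v: list(v),
    combine=lambda p, q: [a + b for a, b in zip(p, q)],  # assumed: element-wise sum
)
features = forward([1, 2])
```

Passing the layers in as callables keeps the sketch independent of any particular deep learning library; real layers would replace the lambdas.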

When the score calculation unit 12 inputs the region of the feature amount map inside the window, the fully connected layer 25 performs classification on the input region, calculates, for each class, the probability that the object in the image belongs to that class, and outputs the calculated probabilities. The score calculation unit 12 uses the output probabilities as the scores.
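A minimal sketch of how a fully connected layer can turn a window region into per-class probabilities is shown below. The source does not say how the probabilities are normalized; the softmax used here is a common choice and is an assumption, as are the helper names `fully_connected` and `softmax` and the toy weights.

```python
import math


def fully_connected(region, weights, biases):
    """One dense layer: flatten the window region, then one weighted sum per class."""
    flat = [v for row in region for v in row]
    return [sum(w * v for w, v in zip(ws, flat)) + b for ws, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw per-class outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2x2 window region and a two-class layer (weights are made up).
region = [[1, 0], [0, 1]]
weights = [[1, 1, 1, 1], [0, 0, 0, 0]]
biases = [0, 0]
logits = fully_connected(region, weights, biases)
scores = softmax(logits)  # one probability per class
```

In this toy case the first class receives the larger raw output and therefore the higher probability.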

The recognition processing unit 15 first identifies, from among the scores calculated for each predetermined position by the score calculation unit 12, the score with the largest value and the predetermined position at which it was obtained. The recognition processing unit 15 then outputs the identified score and position as the result of image recognition. From this output, it can be determined whether the specific object is present in the image.

[Device operation]
Next, the operation of the image recognition device 10 according to the first embodiment will be described with reference to FIG. 5. FIG. 5 is a flow chart showing the operation of the image recognition device according to the first embodiment of the present invention. In the following description, FIGS. 1 to 4 are referred to as appropriate. In the first embodiment, the image recognition method is carried out by operating the image recognition device 10. The following description of the operation of the image recognition device 10 therefore also serves as the description of the image recognition method according to the first embodiment.

As shown in FIG. 5, the target image setting unit 13 first acquires the image data of the image targeted for image recognition (step A1). Next, the target image setting unit 13 sets, in the image of the image data acquired in step A1, the range targeted for image recognition (step A2).

Next, the feature amount map generation unit 11 extracts N lines of image data from the image data in the range set in step A2 (step A3). The feature amount map generation unit 11 then inputs the extracted N lines of image data into the convolution layers of the classifier 20 to generate a feature amount map (step A4). The feature amount map generation unit 11 also stores the generated feature amount map in the feature amount map storage unit 14.

Next, the score calculation unit 12 retrieves the feature amount map for N lines from the feature amount map storage unit 14. Then, while sliding the virtual window over the retrieved feature amount map, the score calculation unit 12 inputs the region of the feature amount map inside the window into the fully connected layer of the classifier 20 at a plurality of predetermined positions and calculates the scores (step A5).

Next, the recognition processing unit 15 identifies, from among the scores calculated for each predetermined position in step A5, the score with the largest value and the predetermined position at which it was obtained, and outputs the identified score and position as the result of image recognition (step A6).

Next, the recognition processing unit 15 determines whether the processing of steps A3 to A6 has been completed for the entire range set in step A2 (step A7).

If the determination in step A7 shows that the processing of steps A3 to A6 has not been completed for the entire range set in step A2, the recognition processing unit 15 causes the feature amount map generation unit 11 to execute step A3 again. The feature amount map generation unit 11 then extracts the N lines of image data located below the previously extracted image data.

On the other hand, if the determination in step A7 shows that the processing of steps A3 to A6 has been completed for the entire range set in step A2, the processing in the image recognition device ends.
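Steps A3 to A7 can be summarized in a short loop. This is a hedged sketch rather than the patented implementation: `recognize` is a hypothetical function name, `conv_fn` and `fc_fn` are stand-ins for the convolution layers and the fully connected layer of the classifier 20, and images and feature amount maps are plain nested lists.

```python
def recognize(image_rows, n, conv_fn, fc_fn, win_w, win_h, step):
    """Steps A3 to A7: per N-line strip, build the map, score windows, report the best."""
    results = []
    for top in range(0, len(image_rows), n):              # A3, repeated via A7
        strip = image_rows[top:top + n]                    # the last strip may be shorter
        feature_map = conv_fn(strip)                       # A4
        scores = {}
        for y in range(0, len(feature_map) - win_h + 1, step):        # A5
            for x in range(0, len(feature_map[0]) - win_w + 1, step):
                region = [row[x:x + win_w] for row in feature_map[y:y + win_h]]
                scores[(x, y)] = fc_fn(region)
        if scores:
            position = max(scores, key=scores.get)         # A6: highest score
            results.append((top, position, scores[position]))
    return results


# Toy run: identity "convolution" and a sum as the "fully connected layer".
image_rows = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
results = recognize(
    image_rows, n=2,
    conv_fn=lambda strip: strip,
    fc_fn=lambda region: sum(map(sum, region)),
    win_w=2, win_h=2, step=2,
)
```

Each entry of `results` holds the top row of the strip, the window position within that strip's feature amount map, and the best score found there.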

[Effect in Embodiment 1]
As described above, in the first embodiment, a feature amount map is generated for every N lines of the image, and a score is calculated using the fully connected layer for each N-line feature amount map. Unlike the conventional approach, feature amount maps are therefore not generated redundantly, so the first embodiment can improve processing efficiency in image recognition using a machine learning model.

[Program]
The program according to the first embodiment may be any program that causes a computer to execute steps A1 to A7 shown in FIG. 5. By installing this program on a computer and executing it, the image recognition device 10 and the image recognition method according to the first embodiment can be realized. In this case, the processor of the computer functions as the feature amount map generation unit 11, the score calculation unit 12, the target image setting unit 13, and the recognition processing unit 15, and performs the processing.

The program according to the present embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as any one of the feature amount map generation unit 11, the score calculation unit 12, the target image setting unit 13, and the recognition processing unit 15.

(Embodiment 2)
Next, the image recognition device, image recognition method, and program according to the second embodiment of the present invention will be described with reference to FIGS. 6 to 9.

[Device configuration]
First, the schematic configuration of the image recognition device according to the second embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a block diagram specifically showing the configuration of the image recognition device according to the second embodiment of the present invention.

As shown in FIG. 6, the image recognition device 30 according to the second embodiment has a configuration similar to that of the image recognition device 10 according to the first embodiment, but differs in the following respects. The description below focuses on the differences from the first embodiment.

First, the image recognition device 30 according to the second embodiment handles the case where padding data was added to the learning data used to construct the classifier (machine learning model) 20 and, as a result, extra data is added to the feature amount map. For this reason, unlike the image recognition device 10 according to the first embodiment, the image recognition device 30 includes a correction data generation unit 31 and a correction data storage unit 32.

The correction data generation unit 31 generates correction data and stores it in the correction data storage unit 32. The correction data is data for correcting the data added to the feature amount map as a result of the padding added to the learning data. In the second embodiment, the score calculation unit 12 corrects the feature amount map using the correction data generated by the correction data generation unit 31 and calculates the scores using the corrected feature amount map.

Further, the classifier 20 used in the second embodiment has the characteristic that the data is smaller at output than at input. For this reason, as described above, padding data is added to the learning data used to construct the classifier 20.

Padding means adding new data (padding data) to the input data so that the size of the data output from a convolution layer equals the size of the data input to it. For example, assume that the kernel of the convolution layer has a width and a height both equal to K (width w = height h = K). If no padding data is added, the data output from the convolution layer is smaller than the input by (K-1) in both width and height. Specifically, when K = 3, both the width and the height shrink by 2.

Therefore, if (K-1)/2 pixels of padding data are added on each side outside the rectangle of the input data, the effective size of the data is the same at input and at output. In general, padding data is added not only for the first convolution layer (input layer) but also for each convolution layer below it. Padding data normally consists of zero values, in which case the padding is called "zero padding".
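The size arithmetic above can be checked with a small sketch. The following Python is illustrative only and is not part of the disclosed device: it assumes a stride of 1 and an odd kernel size K, and the function names `conv_output_size`, `same_padding`, and `zero_pad` are hypothetical.

```python
def conv_output_size(size, kernel, pad=0):
    """Output length along one axis of a stride-1 convolution (assumed stride)."""
    return size + 2 * pad - (kernel - 1)


def same_padding(kernel):
    """Pixels to add on each side so that the output size equals the input size."""
    if kernel % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    return (kernel - 1) // 2


def zero_pad(image, pad):
    """Return a copy of a 2-D list surrounded by `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    body = [[0] * pad + list(row) + [0] * pad for row in image]
    bottom = [[0] * width for _ in range(pad)]
    return top + body + bottom


K = 3
print(conv_output_size(64, K))                   # 62: shrinks by K - 1 = 2
print(conv_output_size(64, K, same_padding(K)))  # 64: size preserved
```

With K = 3, one pixel of padding per side is enough to keep a 64-pixel axis at 64 pixels after the convolution.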

Here, the functions of the feature amount map generation unit 11 and the correction data generation unit 31 in the second embodiment will be described more specifically with reference to FIGS. 7 and 8. FIG. 7 is a diagram showing an example of a classifier (machine learning model) according to the second embodiment of the present invention. FIG. 8 is a diagram showing an example of data output by the convolution layers of the classifier in the second embodiment of the present invention.

As mentioned above, padding with a fixed value, as in zero padding, can change the output of the convolution layer at the boundary of the rectangle. Therefore, in the present embodiment, the feature amount map generation unit 11 adds padding data to the image data in the range set by the target image setting unit 13, and inputs the padded image data to the convolution layer 21.

For the same reason, in the second embodiment, the learning data used to construct the classifier 20 is also padded with the pixel data surrounding the image that serves as learning data. For example, assume that the rectangle of the sliding window is 64 pixels × 80 pixels and that the kernel size of the convolution layer is K = 3. In this case, machine learning uses an image of 66 pixels × 82 pixels, which includes a border of one pixel (= (K-1)/2) around the image that serves as learning data.
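As a hedged illustration of padding with real surrounding pixels, the Python sketch below cuts a window together with a (K-1)/2-pixel border out of a larger image. The function name `crop_with_context`, the 100 × 100 test image, and the reading of "64 × 80" as width × height are assumptions made for this example only.

```python
def crop_with_context(image, top, left, height, width, kernel=3):
    """Cut a window plus (kernel - 1) // 2 real neighbouring pixels on every side.

    `image` is a 2-D list of pixel values and (top, left) is the upper-left
    corner of the window. Raises ValueError if the border leaves the image.
    """
    margin = (kernel - 1) // 2
    r0, r1 = top - margin, top + height + margin
    c0, c1 = left - margin, left + width + margin
    if r0 < 0 or c0 < 0 or r1 > len(image) or c1 > len(image[0]):
        raise ValueError("not enough surrounding pixels for real-data padding")
    return [row[c0:c1] for row in image[r0:r1]]


# A 64 x 80 window (width x height assumed) inside a synthetic 100 x 100 image.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
patch = crop_with_context(image, top=10, left=10, height=80, width=64, kernel=3)
print(len(patch[0]), len(patch))  # 66 82
```

The window itself starts at `patch[1][1]`; the outermost ring of `patch` holds the neighbouring pixels that take the place of zero padding.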

Meanwhile, in convolution layers other than the input layer (convolution layer 21), it is difficult to determine appropriate values for the padding data. Note that the convolution layer 24 only performs size conversion on its input data, so its kernel is 1 × 1 and no padding occurs in the convolution layer 24.

For this reason, in the second embodiment, as shown in FIG. 7, the correction data generation unit 31 generates correction data corresponding to the padding data of the convolution layers 22 and 23. The correction data generation unit 31 has one correction block for each corresponding convolution layer. In the example of FIG. 8, the correction data generation unit 31 has a correction block 31a and a correction block 31b. The correction data generation unit 31 can also execute its processing at the same time as the processing in the convolution layers, which suppresses the processing delay caused by generating the correction data.

Here, as shown in FIG. 8, let p[n,m] be the output of the convolution layer (input layer) 21, q[n,m] the output of the convolution layer 22, and R[n,m] the output of the convolution layer 23. Further, let r[n,m] be the output of the convolution layer 23 after it has been corrected with the correction data from the correction data generation unit 31. Here, n denotes the row and m denotes the column.

In this case, the corrected output r[n,m] is identical to the output that would be obtained if zero padding had been performed. Taking r[1,4] as an example, it is calculated by Equation 1 below.

[Equation 1]
r[1,4] = q[0,4]*w2[0,1] + q[0,5]*w2[0,2]
       + q[1,4]*w2[1,1] + q[1,5]*w2[1,2]
       + q[2,4]*w2[2,1] + q[2,5]*w2[2,2]

In contrast, in the second embodiment, as described for the first embodiment, N lines of data are subject to convolution, so actual data is used as the padding data. Accordingly, the output R[1,4] of the convolution layer 23 is calculated by Equation 2 below.

[Equation 2]
R[1,4] = q[0,3]*w2[0,0] + { q[0,4]*w2[0,1] + q[0,5]*w2[0,2] }
       + q[1,3]*w2[1,0] + { q[1,4]*w2[1,1] + q[1,5]*w2[1,2] }
       + q[2,3]*w2[2,0] + { q[2,4]*w2[2,1] + q[2,5]*w2[2,2] }
       = r[1,4] + { q[0,3]*w2[0,0] + q[1,3]*w2[1,0] + q[2,3]*w2[2,0] }

Further, solving Equation 2 for r[1,4] in terms of R[1,4] gives Equation 3 below.

[Equation 3]
r[1,4] = R[1,4] - { q[0,3]*w2[0,0] + q[1,3]*w2[1,0] + q[2,3]*w2[2,0] }

Here, letting Cr[1,4] denote the expression inside the braces { } in Equation 3, Equation 3 can be written as Equation 4 below.

[Equation 4]
r[1,4] = R[1,4] - Cr[1,4]

Cr[1,4] in Equation 4 is the correction data for correcting the padding. In the second embodiment, the correction block 31b of the correction data generation unit 31 generates this correction data and stores it in the correction data storage unit 32.
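The relation r[1,4] = R[1,4] - Cr[1,4] can be verified numerically. The sketch below is a hedged illustration in Python rather than the implementation of the correction block 31b: the values of q and w2 are arbitrary integer test data, the function names are hypothetical, and only the left-hand boundary column of the worked example is handled.

```python
def full_output(q, w, n, m):
    """R[n,m]: 3x3 weighted sum centred on q[n][m], using real data in every column."""
    return sum(q[n - 1 + i][m - 1 + j] * w[i][j] for i in range(3) for j in range(3))


def zero_padded_output(q, w, n, m):
    """r[n,m]: the same sum when the column left of m is zero padding."""
    return sum(q[n - 1 + i][m - 1 + j] * w[i][j] for i in range(3) for j in range(1, 3))


def correction(q, w, n, m):
    """Cr[n,m]: contribution of the column left of m (the braces in Equation 3)."""
    return sum(q[n - 1 + i][m - 1] * w[i][0] for i in range(3))


# Arbitrary integer test data: q has 3 rows and 7 columns, w2 is a 3x3 kernel.
q = [[(3 * n + 5 * m) % 7 for m in range(7)] for n in range(3)]
w2 = [[1, -2, 3], [0, 4, -1], [2, 1, -3]]

corrected = full_output(q, w2, 1, 4) - correction(q, w2, 1, 4)
print(corrected == zero_padded_output(q, w2, 1, 4))  # True
```

Because the correction term depends only on the boundary column of q and the first kernel column, it can be computed separately from the main convolution and subtracted afterwards.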

The output q[n,m] of the convolution layer 22 is likewise affected by the padding in the convolution layer 21. However, the output from the convolution layer 21 does not include padding data. Therefore, the correction block 31a also executes the same processing as the correction block 31b described above.

[Device operation]
Next, the operation of the image recognition device 30 in the second embodiment will be described with reference to FIG. 9. FIG. 9 is a flow chart showing the operation of the image recognition device according to the second embodiment of the present invention. In the following description, FIGS. 6 to 8 are referred to as appropriate. In the second embodiment as well, the image recognition method is carried out by operating the image recognition device 30. Accordingly, the following description of the operation of the image recognition device 30 also serves as the description of the image recognition method in the second embodiment.

As shown in FIG. 9, first, the target image setting unit 13 acquires the image data of the image to be subjected to image recognition (step B1). Next, the target image setting unit 13 sets, in the image of the image data acquired in step B1, the range to be subjected to image recognition (step B2).

Next, the feature amount map generation unit 11 extracts N lines of image data from the image data in the range set in step B2 (step B3). Subsequently, the feature amount map generation unit 11 inputs the extracted N lines of image data to the convolution layers of the classifier 20 to generate a feature amount map (step B4). The feature amount map generation unit 11 also stores the generated feature amount map in the feature amount map storage unit 14.

Next, the correction data generation unit 31 generates correction data for correcting the data added to the feature amount map by padding, and stores the generated correction data in the correction data storage unit 32 (step B5). Note that step B5 may be executed at the same time as step B4.

Next, the score calculation unit 12 retrieves N lines of the feature amount map from the feature amount map storage unit 14, and also retrieves the correction data from the correction data storage unit 32. The score calculation unit 12 then corrects the retrieved feature amount map using the correction data (step B6).

Next, the score calculation unit 12 slides a virtual window over the corrected feature amount map and, at each of a plurality of predetermined positions, inputs the region of the feature amount map within the window to the fully connected layer of the classifier 20 to calculate a score (step B7).

Next, the recognition processing unit 15 identifies, from among the scores calculated for the respective predetermined positions in step B7, the score with the largest value and the predetermined position at which it was obtained, and outputs the identified score and position as the result of image recognition (step B8).
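Steps B7 and B8 can be sketched as a sliding-window loop followed by a maximum search. The Python below is a simplified illustration under stated assumptions, not the disclosed classifier: the feature amount map is a single-channel 2-D list, the fully connected layer is reduced to one weighted sum with a bias, and all function names are hypothetical.

```python
def fully_connected_score(region, weights, bias=0.0):
    """Stand-in for the fully connected layer: one weighted sum over the window."""
    flat = [value for row in region for value in row]
    return sum(v * w for v, w in zip(flat, weights)) + bias


def slide_and_score(feature_map, win_h, win_w, step, weights, bias=0.0):
    """Step B7: score every window position reached by sliding `step` cells at a time."""
    rows, cols = len(feature_map), len(feature_map[0])
    scores = {}
    for top in range(0, rows - win_h + 1, step):
        for left in range(0, cols - win_w + 1, step):
            region = [row[left:left + win_w] for row in feature_map[top:top + win_h]]
            scores[(top, left)] = fully_connected_score(region, weights, bias)
    return scores


def best_position(scores):
    """Step B8: return the position with the largest score, and that score."""
    position = max(scores, key=scores.get)
    return position, scores[position]


feature_map = [[0, 0, 0, 0],
               [0, 1, 2, 0],
               [0, 3, 4, 0]]
scores = slide_and_score(feature_map, win_h=2, win_w=2, step=1, weights=[1, 1, 1, 1])
print(best_position(scores))  # ((1, 1), 10.0)
```

Because the window slides over the feature amount map rather than over the input image, the convolution layers run once per block of lines instead of once per window position.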

Next, the recognition processing unit 15 determines whether the processing of steps B3 to B8 has been completed for the entire range set in step B2 (step B9).

If the determination in step B9 shows that the processing of steps B3 to B8 has not been completed for the entire range set in step B2, the recognition processing unit 15 causes the feature amount map generation unit 11 to execute step B3 again. The feature amount map generation unit 11 then extracts the N lines of image data located below the previously extracted image data.

On the other hand, if the determination in step B9 shows that the processing of steps B3 to B8 has been completed for the entire range set in step B2, the processing in the image recognition device ends.
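Steps B3 to B9 amount to a loop over blocks of N lines. The following Python sketch shows one way such a loop could be organized; it is an outline under assumptions, not the disclosed implementation. The callables `make_feature_map`, `make_correction`, and `score_strip` are hypothetical stand-ins for the convolution layers, the correction data generation unit 31, and steps B7 to B8. The blocks are assumed not to overlap, and the correction is assumed to be an element-wise subtraction, following r = R - Cr.

```python
def recognize_by_strips(image, n_lines, make_feature_map, make_correction, score_strip):
    """Process `image` (a 2-D list) in blocks of `n_lines` rows, top to bottom."""
    if n_lines <= 0:
        raise ValueError("n_lines must be positive")
    results = []
    for first_row in range(0, len(image), n_lines):        # B3, repeated while B9 says "not done"
        strip = image[first_row:first_row + n_lines]
        feature_map = make_feature_map(strip)               # B4
        correction = make_correction(strip)                 # B5 (may run alongside B4)
        corrected = [[f - c for f, c in zip(f_row, c_row)]  # B6: subtract the correction data
                     for f_row, c_row in zip(feature_map, correction)]
        results.append((first_row, score_strip(corrected)))  # B7 and B8
    return results


# Toy stand-ins so the loop can be exercised end to end.
image = [[1, 2], [3, 4], [5, 6], [7, 8]]
out = recognize_by_strips(
    image,
    n_lines=2,
    make_feature_map=lambda strip: strip,
    make_correction=lambda strip: [[1] * len(row) for row in strip],
    score_strip=lambda fmap: max(max(row) for row in fmap),
)
print(out)  # [(0, 3), (2, 7)]
```

The sketch runs the steps one after another for clarity; the text allows step B5 to run at the same time as step B4.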

[Effects of the second embodiment]
As described above, according to the second embodiment, when a classifier 20 whose learning data requires padding is used, the padding data can be corrected, which suppresses the loss of identification accuracy in such a case. In the second embodiment as well, as in the first embodiment, processing efficiency can be improved in image recognition using a machine learning model.

[Program]
The program in the second embodiment may be any program that causes a computer to execute steps B1 to B9 shown in FIG. 9. By installing this program on a computer and executing it, the image recognition device 30 and the image recognition method in the second embodiment can be realized. In this case, the processor of the computer functions as the feature amount map generation unit 11, the score calculation unit 12, the target image setting unit 13, the recognition processing unit 15, and the correction data generation unit 31, and performs the processing.

The program in the present embodiment may also be executed by a computer system constructed from a plurality of computers. In this case, for example, each computer may function as any one of the feature amount map generation unit 11, the score calculation unit 12, the target image setting unit 13, the recognition processing unit 15, and the correction data generation unit 31.

(Modification)
In the first and second embodiments described above, the classifier 20 illustrated in FIGS. 3 and 4 is used, but the classifier in the first and second embodiments is not particularly limited. In particular, in the first embodiment, a classifier that does not require padding may be used.

(Physical configuration)
Here, a computer that realizes the image recognition device by executing the program in the first or second embodiment will be described with reference to FIG. 10. FIG. 10 is a block diagram showing an example of a computer that realizes the image recognition device according to the embodiments of the present invention.

As shown in FIG. 10, the computer 110 includes a CPU 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data. The computer 110 may also include a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), or an ASIC (Application Specific IC) in addition to or in place of the CPU 111.

The CPU 111 loads the program (code) of the present embodiment, which is stored in the storage device 113, into the main memory 112 and executes it in a predetermined order to perform various operations. The main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory). The program in the present embodiment is provided stored on a computer-readable recording medium 120. The program in the present embodiment may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and a semiconductor storage device such as a flash memory. The input interface 114 mediates data transmission between the CPU 111 and input devices 118 such as a keyboard and a mouse. The display controller 115 is connected to the display device 119 and controls the display on the display device 119.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reads the program from the recording medium 120, and writes the results of processing in the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disk Read Only Memory).

Note that the image recognition device in the present embodiment can also be realized by using hardware corresponding to each unit instead of a computer on which the program is installed. Furthermore, part of the image recognition device may be realized by a program and the rest by hardware.

Some or all of the embodiments described above can be expressed by (Appendix 1) to (Appendix 12) below, but are not limited to the following description.

(Appendix 1)
An image recognition device comprising:
a feature amount map generation unit that generates a feature amount map of an image to be subjected to image recognition, using a convolution layer of a machine learning model for detecting a specific object from an image; and
a score calculation unit that sets a virtual window on the feature amount map and, while sliding the window by a set amount, inputs the region of the feature amount map within the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculates, for each predetermined position, a score indicating the possibility that the specific object exists in that region.

(Appendix 2)
The image recognition device according to Appendix 1, further comprising
a target image setting unit that acquires image data and sets, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein the feature amount map generation unit generates the feature amount map for the set range.

(Appendix 3)
The image recognition device according to Appendix 1 or 2, further comprising
a recognition processing unit that identifies, from among the scores calculated for the respective predetermined positions, the score having the largest value and the predetermined position at which it was obtained, and outputs the identified score and predetermined position.

(Appendix 4)
The image recognition device according to any one of Appendices 1 to 3, further comprising
a correction data generation unit that, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature amount map, generates correction data for correcting the added data,
wherein the score calculation unit corrects the feature amount map using the generated correction data and calculates the score using the corrected feature amount map.

(Appendix 5)
An image recognition method comprising:
(a) a step of generating a feature amount map of an image to be subjected to image recognition, using a convolution layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature amount map and, while sliding the window by a set amount, inputting the region of the feature amount map within the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculating, for each predetermined position, a score indicating the possibility that the specific object exists in that region.

(Appendix 6)
The image recognition method according to Appendix 5, further comprising
(c) a step of acquiring image data and setting, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein, in step (a), the feature amount map is generated for the set range.

(Appendix 7)
The image recognition method according to Appendix 5 or 6, further comprising
(d) a step of identifying, from among the scores calculated for the respective predetermined positions, the score having the largest value and the predetermined position at which it was obtained, and outputting the identified score and predetermined position.

(Appendix 8)
The image recognition method according to any one of Appendices 5 to 7, further comprising
(e) a step of generating, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature amount map, correction data for correcting the added data,
wherein, in step (b), the feature amount map is corrected using the generated correction data and the score is calculated using the corrected feature amount map.

(Appendix 9)
A program that causes a computer to execute:
(a) a step of generating a feature amount map of an image to be subjected to image recognition, using a convolution layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature amount map and, while sliding the window by a set amount, inputting the region of the feature amount map within the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculating, for each predetermined position, a score indicating the possibility that the specific object exists in that region.

(Appendix 10)
The program according to Appendix 9, further causing the computer to execute
(c) a step of acquiring image data and setting, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein, in step (a), the feature amount map is generated for the set range.

(Appendix 11)
The program according to Appendix 9 or 10, further causing the computer to execute
(d) a step of identifying, from among the scores calculated for the respective predetermined positions, the score having the largest value and the predetermined position at which it was obtained, and outputting the identified score and predetermined position.

(Appendix 12)
The program according to any one of Appendices 9 to 11, further causing the computer to execute
(e) a step of generating, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature amount map, correction data for correcting the added data,
wherein, in step (b), the feature amount map is corrected using the generated correction data and the score is calculated using the corrected feature amount map.

As described above, according to the present invention, processing efficiency can be improved in image recognition using a machine learning model. The present invention is useful for various systems that require image recognition.

10 Image recognition device (Embodiment 1)
11 Feature amount map generation unit
12 Score calculation unit
13 Target image setting unit
14 Feature amount map storage unit
15 Recognition processing unit
20 Classifier (machine learning model)
21 to 24 Convolution layers
25 Fully connected layer
30 Image recognition device (Embodiment 2)
31 Correction data generation unit
31a, 31b Correction blocks
32 Correction data storage unit
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

A feature map generator that generates a feature map of an image to be image-recognized using a convolutional layer of a machine learning model for detecting a specific object from an image.
A virtual window is set on the feature amount map, and while sliding the window by a set amount, the area in the window of the feature amount map is fully combined with the machine learning model at a plurality of predetermined positions. A score calculation unit that inputs to the layer and calculates a score indicating the possibility that the specific object exists in the region at each predetermined position.
An image recognition device characterized by being equipped with.

The image recognition device according to claim 1.
Further, a target image setting unit for acquiring image data and setting a range to be targeted for image recognition in the image specified by the acquired image data is further provided.
An image recognition device characterized in that the feature amount map generation unit generates the feature amount map for a set range.

The image recognition device according to claim 1 or 2.
From the scores calculated for each predetermined position, the score having the largest value and the predetermined position at that time are specified, and the specified score and the predetermined position are output. It also has a processing unit,
An image recognition device characterized by this.

The image recognition device according to any one of claims 1 to 3.
When padding data is added to the training data used for constructing the machine learning model and extra data is added to the feature amount map, the added data is corrected. , A correction data generation unit is further provided to generate correction data.
The score calculation unit corrects the feature amount map using the generated correction data, and calculates the score using the corrected feature amount map.
An image recognition device characterized by this.

(A) A step of generating a feature map of an image to be image-recognized using a convolution layer of a machine learning model for detecting a specific object from an image.
(B) A virtual window is set on the feature amount map, and while sliding the window by a set amount, a region in the window of the feature amount map is displayed at a plurality of predetermined positions in the machine learning model. To calculate a score indicating the possibility that the specific object is present in the region at each predetermined position by inputting into the fully connected layer of the step.
An image recognition method characterized by having.

The image recognition method according to claim 5.
(C) Further having a step of acquiring image data and setting a range to be targeted for the image recognition in the image specified by the acquired image data.
In the step (a), the feature amount map is generated for the set range.
An image recognition method characterized by that.

The image recognition method according to claim 5 or 6.
(d) The method further has a step of identifying, from the scores calculated for the respective predetermined positions, the score having the largest value and the predetermined position at which it was obtained, and outputting the identified score and predetermined position.
An image recognition method characterized by that.
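
Step (d) amounts to taking the maximum over the per-position scores. A minimal Python sketch follows, assuming the scores are held as a list of `((row, col), score)` pairs; both that data layout and the function name `best_detection` are hypothetical and not taken from the claim.

```python
def best_detection(scored_positions):
    """Return the (position, score) pair with the largest score.

    scored_positions: non-empty iterable of ((row, col), score) pairs.
    If several positions share the largest score, the first one is returned.
    """
    position, score = max(scored_positions, key=lambda item: item[1])
    return position, score
```

For example, `best_detection([((0, 0), 0.1), ((0, 2), 0.9), ((2, 0), 0.4)])` returns `((0, 2), 0.9)`.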

The image recognition method according to any one of claims 5 to 7.
(e) The method further has a step of generating correction data for correcting the added data when padding data has been added to the training data used for constructing the machine learning model and extra data has been added to the feature amount map.
In the step (b), the generated correction data is used to correct the feature amount map, and the corrected feature amount map is used to calculate the score.
An image recognition method characterized by that.

A program that causes a computer to execute:
(a) A step of generating a feature amount map of an image to be image-recognized, using a convolution layer of a machine learning model for detecting a specific object from an image.
(b) A step of setting a virtual window on the feature amount map and, while sliding the window by a set amount, inputting the region of the feature amount map within the window into the fully connected layer of the machine learning model at each of a plurality of predetermined positions, thereby calculating, for each predetermined position, a score indicating the possibility that the specific object is present in the region.

The program according to claim 9.
The program further causes the computer to execute:
(c) A step of acquiring image data and setting, within the image specified by the acquired image data, a range to be targeted for the image recognition.
In the step (a), the feature amount map is generated for the set range.
A program characterized by that.

The program according to claim 9 or 10.
The program further causes the computer to execute:
(d) A step of identifying, from the scores calculated for the respective predetermined positions, the score having the largest value and the predetermined position at which it was obtained, and outputting the identified score and predetermined position.
A program characterized by that.

The program according to any one of claims 9 to 11.
The program further causes the computer to execute:
(e) A step of generating correction data for correcting the added data when padding data has been added to the training data used for constructing the machine learning model and extra data has been added to the feature amount map.
In the step (b), the generated correction data is used to correct the feature amount map, and the corrected feature amount map is used to calculate the score.
A program characterized by that.