JP7287650B2 - Image recognition device, image recognition method, and program

Info

Publication number: JP7287650B2
Application number: JP2019061039A
Authority: JP
Inventors: 剛早川; 純一気屋村; 安利深谷; 裕二栗田
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2023-06-06
Anticipated expiration: 2039-03-27
Also published as: JP2020160921A

Description

The present invention relates to an image recognition device and an image recognition method for recognizing an object in an image, and further to a program for realizing them.

In recent years, image recognition devices that perform image recognition using a machine learning model have been developed (see, for example, Patent Documents 1 and 2). Such a device can detect, in an image, persons, animals, automobiles, and other objects on which it has been trained in advance. For this reason, image recognition devices are used in video surveillance systems, vehicle-mounted accident prevention systems, and the like.

Processing in a conventional image recognition device will now be described with reference to FIG. 11. FIG. 11 is a flowchart showing the processing performed by a conventional image recognition device. This device includes a machine learning model that recognizes a specific object, and the model is built by deep learning.

As shown in FIG. 11, the image recognition device first acquires image data from an external imaging device or storage device (step S1). The acquired image data is stored in a memory or the like mounted in the image recognition device.

Next, the image recognition device designates the range of the portion of the acquired image that may contain the detection target (step S2). Specifically, in step S2, the device designates the range by specifying the upper-left coordinates on the screen, the width of the region, and the height of the region. In step S2, the entire image may be designated instead of a part of the acquired image.

Next, within the range designated in step S2, the image recognition device sets a rectangular region whose horizontal and vertical resolutions are set in advance, and cuts out the image of that region (step S3). Step S3 is executed repeatedly, as described later. Each time it is executed, the rectangular region is set at a position slid by a set number of pixels. This approach is called the sliding method, and the rectangular region is called a sliding window.

In this method, the sliding window starts at the upper-left corner of the designated range and is first slid horizontally by the set number of pixels at a time. When it reaches the upper-right corner, it is moved vertically by the set number of pixels and is again slid from the left edge to the right edge. The set number of pixels, which is the slide amount, is chosen so that the edges of positionally adjacent sliding windows overlap.

Next, the image recognition device inputs the image cut out in step S3 to the machine learning model, performs inference on the object in the image, and calculates a score indicating the likelihood that the object is the specific object (step S4).

Next, the image recognition device compares the score calculated in step S4 with the score calculated earlier in step S4 for another sliding window. It then saves the higher score, the coordinates of the sliding window that produced it, and the image identification number of that sliding window (step S5).

Next, the image recognition device determines whether steps S3 to S5 have been executed for the entire range designated in step S2 (step S6). If the determination in step S6 shows that they have not, the device slides the sliding window as described above and executes step S3 again.

If, on the other hand, the determination in step S6 shows that steps S3 to S5 have been executed for the entire range designated in step S2, the image recognition device outputs the saved score, coordinates, and image identification number to the outside.

In this way, a conventional image recognition device performs inference with the learning model for each sliding window in order to carry out image recognition.
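The conventional flow of steps S1 to S6 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of any actual device: the image is assumed to be a list of pixel rows, and `score_fn` is a hypothetical stand-in for the machine learning model, which the text does not specify.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    """Cut out the w-by-h rectangle whose upper-left corner is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def conventional_scan(
    image: Image,
    win_w: int,
    win_h: int,
    stride: int,
    score_fn: Callable[[Image], float],  # hypothetical stand-in for the model
) -> Tuple[float, Tuple[int, int], int]:
    """Run the model once per sliding window and keep the best result.

    Returns (best score, (x, y) of the best window, index of the best window).
    """
    height, width = len(image), len(image[0])
    best = (float("-inf"), (0, 0), -1)
    index = 0
    # Raster order: left to right, then down by one stride (steps S3 and S6).
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            patch = crop(image, x, y, win_w, win_h)  # step S3
            score = score_fn(patch)                  # step S4: full inference
            if score > best[0]:                      # step S5: keep the higher score
                best = (score, (x, y), index)
            index += 1
    return best
```

Note that `score_fn` runs on every window, so pixels shared by overlapping windows pass through the whole model repeatedly.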

Patent Document 1: JP 2012-243155 A
Patent Document 2: JP 2014-041427 A

However, conventional image recognition devices have low processing efficiency, and their processing speed is difficult to improve. Specifically, a conventional device executes inference for each sliding window, as described above, and each sliding window is set so as to overlap an adjacent sliding window. The device therefore executes inference repeatedly on the overlapping portions, which is wasted processing. This results in the problem described above.
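The scale of this redundancy can be estimated with simple arithmetic. The sketch below counts how many pixels are fed to the model in total, relative to the number of pixels in the designated range. The 640 x 480 range and the stride of 8 pixels in the usage note are assumed values chosen for illustration; only the 64 x 80 window size appears in this document.

```python
def window_count(length: int, window: int, stride: int) -> int:
    """Number of window positions along one axis."""
    return (length - window) // stride + 1


def redundancy_factor(width: int, height: int,
                      win_w: int, win_h: int, stride: int) -> float:
    """Pixels processed over all windows divided by pixels in the range."""
    windows = (window_count(width, win_w, stride)
               * window_count(height, win_h, stride))
    return windows * win_w * win_h / (width * height)
```

Under these assumptions, `redundancy_factor(640, 480, 64, 80, 8)` evaluates to about 62, meaning each pixel is processed roughly 62 times on average.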

An example object of the present invention is to solve the above problem and to provide an image recognition device, an image recognition method, and a program that can improve processing efficiency in image recognition using a machine learning model.

In order to achieve the above object, an image recognition device according to one aspect of the present invention includes:
a feature map generation unit that generates a feature map of an image subject to image recognition by using a convolutional layer of a machine learning model for detecting a specific object from an image; and
a score calculation unit that sets a virtual window on the feature map and, while sliding the window by a set amount, inputs the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, thereby calculating, for each predetermined position, a score indicating the possibility that the specific object exists in that region.

In order to achieve the above object, an image recognition method according to one aspect of the present invention includes:
(a) a step of generating a feature map of an image subject to image recognition by using a convolutional layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature map and, while sliding the window by a set amount, inputting the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, thereby calculating, for each predetermined position, a score indicating the possibility that the specific object exists in that region.

Furthermore, in order to achieve the above object, a program according to one aspect of the present invention causes a computer to execute:
(a) a step of generating a feature map of an image subject to image recognition by using a convolutional layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature map and, while sliding the window by a set amount, inputting the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, thereby calculating, for each predetermined position, a score indicating the possibility that the specific object exists in that region.

As described above, the present invention can improve processing efficiency in image recognition using a machine learning model.

FIG. 1 is a block diagram showing a schematic configuration of the image recognition device according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram specifically showing the configuration of the image recognition device according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing an example of a feature map obtained in Embodiment 1 of the present invention.
FIG. 4 is a diagram showing an example of the classifier (machine learning model) used in Embodiment 1 of the present invention.
FIG. 5 is a flowchart showing the operation of the image recognition device according to Embodiment 1 of the present invention.
FIG. 6 is a block diagram specifically showing the configuration of the image recognition device according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing an example of the classifier (machine learning model) according to Embodiment 2 of the present invention.
FIG. 8 is a diagram showing an example of data output by the convolutional layers of the classifier in Embodiment 2 of the present invention.
FIG. 9 is a flowchart showing the operation of the image recognition device according to Embodiment 2 of the present invention.
FIG. 10 is a block diagram showing an example of a computer that implements the image recognition device according to the embodiments of the present invention.
FIG. 11 is a flowchart showing the processing performed by a conventional image recognition device.

(Embodiment 1)
An image recognition device, an image recognition method, and a program according to Embodiment 1 of the present invention will be described below with reference to FIGS. 1 to 5.

[Device configuration]
First, a schematic configuration of the image recognition device according to Embodiment 1 of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a schematic configuration of the image recognition device according to Embodiment 1 of the present invention.

The image recognition device 10 according to Embodiment 1, shown in FIG. 1, is a device that detects a specific object from an image. As shown in FIG. 1, the image recognition device 10 includes a feature map generation unit 11 and a score calculation unit 12.

The feature map generation unit 11 generates a feature map of the image subject to image recognition by using a convolutional layer of a machine learning model for detecting a specific object from an image.

The score calculation unit 12 sets a virtual window on the feature map and, while sliding this window by a set amount, inputs the region of the feature map inside the window to the fully connected layer of the machine learning model at each of a plurality of predetermined positions. From the result of this input processing, the score calculation unit 12 calculates, for each predetermined position, a score indicating the possibility that the specific object exists in the region inside the window.

As described above, in Embodiment 1, a feature map is first generated using the convolutional layer of the machine learning model, and sliding-window processing is then performed on this feature map. Redundant processing is therefore greatly reduced compared with the conventional approach, so Embodiment 1 can improve processing efficiency in image recognition using a machine learning model.
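The reason the convolution can be shared is illustrated by the sketch below. It assumes a single-channel convolution with stride 1 and no padding, and a fully connected stage reduced to one output unit; these simplifications, and all function names, are illustrative assumptions rather than details given in this document. Under them, the feature map computed from a cropped window equals the corresponding crop of the feature map computed once for the whole image.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2-D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def crop(matrix: Matrix, x: int, y: int, w: int, h: int) -> Matrix:
    """Cut out the w-by-h rectangle whose upper-left corner is (x, y)."""
    return [row[x:x + w] for row in matrix[y:y + h]]


def window_scores(feature_map: Matrix, win_w: int, win_h: int, stride: int,
                  weights: Matrix, bias: float = 0.0
                  ) -> List[Tuple[Tuple[int, int], float]]:
    """Slide a window over the feature map; apply one fully connected unit per position."""
    scores = []
    for y in range(0, len(feature_map) - win_h + 1, stride):
        for x in range(0, len(feature_map[0]) - win_w + 1, stride):
            region = crop(feature_map, x, y, win_w, win_h)
            score = bias + sum(region[i][j] * weights[i][j]
                               for i in range(win_h) for j in range(win_w))
            scores.append(((x, y), score))
    return scores
```

With this arrangement the convolution runs once per image region, and only the comparatively small fully connected computation is repeated per window position.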

Next, the configuration and functions of the image recognition device 10 according to Embodiment 1 will be described specifically with reference to FIGS. 2 to 4. FIG. 2 is a block diagram specifically showing the configuration of the image recognition device according to Embodiment 1 of the present invention. FIG. 3 is a diagram showing an example of a feature map obtained in Embodiment 1 of the present invention. FIG. 4 is a diagram showing an example of the classifier (machine learning model) used in Embodiment 1 of the present invention.

As shown in FIG. 2, in this embodiment the image recognition device 10 includes, in addition to the feature map generation unit 11 and the score calculation unit 12 described above, a target image setting unit 13, a feature map storage unit 14, a recognition processing unit 15, and a classifier 20, which is a machine learning model.

In the example of FIG. 2, the classifier 20 is provided inside the image recognition device 10, but the configuration is not limited to this example. The classifier 20 may be provided in a device other than the image recognition device 10.

The target image setting unit 13 first acquires the image data of the image subject to image recognition. It then sets, in the image specified by the acquired image data, the range subject to image recognition.

Specifically, the target image setting unit 13 identifies a range of the image that may contain the recognition target and sets that range. The range that may contain the recognition target is identified, for example, by detecting the contours of objects in the image and identifying the range in which the detected contours lie. The range is set by specifying its upper-left coordinates, width, and height. The target image setting unit 13 can also set the entire image of the image data as the range subject to image recognition.

In this embodiment, the feature map generation unit 11 generates a feature map for the range set by the target image setting unit 13 by using the convolutional layers of the classifier 20 (see FIG. 4). The feature map generation unit 11 also stores the generated feature map in the feature map storage unit 14. The feature map storage unit 14 is, for example, a memory, and stores the feature map in its storage area.

Specifically, as shown in FIG. 3, the feature map generation unit 11 first extracts, from the image data of the range set by the target image setting unit 13, rows (lines) consisting of all pixels in the horizontal direction, N lines at a time (N is any natural number). The feature map generation unit 11 then inputs the extracted N lines of image data, in order, to the convolutional layers of the classifier 20. A feature map is thereby generated for every N lines. FIG. 3 shows N lines of image data and the feature map generated from them. In the feature map, each grid cell represents a pixel, and each ■ represents a feature value.
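The extraction of N lines at a time can be sketched as a simple generator. This sketch assumes that consecutive strips do not overlap and that a final strip may be shorter than N lines when the height of the range is not a multiple of N; the text does not state how either case is handled, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple

Image = List[List[float]]


def iter_strips(region: Image, n_lines: int) -> Iterator[Tuple[int, Image]]:
    """Yield (top row index, strip) for each block of n_lines full-width rows."""
    if n_lines < 1:
        raise ValueError("n_lines must be a natural number")
    for top in range(0, len(region), n_lines):
        # Each strip keeps every pixel in the horizontal direction.
        yield top, region[top:top + n_lines]
```

Each yielded strip would then be passed to the convolutional layers to produce the feature map for those N lines.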

In this embodiment, the score calculation unit 12 first retrieves the feature map for N lines from the feature map storage unit 14. It then slides the virtual window by the set amount over the retrieved feature map and, at each of a plurality of predetermined positions, inputs the region of the feature map inside the window to the fully connected layer of the classifier 20 (see FIG. 4). The score calculation unit 12 takes the output of the fully connected layer at each predetermined position as the score for that position. In FIG. 3, the dashed rectangle indicates the virtual window.

As shown in FIG. 4, the classifier 20 includes convolutional layers 21 to 24 and a fully connected layer 25. In the example of FIG. 4, image data is first input to the convolutional layer 21, and the output of the convolutional layer 21 is input to both the convolutional layer 22 and the convolutional layer 24. The convolutional layer 24 performs size conversion on its input and outputs the converted data. The output of the convolutional layer 22 is input to the convolutional layer 23, and the output of the convolutional layer 23 and the output of the convolutional layer 24 are combined to form the feature map.
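The data flow between the convolutional layers can be sketched as below. The layers are passed in as hypothetical callables, and `combine` is left as a parameter because the text does not say how the two outputs are combined (element-wise addition and concatenation are both plausible assumptions).

```python
from typing import Any, Callable

Layer = Callable[[Any], Any]


def classifier_feature_map(
    image: Any,
    conv21: Layer,
    conv22: Layer,
    conv23: Layer,
    conv24: Layer,
    combine: Callable[[Any, Any], Any],  # assumed merge operation, e.g. addition
) -> Any:
    """Wire up the branch structure described for the classifier 20."""
    stem = conv21(image)            # layer 21 feeds both branches
    main = conv23(conv22(stem))     # main branch: layer 22, then layer 23
    shortcut = conv24(stem)         # side branch: size conversion only
    return combine(main, shortcut)  # the combined result is the feature map
```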

When the score calculation unit 12 inputs the region of the feature map inside the window, the fully connected layer 25 classifies the input region, calculates for each class the probability that the object in the image belongs to that class, and outputs the calculated probabilities. The score calculation unit 12 uses the output probability as the score.

The recognition processing unit 15 first identifies, among the scores calculated by the score calculation unit 12 for the predetermined positions, the highest score and the position at which it was obtained. The recognition processing unit 15 then outputs the identified score and position as the result of image recognition. From this output, it can be determined whether the specific object is present in the image.
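The scoring and selection described above can be sketched as follows. The use of a softmax to turn the fully connected layer's raw outputs into per-class probabilities is an assumption (the text says only that a probability is calculated for each class), and `best_position` and `target_class` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[int, int]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw outputs to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_position(logits_by_position: Dict[Position, List[float]],
                  target_class: int) -> Tuple[Position, float]:
    """Return the window position with the highest score for the target class."""
    scores = {pos: softmax(logits)[target_class]
              for pos, logits in logits_by_position.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A caller could then compare the returned score against a threshold of its choosing to decide whether the specific object is present.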

[Device operation]
Next, the operation of the image recognition device 10 according to Embodiment 1 will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the operation of the image recognition device according to Embodiment 1 of the present invention. FIGS. 1 to 4 are referred to as appropriate in the following description. In Embodiment 1, the image recognition method is carried out by operating the image recognition device 10. The following description of the operation of the image recognition device 10 therefore also serves as the description of the image recognition method according to Embodiment 1.

As shown in FIG. 5, the target image setting unit 13 first acquires the image data of the image subject to image recognition (step A1). Next, the target image setting unit 13 sets, in the image of the image data acquired in step A1, the range subject to image recognition (step A2).

Next, the feature map generation unit 11 extracts N lines of image data from the image data of the range set in step A2 (step A3). The feature map generation unit 11 then inputs the extracted N lines of image data to the convolutional layers of the classifier 20 to generate a feature map (step A4). The feature map generation unit 11 also stores the generated feature map in the feature map storage unit 14.

Next, the score calculation unit 12 retrieves the feature map for N lines from the feature map storage unit 14. While sliding the virtual window over the retrieved feature map, the score calculation unit 12 inputs the region of the feature map inside the window to the fully connected layer of the classifier 20 at each of a plurality of predetermined positions and calculates the scores (step A5).

Next, the recognition processing unit 15 identifies, among the scores calculated for the predetermined positions in step A5, the highest score and the position at which it was obtained, and outputs the identified score and position as the result of image recognition (step A6).

Next, the recognition processing unit 15 determines whether the processing of steps A3 to A6 has been completed for the entire range set in step A2 (step A7).

If the determination in step A7 shows that the processing of steps A3 to A6 has not been completed for the entire range set in step A2, the recognition processing unit 15 causes the feature map generation unit 11 to execute step A3 again. The feature map generation unit 11 then extracts the N lines of image data located below the previously extracted image data.

If, on the other hand, the determination in step A7 shows that the processing of steps A3 to A6 has been completed for the entire range set in step A2, the processing in the image recognition device ends.
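Steps A3 to A7 can be sketched as a single loop. In this sketch `conv_layers` and `fc_layer` are hypothetical callables standing in for the convolutional layers and the fully connected layer of the classifier 20. It is further assumed that the window spans the full height of each N-line feature map and slides only horizontally, which the text does not state explicitly.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Result = Tuple[float, Tuple[int, int]]  # (score, (x on feature map, top row of strip))


def recognize(
    region: Matrix,
    n_lines: int,
    win_w: int,
    stride: int,
    conv_layers: Callable[[Matrix], Matrix],  # hypothetical: strip -> feature map
    fc_layer: Callable[[Matrix], float],      # hypothetical: window region -> score
) -> List[Result]:
    """Return the best-scoring window for each N-line strip of the region."""
    results: List[Result] = []
    for top in range(0, len(region), n_lines):        # steps A3 and A7
        strip = region[top:top + n_lines]
        feature_map = conv_layers(strip)              # step A4: one pass per strip
        best = None
        for x in range(0, len(feature_map[0]) - win_w + 1, stride):  # step A5
            window = [row[x:x + win_w] for row in feature_map]
            score = fc_layer(window)
            if best is None or score > best[0]:
                best = (score, (x, top))
        if best is not None:
            results.append(best)                      # step A6: output per strip
    return results
```

The convolutional layers run once per strip, while only `fc_layer` is evaluated at every window position.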

[Effects of Embodiment 1]
As described above, in Embodiment 1, a feature map is generated for every N lines of the image, and a score is calculated with the fully connected layer for each N-line feature map. Unlike the conventional approach, feature maps are not generated redundantly, so Embodiment 1 can improve processing efficiency in image recognition using a machine learning model.

[Program]
The program according to Embodiment 1 may be any program that causes a computer to execute steps A1 to A7 shown in FIG. 5. The image recognition device 10 and the image recognition method according to Embodiment 1 can be realized by installing this program on a computer and executing it. In this case, the processor of the computer functions as the feature map generation unit 11, the score calculation unit 12, the target image setting unit 13, and the recognition processing unit 15, and performs the processing.

The program according to this embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as one of the feature map generation unit 11, the score calculation unit 12, the target image setting unit 13, and the recognition processing unit 15.

(Embodiment 2)
Next, an image recognition device, an image recognition method, and a program according to Embodiment 2 of the present invention will be described with reference to FIGS. 6 to 9.

[Device configuration]
First, a schematic configuration of the image recognition device according to Embodiment 2 of the present invention will be described with reference to FIG. 6. FIG. 6 is a block diagram specifically showing the configuration of the image recognition device according to Embodiment 2 of the present invention.

As shown in FIG. 6, the image recognition device 30 according to Embodiment 2 has a configuration similar to that of the image recognition device 10 according to Embodiment 1 but differs in the following respects. The description below focuses on the differences from Embodiment 1.

First, the image recognition device 30 according to Embodiment 2 handles the case where padding data was added to the training data used to build the classifier (machine learning model) 20 and, as a result, extra data is added to the feature map. For this reason, unlike the image recognition device 10 according to Embodiment 1, the image recognition device 30 includes a correction data generation unit 31 and a correction data storage unit 32.

The correction data generation unit 31 generates correction data and stores the generated correction data in the correction data storage unit 32. The correction data is data for correcting the data added to the feature map as a result of the padding added to the training data. In Embodiment 2, the score calculation unit 12 corrects the feature map by using the correction data generated by the correction data generation unit 31 and calculates the score by using the corrected feature map.

The classifier 20 used in Embodiment 2 has the characteristic that the size of the input data becomes smaller at output. For this reason, as described above, padding data is added to the training data used to build the classifier 20.

Padding means adding new data (padding data) to the data input to a convolutional layer so that the size of the output data is the same as the size of the input data. For example, suppose that the width and height of the kernel of a convolutional layer are both K (width w = height h = K). Without padding data, the data output from the convolutional layer is smaller than the input by K - 1 in both width and height. Specifically, when K = 3, both the width and the height decrease by 2.

Accordingly, adding (K - 1)/2 pixels of padding data on each side outside the rectangle of the input data makes the effective size of the data the same at input and at output. Padding data is generally added not only at the first convolutional layer (input layer) but also at each subsequent (lower) convolutional layer. Data whose value is zero is normally used as padding data; padding of this kind is called "zero padding".
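The size arithmetic above can be checked with a short sketch. It assumes a stride of 1 and an odd kernel size K, so that (K - 1)/2 is an integer; the function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conv_output_size(size: int, kernel: int, pad: int = 0) -> int:
    """Output length along one axis for a stride-1 convolution."""
    return size + 2 * pad - (kernel - 1)


def same_padding(kernel: int) -> int:
    """Padding per side that keeps the output size equal to the input size."""
    if kernel % 2 == 0:
        raise ValueError("kernel size is assumed to be odd")
    return (kernel - 1) // 2


def zero_pad(image: Matrix, pad: int) -> Matrix:
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    blank = [[0] * width for _ in range(pad)]
    body = [[0] * pad + list(row) + [0] * pad for row in image]
    return blank + body + [list(r) for r in blank]
```

For K = 3, `conv_output_size(64, 3)` gives 62 without padding, and `conv_output_size(64, 3, same_padding(3))` gives 64 with one pixel of padding per side.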

Here, the functions of the feature map generation unit 11 and the correction data generation unit 31 in Embodiment 2 are described more specifically with reference to FIGS. 7 and 8. FIG. 7 is a diagram showing an example of the discriminator (machine learning model) in Embodiment 2 of the present invention. FIG. 8 is a diagram showing an example of the data output by the convolutional layers of the discriminator in Embodiment 2 of the present invention.

As described above, padding with a fixed value, as in zero padding, can change the output of a convolutional layer at the boundary of the rectangle. For this reason, in this embodiment, the feature map generation unit 11 adds padding data to the image data of the range set by the target image setting unit 13 and inputs the padded image data to the convolutional layer 21.

For the same reason, in Embodiment 2, the learning data used to construct the discriminator 20 is also padded with the pixel data surrounding the image that serves as the learning data. For example, suppose the sliding-window rectangle is 64 pixels × 80 pixels and the kernel size of the convolutional layer is K = 3. In this case, machine learning uses a 66-pixel × 82-pixel image that includes a margin of 1 pixel (= (K-1)/2) around the image serving as the learning data.
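As an illustration of padding with real surrounding pixels rather than zeros, the following sketch computes the enlarged window size and crops a window together with its margin from a larger image. The helper names are hypothetical, the image is a plain list of rows, and rejecting windows that touch the image border is an assumption made here; the text does not say how that case is handled.

```python
def padded_window_size(width: int, height: int, k: int) -> tuple:
    """Window size after adding a (K-1)/2-pixel margin on every side."""
    margin = (k - 1) // 2
    return width + 2 * margin, height + 2 * margin


def crop_with_context(image, top, left, height, width, k):
    """Crop a window plus a margin of real neighbouring pixels.

    Assumption: the window must lie at least (K-1)/2 pixels inside the image.
    """
    margin = (k - 1) // 2
    r0, r1 = top - margin, top + height + margin
    c0, c1 = left - margin, left + width + margin
    if r0 < 0 or c0 < 0 or r1 > len(image) or c1 > len(image[0]):
        raise ValueError("window too close to the image border")
    return [row[c0:c1] for row in image[r0:r1]]


# Toy image: 120 rows x 100 columns, each pixel encodes its own position.
image = [[r * 100 + c for c in range(100)] for r in range(120)]
patch = crop_with_context(image, top=10, left=20, height=80, width=64, k=3)
print(padded_window_size(64, 80, 3))  # (66, 82)
print(len(patch[0]), len(patch))      # 66 82
```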

In the convolutional layers other than the input layer (convolutional layer 21), however, it is difficult to determine appropriate padding data values. Note that the convolutional layer 24 only performs size conversion on its input data, so its kernel is 1×1 and no padding occurs in the convolutional layer 24.

For this reason, in Embodiment 2, as shown in FIG. 7, the correction data generation unit 31 generates correction data corresponding to the padding data of the convolutional layers 22 and 23. The correction data generation unit 31 has as many correction blocks as there are corresponding convolutional layers. In the example of FIG. 8, the correction data generation unit 31 has a correction block 31a and a correction block 31b. The correction data generation unit 31 can also run in parallel with the processing in the convolutional layers, which suppresses the processing delay caused by correction data generation.

Here, as shown in FIG. 8, let the output of the convolutional layer (input layer) 21 be p[n,m], the output of the convolutional layer 22 be q[n,m], and the output of the convolutional layer 23 be R[n,m]. Let r[n,m] be the output of the convolutional layer 23 after correction with the correction data from the correction data generation unit 31. Here, n denotes the row and m denotes the column.

In this case, the corrected output r[n,m] is the same as the output that would be obtained if zero padding had been performed. Taking r[1,4] as an example, it is calculated by Equation 1 below.

[Equation 1]
r[1,4] = q[0,4]*w2[0,1] + q[0,5]*w2[0,2]
       + q[1,4]*w2[1,1] + q[1,5]*w2[1,2]
       + q[2,4]*w2[2,1] + q[2,5]*w2[2,2]

In contrast, in Embodiment 2, as described in Embodiment 1, N lines of data are subject to convolution, so actual data is used as the padding data. Accordingly, the output R[1,4] of the convolutional layer 23 is calculated by Equation 2 below.

[Equation 2]
R[1,4] = q[0,3]*w2[0,0] + { q[0,4]*w2[0,1] + q[0,5]*w2[0,2] }
       + q[1,3]*w2[1,0] + { q[1,4]*w2[1,1] + q[1,5]*w2[1,2] }
       + q[2,3]*w2[2,0] + { q[2,4]*w2[2,1] + q[2,5]*w2[2,2] }
       = r[1,4] + { q[0,3]*w2[0,0] + q[1,3]*w2[1,0] + q[2,3]*w2[2,0] }

Solving Equation 2 for r[1,4] in terms of R[1,4] gives Equation 3 below.

[Equation 3]
r[1,4] = R[1,4] - { q[0,3]*w2[0,0] + q[1,3]*w2[1,0] + q[2,3]*w2[2,0] }

Here, denoting the expression in braces { } in Equation 3 by Cr[1,4], Equation 3 can be written as Equation 4 below.

[Equation 4]
r[1,4] = R[1,4] - Cr[1,4]

Cr[1,4] in Equation 4 is the correction data for correcting the padding. In Embodiment 2, the correction block 31b of the correction data generation unit 31 generates this correction data and stores it in the correction data storage unit 32.
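The relationship r[1,4] = R[1,4] - Cr[1,4] can be verified numerically with a small sketch. It assumes that w2 is the 3×3 kernel of the convolutional layer 23 and that the left edge of the window lies at column 4, so that column 3 of q plays the role of the padding column; the values of q and w2 and the helper names are made up for illustration.

```python
def conv_at(q, w, n, m, zero_cols=()):
    """3x3 convolution output centred on q[n][m].

    Columns listed in zero_cols are read as 0, which emulates zero padding.
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            col = m - 1 + j
            value = 0.0 if col in zero_cols else q[n - 1 + i][col]
            total += value * w[i][j]
    return total


def correction(q, w, n, m):
    """Cr[n,m]: contribution of the column just left of the window."""
    return sum(q[n - 1 + i][m - 1] * w[i][0] for i in range(3))


# Illustrative data only: a 4x8 input and an arbitrary 3x3 kernel.
q = [[float(r * 10 + c) for c in range(8)] for r in range(4)]
w2 = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5], [-2.0, 1.0, 0.75]]

R = conv_at(q, w2, 1, 4)                      # Equation 2: real data in column 3
Cr = correction(q, w2, 1, 4)                  # braces of Equation 3
r_zero = conv_at(q, w2, 1, 4, zero_cols={3})  # Equation 1: zero padding
r = R - Cr                                    # Equation 4

assert abs(r - r_zero) < 1e-9
```

Subtracting the correction term from the output computed on real data reproduces the zero-padded result, which is the condition under which the discriminator was trained.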

The output q[n,m] of the convolutional layer 22 is likewise affected by the padding in the convolutional layer 21. However, the output from the convolutional layer 21 does not contain padding data. For this reason, the correction block 31a performs the same processing as the correction block 31b described above.

[Device operation]
Next, the operation of the image recognition device 30 in Embodiment 2 is described with reference to FIG. 9. FIG. 9 is a flowchart showing the operation of the image recognition device in Embodiment 2 of the present invention. The following description refers to FIGS. 6 to 8 as appropriate. In Embodiment 2 as well, the image recognition method is carried out by operating the image recognition device 30. The description of the image recognition method in Embodiment 2 is therefore replaced by the following description of the operation of the image recognition device 30.

As shown in FIG. 9, the target image setting unit 13 first acquires the image data of the image to be recognized (step B1). Next, the target image setting unit 13 sets the range to be subjected to image recognition in the image of the image data acquired in step B1 (step B2).

Next, the feature map generation unit 11 extracts N lines of image data from the image data of the range set in step B2 (step B3). The feature map generation unit 11 then inputs the extracted N lines of image data to the convolutional layers of the discriminator 20 to generate a feature map (step B4). The feature map generation unit 11 also stores the generated feature map in the feature map storage unit 14.

Next, the correction data generation unit 31 generates correction data for correcting the data added to the feature map by padding, and stores the generated correction data in the correction data storage unit 32 (step B5). Step B5 may be executed concurrently with step B4.

Next, the score calculation unit 12 retrieves N lines of the feature map from the feature map storage unit 14 and retrieves the correction data from the correction data storage unit 32. The score calculation unit 12 then corrects the retrieved feature map using the correction data (step B6).

Next, while sliding a virtual window over the corrected feature map, the score calculation unit 12 inputs the region of the feature map inside the window to the fully connected layer of the discriminator 20 at each of a plurality of predetermined positions and calculates a score (step B7).

Next, the recognition processing unit 15 identifies, from among the scores calculated for the predetermined positions in step B7, the score with the largest value and the corresponding predetermined position, and outputs the identified score and position as the result of image recognition (step B8).

Next, the recognition processing unit 15 determines whether the processing of steps B3 to B8 has been completed for the entire range set in step B2 (step B9).

If the determination in step B9 shows that the processing of steps B3 to B8 has not been completed for the entire range set in step B2, the recognition processing unit 15 causes the feature map generation unit 11 to execute step B3 again. The feature map generation unit 11 then extracts the N lines of image data located below the previously extracted image data.

If, on the other hand, the determination in step B9 shows that the processing of steps B3 to B8 has been completed for the entire range set in step B2, the processing in the image recognition device ends.
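The control flow of steps B3 to B9 can be sketched as a loop over bands of N lines. This is only an outline under stated assumptions: the callables `make_feature_map`, `make_correction`, and `score_window` are hypothetical stand-ins for the convolutional layers, the correction blocks, and the fully connected layer; the correction is applied as an element-wise subtraction in the manner of Equation 4; and the window is assumed to be as tall as one band, so it slides only horizontally.

```python
from typing import Callable, List, Tuple

Rows = List[List[float]]


def recognize(
    image: Rows,
    n_lines: int,
    make_feature_map: Callable[[Rows], Rows],
    make_correction: Callable[[Rows], Rows],
    score_window: Callable[[Rows], float],
    window_width: int,
    stride: int = 1,
) -> List[Tuple[float, int, int]]:
    """Return (best score, first row of band, window column) per band."""
    results = []
    # B3 / B9: take successive bands of N lines until the range is exhausted.
    for top in range(0, len(image) - n_lines + 1, n_lines):
        band = image[top:top + n_lines]
        fmap = make_feature_map(band)   # B4: feature map for this band
        corr = make_correction(band)    # B5: correction data
        # B6: corrected map = feature map - correction (cf. Equation 4).
        fixed = [[f - c for f, c in zip(f_row, c_row)]
                 for f_row, c_row in zip(fmap, corr)]
        best = None
        # B7: slide the window and score each position.
        for left in range(0, len(fixed[0]) - window_width + 1, stride):
            region = [row[left:left + window_width] for row in fixed]
            score = score_window(region)
            if best is None or score > best[0]:
                best = (score, top, left)
        if best is not None:
            results.append(best)        # B8: best score and its position
    return results


# Toy run: identity "feature map", zero correction, score = sum of the region.
toy = [[0, 1, 2, 3, 4, 5],
       [1, 0, 0, 0, 0, 9],
       [5, 0, 0, 0, 0, 0],
       [5, 0, 0, 0, 0, 0]]
out = recognize(
    toy, 2,
    make_feature_map=lambda band: band,
    make_correction=lambda band: [[0] * len(row) for row in band],
    score_window=lambda region: sum(sum(row) for row in region),
    window_width=2,
)
print(out)
```

Because each band is processed and scored before the next one is read, only N lines of the feature map need to be held at a time.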

[Effects of Embodiment 2]
As described above, according to Embodiment 2, when a discriminator 20 whose learning data requires padding is used, the padding data can be corrected, which suppresses the loss of identification accuracy in such a case. In Embodiment 2, as in Embodiment 1, processing efficiency can also be improved in image recognition that uses a machine learning model.

[Program]
The program in Embodiment 2 may be any program that causes a computer to execute steps B1 to B9 shown in FIG. 9. Installing this program on a computer and executing it realizes the image recognition device 30 and the image recognition method in Embodiment 2. In this case, the processor of the computer functions as the feature map generation unit 11, the score calculation unit 12, the target image setting unit 13, the recognition processing unit 15, and the correction data generation unit 31, and performs the processing.

The program in this embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as any one of the feature map generation unit 11, the score calculation unit 12, the target image setting unit 13, the recognition processing unit 15, and the correction data generation unit 31.

(Modification)
In Embodiments 1 and 2 described above, the discriminator 20 illustrated in FIGS. 3 and 4 is used, but the discriminator is not particularly limited in either embodiment. In particular, Embodiment 1 may use a discriminator that does not require padding.

(Physical configuration)
A computer that realizes the image recognition device by executing the program in Embodiments 1 and 2 is now described with reference to FIG. 10. FIG. 10 is a block diagram showing an example of a computer that realizes the image recognition device in the embodiments of the present invention.

As shown in FIG. 10, the computer 110 includes a CPU 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data. The computer 110 may also include a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), or an ASIC (Application Specific IC) in addition to or in place of the CPU 111.

The CPU 111 loads the program (code) of this embodiment stored in the storage device 113 into the main memory 112 and executes it in a predetermined order to perform various computations. The main memory 112 is typically a volatile storage device such as DRAM (Dynamic Random Access Memory). The program in this embodiment is provided stored on a computer-readable recording medium 120. The program in this embodiment may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include, besides a hard disk drive, a semiconductor storage device such as flash memory. The input interface 114 mediates data transmission between the CPU 111 and input devices 118 such as a keyboard and a mouse. The display controller 115 is connected to a display device 119 and controls what is displayed on it.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reading the program from the recording medium 120 and writing the results of processing in the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disk Read Only Memory).

Note that the image recognition device in this embodiment can also be realized using hardware corresponding to each unit rather than a computer on which the program is installed. The image recognition device may also be realized partly by the program and partly by hardware.

Some or all of the embodiments described above can be expressed by (Appendix 1) to (Appendix 12) below, but are not limited to the following description.

(Appendix 1)
An image recognition device comprising:
a feature map generation unit that generates a feature map of an image subject to image recognition, using a convolutional layer of a machine learning model for detecting a specific object from an image; and
a score calculation unit that sets a virtual window on the feature map and, while sliding the window by a set amount, inputs the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculates, for each predetermined position, a score indicating the likelihood that the specific object is present in the region.

(Appendix 2)
The image recognition device according to Appendix 1, further comprising
a target image setting unit that acquires image data and sets, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein the feature map generation unit generates the feature map for the set range.

(Appendix 3)
The image recognition device according to Appendix 1 or 2, further comprising
a recognition processing unit that identifies, from among the scores calculated for the predetermined positions, the score with the largest value and the corresponding predetermined position, and outputs the identified score and predetermined position.

(Appendix 4)
The image recognition device according to any one of Appendices 1 to 3, further comprising
a correction data generation unit that, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature map, generates correction data for correcting the added data,
wherein the score calculation unit corrects the feature map using the generated correction data and calculates the score using the corrected feature map.

(Appendix 5)
An image recognition method comprising:
(a) a step of generating a feature map of an image subject to image recognition, using a convolutional layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature map and, while sliding the window by a set amount, inputting the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculating, for each predetermined position, a score indicating the likelihood that the specific object is present in the region.

(Appendix 6)
The image recognition method according to Appendix 5, further comprising
(c) a step of acquiring image data and setting, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein, in step (a), the feature map is generated for the set range.

(Appendix 7)
The image recognition method according to Appendix 5 or 6, further comprising
(d) a step of identifying, from among the scores calculated for the predetermined positions, the score with the largest value and the corresponding predetermined position, and outputting the identified score and predetermined position.

(Appendix 8)
The image recognition method according to any one of Appendices 5 to 7, further comprising
(e) a step of generating, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature map, correction data for correcting the added data,
wherein, in step (b), the feature map is corrected using the generated correction data and the score is calculated using the corrected feature map.

(Appendix 9)
A program that causes a computer to execute:
(a) a step of generating a feature map of an image subject to image recognition, using a convolutional layer of a machine learning model for detecting a specific object from an image; and
(b) a step of setting a virtual window on the feature map and, while sliding the window by a set amount, inputting the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculating, for each predetermined position, a score indicating the likelihood that the specific object is present in the region.

(Appendix 10)
The program according to Appendix 9, further causing the computer to execute
(c) a step of acquiring image data and setting, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein, in step (a), the feature map is generated for the set range.

(Appendix 11)
The program according to Appendix 9 or 10, further causing the computer to execute
(d) a step of identifying, from among the scores calculated for the predetermined positions, the score with the largest value and the corresponding predetermined position, and outputting the identified score and predetermined position.

(Appendix 12)
The program according to any one of Appendices 9 to 11, further causing the computer to execute
(e) a step of generating, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature map, correction data for correcting the added data,
wherein, in step (b), the feature map is corrected using the generated correction data and the score is calculated using the corrected feature map.

As described above, according to the present invention, processing efficiency can be improved in image recognition that uses a machine learning model. The present invention is useful for various systems that require image recognition.

10 Image recognition device (Embodiment 1)
11 Feature map generation unit
12 Score calculation unit
13 Target image setting unit
14 Feature map storage unit
15 Recognition processing unit
20 Discriminator (machine learning model)
21 to 24 Convolutional layers
25 Fully connected layer
30 Image recognition device (Embodiment 2)
31 Correction data generation unit
31a, 31b Correction blocks
32 Correction data storage unit
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

An image recognition device comprising:
a feature map generation unit that generates a feature map of an image subject to image recognition, using a convolutional layer of a machine learning model for detecting a specific object from an image;
a score calculation unit that sets a virtual window on the feature map and, while sliding the window by a set amount, inputs the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculates, for each predetermined position, a score indicating the likelihood that the specific object is present in the region; and
a correction data generation unit that, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature map, generates correction data for correcting the added data,
wherein the score calculation unit corrects the feature map using the generated correction data and calculates the score using the corrected feature map.

The image recognition device according to claim 1, further comprising
a target image setting unit that acquires image data and sets, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein the feature map generation unit generates the feature map for the set range.

The image recognition device according to claim 1 or 2, further comprising
a recognition processing unit that identifies, from among the scores calculated for the predetermined positions, the score with the largest value and the corresponding predetermined position, and outputs the identified score and predetermined position.

An image recognition method comprising:
(a) a step of generating a feature map of an image subject to image recognition, using a convolutional layer of a machine learning model for detecting a specific object from an image;
(b) a step of setting a virtual window on the feature map and, while sliding the window by a set amount, inputting the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculating, for each predetermined position, a score indicating the likelihood that the specific object is present in the region; and
(e) a step of generating, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature map, correction data for correcting the added data,
wherein, in step (b), the feature map is corrected using the generated correction data and the score is calculated using the corrected feature map.

The image recognition method according to claim 4, further comprising
(c) a step of acquiring image data and setting, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein, in step (a), the feature map is generated for the set range.

The image recognition method according to claim 4 or 5, further comprising
(d) a step of identifying, from among the scores calculated for the predetermined positions, the score with the largest value and the corresponding predetermined position, and outputting the identified score and predetermined position.

A program that causes a computer to execute:
(a) a step of generating a feature map of an image subject to image recognition, using a convolutional layer of a machine learning model for detecting a specific object from an image;
(b) a step of setting a virtual window on the feature map and, while sliding the window by a set amount, inputting the region of the feature map inside the window to a fully connected layer of the machine learning model at each of a plurality of predetermined positions, and calculating, for each predetermined position, a score indicating the likelihood that the specific object is present in the region; and
(e) a step of generating, when padding data has been added to the learning data used to construct the machine learning model and extra data is thereby added to the feature map, correction data for correcting the added data,
wherein, in step (b), the feature map is corrected using the generated correction data and the score is calculated using the corrected feature map.

The program according to claim 7, further causing the computer to execute
(c) a step of acquiring image data and setting, in the image specified by the acquired image data, a range to be subjected to the image recognition,
wherein, in step (a), the feature map is generated for the set range.

The program according to claim 7 or 8, further causing the computer to execute
(d) a step of identifying, from among the scores calculated for the predetermined positions, the score with the largest value and the corresponding predetermined position, and outputting the identified score and predetermined position.