JP7843662B2

JP7843662B2 - Area detection device and program

Info

Publication number: JP7843662B2
Application number: JP2022124751A
Authority: JP
Inventors: 伶遠藤
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2022-08-04
Filing date: 2022-08-04
Publication date: 2026-04-10
Anticipated expiration: 2042-08-04
Also published as: JP2024021715A

Description

The present invention relates to a region detection device and a program.

Region detection techniques that use machine learning to detect a specific region within an image already exist. In these techniques, a large number of pairs, each consisting of a training image and the ground-truth region contained in that image, are first prepared as training data. A ground-truth region is a region having a specific feature, for example a region containing a specific object. A neural network is then trained on this training data. During training, the internal parameters of the neural network are adjusted so that the error between the region estimated by the network at that point and the ground-truth region becomes smaller.

To train a neural network, this error must be computed by a differentiable function. That is, a differentiable function is needed that takes the estimated region and the ground-truth region as inputs and outputs an error value. This is because adjusting the internal parameters of the neural network requires the gradient of the error value in the multidimensional space of parameter values.

Partly because a differentiable error function is required, conventional techniques have targeted rectangular regions when detecting a specific region within an image. For computing the error between an estimated region and a ground-truth region (both rectangles), an error function based on IoU (Intersection over Union) is known to be suitable and is in use.
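As a point of reference for the rectangular case described above, the following is a minimal sketch of plain IoU between two axis-aligned rectangles. The function name `rect_iou` and the `(x_min, y_min, x_max, y_max)` representation are illustrative assumptions and do not come from the cited prior art.

```python
def rect_iou(a, b):
    """IoU of two axis-aligned rectangles given as (x_min, y_min, x_max, y_max).

    Illustrative helper; the tuple layout is an assumption of this sketch.
    """
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Guard against two degenerate (zero-area) rectangles.
    return intersection / union if union > 0.0 else 0.0
```

Two 2-by-2 squares offset by one unit in each direction overlap in a 1-by-1 square, so their IoU is 1 / (4 + 4 - 1).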

Conventional techniques also sometimes avoid building a neural network that directly outputs the coordinate values of the specific region. Instead, the network outputs a mask image whose pixel values are at least 0 and at most 1, and the region where the pixel value exceeds a predetermined threshold (for example, 0.5) is output as the estimated specific region. In this case, the estimation result can have any shape, not only a rectangle.
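The mask-based approach can be illustrated with a small sketch. The nested-list mask format and the name `mask_to_region` are assumptions made for illustration only; a real system would typically operate on array data produced by the network.

```python
def mask_to_region(mask, threshold=0.5):
    """Return the set of (row, col) indices whose mask value exceeds the threshold.

    `mask` is a list of rows of floats in [0, 1] (an assumed format).
    """
    return {
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value > threshold
    }


# A hypothetical 2x3 mask as it might be output by a network.
example_mask = [
    [0.1, 0.7, 0.2],
    [0.6, 0.9, 0.4],
]
example_region = mask_to_region(example_mask)
```

Because the result is a set of whole pixels, the region can never be finer than the mask grid.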

Non-Patent Document 1 describes the PIoU error function (PIoU loss) as an error function for rotated rectangles. The PIoU loss is a loss function for detecting rotated objects and is formulated to perform accurate rotated bounding box regression by using both the angle and the IoU. The method of Non-Patent Document 1 uses a differentiable function to approximate the number of pixels inside a rotated rectangle, and can thereby obtain an error value similar to IoU.

Zhiming Chen, Kean Chen, Weiyao Lin, John See, Hui Yu, Yan Ke, Cong Yang, "PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments", European Conference on Computer Vision (ECCV), 2020, https://arxiv.org/pdf/2007.09584.pdf

However, the conventional techniques have the following problems to be solved.

Conventional techniques compute the error with an error function defined for rectangular regions. In other words, both the region estimated by the neural network and the ground-truth region are limited to rectangles. As a result, conventional techniques cannot construct a neural network that directly estimates the coordinate values of a region whose shape is not a rectangle.

One of the methods described above as conventional lets the neural network output a mask image so that arbitrary shapes can be estimated. However, error can arise when the set of coordinate values specifying the estimated region is derived from the mask image generated by the network. For example, a region smaller than one pixel of the generated mask image cannot be produced as an estimation result. A simple regression is also conceivable, for example treating the least-squares error between the ground-truth coordinate values and the estimated coordinate values as the error, but regression of coordinate values is generally not very accurate. It also imposes the constraint that the figure of the ground-truth region and the figure of the estimated region must have the same number of vertices. For example, an error based on vertex coordinate values cannot be computed between a quadrilateral region and a pentagonal region.

When the PIoU error function described in Non-Patent Document 1 is used, the error between regions can be obtained for rotated rectangles, but it is difficult to estimate shapes finer than the pixel size of the input image.

The present invention was made in recognition of the above problems. It aims to provide a region detection device and a program that can detect a region having a specific feature within an image using a machine learning model such as a neural network, and in which the detected region is not limited to a rectangle.

[1] To solve the above problems, a region detection device according to one aspect of the present invention comprises: a region estimation unit that internally has a machine learning model and estimates a region having a specific feature within an image by inputting the image, passed from outside, into the machine learning model; a training data supply unit that supplies pairs of a training image and a ground-truth region corresponding to that training image, for training the machine learning model of the region estimation unit; and an error calculation unit that obtains the error between an estimated region, which is the result estimated by the region estimation unit based on the training image supplied by the training data supply unit, and the ground-truth region supplied by the training data supply unit for that training image. The figure corresponding to the estimated region and the figure corresponding to the ground-truth region are both convex hull polygons. The error calculation unit comprises: a virtual pixel generation unit that sets a large number of virtual pixels common to the image and the training image; and an error function value calculation unit that, for each virtual pixel, obtains the sum and the product of the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground-truth region, takes the sum minus the product as the union value and the product as the intersection value, and obtains, as the error, the value obtained by dividing the total of the intersection values over all virtual pixels by the total of the union values over all virtual pixels. The inclusion determination value for the figure corresponding to the estimated region is 1 or approximately 1 when the virtual pixel lies inside the figure and is at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 when the virtual pixel lies outside the figure and is at least the predetermined distance away from every side of the figure; the function for obtaining this inclusion determination value is continuous and differentiable over the entire region of the image. The inclusion determination value for the figure corresponding to the ground-truth region is 1 or approximately 1 when the virtual pixel lies inside the figure and is at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 when the virtual pixel lies outside the figure and is at least the predetermined distance away from every side of the figure; the function for obtaining this inclusion determination value is continuous and differentiable over the entire region of the training image. The inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground-truth region are at least 0 and at most 1 over the entire region of the image and of the training image, respectively.

[2] In one aspect of the present invention, in the region detection device of [1] above, the error calculation unit further comprises: an inclusion determination unit that, for each virtual pixel and for each side of the figure corresponding to the estimated region and each side of the figure corresponding to the ground-truth region, obtains the inclusion determination value of that virtual pixel with respect to that side by Equation (1) given below (where e is Napier's number, k is a predetermined positive constant, d_px is the distance from a reference line, which passes through a predetermined point inside the figure and is parallel to the side, to the virtual pixel, and d_center is the distance from that reference line to the side); and an inclusion determination integration unit that obtains the inclusion determination value of the virtual pixel with respect to the figure by multiplying together the inclusion determination values of the virtual pixel obtained by the inclusion determination unit for all sides of the figure.

[3] Another aspect of the present invention is a program for causing a computer to function as a region detection device comprising: a region estimation unit that internally has a machine learning model and estimates a region having a specific feature within an image by inputting the image, passed from outside, into the machine learning model; a training data supply unit that supplies pairs of a training image and a ground-truth region corresponding to that training image, for training the machine learning model of the region estimation unit; and an error calculation unit that obtains the error between an estimated region, which is the result estimated by the region estimation unit based on the training image supplied by the training data supply unit, and the ground-truth region supplied by the training data supply unit for that training image. The figure corresponding to the estimated region and the figure corresponding to the ground-truth region are both convex hull polygons. The error calculation unit comprises: a virtual pixel generation unit that sets a large number of virtual pixels common to the image and the training image; and an error function value calculation unit that, for each virtual pixel, obtains the sum and the product of the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground-truth region, takes the sum minus the product as the union value and the product as the intersection value, and obtains, as the error, the value obtained by dividing the total of the intersection values over all virtual pixels by the total of the union values over all virtual pixels. The inclusion determination value for the figure corresponding to the estimated region is 1 or approximately 1 when the virtual pixel lies inside the figure and is at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 when the virtual pixel lies outside the figure and is at least the predetermined distance away from every side of the figure; the function for obtaining this inclusion determination value is continuous and differentiable over the entire region of the image. The inclusion determination value for the figure corresponding to the ground-truth region is 1 or approximately 1 when the virtual pixel lies inside the figure and is at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 when the virtual pixel lies outside the figure and is at least the predetermined distance away from every side of the figure; the function for obtaining this inclusion determination value is continuous and differentiable over the entire region of the training image. The inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground-truth region are at least 0 and at most 1 over the entire region of the image and of the training image, respectively.

According to the present invention, a region detection device that detects a region having a specific feature using a machine learning model such as a neural network can detect regions other than rectangles (regions corresponding to convex hull polygons). In addition, because the resolution of the virtual pixels can be set arbitrarily, fine-grained processing is possible without depending on the resolution of the actual pixels.

A block diagram showing the schematic functional configuration of the region detection device according to an embodiment of the present invention.
A block diagram showing the more detailed internal functional configuration of the error calculation unit according to the embodiment.
A schematic diagram showing an example of the arrangement of actual pixels and virtual pixels within an image in the embodiment.
A schematic diagram for explaining the processing of the inclusion determination unit according to the embodiment, showing one of the figures subject to error calculation lying on the xy plane.
Another schematic diagram for explaining the processing of the inclusion determination unit according to the embodiment, showing the plane of the figure rotated so that one side is horizontal.
A schematic diagram for explaining the processing by which the error function value calculation unit according to the embodiment obtains the error between two figures.
A block diagram showing an example of an internal configuration for implementing the region detection device according to the embodiment.
A schematic diagram for explaining how the inclusion determination value is obtained in a modification of the embodiment.
A schematic diagram for explaining an application example of the region detection device of the embodiment and an example of an image used in an experiment to verify it.

Next, an embodiment of the present invention is described with reference to the drawings. In this embodiment, the region detection device 1 uses machine learning to automatically detect a region having a specific feature within an input image. To train the machine learning model, this embodiment uses its own method of calculating the error (also called the loss). This error calculation has two main features.

As the first feature, the error calculation unit 105, described later, defines virtual pixels that are used only for computational convenience and are independent of the actual pixels making up the input image, and performs its calculation on these virtual pixels. Specifically, the error calculation unit 105 counts the virtual pixels contained inside the figure that represents a region in the image. The figures considered here are limited to convex hull figures; that is, the interior angle at every vertex of the figure (polygon) is less than 180 degrees. The error calculation unit 105 may obtain an approximation of the number of virtual pixels contained inside the figure. By calculating on virtual pixels in this way, the error calculation unit 105 can obtain the error with high accuracy even when the resolution of the input image is coarse.
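The idea of virtual pixels that are decoupled from the actual pixel grid can be sketched as follows. This section does not specify how the virtual pixels are laid out, so the regular grid, the use of cell centers, and the name `virtual_pixel_centers` are all assumptions of this sketch.

```python
def virtual_pixel_centers(width, height, nx, ny):
    """Centers of an nx-by-ny grid of virtual pixels covering a width x height image.

    nx and ny are chosen freely and need not match the actual pixel counts
    (assumed layout: a regular grid sampled at cell centers).
    """
    step_x = width / nx
    step_y = height / ny
    return [
        ((i + 0.5) * step_x, (j + 0.5) * step_y)
        for j in range(ny)
        for i in range(nx)
    ]


# A 4x2-pixel image sampled with an 8x4 grid of virtual pixels,
# i.e. twice the actual resolution along each axis.
centers = virtual_pixel_centers(4.0, 2.0, 8, 4)
```

Raising `nx` and `ny` refines the sampling without changing the input image, which is the property the first feature relies on.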

As the second feature, for each side constituting the target figure, the error calculation unit 105 determines whether each virtual pixel lies inside or outside the figure. It then integrates the results for the individual sides (inside or outside) to decide whether each virtual pixel lies inside or outside the target figure. With this procedure, the error calculation unit 105 obtains the number of virtual pixels inside a target figure of arbitrary shape (provided it is a convex hull figure). The specific calculation method is described later.

Figure 1 is a block diagram showing the schematic functional configuration of the region detection device according to this embodiment. As shown, the region detection device 1 includes an image input unit 101, a region estimation unit 102, a result output unit 103, a training data supply unit 104, and an error calculation unit 105. Each of these functional units can be implemented, for example, by a computer and a program. Each functional unit has storage means as needed. The storage means is, for example, a variable in the program or memory allocated when the program runs. Non-volatile storage means such as a magnetic hard disk drive or a solid-state drive (SSD) may also be used as needed. At least part of the functions of each functional unit may be implemented as a dedicated electronic circuit instead of a program.

The region detection device 1 operates in either a learning mode or an estimation execution mode. In learning mode, the region detection device 1 optimizes the parameters of the machine learning model held by the region estimation unit 102 by processing the training data. In estimation execution mode, the region estimation unit 102 uses the trained parameters to estimate a specific region in an unknown input image. The function of each unit is described next.

The image input unit 101 acquires an image input from outside and passes it to the region estimation unit 102. The image that the image input unit 101 passes to the region estimation unit 102 is the image to be estimated in estimation execution mode.

The region estimation unit 102 internally has a machine learning model and estimates a region having a specific feature within an image by inputting the image, passed from outside, into the machine learning model. Specifically, the region estimation unit 102 has a neural network as this machine learning model. The region estimation unit 102 outputs a figure corresponding to the region that is its estimation result. In this embodiment, that figure is a convex hull polygon, that is, a polygon whose interior angles are all less than 180 degrees. More specifically, the region estimation unit 102 may output the coordinate values of each vertex of the estimated convex hull polygon, or it may output equivalent information about that polygon.
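The condition that every interior angle is less than 180 degrees can be checked from the vertex coordinates alone. The sketch below is one common way to do so, using the sign of the cross product of consecutive edges; the name `is_strictly_convex` is hypothetical, and the check assumes the vertex list describes a simple (non-self-intersecting) polygon.

```python
def is_strictly_convex(vertices):
    """True if every interior angle of the polygon is strictly below 180 degrees.

    `vertices` is a list of (x, y) tuples in boundary order. The polygon is
    assumed to be simple; self-intersecting inputs are not detected.
    """
    n = len(vertices)
    if n < 3:
        return False
    turn_sign = 0
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cx, cy = vertices[(i + 2) % n]
        # z-component of the cross product of edge a->b with edge b->c.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross == 0:
            return False  # collinear vertices: an interior angle of exactly 180 degrees
        current = 1 if cross > 0 else -1
        if turn_sign == 0:
            turn_sign = current
        elif current != turn_sign:
            return False  # the boundary turns the other way: a reflex angle
    return True
```

A square passes the check, while a pentagon with one vertex pushed inward does not.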

When operating in learning mode, the region estimation unit 102 passes the figure information of the estimated region to the error calculation unit 105. The error calculation unit 105 then calculates the error between the figure of the estimated region and the figure of the ground-truth region. When operating in estimation execution mode, the region estimation unit 102 passes the figure information of the estimated region, which is the result estimated from an unknown image, to the result output unit 103.

The result output unit 103 outputs to the outside the estimated region information (figure information) that the region estimation unit 102 outputs as its estimation result in estimation execution mode. This estimated region information can be used for various purposes as information on a region having a specific feature.

When the region detection device 1 operates in learning mode, the training data supply unit 104 supplies training data for training the neural network (machine learning model) of the region estimation unit 102. The training data is a set of pairs, each consisting of a training image and a figure (a convex hull polygon in this embodiment) representing the ground-truth region corresponding to that training image. The training data supply unit 104 passes the training image of a pair to the region estimation unit 102 and passes the figure information of the ground-truth region of that pair to the error calculation unit 105. This allows the error calculation unit 105 to calculate the error between the figure of the region (estimated region) output by the region estimation unit 102 based on the training image and the figure of the ground-truth region supplied by the training data supply unit 104. When the neural network is trained, the training data supply unit 104 supplies the pairs one after another.

When the region detection device 1 operates in learning mode, the error calculation unit 105 calculates the error between the figure of the region output by the region estimation unit 102 as its estimation result and the figure of the ground-truth region supplied by the training data supply unit 104. In this embodiment, the figure corresponding to the estimated region and the figure corresponding to the ground-truth region are both convex hull polygons. The error calculated by the error calculation unit 105 is used, through backpropagation, to adjust (optimize) the internal parameters of the neural network of the region estimation unit 102. The more detailed configuration and processing of the error calculation unit 105 are described later with reference to Figure 2 and other figures.

The error calculated by the error calculation unit 105 is as follows. The internal configuration of the error calculation unit 105, described later with reference to the block diagram of Figure 2, implements one example of a procedure that concretely realizes this error.

The error calculation unit 105 uses inclusion determination values as the basic data for calculating the error. An inclusion determination value is a number expressing whether a virtual pixel, described later, is contained in a figure (the figure of the estimated region or the figure of the ground-truth region). In this embodiment, however, the inclusion determination value is not a binary value of 0 or 1; it can take any continuous value that is at least 0 and at most 1.

The inclusion determination value for the figure corresponding to the estimated region is 1 or approximately 1 when the virtual pixel lies inside the figure and is at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 when the virtual pixel lies outside the figure and is at least the predetermined distance away from every side of the figure. The function for obtaining this inclusion determination value is continuous and differentiable over the entire region of the image. In other words, in the neighborhood of a side of the figure (as already explained, the figure is a polygon), where the neighborhood is the region within the predetermined distance of the side, the inclusion determination value changes continuously, steeply, and smoothly from 0 (or approximately 0) to 1 (or approximately 1).
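One function with the properties listed above is a logistic sigmoid of the signed distance to a side. The sketch below is only an illustration under that assumption: this section does not reproduce Equation (1), so the exact form, the sign convention (positive on the inner side), and the names `soft_step` and `k` as used here are assumptions rather than the formula of the embodiment.

```python
import math


def soft_step(signed_distance, k=10.0):
    """Smooth 0-to-1 transition across a side (assumed logistic form).

    signed_distance: distance to the side, positive on the inner side
                     (sign convention assumed for this sketch).
    k: positive steepness constant; a larger k narrows the transition band.
    """
    z = k * signed_distance
    # Evaluate the logistic function in a form that cannot overflow.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)
```

With `k = 10`, the value is within 0.001 of 1 or 0 once the point is one unit away from the side, and it is exactly 0.5 on the side itself. The function is smooth everywhere, so a gradient with respect to the side position is available at every point.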

正解領域に関しても同様である。即ち、正解領域に対応する図形についての内包判定値は、仮想ピクセルが当該図形の内側に存在して且つ当該図形のいずれの辺からも所定距離以上離れている場合には１またはほぼ１であり、仮想ピクセルが当該図形の外側に存在して且つ当該図形のいずれの辺からも所定距離以上離れている場合には０またはほぼ０であり、且つ前記正解領域に対応する図形についての内包判定値を求めるための関数は前記学習用画像内の全領域において連続且つ微分可能である。つまり、図形（多角形である）の辺の近傍（近傍とは、辺から上記所定距離以内の領域）においては、内包判定値は、０（またはほぼ０）から１（またはほぼ１）まで連続的に（且つ急激に）且つ滑らかに変化する。 The same applies to the correct answer region. Specifically, the inclusion determination value for a figure corresponding to the correct answer region is 1 or approximately 1 if the virtual pixel is inside the figure and at least a predetermined distance from any of its edges, and 0 or approximately 0 if the virtual pixel is outside the figure and at least a predetermined distance from any of its edges. Furthermore, the function for determining the inclusion determination value for the figure corresponding to the correct answer region is continuous and differentiable across the entire training image. In other words, in the neighborhood of an edge of a figure (a polygon) (a neighborhood being the area within the predetermined distance from the edge), the inclusion determination value changes continuously (and abruptly) and smoothly from 0 (or approximately 0) to 1 (or approximately 1).

なお、推定領域に対応する図形についての内包判定値および正解領域に対応する図形についての内包判定値は、それぞれ、画像および学習用画像の全領域において０以上且つ１以下である。 Note that the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the correct answer region are each at least 0 and at most 1 over the entire area of the image and of the training image, respectively.

本実施形態では、仮想ピクセルの、図形についての内包判定値を求めるために、誤差算出部１０５は、仮想ピクセルの、その図形の各辺についての内包判定値を求め、それらの内包判定値を掛け合わせることによって、その仮想ピクセルのその図形についての内包判定値を求める。 In this embodiment, to obtain the inclusion determination value of a virtual pixel with respect to a figure, the error calculation unit 105 first obtains the inclusion determination value of that virtual pixel with respect to each edge of the figure, and then multiplies those per-edge values together to obtain the inclusion determination value of the virtual pixel with respect to the figure.

図２は、本実施形態による誤差算出部１０５の内部のさらに詳細な機能構成を示すブロック図である。図示するように、誤差算出部１０５は、仮想ピクセル生成部１０５１と、内包判定部１０５２と、内包判定統合部１０５３と、誤差関数値計算部１０５４とを含んで構成される。各部の機能は、次の通りである。 Figure 2 is a block diagram showing a more detailed functional configuration of the error calculation unit 105 according to this embodiment. As shown, the error calculation unit 105 includes a virtual pixel generation unit 1051, an inclusion determination unit 1052, an inclusion determination integration unit 1053, and an error function value calculation unit 1054. The functions of each unit are as follows:

仮想ピクセル生成部１０５１は、画像内および学習用画像内のそれぞれにおいて共通する多数の仮想ピクセルを設定（生成）する。仮想ピクセルについては、後でさらに説明する。 The virtual pixel generation unit 1051 sets (generates) a large number of virtual pixels common to both the image and the training image. Virtual pixels will be explained further later.

内包判定部１０５２は、仮想ピクセルのそれぞれについて、推定領域に対応する図形が有するそれぞれの辺および正解領域に対応する図形が有するそれぞれの辺について、後で記載する式（１）（ただし、ｅはネイピア数、ｋは所定の正定数、ｄ_ｐｘは当該図形の内側に存在する所定の点を通って且つ当該辺と平行な基準線から当該仮想ピクセルまでの距離、ｄ_{ｃｅｎｔｅｒ}は当該辺に関する前記基準線から当該辺までの距離である）によって当該仮想ピクセルの当該辺についての内包判定値を求める。 For each virtual pixel, and for each edge of the figure corresponding to the estimated region and each edge of the figure corresponding to the correct answer region, the inclusion determination unit 1052 obtains the inclusion determination value of that virtual pixel with respect to that edge using equation (1), described later. In equation (1), e is Napier's number, k is a predetermined positive constant, d_px is the distance from a reference line to the virtual pixel, where the reference line passes through a predetermined point inside the figure and is parallel to the edge, and d_center is the distance from that reference line to the edge.

内包判定統合部１０５３は、１つの仮想ピクセルについて内包判定部１０５２が求めた各辺についての内包判定値に基づいて、それらを統合し、その仮想ピクセルの図形（多角形）についての内包判定値を求める。具体的には、内包判定統合部１０５３は、内包判定部１０５２が求めた当該図形が有するすべての辺についての仮想ピクセルの内包判定値をすべて掛け合わせることによって、当該仮想ピクセルの当該図形についての内包判定値を求める。 The inclusion determination integration unit 1053 integrates the per-edge inclusion determination values that the inclusion determination unit 1052 obtained for a single virtual pixel, and thereby obtains the inclusion determination value of that virtual pixel with respect to the figure (polygon). Specifically, the inclusion determination integration unit 1053 multiplies together the virtual pixel's inclusion determination values for all edges of the figure, as obtained by the inclusion determination unit 1052, to obtain the inclusion determination value of that virtual pixel with respect to the figure.

誤差関数値計算部１０５４は、内包判定値に基づいて、推定領域の図形と正解領域の図形との間の誤差を計算する。具体的には、誤差関数値計算部１０５４は、仮想ピクセルが各図形に内包されるか否かを表す値である内包判定値に基づいて誤差を算出する。さらに具体的には、誤差関数値計算部１０５４は、仮想ピクセルの各々について、推定領域に対応する図形についての内包判定値と正解領域に対応する図形についての内包判定値との和および積をそれぞれ求め、和から積を減じた値をユニオン値（union）として、積の値をインターセクション値（intersection）として、すべての仮想ピクセルについてのインターセクション値の総和をすべての仮想ピクセルについての前記ユニオン値の総和で除して得られる値、を誤差として求める。 The error function value calculation unit 1054 calculates the error between the figure of the estimated region and the figure of the correct answer region based on the inclusion determination values, that is, the values indicating whether a virtual pixel is contained in each figure. More specifically, for each virtual pixel, the error function value calculation unit 1054 obtains the sum and the product of the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the correct answer region. The sum minus the product is taken as the union value, and the product is taken as the intersection value. The error is then obtained by dividing the total of the intersection values over all virtual pixels by the total of the union values over all virtual pixels.

次に、誤差算出部１０５による処理の流れについて、実例とともに説明する。 Next, the processing flow by the error calculation unit 105 will be explained with an example.

仮想ピクセル生成部１０５１は、誤差算出の対象の画像について仮想ピクセルを生成する。仮想ピクセルは、画像の面上に所定の間隔で仮想的に配置されたピクセルである。ｘｙ直交座標で表わす場合、例としてｙ座標が０．０から１．５まで、ｘ座標が０．０から１．０まで、仮想ピクセルのｘ座標およびｙ座標のそれぞれの間隔を０．５とするとき、仮想ピクセルの座標値は、下記の１２個である。
（ｘ，ｙ）＝
（０．０，０．０），（０．０，０．５），（０．０，１．０），（０．０，１．５），
（０．５，０．０），（０．５，０．５），（０．５，１．０），（０．５，１．５），
（１．０，０．０），（１．０，０．５），（１．０，１．０），（１．０，１．５）， The virtual pixel generation unit 1051 generates virtual pixels for the image subject to error calculation. Virtual pixels are pixels virtually arranged at predetermined intervals on the surface of the image. When expressed in xy Cartesian coordinates, for example, if the y coordinate ranges from 0.0 to 1.5, the x coordinate ranges from 0.0 to 1.0, and the interval between the x and y coordinates of the virtual pixels is 0.5, then the coordinate values of the virtual pixels are the following 12.
(x, y) =
(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.0, 1.5),
(0.5, 0.0), (0.5, 0.5), (0.5, 1.0), (0.5, 1.5),
(1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 1.5),
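
The grid in this example can be reproduced with a few lines of code. The following is a minimal sketch in Python, assuming a square arrangement that starts at the origin; the function name `virtual_pixels` and its parameters are hypothetical and do not appear in the embodiment.

```python
def virtual_pixels(x_max, y_max, step):
    """Return (x, y) virtual-pixel coordinates on a square grid.

    The grid runs from 0.0 up to and including x_max and y_max,
    with neighbouring virtual pixels `step` apart in both directions.
    """
    nx = int(round(x_max / step)) + 1  # number of grid columns
    ny = int(round(y_max / step)) + 1  # number of grid rows
    # x is the outer loop so the order matches the listing in the text.
    return [(i * step, j * step) for i in range(nx) for j in range(ny)]


pixels = virtual_pixels(x_max=1.0, y_max=1.5, step=0.5)
print(len(pixels))  # 12
print(pixels[:4])   # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.0, 1.5)]
```

Because the coordinates are ordinary real numbers, a smaller `step` gives a denser grid without any change to the rest of the calculation.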

仮想ピクセルの座標値は、画像が実際に持つピクセルの座標値とは異なっていてもよい。また、仮想ピクセルの配置間隔（上の例では、０．５）も実際のピクセル間の間隔とは無関係に適宜定めればよい。仮想ピクセルのｘおよびｙそれぞれの座標値は実数で表わされるものであり、整数である必要はない。このように、仮想ピクセルの座標値に関する自由度は高いため、例えばｘ軸およびｙ軸それぞれの方向のサイズが１ピクセル以下であるような小さい図形を扱うのにも適している。なお、仮想ピクセルの配置間隔を狭くするほど（つまい、所定の長さ当たりの仮想ピクセルの数を多くするほど）、誤差の計算のために必要な計算量は増えるが、計算精度は向上する。 The coordinate values of virtual pixels may differ from the coordinate values of the actual pixels in the image. Furthermore, the spacing between virtual pixels (0.5 in the example above) can be determined appropriately, independently of the actual spacing between pixels. The x and y coordinate values of virtual pixels are represented as real numbers and do not need to be integers. Because of this high degree of freedom regarding the coordinate values of virtual pixels, it is suitable for handling small shapes, such as those where the size in the x and y directions is 1 pixel or less. Note that narrowing the spacing between virtual pixels (i.e., increasing the number of virtual pixels per given length) increases the computational load required for error calculation, but improves calculation accuracy.

図３は、画像内における実ピクセルおよび仮想ピクセルの配置例を示す概略図である。同図は、画像内の一部の領域のみを示している。同図において、実ピクセル１９０１は、四角形で示されている。また、仮想ピクセル１９０２は、黒丸で示されている。本例では、実ピクセル１９０１も仮想ピクセル１９０２も、それぞれ正方配列としている。本例では、１個の実ピクセル１９０１に対応して、１６個の仮想ピクセル１９０２を配置している。つまり、仮想ピクセル１９０２の配置の密度は、縦および横のそれぞれの方向に、実ピクセル１９０１の配置の密度の４倍である。図３に示す配置のパターンは、一例に過ぎない。仮想ピクセルは、正方配列ではなく、例えばデルタ配列となっていてもよい。また、仮想ピクセルは、規則性のない配列となっていてもよい。実ピクセルの配置密度と仮想ピクセルの配置密度との関係も、任意である。いずれの場合も、仮想ピクセルの配置密度は、画面内の位置によって大きく異なることがなくできるだけ一様であることが望ましい。 Figure 3 is a schematic diagram showing an example of the arrangement of real and virtual pixels in an image. The figure shows only a portion of the image. In the figure, real pixels 1901 are represented by squares, and virtual pixels 1902 are represented by black circles. In this example, both real pixels 1901 and virtual pixels 1902 are arranged in a square grid. In this example, 16 virtual pixels 1902 are arranged corresponding to one real pixel 1901. That is, the density of virtual pixels 1902 is four times the density of real pixels 1901 in both the vertical and horizontal directions. The arrangement pattern shown in Figure 3 is merely an example. Virtual pixels may not be arranged in a square grid; for example, they may be arranged in a delta grid. Furthermore, virtual pixels may be arranged in an irregular array. The relationship between the density of real pixels and the density of virtual pixels is also arbitrary. In any case, it is desirable that the density of virtual pixels be as uniform as possible, without significant variations depending on their position within the screen.

内包判定部１０５２は、上で設定された仮想ピクセルのそれぞれについて、辺ごとの内包判定を行う。つまり、内包判定部１０５２は、各々の仮想ピクセルが各々の辺のどちら側に存在する。処理の手順として、内包判定部１０５２は、着目する辺が例えば水平方向となるように画像を回転させ、その状態においてそれぞれの仮想ピクセルがその辺よりも上に存在するか下に存在するかを判定するようにしてよい。 The inclusion determination unit 1052 performs an inclusion determination for each of the virtual pixels set above, edge by edge. In other words, the inclusion determination unit 1052 determines which side of each edge each virtual pixel is located on. As a processing procedure, the inclusion determination unit 1052 may rotate the image so that the edge of interest is, for example, horizontal, and then determine whether each virtual pixel is above or below that edge in that state.

図４は、内包判定部１０５２による処理を説明するための概略図の１つである。同図は、誤差を計算する対象となる図形の１つがｘｙ平面上に存在する状態を示している。同図において、２００１は、誤差算出の対象となる図形の一つである。図形２００１は、領域推定部１０２によって推定された結果の領域を表す図形、または学習用データ供給部１０４によって供給された正解領域を表す図形のいずれかである。本例では、図形２００１は、辺２０１１、２０１２、２０１３、および２０１４を有する四角形である。誤差算出の対象となる図形は、四角形に限らず、任意の多角形であってよい。ただし、対象となる図形は凸包多角形である。 Figure 4 is one of the schematic diagrams illustrating the processing performed by the inclusion determination unit 1052. It shows one of the figures subject to error calculation lying on the xy-plane. In the drawing, reference numeral 2001 denotes one such figure. The figure 2001 is either a figure representing the region estimated by the region estimation unit 102 or a figure representing the correct answer region supplied by the learning data supply unit 104. In this example, the figure 2001 is a quadrilateral with edges 2011, 2012, 2013, and 2014. The figure subject to error calculation is not limited to a quadrilateral and may be any polygon, provided that it is a convex hull polygon.

図５は、内包判定部１０５２による処理を説明するための別の概略図である。図５に示す状態は、図４において示した図形２００１を含む平面（ｘｙ平面）を回転させた結果の状態である。内包判定部１０５２は、回転により、図形２００１が持つ辺のうちの現在着目している辺（この例では、辺２０１１）が水平になるようにしている。図示する破線２０２１は、基準となる水平な線である。即ち、破線２０２１は、辺２０１１と平行な線である。破線２０２１は、図形２００１の中心点を通る線としている。なお、中心点は、図形２００１を構成する頂点のｘ座標およびｙ座標のそれぞれの平均値によって定めてよい。つまり、この図の垂直方向における位置関係を見たときに、破線２０２１は、辺２０１１よりも図形２００１の内側に位置している。逆に、垂直方向の位置において、辺２０１１の、破線２０２１と反対の側が、図形２００１の外側である。ここで、破線２０２１から辺２０１１までの距離を１．５とする。また、破線２０２１から仮想ピクセル２０２２までの距離を２．２とし、破線２０２１から仮想ピクセル２０２３までの距離を０．６とする。このとき、内包判定部１０５２は、下の式（１）により、内包判定値を算出する。 Figure 5 is another schematic diagram illustrating the processing performed by the inclusion determination unit 1052. The state shown in Figure 5 is the result of rotating the plane (xy-plane) containing the figure 2001 shown in Figure 4. The inclusion determination unit 1052 rotates the plane so that the edge of the figure 2001 currently of interest (edge 2011 in this example) becomes horizontal. The dashed line 2021 in the drawing is a horizontal reference line; that is, it is parallel to edge 2011. The dashed line 2021 passes through the center point of the figure 2001, where the center point may be defined by the average of the x coordinates and the average of the y coordinates of the vertices of the figure 2001. Consequently, in terms of vertical position in this drawing, the dashed line 2021 lies on the inner side of the figure 2001 relative to edge 2011. Conversely, the side of edge 2011 opposite to the dashed line 2021 is the outside of the figure 2001. Here, let the distance from the dashed line 2021 to edge 2011 be 1.5, the distance from the dashed line 2021 to virtual pixel 2022 be 2.2, and the distance from the dashed line 2021 to virtual pixel 2023 be 0.6. The inclusion determination unit 1052 then calculates the inclusion determination value using equation (1) below.
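
Equation (1) itself is not reproduced in this text. The following is a reconstruction, not the published equation: it is a form consistent with the symbol definitions given for equation (1) (e, k, d_px, d_center) and with the values stated for virtual pixels 2022 and 2023.

```latex
\text{inclusion}(d_{px}) \;=\; \frac{1}{1 + e^{\,k\,\left(d_{px} - d_{center}\right)}} \qquad (1)
```

With k = 10 and d_center = 1.5, this form gives about 0.0009 at d_px = 2.2, about 0.9999 at d_px = 0.6, and exactly 0.5 at d_px = d_center.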

式（１）において、ｄ_ｐｘは、基準線（例における破線２０２１）から判定対象の仮想ピクセルまでの距離である。また、ｄ_{ｃｅｎｔｅｒ}は、基準線から着目している辺（この例では、辺２０１１）までの距離である。また、ｋは所定の正の値をとるパラメーターであり、例えばｋ＝１０とする。なお、ｅはネイピア数である。ｄ_ｐｘがｄ_{ｃｅｎｔｅｒ}よりも小さい場合には、式（１）によって求められる内包判定値は１に近い値となる。ｄ_ｐｘがｄ_{ｃｅｎｔｅｒ}よりも大きい場合には、式（１）によって求められる内包判定値は０に近い値となる。ｄ_ｐｘがｄ_{ｃｅｎｔｅｒ}にちょうど等しい場合には、式（１）によって求められる内包判定値は０．５となる。つまり、図５の例における垂直方向の位置で見たときに、仮想ピクセルが辺２０１１よりも上に位置していれば内包判定値は０に近づき、仮想ピクセルが辺２０１１よりも下に位置していれば内包判定値は１に近づく。なお、辺２０１１の近傍において内包判定値は滑らか且つ急な傾きで変化する。上記のパラメーターｋの値は、その辺２０１１の近傍における内包判定値の変化の急激さの度合いをコントロールするための値である。ただし、式（１）で表わされる内包判定値を求めるための関数は、ｄ_ｐｘの全領域においてｄ_ｐｘに関して微分可能である。 In equation (1), d_px is the distance from the reference line (dashed line 2021 in the example) to the virtual pixel being evaluated, and d_center is the distance from the reference line to the edge of interest (edge 2011 in this example). k is a parameter that takes a predetermined positive value, for example k = 10, and e is Napier's number. When d_px is smaller than d_center, the inclusion determination value given by equation (1) is close to 1. When d_px is larger than d_center, it is close to 0. When d_px is exactly equal to d_center, it is 0.5. In other words, in terms of vertical position in the example of Figure 5, the inclusion determination value approaches 0 if the virtual pixel lies above edge 2011 and approaches 1 if it lies below edge 2011. Near edge 2011 the inclusion determination value changes smoothly but steeply, and the parameter k controls how steep that change is. Note that the function of equation (1) is differentiable with respect to d_px over the whole domain of d_px.

図５に示す例において、仮想ピクセル２０２２の、辺２０１１に関する内包判定値は、式（１）によって求められ、約０．０００９である（０に近い）。これは、仮想ピクセル２０２２が辺２０１１よりも上側に位置すること表す。また、仮想ピクセル２０２３の、辺２０１１に関する内包判定値は、式（１）によって求められ、約０．９９９９である（１に近い）。これは、仮想ピクセル２０２３が辺２０１１よりも下側に位置すること表す。 In the example shown in Figure 5, the inclusion determination value for virtual pixel 2022 with respect to edge 2011 is calculated by equation (1) and is approximately 0.0009 (close to 0). This indicates that virtual pixel 2022 is located above edge 2011. Similarly, the inclusion determination value for virtual pixel 2023 with respect to edge 2011 is calculated by equation (1) and is approximately 0.9999 (close to 1). This indicates that virtual pixel 2023 is located below edge 2011.
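
The two values above can be checked numerically. The sketch below assumes that equation (1) has the logistic form 1 / (1 + e^(k (d_px - d_center))); that form is inferred from the symbol definitions and the stated results, and the function name `edge_inclusion` is hypothetical.

```python
import math


def edge_inclusion(d_px, d_center, k=10.0):
    """Per-edge inclusion determination value (assumed form of equation (1)).

    d_px     : distance from the reference line to the virtual pixel
    d_center : distance from the reference line to the edge
    k        : positive constant controlling how steep the transition is
    """
    return 1.0 / (1.0 + math.exp(k * (d_px - d_center)))


# Distances taken from the Figure 5 example, with k = 10.
print(round(edge_inclusion(2.2, 1.5), 4))  # 0.0009 (virtual pixel 2022)
print(round(edge_inclusion(0.6, 1.5), 4))  # 0.9999 (virtual pixel 2023)
print(edge_inclusion(1.5, 1.5))            # 0.5 (exactly on the edge)
```

A larger `k` narrows the band around the edge in which the value differs noticeably from 0 or 1.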

内包判定統合部１０５３は、内包判定部１０５２による辺ごとの内包判定を統合し、図形全体に関する内包判定値を求める。具体的には、内包判定統合部１０５３は、各仮想ピクセルについて、それぞれの辺に関して算出された内包判定値をすべて掛け合わせることによって、その仮想ピクセルの統合された内包判定値を算出する。既に説明したように、図５のように図形を回転させた状態において、即ち、辺が水平になるようにして且つ図形の中心点が辺の下側に位置するようにした状態において、辺の下側の仮想ピクセルの内包判定値は１に近く、辺の上側の仮想ピクセルの内包判定値は０に近い。また辺の近傍において内包判定値は急激に０と１との間で変化する。つまり、ある仮想ピクセルについて、それぞれの辺に関する内包判定値をすべて掛け合わせると、図形（例えば、四角形）の内部に存在する仮想ピクセルの統合された内包判定値は１に近い値となり、図形の外部に存在する仮想ピクセルの統合された内包判定値は０に近い値となる。 The inclusion determination integration unit 1053 integrates the per-edge inclusion determinations made by the inclusion determination unit 1052 to obtain an inclusion determination value for the figure as a whole. Specifically, for each virtual pixel, the inclusion determination integration unit 1053 multiplies together all the inclusion determination values calculated for the individual edges, giving the integrated inclusion determination value of that virtual pixel. As already explained, when the figure is rotated as in Figure 5, that is, when the edge is horizontal and the center point of the figure lies below the edge, the inclusion determination value of a virtual pixel below the edge is close to 1 and that of a virtual pixel above the edge is close to 0. Near the edge, the inclusion determination value changes steeply between 0 and 1. Consequently, when the per-edge inclusion determination values of a virtual pixel are all multiplied together, the integrated value is close to 1 for a virtual pixel inside the figure (for example, a quadrilateral) and close to 0 for a virtual pixel outside the figure.

内包判定統合部１０５３は、すべての仮想ピクセルについて、統合された内包判定値を求める。１枚の画像内のすべての仮想ピクセルについての統合された内包判定値の総和は、その図形が内包する仮想ピクセルの数の近似値であり、その値は図形の面積の近似値に比例するものである。つまり、内包判定統合部１０５３が求めた内包判定値の総和は、図形の面積を表す値であって、しかも式（１）の計算およびそれらの値の加算のみで求めることができる、微分可能な値である。 The inclusion determination integration unit 1053 obtains the integrated inclusion determination value for every virtual pixel. The sum of the integrated inclusion determination values over all virtual pixels in one image approximates the number of virtual pixels contained in the figure, and is therefore proportional to an approximation of the figure's area. In other words, the sum of the inclusion determination values obtained by the inclusion determination integration unit 1053 represents the area of the figure, and it is a differentiable value that can be obtained solely from the calculation of equation (1) and the addition of the resulting values.
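
The per-edge product and the area interpretation can be illustrated together. The sketch below is written under several assumptions that are not stated in the embodiment: equation (1) is taken to be 1 / (1 + e^(k (d_px - d_center))); instead of rotating the plane, the virtual pixel is projected onto each edge's normal, which yields the same vertical offsets as the rotated view; d_px is treated as a signed offset measured from the reference line toward the edge; and the names `polygon_inclusion`, `square`, and `step` are hypothetical.

```python
import math


def polygon_inclusion(px, py, vertices, k=10.0):
    """Integrated inclusion value of point (px, py) for a convex polygon."""
    n = len(vertices)
    # Center point: mean of the vertex x coordinates and y coordinates.
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    value = 1.0
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        length = math.hypot(ex, ey)
        nx, ny = ey / length, -ex / length  # unit normal of this edge
        # Distance from the reference line (through the center) to the edge.
        d_center = (x1 - cx) * nx + (y1 - cy) * ny
        if d_center < 0:  # flip so the normal points from center to edge
            nx, ny, d_center = -nx, -ny, -d_center
        # Signed offset of the virtual pixel from the reference line.
        d_px = (px - cx) * nx + (py - cy) * ny
        # Assumed form of equation (1), multiplied over all edges.
        value *= 1.0 / (1.0 + math.exp(k * (d_px - d_center)))
    return value


square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]  # true area 4.0
step = 0.1
pixels = [(i * step, j * step) for i in range(41) for j in range(41)]

inside = polygon_inclusion(2.0, 2.0, square)   # close to 1
outside = polygon_inclusion(5.0, 2.0, square)  # close to 0
total = sum(polygon_inclusion(x, y, square) for x, y in pixels)
approx_area = total * step * step              # close to the true area
print(inside, outside, approx_area)
```

Multiplying the sum by the area each virtual pixel stands for (`step * step`) turns the count-like sum into an area estimate; the proportionality constant cancels when two such sums are divided, as in equation (2).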

誤差関数値計算部１０５４は、内包判定統合部１０５３の計算結果を用いて、誤差算出部１０５に入力される２つの図形の間の誤差を計算する。具体的には、誤差関数値計算部１０５４は、次に図６を参照しながら説明する計算を行う。 The error function value calculation unit 1054 uses the results calculated by the inclusion determination integration unit 1053 to calculate the error between the two figures input to the error calculation unit 105. Specifically, the error function value calculation unit 1054 performs the calculation described next with reference to Figure 6.

図６は、誤差関数値計算部１０５４が２つの図形の間の誤差の求める際の処理を説明するための概略図である。図６（Ａ）は、１つ目の図形である図形２００２の領域を示す。図６（Ｂ）は、２つ目の図形である図形２００３の領域を示す。これら２つの図形が一致しているほど両者間の誤差は小さく、異なっているほど両者間の誤差は大きい。そのような誤差を求めるために、誤差関数値計算部１０５４は、下で説明するように、ＩｏＵ（Intersection over Union）の代替となり得る計算を行う。図６（Ｃ）は、図形２００２の領域と図形２００３の領域との間で、一部においてのみ重複が存在している状況を示す。誤差関数値計算部１０５４が行う計算は、両者の和集合の部分の領域の広さと、両者の積集合の部分の領域の広さとに基づく計算である。 Figure 6 is a schematic diagram illustrating the process by which the error function value calculation unit 1054 calculates the error between two figures. Figure 6(A) shows the region of the first figure, figure 2002. Figure 6(B) shows the region of the second figure, figure 2003. The more these two figures coincide, the smaller the error between them; the more they differ, the larger the error between them. To calculate such an error, the error function value calculation unit 1054 performs a calculation that can serve as an alternative to IoU (Intersection over Union), as explained below. Figure 6(C) shows a situation where there is only a partial overlap between the regions of figure 2002 and figure 2003. The calculation performed by the error function value calculation unit 1054 is based on the size of the region of the union of the two figures and the size of the region of the intersection of the two figures.

ＩｏＵにおけるインターセクション（intersection）は、２つの図形の共通部分（積集合の部分）である。本実施形態では、ＩｏＵにおけるインターセクションに相当する値を得るための計算として、誤差関数値計算部１０５４は、各仮想ピクセルについて、２つの図形のそれぞれの統合された内包判定値の積を求める。１つの仮想ピクセルについて、２つの図形の少なくともいずれか一方の内包判定値が０に近い値の場合には、それら２つの内包判定値の積は０に近い値となる。２つの図形の両方の内包判定値が１に近い値の場合には、それら２つの内包判定値の積は１に近い値となる。しかも、２つの内包判定値の積は、微分可能である。すべての仮想ピクセルについての上記の積の総和が、本実施形態においてインターセクションに相当する値である。ＩｏＵにおけるユニオン（union）は、２つの図形の和集合の部分である。本実施形態では、ＩｏＵにおけるユニオンに相当する値を得るための計算として、誤差関数値計算部１０５４は、各仮想ピクセルについて、２つの図形のそれぞれの統合された内包判定値の和を求め、その和から上記の積の値を減じた値を求める。２つの図形の共通部分については、それら内包判定値の和は２に近い値となる。したがって、その和から、２つの内包判定値の積（１に近い値）を減じることによって、求められる値は１に近い値となる。２つの図形の排他的和の部分については、それらの内包判定値の和は１に近い値となる。排他的和の部分についての２つの内包判定値の積は０に近い値であるので、結果として求められる値は１に近い値となる。その他の部分（２つの図形のどちらにも属さない部分）については、それらの内包判定値の和も積もともに０に近い値となる。したがって、その和からその積を減じた結果は、０に近い値となる。すべての仮想ピクセルについての上記の和から積を減じた結果の値の総和が、本実施形態においてユニオンに相当する値である。この値もまた微分可能である。 In IoU, the intersection is the part common to the two figures (their set intersection). In this embodiment, to obtain a value corresponding to the intersection in IoU, the error function value calculation unit 1054 calculates, for each virtual pixel, the product of the integrated inclusion determination values of the two figures. For a given virtual pixel, if the inclusion determination value of at least one of the two figures is close to 0, the product of the two values is close to 0. If the inclusion determination values of both figures are close to 1, the product is close to 1. Moreover, the product of the two inclusion determination values is differentiable. The sum of these products over all virtual pixels is the value corresponding to the intersection in this embodiment. In IoU, the union is the set union of the two figures. In this embodiment, to obtain a value corresponding to the union in IoU, the error function value calculation unit 1054 calculates, for each virtual pixel, the sum of the integrated inclusion determination values of the two figures and then subtracts the above product from that sum. In the part common to the two figures, the sum of the inclusion determination values is close to 2, so subtracting the product (close to 1) gives a value close to 1. In the part belonging to exactly one of the two figures (their exclusive-or, or symmetric difference), the sum of the inclusion determination values is close to 1 and the product is close to 0, so the result is again close to 1. In the remaining part (belonging to neither figure), both the sum and the product are close to 0, so the sum minus the product is close to 0. The total, over all virtual pixels, of the sum minus the product is the value corresponding to the union in this embodiment. This value is also differentiable.
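
The per-pixel arithmetic described here can be traced with idealized inclusion determination values of exactly 0 and 1. This is a minimal sketch; the function name `soft_intersection_union` and the four sample pixels are hypothetical, and in practice the values are only close to 0 or 1.

```python
def soft_intersection_union(a, b):
    """Per-pixel intersection and union values.

    a, b : integrated inclusion determination values (each in [0, 1])
           of one virtual pixel for the first and second figure.
    """
    inter = a * b          # product of the two inclusion values
    union = a + b - inter  # sum minus product
    return inter, union


# (value for figure 1, value for figure 2) at four illustrative virtual pixels
samples = {
    "in both figures": (1.0, 1.0),
    "only in figure 1": (1.0, 0.0),
    "only in figure 2": (0.0, 1.0),
    "in neither figure": (0.0, 0.0),
}

inter_total = 0.0
union_total = 0.0
for label, (a, b) in samples.items():
    inter, union = soft_intersection_union(a, b)
    inter_total += inter
    union_total += union
    print(label, inter, union)

# Ratio of the totals: one pixel in the intersection, three in the union.
ratio = inter_total / union_total
print(ratio)
```

For these four pixels the intersection total is 1 and the union total is 3, which matches counting pixels in the hard (0 or 1) case.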

つまり、誤差関数値計算部１０５４は、下の式（２）によって、２つの図形の間の誤差を算出する。 In other words, the error function value calculation unit 1054 calculates the error between the two figures using the following equation (2).

sum(intersection) / sum(union) ・・・（２） sum(intersection) / sum(union)...(2)

この式（２）において、unionは、上記方法で誤差関数値計算部１０５４が求めた各仮想ピクセルのユニオンに相当する値である。また、式（２）におけるintersectionは、上記方法で誤差関数値計算部１０５４が求めた各仮想ピクセルのインターセクションに相当する値である。また、sum(・)は、画像内のすべての仮想ピクセルについての値の総和を取る操作（演算）を表す。 In equation (2), `union` is the value corresponding to the union of each virtual pixel, as determined by the error function value calculation unit 1054 using the method described above. Similarly, `intersection` in equation (2) is the value corresponding to the intersection of each virtual pixel, as determined by the error function value calculation unit 1054 using the method described above. Furthermore, `sum(•)` represents the operation (calculation) of summing the values of all virtual pixels in the image.

これまでに説明した通り、式（２）におけるunionもintersectionも、微分可能である。また、それらそれぞれのすべての仮想ピクセルについての総和の値も微分可能である。さらに、式（２）によって表される値（誤差関数値計算部１０５４が算出する誤差）もまた、微分可能である。 As explained above, both union and intersection in equation (2) are differentiable. Their respective totals over all virtual pixels are also differentiable. Furthermore, the value expressed by equation (2) (the error calculated by the error function value calculation unit 1054) is differentiable as well.
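
One way to see what this differentiability provides is to move one figure slightly and watch the value of equation (2) respond smoothly. The sketch below is an illustration under assumptions, not the embodiment's implementation: it uses axis-aligned squares so that each edge's reference line passes through the square's center, it assumes equation (1) has the form 1 / (1 + e^(k (d_px - d_center))), and it uses a central finite difference in place of the automatic differentiation a neural network framework would supply. All names are hypothetical.

```python
import math


def edge_value(d_px, d_center, k=10.0):
    # Assumed form of equation (1).
    return 1.0 / (1.0 + math.exp(k * (d_px - d_center)))


def rect_inclusion(px, py, cx, cy, half_w, half_h, k=10.0):
    # Product of the four per-edge values of an axis-aligned rectangle.
    return (edge_value(px - cx, half_w, k) * edge_value(cx - px, half_w, k)
            * edge_value(py - cy, half_h, k) * edge_value(cy - py, half_h, k))


def equation_2(est, truth, pixels):
    # sum(intersection) / sum(union) over all virtual pixels.
    inter_total = union_total = 0.0
    for px, py in pixels:
        a = rect_inclusion(px, py, *est)
        b = rect_inclusion(px, py, *truth)
        inter_total += a * b
        union_total += a + b - a * b
    return inter_total / union_total


# Virtual pixels at spacing 0.1 covering x in [0, 6] and y in [0, 4].
pixels = [(i * 0.1, j * 0.1) for i in range(61) for j in range(41)]
truth = (2.0, 2.0, 1.0, 1.0)  # center x, center y, half width, half height


def score(offset):
    """Equation (2) when the estimated square is shifted right by `offset`."""
    return equation_2((2.0 + offset, 2.0, 1.0, 1.0), truth, pixels)


h = 1e-4
slope = (score(0.5 + h) - score(0.5 - h)) / (2 * h)  # finite-difference slope
print(score(0.0), score(0.5), score(1.0))
print(slope)  # negative: shifting further from the correct figure lowers the value
```

The value changes continuously with the offset and its slope has a definite sign, which is the property backpropagation relies on when adjusting the parameters that produce the estimated figure.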

以上説明したように、誤差算出部１０５は、２つの図形（いずれも、凸包図形（凸包多角形））に関して式（１）の関数を用いることによって適切な誤差を求めることができる。また、その誤差は、微分可能であるため、領域推定部１０２が内部に持つニューラルネットワークについて、誤差逆伝播によるパラメーターの更新を行うことができる。つまり、十分な量の学習用データを用いて、十分な回数の誤差逆伝播を行うことによって、領域推定部１０２が持つニューラルネットワーク（機械学習モデル）は最適化される。 As explained above, the error calculation unit 105 can obtain an appropriate error for two figures (both convex hull figures, that is, convex hull polygons) by using the function of equation (1). Because that error is differentiable, the parameters of the neural network held inside the region estimation unit 102 can be updated by error backpropagation. In other words, the neural network (machine learning model) of the region estimation unit 102 is optimized by performing error backpropagation a sufficient number of times using a sufficient amount of training data.

以上説明したように、本実施形態の領域検出装置１では検出すべき領域を表す図形は、任意の向きの任意の凸包多角形であってよい（長方形等に限定されない）。また、上で説明した誤差算出の処理過程からも明らかなように、領域推定部１０２が推定結果として出力する領域の図形（凸包多角形）の頂点の数と、学習用データ供給部１０４が供給する正解領域の図形（凸包多角形）の頂点の数とは、異なっていてもよい。
即ち、本実施形態により、前述した課題は解決され、任意の凸包多角形同士の誤差を、微分可能な関数の値として求めることができる。これにより、機械学習モデルを用いて、長方形以外の図形に関する領域の検出を行うことが可能となる。これにより、画像内の特定の特徴を有する領域を検出するという問題において制約が取り払われ、より一般的な問題を解決するために領域検出装置１を利用することができる。 As explained above, in the region detection device 1 of this embodiment, the figure representing the region to be detected may be any convex hull polygon with any orientation (not limited to rectangles, etc.). Furthermore, as is clear from the error calculation process described above, the number of vertices of the region figure (convex hull polygon) output by the region estimation unit 102 as an estimation result may be different from the number of vertices of the correct region figure (convex hull polygon) supplied by the learning data supply unit 104.
In other words, this embodiment solves the aforementioned problems, and the error between any two convex hull polygons can be obtained as the value of a differentiable function. This makes it possible to detect regions of shapes other than rectangles using a machine learning model. This removes constraints in the problem of detecting regions with specific features in an image, and the region detection device 1 can be used to solve more general problems.

図７は、実施形態として説明した領域検出装置１の内部構成の例を示すブロック図である。領域検出装置１は、コンピューターを用いて実現され得る。図示するように、そのコンピューターは、中央処理装置９０１と、ＲＡＭ９０２と、入出力ポート９０３と、入出力デバイス９０４や９０５等と、バス９０６と、を含んで構成される。コンピューター自体は、既存技術を用いて実現可能である。中央処理装置９０１は、ＲＡＭ９０２等から読み込んだプログラムに含まれる命令を実行する。中央処理装置９０１は、各命令にしたがって、ＲＡＭ９０２にデータを書き込んだり、ＲＡＭ９０２からデータを読み出したり、算術演算や論理演算を行ったりする。ＲＡＭ９０２は、データやプログラムを記憶する。ＲＡＭ９０２に含まれる各要素は、アドレスを持ち、アドレスを用いてアクセスされ得るものである。なお、ＲＡＭは、「ランダムアクセスメモリー」の略である。入出力ポート９０３は、中央処理装置９０１が外部の入出力デバイス等とデータのやり取りを行うためのポートである。入出力デバイス９０４や９０５は、入出力デバイスである。入出力デバイス９０４や９０５は、入出力ポート９０３を介して中央処理装置９０１との間でデータをやりとりする。バス９０６は、コンピューター内部で使用される共通の通信路である。例えば、中央処理装置９０１は、バス９０６を介してＲＡＭ９０２のデータを読んだり書いたりする。また、例えば、中央処理装置９０１は、バス９０６を介して入出力ポートにアクセスする。 Figure 7 is a block diagram showing an example of the internal configuration of the region detection device 1 described as an embodiment. The region detection device 1 can be implemented using a computer. As shown in the figure, the computer is composed of a central processing unit 901, RAM 902, input/output ports 903, input/output devices 904 and 905, etc., and a bus 906. The computer itself can be implemented using existing technology. The central processing unit 901 executes instructions contained in programs read from RAM 902, etc. The central processing unit 901 writes data to RAM 902, reads data from RAM 902, and performs arithmetic and logical operations according to each instruction. RAM 902 stores data and programs. Each element contained in RAM 902 has an address and can be accessed using that address. RAM stands for "Random Access Memory". Input/output ports 903 are ports for the central processing unit 901 to exchange data with external input/output devices, etc. Input/output devices 904 and 905 are input/output devices. Input/output devices 904 and 905 exchange data with the central processing unit 901 via input/output port 903. Bus 906 is a common communication channel used within the computer. For example, the central processing unit 901 reads and writes data to RAM 902 via bus 906. 
Also, for example, the central processing unit 901 accesses input/output ports via bus 906.

上述した実施形態における領域検出装置１の少なくとも一部の機能をコンピューターおよびプログラムで実現することができる。その場合、この機能を実現するためのプログラムをコンピューター読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピューターシステムに読み込ませ、実行することによって実現しても良い。なお、ここでいう「コンピューターシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピューター読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、ＵＳＢメモリー等の可搬媒体、コンピューターシステムに内蔵されるハードディスク等の記憶装置のことをいう。つまり、「コンピューター読み取り可能な記録媒体」とは、非一過性の（non-transitory）コンピューター読み取り可能な記録媒体であってよい。さらに「コンピューター読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、一時的に、動的にプログラムを保持するもの、その場合のサーバーやクライアントとなるコンピューターシステム内部の揮発性メモリーのように、一定時間プログラムを保持しているものも含んでも良い。また上記プログラムは、前述した機能の一部を実現するためのものであっても良く、さらに前述した機能をコンピューターシステムにすでに記録されているプログラムとの組み合わせで実現できるものであっても良い。 At least some of the functions of the area detection device 1 in the above-described embodiment can be realized by a computer and a program. In that case, the program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on this recording medium may be loaded into a computer system and executed. Here, "computer system" includes hardware such as an OS and peripheral devices. Furthermore, "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, DVD-ROMs, USB memory, and storage devices such as hard disks built into a computer system. In other words, "computer-readable recording medium" may be a non-transitory computer-readable recording medium. Moreover, "computer-readable recording medium" may also include those that temporarily and dynamically hold programs, such as communication lines when transmitting programs via networks such as the Internet or communication lines such as telephone lines, and those that hold programs for a certain period of time, such as volatile memory inside a computer system that acts as a server or client in that case. 
Furthermore, the above-mentioned program may be for realizing some of the functions described above, and may also be able to realize the above-mentioned functions in combination with a program already recorded in the computer system.

以上、実施形態を説明したが、本発明はさらに次のような変形例でも実施することが可能である。 The embodiments have been described above, but the present invention can also be implemented with the following modifications.

［変形例］
実施形態においては、式（１）により１つの仮想ピクセル且つ１つの辺についての内包判定値を算出し、すべての辺についてのその内包判定値を掛け合わせることによってその仮想ピクセルの図形（凸包多角形）に関する内包判定値を求める手順を説明した。変形例として、ここでは、１つの仮想ピクセルの図形に関する内包判定値を求める方法を説明する。 [Variations]
In the embodiment, a procedure was described in which an inclusion determination value for one virtual pixel and one edge is calculated using equation (1), and the inclusion determination value for the shape (convex hull polygon) of that virtual pixel is obtained by multiplying the inclusion determination values for all edges. As a modified example, a method for obtaining an inclusion determination value for the shape of one virtual pixel is described here.

Figure 8 is a schematic diagram illustrating how the inclusion determination value is obtained in this modified example. In the example shown in the figure, the figure under determination is a convex pentagon with sides 2031, 2032, 2033, 2034, and 2035. Boundary lines (dashed lines) divide the image into three regions. Region R1 is inside the pentagon and at least a predetermined distance away from every side. Region R3 is outside the pentagon and at least a predetermined distance away from every side. Region R2 is the region that is neither R1 nor R3; that is, it is the neighborhood of sides 2031, 2032, 2033, 2034, and 2035. In other words, region R2 is the region within the predetermined distance of at least one of these sides. The width of region R2 in the direction perpendicular to each side is assumed to be sufficiently small. In this modified example as well, the inclusion determination value is at least 0 and at most 1. In region R1, the inclusion determination value is 1 or approximately 1 (that is, at least 1-ε and at most 1), where ε is a positive constant sufficiently small compared with 1. In region R3, the inclusion determination value is 0 or approximately 0 (that is, at least 0 and at most ε). In addition, the function that calculates the inclusion determination value of a virtual pixel in this image is continuous and differentiable over the entire image.
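The following is a minimal Python sketch of one function that has the properties listed above, given for illustration only. It assumes a logistic (sigmoid) function of the signed distance to each side, multiplied over all sides. The steepness `k`, the helper names, and the pentagon coordinates are hypothetical and are not taken from equation (1) or from Figure 8.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function with values in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def inclusion_value(point, polygon, k=50.0):
    """Soft inside/outside score of `point` for a convex polygon.

    `polygon` is a list of (x, y) vertices in counter-clockwise order.
    `k` (assumed) controls how narrow the transition band near each side is.
    """
    px, py = point
    value = 1.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        # Signed distance to the line through this side:
        # positive on the interior side for counter-clockwise vertices.
        signed = (ex * (py - y1) - ey * (px - x1)) / math.hypot(ex, ey)
        value *= sigmoid(k * signed)
    return value


# Hypothetical convex pentagon (counter-clockwise).
pentagon = [(0.0, 0.0), (4.0, 0.0), (5.0, 3.0), (2.0, 5.0), (-1.0, 3.0)]

deep_inside = inclusion_value((2.0, 2.0), pentagon)    # plays the role of region R1
far_outside = inclusion_value((10.0, 10.0), pentagon)  # plays the role of region R3
on_a_side = inclusion_value((2.0, 0.0), pentagon)      # plays the role of region R2
```

In this sketch, a larger `k` narrows the band around the sides in which the value moves between 0 and 1, which corresponds to making region R2 thinner.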

The error calculation unit 105 of the modified example calculates the error between two figures from these inclusion determination values, using the calculation procedure already described in the embodiment.
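The calculation can be sketched in Python as follows, assuming the inclusion determination values of the two figures have already been evaluated at the same virtual pixels. The function name `soft_iou_ratio` and the sample values are hypothetical. The per-pixel intersection value (the product) and union value (the sum minus the product) follow the wording of the claims, while returning 0 when the union is empty is an added assumption.

```python
def soft_iou_ratio(estimated_values, truth_values):
    """Ratio of summed intersection values to summed union values.

    Both arguments are sequences of inclusion determination values in [0, 1],
    one per virtual pixel, evaluated at the same virtual pixels.
    """
    if len(estimated_values) != len(truth_values):
        raise ValueError("both figures must be evaluated at the same virtual pixels")

    intersection_total = 0.0
    union_total = 0.0
    for a, b in zip(estimated_values, truth_values):
        product = a * b
        intersection_total += product      # intersection value for this pixel
        union_total += (a + b) - product   # union value for this pixel

    if union_total == 0.0:
        # Assumption: neither figure covers any virtual pixel.
        return 0.0
    return intersection_total / union_total


# Four virtual pixels: one covered by both figures, one by each figure alone,
# and one covered by neither.
ratio = soft_iou_ratio([1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0])
```

Because only sums and products of the inclusion determination values are used, the ratio stays differentiable whenever the inclusion determination values themselves are differentiable.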

Note that the inclusion determination value described with reference to Figure 8 corresponds to a more general form. The method for obtaining the inclusion determination value described in the embodiment is one special case of this general form.

As described above, according to the embodiment (including the modified example), figures other than rectangles can also be made targets of region detection using a machine learning model. Furthermore, according to the embodiment (including the modified example), virtual pixels can be generated (set) at any resolution, so that regions can be detected finely.

The embodiment of this invention (including the modified example) has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of this invention are also included.

[Application Example and Effects]
An application example of the region detection device 1 of the above embodiment and its effects are described below. In this application example, the region detection device 1 is used to detect the regions of characters contained in an image.

Figure 9 is a schematic diagram showing an example of an image processed in this application example. Figure 9(A) shows the image that is input to the region detection device 1. As an example, this image contains the characters "abc". The region detection device 1 is trained in advance by machine learning, using training data, to detect character regions. The trained region detection device 1 is operated in estimation execution mode, and the image of Figure 9(A) is input. Figure 9(B) is an example of the detection result. The region detection device 1 detects regions containing the characters "a", "b", and "c", respectively. Each detected region is a quadrilateral (not a rectangle). Based on the detection result of the region detection device 1, each detected region is then separated into an independent image, and everything outside the detected region is masked with black (pixel value 0). Figure 9(C) shows the three resulting images. The hatched areas in Figure 9(C) are the areas masked with black (pixel value 0). The purpose of this mask is to keep unnecessary noise information out of the character recognition processing that follows. Character recognition was then performed on these three images using a character recognition device (outside the scope of this invention). As a result, the characters "a", "b", and "c" were correctly recognized. In other words, in this application example, detection of regions having a specific feature (character regions) was performed as preprocessing for character recognition. In this demonstration experiment, the character recognition accuracy (F-value) improved from 70% without region detection to 73% with region detection as preprocessing. This confirmed that the effects of the above embodiment are obtained.
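The masking step can be illustrated with a short Python sketch. It assumes the image is a list of rows of pixel values and that a detected region is given as the vertices of a convex quadrilateral; the function names, the pixel-center convention, and the sample coordinates are hypothetical and do not come from Figure 9.

```python
def inside_convex(px, py, polygon):
    """True if (px, py) is inside or on the boundary of a convex polygon.

    Works for either winding order: the point must lie on the same side
    of every polygon side.
    """
    sign = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False
    return True


def mask_outside(image, polygon):
    """Return a copy of `image` with pixels outside `polygon` set to 0 (black).

    A pixel counts as inside when its center (x + 0.5, y + 0.5) is inside.
    """
    return [
        [value if inside_convex(x + 0.5, y + 0.5, polygon) else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Hypothetical 4x4 image and one detected quadrilateral (not a rectangle).
image = [[9] * 4 for _ in range(4)]
quadrilateral = [(1, 0), (4, 0), (3, 4), (0, 3)]
masked = mask_outside(image, quadrilateral)
```

Applying `mask_outside` once per detected region yields one masked image per character, matching the three images described for Figure 9(C).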

This invention can be used, for example, to detect specific features present in an image. More specifically, it can be used to detect objects in an image, to detect character regions in an image, or to detect other specific regions. However, the scope of use of this invention is not limited to the examples given here.

1 Region detection device
101 Image input unit
102 Region estimation unit
103 Result output unit
104 Training data supply unit
105 Error calculation unit
901 Central processing unit
902 RAM
903 Input/output port
904, 905 Input/output devices
906 Bus
1051 Virtual pixel generation unit
1052 Inclusion determination unit
1053 Inclusion determination integration unit
1054 Error function value calculation unit

Claims

A region detection device comprising:
a region estimation unit that internally holds a machine learning model and estimates a region having a specific feature within an image by inputting the externally provided image into the machine learning model;
a training data supply unit that supplies pairs of a training image and a corresponding ground truth region for training the machine learning model held by the region estimation unit; and
an error calculation unit that calculates the error between an estimated region, which is the result of estimation by the region estimation unit based on the training image supplied by the training data supply unit, and the ground truth region supplied by the training data supply unit in correspondence with that training image,
wherein the figure corresponding to the estimated region and the figure corresponding to the ground truth region are both convex hull polygons,
the error calculation unit comprises:
a virtual pixel generation unit that sets a plurality of virtual pixels common to the image and the training image; and
an error function value calculation unit that calculates, for each of the virtual pixels, the sum and the product of the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground truth region, takes the value obtained by subtracting the product from the sum as the union value and the value of the product as the intersection value, and takes as the error the value obtained by dividing the sum of the intersection values over all the virtual pixels by the sum of the union values over all the virtual pixels,
the inclusion determination value for the figure corresponding to the estimated region is 1 or approximately 1 if the virtual pixel is inside the figure and at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 if the virtual pixel is outside the figure and at least the predetermined distance away from every side of the figure, and the function for obtaining the inclusion determination value for the figure corresponding to the estimated region is continuous and differentiable over the entire region of the image,
the inclusion determination value for the figure corresponding to the ground truth region is 1 or approximately 1 if the virtual pixel is inside the figure and at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 if the virtual pixel is outside the figure and at least the predetermined distance away from every side of the figure, and the function for obtaining the inclusion determination value for the figure corresponding to the ground truth region is continuous and differentiable over the entire region of the training image, and
the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground truth region are each at least 0 and at most 1 over the entire regions of the image and the training image.

The region detection device according to claim 1, wherein the error calculation unit further comprises:
an inclusion determination unit that determines, for each of the virtual pixels and for each side of the figure corresponding to the estimated region and each side of the figure corresponding to the ground truth region, the inclusion determination value of the virtual pixel with respect to that side by the following equation (1)
(where e is Napier's number, k is a predetermined positive constant, d_px is the distance from a reference line, which is parallel to the side and passes through a predetermined point inside the figure, to the virtual pixel, and d_center is the distance from the reference line to the side); and
an inclusion determination integration unit that determines the inclusion determination value of the virtual pixel with respect to the figure by multiplying together the inclusion determination values of the virtual pixel with respect to all sides of the figure, as determined by the inclusion determination unit.

A program for causing a computer to function as a region detection device comprising:
a region estimation unit that internally holds a machine learning model and estimates a region having a specific feature within an image by inputting the externally provided image into the machine learning model;
a training data supply unit that supplies pairs of a training image and a corresponding ground truth region for training the machine learning model held by the region estimation unit; and
an error calculation unit that calculates the error between an estimated region, which is the result of estimation by the region estimation unit based on the training image supplied by the training data supply unit, and the ground truth region supplied by the training data supply unit in correspondence with that training image,
wherein the figure corresponding to the estimated region and the figure corresponding to the ground truth region are both convex hull polygons,
the error calculation unit comprises:
a virtual pixel generation unit that sets a plurality of virtual pixels common to the image and the training image; and
an error function value calculation unit that calculates, for each of the virtual pixels, the sum and the product of the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground truth region, takes the value obtained by subtracting the product from the sum as the union value and the value of the product as the intersection value, and takes as the error the value obtained by dividing the sum of the intersection values over all the virtual pixels by the sum of the union values over all the virtual pixels,
the inclusion determination value for the figure corresponding to the estimated region is 1 or approximately 1 if the virtual pixel is inside the figure and at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 if the virtual pixel is outside the figure and at least the predetermined distance away from every side of the figure, and the function for obtaining the inclusion determination value for the figure corresponding to the estimated region is continuous and differentiable over the entire region of the image,
the inclusion determination value for the figure corresponding to the ground truth region is 1 or approximately 1 if the virtual pixel is inside the figure and at least a predetermined distance away from every side of the figure, and is 0 or approximately 0 if the virtual pixel is outside the figure and at least the predetermined distance away from every side of the figure, and the function for obtaining the inclusion determination value for the figure corresponding to the ground truth region is continuous and differentiable over the entire region of the training image, and
the inclusion determination value for the figure corresponding to the estimated region and the inclusion determination value for the figure corresponding to the ground truth region are each at least 0 and at most 1 over the entire regions of the image and the training image.