JP2023029006A

JP2023029006A - Information processing device, generation method of teacher data, generation method of learnt model and program

Info

Publication number: JP2023029006A
Application number: JP2021135053A
Authority: JP
Inventors: 裕介三木; Yusuke Miki
Original assignee: Hitachi Zosen Corp
Current assignee: Hitachi Zosen Corp
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2023-03-03
Also published as: CN115713460A

Abstract

To generate teacher data including a label that correctly indicates the position and range of a detection target. SOLUTION: An information processing device (1) comprises: a teacher image generation unit (101) that generates a teacher image by applying, to an original image in which a detection target appears, a conversion that enlarges or shrinks the image in a height direction and a width direction; a coordinate conversion unit (102) that, based on the magnification of the conversion, converts the coordinates of each vertex indicating the position and range of the detection target in the original image into the coordinates of each vertex indicating the position and range of the detection target in the teacher image; and a teacher data generation unit (103) that generates, based on the converted coordinates, a label indicating the position and range of the detection target in the teacher image and associates the label with the teacher image to form teacher data (112). SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device and the like that generate teacher data used for machine learning.

Machine learning using teacher data has been practiced for some time. Machine learning generally requires a large amount of teacher data, but obtaining that much teacher data is sometimes difficult. One known technique for addressing this problem is to generate new teacher data from existing teacher data, thereby increasing the total amount of teacher data.

For example, Patent Document 1 below discloses a machine learning system that classifies teacher images into a plurality of patterns, identifies the patterns that have few images, and generates new teacher images belonging to those underrepresented patterns by spatially flipping teacher images or changing their color tone.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-169672

In machine learning for generating a trained model that detects a detection target appearing in an image, the teacher data consists of a teacher image in which the detection target appears, associated with a label indicating the position and range of that detection target. Typically, the representative coordinates of a rectangular region enclosing the detection target (for example, the coordinates of the center of the rectangular region) and the width and height of that rectangular region are associated with the teacher image as the label. The width and height may also be read as the horizontal length and the vertical length.

FIG. 4 illustrates a conventional technique and shows an example of setting a label indicating the position and range of a detection target in a teacher image. The teacher image IMG1 on the left side of FIG. 4 shows a detection target OB1, and a rectangular region R1 enclosing the detection target OB1 is set. The representative coordinates, width, and height of the rectangular region R1 are associated with the teacher image IMG1 as a label.

Here, when the detection target OB1 is a rod-shaped object that appears at an angle, as in the teacher image IMG1 of FIG. 4, a large proportion of the rectangular region R1 is background in which the detection target OB1 does not appear (the upper-right and lower-left parts of the rectangular region R1). Training with such a label is undesirable because the background around the detection target OB1 has a large influence. For example, in the teacher image IMG1 of FIG. 4, part of a plate-shaped object appears inside the rectangular region R1, and this may affect the training result.

One possible remedy is to set a tilted rectangular region R2, as in the teacher image IMG2 on the right side of FIG. 4. This greatly reduces the proportion of background compared with the rectangular region R1. In that case, the tilt angle α is associated with the teacher image IMG2 as part of the label, in addition to the representative coordinates, width, and height of the rectangular region R2.

However, tilting the rectangular region introduces a problem. If a transformation that changes the aspect ratio (the ratio of height to width) of the original image is applied in order to increase the variety of teacher images, the position and range of the detection target indicated by the label in the resulting teacher image may deviate from the position and range of the detection target that actually appears in that teacher image.

This will be described with reference to FIG. 5, which illustrates the problem with the conventional technique. In the example of FIG. 5, the cutout region indicated by the dash-dotted line is cut out from an image IMG3 in which the detection target OB1 appears, as in the teacher images IMG1 and IMG2 described above. The cutout portion is used as the original image IMG3', and the teacher image IMG4 is generated by resizing the original image IMG3'.

More specifically, the image IMG3 is 608 pixels in both width and height, and a rectangular region R3 tilted at an angle α is set for the detection target OB1 appearing in the image IMG3, as in IMG2 of FIG. 4. The original image IMG3' cut out from the image IMG3 is 566 pixels wide and 383 pixels high. The lower edge of the original image IMG3' extends beyond the image IMG3, but cutting out in this way is also permissible when generating teacher images.

The original image IMG3' is then enlarged or reduced to obtain a teacher image IMG4 that is 544 pixels in both width and height. Specifically, the teacher image IMG4 is generated by scaling the original image IMG3' by (544/566) in the width direction and by (544/383) in the height direction. Because the magnifications in the width and height directions differ, the original image IMG3' and the teacher image IMG4 have different aspect ratios. The change in aspect ratio is itself desirable from the standpoint of increasing the variety of teacher images.

In IMG4, the rectangular region R4 indicating the position and range of the detection target OB1 is obtained by multiplying the width of the rectangular region R3 by (544/566) and its height by (544/383). When the rectangular region R4 is set in this way, by scaling the width and height of the rectangular region R3 in the same manner as the original image IMG3', part of the detection target OB1 may protrude from the rectangular region R4, as shown in FIG. 5.

This is because the tilt angle of the detection target OB1 changes as a result of the change in the aspect ratio of the original image IMG3', whereas the tilt angle α of the rectangular region R4 remains the same as that of the rectangular region R3 in the original image IMG3'.
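The effect described above can be checked numerically. The following is a minimal sketch, not part of the source: the function name `tilt_after_scaling` and the tilt angle of 30 degrees are hypothetical (the source gives no value for α), while the two magnifications are those of the FIG. 5 example.

```python
import math

def tilt_after_scaling(angle_deg, scale_x, scale_y):
    """Return the tilt angle (degrees) of a direction after per-axis scaling."""
    a = math.radians(angle_deg)
    # A unit vector along the tilted side becomes (scale_x*cos, scale_y*sin).
    return math.degrees(math.atan2(scale_y * math.sin(a), scale_x * math.cos(a)))

scale_x = 544 / 566   # width-direction magnification from the FIG. 5 example
scale_y = 544 / 383   # height-direction magnification from the FIG. 5 example
alpha = 30.0          # hypothetical tilt angle; the source does not state one

alpha_after = tilt_after_scaling(alpha, scale_x, scale_y)
print(f"tilt before: {alpha:.1f} deg, after: {alpha_after:.1f} deg")
```

With these magnifications a 30-degree tilt becomes roughly 40 degrees, so a label that keeps the original angle no longer follows the object. With equal magnifications in both directions the angle is unchanged.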

Thus, when the rectangular region set in the original image is tilted and a teacher image with an aspect ratio different from that of the original image is generated, the label associated with the teacher image does not correctly indicate the position and range of the detection target.

An object of one aspect of the present invention is to realize an information processing device and the like capable of generating teacher data that includes a label correctly indicating the position and range of the detection target in the teacher image, even in the case described above.

To solve the above problem, an information processing device according to one aspect of the present invention includes: a teacher image generation unit that generates a teacher image by applying, to an original image in which a detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction; a coordinate transformation unit that, based on the magnification of the transformation, converts the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; and a teacher data generation unit that generates, based on the coordinates converted by the coordinate transformation unit, a label indicating the position and range of the detection target in the teacher image and associates the label with the teacher image to form teacher data.

To solve the above problem, a teacher data generation method according to one aspect of the present invention is a teacher data generation method executed by one or more information processing devices, and includes: a teacher image generation step of generating a teacher image by applying, to an original image in which a detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction; a coordinate transformation step of converting, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; and a teacher data generation step of generating, based on the coordinates converted in the coordinate transformation step, a label indicating the position and range of the detection target in the teacher image and associating the label with the teacher image to form teacher data.

To solve the above problem, a trained model generation method according to one aspect of the present invention is a trained model generation method executed by one or more information processing devices, and includes: a teacher image generation step of generating a teacher image by applying, to an original image in which a detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction; a coordinate transformation step of converting, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; a teacher data generation step of generating, based on the coordinates converted in the coordinate transformation step, a label indicating the position and range of the detection target in the teacher image and associating the label with the teacher image to form teacher data; and a learning step of generating, by machine learning using the teacher data generated in the teacher data generation step, a trained model for detecting the detection target from an image.

According to one aspect of the present invention, even when a teacher image with an aspect ratio different from that of the original image is generated from an original image in which a tilted rectangular region is set, teacher data including a label that correctly indicates the position and range of the detection target in the teacher image can be generated.

FIG. 1 is a block diagram showing an example of the main configuration of an information processing device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the generation of a teacher image and a label by the information processing device.
FIG. 3 is a flowchart showing an example of processing executed by the information processing device.
FIG. 4 is a diagram illustrating a conventional technique and showing an example of setting a label indicating the position and range of a detection target in a teacher image.
FIG. 5 is a diagram illustrating the problem with the conventional technique.

[Device configuration]
The configuration of an information processing device 1 according to an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing an example of the main configuration of the information processing device 1. The information processing device 1 may be, for example, a personal computer, a server, or a workstation.

As illustrated, the information processing device 1 includes a control unit 10 that centrally controls each unit of the information processing device 1, and a storage unit 11 that stores various data used by the information processing device 1. The information processing device 1 also includes a communication unit 12 through which it communicates with other devices, an input unit 13 that accepts input of various data to the information processing device 1, and an output unit 14 through which the information processing device 1 outputs various data.

The control unit 10 includes a teacher image generation unit 101, a coordinate transformation unit 102, a teacher data generation unit 103, a learning unit 104, and an inference unit 105. The storage unit 11 stores an image 111, teacher data 112, and a trained model 113.

The image 111 is an image in which a detection target appears. The image 111 is associated with a label indicating the position and range of the detection target appearing in it. This label may indicate, for example, the representative coordinates, width, height, and tilt angle of a rectangular region enclosing the detection target. The width and height may also be read as the horizontal length and the vertical length. When generating a trained model that identifies the detection target as well as detecting it, an identifier of the detection target may be added to the label in addition to the information described above.

The image 111 may be input via the input unit 13 or received via the communication unit 12. For example, the input unit 13 may be a USB (Universal Serial Bus) interface. In this case, the input unit 13 may be connected via the USB interface to an imaging device that photographs the detection target. The information processing device 1 may then receive an image captured by the imaging device via the input unit 13 and store it as the image 111. Alternatively, the image captured by the imaging device may be transmitted over a LAN (Local Area Network), a wireless LAN, or the like, and the information processing device 1 may receive the image via the communication unit 12 and store it as the image 111.

The teacher image generation unit 101 generates a teacher image by applying, to an original image cut out from the image 111 in which the detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction. The image 111 may also be used as the original image as it is. Details of the teacher image generation method are described later with reference to FIG. 2.

The coordinate transformation unit 102 converts, based on the magnification of the transformation used to generate the teacher image (the enlargement or reduction magnification in the height direction and the width direction), the coordinates of each vertex of the rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image. Details of the coordinate conversion method are described later with reference to FIG. 2.

The teacher data generation unit 103 generates, based on the coordinates converted by the coordinate transformation unit 102, a label indicating the position and range of the detection target appearing in the teacher image generated by the teacher image generation unit 101, and associates the label with the teacher image to form teacher data. The teacher data generation unit 103 then stores the generated teacher data in the storage unit 11 as the teacher data 112. Details of the label generation method are described later with reference to FIG. 2.

The learning unit 104 generates, by machine learning using the teacher data 112 generated by the teacher data generation unit 103, a trained model for detecting the detection target from an image. The learning unit 104 then stores the generated trained model in the storage unit 11 as the trained model 113. The model may be, for example, a neural network model (including a deep neural network model). Training may be carried out by repeating the following process while changing the teacher data: the teacher image included in the teacher data is input to the model being trained, the error between the resulting inference and the correct answer indicated by the teacher data is calculated, and the model is updated by backpropagation of that error.

The inference unit 105 detects the detection target from an image using the trained model 113. More specifically, the inference unit 105 first reads the target image. The image may be stored in the storage unit 11 in advance, or may be input from an imaging device or the like via the communication unit 12 or the input unit 13. The inference unit 105 then inputs the read image to the trained model 113 and causes the output unit 14 to output those detection results whose confidence is equal to or greater than a threshold.
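The final filtering step of the inference unit can be sketched as follows. This is only an illustration: the dictionary keys (`center`, `width`, `height`, `angle_deg`, `confidence`), the function name `filter_detections`, and the threshold of 0.5 are all hypothetical, since the source does not specify how detection results are represented.

```python
def filter_detections(detections, threshold):
    """Keep only the detections whose confidence is at or above the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]

# Hypothetical raw outputs of a trained model for one image.
raw_detections = [
    {"center": (120.0, 88.0), "width": 40.0, "height": 210.0,
     "angle_deg": 35.0, "confidence": 0.92},
    {"center": (300.0, 410.0), "width": 25.0, "height": 90.0,
     "angle_deg": -10.0, "confidence": 0.31},
]

kept = filter_detections(raw_detections, threshold=0.5)
for d in kept:
    print(d["center"], d["angle_deg"], d["confidence"])
```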

Because the trained model 113 is generated by training with the teacher data 112, which includes labels that correctly indicate the position and range of the detection target, the inference unit 105 can detect objects with high accuracy when it performs detection using the trained model 113. The detection result indicates the position and range of the detection target appearing in the image. The detection result may be output as the representative coordinates, width, height, and tilt angle of the region enclosing the detection target, or by drawing a rectangle indicating that region on the target image.

As described above, the information processing device 1 includes: the teacher image generation unit 101, which generates a teacher image by applying, to an original image in which a detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction; the coordinate transformation unit 102, which converts, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; and the teacher data generation unit 103, which generates, based on the coordinates converted by the coordinate transformation unit 102, a label indicating the position and range of the detection target in the teacher image and associates the label with the teacher image to form the teacher data 112.

With the above configuration, a teacher image is generated by applying, to the original image in which the detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction. This makes it possible to generate a wide variety of teacher images, including ones whose aspect ratio (the ratio of height to width) differs from that of the original image.

With the above configuration, the label associated with the generated teacher image is also generated from the coordinates of the four vertices of a quadrilateral region, which are obtained by converting the coordinates of the four vertices of the rectangular region in the original image based on the magnification of the transformation applied when the teacher image was generated. The label is then associated with the teacher image to form teacher data.

Because the coordinates of the four vertices of the quadrilateral region are obtained based on the magnification of the transformation applied when the teacher image was generated, they correctly indicate the position and range of the detection target appearing in the teacher image. This makes it possible to generate teacher data 112 associated with a label that correctly indicates the position and range of the detection target appearing in the teacher image.

As described above, with the above configuration, even when the rectangular region set in the original image is tilted and a teacher image with an aspect ratio different from that of the original image is generated, teacher data 112 including a label that correctly indicates the position and range of the detection target in the teacher image can be generated.

The information processing device 1 also includes the learning unit 104, which generates, by machine learning using the teacher data 112 generated by the teacher data generation unit 103, the trained model 113 for detecting the detection target from an image. With this configuration, the trained model 113 can be generated automatically.

[Example of generating a teacher image and a label]
FIG. 2 is a diagram showing an example of the generation of a teacher image and a label by the information processing device 1. In FIG. 2, the cutout region indicated by the dash-dotted line is cut out from an image IMG5, which is 608 pixels in both width and height, to obtain an original image IMG5', and this is enlarged or reduced to produce a teacher image IMG6 that is 544 pixels in both width and height. The width and height of the teacher image IMG6 may be determined using random numbers or the like. A teacher image may also be generated from an original image that contains only part of the detection target OB1 rather than all of it. Training with such teacher images makes it possible to detect the detection target OB1 from an image in which only part of the detection target OB1 appears.

In this way, the teacher image generation unit 101 generates the teacher image IMG6 by applying, to the original image IMG5' cut out from the image IMG5 in which the detection target appears, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction. The teacher image generation unit 101 may determine the position and range of the cutout region and the transformation magnification using, for example, random numbers.
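One way to pick the cutout region and the resulting magnifications at random is sketched below. The function name `random_crop_params`, the choice of sampling the cutout size between half and the full image size, and the restriction of the cutout to lie inside the image are assumptions made for illustration; the source only says that random numbers may be used.

```python
import random

def random_crop_params(image_w, image_h, out_w, out_h, rng):
    """Pick a random cutout region and return it with the per-axis magnifications."""
    # Hypothetical sampling range: between half and the full image size.
    crop_w = rng.randint(image_w // 2, image_w)
    crop_h = rng.randint(image_h // 2, image_h)
    # Top-left corner chosen so that the cutout stays inside the image.
    crop_x = rng.randint(0, image_w - crop_w)
    crop_y = rng.randint(0, image_h - crop_h)
    # Magnifications that map the cutout onto the teacher image size.
    scale_x = out_w / crop_w
    scale_y = out_h / crop_h
    return crop_x, crop_y, crop_w, crop_h, scale_x, scale_y

rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = random_crop_params(608, 608, 544, 544, rng)
crop_x, crop_y, crop_w, crop_h, scale_x, scale_y = params
print(params)
```

Because the width and height of the cutout are sampled independently, `scale_x` and `scale_y` generally differ, which is exactly the case in which the aspect ratio changes.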

When the teacher image IMG6 has been generated, the coordinate transformation unit 102 obtains, based on the magnification of the transformation used to generate the teacher image IMG6, the coordinates of the vertices P5a to P5d of the rectangular region R5 indicating the position and range of the detection target OB1 appearing in the original image IMG5'. These coordinates are calculated from the representative coordinates, width, height, and tilt angle of the rectangular region R5 given in the label associated with the image IMG5.
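Recovering the four vertices from such a label can be sketched as follows. Several points are assumptions rather than statements from the source: the representative coordinates are taken to be the center of the rectangle, the tilt angle is taken as a rotation about that center in image coordinates, the vertex order is arbitrary, and the offset of the cutout region (`crop_x`, `crop_y`) is subtracted to express the vertices in the coordinates of the cut-out original image. The function name `rect_vertices` is hypothetical.

```python
import math

def rect_vertices(cx, cy, width, height, angle_deg, crop_x=0.0, crop_y=0.0):
    """Return the four vertices of a tilted rectangle, relative to the cutout origin."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets from the center before rotation, in order around the rectangle.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    vertices = []
    for dx, dy in offsets:
        # Rotate the offset by the tilt angle and translate to the center.
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        # Shift into the coordinate system of the cutout (assumption).
        vertices.append((x - crop_x, y - crop_y))
    return vertices

# Hypothetical label values for illustration only.
p5 = rect_vertices(cx=300.0, cy=250.0, width=60.0, height=320.0,
                   angle_deg=35.0, crop_x=20.0, crop_y=40.0)
```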

The coordinate transformation unit 102 then converts the coordinates of the vertices P5a to P5d into the coordinates of the vertices P6a to P6d of a quadrilateral region R6 indicating the position and range of the detection target OB1 appearing in the teacher image IMG6. For example, the coordinate transformation unit 102 performs this conversion by multiplying the x-coordinates of the vertices P5a to P5d by the width-direction magnification used to generate the teacher image IMG6, and multiplying the y-coordinates of the vertices P5a to P5d by the height-direction magnification used to generate the teacher image IMG6.
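The per-axis multiplication described above amounts to the following short sketch. The function name `scale_vertices`, the vertex coordinates, and the magnifications are illustrative values, not figures from the source.

```python
def scale_vertices(vertices, scale_x, scale_y):
    """Multiply each x by the width magnification and each y by the height magnification."""
    return [(x * scale_x, y * scale_y) for x, y in vertices]

# Hypothetical vertices of a tilted rectangle in the original image
# (adjacent edges (40, 30) and (-30, 40) are perpendicular).
p5 = [(100.0, 50.0), (140.0, 80.0), (110.0, 120.0), (70.0, 90.0)]

# Hypothetical magnifications that differ between the two directions.
p6 = scale_vertices(p5, scale_x=0.5, scale_y=2.0)
print(p6)
```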

Because the magnifications applied in the height direction and the width direction when generating the teacher image IMG6 differ, the original image IMG5' and the teacher image IMG6 have different aspect ratios. In this case, the detection target OB1 appearing in the teacher image IMG6 has a distorted shape, with a changed balance between height and width. The quadrilateral region R6 corresponding to the tilted rectangular region R5 then becomes a parallelogram that follows the outer edge of the distorted detection target OB1. If the aspect ratio does not change between the original image and the teacher image, the detection target appearing in the teacher image is enlarged or reduced without distortion. In that case, the rectangular region of the original image is also enlarged or reduced without distortion, so the converted region remains a rectangle.
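The claim that a tilted rectangle becomes a parallelogram under unequal magnifications, and stays a rectangle under equal ones, can be verified with a small check. The helper names and the sample coordinates are hypothetical.

```python
def edge(p, q):
    """Vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def is_parallelogram(quad, tol=1e-9):
    """True if opposite sides a->b and d->c are equal vectors."""
    a, b, c, d = quad
    ab, dc = edge(a, b), edge(d, c)
    return abs(ab[0] - dc[0]) <= tol and abs(ab[1] - dc[1]) <= tol

def is_rectangle(quad, tol=1e-9):
    """True if the quadrilateral is a parallelogram with a right angle at b."""
    a, b, c, d = quad
    return is_parallelogram(quad, tol) and abs(dot(edge(a, b), edge(b, c))) <= tol

# Hypothetical tilted rectangle: adjacent edges (4, 3) and (-3, 4) are perpendicular.
rect = [(0.0, 0.0), (4.0, 3.0), (1.0, 7.0), (-3.0, 4.0)]
stretched = [(x * 2.0, y * 1.0) for x, y in rect]   # unequal magnifications
uniform = [(x * 2.0, y * 2.0) for x, y in rect]     # equal magnifications

print(is_rectangle(rect), is_parallelogram(stretched),
      is_rectangle(stretched), is_rectangle(uniform))
```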

Next, the teacher data generation unit 103 generates a label indicating the position and range of the detection target OB1 appearing in the teacher image IMG6, based on the coordinates converted by the coordinate conversion unit 102, that is, the coordinates of vertices P6a to P6d. Specifically, the teacher data generation unit 103 calculates the following four values from the coordinates of vertices P6a to P6d.

(1) Representative coordinates of the quadrilateral region R6
(2) Length of the long side of the quadrilateral region R6
(3) Inclination angle of the long side of the quadrilateral region R6
(4) Distance between the opposing long sides of the quadrilateral region R6
Item (1) is information indicating the position of the detection target OB1 appearing in the teacher image IMG6. The representative coordinates may be any coordinates that identify the position of the quadrilateral region R6. For example, the teacher data generation unit 103 may calculate the coordinates of the intersection P6e of the diagonals of the quadrilateral region R6 as the representative coordinates. Alternatively, the teacher data generation unit 103 may calculate the coordinates of the upper-left vertex of the quadrilateral region R6, that is, vertex P6a, as the representative coordinates.

Item (2) is information indicating the height H of the detection target appearing in the teacher image IMG6. The teacher data generation unit 103 may calculate the distance between vertices P6a and P6d, or the distance between vertices P6b and P6c. The length of the short side of the quadrilateral region R6 may be calculated instead of the length of its long side.

Item (3) is information indicating the inclination angle of the detection target appearing in the teacher image IMG6. The teacher data generation unit 103 may calculate the angle β that the straight line connecting vertices P6a and P6d forms with the width direction of the teacher image IMG6, or the angle that the straight line connecting vertices P6b and P6c forms with the width direction of the teacher image IMG6. If the length of the short side of the quadrilateral region R6 was calculated in (2) instead of the length of the long side, the inclination angle of the short side may be calculated.

Item (4) is information indicating the width W of the detection target appearing in the teacher image IMG6. The teacher data generation unit 103 may calculate the distance between the straight line connecting vertices P6a and P6d and the straight line connecting vertices P6b and P6c. If the length of the short side of the quadrilateral region R6 was calculated in (2) instead of the length of the long side, the distance between the short sides may be calculated.

Alternatively, the teacher data generation unit 103 may calculate the distance between vertices P6a and P6b, or the distance between vertices P6c and P6d, as the width of the quadrilateral region R6, although this value is slightly larger than the actual width of OB1.
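The four values (1) to (4) can be derived from the converted vertices with elementary geometry. The following sketch assumes the vertices are given in the order P6a, P6b, P6c, P6d, that the side P6a-P6d is the long side, and that the angle is measured from the x-axis (width direction) in degrees. The function name, the dictionary keys, and the sample coordinates are hypothetical.

```python
import math


def parallelogram_label(a, b, c, d):
    """Compute the four label values from vertices a, b, c, d of a parallelogram."""
    # (1) Representative coordinates: the diagonals of a parallelogram bisect
    #     each other, so their intersection is the midpoint of a and c.
    center = ((a[0] + c[0]) / 2.0, (a[1] + c[1]) / 2.0)

    # (2) Length of the side a-d (taken here as the long side).
    dx, dy = d[0] - a[0], d[1] - a[1]
    height = math.hypot(dx, dy)

    # (3) Inclination of the side a-d relative to the width direction (x-axis).
    angle_deg = math.degrees(math.atan2(dy, dx))

    # (4) Perpendicular distance from b to the line through a and d, which
    #     equals the distance between the two opposing long sides.
    width = abs(dx * (b[1] - a[1]) - dy * (b[0] - a[0])) / height

    return {"center": center, "height": height, "angle_deg": angle_deg, "width": width}


# Hypothetical converted vertices standing in for P6a..P6d.
label = parallelogram_label((0.0, 0.0), (2.0, 0.0), (5.0, 4.0), (3.0, 4.0))
print(label)
```

Note that the width computed this way (1.6 for the sample above) is smaller than the length of the side a-b (2.0), which matches the remark that using the side length gives a slightly larger value.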

As shown in FIG. 2, the teacher data generation unit 103 may also correct vertices P6a to P6d to P6a' to P6d', identify the rectangle whose four vertices are P6a' to P6d', and generate a label indicating the position and range of that rectangle. Specifically, the teacher data generation unit 103 corrects the positions of vertices P6a to P6d so that the resulting rectangle has the same area as the parallelogram whose four vertices are P6a to P6d and so that its diagonals intersect at P6e. For example, the long side of the rectangle may be made the same length as the long side of the parallelogram, and the short side of the rectangle may be made equal to the height of the parallelogram. This identifies the rectangle with vertices P6a' to P6d' shown in FIG. 2. In this case, the coordinates of P6a' to P6d', for example, may be used as the label indicating the position and range of the rectangle. Alternatively, the short side of the rectangle may be made the same length as the short side of the parallelogram, and the long side of the rectangle may be made equal to the height of the parallelogram.
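One way to realize this correction is sketched below. It keeps the diagonal intersection fixed, reuses the long-side length of the parallelogram, and sets the short side to the distance between the long sides, so the area is preserved. The orientation of the rectangle (long sides parallel to the side P6a-P6d) is an assumption made for this sketch, since FIG. 2 is not reproduced here; the function name and sample coordinates are hypothetical.

```python
import math


def equal_area_rectangle(a, b, c, d):
    """Return corrected vertices (a', b', c', d') of an equal-area rectangle.

    Assumes a, b, c, d form a parallelogram and a-d is the long side.
    """
    cx, cy = (a[0] + c[0]) / 2.0, (a[1] + c[1]) / 2.0  # diagonal intersection
    dx, dy = d[0] - a[0], d[1] - a[1]
    length = math.hypot(dx, dy)                         # long side of the parallelogram
    ux, uy = dx / length, dy / length                   # unit vector along the long side
    nx, ny = -uy, ux                                    # unit normal to the long side
    # Height of the parallelogram measured perpendicular to the long side.
    width = abs(dx * (b[1] - a[1]) - dy * (b[0] - a[0])) / length
    hl, hw = length / 2.0, width / 2.0
    # Keep a' on the same side of the center line as the original a.
    s = 1.0 if (a[0] - cx) * nx + (a[1] - cy) * ny >= 0 else -1.0

    def corner(along, across):
        return (cx + along * hl * ux + across * s * hw * nx,
                cy + along * hl * uy + across * s * hw * ny)

    return corner(-1, 1), corner(-1, -1), corner(1, -1), corner(1, 1)


# Hypothetical parallelogram standing in for P6a..P6d.
a2, b2, c2, d2 = equal_area_rectangle((0.0, 0.0), (2.0, 0.0), (5.0, 4.0), (3.0, 4.0))
print(a2, b2, c2, d2)
```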

Finally, the teacher data generation unit 103 associates a label indicating the four calculated values with the teacher image IMG6 to form teacher data. In this way, from the coordinates of each vertex obtained by the coordinate conversion unit 102, the teacher data generation unit 103 may (1) calculate the representative coordinates of the quadrilateral region as information indicating the position of the detection target, (2) calculate the length of one side of the quadrilateral region as information indicating the height of the detection target, (3) calculate the inclination angle of that side as information indicating the inclination angle of the detection target, and (4) calculate the distance between that side and the opposing side as information indicating the width of the detection target.

With this configuration, it is possible to generate teacher data associated with a label that indicates, in addition to the representative coordinates, height, and width commonly used as object detection labels, an inclination angle that is useful when learning rod-shaped detection targets.

[Process flow]
The flow of processing executed by the information processing device 1 is described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the processing executed by the information processing device 1. It is assumed that the setting information required for learning has been read before the processing of FIG. 3 starts. The setting information includes, for example, the storage location of the images 111, the range of the random numbers used when generating teacher images, the number of images 111 read at one time, and the learning termination condition.

In S1, the teacher image generation unit 101 acquires an image 111 from the storage unit 11. The number of images 111 to acquire may be set in the setting information as described above, and may be one or more. The teacher image generation unit 101 may also use a random number to determine which image 111 to acquire.

In S2, the teacher image generation unit 101 determines the crop range of the image 111 acquired in S1. For example, the teacher image generation unit 101 may determine the position, width, and height of the crop range using random numbers.

In S3 (teacher image generation step), the teacher image generation unit 101 crops the image 111 read in S1 to the crop range determined in S2, and uses the result as the original image on which the teacher image is based. The teacher image generation unit 101 then generates a teacher image by applying to the original image a conversion that enlarges or reduces it in at least one of the height direction and the width direction. The magnification of the enlargement or reduction is determined by the height and width of the teacher image to be generated and the height and width of the original image. The teacher image generation unit 101 may determine the height and width of the teacher image using random numbers.

In S4 (coordinate conversion step), the coordinate conversion unit 102 first calculates the coordinates of the four vertices indicating the range of the detection target in the original image. The coordinate conversion unit 102 then converts the calculated coordinates based on the magnification of the conversion applied to the original image, and thereby calculates the coordinates of the four vertices indicating the range of the detection target in the teacher image generated in S3. The method of calculating the coordinates of each vertex is as described with reference to FIG. 2, so the description is not repeated here.
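Steps S2 to S4 can be sketched without any image library by working only with sizes and coordinates. In the sketch below, the function names, the `min_size` parameter, and the sample numbers are hypothetical. Subtracting the crop offset before scaling is an assumption about how vertex coordinates in the image 111 relate to the cropped original image.

```python
import random


def random_crop_box(img_w, img_h, min_size, rng):
    """S2: pick a crop range (x, y, w, h) inside the image using random numbers."""
    w = rng.randint(min_size, img_w)
    h = rng.randint(min_size, img_h)
    x = rng.randint(0, img_w - w)
    y = rng.randint(0, img_h - h)
    return x, y, w, h


def magnification(src_w, src_h, dst_w, dst_h):
    """S3: magnifications implied by the original and teacher image sizes."""
    return dst_w / src_w, dst_h / src_h


def to_teacher_coords(vertex, crop_box, scales):
    """S4: move a vertex into the cropped image, then apply the magnifications."""
    x0, y0, _, _ = crop_box
    sx, sy = scales
    return ((vertex[0] - x0) * sx, (vertex[1] - y0) * sy)


rng = random.Random(0)                       # seeded only to make the run repeatable
box = random_crop_box(640, 480, 32, rng)     # hypothetical 640x480 source image

# Fixed example: a 200x100 crop at (100, 50) resized to a 400x50 teacher image.
scales = magnification(200, 100, 400, 50)
vertex = to_teacher_coords((110.0, 70.0), (100, 50, 200, 100), scales)
print(box, scales, vertex)
```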

In S5, the teacher data generation unit 103 calculates, from the coordinates of each vertex calculated by the coordinate conversion unit 102 in S4, the representative coordinates, width, height, and inclination angle of the detection target as information indicating the position and range of the detection target appearing in the teacher image generated in S3. The method of calculating this information is as described with reference to FIG. 2, so the description is not repeated here.

In S6 (teacher data generation step), the teacher data generation unit 103 generates a label from the information calculated in S5 and associates it with the teacher image generated in S3 to form teacher data. The teacher data generation unit 103 then stores the generated teacher data in the storage unit 11 as teacher data 112. The method of generating a label from the converted coordinates is as described with reference to FIG. 2, so the description is not repeated here.

In S7 (learning step), the learning unit 104 performs machine learning using the teacher data 112 generated in S6. For example, when training a neural network model, the learning unit 104 inputs a teacher image included in the teacher data 112 to the model being trained, and calculates the error between the output value of the model and the value of the label associated with that teacher image. The learning unit 104 then updates the weight values of the neural network model by backpropagation based on the calculated error.

In S8, the learning unit 104 determines whether to end learning. If it determines that learning should end (YES in S8), the learning unit 104 stores the most recently updated model in the storage unit 11 as the trained model 113, and the processing of FIG. 3 ends. If it determines that learning should not end (NO in S8), the processing returns to S1 and a new image 111 is read. The termination condition of S8 may be defined in advance as setting information. For example, the termination condition may be that the model has been updated a predetermined number of times or more, or that the error has fallen to or below a threshold.
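The loop formed by S1 to S8 and the two example termination conditions can be sketched as follows. The callables `generate_batch` and `train_step` are hypothetical placeholders for S1 to S6 and S7 respectively. Combining both termination conditions with a logical OR is a choice made for this sketch; the text only says that either condition may be used.

```python
from typing import Callable, Optional


def run_training(generate_batch: Callable[[], list],
                 train_step: Callable[[list], float],
                 max_updates: int,
                 error_threshold: Optional[float] = None) -> int:
    """Repeat teacher data generation and learning until a termination condition holds.

    Returns the number of model updates performed.
    """
    updates = 0
    while True:
        batch = generate_batch()   # S1-S6: read images and build teacher data
        error = train_step(batch)  # S7: one model update, returning the error
        updates += 1
        # S8: decide whether to end learning.
        if updates >= max_updates:
            break
        if error_threshold is not None and error <= error_threshold:
            break
    return updates


# Stand-in error sequence; a real train_step would compute this from the model.
errors = iter([1.0, 0.5, 0.05, 0.01])
n_updates = run_training(lambda: [], lambda batch: next(errors),
                         max_updates=10, error_threshold=0.1)
print(n_updates)
```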

If a plurality of images 111 were read in S1, the processing of S2 to S6 generates teacher data from each of those images 111 and stores it as teacher data 112. In S7, learning is then performed using that teacher data 112.

The processing of S1 to S6 in FIG. 3 is a method of generating teacher data. Adding to it the processing of S7 to S8, which generates a trained model by machine learning, yields S1 to S8, a method of generating a trained model. The processing of S1 to S6 and the processing of S7 to S8 do not necessarily have to be performed consecutively. For example, the number of images 111 required for learning may be acquired in S1, and the processing of S7 to S8 may be performed after the required number of teacher data 112 has been generated from those images 111 by the processing of S2 to S6.

When generating teacher data, the color and rotation angle of the teacher image may also be changed from those of the original image in order to increase the variation of the teacher data. Various conventional color conversion and rotation conversion techniques can be applied for this purpose.

As described above, the method of generating teacher data executed by the information processing device 1 includes: a teacher image generation step (S3) of generating a teacher image by applying, to an original image in which a detection target appears, a conversion that enlarges or reduces the image in at least one of the height direction and the width direction; a coordinate conversion step (S4) of converting, based on the magnification of the conversion, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; and a teacher data generation step (S6) of generating, based on the coordinates converted in the coordinate conversion step, a label indicating the position and range of the detection target in the teacher image, and associating the label with the teacher image to form teacher data. This makes it possible to generate teacher data including a label that correctly indicates the position and range of the detection target in the teacher image.

Likewise, the method of generating a trained model executed by the information processing device 1 includes: a teacher image generation step (S3) of generating a teacher image by applying, to an original image in which a detection target appears, a conversion that enlarges or reduces the image in at least one of the height direction and the width direction; a coordinate conversion step (S4) of converting, based on the magnification of the conversion, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; a teacher data generation step (S6) of generating, based on the coordinates converted in the coordinate conversion step, a label indicating the position and range of the detection target in the teacher image, and associating the label with the teacher image to form teacher data; and a learning step (S7) of generating, by machine learning using the teacher data generated in the teacher data generation step, a trained model for detecting the detection target from an image. This makes it possible to automatically generate a trained model for detecting the detection target from an image, using teacher data that includes a label correctly indicating the position and range of the detection target.

[Modification]
Each process described in the above embodiment may be executed by any entity, and is not limited to the examples above. That is, an information processing system having the same functions as the information processing device 1 can be constructed from a plurality of information processing devices that can communicate with each other. For example, generation of the teacher image, conversion of the coordinates, generation of the teacher data, learning, and inference may each be executed by a separate information processing device. As another example, an information processing system may be constructed that includes an information processing device that executes the processing from generation of the teacher image through generation of the teacher data (that is, the method of generating teacher data), an information processing device that generates a trained model using the generated teacher data, and an information processing device that performs inference using the generated trained model. In this way, the method of generating teacher data described above can be implemented by one or more information processing devices. The same applies to the method of generating a trained model described above. An imaging device that captures images of the detection target may also be included as a component of the information processing system.

[Software implementation example]
The functions of the information processing device 1 (hereinafter, the "device") can be realized by a program that causes a computer to function as the device, specifically a program that causes the computer to function as each control block of the device (in particular, each unit included in the control unit 10).

In this case, the device includes, as hardware for executing the program, a computer having at least one control device (for example, a processor) and at least one storage device (for example, a memory). Each function described in the above embodiments is realized by executing the program using the control device and the storage device.

The control device may be, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a combination of these. The storage device may include a high-speed storage unit capable of writing and reading data at high speed, and a large-capacity storage unit with a larger data storage capacity than the high-speed storage unit. The high-speed storage unit may be, for example, a high-speed access memory such as SDRAM (Synchronous Dynamic Random-Access Memory). The large-capacity storage unit may be, for example, an HDD (Hard Disk Drive), an SSD (Solid-State Drive), an SD (Secure Digital) card, or an eMMC (embedded MultiMediaCard).

The program may be recorded on one or more non-transitory computer-readable recording media. The device may or may not include the recording medium. In the latter case, the program may be supplied to the device via any wired or wireless transmission medium.

Some or all of the functions of the control blocks can also be realized by logic circuits formed in an integrated circuit (IC chip) or the like. For example, an integrated circuit in which logic circuits functioning as the control blocks are formed is also included in the scope of the present invention. The functions of the control blocks can also be realized by, for example, a quantum computer.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Information processing device
101 Teacher image generation unit
102 Coordinate conversion unit
103 Teacher data generation unit
104 Learning unit

Claims

An information processing device comprising:
a teacher image generation unit that generates a teacher image by applying, to an original image in which a detection target appears, a conversion that enlarges or reduces the image in at least one of the height direction and the width direction;
a coordinate conversion unit that converts, based on the magnification of the conversion, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; and
a teacher data generation unit that generates, based on the coordinates converted by the coordinate conversion unit, a label indicating the position and range of the detection target in the teacher image, and associates the label with the teacher image to form teacher data.

The information processing device according to claim 1, wherein the teacher data generation unit, from the coordinates of each vertex of the quadrilateral region:
(1) calculates representative coordinates of the quadrilateral region as information indicating the position of the detection target;
(2) calculates the length of one side of the quadrilateral region as information indicating the height of the detection target;
(3) calculates the inclination angle of the one side as information indicating the inclination angle of the detection target; and
(4) calculates the distance between the one side and the side opposite to the one side as information indicating the width of the detection target.

3. The information processing device according to claim 1, further comprising a learning unit that generates a trained model for detecting the detection target from an image by machine learning using the teacher data generated by the teacher data generation unit.

A method of generating teacher data executed by one or more information processing devices, the method comprising:
a teacher image generation step of generating a teacher image by applying, to an original image in which a detection target appears, a conversion that enlarges or reduces the image in at least one of the height direction and the width direction;
a coordinate conversion step of converting, based on the magnification of the conversion, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image; and
a teacher data generation step of generating, based on the coordinates converted in the coordinate conversion step, a label indicating the position and range of the detection target in the teacher image, and associating the label with the teacher image to form teacher data.

A method of generating a trained model executed by one or more information processing devices, the method comprising:
a teacher image generation step of generating a teacher image by applying, to an original image in which a detection target appears, a conversion that enlarges or reduces the image in at least one of the height direction and the width direction;
a coordinate conversion step of converting, based on the magnification of the conversion, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrilateral region indicating the position and range of the detection target in the teacher image;
a teacher data generation step of generating, based on the coordinates converted in the coordinate conversion step, a label indicating the position and range of the detection target in the teacher image, and associating the label with the teacher image to form teacher data; and
a learning step of generating, by machine learning using the teacher data generated in the teacher data generation step, a trained model for detecting the detection target from an image.

2. A program for causing a computer to function as the information processing apparatus according to claim 1, the program for causing the computer to function as the teacher image generating section, the coordinate transforming section, and the teacher data generating section.