JP2023166812A - Information processing method, computer program and information processing device

Info

Publication number: JP2023166812A
Application number: JP2022077607A
Authority: JP
Inventors: 聖悟山下; Seigo Yamashita; アクヒルアシュレフ; Akhil Ashref; ドゥルブリテッシュジェイン; Dwulub Ritesh Jane
Original assignee: Exa Wizards Inc
Current assignee: Exa Wizards Inc
Priority date: 2022-05-10
Filing date: 2022-05-10
Publication date: 2023-11-22
Anticipated expiration: 2042-05-10
Also published as: JP7225458B1

Abstract

To provide an information processing method, a computer program, and an information processing device that can be expected to support annotation work on images. SOLUTION: In an information processing method according to the present embodiment, an information processing device acquires a plurality of captured images of the same object, each captured using a different wavelength of light, and, based on at least one of the acquired captured images, attaches related information to at least one of the plurality of captured images. The information processing device may also accept designation of an image region in at least one of the acquired captured images, extract from the captured image a region similar to the accepted image region, and attach information on the image region and the similar region as the related information. SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing method, a computer program, and an information processing device that support annotation work on images.

In recent years, machine learning and deep learning technologies have advanced remarkably, and high-performance devices and products that use learning models generated by such techniques are in wide use. Among machine learning methods, the approach called supervised learning requires learning data (so-called teacher data) in which input information for a learning model is associated with output information. For example, an image to be input to the learning model is associated with information such as a classification result for what is shown in the image, or the image region in which an object appears, and the pair is used as learning data for supervised learning. Generating this learning data, that is, associating output information with input information, is called annotation and is in many cases done by hand.

Patent Document 1 proposes an information processing device that acquires a plurality of annotations based on inputs from a plurality of workers for the same annotation target, compares the acquired annotations, determines from the comparison whether the workers' annotation criteria deviate from one another, and issues a warning when such a deviation exists.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2021-196905

In general, machine learning requires a large amount of learning data, and the amount of manual annotation work is enormous. Workers who perform annotation also need a certain degree of knowledge about the data they handle, so annotation is not work that anyone can do easily. Furthermore, when the data being handled is highly confidential, the number of people who can perform the annotation work may be limited.

The present invention has been made in view of these circumstances, and its object is to provide an information processing method, a computer program, and an information processing device that can be expected to support annotation work on images.

In an information processing method according to one embodiment, an information processing device acquires a plurality of captured images of the same object, each captured based on a different wavelength of light, and, based on at least one of the acquired captured images, attaches related information to at least one of the plurality of captured images.

According to one embodiment, support for annotation work on images can be expected.

FIG. 1 is a schematic diagram for explaining the configuration of the information processing system according to the present embodiment.
FIG. 2 is a schematic diagram for explaining the configuration of the rotating filter.
FIG. 3 is a block diagram showing the configuration of the information processing device 1 according to the present embodiment.
FIG. 4 is a schematic diagram showing an example of a composite image.
FIG. 5 is a schematic diagram showing an example of an infrared region image.
FIG. 6 is a schematic diagram for explaining an example of annotation work.
FIG. 7 is a schematic diagram for explaining an example of annotation work.
FIG. 8 is a schematic diagram for explaining an example of learning data.
FIG. 9 is a flowchart showing the procedure of annotation processing performed by the information processing device according to the present embodiment.
FIG. 10 is a flowchart showing the procedure of annotation processing performed by the information processing device according to the present embodiment.

Specific examples of an information processing system according to an embodiment of the present invention are described below with reference to the drawings. The present invention is not limited to these examples; it is defined by the claims and is intended to include all modifications within the meaning and scope equivalent to the claims.

<System configuration>
FIG. 1 is a schematic diagram for explaining the configuration of the information processing system according to the present embodiment. The information processing system includes an information processing device 1, a camera 3, a rotating filter 7, and the like. In this system, the information processing device 1 acquires captured images of an object taken by the camera 3 through the rotating filter 7, and a user uses the information processing device 1 to annotate the captured images.

Annotation is the process or work of attaching related information to some data; in the field of artificial intelligence in particular, it is the process or work of creating teacher data for supervised learning. In annotation work, one or more users attach, to data such as numerical values, images, or character strings that serve as input to a learning model, the data that the learning model should output (information such as correct values or correct labels). The information processing system according to the present embodiment supports annotation work that handles images as input data.

The information processing system according to the present embodiment also supports annotation work that attaches information indicating a specific region in an image, such as the region of an object shown in the image or the region of an abnormal part of the object; more specifically, work that attaches information such as a label to each pixel of the image. Learning data in which such information is attached to images is used for machine learning of learning models that perform so-called image segmentation. Such annotation work can be done, for example, by a user operating an input device such as a mouse or pen tablet to select a specific region in an image shown on a display.

However, in the information processing system according to the present embodiment, the data attached to the input data by annotation may instead take a form in which one value is assigned to one image, such as a binary classification result indicating whether the object shown in the image has an abnormality, or a multi-class classification result that assigns the object shown in the image to one of a plurality of classes.

In the information processing system according to the present embodiment, shooting with the camera 3 and the rotating filter 7 yields a plurality of captured images of the same object based on different wavelengths of light, and annotation work is performed based on the acquired images. FIG. 2 is a schematic diagram for explaining the configuration of the rotating filter 7. The rotating filter 7 has a disc-shaped main body on which a plurality of types of filters 7a to 7f are arranged side by side in the circumferential direction, and the rotating shaft of a motor is fixed at the center of the main body. The rotating filter 7 is installed so that one of the filters 7a to 7f is positioned in front of the lens of the camera 3. The camera 3 controls the rotation of the motor and thereby rotates the main body of the rotating filter 7, which switches the filter positioned in front of the lens among the filters 7a to 7f. In other words, the camera 3 can select one of the filters 7a to 7f of the rotating filter 7 and photograph the inspection object through it.

The camera 3 according to the present embodiment can capture images by receiving not only visible light but also infrared light. The filters 7a to 7f of the rotating filter 7 may include, for example, a red filter that passes red light, a blue filter that passes blue light, a green filter that passes green light, a first infrared filter that passes infrared light of a first wavelength (for example, 750 nm), and a second infrared filter that passes infrared light of a second wavelength (for example, 800 nm). The filters 7a to 7f also include one that lets the camera 3 perform ordinary shooting, substantially as if no filter were present (a filter that makes shooting equivalent to that of an ordinary camera that does not receive infrared light, such as the infrared-blocking filter found in ordinary cameras).

The camera 3 according to the present embodiment rotates the rotating filter 7, photographs the inspection object through the filters 7a to 7f in turn, and outputs the resulting captured images to the information processing device 1. For example, when the rotating filter 7 has six types of filters 7a to 7f, the camera 3 can obtain six types of captured images by photographing the inspection object through each filter in turn. The camera 3 need not shoot with every filter, however: even when the rotating filter 7 has six types of filters 7a to 7f, it may shoot in turn with five or fewer of them and obtain five or fewer types of captured images.
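The sequential capture described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `capture_image_set`, the `select_filter` and `shoot` callables, the `FakeCamera` test double, and the filter names are all hypothetical stand-ins for whatever interface the camera 3 and the rotating filter 7 actually expose.

```python
from typing import Callable, Dict, Iterable, List

Image = List[List[int]]  # one image as rows of pixel values (assumed representation)


def capture_image_set(
    select_filter: Callable[[str], None],
    shoot: Callable[[], Image],
    filter_ids: Iterable[str],
) -> Dict[str, Image]:
    """Shoot the same object once per requested filter and return one set.

    filter_ids may name only some of the available filters, since the
    text allows capturing with fewer than all of them.
    """
    image_set: Dict[str, Image] = {}
    for filter_id in filter_ids:
        select_filter(filter_id)        # e.g. rotate the wheel to this filter
        image_set[filter_id] = shoot()  # capture through the selected filter
    return image_set


class FakeCamera:
    """Hypothetical test double that remembers which filter is selected."""

    def __init__(self) -> None:
        self.current = "none"

    def select_filter(self, filter_id: str) -> None:
        self.current = filter_id

    def shoot(self) -> Image:
        # Return a 1x1 dummy image whose value depends on the active filter.
        return [[len(self.current)]]


camera = FakeCamera()
image_set = capture_image_set(
    camera.select_filter, camera.shoot, ["red", "infrared_750nm"]
)
```

Keying the set by filter name keeps the images of one object together while still letting later steps pick a single wavelength.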

Although the present embodiment uses the rotating filter 7 to change among multiple types of filters, the mechanism for changing filters is not limited to the rotating filter 7. For example, one or more filters may be switched by sliding them relative to the lens of the camera 3, or the filters may be switched in some other way. Nor is filter switching limited to control by the camera 3; it may be performed automatically under the control of the information processing device 1, for example, or manually by the user. When the user switches filters manually, the user may, for example, attach and detach individual filters to and from the lens of the camera 3.

The information processing device 1 acquires the plurality of images that the camera 3 captured through the filters 7a to 7f of the rotating filter 7 and saves them as one set of images. In the present embodiment, images are passed from the camera 3 to the information processing device 1 through, for example, wired or wireless communication, but this is not a limitation. For example, images may be passed from the camera 3 to the information processing device 1 via a recording medium such as a memory card or optical disc, the camera 3 and the information processing device 1 may exchange images via another device such as a server device, or the camera 3 and the information processing device 1 may be a single integrated device.

The information processing device 1 displays one or more images from a set to the user performing annotation work and accepts from the user an operation that designates a specific region in the image. The information processing device 1 generates data such as labels to attach to the pixels in the region designated by the user (and to the pixels in the region not designated), and stores the generated data together with the image.
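As a rough illustration of the per-pixel labeling described above, the sketch below builds a label mask from a designated region. The function name `make_label_mask`, the representation of the region as (row, column) pairs, and the label values 1 and 0 are assumptions made for this example; the source does not specify a data format.

```python
from typing import Iterable, List, Tuple


def make_label_mask(
    height: int,
    width: int,
    region: Iterable[Tuple[int, int]],
    label: int = 1,
    background: int = 0,
) -> List[List[int]]:
    """Return a height x width mask with one label per pixel.

    Pixels listed in region receive label; every other pixel receives
    background, so undesignated pixels are labeled as well.
    """
    mask = [[background] * width for _ in range(height)]
    for row, col in region:
        # Ignore coordinates that fall outside the image.
        if 0 <= row < height and 0 <= col < width:
            mask[row][col] = label
    return mask


mask = make_label_mask(3, 4, [(0, 1), (1, 1), (1, 2)])
```

A mask of this shape can be stored next to the image it describes, which is the pairing a segmentation model would typically be trained on.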

<Device configuration>
FIG. 3 is a block diagram showing the configuration of the information processing device 1 according to the present embodiment. The information processing device 1 includes a processing unit 11, a storage unit (storage) 12, a communication unit (transceiver) 13, a display unit (display) 14, an operation unit 15, and the like. It may be configured using a general-purpose information processing device such as a personal computer or a tablet terminal. Although the present embodiment is described on the assumption that a single information processing device 1 performs the processing, the processing may be distributed across a plurality of information processing devices.

The processing unit 11 is configured using an arithmetic processing device such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), a GPU (Graphics Processing Unit), or a quantum processor, together with a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. By reading and executing the program 12a stored in the storage unit 12, the processing unit 11 performs various kinds of processing, including acquiring the plurality of captured images that the camera 3 took through the rotating filter 7 and supporting annotation work that uses those captured images.

The storage unit 12 is configured using a large-capacity storage device such as a hard disk. It stores the various programs executed by the processing unit 11 and the various data needed for the processing performed by the processing unit 11. In the present embodiment, the storage unit 12 stores the program 12a executed by the processing unit 11. The storage unit 12 also includes a learning data storage unit 12b, which stores the learning data for machine learning generated through the user's annotation work.

In the present embodiment, the program (computer program, program product) 12a is provided recorded on a recording medium 99 such as a memory card or an optical disc, and the information processing device 1 reads the program 12a from the recording medium 99 and stores it in the storage unit 12. However, the program 12a may instead be written to the storage unit 12 at, for example, the manufacturing stage of the information processing device 1. The program 12a may also be distributed by a remote server device or the like and obtained by the information processing device 1 through communication. Alternatively, a writing device may read the program 12a recorded on the recording medium 99 and write it to the storage unit 12 of the information processing device 1. The program 12a may be provided by distribution over a network or recorded on the recording medium 99.

The learning data storage unit 12b stores, as learning data, one or more captured images taken by the camera 3 in association with the information attached through the user's annotation work. When the camera 3 photographs the same object at, for example, six wavelengths of light and the information processing device 1 acquires six captured images, the learning data produced after the user's annotation work need only include at least one of the six captured images; it need not include all six. In other words, the learning data need only include at least the images required for machine learning of the learning model (the images input to the learning model) and need not include images that are unnecessary for machine learning.
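One way to picture such a learning-data record is sketched below. The `LearningSample` class, the `build_learning_sample` helper, and the filter names are hypothetical and chosen only for illustration; the source states what the record must contain, not how it is laid out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

Image = List[List[int]]  # one image as rows of pixel values (assumed representation)


@dataclass
class LearningSample:
    """Captured images kept for training, paired with the annotation."""

    images: Dict[str, Image]
    label_mask: Image


def build_learning_sample(
    image_set: Dict[str, Image],
    label_mask: Image,
    needed_filters: Iterable[str],
) -> LearningSample:
    """Keep only the images the model will take as input.

    At least one image must be kept; images not named in
    needed_filters are dropped from the record.
    """
    needed = list(needed_filters)
    if not needed:
        raise ValueError("at least one captured image must be kept")
    missing = [name for name in needed if name not in image_set]
    if missing:
        raise KeyError(f"images not in the set: {missing}")
    return LearningSample(
        images={name: image_set[name] for name in needed},
        label_mask=label_mask,
    )


captured = {"rgb": [[5]], "red": [[7]], "infrared": [[9]]}
sample = build_learning_sample(captured, [[1]], ["rgb"])
```

In this example the annotation might have been drawn on the infrared image, yet only the RGB image is stored with the mask because it is the one the model consumes.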

The communication unit 13 is connected to the camera 3 via, for example, a communication cable, and exchanges data such as captured images and control information with the camera 3. The communication unit 13 may exchange data according to a communication standard such as USB (Universal Serial Bus) or LAN (Local Area Network). It may also exchange data with the camera 3 by wireless communication, without a communication cable.

The display unit 14 is configured using a liquid crystal display or the like and displays various images, characters, and so on based on the processing of the processing unit 11. The operation unit 15 accepts user operations and notifies the processing unit 11 of them. For example, the operation unit 15 accepts user operations through input devices such as mechanical buttons or a touch panel provided on the surface of the display unit 14. The operation unit 15 may also consist of input devices such as a mouse and keyboard, and these input devices may be detachable from the information processing device 1.

In the information processing device 1 according to the present embodiment, the processing unit 11 reads and executes the program 12a stored in the storage unit 12, whereby a captured image acquisition unit 11a, an annotation processing unit 11b, a display processing unit 11c, and the like are realized in the processing unit 11 as software functional units. The figure shows, as functional units of the processing unit 11, those related to the processing that supports annotation based on images captured by the camera 3; functional units related to other processing are omitted.

The captured image acquisition unit 11a exchanges data with the camera 3 through the communication unit 13 and thereby acquires the captured images of the object taken by the camera 3, that is, the plurality of captured images taken through the filters 7a to 7f of the rotating filter 7. The captured image acquisition unit 11a temporarily stores the acquired captured images in the storage unit 12.

The annotation processing unit 11b performs processing that supports the user's annotation work on images captured by the camera 3. For example, the annotation processing unit 11b displays the plurality of images captured based on different wavelengths of light side by side on the display unit 34 and accepts the user's selection of which image to annotate. The annotation processing unit 11b displays the selected image enlarged on the display unit 34 and accepts from the user the designation of the image region in the displayed image that is to be given a specific label. The annotation processing unit 11b attaches information on the designated image region to the captured image to form learning data and stores this learning data in the learning data storage unit 12b.

At this time, the annotation processing unit 11b accepts from the user, via an input device such as a mouse, the selection of a specific point in the displayed image, obtains the pixel value at that point, identifies in the image a region in which pixels whose values fall within a predetermined range containing that pixel value are clustered, and treats it as a candidate region. It presents the candidate region to the user by superimposing a boundary line, figure, or the like indicating it on the original image. The user approves or corrects the candidate region, and the annotation processing unit 11b accepts the approved or corrected region as the image region to be given a specific label.
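The candidate-region step can be sketched with a simple region-growing (flood fill) routine. This is one plausible reading rather than the method of the embodiment: the text only says that pixels with values in a predetermined range are gathered together, so the symmetric `tolerance` around the clicked value, the 4-neighbor connectivity, and the name `grow_region` are assumptions made for this example.

```python
from collections import deque
from typing import List, Set, Tuple

Image = List[List[int]]  # single-channel image as rows of pixel values


def grow_region(image: Image, seed: Tuple[int, int], tolerance: int) -> Set[Tuple[int, int]]:
    """Collect pixels connected to seed whose values are close to the seed value.

    A pixel joins the region when its value lies within tolerance of the
    seed pixel's value and it touches the region via a 4-neighbor.
    """
    rows, cols = len(image), len(image[0])
    seed_row, seed_col = seed
    base = image[seed_row][seed_col]
    low, high = base - tolerance, base + tolerance

    region = {(seed_row, seed_col)}
    queue = deque([(seed_row, seed_col)])
    while queue:
        row, col = queue.popleft()
        for n_row, n_col in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            inside = 0 <= n_row < rows and 0 <= n_col < cols
            if inside and (n_row, n_col) not in region and low <= image[n_row][n_col] <= high:
                region.add((n_row, n_col))
                queue.append((n_row, n_col))
    return region


image = [
    [10, 12, 50],
    [11, 13, 52],
    [90, 91, 12],
]
candidate = grow_region(image, seed=(0, 0), tolerance=5)
```

Here the bottom-right pixel has a value in range but is not connected to the clicked pixel, so it stays out of the candidate region; the user would still be free to correct the result before it is accepted.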

The annotation processing unit 11b may also first determine, based on the plurality of images, one or more candidate regions for the image region to be given a specific label, and display boundary lines, figures, or the like indicating the candidate regions superimposed on the original image. The user selects appropriate regions from the displayed candidates, corrects or deletes inappropriate ones, and so on, and the annotation processing unit 11b can accept the candidate regions remaining after these operations as the image regions to be given a specific label.

The display processing unit 11c performs processing to display various information such as images and characters on the display unit 14. In the present embodiment, the display processing unit 11c handles the various displays needed for annotation support by the annotation processing unit 11b, such as displaying captured images, displaying boundary lines or figures that indicate regions, and displaying messages to the user.

<Annotation support>
The information processing device 1 according to the present embodiment acquires the plurality of captured images that the camera 3 took through the filters 7a to 7f of the rotating filter 7, stores them in the storage unit 12 as one set, and accumulates many such sets. The user uses the information processing device 1 to annotate the accumulated captured images. As annotation work, the user performs, for example, an operation that designates the region of a captured image in which a specific object appears, or an operation that designates the region in which an abnormal part of the object appears. The user may also, as annotation work, enter a label indicating the classification of the object shown in the captured image, or a label indicating whether the object shown has an abnormality.

The information processing device 1 reads, for example, one set of captured images stored in the storage unit 12 and displays the images in the set side by side on the display unit 34. From among the captured images of the same object taken based on different wavelengths of light, the information processing device 1 accepts the user's selection of the captured image to use for annotation work. It then displays the selected captured image enlarged on the display unit 34 and accepts, through the operation unit 35, an operation that designates a region of the captured image.

Furthermore, the information processing device 1 according to the present embodiment may, in addition to the plurality of captured images taken by the camera 3, generate a composite image by combining those captured images and display one or more composite images side by side with the captured images as candidates for the user's selection. FIG. 4 is a schematic diagram showing an example of a composite image. In this figure, four images, an "RGB image", an "infrared region image", a "red region image", and a "composite image", are displayed side by side on the display unit 34 of the information processing device 1.

The "RGB image" is an ordinary image taken by the camera 3, that is, an image taken after switching the rotating filter 7 to the one of its filters 7a to 7f that amounts to shooting with substantially no filter. The "infrared region image" is an image taken by the camera 3 through a filter that transmits infrared light of a predetermined wavelength (a predetermined range of wavelengths) and blocks light of other wavelengths. The "red region image" is an image taken by the camera 3 through a so-called red filter, which transmits the red wavelengths of visible light and blocks light of other wavelengths. These three images are all obtained by shooting with the camera 3.

The "composite image" is an image that the information processing device 1 generates by combining the "infrared region image" and the "red region image" described above. Because both are images taken by the camera 3, they have the same size. The information processing device 1 evaluates equation (1) below for each pair of corresponding pixels in the "infrared region image" and the "red region image" to obtain the pixel value of each pixel of the composite image, and thereby generates the composite image.

Pixel value of composite image = (pixel value of infrared region image - pixel value of red region image) / (pixel value of infrared region image + pixel value of red region image) ... (1)

The composite image generated by equation (1) from the pixel values of the infrared region image and the red region image is an NDVI (Normalized Difference Vegetation Index) image, which represents how plants reflect light. In the composite image of FIG. 4, regions with large pixel values (that is, regions that are white or nearly white in the image) are regions where plants are present. For example, when the object to be annotated is a plant, displaying the NDVI composite image lets the user easily see where plants are located in the image.

The composite image generated by the information processing device 1 is not limited to the NDVI image described above. The information processing device 1 may generate, as a composite image, for example an NDWI (Normalized Difference Water Index) image, which can visualize the presence or absence of water, an NDSI (Normalized Difference Snow Index) image, which can visualize snow cover, or an NDSI (Normalized Difference Soil Index) image, which shows the distribution of sand, concrete, and the like. The information processing device 1 can select any two suitable images from among the captured images taken by the camera 3 and combine them according to equation (2) below, which generalizes equation (1). The information processing device 1 may also combine a captured image with a composite image, or combine two composite images.

Pixel value of composite image = (pixel value of first image - pixel value of second image) / (pixel value of first image + pixel value of second image) ... (2)
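The per-pixel computation of equations (1) and (2) can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: images are assumed to be same-sized nested lists of non-negative grayscale values, the function name `normalized_difference` is hypothetical, and returning 0.0 where the denominator is zero is an assumption, since the description does not say how that case is handled.

```python
def normalized_difference(first, second):
    """Combine two same-sized grayscale images per equation (2).

    `first` and `second` are lists of rows of non-negative pixel values.
    A pixel whose denominator is zero is set to 0.0 (an assumption; the
    source does not specify this case).
    """
    if len(first) != len(second) or any(
        len(a) != len(b) for a, b in zip(first, second)
    ):
        raise ValueError("images must have the same size")
    composite = []
    for row_a, row_b in zip(first, second):
        out_row = []
        for a, b in zip(row_a, row_b):
            denom = a + b
            out_row.append((a - b) / denom if denom != 0 else 0.0)
        composite.append(out_row)
    return composite


# Equation (1) is the special case: first = infrared image, second = red image.
infrared = [[200, 50], [0, 120]]
red = [[50, 50], [0, 40]]
ndvi = normalized_difference(infrared, red)
```

Because the function takes any two images, the same call could also combine a captured image with a composite image, or two composite images, as the description allows.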

The information processing device 1 need not generate or display a composite image. What kind of composite image should be generated depends on the object to be annotated. For this reason, the information processing device 1 may, for example, accept settings from the user performing the annotation, such as whether to combine captured images and which captured images to combine, and generate the composite image according to those settings. Alternatively, before the annotation work starts, the information processing device 1 may generate composite images from several different combinations, display them in a list so that the user can select a suitable one, and generate subsequent composite images under the conditions of the selected composite image.

FIG. 5 is a schematic diagram showing an example of an infrared region image. In this figure, two images of an apple as the object, an "RGB image" and an "infrared region image," are displayed side by side on the display unit 34 of the information processing device 1. The user performs annotation that identifies, for example, an abnormal part of the apple shown in the image (such as a scratch or a bruised spot). In the RGB image, the surface of the apple has a pattern, so it is not easy for the user to tell which part is abnormal. In the infrared region image, taken using infrared light of a specific wavelength, the abnormal part of the apple (the part indicated by the white arrow in the figure) appears at a different gradation from the normal parts, so the user can easily identify it.

FIGS. 6 and 7 are schematic diagrams for explaining an example of annotation work. These figures show an example in which the user performs annotation work to identify abnormal parts of an avocado on an infrared region image of the avocado. The information processing device 1 displays, for example, a plurality of captured images of the object taken by the camera 3 at a plurality of wavelengths side by side on the display unit 34, and accepts the user's selection of the captured image to be used for the annotation work. In this example, the user selects the infrared region image, and the information processing device 1 enlarges and displays on the display unit 34 the infrared region image, that is, the captured image taken using the predetermined infrared light.

The user uses an input device such as a mouse, provided as the operation unit 35 of the information processing device 1, to perform an operation such as clicking on an abnormal part in the infrared region image of the avocado displayed on the display unit 34. The upper part of FIG. 6 shows an abnormal part being specified with the mouse cursor (white arrow) on the infrared region image of the avocado. The information processing device 1 acquires the pixel value of the pixel at the position (coordinates) that the user specified in the infrared region image, calculates a candidate region for the abnormal part that includes the specified position, and displays this candidate region superimposed on the original infrared region image. The lower part of FIG. 6 shows the candidate region displayed as a polygon superimposed on the original infrared region image. For example, the information processing device 1 can take as the candidate region the set of pixels that includes the pixel specified by the user, is contiguous with that pixel, and has the same pixel value as that pixel (that is, a pixel value that differs from it by no more than a predetermined range).
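The candidate-region calculation described above resembles a region-growing (flood-fill) step. The sketch below is one possible reading, not the patented implementation: it assumes 4-connectivity, a nested-list grayscale image, and a hypothetical function name `grow_region` with a `tolerance` parameter standing in for the "predetermined range." It returns a set of pixel coordinates; converting that set into the polygon shown in FIG. 6 is omitted.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Return the set of (row, col) pixels that are 4-connected to `seed`
    and whose value differs from the seed pixel's value by at most
    `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the image
            if (nr, nc) in region:
                continue  # already accepted
            if abs(image[nr][nc] - seed_value) <= tolerance:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy infrared image: bright values mark the abnormal part.
image = [
    [10, 10, 90, 10],
    [10, 92, 95, 10],
    [10, 10, 91, 10],
    [88, 10, 10, 10],
]
# The user clicks the pixel at row 1, column 2.
candidate = grow_region(image, seed=(1, 2), tolerance=5)
```

In this toy image the pixel at row 3, column 0 is neither within the tolerance nor connected to the clicked pixel, so it is left out of the candidate region.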

The user can adjust the candidate region superimposed on the infrared region image as necessary, for example by changing its shape or size. The upper part of FIG. 7 shows the candidate region from the lower part of FIG. 6 after its size has been reduced. Once the user has finished any necessary adjustment, the information processing device 1 accepts this candidate region as the region resulting from the annotation, that is, as an abnormal region of the avocado. The information processing device 1 further calculates, from the infrared region image, one or more candidate regions having features similar to those of the confirmed abnormal region, and displays them superimposed on the original infrared region image. The lower part of FIG. 7 shows a plurality of candidate regions displayed on the infrared region image together with the confirmed abnormal region. For example, the information processing device 1 can acquire the range of pixel values of the pixels included in the confirmed abnormal region and extract, as one or more candidate regions, image regions in which pixels within a similar range of values are clustered and in which the number of pixels exceeds a threshold.
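The similar-region extraction described above can be sketched as a connected-component search restricted to the pixel-value range of the confirmed region. This is an illustrative sketch under assumptions: the function name `find_similar_regions`, the use of the exact minimum and maximum of the confirmed region as the "similar range," 4-connectivity, and the `min_pixels` parameter are all hypothetical choices not fixed by the description.

```python
from collections import deque


def find_similar_regions(image, confirmed, min_pixels):
    """Find connected groups of pixels whose values fall within the value
    range of the `confirmed` region and whose size exceeds `min_pixels`.
    Pixels of the confirmed region itself are skipped."""
    values = [image[r][c] for r, c in confirmed]
    low, high = min(values), max(values)
    rows, cols = len(image), len(image[0])
    visited = set(confirmed)
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if (r0, c0) in visited or not (low <= image[r0][c0] <= high):
                continue
            # Breadth-first search over in-range, 4-connected pixels.
            region = {(r0, c0)}
            visited.add((r0, c0))
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in visited
                            and low <= image[nr][nc] <= high):
                        visited.add((nr, nc))
                        region.add((nr, nc))
                        queue.append((nr, nc))
            if len(region) > min_pixels:
                regions.append(region)
    return regions


image = [
    [90, 10, 10, 10, 10],
    [92, 10, 10, 91, 93],
    [10, 10, 10, 90, 92],
    [10, 91, 10, 10, 10],
]
confirmed = {(0, 0), (1, 0)}  # region the user has already confirmed
others = find_similar_regions(image, confirmed, min_pixels=2)
```

Here the isolated in-range pixel at row 3, column 1 is discarded because a single pixel does not exceed the size threshold, which mirrors the role of the pixel-count condition in the text.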

The user can, as necessary, delete or adjust any of the one or more displayed candidate regions. Once these operations are complete, the information processing device 1 accepts the one or more candidate regions as regions resulting from the annotation, that is, as abnormal regions of the avocado. Following the above procedure, the information processing device 1 accepts the result of the user's annotation work on the captured image and generates data representing the annotation result.

The annotation result data generated by the information processing device 1 can be an array (matrix) of the same size as the image captured by the camera 3. For example, if the captured image is 100 pixels by 100 pixels, the annotation result data is a 100 x 100 array, and the information processing device 1 generates the data by storing, in the array element corresponding to each pixel of the image, either "1" to indicate abnormal or "0" to indicate normal. The values set by annotation are not limited to two values such as "0" and "1"; there may be three values such as "0," "1," and "2," or four or more values. The structure of the annotation result data is also not limited to this; it may be, for example, data that collects the coordinates of the vertices of the polygons representing the one or more regions shown in the lower part of FIG. 7 and elsewhere.
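As a rough illustration of the array form of the annotation result, the sketch below fills a label array of the same size as the image from a list of confirmed regions. The function name `build_label_array` and the representation of each region as a (label, pixel set) pair are assumptions made for this example; the polygon-vertex form mentioned in the text is not covered.

```python
def build_label_array(height, width, labeled_regions):
    """Build a height x width array in which each element holds the label
    of the region covering that pixel, or 0 (normal) if no region covers it.

    `labeled_regions` is a list of (label, pixels) pairs, where `pixels`
    is an iterable of (row, col) coordinates.
    """
    labels = [[0] * width for _ in range(height)]
    for label, pixels in labeled_regions:
        for r, c in pixels:
            labels[r][c] = label
    return labels


# A 100 x 100 image with two abnormal regions labeled 1 and one labeled 2,
# showing that more than two label values are possible.
regions = [
    (1, {(10, 10), (10, 11), (11, 10)}),
    (1, {(50, 60), (50, 61)}),
    (2, {(80, 20)}),
]
annotation = build_label_array(100, 100, regions)
```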

When a user performs annotation work, the work is often repeated in succession over a large number of images. In such a case, the information processing device 1 may, for example, follow the procedure shown in FIGS. 6 and 7 for the first annotation task and omit part of the procedure for the second and subsequent tasks. For example, if the first annotation task was performed using the infrared region image, the information processing device 1 may, for the second and subsequent tasks, skip displaying the list of captured images taken at different wavelengths of light and skip the selection step, and display the infrared region image without accepting a selection from the user. Furthermore, the information processing device 1 may extract, from the target image of each second or subsequent annotation task, regions having the same features as the region determined to be abnormal in the first task, and present them to the user as a plurality of candidate regions in the same manner as in the lower part of FIG. 7.

Having generated the annotation result data, the information processing device 1 associates this data with a captured image to form learning data and stores it in the learning data storage unit 12b. FIG. 8 is a schematic diagram for explaining an example of learning data. In the illustrated example, shooting by the camera 3 through the rotating filter 7 is assumed to yield six types of captured images: an "RGB image," a "red region image," a "green region image," a "blue region image," an "infrared region image," and an "ultraviolet region image." It is also assumed that the information processing device 1 has accepted the user's annotation work based on the "infrared region image," as shown for example in FIGS. 6 and 7, and has generated the annotation result data.

The information processing device 1 can, for example, generate as learning data a record that associates the "RGB image," one of the six types of captured images, with the data resulting from the user's annotation, and store it in the learning data storage unit 12b. The learning data may thus include all of the captured images taken by the camera 3 or only some of them. When only some of the captured images are included, the included image is not limited to the "RGB image" shown in the figure; it may be, for example, only the "infrared region image," or some other image. The images in the learning data may also include a composite image obtained by combining a plurality of captured images, or the learning data may consist of a composite image alone associated with the annotation result data.

Which captured images are included in the learning data is determined by the intended use, the configuration, and other aspects of the learning model that will be trained by machine learning on that data. For example, when machine learning is used to generate a learning model that detects abnormalities from images taken by an ordinary camera (RGB images), the learning data may include RGB images and annotation result data. When generating a learning model that is built into an infrared camera to detect objects, the learning data may include infrared region images and annotation result data. When generating a learning model that is built into a camera that can shoot while switching among a plurality of filters, like the rotating filter 7, the learning data may include the plurality of images corresponding to the camera's filters and annotation result data. The information processing device 1 may accept a setting, for example from the user, specifying which captured images to include in the learning data.
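The association between selected images and an annotation result can be sketched as a small record type. This is only one possible layout, not a format specified in the description: `TrainingSample`, `make_training_sample`, and the dictionary keys used as image names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingSample:
    """One learning-data record: selected images plus the annotation result."""
    images: dict   # image name -> pixel array (nested lists here)
    labels: list   # annotation result array, same size as the images


def make_training_sample(captured, labels, include):
    """Pair the images named in `include` with the annotation result.

    `captured` maps image names to pixel arrays for one set of captured
    (or composite) images; `include` reflects the user's setting of which
    images the learning data should contain.
    """
    missing = [name for name in include if name not in captured]
    if missing:
        raise KeyError(f"images not in this set: {missing}")
    return TrainingSample(
        images={name: captured[name] for name in include},
        labels=labels,
    )


# Annotation was done on the infrared image, but only the RGB image is
# stored with the labels, as in the example of FIG. 8.
captured = {
    "rgb": [[(120, 80, 60), (118, 82, 61)]],
    "infrared": [[95, 12]],
}
labels = [[1, 0]]
sample = make_training_sample(captured, labels, include=["rgb"])
```

Keeping the image selection as a parameter reflects the point made above: the same annotation result can be paired with an RGB image, an infrared image, or several images, depending on the model to be trained.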

<Flowchart>
FIG. 9 is a flowchart showing the procedure of annotation processing performed by the information processing device 1 according to the present embodiment. The captured image acquisition unit 11a of the processing unit 11 of the information processing device 1 acquires in advance a plurality of captured images taken by the camera 3 through the rotating filter 7, and stores and accumulates them in the storage unit 12. The annotation processing unit 11b of the processing unit 11 acquires one set of captured images from among the captured images accumulated in the storage unit 12 (step S1). The annotation processing unit 11b generates a composite image by combining the captured images acquired in step S1 as appropriate, based on, for example, the user's setting of whether composition is required (step S2).

The display processing unit 11c of the processing unit 11 displays, on the display unit 34, a plurality of images including the captured images acquired in step S1 and the composite image generated in step S2 (step S3). The annotation processing unit 11b accepts the selection of one or more images to be used for the annotation work from among the images displayed on the display unit 34 (step S4). The display processing unit 11c displays the one or more images selected in step S4 on the display unit 34 (step S5).

The annotation processing unit 11b accepts, based on the user's operation of the operation unit 35, the designation of an annotation target location (position, coordinates, and so on), such as an abnormal part, in the image displayed on the display unit 34 (step S6). The annotation processing unit 11b calculates a candidate region, that is, a candidate for the annotation target region such as an abnormal region, that includes the target location accepted in step S6, and displays the candidate region superimposed on the original image (step S7). The annotation processing unit 11b accepts adjustments to the shape, size, and so on of the displayed candidate region based on the user's operation of the operation unit 35 (step S8). In response to, for example, a confirmation operation by the user, the annotation processing unit 11b confirms the candidate region, modified as necessary, as an annotation target region (step S9).

Next, the annotation processing unit 11b calculates one or more other candidate regions similar to the target region confirmed in step S9 and displays them superimposed on the original image (step S10). The annotation processing unit 11b accepts adjustments to the shape, size, and so on of the displayed other candidate regions based on the user's operation of the operation unit 35 (step S11). In response to, for example, a confirmation operation by the user, the annotation processing unit 11b confirms the other candidate regions, modified as necessary, as annotation target regions (step S12).

The annotation processing unit 11b generates annotation result data including information on the target regions confirmed in steps S9 and S12 (step S13). The annotation processing unit 11b selects, as appropriate, one or more captured images from among the captured images acquired in step S1, generates learning data that associates the selected captured image or images with the annotation result data generated in step S13, stores the learning data in the learning data storage unit 12b (step S14), and ends the process.

FIG. 10 is a flowchart showing another procedure of annotation processing performed by the information processing device 1 according to the present embodiment. This flowchart shows the procedure used when the user repeats annotation work in succession and the second and subsequent annotation tasks are carried out with part of the processing of the flowchart of FIG. 9 omitted. For the second and subsequent annotation tasks, the information processing device 1 may adopt either the procedure shown in FIG. 10 or the procedure shown in FIG. 9, and may decide which procedure to use based on the user's selection.

The annotation processing unit 11b of the processing unit 11 of the information processing device 1 according to the present embodiment acquires one set of captured images from among the captured images accumulated in the storage unit 12 (step S31). The annotation processing unit 11b generates a composite image by combining the captured images acquired in step S31 as appropriate, based on, for example, the user's setting of whether composition is required (step S32).

From among a plurality of images including the captured images acquired in step S31 and the composite image generated in step S32, the annotation processing unit 11b selects the image that satisfies the same conditions as the image the user selected in the previous annotation process, and displays the selected image on the display unit 34 (step S33). The annotation processing unit 11b also calculates one or more candidate regions having the same features as the target region confirmed in the previous annotation process, and displays them superimposed on the original image (step S34).

The annotation processing unit 11b accepts adjustments to the shape, size, and so on of the displayed candidate regions based on the user's operation of the operation unit 35 (step S35). In response to, for example, a confirmation operation by the user, the annotation processing unit 11b confirms the one or more candidate regions, modified as necessary, as annotation target regions (step S36).

The annotation processing unit 11b generates annotation result data including information on the target regions confirmed in step S36 (step S37). The annotation processing unit 11b selects, as appropriate, one or more captured images from among the captured images acquired in step S31, generates learning data that associates the selected captured image or images with the annotation result data generated in step S37, stores the learning data in the learning data storage unit 12b (step S38), and ends the process.

<Summary>
In the information processing system according to the present embodiment configured as described above, the information processing device 1 acquires a plurality of captured images of the same object taken by the camera 3 at different wavelengths of light and, based on at least one of the acquired captured images, attaches annotation result data (related information) to at least one of the captured images. The captured image used for the annotation work and the captured image to which the annotation result data is attached may be the same image or different images. Using a plurality of captured images taken at different wavelengths of light for annotation can be expected to make the user's annotation work easier and more accurate than annotating on the basis of a single captured image taken at a single wavelength.

In the information processing system according to the present embodiment, the information processing device 1 accepts from the user the designation of an image region in at least one of the captured images, extracts from the captured image regions similar to the accepted image region as candidate regions, treats the designated image region and the extracted candidate regions as annotation target regions, and attaches information on these target regions to the captured image as related information. The information processing device 1 may also display the image region designated by the user and the similar candidate regions superimposed on the original captured image. This allows the user to easily designate multiple regions in the captured image as annotation target regions.

In the information processing system according to the present embodiment, the information processing device 1 accepts from the user the selection of a target location (position, coordinates, and so on) in the captured image, determines an image region that includes the accepted target location, and displays the image region superimposed on the captured image. The information processing device 1 can determine the image region by, for example, extracting from the captured image the locations whose features are similar to those of the accepted target location. The information processing device 1 accepts from the user, as necessary, modifications to the shape, size, and so on of the displayed image region, and accepts the image region, modified if necessary and otherwise unchanged, as the designation of an annotation target region. The user can thus easily designate a region as an annotation target region simply by selecting a location inside it with an input device such as a mouse or a pen tablet.

In the information processing system according to the present embodiment, the information processing device 1 repeatedly performs annotation processing such as acquiring captured images and attaching related information. After acquiring the captured images for the current annotation, the information processing device 1 extracts one or more candidate regions from the current captured image based on the result of the previous annotation (the result of attaching related information), and displays the candidate regions superimposed on the captured image. The information processing device 1 can, for example, extract from the current captured image, as candidate regions, image regions having the same features as the image region the user selected in the previous annotation. The information processing device 1 accepts from the user, as necessary, modifications to the shape, size, and so on of the displayed candidate regions, treats the candidate regions, modified if necessary and otherwise unchanged, as annotation target regions, and attaches information on them to the captured image as related information. This can be expected to reduce the user's burden in annotation work where the same processing is repeated.

In the information processing system according to the present embodiment, the information processing device 1 also generates a composite image by combining at least two captured images, and adds related information based on the generated composite image. The user can thus perform annotation work based on the composite image generated by the information processing device 1 in addition to the plurality of captured images taken by the camera 3, so the annotation work can be expected to become even easier and more accurate.
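
As a rough illustration of combining two captured images, the sketch below blends two pixel-aligned single-band images with a per-pixel weighted sum. The embodiment does not state how the composite is computed, so the weighted blend, the function name `blend`, and the parameter `weight_a` are assumptions made for this example only.

```python
def blend(image_a, image_b, weight_a=0.5):
    """Per-pixel weighted sum of two images of identical shape.

    Assumption: both images are lists of rows of numeric intensities and
    are already aligned, as they show the same object from the same camera.
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    weight_b = 1.0 - weight_a
    return [
        [weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Hypothetical 2x2 captures of the same object in two wavelength bands.
visible = [[100, 100], [100, 100]]
infrared = [[100, 200], [0, 100]]
composite = blend(visible, infrared, weight_a=0.5)
```

Pixels where the two bands disagree stand out in the composite, which is the kind of cue a user could rely on when designating an annotation target region.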

In the information processing system according to the present embodiment, the plurality of captured images taken by the camera 3 include a visible light image captured based on visible light (for example, an RGB image) and a non-visible light image captured based on non-visible light (for example, an infrared-region image or an ultraviolet-region image). The information processing device 1 performs annotation processing based on the non-visible light image and adds the related information resulting from the annotation to the visible light image to generate learning data. Performing machine learning with such learning data can be expected to produce, with high accuracy, a learning model that handles, for example, captured images taken with visible light.
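
The pairing of a visible light image with an annotation made on a non-visible light image could be represented along the following lines. This is a minimal sketch, assuming the two images are pixel-aligned and that the related information is a binary mask; the types `CaptureSet` and `LearningSample` and the function `build_sample` are hypothetical names, not part of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureSet:
    """Images of the same object taken in different wavelength bands."""
    visible: list       # e.g. rows of (R, G, B) tuples
    non_visible: list   # e.g. rows of infrared intensities


@dataclass(frozen=True)
class LearningSample:
    """Output record: the visible image and its mask only."""
    image: list
    mask: list


def region_to_mask(region, rows, cols):
    """Convert a set of (row, col) pixels into a 0/1 mask."""
    return [[1 if (r, c) in region else 0 for c in range(cols)]
            for r in range(rows)]


def build_sample(capture, region):
    """Attach a region annotated on the non-visible image to the visible image."""
    rows, cols = len(capture.visible), len(capture.visible[0])
    if (len(capture.non_visible), len(capture.non_visible[0])) != (rows, cols):
        raise ValueError("images must be pixel-aligned and the same size")
    # The non-visible image guided the annotation but is not stored.
    return LearningSample(image=capture.visible,
                          mask=region_to_mask(region, rows, cols))


capture = CaptureSet(
    visible=[[(120, 110, 100), (121, 111, 101)],
             [(119, 109, 99), (122, 112, 102)]],
    non_visible=[[5, 200],
                 [6, 210]],
)
# Region the user designated while viewing the non-visible image.
annotated_region = {(0, 1), (1, 1)}
sample = build_sample(capture, annotated_region)
```

Keeping the non-visible image out of `LearningSample` reflects the idea that the resulting learning data is meant for a learning model that receives only visible light images.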

In the present embodiment, the camera 3 and the rotary filter 7 are separate devices, as shown for example in FIG. 1, but this is not a limitation; the camera 3 may incorporate the rotary filter 7 or an equivalent mechanism. Also, in the information processing system according to the present embodiment, the information processing device 1 acquires the images captured by the camera 3 and the user performs annotation work using the information processing device 1, but this is not a limitation; the camera 3 may be provided with a display unit, an operation unit, and the like so that the user performs the annotation work using the camera 3. Alternatively, the camera 3 may transmit captured images to a remote server device or the like by wireless communication or the like, and the information processing device 1 may communicate with the server device to acquire the captured images. The images shown in FIGS. 4 to 7 are examples; the information processing system according to the present embodiment may handle any captured images and may perform any annotation processing.

In the present embodiment, annotation work that generates learning data for machine learning of a learning model that performs so-called segmentation has been described as an example, but this is not a limitation. The present technology is also applicable to annotation work that generates learning data used for machine learning of, for example, a learning model that classifies what appears in a captured image into a plurality of classes, or a learning model that determines whether what appears in a captured image is abnormal; that is, learning data in which a captured image is given a single label indicating a class or the presence or absence of an abnormality. In this case, the information processing device 1, for example, displays on the display unit 34 a list of the plurality of captured images taken by the camera 3 and composite images generated from them as appropriate, accepts input of a label from the user, and generates learning data that associates at least one of the captured images with the input label. Compared with assigning a label while looking only at an RGB image, the user can more easily judge the class of what appears in the captured image, or whether it is abnormal, by assigning the label while looking at a plurality of captured images taken based on different wavelengths of light.

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is indicated by the claims rather than by the foregoing description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

The matters described in the embodiments can be combined with one another. The independent and dependent claims set forth in the claims can be combined with one another in any and all combinations, regardless of the form of citation. Furthermore, although the claims use a format in which a claim cites two or more other claims (multi-claim format), this is not a limitation. The claims may also be written in a format that includes a multi-claim citing at least one other multi-claim (multi-multi-claim).

1 Information processing device
3 Camera
7 Rotary filter
7a to 7f Filters
11 Processing unit
11a Captured image acquisition unit
11b Annotation processing unit
11c Display processing unit
12 Storage unit
12a Program
12b Learning data storage unit
13 Communication unit
14 Display unit
15 Operation unit

In an information processing method according to one embodiment, an information processing device acquires a plurality of captured images of the same object, each captured based on a different wavelength of light, including a first captured image of the object captured based on light in a first wavelength band and a second captured image of the object captured based on light in a second wavelength band different from the first wavelength band; adds related information to the first captured image based on at least the second captured image among the acquired plurality of captured images; and outputs data in which the first captured image and the related information are associated with each other, the data not including the second captured image.

In an information processing method according to one embodiment, an information processing device acquires a plurality of captured images of the same object, each captured based on a different wavelength of light, including a first captured image of the object captured based on light in a first wavelength band and a second captured image of the object captured based on light in a second wavelength band different from the first wavelength band; accepts designation of an image region for at least the second captured image among the acquired plurality of captured images; adds information on the image region, as related information, to the first captured image included in the acquired plurality of captured images; repeatedly performs the acquisition of a plurality of captured images and the addition of related information to the first captured image; extracts one or more candidate regions from at least the second captured image among the acquired plurality of images, based on the result of the previous addition of the related information; displays the extracted candidate regions superimposed on the second captured image; accepts a correction to a displayed candidate region; adds information on the candidate region for which the correction was accepted, as the related information, to the first captured image included in the acquired plurality of captured images; and outputs data in which the first captured image and the related information are associated with each other, the data not including the second captured image.

Claims

The information processing device
Acquiring a plurality of captured images of the same object, each captured based on a different wavelength of light,
Adding related information to at least one of the plurality of captured images based on at least one of the acquired plurality of captured images;
Information processing method.

The information processing device
Accepting designation of an image area for at least one photographed image among the plurality of acquired photographed images,
Extracting a similar region to the accepted image region from the photographed image,
providing information regarding the image area and the similar area as the related information;
The information processing method according to claim 1.

The information processing device
displaying the image region and the similar region superimposed on the photographed image;
The information processing method according to claim 2.

The information processing device
Accepting selection of a target location for at least one of the plurality of acquired images,
determining an image area including the accepted target location;
displaying the determined image area superimposed on the captured image;
Accepting corrections to the displayed image area,
accepting the image area for which modification has been accepted as the designation;
The information processing method according to claim 2 or 3.

The information processing device
Repeatedly acquiring the photographed image and adding the related information,
extracting one or more candidate regions from at least one photographed image among the plurality of acquired images, based on the result of the previous addition of the related information;
displaying the extracted candidate area superimposed on the captured image;
Accepting corrections to the displayed candidate area,
adding information regarding the candidate area for which modification has been accepted as the related information;
The information processing method according to claim 2 or 3.

The information processing device
Generating a composite image by combining at least two of the plurality of captured images,
Adding related information to at least one photographed image among the plurality of photographed images based on the generated composite image;
The information processing method according to claim 1.

The plurality of photographed images include a visible light image photographed based on visible light and a non-visible light image photographed based on non-visible light,
the information processing device adds related information to the visible light image based on the non-visible light image;
The information processing method according to claim 1.

A computer program that causes a computer to execute processing of
acquiring a plurality of captured images of the same object, each captured based on a different wavelength of light, and
adding related information to at least one of the plurality of captured images based on at least one of the acquired plurality of captured images.

an acquisition unit that acquires a plurality of captured images of the same object, each of which is captured based on a different wavelength of light;
An information processing device, comprising: a provision unit that adds related information to at least one of the plurality of captured images, based on at least one of the acquired plurality of captured images.