JP2021064139A

JP2021064139A - Information processing apparatus, information processing method and information processing program

Info

Publication number: JP2021064139A
Application number: JP2019188126A
Authority: JP
Inventors: 裕介三木; Yusuke Miki; 寿英三宅; Toshihide Miyake; 雅弘藤丸; Masahiro Fujimaru; 恒男牧; Tsuneo Maki; 雅史桑野; Masafumi Kuwano
Original assignee: Hitachi Zosen Corp; Tokyo Eco Service Co Ltd
Current assignee: Hitachi Zosen Corp; Tokyo Eco Service Co Ltd
Priority date: 2019-10-11
Filing date: 2019-10-11
Publication date: 2021-04-22
Anticipated expiration: 2039-10-11
Also published as: CN112651281A; JP7385417B2

Abstract

To improve the accuracy of detection using machine-learned models. SOLUTION: An information processing apparatus (1) comprises a first detection unit (101) that inputs input data into a first learned model to detect a detection target, wherein the first learned model is capable of detecting a plurality of types of detection targets; and a second detection unit (102) that inputs the input data into a second learned model to detect a second detection target, wherein the second learned model is capable of detecting at least a part of the plurality of types of detection targets. The information processing apparatus determines the final detection result based on the detection results of both detection units. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus and the like that detect a detection target using a trained model constructed by machine learning.

In recent years, advances in machine learning such as deep learning have improved the accuracy of object recognition and detection in images, and applications of image recognition are expanding. However, because current detection accuracy is not 100%, further improvement is needed to broaden these applications.

For example, the image recognition device described in Patent Document 1 below performs pattern matching on an image of an object using a plurality of types of templates. When the image is judged to match a plurality of types of templates and the degree of overlap between those templates is equal to or greater than a threshold, the device recognizes the object as the recognition target associated with at least one of the templates. This improves recognition accuracy compared with pattern matching using a single type of template.

Japanese Unexamined Patent Publication No. 2008-165394 (published July 17, 2008)

However, the prior art described above leaves room for improving detection accuracy. For example, suppose a pedestrian detection template and a signboard detection template are used. If the pedestrian detection template misses a pedestrian, the signboard detection template will not detect the pedestrian either, so there is no means of compensating for the missed detection. Likewise, if the pedestrian template produces a false detection (for example, a roadside tree is mistaken for a pedestrian), there is no means of compensating for that false detection. The same room for improvement exists when a plurality of templates as in Patent Document 1 are used and the detection target is not an object (for example, when sound data is input to a trained model to detect a predetermined sound component).

An object of one aspect of the present invention is to realize an information processing apparatus and the like capable of improving the accuracy of detection using a machine-learned model.

To solve the above problem, an information processing apparatus according to one aspect of the present invention comprises: a first detection unit that inputs input data into a first trained model, machine-learned so as to be able to detect a plurality of types of first detection targets, and thereby detects the first detection targets; and a second detection unit that either inputs the input data into a second trained model, machine-learned so as to be able to detect a second detection target that is at least a part of the plurality of types of first detection targets, and thereby detects the second detection target, or inputs the input data into a third trained model, machine-learned so as to be able to detect a third detection target different from the first detection targets, and thereby detects the third detection target. The apparatus determines a final detection result based on the detection result of the first detection unit and the detection result of the second detection unit.

To solve the above problem, an information processing method according to one aspect of the present invention is executed by one or more information processing apparatuses and includes: a first detection step of inputting input data into a first trained model, machine-learned so as to be able to detect a plurality of types of detection targets, and detecting the detection targets from the input data; a second detection step of inputting the input data into either a second trained model, machine-learned so as to be able to detect at least a part of the plurality of types of detection targets, or a third trained model, machine-learned so as to be able to detect detection targets different from those of the first trained model, and detecting a detection target from the input data; and a determination step of determining a final detection result based on the detection result of the first detection step and the detection result of the second detection step.

According to one aspect of the present invention, it is possible to improve the accuracy of detection using a machine-learned model.

An example of a functional block diagram of the control unit of the information processing apparatus according to Embodiment 1 of the present invention.
A block diagram showing a configuration example of an unsuitable object detection system including the information processing apparatus.
A diagram showing a garbage truck dropping garbage into a garbage pit at a garbage incineration facility.
A diagram showing the inside of the garbage pit.
A diagram illustrating the flow of processing executed by the information processing apparatus.
A diagram illustrating the construction and retraining of the trained models.
A block diagram showing a configuration example of the control unit of the information processing apparatus according to Embodiment 2 of the present invention.
A diagram illustrating the flow of processing executed by that information processing apparatus.
A block diagram showing a configuration example of the control unit of the information processing apparatus according to Embodiment 3 of the present invention.
A diagram illustrating the flow of processing executed by that information processing apparatus.

[Embodiment 1]
In recent years, the delivery of objects unsuitable for incineration (hereinafter simply called unsuitable objects) to garbage incineration facilities has become a problem. When unsuitable objects are fed into an incinerator, combustion may deteriorate, the ash discharge equipment may become blocked, and in some cases the incinerator may have to be shut down in an emergency. Conventionally, employees of the garbage incineration facility selected collected garbage at random and checked by hand whether it contained unsuitable objects, which placed a heavy burden on the workers.

Further, if the staff who collect garbage are to be alerted in order to reduce the unsuitable objects carried to the garbage incineration facility, a system is needed that detects unsuitable objects in the delivered garbage and presents the detected objects to the collection staff. In this case, it is undesirable to present something that is not actually an unsuitable object as one. It is also undesirable to show the captured images to the staff as they are, because it is difficult to grasp when and where in the images an unsuitable object appears.

The information processing apparatus 1 according to one embodiment of the present invention can solve the problems described above. The information processing apparatus 1 has a function of detecting unsuitable objects in the garbage brought into a garbage incineration facility. Specifically, the information processing apparatus 1 detects unsuitable objects using images of garbage captured while it is being dropped into the garbage pit. The garbage pit will be described later with reference to FIG. 4. Unsuitable objects may also be detected after the garbage has been dropped. An unsuitable object is an object that should not be incinerated in the incinerator of the garbage incineration facility. Specific examples of unsuitable objects will be described later.

[System configuration]
The configuration of the unsuitable object detection system according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a block diagram showing a configuration example of the unsuitable object detection system 100. The unsuitable object detection system 100 includes an information processing apparatus 1, a garbage photographing device 2, a vehicle information collecting device 3, a selection display device 4, and an unsuitable object display device 5.

FIG. 2 also shows an example of the hardware configuration of the information processing apparatus 1. As illustrated, the information processing apparatus 1 includes a control unit 10, a high-speed storage unit 11, a large-capacity storage unit 12, an image IF (interface) unit 13, a vehicle information IF unit 14, a selection display IF unit 15, and an unsuitable object display IF unit 16. The information processing apparatus 1 may be, for example, a personal computer, a server, or a workstation.

The control unit 10 performs overall control of each unit of the information processing apparatus 1. The functions of the units of the control unit 10, described later with reference to FIG. 1, can be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or by software. This software may include an information processing program that causes a computer to function as each unit included in the control unit 10 shown in FIGS. 1, 7, and 9, described later. When realized by software, the control unit 10 may be configured with, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a combination of these. In this case, the software is stored in the large-capacity storage unit 12, and the control unit 10 loads the software into the high-speed storage unit 11 and executes it.

The high-speed storage unit 11 and the large-capacity storage unit 12 are both storage devices that store various data used by the information processing apparatus 1. The high-speed storage unit 11 can write and read data faster than the large-capacity storage unit 12. The large-capacity storage unit 12 has a larger data storage capacity than the high-speed storage unit 11. As the high-speed storage unit 11, a high-speed access memory such as SDRAM (Synchronous Dynamic Random-Access Memory) can be used, for example. As the large-capacity storage unit 12, an HDD (Hard Disk Drive), an SSD (Solid-State Drive), an SD (Secure Digital) card, an eMMC (embedded MultiMediaCard), or the like can be used, for example.

The image IF unit 13 is an interface for communicatively connecting the garbage photographing device 2 and the information processing apparatus 1. The vehicle information IF unit 14 is an interface for communicatively connecting the vehicle information collecting device 3 and the information processing apparatus 1. These IF units may be for wired communication or for wireless communication. For example, USB (Universal Serial Bus), LAN (Local-Area Network), wireless LAN, or the like can be used as these IF units.

The selection display IF unit 15 is an interface for communicatively connecting the selection display device 4 and the information processing apparatus 1. The unsuitable object display IF unit 16 is an interface for communicatively connecting the unsuitable object display device 5 and the information processing apparatus 1. These IF units may also be for wired communication or for wireless communication. For example, HDMI (High-Definition Multimedia Interface, registered trademark), DisplayPort, DVI (Digital Visual Interface), a VGA (Video Graphics Array) terminal, an S terminal, an RCA terminal, or the like can be used as these IF units.

The garbage photographing device 2 photographs garbage while it is being dropped into the garbage pit and transmits the captured images to the information processing apparatus 1. Hereinafter, these captured images are called garbage images. As an example, the garbage photographing device 2 may be a high-speed shutter camera that captures video. The garbage images may be a moving image or a time series of still images captured in succession. The garbage images are input to the information processing apparatus 1 via the image IF unit 13. The input garbage images can be processed directly by the control unit 10, or can be processed by the control unit 10 after being saved in the high-speed storage unit 11 or the large-capacity storage unit 12.

The vehicle information collecting device 3 collects identification information of a vehicle (a so-called garbage truck) that brings in garbage and drops it into the garbage pit, and transmits the identification information to the information processing apparatus 1. The dropping of garbage into the garbage pit by the garbage truck will be described later with reference to FIG. 4. The carry-in vehicle identification unit 105 uses this identification information to identify the party that brought in the garbage. The identification information may be, for example, information indicating a license plate number. In this case, the vehicle information collecting device 3 may photograph the license plate and transmit the captured image to the information processing apparatus 1 as the identification information. Alternatively, the vehicle information collecting device 3 may accept input of the identification information of the garbage truck and transmit it to the information processing apparatus 1.

The selection display device 4 displays images of unsuitable objects detected by the information processing apparatus 1. Because the information processing apparatus 1 may erroneously judge something that is not an unsuitable object to be one, the unsuitable object detection system 100 displays the images of unsuitable objects detected by the information processing apparatus 1 on the selection display device 4 so that a person can visually confirm whether what appears in each image is actually an unsuitable object. The person in charge of visual confirmation then selects, from the images displayed on the selection display device 4, the images in which an unsuitable object appears.

The unsuitable object display device 5 displays, among the images of unsuitable objects detected by the information processing apparatus 1, those selected via the selection display device 4, that is, the images visually confirmed to show an unsuitable object. The unsuitable object display device 5 displays these images to alert the staff, business operator, or other party that brought in the unsuitable object.

[Capturing garbage images]
FIG. 3 is a diagram showing a garbage truck 200 dropping garbage into the garbage pit at a garbage incineration facility. FIG. 4 is a diagram showing the inside of the garbage pit. The garbage pit is a place where garbage collected at the garbage incineration facility is stored temporarily; the garbage in the pit is fed to the incinerator in turn and incinerated. As shown in FIG. 3, the garbage incineration facility is provided with a plurality of doors such as doors 300A and 300B (hereinafter collectively called doors 300 when they need not be distinguished). As shown in FIG. 4, the garbage pit lies beyond the doors 300. That is, opening a door 300 exposes a drop port for dropping garbage into the garbage pit. As shown in FIG. 3, the garbage truck 200 drops garbage into the garbage pit through the drop port.

The garbage photographing device 2 is mounted at a position from which it can photograph the garbage sliding down the slope 600 in FIG. 4. For example, the garbage photographing device 2 may be mounted at the mounting location 400 shown in FIGS. 3 and 4. Because the mounting location 400 is on the surface of each door 300, a garbage photographing device 2 mounted there is positioned above the slope 600 when the door 300 opens, and this position is well suited to photographing the garbage. The garbage photographing device 2 can of course be mounted at any position from which the garbage sliding down the slope 600 can be photographed.

When the vehicle information collecting device 3 is a photographing device, it may also be mounted at the mounting location 400. While the garbage truck 200 is approaching the door 300, the door 300 is closed, so a vehicle information collecting device 3 mounted at the mounting location 400 can photograph the license plate and the like of the garbage truck 200. The vehicle information collecting device 3 can of course be mounted at any position from which the garbage truck 200 can be photographed, and may be mounted at a location different from that of the garbage photographing device 2. The vehicle information collecting device 3 may also be, for example, an information input device; in this case, it may be installed in an operator room to accept input of the identification information of the garbage truck 200 by an operator.

[Device configuration]
The configuration of the information processing apparatus 1 will be described with reference to FIG. 1. FIG. 1 is an example of a functional block diagram of the control unit 10 of the information processing apparatus 1. The control unit 10 shown in FIG. 1 includes a first detection unit 101, a second detection unit 102, a learning unit 103, a selection display control unit 104, a carry-in vehicle identification unit 105, and an unsuitable object display control unit 106. The large-capacity storage unit 12 shown in FIG. 1 includes an input data storage unit 121, a detection result storage unit 122, a trained model storage unit 123, and a teacher data storage unit 124.

The first detection unit 101 inputs input data into a first trained model, machine-learned so as to be able to detect a plurality of types of first detection targets, and thereby detects the first detection targets. Detecting a plurality of types of detection targets means that the first trained model has been trained on a plurality of classifications (also called classes).

The second detection unit 102 inputs the same input data into a second trained model, machine-learned so as to be able to detect a second detection target that is at least a part of the plurality of types of first detection targets, and thereby detects the second detection target. The present embodiment describes an example in which the plurality of types of first detection targets are a plurality of types of unsuitable objects and the second detection target is also an unsuitable object. The first and second detection targets may include objects that resemble unsuitable objects in appearance but are not unsuitable objects.

The first and second trained models are read from the trained model storage unit 123, and the input data is read from the input data storage unit 121. As detailed later, the first and second trained models are constructed by machine learning using the original teacher data 124a stored in the teacher data storage unit 124. The detection results of the first and second detection units are saved in the detection result storage unit 122 and are used as additional teacher data 122a. The additional teacher data 124b is a copy of the additional teacher data 122a placed in the teacher data storage unit 124. The first and second trained models are retrained using the original teacher data 124a and the additional teacher data 124b stored in the teacher data storage unit 124.
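The data flow in the preceding paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `TeacherSample`, `copy_additional_teacher_data`, and `build_retraining_set` are hypothetical, the record format is invented for the example, and the removal of exact duplicates is an added convenience that the source does not describe.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TeacherSample:
    """One labelled example (hypothetical format): an image reference and a class label."""
    image_id: str
    label: str


def copy_additional_teacher_data(detection_results: List[TeacherSample]) -> List[TeacherSample]:
    """Copy saved detection results (122a) into the teacher data store as 124b."""
    return list(detection_results)


def build_retraining_set(original: List[TeacherSample],
                         additional: List[TeacherSample]) -> List[TeacherSample]:
    """Combine original teacher data (124a) with additional teacher data (124b).

    Exact duplicates are dropped while preserving order; this step is an
    assumption of the sketch, not something stated in the source.
    """
    seen = set()
    combined = []
    for sample in original + additional:
        if sample not in seen:
            seen.add(sample)
            combined.append(sample)
    return combined
```

The combined list would then be passed to whatever training routine the learning unit 103 uses; that routine is outside the scope of this sketch.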

The first and second trained models may be any models constructed by machine learning. The present embodiment describes an example in which they are trained neural network models constructed by deep learning. More specifically, these trained models take an image as input data and output object information about the detection targets appearing in that image. The object information may include an identifier indicating the classification of the object and information indicating its position, size, shape, and the like. The object information may also include a probability value indicating the certainty of the detection result. This probability value may be, for example, a number from 0 to 1.

The first detection unit 101 and the second detection unit 102 may perform object detection based on the probability value. In this case, a detection threshold may be set in advance, and an object whose probability value is greater than the threshold may be treated as a detected object. A larger detection threshold yields fewer false detections but more missed objects, while a smaller detection threshold yields fewer missed objects but more false detections, so an appropriate detection threshold should be set according to the required detection accuracy and other requirements. The construction of the trained models will be described later with reference to FIG. 6.
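A minimal sketch of this thresholding step is shown below. The `ObjectInfo` record and the function `apply_detection_threshold` are hypothetical names introduced for illustration; the fields mirror the object information described above (class identifier, position and size, probability value), but the concrete layout is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    """Object information output by a trained model (hypothetical layout)."""
    label: str                               # identifier of the object's class
    box: Tuple[float, float, float, float]   # position and size as (x, y, width, height)
    probability: float                       # certainty of the detection, 0 to 1


def apply_detection_threshold(candidates: List[ObjectInfo],
                              threshold: float) -> List[ObjectInfo]:
    """Keep only candidates whose probability value is greater than the threshold."""
    return [c for c in candidates if c.probability > threshold]
```

Raising `threshold` in this sketch discards more low-confidence candidates, which corresponds to the trade-off described above between false detections and missed objects.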

The learning unit 103 retrains the first and second trained models. The learning unit 103 may also be configured to construct the first and second trained models. The construction and retraining of the trained models will be described later with reference to FIG. 6. Separate learning units 103 may be provided for the first trained model and for the second trained model.

The selection display control unit 104 causes the selection display device 4 to display the detection results determined from the detection results of the first detection unit 101 and the second detection unit 102 (for example, images judged to show an unsuitable object). The person in charge of visual confirmation checks whether an unsuitable object appears in each displayed image and selects the images in which one appears. The selection display control unit 104 then accepts the image selection made by that person. This makes it possible to avoid false detections almost entirely.

The carry-in vehicle identification unit 105 identifies the vehicle that brought in the garbage (for example, the garbage truck 200 in FIG. 3) using the identification information received from the vehicle information collecting device 3. If the information processing apparatus 1 has detected an unsuitable object in garbage previously brought in by the vehicle identified by the carry-in vehicle identification unit 105, the unsuitable object display control unit 106 causes the unsuitable object display device 5 to display the image of that unsuitable object. This makes it possible to present the image of the unsuitable object to the staff who brought in garbage with that vehicle and alert them.
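One way to picture this lookup is a history of confirmed unsuitable-object images keyed by vehicle identifier. The sketch below assumes a simple in-memory store; the class name `UnsuitableObjectHistory`, its methods, and the string identifiers are all hypothetical, and the source does not specify how the history is actually stored.

```python
from collections import defaultdict
from typing import Dict, List


class UnsuitableObjectHistory:
    """In-memory record of confirmed unsuitable-object images per vehicle (hypothetical)."""

    def __init__(self) -> None:
        self._images_by_vehicle: Dict[str, List[str]] = defaultdict(list)

    def record(self, vehicle_id: str, image_id: str) -> None:
        """Remember that an unsuitable object was found in garbage brought in by this vehicle."""
        self._images_by_vehicle[vehicle_id].append(image_id)

    def images_to_display(self, vehicle_id: str) -> List[str]:
        """Return the images to show when this vehicle arrives again (empty if none)."""
        return list(self._images_by_vehicle.get(vehicle_id, []))
```

In this sketch, `vehicle_id` stands for whatever the carry-in vehicle identification unit 105 derives from the identification information, such as a recognized license plate number.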

As described above, the information processing device 1 includes the first detection unit 101, which inputs input data into a first trained model machine-learned so as to be able to detect a plurality of types of first detection targets and thereby detects the first detection targets, and the second detection unit 102, which inputs the same input data into a second trained model machine-learned so as to be able to detect a second detection target that is at least a part of the plurality of types of first detection targets and thereby detects the second detection target. The information processing device 1 then determines the final detection result based on the detection result of the first detection unit 101 and the detection result of the second detection unit 102. Specifically, in the information processing device 1, the first detection unit 101 and the second detection unit 102 output their detection results to a common output destination, and the detection results output to this common output destination are used as the final detection result.

According to the above configuration, the detection targets of the first trained model and the second trained model overlap at least in part, so the possibility of erroneous detection in the overlapping part can be reduced. For example, a board, which is an unsuitable object, and corrugated cardboard, which is not, may look alike; in such a case a board may be erroneously detected as cardboard, or cardboard as a board. According to the above configuration, even if one of the first trained model and the second trained model makes an erroneous detection such as the board and cardboard confusion described above, the correct detection result for the object can still be output in the end as long as the other model detects the object correctly. It is therefore possible to improve the accuracy of detection using machine-learned models.

Further, according to the above configuration, the first detection unit 101 uses the first trained model, which has been machine-learned so as to be able to detect a plurality of types of first detection targets, so the plurality of types of first detection targets can be detected collectively and efficiently.

なお、最終の検出結果を決定する方法は、第１検出部１０１および第２検出部１０２の検出結果の出力先を共通化する方法に限られない。例えば、検出結果を統合するためのブロックを制御部１０に追加して、このブロックによって第１検出部１０１および第２検出部１０２の検出結果を統合し、最終の検出結果としてもよい。統合の方法としては、例えば下記のような方法が挙げられる。 The method of determining the final detection result is not limited to the method of sharing the output destinations of the detection results of the first detection unit 101 and the second detection unit 102. For example, a block for integrating the detection results may be added to the control unit 10, and the detection results of the first detection unit 101 and the second detection unit 102 may be integrated by this block as the final detection result. Examples of the integration method include the following methods.

(1) The detection result of the second detection unit 102 is added to the detection result of the first detection unit 101 to form the final detection result (for example, when the first detection unit 101 detects unsuitable objects A and B in a certain image and the second detection unit 102 detects unsuitable objects B and C in the same image, the final detection result for that image is unsuitable objects A, B, and C).

(2) The intersection of the detection result of the first detection unit 101 and the detection result of the second detection unit 102 is used as the final detection result (for example, when the first detection unit 101 detects unsuitable objects A, B, and C in a certain image and the second detection unit 102 detects unsuitable object B in the same image, the final detection result for that image is unsuitable object B).

(3) The detection result of the second detection unit 102 is added to the detection result of the first detection unit 101, but any part where the two detection results are inconsistent is excluded from the final detection result (for example, for an image showing an object X and an object Y, when the first detection unit 101 detects that object X is unsuitable object A and object Y is unsuitable object B, and the second detection unit 102 detects that object X is unsuitable object A and object Y is unsuitable object C, the final detection result is unsuitable object A).
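The three integration methods above can be sketched as follows. This is only an illustrative sketch, not the implementation described in the patent: the function names are hypothetical, detections are reduced to plain class labels, and for method (3) it is assumed that the two detectors' outputs have already been matched to the same physical objects (identified here by hypothetical keys such as "X" and "Y") and that an object reported by only one detector is kept.

```python
from typing import Dict, Optional, Set


def merge_union(first: Set[str], second: Set[str]) -> Set[str]:
    """Method (1): add the second detector's results to the first's."""
    return first | second


def merge_intersection(first: Set[str], second: Set[str]) -> Set[str]:
    """Method (2): keep only what both detectors reported."""
    return first & second


def merge_consistent(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Method (3): combine per-object labels, dropping conflicting ones.

    Keys are hypothetical object identifiers; values are class labels.
    """
    merged: Dict[str, str] = {}
    for obj in sorted(first.keys() | second.keys()):
        a: Optional[str] = first.get(obj)
        b: Optional[str] = second.get(obj)
        if a is not None and b is not None:
            if a == b:
                merged[obj] = a  # both detectors agree on this object
            # otherwise the labels conflict and the object is excluded
        else:
            # Assumption: an object seen by only one detector is kept.
            merged[obj] = a if a is not None else b
    return merged


# The examples given in (1) to (3) above:
union_result = merge_union({"A", "B"}, {"B", "C"})
common_result = merge_intersection({"A", "B", "C"}, {"B"})
consistent_result = merge_consistent({"X": "A", "Y": "B"}, {"X": "A", "Y": "C"})
```

Method (1) favors avoiding missed detections, while methods (2) and (3) favor avoiding erroneous detections; which trade-off is appropriate depends on how the results are used downstream.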

[Processing flow]
FIG. 5 is a diagram illustrating the flow of processing executed by the information processing device 1 of the present embodiment. The processing executed by the information processing device 1 and its execution order can be defined by, for example, the setting file F1 shown in FIG. 5, and can also be represented by the flowchart in the same figure.

(About the setting file)
The setting file F1 shown in FIG. 5 has a data structure divided into sections. Each section starts with a section name. In the example of FIG. 5, a character string enclosed in "[" and "]" is a section name; specifically, [EX1] and [EX2] are the section names. A section ends where the next section starts or where the setting file ends. Each section defines what is executed at one stage.

セクションの実行は、本例ではセクション順とするが、この例に限られず、例えばセクション名の一部に実行順を定義してもよい。また、セクション毎に異なったアルゴリズムを実行してもよい。例えば、セクション毎に中間層の数が異なったニューラルネットワークを用いて処理を行うアルゴリズムを実行してもよいし、異なった内部処理を行うアルゴリズムを実行してもよい。なお、上述の各定義は、設定ファイルではなく、他の手段（例えば図５のフローチャートを実行する際の引数）で行ってもよい。 In this example, the execution of sections is in the section order, but the execution order is not limited to this example, and the execution order may be defined as part of the section name, for example. Also, different algorithms may be executed for each section. For example, an algorithm that performs processing using a neural network in which the number of intermediate layers is different for each section may be executed, or an algorithm that performs different internal processing may be executed. It should be noted that each of the above definitions may be performed by other means (for example, an argument when executing the flowchart of FIG. 5) instead of the setting file.

設定ファイルＦ１では、以下のように変数である＜ＫＥＹ＞毎に定義を行う。＜ＫＥＹ＞の値が＜ＶＡＬＵＥ＞である。 In the setting file F1, the definition is made for each variable <KEY> as follows. The value of <KEY> is <VALUE>.

<KEY>=<VALUE>
In this example, three keys are defined as <KEY> in each section: "script", "src", and "dst". Of these, "script" indicates the file name of the script to be executed. The script file implements, for example, an algorithm for detecting unsuitable objects, and this algorithm may include processing other than unsuitable object detection.

また、「ｓｒｃ」は、スクリプトの実行に用いる入力データを示す。例えば、画像から物体検出を行うスクリプトの場合、「ｓｒｃ」は、大容量記憶部１２における処理対象の画像の格納場所や、処理対象の画像リストを示すものとしてもよい。そして、「ｄｓｔ」は、スクリプト実行の結果の出力先を示す。例えば、「ｄｓｔ」は、大容量記憶部１２上の場所を示すものであってもよい。なお、実行時に使うその他パラメータ等は、例えば「ｓｃｒｉｐｔ」に定義してもよい。 Further, "src" indicates input data used for executing the script. For example, in the case of a script that detects an object from an image, "src" may indicate a storage location of an image to be processed in the large-capacity storage unit 12 or a list of images to be processed. Then, "dst" indicates the output destination of the result of script execution. For example, "dst" may indicate a location on the large-capacity storage unit 12. Other parameters and the like used at the time of execution may be defined in, for example, "script".

The control unit 10 reads the setting file F1 and executes the script file named "ex1" defined in [EX1] of the setting file F1, whereby the control unit 10 functions as the first detection unit 101. The first detection unit 101 refers to "ex_src" to identify the images to be processed, performs object detection on the identified images, and records the result in "ex_dst". Next, the control unit 10 executes the script file named "ex2" defined in [EX2], whereby the control unit 10 functions as the second detection unit 102. The second detection unit 102 likewise refers to "ex_src" to identify the images to be processed, performs object detection on the identified images, and records the result in "ex_dst".

"ex1" in [EX1] of the setting file F1 is a script that detects objects in the input images using the first trained model described above. Similarly, "ex2" in [EX2] is a script that detects objects in the input images using the second trained model described above.

また、設定ファイルＦ１における［ＥＸ１］と［ＥＸ２］は、「ｓｒｃ」、および「ｄｓｔ」の値が共通している。つまり、第１検出部１０１と第２検出部１０２は、処理対象の画像が共通であり、該画像からの物体検出結果の出力先も共通している。 Further, [EX1] and [EX2] in the setting file F1 have the same values of "src" and "dst". That is, the first detection unit 101 and the second detection unit 102 have a common image to be processed, and also have a common output destination of the object detection result from the image.
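As a rough illustration of the section structure described above, the following sketch reads a setting file of this shape with Python's standard `configparser` module. The file content is reconstructed from the description (sections [EX1] and [EX2], keys script, src, and dst); the exact layout in FIG. 5 is not reproduced in the text, so the spacing and the helper name `load_stages` are assumptions.

```python
import configparser
from typing import List, Tuple

# Reconstructed from the description of setting file F1; layout is an assumption.
SETTINGS_F1 = """
[EX1]
script = ex1
src = ex_src
dst = ex_dst

[EX2]
script = ex2
src = ex_src
dst = ex_dst
"""


def load_stages(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (section, script, src, dst) tuples in the order the sections appear."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [
        (name, parser[name]["script"], parser[name]["src"], parser[name]["dst"])
        for name in parser.sections()
    ]


stages = load_stages(SETTINGS_F1)
# Both stages read the same input and write to the same output destination.
shared_io = len({(src, dst) for _, _, src, dst in stages}) == 1
```

Because `sections()` preserves file order, executing the stages in list order corresponds to the section-order execution used in this example.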

設定ファイルＦ１における［ＥＸ１］として、例えば図５に示すスクリプトＳＣ１１（スクリプトファイル名：ｅｘ１）を適用してもよい。本スクリプトファイルはＬｉｎｕｘ（登録商標）を含むＵＮＩＸ（登録商標）形式のシェルスクリプト例であるが、例えばＤＯＳ（Disk Operating System）のＢａｔｃｈ形式等の他の形式であってもよい。 As [EX1] in the setting file F1, for example, the script SC11 (script file name: ex1) shown in FIG. 5 may be applied. This script file is an example of a UNIX (registered trademark) format shell script including Linux (registered trademark), but may be in another format such as DOS (Disk Operating System) Batch format.

When this script file is used, as shown in FIG. 5, (2) the script is executed based on (1) the setting file items. Here, ex1 is the name of the script file to be executed, as described above, and ex_src and ex_dst are passed to the script as arguments. The ex1 script (SC11) consists of a single line that runs the executable file of the first detection unit 101 (the first detection command); in this example the command takes three arguments. The first argument, $1, receives the ex1 argument ex_src; the second argument, $2, receives the ex1 argument ex_dst; and the third argument is the file name of the trained model (the first trained model). As the first trained model, for example, a trained model that has machine-learned all detection objects is used. The detection objects may be, for example, corrugated cardboard, boards, wood, goza mats, and long objects. Of these, boards, wood, goza mats, and long objects are unsuitable objects. Corrugated cardboard is not an unsuitable object, but it is included among the detection objects because it resembles a board in appearance. Including garbage that looks similar to an unsuitable object among the detection objects in this way reduces the possibility of missed or erroneous detection of unsuitable objects.

Although the executable file in this example takes three arguments, other setting items (such as a detection threshold) may also be passed as arguments if necessary, and the order of the arguments may be changed as needed. SC11 consists of only one line, but commands to be run before or after the processing by the first detection unit 101 (pre-processing and post-processing commands) may be added as needed. For example, a command that organizes the input data or a command that extracts necessary information from the log data after execution may be added.

スクリプトＳＣ１１を用いる場合、例えば、ｓｒｃ＝ｅｘ＿ｓｒｃにて、ごみ撮影装置２で撮影された動画像のファイル（以下、動画ファイルと呼ぶ）あるいは複数の静止画ファイルを指定してもよい。これにより、当該動画ファイルからの検出結果（例えば物体が検出された画像および物体の位置や大きさ情報等）が「ｅｘ＿ｄｓｔ」に保存される。 When the script SC11 is used, for example, a moving image file (hereinafter referred to as a moving image file) taken by the garbage photographing apparatus 2 or a plurality of still image files may be specified by src = ex_src. As a result, the detection result from the moving image file (for example, the image in which the object is detected and the position and size information of the object) is saved in "ex_dst".

The script SC12 may be applied as [EX2] in the setting file F1. SC11 and SC12 differ in the name of the executable file they run and in the trained model they use. [EX2] may target those detection objects of [EX1] for which missed detection is particularly to be avoided. For example, if missed detection of wood is to be avoided, [EX2] may use wood as its detection object. Then, even if [EX1] misses a piece of wood, the wood is not missed overall as long as [EX2] detects it. Alternatively, when [EX1] detects one part of all the detection objects, [EX2] may detect another part of them. In this case as well, at least one detection object is shared between [EX1] and [EX2]. Although SC11 and SC12 use different executable files, the first detection command and the second detection command may be the same if only the trained model differs. That is, the first detection unit 101 and the second detection unit 102 may be the same.

(About the flowchart)
The processing of the flowchart shown in FIG. 5 (an information processing method) will now be described. This flowchart shows the flow of processing according to the setting file F1 shown in the same figure. It is assumed that, before this processing is performed, a moving image file captured by the garbage photographing device 2 has been stored in "ex_src" of the input data storage unit 121. Instead of the moving image file, a plurality of frame images extracted from the moving image file, or a plurality of still image files captured in time series by the garbage photographing device 2, may be stored.

In S11 (first detection step), the first detection unit 101 performs object detection on the images stored in the input data storage unit 121 using the first trained model. Specifically, the first detection unit 101 takes a frame image extracted from the moving image file stored in "ex_src" of the input data storage unit 121 as input data, inputs that frame image into the trained model of the script named "ex1", and obtains object information as output. The first detection unit 101 then determines, based on the object information, whether an object has been detected; if so, it associates the frame image with the object information to form a detection result and records it in "ex_dst" of the detection result storage unit 122. This processing is performed for each frame image extracted from the moving image file stored in "ex_src".

Ｓ１２（第２検出ステップ、確定ステップ）では、第２検出部１０２が、Ｓ１１と同じフレーム画像から第２の学習済みモデルにより物体検出を行う。このように、第１の学習済みモデルと第２の学習済みモデルに入力する入力データを同じデータとすることにより、検出漏れの発生を抑えることができる。これは、一方の学習済みモデルによる検出で不適物の検出漏れが生じたときでも、他方による検出でその不適物が検出できれば、全体として検出漏れが生じることがないからである。 In S12 (second detection step, confirmation step), the second detection unit 102 detects an object from the same frame image as S11 by the second learned model. In this way, by setting the input data to be input to the first trained model and the second trained model to be the same data, it is possible to suppress the occurrence of detection omission. This is because even if the detection by one of the trained models causes an omission of detection of an unsuitable object, if the unsuitable object can be detected by the detection by the other, the detection omission does not occur as a whole.

In S12, specifically, the second detection unit 102 takes a frame image extracted from the moving image file stored in "ex_src" of the input data storage unit 121 as input data, inputs that frame image into the trained model of the script named "ex2", and obtains object information as output. The second detection unit 102 then determines, based on the object information, whether an object has been detected; if so, it associates the frame image with the object information to form a detection result. The second detection unit 102 records this detection result in "ex_dst" of the detection result storage unit 122, which is the same output destination as that of the first detection unit 101. This processing is performed for each frame image extracted from the moving image file stored in "ex_src". The data recorded in "ex_dst" of the detection result storage unit 122 when the processing of S12 ends is the final detection result. In other words, the processing of S12 determines the final detection result.

In S13, the detection result is output, and the processing ends. The detection result may be output by, for example, the selection display control unit 104. In this case, the selection display control unit 104 may cause the selection display device 4 to display the frame images and object information recorded in "ex_dst" by the processing of S11 and S12. This allows the user of the selection display control unit 104 to visually check whether the detection result of the information processing device 1 is correct and to input the result of that check into the information processing device 1. The selection display control unit 104 can then identify, according to the input check result, the images confirmed to show an unsuitable object. The selection display control unit 104 may also correct the object information based on the result of the visual check. An image visually confirmed to show an unsuitable object is displayed on the unsuitable object display device 5 by the unsuitable object display control unit 106, for example, when the party who carried in that unsuitable object brings in garbage again.

なお、Ｓ１１の処理とＳ１２の処理の実行順序は図５の例に限られず、Ｓ１２の処理を行った後でＳ１１の処理を行ってもよいし、これらの処理を並行で行ってもよい。何れにせよ、Ｓ１１とＳ１２の処理の両方が終了した時点で最終の検出結果が確定する。また、図５の例では、２つの学習済みモデルを用いているが、３つ以上の学習済みモデルを用いてもよい。この場合、設定ファイルＦ１に３つ目以降の学習済みモデルに対応するセクションを追加すればよい。 The execution order of the processing of S11 and the processing of S12 is not limited to the example of FIG. 5, and the processing of S11 may be performed after the processing of S12, or these processes may be performed in parallel. In any case, the final detection result is determined when both the processes of S11 and S12 are completed. Further, in the example of FIG. 5, two trained models are used, but three or more trained models may be used. In this case, a section corresponding to the third and subsequent trained models may be added to the configuration file F1.
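The flow of S11 and S12 with a common output destination can be sketched as follows. This is a simplified illustration under stated assumptions: the detectors are stand-in lookup functions rather than trained models, frames are represented by hypothetical names, and `run_stages` is a hypothetical helper. It shows that the final result is simply whatever has accumulated in the shared destination once every stage has run, and that a third stage could be added by extending the list.

```python
from typing import Callable, List, Sequence, Tuple

Detector = Callable[[str], List[str]]      # frame name -> detected labels
Record = Tuple[str, str, List[str]]        # (stage, frame, labels)


def run_stages(
    frames: Sequence[str],
    stages: Sequence[Tuple[str, Detector]],
    dst: List[Record],
) -> List[Record]:
    """Run each detection stage over the same frames, writing to a shared destination."""
    for stage_name, detect in stages:
        for frame in frames:
            labels = detect(frame)
            if labels:  # record only frames in which something was detected
                dst.append((stage_name, frame, labels))
    return dst


# Stand-in detectors: the first misses the wood in frame2, the second finds it.
FIRST = {"frame1": ["board"], "frame2": []}
SECOND = {"frame1": [], "frame2": ["wood"]}

ex_dst = run_stages(
    ["frame1", "frame2"],
    [("EX1", lambda f: FIRST[f]), ("EX2", lambda f: SECOND[f])],
    [],
)
```

In this toy run the wood missed by the first stage still appears in the shared destination because the second stage detected it.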

[About the resolution of input data]
When the input data for the trained models is image data, as in the present embodiment, the first detection unit 101 may lower the resolution of the image data before inputting it into the first trained model. Alternatively, the second detection unit 102 may lower the resolution of the image data before inputting it into the second trained model. Lowering the resolution of the input image data reduces the amount of computation required for object detection with the trained model and shortens the time it takes.

Which trained model receives reduced-resolution input data may be decided according to the detection targets of each trained model. For example, large objects such as wood and boards are easy to detect even in low-resolution image data, whereas high-resolution image data is preferable for detecting small objects such as cans. Therefore, for a trained model whose detection targets are large, a reduced-resolution version of the captured garbage image may be used as input data. This makes it possible to speed up detection of large objects without lowering detection accuracy. A trained model that takes low-resolution image data as input should be constructed using teacher data consisting of image data at the same low resolution as the input data. After object detection has been performed on the low-resolution image data, the image data may be restored to its original resolution for output. The resolution conversion may be performed by the first detection unit 101 or the second detection unit 102, or a separate block for changing the resolution may be added to the control unit 10.
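A minimal sketch of the resolution handling is shown below, assuming a grayscale image held as a list of pixel rows and simple block averaging as the downsampling method; the text does not specify how the resolution is lowered, so both are assumptions. The helper `upscale_box` illustrates one possible way to relate a detection made on the reduced image back to the original resolution, by scaling a hypothetical (x, y, width, height) box.

```python
from typing import List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def downsample(image: Image, factor: int) -> Image:
    """Reduce resolution by averaging each factor x factor block of pixels."""
    height = len(image) // factor
    width = len(image[0]) // factor
    return [
        [
            sum(
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            / (factor * factor)
            for x in range(width)
        ]
        for y in range(height)
    ]


def upscale_box(box: Box, factor: int) -> Box:
    """Map a box found on the reduced image back to original-image coordinates."""
    x, y, w, h = box
    return (x * factor, y * factor, w * factor, h * factor)


small = downsample([[0, 2, 4, 4], [2, 4, 4, 4]], 2)
original_box = upscale_box((1, 2, 3, 4), 2)
```

Halving the resolution in each dimension leaves a quarter of the pixels to process, which is where the reduction in computation comes from.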

また、各学習済みモデルは、検出対象に応じて、中間層の数が異なっていてもよい。例えば、サイズの大きい物体の検出に用いる学習済みモデルの中間層の数は、よりサイズの小さい物体の検出に用いる学習済みモデルの中間層の数よりも少なくしてもよい。このような構成においても、上記の例と同様に、サイズの大きい物体について、検出精度は落とすことなく、検出処理の高速化を図ることができる。 Further, each trained model may have a different number of intermediate layers depending on the detection target. For example, the number of intermediate layers of the trained model used to detect large objects may be less than the number of intermediate layers of the trained model used to detect smaller objects. Even in such a configuration, as in the above example, it is possible to speed up the detection process for a large-sized object without degrading the detection accuracy.

[Construction and retraining of trained models]
The construction and retraining of the first trained model and the second trained model described above will be explained with reference to FIG. 6. FIG. 6 is a diagram illustrating the construction and retraining of a trained model. Here, an example of constructing a trained neural network model is described. When a neural network is used, it may have a plurality of intermediate layers, in which case the machine learning is deep learning. Of course, the number of intermediate layers may be one, and machine learning algorithms other than neural networks can also be applied.

図示のように、学習済みモデルは初期学習により構築される。そして、初期学習により構築された学習済みモデルを用いて物体検出が行われ、その物体検出結果を用いて再学習が行われ、学習済みモデルが更新される。 As shown, the trained model is built by initial training. Then, object detection is performed using the trained model constructed by the initial learning, re-learning is performed using the object detection result, and the trained model is updated.

The initial training uses teacher data in which an image showing a detection object serves as the teacher image and the object information of the detection object shown in that image (for example, an identifier indicating the object's class and information indicating its position, size, shape, and so on) serves as the correct answer data. The teacher data is stored in the teacher data storage unit 124 of FIG. 1, but only the original teacher data 124a is used in the initial training. In machine learning, the learning unit 103 inputs a teacher image into the neural network and updates the weight values so that the output of the neural network approaches the correct answer data, repeating this process while changing the teacher image.

In machine learning, the weight values basically approach their optimum as the number of iterations increases, but factors such as overfitting can cause the weight values to move away from the optimum after further iterations. In addition, when constructing a trained model that detects a plurality of detection objects, a given set of weight values may yield high detection accuracy for one detection object but low detection accuracy for another.

そこで、図６の例では、重み値の異なる複数の学習済みモデル１〜Ｉを生成している。そして、これらの学習済みモデルの中から上述の第１の学習済みモデルとして適用するものと、上述の第２の学習済みモデルとして適用するものとを選定する。例えば、学習済みモデル１〜Ｉのそれぞれに、検出対象物が写っている画像をテストデータとして入力し、その出力値から各検出対象物の検出精度を算出して、算出した検出精度を基準として上記選定を行ってもよい。なお、これらの学習済みモデルは学習済みモデル格納部１２３に保存される。 Therefore, in the example of FIG. 6, a plurality of trained models 1 to I having different weight values are generated. Then, from these trained models, the one to be applied as the above-mentioned first trained model and the one to be applied as the above-mentioned second trained model are selected. For example, an image showing a detection target is input to each of the trained models 1 to I as test data, the detection accuracy of each detection target is calculated from the output value, and the calculated detection accuracy is used as a reference. The above selection may be made. These trained models are stored in the trained model storage unit 123.

In this selection, it is preferable to ensure that there is no detection object for which both the first trained model and the second trained model have low detection accuracy. For example, when the first trained model has low detection accuracy for long objects, it is preferable to choose a model with high detection accuracy for long objects as the second trained model. It is not necessary to select both the first trained model and the second trained model from the trained models 1 to I. For example, when the first trained model is selected from the trained models 1 to I, the second trained model may be selected from a separately constructed set of trained models.
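One possible selection rule consistent with the preference stated above is sketched here. It is an assumption, not the criterion used in the patent: the first model is taken to be the one with the highest mean per-class accuracy, and the second is the one that maximizes the worst per-class accuracy of the pair, so that no detection object is weak in both. The function name, model names, and accuracy figures are hypothetical.

```python
from typing import Dict, Tuple

Accuracy = Dict[str, Dict[str, float]]  # model name -> {class label: accuracy}


def select_models(accuracy: Accuracy) -> Tuple[str, str]:
    """Pick a first model by mean accuracy and a second that covers its weak classes."""
    first = max(
        accuracy,
        key=lambda m: sum(accuracy[m].values()) / len(accuracy[m]),
    )

    def worst_covered(candidate: str) -> float:
        # For each class take the better of the two models, then the worst class.
        return min(
            max(accuracy[first][label], accuracy[candidate][label])
            for label in accuracy[first]
        )

    second = max((m for m in accuracy if m != first), key=worst_covered)
    return first, second


# Hypothetical test-set accuracies for three candidate models.
scores = {
    "model_1": {"board": 0.9, "wood": 0.9, "long": 0.4},
    "model_2": {"board": 0.5, "wood": 0.6, "long": 0.9},
    "model_3": {"board": 0.8, "wood": 0.8, "long": 0.5},
}
chosen = select_models(scores)
```

Here model_1 is weak on long objects, so model_2 is preferred as its partner even though model_3 has the higher average accuracy.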

The trained models may be selected manually, or the selection may be performed by the information processing device 1. In the latter case, selection criteria for the trained models are set in advance, and the information needed to determine whether the criteria are satisfied (for example, information indicating the detection accuracy of each trained model for each detection object) is either input into the information processing device 1 or calculated by it.

By performing object detection on the detection target images using the first trained model and the second trained model selected in this way, the images in which objects were detected and their object information are recorded in the detection result storage unit 122, as described with reference to FIG. 5. The learning unit 103 performs retraining using teacher data added from these results, with each such image as a teacher image and its object information as correct answer data, together with the original teacher data 124a used for the initial training. The images and object information used as teacher data are preferably those confirmed to be correct by visual inspection via the selection display device 4. The teacher data selected here is treated as additional teacher data 122a and may be copied to the teacher data storage unit 124 as additional teacher data 124b for retraining. Although the additional teacher data 124b is described here as a copy of the additional teacher data 122a, 122a may instead be used directly as 124b without copying.

なお、検出結果格納部１２２には画像および物体情報が記録されるが、入力データが静止画の場合は画像を保存する必要はなく、入力データの画像ファイル名のみを記録してもよい。 The image and object information are recorded in the detection result storage unit 122, but when the input data is a still image, it is not necessary to save the image, and only the image file name of the input data may be recorded.

再学習では、学習部１０３は、教師データ格納部１２４に格納されている元教師データ１２４ａと追加教師データ１２４ｂとを用いて学習を行い、重み値の異なる複数の再学習済みモデル１〜Ｊを構築する。そして、その中から第１の学習済みモデルと第２の学習済みモデルが選定される。初期学習との相違点は、機械学習に使用する教師データに追加教師データ１２４ｂが追加されている点である。教師データが追加されることにより、学習済みモデルの検出精度の向上が期待できる。なお、再学習では、学習部１０３は、物体検出の結果として記録されている物体情報の全てを使用する必要はなく、該物体情報の一部を選定して使用してもよい。また、学習部１０３は、再学習を複数回繰り返してもよい。また、実施形態２以降の各学習済みモデルも上記と同様にして構築することができ、また再学習することができる。 In the re-learning, the learning unit 103 performs learning using the original teacher data 124a and the additional teacher data 124b stored in the teacher data storage unit 124, and constructs a plurality of re-trained models 1 to J having different weight values. The first trained model and the second trained model are then selected from among them. The difference from the initial learning is that the additional teacher data 124b has been added to the teacher data used for machine learning. Adding teacher data can be expected to improve the detection accuracy of the trained models. In the re-learning, the learning unit 103 does not need to use all of the object information recorded as object detection results, and may select and use only a part of it. The learning unit 103 may also repeat the re-learning a plurality of times. The trained models of the second and subsequent embodiments can be constructed, and re-trained, in the same manner as described above.

〔実施形態２〕 [Embodiment 2]
本発明の他の実施形態について、以下に説明する。なお、説明の便宜上、上記実施形態にて説明した部材と同じ機能を有する部材については、同じ符号を付記し、その説明を繰り返さない。これは後述の実施形態３も同様である。 Another embodiment of the present invention will be described below. For convenience of explanation, members having the same functions as the members described in the above embodiment are given the same reference numerals, and their description is not repeated. The same applies to the third embodiment described later.

〔制御部の構成例〕 [Configuration example of control unit]
本実施形態の情報処理装置１の制御部１０の構成例を図７に基づいて説明する。図７は、実施形態２に係る情報処理装置１が備える制御部１０の構成例を示すブロック図である。また、図７では、大容量記憶部１２についても併せて図示している。 A configuration example of the control unit 10 of the information processing device 1 of the present embodiment will be described with reference to FIG. 7. FIG. 7 is a block diagram showing a configuration example of the control unit 10 included in the information processing device 1 according to the second embodiment. FIG. 7 also shows the large-capacity storage unit 12.

図７に示すように、制御部１０には、第１検出部２０１、第２検出部２０２Ａ、および第２検出部２０２Ｂが含まれている。なお、学習部１０３、選択表示制御部１０４、搬入車両特定部１０５、および不適物表示制御部１０６は実施形態１と同様であるから図示を省略している。 As shown in FIG. 7, the control unit 10 includes a first detection unit 201, a second detection unit 202A, and a second detection unit 202B. Since the learning unit 103, the selection display control unit 104, the carry-in vehicle identification unit 105, and the unsuitable object display control unit 106 are the same as those in the first embodiment, they are not shown.

第１検出部２０１は、複数種類の第１の検出対象を検出できるように機械学習された第１の学習済みモデルに入力データを入力して上記第１の検出対象を検出する。第１検出部２０１は、実施形態１の第１検出部１０１と同様の機能を有しているが、第１検出部２０１の検出結果に基づいて第２検出部２０２Ａおよび第２検出部２０２Ｂが使用する入力データが決定される点で、第１検出部１０１と相違している。 The first detection unit 201 inputs input data into the first trained model machine-learned so that a plurality of types of first detection targets can be detected, and detects the first detection target. The first detection unit 201 has the same function as the first detection unit 101 of the first embodiment, but the second detection unit 202A and the second detection unit 202B are based on the detection result of the first detection unit 201. It differs from the first detection unit 101 in that the input data to be used is determined.

第２検出部２０２Ａは、複数種類の第１の検出対象の少なくとも一部である第２の検出対象Ａを検出できるように機械学習された第２の学習済みモデルＡに上記入力データ（第１検出部２０１の検出結果）を入力して上記第２の検出対象Ａを検出する。第２の検出対象Ａについては、第２検出部２０２Ａの検出結果が最終の検出結果となる。第２検出部２０２Ａは、第２の学習済みモデルＡに対する入力データとして、第１検出部２０１に対する入力データのうち、第１検出部２０１が検出対象を検出した入力データを用いる点で、実施形態１の第２検出部１０２と相違している。 The second detection unit 202A inputs the input data (the detection result of the first detection unit 201) into a second trained model A, which has been machine-learned so as to be able to detect a second detection target A that is at least a part of the plurality of types of first detection targets, and thereby detects the second detection target A. For the second detection target A, the detection result of the second detection unit 202A is the final detection result. The second detection unit 202A differs from the second detection unit 102 of the first embodiment in that, as the input data for the second trained model A, it uses only those input data of the first detection unit 201 in which the first detection unit 201 detected a detection target.

第２検出部２０２Ｂは、第２検出部２０２Ａと同様に、第１検出部２０１によって検出対象が検出された入力データを第２の学習済みモデルＢに入力して該入力データ（第１検出部２０１の検出結果）から、複数種類の第１の検出対象の少なくとも一部である第２の検出対象Ｂを検出する。第２の検出対象Ｂについては、第２検出部２０２Ｂの検出結果が最終の検出結果となる。なお、第２の検出対象Ａと第２の検出対象Ｂは、異なる物体である。 Like the second detection unit 202A, the second detection unit 202B inputs the input data in which the first detection unit 201 detected a detection target into a second trained model B and detects, from that input data (the detection result of the first detection unit 201), a second detection target B that is at least a part of the plurality of types of first detection targets. For the second detection target B, the detection result of the second detection unit 202B is the final detection result. The second detection target A and the second detection target B are different objects.

以上のように、情報処理装置１は、複数種類の第１の検出対象を検出できるように機械学習された第１の学習済みモデルに入力データを入力して上記第１の検出対象を検出する第１検出部２０１と、上記複数種類の第１の検出対象の少なくとも一部である第２の検出対象を検出できるように機械学習された第２の学習済みモデルに上記入力データを入力して上記第２の検出対象を検出する第２検出部２０２Ａと、を備えている。そして、情報処理装置１は、第１検出部２０１の検出結果と、第２検出部２０２Ａの検出結果とに基づいて、最終の検出結果を確定する。具体的には、本実施形態の情報処理装置１では、第２検出部２０２Ａは、上記複数の入力データのうち第１検出部２０１によって上記第１の検出対象が検出された入力データを第２の学習済みモデルに入力して第２の検出対象Ａを検出し、第２の検出対象Ａについては、第２検出部２０２Ａの検出結果を最終の検出結果とする。また、第２検出部２０２Ｂは、上記複数の入力データのうち第１検出部２０１によって上記第１の検出対象が検出された入力データを第２の学習済みモデルＢに入力して第２の検出対象Ｂを検出し、第２の検出対象Ｂについては、第２検出部２０２Ｂの検出結果を最終の検出結果とする。 As described above, the information processing device 1 includes the first detection unit 201, which inputs input data into a first trained model machine-learned so as to be able to detect a plurality of types of first detection targets and thereby detects the first detection targets, and the second detection unit 202A, which inputs the input data into a second trained model machine-learned so as to be able to detect a second detection target that is at least a part of the plurality of types of first detection targets and thereby detects the second detection target. The information processing device 1 determines the final detection result based on the detection result of the first detection unit 201 and the detection result of the second detection unit 202A. Specifically, in the information processing device 1 of the present embodiment, the second detection unit 202A inputs, from among the plurality of input data, the input data in which the first detection unit 201 detected a first detection target into the second trained model to detect the second detection target A, and for the second detection target A, the detection result of the second detection unit 202A is taken as the final detection result. Likewise, the second detection unit 202B inputs, from among the plurality of input data, the input data in which the first detection unit 201 detected a first detection target into the second trained model B to detect the second detection target B, and for the second detection target B, the detection result of the second detection unit 202B is taken as the final detection result.

上記の構成によれば、第１の学習済みモデルと第２の学習済みモデルＡの検出対象の少なくとも一部が重複しているため、当該重複部分について誤検出が生じる可能性を低減することができる。同様に、第１の学習済みモデルと第２の学習済みモデルＢの検出対象の少なくとも一部も重複しているため、当該重複部分について誤検出が生じる可能性を低減することができる。よって、機械学習済みモデルを用いた検出の検出精度を高めることが可能になる。また、上記の構成によれば、第１検出部２０１が、複数種類の第１の検出対象を検出できるように機械学習された第１の学習済みモデルを用いるため、複数種類の第１の検出対象を一括して効率的に検出することができる。 According to the above configuration, at least a part of the detection targets of the first trained model and the second trained model A overlap, so the possibility of erroneous detection in the overlapping part can be reduced. Similarly, at least a part of the detection targets of the first trained model and the second trained model B also overlap, so the possibility of erroneous detection in that overlapping part can be reduced. Therefore, the detection accuracy of detection using machine-learned models can be improved. Furthermore, because the first detection unit 201 uses the first trained model, which has been machine-learned so as to be able to detect a plurality of types of first detection targets, the plurality of types of first detection targets can be detected collectively and efficiently.

なお、最終の検出結果を決定する方法は、上述の方法に限られない。例えば、実施形態１で説明したように、検出結果を統合するためのブロックを制御部１０に追加して、このブロックによって第１検出部２０１および第２検出部２０２Ａの検出結果を統合し、最終の検出結果としてもよい。第１検出部２０１および第２検出部２０２Ｂの検出結果の統合についても同様である。また、実施形態１の「入力データの解像度について」で説明した例と同様に、第１の学習済みモデルの入力データか、第２の学習済みモデルＡおよび第２の学習済みモデルＢの何れかまたは両方の入力データとして、解像度を低下させた画像データを用いてもよい。 The method of determining the final detection result is not limited to the one described above. For example, as described in the first embodiment, a block for integrating detection results may be added to the control unit 10, and this block may integrate the detection results of the first detection unit 201 and the second detection unit 202A to produce the final detection result. The same applies to integrating the detection results of the first detection unit 201 and the second detection unit 202B. Also, as in the example described under "Resolution of input data" in the first embodiment, image data with reduced resolution may be used as the input data of the first trained model, or as the input data of either or both of the second trained model A and the second trained model B.

なお、図７の例では、第１検出部２０１の検出対象物の一部を検出するブロックとして第２検出部２０２Ａおよび第２検出部２０２Ｂの２つを記載している。しかし、第１検出部２０１の検出対象物の一部を検出するブロックは１つのみであってもよいし、３つ以上であってもよい。 In the example of FIG. 7, two blocks, the second detection unit 202A and the second detection unit 202B, are shown as blocks that detect a part of the detection objects of the first detection unit 201. However, there may be only one such block, or there may be three or more.

〔処理の流れ〕 [Processing flow]
図８は、本実施形態の情報処理装置１が実行する処理の流れを説明する図である。本実施形態の情報処理装置１が実行する処理とその実行順序は、例えば、図８に示す設定ファイルＦ２により定義することができると共に、同図のフローチャートで表すこともできる。 FIG. 8 is a diagram illustrating the flow of processing executed by the information processing device 1 of the present embodiment. The processing executed by the information processing device 1 of the present embodiment and its execution order can be defined by, for example, the configuration file F2 shown in FIG. 8, and can also be represented by the flowchart in the same figure.

（設定ファイルについて） (About the configuration file)
設定ファイルＦ２では、［ＥＸ＿ａｌｌ］、［ＥＸ＿ｇｏｚａ］、および［ＥＸ＿ｔｒｅｅ］という３つのセクションがこの順序で定義されている。それぞれのスクリプトの詳細は省略するがＳＣ１１およびＳＣ１２と同様である。［ＥＸ＿ａｌｌ］で用いる学習済みモデルはＳＣ１１と同じであり、全ての検出対象物（例えば、段ボール、板、木、ござ、および長尺物）を検出するセクションである。［ＥＸ＿ｇｏｚａ］は、検出対象物のうちござの検出に特化したセクションであり、［ＥＸ＿ｔｒｅｅ］は、検出対象物のうち木の検出に特化したセクションである。 In the configuration file F2, three sections, [EX_all], [EX_goza], and [EX_tree], are defined in this order. Details of the respective scripts are omitted, but they are similar to SC11 and SC12. [EX_all] uses the same trained model as SC11 and is the section that detects all detection objects (for example, cardboard, boards, wood, goza mats, and long objects). [EX_goza] is a section specialized for detecting goza among the detection objects, and [EX_tree] is a section specialized for detecting wood among the detection objects.

［ＥＸ＿ｇｏｚａ］および［ＥＸ＿ｔｒｅｅ］のｓｒｃは、何れも［ＥＸ＿ａｌｌ］のｄｓｔである「ａｌｌ＿ｒｅｓ」である。つまり、［ＥＸ＿ｇｏｚａ］は、［ＥＸ＿ａｌｌ］により少なくとも何れかの検出対象物が検出された画像からござを検出し、［ＥＸ＿ｔｒｅｅ］は、［ＥＸ＿ａｌｌ］により少なくとも何れかの検出対象物が検出された画像から木を検出する。そして、［ＥＸ＿ｇｏｚａ］によるござの検出結果は「ｇｏｚａ＿ｒｅｓ」に出力され、［ＥＸ＿ｔｒｅｅ］による木の検出結果は「ｔｒｅｅ＿ｒｅｓ」に出力される。これらの検出結果が最終の検出結果となる。 The src of both [EX_goza] and [EX_tree] is "all_res", which is the dst of [EX_all]. That is, [EX_goza] detects goza from the images in which [EX_all] detected at least one detection object, and [EX_tree] detects wood from the images in which [EX_all] detected at least one detection object. The goza detection result of [EX_goza] is output to "goza_res", and the wood detection result of [EX_tree] is output to "tree_res". These detection results are the final detection results.
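The src/dst chaining described above can be illustrated with a minimal sketch. The exact syntax of the configuration file F2 is shown only in FIG. 8, so the INI layout and the key name `script` below are assumptions made for illustration; only the section names and the src and dst values come from the description.

```python
import configparser

# Hypothetical rendering of configuration file F2 (layout and the "script" key are assumed).
F2_TEXT = """
[EX_all]
script = all
src = all_src
dst = all_res

[EX_goza]
script = goza
src = all_res
dst = goza_res

[EX_tree]
script = tree
src = all_res
dst = tree_res
"""

def load_stages(text):
    """Return the sections in file order as (name, script, src, dst) tuples."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [
        (name, parser[name]["script"], parser[name]["src"], parser[name]["dst"])
        for name in parser.sections()
    ]

def leaf_outputs(stages):
    """Return the dst locations that no other section reads as its src."""
    consumed = {src for _, _, src, _ in stages}
    return [dst for _, _, _, dst in stages if dst not in consumed]

stages = load_stages(F2_TEXT)
for name, script, src, dst in stages:
    print(f"{name}: run script '{script}' on {src} -> {dst}")
print(leaf_outputs(stages))
```

Because both specialized sections read "all_res", they only ever see frames that [EX_all] has already flagged. Note that "all_res" is not a leaf output in this sketch, yet it still holds the results for detection objects other than goza and wood.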

（フローチャートについて） (About the flowchart)
制御部１０は、［ＥＸ＿ａｌｌ］に定義されているファイル名が「ａｌｌ」であるスクリプトファイルを実行することにより、第１検出部２０１として機能する。また、制御部１０は、上記スクリプトファイルの実行終了後に、［ＥＸ＿ｇｏｚａ］に定義されているファイル名が「ｇｏｚａ」であるスクリプトファイルを実行することにより、第２検出部２０２Ａとして機能する。そして、制御部１０は、上記スクリプトファイルの実行終了後に、［ＥＸ＿ｔｒｅｅ］に定義されているファイル名が「ｔｒｅｅ」であるスクリプトファイルを実行することにより、第２検出部２０２Ｂとして機能する。 The control unit 10 functions as the first detection unit 201 by executing the script file named "all" defined in [EX_all]. After that script file finishes executing, the control unit 10 functions as the second detection unit 202A by executing the script file named "goza" defined in [EX_goza]. Then, after that script file finishes executing, the control unit 10 functions as the second detection unit 202B by executing the script file named "tree" defined in [EX_tree].

以下、これらの処理部が実行する処理（情報処理方法）をフローチャートに基づいて説明する。このフローチャートの処理が行われる前に、ごみ撮影装置２で撮影された動画ファイルが入力データ格納部１２１の「ａｌｌ＿ｓｒｃ」に格納されているとする。なお、動画ファイルの代わりに、該動画ファイルから抽出された複数のフレーム画像またはごみ撮影装置２により時系列で撮影された複数の静止画ファイルが格納されていてもよい。 Hereinafter, the processing (information processing method) executed by these processing units will be described based on the flowchart. It is assumed that the moving image file photographed by the garbage photographing apparatus 2 is stored in "all_src" of the input data storage unit 121 before the processing of this flowchart is performed. Instead of the moving image file, a plurality of frame images extracted from the moving image file or a plurality of still image files taken in time series by the garbage photographing apparatus 2 may be stored.

Ｓ２１（第１検出ステップ）では、第１検出部２０１が、第１の学習済みモデルを用いて、入力データ格納部１２１に格納されている処理対象画像から全ての検出対象物について物体検出を行う。具体的には、第１検出部２０１は、入力データ格納部１２１の「ａｌｌ＿ｓｒｃ」に格納されている動画ファイルから抽出したフレーム画像を入力データとし、そのフレーム画像をスクリプト名「ａｌｌ」の学習済みモデルに入力して物体情報を出力させる。そして、第１検出部２０１は、フレーム画像から物体が検出された場合には、そのフレーム画像と物体情報とを対応付けて検出結果とし、検出結果格納部１２２の「ａｌｌ＿ｒｅｓ」に記録する。これらの処理は、上記動画ファイルから抽出したフレーム画像のそれぞれについて行われる。 In S21 (first detection step), the first detection unit 201 uses the first trained model to perform object detection for all detection objects on the processing target images stored in the input data storage unit 121. Specifically, the first detection unit 201 takes as input data the frame images extracted from the moving image file stored in "all_src" of the input data storage unit 121, and inputs each frame image into the trained model with the script name "all" to output object information. When an object is detected in a frame image, the first detection unit 201 associates the frame image with the object information as a detection result and records it in "all_res" of the detection result storage unit 122. This processing is performed for each of the frame images extracted from the moving image file.

上述のように、「ａｌｌ＿ｒｅｓ」に記録されたフレーム画像は、［ＥＸ＿ｇｏｚａ］および［ＥＸ＿ｔｒｅｅ］の入力データとなり、ござと木については再度の検出が試みられる。このため、第１検出部２０１は、ござと木の誤検出が増えても、ござと木の見逃し、すなわちフレーム画像に写るござや木が検出できないことは避けることが好ましい。よって、第１の学習済みモデルの出力値に基づく物体検出において、該出力値に含まれる確率値と比較する検出閾値は低めに設定してもよい。 As described above, the frame images recorded in "all_res" become the input data of [EX_goza] and [EX_tree], and detection of goza and wood is attempted again. For this reason, it is preferable that the first detection unit 201 avoid missing goza and wood, that is, failing to detect goza or wood that appears in a frame image, even at the cost of more false detections of goza and wood. Therefore, in object detection based on the output values of the first trained model, the detection threshold that is compared with the probability value included in the output values may be set relatively low.

Ｓ２２（第２検出ステップ、確定ステップ）では、第２検出部２０２Ａが、Ｓ２１で物体が検出された画像から第２の学習済みモデルＡにより、本例における第２の検出対象Ａであるござの検出を行う。具体的には、第２検出部２０２Ａは、検出結果格納部１２２の「ａｌｌ＿ｒｅｓ」に記録されているフレーム画像を入力データとし、そのフレーム画像をスクリプト名「ｇｏｚａ」の学習済みモデルに入力して物体情報を出力させる。そして、第２検出部２０２Ａは、ござが検出されたと判定した場合には、そのフレーム画像と物体情報とを対応付けて検出結果とし、検出結果格納部１２２の「ｇｏｚａ＿ｒｅｓ」に記録する。これらの処理は、「ａｌｌ＿ｒｅｓ」に格納されているフレーム画像のそれぞれについて行われる。Ｓ２２の処理の終了時点で検出結果格納部１２２の「ｇｏｚａ＿ｒｅｓ」に記録されているデータがござについての最終の検出結果である。つまり、Ｓ２２の処理により、ござの最終の検出結果が確定する。 In S22 (second detection step, determination step), the second detection unit 202A uses the second trained model A to detect goza, which is the second detection target A in this example, from the images in which an object was detected in S21. Specifically, the second detection unit 202A takes the frame images recorded in "all_res" of the detection result storage unit 122 as input data and inputs each frame image into the trained model with the script name "goza" to output object information. When the second detection unit 202A determines that goza has been detected, it associates the frame image with the object information as a detection result and records it in "goza_res" of the detection result storage unit 122. This processing is performed for each of the frame images stored in "all_res". The data recorded in "goza_res" of the detection result storage unit 122 at the end of S22 is the final detection result for goza. That is, the final detection result for goza is determined by the processing of S22.

Ｓ２３では、第２検出部２０２Ｂが、Ｓ２１で物体が検出された画像から第２の学習済みモデルＢにより、本例における第２の検出対象Ｂである木の検出を行う。具体的には、第２検出部２０２Ｂは、検出結果格納部１２２の「ａｌｌ＿ｒｅｓ」に記録されているフレーム画像を入力データとし、そのフレーム画像をスクリプト名「ｔｒｅｅ」の学習済みモデルに入力して物体情報を出力させる。そして、第２検出部２０２Ｂは、木が検出されたと判定した場合には、そのフレーム画像と物体情報とを対応付けて検出結果とし、検出結果格納部１２２の「ｔｒｅｅ＿ｒｅｓ」に記録する。これらの処理は、「ａｌｌ＿ｒｅｓ」に格納されているフレーム画像のそれぞれについて行われる。Ｓ２３の処理の終了時点で検出結果格納部１２２の「ｔｒｅｅ＿ｒｅｓ」に記録されているデータが木についての最終の検出結果である。つまり、Ｓ２３の処理により、木の最終の検出結果が確定する。 In S23, the second detection unit 202B uses the second trained model B to detect wood, which is the second detection target B in this example, from the images in which an object was detected in S21. Specifically, the second detection unit 202B takes the frame images recorded in "all_res" of the detection result storage unit 122 as input data and inputs each frame image into the trained model with the script name "tree" to output object information. When the second detection unit 202B determines that wood has been detected, it associates the frame image with the object information as a detection result and records it in "tree_res" of the detection result storage unit 122. This processing is performed for each of the frame images stored in "all_res". The data recorded in "tree_res" of the detection result storage unit 122 at the end of S23 is the final detection result for wood. That is, the final detection result for wood is determined by the processing of S23.

Ｓ２４では、図５のＳ１３と同様にして検出結果の出力が行われ、これにより処理は終了する。なお、ござの検出結果は検出結果格納部１２２の「ｇｏｚａ＿ｒｅｓ」から読み出せばよく、木の検出結果は検出結果格納部１２２の「ｔｒｅｅ＿ｒｅｓ」から読み出せばよい。また、ござと木以外の検出対象物の検出結果は、検出結果格納部１２２の「ａｌｌ＿ｒｅｓ」から読み出せばよい。 In S24, the detection results are output in the same manner as in S13 of FIG. 5, and the processing ends. The goza detection results may be read from "goza_res" of the detection result storage unit 122, and the wood detection results may be read from "tree_res" of the detection result storage unit 122. The detection results for detection objects other than goza and wood may be read from "all_res" of the detection result storage unit 122.
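The flow of S21 to S24 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: the function `run_cascade`, the dictionary-based stand-in models, and both threshold values are hypothetical, and frames are represented by string identifiers instead of images.

```python
from typing import Callable, Dict, List

Detection = Dict[str, object]               # e.g. {"label": "goza", "score": 0.8}
Detector = Callable[[str], List[Detection]]

def run_cascade(frames: List[str], first_model: Detector,
                specialists: Dict[str, Detector],
                first_threshold: float = 0.3,
                second_threshold: float = 0.5) -> Dict[str, Dict[str, List[Detection]]]:
    """Run the general model on every frame, then each specialist on flagged frames only."""
    results: Dict[str, Dict[str, List[Detection]]] = {"all_res": {}}
    # S21: a deliberately low threshold (assumed value) so that few objects are missed.
    for frame in frames:
        hits = [d for d in first_model(frame) if d["score"] >= first_threshold]
        if hits:
            results["all_res"][frame] = hits
    # S22 and S23: each specialist re-examines only the frames recorded in all_res.
    for label, model in specialists.items():
        store = results[f"{label}_res"] = {}
        for frame in results["all_res"]:
            hits = [d for d in model(frame)
                    if d["label"] == label and d["score"] >= second_threshold]
            if hits:
                store[frame] = hits
    return results

# Stand-in models backed by lookup tables (purely illustrative data).
FIRST = {"f1": [{"label": "goza", "score": 0.4}],
         "f2": [],
         "f3": [{"label": "tree", "score": 0.35}, {"label": "cardboard", "score": 0.9}]}
GOZA = {"f1": [{"label": "goza", "score": 0.9}], "f3": []}
TREE = {"f1": [], "f3": [{"label": "tree", "score": 0.2}]}

results = run_cascade(
    ["f1", "f2", "f3"],
    lambda f: FIRST.get(f, []),
    {"goza": lambda f: GOZA.get(f, []), "tree": lambda f: TREE.get(f, [])},
)
# S24: read each final result from its own store.
print(sorted(results["all_res"]), list(results["goza_res"]), list(results["tree_res"]))
```

In this toy data, frame f3 is flagged by the general model as containing wood with a low score, but the specialized model does not confirm it, so "tree_res" stays empty while the cardboard detection for f3 remains available in "all_res".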

［ＥＸ＿ａｌｌ］の検出結果には、木ではないものが木として検出されたり、ござではないものがござとして検出されたりする誤検出が含まれ得る。しかし、上記の処理によれば［ＥＸ＿ａｌｌ］で何らかの物体が検出されたフレーム画像については、［ＥＸ＿ｔｒｅｅ］と［ＥＸ＿ｇｏｚａ］による物体検出に供されるので、ござと木を高精度に検出することができる。 The detection results of [EX_all] may include false detections in which something that is not wood is detected as wood, or something that is not goza is detected as goza. However, with the above processing, frame images in which [EX_all] detected some object are subjected to object detection by [EX_tree] and [EX_goza], so goza and wood can be detected with high accuracy.

また、上記の構成によれば、［ＥＸ＿ｔｒｅｅ］と［ＥＸ＿ｇｏｚａ］の２つを用いて物体検出する場合と比べて、処理が高速化される場合がある。例えば、１つの動画ファイルから２００枚のフレーム画像を抽出した場合、［ＥＸ＿ｔｒｅｅ］と［ＥＸ＿ｇｏｚａ］の２つを用いれば、［ＥＸ＿ｔｒｅｅ］と［ＥＸ＿ｇｏｚａ］のそれぞれにより２００枚のフレーム画像が処理される。この場合、物体検出処理は、合計で４００回行われる。一方、上記の構成によれば、最初に、２００枚のフレーム画像のそれぞれが［ＥＸ＿ａｌｌ］によって処理される。ここで、３０枚のフレーム画像で物体が検出されたとすると、［ＥＸ＿ｔｒｅｅ］と［ＥＸ＿ｇｏｚａ］のそれぞれで処理されるフレーム画像は３０枚となり、物体検出処理は合計で２６０回行われることになる。よって、物体検出処理の実行回数を大きく削減して、当該処理の所要時間を大きく削減することができる。 Further, according to the above configuration, processing may be faster than when object detection is performed using only the two sections [EX_tree] and [EX_goza]. For example, when 200 frame images are extracted from one moving image file and only [EX_tree] and [EX_goza] are used, each of them processes all 200 frame images. In this case, the object detection process is performed 400 times in total. In contrast, with the above configuration, each of the 200 frame images is first processed by [EX_all]. If objects are detected in 30 of the frame images, [EX_tree] and [EX_goza] each process 30 frame images, so the object detection process is performed 260 times in total (200 + 30 × 2). Therefore, the number of executions of the object detection process, and with it the time required for the processing, can be greatly reduced.
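The count in this example can be reproduced with a short calculation. The function name and parameters below are illustrative, and the sketch assumes that every run of the object detection process costs about the same, which the description does not state.

```python
def detection_runs(total_frames: int, flagged_frames: int,
                   num_specialists: int, cascaded: bool) -> int:
    """Count object detection runs with or without the general first stage."""
    if cascaded:
        # Every frame goes through the general model once; only the flagged
        # frames go through each specialized model.
        return total_frames + flagged_frames * num_specialists
    # Without the first stage, every specialized model sees every frame.
    return total_frames * num_specialists

print(detection_runs(200, 30, 2, cascaded=False))  # 400
print(detection_runs(200, 30, 2, cascaded=True))   # 260
```

With two specialized models and 200 frames, the cascade needs fewer runs only while fewer than 100 frames are flagged by the first stage, which is consistent with the statement that processing "may be" faster.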

〔実施形態３〕 [Embodiment 3]
本実施形態の情報処理装置１の制御部１０の構成例を図９に基づいて説明する。図９は、実施形態３に係る情報処理装置１が備える制御部１０の構成例を示すブロック図である。また、図９では、大容量記憶部１２についても併せて図示している。 A configuration example of the control unit 10 of the information processing device 1 of the present embodiment will be described with reference to FIG. 9. FIG. 9 is a block diagram showing a configuration example of the control unit 10 included in the information processing device 1 according to the third embodiment. FIG. 9 also shows the large-capacity storage unit 12.

図９に示すように、制御部１０には、ごみ画像抽出部３０１、第１検出部３０２、第２検出部３０３、および検出結果統合部３０４が含まれている。なお、学習部１０３、選択表示制御部１０４、搬入車両特定部１０５、および不適物表示制御部１０６は実施形態１と同様であるから図示を省略している。 As shown in FIG. 9, the control unit 10 includes a garbage image extraction unit 301, a first detection unit 302, a second detection unit 303, and a detection result integration unit 304. Since the learning unit 103, the selection display control unit 104, the carry-in vehicle identification unit 105, and the unsuitable object display control unit 106 are the same as those in the first embodiment, they are not shown.

ごみ画像抽出部３０１は、物体検出の対象となる画像（例えば、動画ファイルから抽出した各フレーム画像）から、ごみが写っている画像を抽出する。これにより、第１検出部３０２および第２検出部３０３が検出対象とする画像を、ごみが写っている画像に絞り込むことができるので、物体検出処理の実行回数を削減して、該処理を高速化することが可能になる。例えば、動画ファイルから抽出したフレーム画像のうち、３／４にはごみが写っていなかった場合、第１検出部３０２および第２検出部３０３は、ごみが写っているフレーム画像（全フレーム画像の１／４）を対象として物体検出処理を行えばよい。よって、全てのフレーム画像を対象として物体検出処理を行う場合と比べて、物体検出処理の実行回数を大きく削減して、当該処理の所要時間を大きく削減することができる。 The garbage image extraction unit 301 extracts images showing garbage from the images subject to object detection (for example, the frame images extracted from a moving image file). This narrows the images processed by the first detection unit 302 and the second detection unit 303 down to images in which garbage appears, which reduces the number of executions of the object detection process and speeds it up. For example, if garbage does not appear in 3/4 of the frame images extracted from a moving image file, the first detection unit 302 and the second detection unit 303 need only perform the object detection process on the frame images in which garbage appears (1/4 of all frame images). Therefore, compared with performing the object detection process on all frame images, the number of executions of the object detection process, and with it the time required for the processing, can be greatly reduced.

ごみ画像抽出部３０１は、例えば、スロープ６００上をごみが流れている画像と、流れていない画像とを教師データとして構築した学習済みモデルを用いて上記抽出を行ってもよい。この学習済みモデルは、スロープ６００上のごみの有無を識別できればよく、ごみの分類の判別等は不要である。よって、この学習済みモデルをニューラルネットワークのモデルとした場合、後述する第１検出部３０２や第２検出部３０３が使用する学習済みモデルと比べて、中間層の数を少なくしてもよい。 The garbage image extraction unit 301 may perform the above extraction using, for example, a trained model constructed with images in which garbage is flowing on the slope 600 and images in which it is not as teacher data. This trained model only needs to identify the presence or absence of garbage on the slope 600 and does not need to classify the garbage. Therefore, when this trained model is a neural network model, it may have fewer intermediate layers than the trained models used by the first detection unit 302 and the second detection unit 303 described later.

また、ごみ画像抽出部３０１が使用する学習済みモデルの入力データとする画像は、第１検出部３０２や第２検出部３０３が使用する学習済みモデルの入力データとする画像よりも、低解像度の画像としてもよい。なお、第１検出部３０２や第２検出部３０３には、より高解像度の画像を入力することが好ましい。このため、ごみ画像を低解像度化してごみ画像抽出部３０１の入力データとした場合、ごみ画像抽出部３０１は、低解像度化前の解像度の画像を出力結果として保存することが好ましい。 Further, the images used as input data for the trained model of the garbage image extraction unit 301 may have a lower resolution than the images used as input data for the trained models of the first detection unit 302 and the second detection unit 303. It is preferable to input higher-resolution images to the first detection unit 302 and the second detection unit 303. Therefore, when a garbage image is reduced in resolution before being input to the garbage image extraction unit 301, the garbage image extraction unit 301 preferably saves the image at its original resolution, before the reduction, as the output result.

ごみ画像抽出部３０１は、情報処理装置１の必須の構成要素ではないが、第１検出部３０２および第２検出部３０３による物体検出を効率化するために含めている。ごみ画像抽出部３０１は、実施形態１、２の情報処理装置１にも適用可能である。 The garbage image extraction unit 301 is not an essential component of the information processing device 1, but is included in order to make object detection by the first detection unit 302 and the second detection unit 303 more efficient. The garbage image extraction unit 301 can also be applied to the information processing device 1 of the first and second embodiments.

第１検出部３０２は、複数種類の第１の検出対象を検出できるように機械学習された第１の学習済みモデルに入力データを入力して上記第１の検出対象を検出する。また、第２検出部３０３は、上記第１の検出対象とは異なる第３の検出対象を検出できるように機械学習された第３の学習済みモデルに上記入力データを入力して上記第３の検出対象を検出する。 The first detection unit 302 inputs input data into a first trained model that has been machine-learned so as to be able to detect a plurality of types of first detection targets, and detects the first detection targets. The second detection unit 303 inputs the input data into a third trained model that has been machine-learned so as to be able to detect a third detection target different from the first detection targets, and detects the third detection target.

本実施形態の情報処理装置１においても、上述の各実施形態と同様に、第１検出部３０２の検出結果と、第２検出部３０３の検出結果とに基づいて、最終の検出結果が確定される。具体的には、検出結果統合部３０４が、第１検出部３０２の検出結果と、第２検出部３０３の検出結果とに基づいて最終の検出結果を確定する。 In the information processing device 1 of the present embodiment as well, as in each of the above embodiments, the final detection result is determined based on the detection result of the first detection unit 302 and the detection result of the second detection unit 303. Specifically, the detection result integration unit 304 determines the final detection result based on the detection result of the first detection unit 302 and the detection result of the second detection unit 303.

より詳細には、検出結果統合部３０４は、第１検出部３０２が第１の検出対象として検出した検出対象から、第２検出部３０３が第３の検出対象として検出したものを除いた残りを、第１の検出対象の検出結果とする。言い換えれば、検出結果統合部３０４は、第１の学習済みモデルに基づく検出結果と、第３の学習済みモデルに基づく検出結果とが整合しない場合には、第１の学習済みモデルに基づく検出結果を無効とする。 More specifically, the detection result integration unit 304 takes, as the detection result for the first detection targets, what remains after removing the items that the second detection unit 303 detected as the third detection target from the items that the first detection unit 302 detected as first detection targets. In other words, when the detection result based on the first trained model and the detection result based on the third trained model are inconsistent, the detection result integration unit 304 invalidates the detection result based on the first trained model.

上記の構成によれば、第１の学習済みモデルと第３の学習済みモデルの検出対象は異なっている。このため、ある検出対象について、第１検出部３０２が第１の検出対象として検出したときに、同じ検出対象について、第２検出部３０３が第３の検出対象として検出することがあり得る。このような場合、第１検出部３０２と第２検出部３０３の何れかが誤検出していると判断できる。したがって、第１検出部３０２が第１の検出対象として検出した検出対象から、第２検出部３０３が第３の検出対象として検出したものを除いた残りを、第１の検出対象の検出結果とする上記の構成によれば、誤検出を低減することができる。よって、上記の構成によれば、機械学習済みモデルを用いた検出の検出精度を高めることが可能になる。また、第１検出部３０２は、複数種類の第１の検出対象を検出できるように機械学習された第１の学習済みモデルを用いるため、複数種類の第１の検出対象を一括して効率的に検出することができる。 According to the above configuration, the detection targets of the first trained model and the third trained model are different. Therefore, when the first detection unit 302 detects a certain detection target as the first detection target, the second detection unit 303 may detect the same detection target as the third detection target. In such a case, it can be determined that either the first detection unit 302 or the second detection unit 303 has erroneously detected. Therefore, the remainder of the detection target detected by the first detection unit 302 as the first detection target, excluding the detection target detected by the second detection unit 303 as the third detection target, is the detection result of the first detection target. According to the above configuration, erroneous detection can be reduced. Therefore, according to the above configuration, it is possible to improve the detection accuracy of the detection using the machine-learned model. Further, since the first detection unit 302 uses the first trained model machine-learned so that a plurality of types of first detection targets can be detected, the plurality of types of first detection targets can be efficiently collectively detected. Can be detected.
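The exclusion performed by the detection result integration unit 304 can be sketched as below. The description does not state in this passage how the unit decides that two detections refer to the same object, so matching bounding boxes by intersection over union (IoU) with a threshold of 0.5 is an assumption made only for illustration. The names `iou` and `integrate` and the sample boxes are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def integrate(first_hits: List[Dict], third_hits: List[Dict],
              overlap: float = 0.5) -> List[Dict]:
    """Keep first-model detections that no third-model detection also claims."""
    return [h for h in first_hits
            if all(iou(h["box"], t["box"]) < overlap for t in third_hits)]

# Illustrative data: the gap between two bags is misread as a long object.
first_hits = [{"label": "long object", "box": (0, 0, 10, 2)},
              {"label": "board", "box": (20, 20, 30, 30)}]
third_hits = [{"label": "garbage bag", "box": (0, 0, 10, 3)}]

final = integrate(first_hits, third_hits)
print([h["label"] for h in final])
```

Here the "long object" box overlaps the garbage bag box with an IoU of about 0.67, so it is dropped, and only the board remains in the final result.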

なお、本実施形態においても、実施形態１の「入力データの解像度について」で説明した例と同様にして、第１の学習済みモデルまたは第３の学習済みモデルに入力する画像データとして、解像度を低下させた画像データを用いてもよい。 Also in this embodiment, the resolution is set as the image data to be input to the first trained model or the third trained model in the same manner as in the example described in "About the resolution of the input data" of the first embodiment. The reduced image data may be used.

〔処理の流れ〕 [Processing flow]
図１０は、本実施形態の情報処理装置１が実行する処理の流れを説明する図である。本実施形態の情報処理装置１が実行する処理とその実行順序は、例えば、図１０に示す設定ファイルＦ３により定義することができると共に、同図のフローチャートで表すこともできる。 FIG. 10 is a diagram illustrating the flow of processing executed by the information processing device 1 of the present embodiment. The processing executed by the information processing device 1 of the present embodiment and its execution order can be defined by, for example, the configuration file F3 shown in FIG. 10, and can also be represented by the flowchart in the same figure.

（設定ファイルについて） (About the configuration file)
設定ファイルＦ３では、［ＥＸ＿ｔｒａｓｈ］、［ＥＸ＿ａｌｌ］、［ＥＸ＿ｂａｇ］、および［ＥＸ＿ｆｉｎａｌ］という４つのセクションがこの順序で定義されている。［ＥＸ＿ａｌｌ］は、図５のセクション［ＥＸ１］と同じであり、全ての検出対象物（例えば、段ボール、板、木、ござ、および長尺物）を検出するセクションである。［ＥＸ＿ｔｒａｓｈ］は、ごみが写っている画像と写っていない画像の中からごみが写っている画像を抽出するセクションである。また、［ＥＸ＿ｂａｇ］はごみ袋とスロープ６００（図４参照）を検出対象とするセクションであり、［ＥＸ＿ｆｉｎａｌ］は、［ＥＸ＿ａｌｌ］の結果から［ＥＸ＿ｂａｇ］の結果を除いた結果を出力するセクションである。 In the configuration file F3, four sections, [EX_trash], [EX_all], [EX_bag], and [EX_final], are defined in this order. [EX_all] is the same as section [EX1] of FIG. 5 and detects all detection objects (for example, cardboard, boards, wood, goza mats, and long objects). [EX_trash] is the section that extracts the images showing garbage from among images that show garbage and images that do not. [EX_bag] is the section whose detection targets are garbage bags and the slope 600 (see FIG. 4), and [EX_final] is the section that outputs the result obtained by removing the results of [EX_bag] from the results of [EX_all].

The dst of [EX_trash] is "trash_res", and "trash_res" is the src of [EX_all]. That is, [EX_all] detects the detection objects in the images extracted by [EX_trash].

The dst of [EX_all] is "all_res", and "all_res" is the src of [EX_bag]. That is, [EX_bag] detects garbage bags and the slope 600 in the images in which [EX_all] detected at least one of the detection objects.

The src of [EX_final] is "all_res", and its dst is "final_res". Because the detection results of both [EX_all] and [EX_bag] are recorded in "all_res", [EX_final] outputs the final detection result based on these detection results to "final_res".
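The chaining of src and dst described above can be illustrated with a minimal sketch. The description does not give the concrete syntax of the setting file F3, so the INI layout and the key name "script" below are assumptions made for illustration; the section names and the "trash_res", "all_res", and "final_res" values follow the description, and "all_src" is the input location named in step S31.

```python
import configparser

# Hypothetical INI rendering of setting file F3. Section names and the
# src/dst values follow the description; the INI syntax and the key name
# "script" are assumptions of this sketch.
F3_TEXT = """
[EX_trash]
script = trash
src = all_src
dst = trash_res

[EX_all]
script = all
src = trash_res
dst = all_res

[EX_bag]
script = bag
src = all_res
dst = all_res

[EX_final]
script = final
src = all_res
dst = final_res
"""


def load_sections(text):
    """Return (section, script, src, dst) tuples in definition order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [
        (name, parser[name]["script"], parser[name]["src"], parser[name]["dst"])
        for name in parser.sections()
    ]


sections = load_sections(F3_TEXT)
for name, script, src, dst in sections:
    print(f"{name}: run '{script}' on '{src}' and write to '{dst}'")
```

Reading the sections in definition order is what makes the output of one section available as the input of the next; both [EX_all] and [EX_bag] write to "all_res" so that [EX_final] can read the two sets of results from one place.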

The reason why [EX_all], [EX_bag], and [EX_final] are used in combination in the present embodiment is as follows. Garbage bags filled with garbage come in countless shapes and colors, so the gaps between garbage bags that appear in a garbage image also take many different shapes. For example, when the gap between two garbage bags looks like a long rod, an unsuitable object may be erroneously detected in that gap even though none is actually present. To avoid such erroneous detections it is preferable to train on garbage bags as well, but the number of garbage bags in a garbage image is very large, sometimes several tens in a single image. Creating correct-answer data for every teacher image is therefore very laborious. The same applies to the slope 600: its appearance in an image changes with the state of the garbage on it, so creating correct-answer data for the slope for every teacher image is also very laborious.

For this reason, the present embodiment uses [EX_bag], which is specialized for detecting garbage bags and the slope 600, separately from [EX_all], which detects unsuitable objects and the like. The third trained model used in [EX_bag] can be constructed by machine learning using teacher data for garbage bags and the slope 600 only.

[EX_final] is a section for checking whether [EX_bag] has detected a garbage bag or the slope 600 at a location where [EX_all] detected an unsuitable object. Specifically, [EX_final] outputs the detection result of an unsuitable object to "final_res" when neither a garbage bag nor the slope 600 has been detected at the location where [EX_all] detected that object. This makes it possible to quickly select, from the unsuitable objects detected by [EX_all], the detection results that remain after excluding those that, judging from the detection results of [EX_bag], may be a garbage bag or the slope 600 erroneously detected as an unsuitable object. In other words, erroneous detection of a garbage bag or the slope 600 as an unsuitable object can be reduced efficiently with a short processing time.

(About the flowchart)
The control unit 10 functions as the garbage image extraction unit 301 by executing the script file with the file name "trash" defined in [EX_trash]. After that script file finishes executing, the control unit 10 functions as the first detection unit 302 by executing the script file with the file name "all" defined in [EX_all]. After that script file finishes executing, the control unit 10 functions as the second detection unit 303 by executing the script file with the file name "bag" defined in [EX_bag]. Finally, after that script file finishes executing, the control unit 10 functions as the detection result integration unit 304 by executing the script file with the file name "final" defined in [EX_final].
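The sequential execution described above, in which each script starts only after the previous one has finished and passes its results on through a named location, can be sketched as follows. This is only an illustration under stated assumptions: the stages are placeholders that tag their input rather than real detectors, and the tuple layout, `make_stage`, and `run_pipeline` are hypothetical names that do not come from the publication.

```python
from typing import Callable, Dict, List, Tuple

# (script name, src, dst) in the order the sections are defined.
# The tuple layout is an assumption made for this sketch.
ORDER: List[Tuple[str, str, str]] = [
    ("trash", "all_src", "trash_res"),
    ("all", "trash_res", "all_res"),
    ("bag", "all_res", "all_res"),
    ("final", "all_res", "final_res"),
]

call_log: List[str] = []


def make_stage(name: str) -> Callable[[list], list]:
    """Build a placeholder stage that only tags its inputs."""
    def stage(items: list) -> list:
        call_log.append(name)
        return [f"{name}({item})" for item in items]
    return stage


def run_pipeline(order, stages, store: Dict[str, list]) -> Dict[str, list]:
    """Run each stage only after the previous one has finished."""
    for script, src, dst in order:
        # Copy the input so a stage that writes back to its own src
        # (as "bag" does with "all_res") never mutates what it reads.
        results = stages[script](list(store.get(src, [])))
        store.setdefault(dst, []).extend(results)
    return store


stages = {name: make_stage(name) for name, _, _ in ORDER}
store = run_pipeline(ORDER, stages, {"all_src": ["frame0"]})
print(call_log)
print(store["final_res"])
```

Keeping the stages behind a name-to-callable table mirrors the idea that the control unit takes on a different functional role depending on which script file it is currently executing.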

In S31, the garbage image extraction unit 301 extracts the images that show garbage from the images to be processed. Specifically, the garbage image extraction unit 301 inputs each frame image extracted from the video file stored in "all_src" of the input data storage unit 121 into the trained model of the script name "trash" and has it output object information. The garbage image extraction unit 301 then records each frame image in which it determines, based on the object information, that garbage has been detected in "trash_res" of the detection result storage unit 122. These processes are performed for each of the frame images extracted from the video file.

In S32 (first detection step), the first detection unit 302 performs object detection for all detection objects on the frame images extracted in S31. Specifically, the first detection unit 302 takes the frame images stored in "trash_res" of the detection result storage unit 122 as input data, inputs each frame image into the trained model of the script name "all", and has it output object information. The first detection unit 302 then associates each frame image in which it determines, based on the object information, that an object has been detected with that object information, and records the pair as a detection result in "all_res" of the detection result storage unit 122. These processes are performed for each of the frame images stored in "trash_res".

In S33 (second detection step), the second detection unit 303 uses the third trained model to detect garbage bags and the slope 600 in the images extracted in S31. Specifically, the second detection unit 303 takes the frame images recorded in "trash_res" of the detection result storage unit 122 as input data, inputs each frame image into the trained model of the script name "bag", and has it output object information. The second detection unit 303 then determines, based on the object information, whether an object, that is, a garbage bag or the slope, has been detected. If so, it associates the frame image with the object information and records the pair as a detection result in "all_res" of the detection result storage unit 122. These processes are performed for each of the frame images stored in "trash_res".

Garbage bags and the slope 600 may instead be detected using separate trained models. The detection target of the second detection unit 303 only needs to be an object different from the detection target of the first detection unit 302, and is not limited to garbage bags or the slope 600. However, it is preferably an object whose appearance is similar to that of the detection target of the first detection unit 302. For example, the second detection unit 303 may take as its detection target an object that resembles an unsuitable object in appearance but is not one (for example, cardboard).

In S34 (confirmation step), the detection result integration unit 304 determines the final detection result based on the detection results of S32 and S33. More specifically, the detection result integration unit 304 takes, as the final detection result for the detection objects, what remains after removing the objects that the second detection unit 303 detected as a garbage bag or the slope in S33 from the objects that the first detection unit 302 detected in S32.

Specifically, the detection result integration unit 304 identifies, from the object information of each detection object that the first detection unit 302 stored in "all_res", the range that the detection object occupies in the image. Next, based on the object information that the second detection unit 303 stored in "all_res", the detection result integration unit 304 determines whether a garbage bag or the slope 600 has been detected in that range. If it determines that neither has been detected in that range, it associates the object information of the detection object with the frame image and records the pair as a final detection result in "final_res" of the detection result storage unit 122. If it determines that a garbage bag or the slope 600 has been detected in that range, it does not record the object information and frame image of the detection object. In other words, the detection result for that detection object is treated as an erroneous detection and invalidated.
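The integration in S34 can be sketched as a filter over the detections of the first detection unit. The description does not state how "detected in that range" is decided, so this sketch assumes axis-aligned bounding boxes and treats a first-stage detection as invalid when at least a threshold fraction of its box is covered by a garbage-bag or slope box in the same frame. The `Detection` type, `overlap_ratio`, `integrate`, the labels, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    frame_id: int
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def overlap_ratio(a, b) -> float:
    """Fraction of the area of box a that is covered by box b."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return (inter_w * inter_h) / area_a if area_a > 0 else 0.0


def integrate(first: List[Detection], second: List[Detection],
              threshold: float = 0.5) -> List[Detection]:
    """Keep first-stage detections not covered by a second-stage detection."""
    final = []
    for det in first:
        covered = any(
            other.frame_id == det.frame_id
            and overlap_ratio(det.box, other.box) >= threshold
            for other in second
        )
        if not covered:  # assumed criterion; covered detections are dropped
            final.append(det)
    return final


first = [
    Detection(0, "long_object", (10, 10, 50, 20)),
    Detection(0, "cardboard", (100, 100, 160, 150)),
]
second = [Detection(0, "garbage_bag", (0, 0, 60, 30))]
final = integrate(first, second)
print([d.label for d in final])
```

In this toy input the "long_object" box lies entirely inside the garbage-bag box and is dropped, while the "cardboard" box does not overlap it and is kept. Other overlap criteria, such as intersection over union or center-point containment, would fit the description equally well.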

In S35, the detection result is output in the same manner as in S13 of FIG. 5, and the processing then ends. The detection result to be output can be read from "final_res" of the detection result storage unit 122.

[Modifications]
Artificial intelligence and machine learning algorithms other than machine-learned neural networks (including deep-learned ones) can also be used for object detection, object classification, and the like in each of the above embodiments.

The entity that executes each process described in the above embodiments can be changed as appropriate. For example, at least one of the blocks shown in FIG. 1, FIG. 8, or FIG. 10 may be omitted, and the omitted processing unit may be provided in one or more other apparatuses. In this case, the processing of each of the above embodiments is executed by one or more information processing apparatuses.

Although the above embodiments describe examples of detecting unsuitable objects and the like in garbage images, the detection object is arbitrary and is not limited to unsuitable objects and the like. Furthermore, the input data for the trained models used by the information processing apparatus 1 is not limited to image data and may be, for example, audio data. In this case, the information processing apparatus 1 may be configured to detect, as the detection target, a predetermined sound component contained in the input audio data.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Information processing apparatus
101, 201, 302 First detection unit
102, 202A, 202B, 303 Second detection unit
304 Detection result integration unit

Claims

An information processing apparatus comprising:
a first detection unit that detects a first detection target by inputting input data into a first trained model machine-learned so as to be able to detect a plurality of types of first detection targets; and
a second detection unit that either inputs the input data into a second trained model, machine-learned so as to be able to detect a second detection target that is at least a part of the plurality of types of first detection targets, and detects the second detection target, or inputs the input data into a third trained model, machine-learned so as to be able to detect a third detection target different from the first detection target, and detects the third detection target,
wherein a final detection result is determined based on the detection result of the first detection unit and the detection result of the second detection unit.

The information processing apparatus according to claim 1, wherein
the second detection unit inputs the input data into the second trained model to detect the second detection target,
the first detection unit and the second detection unit output their detection results to a common output destination, and
the detection result of the first detection unit and the detection result of the second detection unit that are output to the common output destination are used as the final detection result.

The information processing apparatus according to claim 1, wherein
the first detection unit inputs a plurality of pieces of input data into the first trained model and detects the first detection target from each piece of input data,
the second detection unit inputs, among the plurality of pieces of input data, the input data in which the first detection unit detected the first detection target into the second trained model to detect the second detection target, and
the detection result of the second detection unit is used as the final detection result.

The information processing apparatus according to claim 1, wherein
the second detection unit inputs the input data into the third trained model to detect the third detection target, and
the information processing apparatus further comprises a detection result integration unit that takes, as the detection result for the first detection target, what remains after removing the objects detected by the second detection unit as the third detection target from the objects detected by the first detection unit as the first detection target.

The information processing apparatus according to any one of claims 1 to 4, wherein
the input data is image data, and
the first detection unit inputs data obtained by reducing the resolution of the image data into the first trained model, or
the second detection unit inputs data obtained by reducing the resolution of the image data into the second trained model or the third trained model.

An information processing method executed by one or more information processing apparatuses, the method comprising:
a first detection step of inputting input data into a first trained model machine-learned so as to be able to detect a plurality of types of detection targets, and detecting the detection targets from the input data;
a second detection step of inputting the input data into a second trained model machine-learned so as to be able to detect at least a part of the plurality of types of detection targets, or into a third trained model machine-learned so as to be able to detect a detection target different from those of the first trained model, and detecting a detection target from the input data; and
a confirmation step of determining a final detection result based on the detection result of the first detection step and the detection result of the second detection step.

An information processing program for causing a computer to function as the information processing apparatus according to claim 1, the program causing the computer to function as the first detection unit and the second detection unit.