CN112651281A - Information processing apparatus, information processing method, and recording medium


Info

Publication number
CN112651281A
Authority
CN
China
Prior art keywords
detection
learned model
unit
information processing
input data
Legal status
Pending
Application number
CN202011063846.1A
Other languages
Chinese (zh)
Inventor
三木裕介
三宅寿英
藤丸雅弘
牧恒男
桑野雅史
Current Assignee
Hitachi Zosen Corp
Tokyo Eco Service Co Ltd
Original Assignee
Hitachi Zosen Corp
Tokyo Eco Service Co Ltd
Application filed by Hitachi Zosen Corp and Tokyo Eco Service Co Ltd
Publication of CN112651281A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Technical problem: to improve the detection accuracy of detection using a machine-learned model. Solution: an information processing device (1) is provided with: a first detection unit (101) that detects detection objects by inputting input data to a first learned model capable of detecting a plurality of types of detection objects; and a second detection unit (102) that detects a detection object by inputting the input data to a second learned model capable of detecting at least a part of the plurality of types of detection objects. The information processing device (1) determines a final detection result based on the detection results of both detection units.

Description

Information processing apparatus, information processing method, and recording medium
Technical Field
The present invention relates to an information processing apparatus and the like that detect a detection object using a learned model constructed by machine learning.
Background
In recent years, with the development of machine learning such as deep learning, the accuracy of recognizing and detecting objects in images has been improving, and the use of image recognition technology is expanding. However, because current detection accuracy is not 100%, further improvement is required in order to expand its applications further.
For example, the image recognition apparatus described in patent document 1 below performs pattern matching on an image of a captured object using a plurality of templates. When the image recognition apparatus determines that a plurality of templates match and that the degree of overlap of those templates is equal to or greater than a threshold value, it recognizes the recognition target as the object corresponding to at least one of the templates. This can improve recognition accuracy compared with pattern matching using a single template.
Documents of the prior art
Patent document
Patent document 1: Japanese Laid-Open Patent Publication No. 2008-165394 (published July 17, 2008)
Disclosure of Invention
Technical problem to be solved
However, the above-described conventional technique leaves room for improvement in detection accuracy. For example, when a template for pedestrian detection and a template for sign detection are used, if a pedestrian is missed with the pedestrian-detection template, the pedestrian will not be detected with the sign-detection template either, so there is no way to correct the missed detection. Likewise, when a false detection occurs with the pedestrian template (for example, when a roadside tree is erroneously recognized as a pedestrian), there is no way to correct the false detection. The same applies when the detection target is not a physical object (for example, when voice data is input to a learned model and a predetermined voice component is detected): even if a plurality of templates are used as in patent document 1, there is similar room for improving detection accuracy.
An object of one embodiment of the present invention is to provide an information processing apparatus and the like capable of improving the detection accuracy of detection using a machine-learned model.
(II) Technical solution
In order to solve the above problem, an information processing apparatus according to an aspect of the present invention includes: a first detection unit that detects first detection objects by inputting input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects; and a second detection unit that detects a second detection object by inputting the input data to a second learned model that has been machine-learned so as to be able to detect a second detection object that is at least a part of the plurality of types of first detection objects, or detects a third detection object by inputting the input data to a third learned model that has been machine-learned so as to be able to detect a third detection object different from the first detection objects. The information processing apparatus determines a final detection result based on the detection result of the first detection unit and the detection result of the second detection unit.
In order to solve the above problems, an information processing method according to an aspect of the present invention is executed by one or more information processing apparatuses and includes: a first detection step of inputting input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of detection objects, and detecting the detection objects from the input data; and a second detection step of inputting the input data to a second learned model that has been machine-learned so as to be able to detect at least a part of the plurality of types of detection objects, or to a third learned model that has been machine-learned so as to be able to detect a detection object different from those of the first learned model, and detecting the detection object from the input data. The information processing method further includes a determination step of determining a final detection result based on the detection result of the first detection step and the detection result of the second detection step.
(III) advantageous effects
According to one embodiment of the present invention, the detection accuracy of detection using a machine-learned model can be improved.
Drawings
Fig. 1 is an example of a functional block diagram of a control unit of an information processing apparatus according to embodiment 1 of the present invention.
Fig. 2 is a block diagram showing a configuration example of an inappropriate object detection system including the information processing apparatus.
Fig. 3 is a view showing a situation where the refuse collection vehicle drops refuse into the refuse pit in the refuse incineration facility.
Fig. 4 is a view showing the inside of a trash pit.
Fig. 5 is a diagram illustrating a flow of processing executed by the information processing apparatus.
Fig. 6 is a diagram illustrating construction and relearning of a learned model.
Fig. 7 is a block diagram showing a configuration example of a control unit provided in an information processing device according to embodiment 2 of the present invention.
Fig. 8 is a diagram illustrating a flow of processing executed by the information processing apparatus.
Fig. 9 is a block diagram showing a configuration example of a control unit provided in an information processing apparatus according to embodiment 3 of the present invention.
Fig. 10 is a diagram illustrating a flow of processing executed by the information processing apparatus.
Description of the reference numerals
1 - information processing apparatus; 101, 201, 302 - first detection unit; 102, 202A, 202B, 303 - second detection unit; 304 - detection result integration unit.
Detailed Description
[ embodiment mode 1]
In recent years, objects that are inappropriate for incineration (hereinafter simply referred to as inappropriate objects) being brought into waste incineration facilities has become a problem. When an inappropriate object is charged into the incinerator, combustion in the incinerator may deteriorate, the ash discharge equipment of the incinerator may become clogged, and in some cases the incinerator may have to be stopped in an emergency. Currently, workers at waste incineration facilities randomly select collected garbage and manually confirm whether it contains inappropriate objects, which places a large burden on the workers.
In addition, in order to reduce the amount of inappropriate objects transported to the waste incineration facility, a system is required that detects inappropriate objects in the carried-in garbage and presents them to the person in charge of collection so as to call attention. In such a case, it is undesirable to present garbage that is not actually an inappropriate object as if it were one. Furthermore, if the captured images are simply viewed directly by the person in charge, it is difficult to know at what time and at what position an inappropriate object was captured, which is also undesirable.
The information processing apparatus 1 according to one embodiment of the present invention can solve the above problems. The information processing device 1 has a function of detecting inappropriate objects in the garbage carried into a garbage incineration facility. Specifically, the information processing apparatus 1 detects inappropriate objects using images of the garbage captured while it is being thrown into the garbage pit. The garbage pit will be described later based on fig. 4. The inappropriate objects may also be detected after the garbage has been deposited. An inappropriate object is an object that should not be incinerated in the incinerator provided in the garbage incineration facility. Specific examples of such inappropriate objects will be given below.
[ System Structure ]
The configuration of the inappropriate object detection system according to the present embodiment will be described with reference to fig. 2. Fig. 2 is a block diagram showing a configuration example of the inappropriate object detection system 100. The inappropriate object detection system 100 includes an information processing device 1, a garbage imaging device 2, a vehicle information collection device 3, a selection display device 4, and an inappropriate object display device 5.
Fig. 2 also shows an example of the hardware configuration of the information processing apparatus 1. As shown in the figure, the information processing apparatus 1 includes a control unit 10, a high-speed storage unit 11, a large-capacity storage unit 12, an image IF (interface) unit 13, a vehicle information IF unit 14, a selection display IF unit 15, and an inappropriate object display IF unit 16. The information processing apparatus 1 may be, for example, a personal computer, a server, or a workstation.
The control unit 10 controls each unit of the information processing apparatus 1 as a whole. The functions of each part of the control unit 10 described below with reference to fig. 1 can be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or can be realized by software. The software may include an information processing program that causes a computer to function as each unit included in the control unit 10 described below with reference to figs. 1, 7, and 9. In the case of a software implementation, the control unit 10 may be constituted by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a combination thereof. In this case, the software is stored in advance in the large-capacity storage unit 12, and the control unit 10 reads the software into the high-speed storage unit 11 and executes it.
The high-speed storage unit 11 and the large-capacity storage unit 12 are storage devices that store various data used by the information processing apparatus 1. The high-speed storage unit 11 can write and read data at a higher speed than the large-capacity storage unit 12, while the large-capacity storage unit 12 has a larger data storage capacity than the high-speed storage unit 11. As the high-speed storage unit 11, for example, a high-speed memory such as an SDRAM (Synchronous Dynamic Random-Access Memory) can be used. As the large-capacity storage unit 12, for example, an HDD (Hard Disk Drive), an SSD (Solid-State Drive), an SD (Secure Digital) card, an eMMC (embedded Multi-Media Controller), or the like can be used.
The image IF unit 13 is an interface for communicatively connecting the garbage imaging device 2 and the information processing device 1. The vehicle information IF unit 14 is an interface for communicatively connecting the vehicle information collection device 3 and the information processing device 1. These IF units may be IF units for wired communication or for wireless communication. For example, USB (Universal Serial Bus), LAN (Local-Area Network), wireless LAN, or the like can be applied as these IF units.
The selection display IF unit 15 is an interface for communicatively connecting the selection display device 4 and the information processing device 1. The inappropriate object display IF unit 16 is an interface for communicatively connecting the inappropriate object display device 5 and the information processing device 1. These IF units may be for wired communication or for wireless communication. For example, HDMI (High-Definition Multimedia Interface, registered trademark), DisplayPort, DVI (Digital Visual Interface), a VGA (Video Graphics Array) terminal, an S terminal, an RCA terminal, or the like can be applied as these IF units.
The garbage imaging device 2 images the garbage while it is being dropped into the garbage pit and transmits the captured images to the information processing device 1. Hereinafter, such a captured image is referred to as a garbage image. For example, the garbage imaging device 2 may be a camera with a high-speed shutter that captures a moving image. The garbage image may be a moving image or time-series still images obtained by continuous shooting. The garbage image is input to the information processing apparatus 1 via the image IF unit 13. The input garbage image may be processed directly by the control unit 10, or may be stored in the high-speed storage unit 11 or the large-capacity storage unit 12 and then processed by the control unit 10.
The vehicle information collection device 3 collects identification information of a vehicle that carries in garbage and drops it into the garbage pit (a so-called garbage collection vehicle), and transmits the identification information to the information processing device 1. The dropping of garbage into the garbage pit by a garbage collection vehicle will be described later based on fig. 4. The identification information is used by the incoming vehicle specifying unit 105 to specify who carried in the garbage. The identification information may be, for example, information indicating the number on a license plate. In this case, the vehicle information collection device 3 may photograph the license plate and transmit the captured image to the information processing device 1 as the identification information. The vehicle information collection device 3 may also receive input of the identification information of the garbage collection vehicle and transmit that information to the information processing device 1.
The selection display device 4 displays images of inappropriate objects detected by the information processing device 1. In the inappropriate object detection system 100, considering the possibility that the information processing device 1 may erroneously determine garbage that is not an inappropriate object to be an inappropriate object, the selection display device 4 displays the images of the inappropriate objects detected by the information processing device 1 so that whether the object appearing in each image is actually an inappropriate object can be confirmed visually. The person in charge of visual confirmation then selects, from among the images displayed on the selection display device 4, the images in which an inappropriate object appears.
The inappropriate object display device 5 displays, from among the images of inappropriate objects detected by the information processing device 1, the images selected via the selection display device 4, that is, the images in which an inappropriate object appears. The inappropriate object display device 5 displays these images in order to call the attention of the person in charge, the operator, or the like who carried in the inappropriate object.
[ garbage image shooting ]
Fig. 3 is a view showing a situation where the refuse collection vehicle 200 drops refuse into a refuse pit in the refuse incineration facility. Fig. 4 is a view showing the inside of a trash pit. The refuse pit is a place where the refuse collected at a refuse incineration facility is temporarily stored, and the refuse in the pit is sent to the incinerator in turn and incinerated. As shown in fig. 3, the waste incineration facility is provided with a plurality of doors such as the doors 300A and 300B (hereinafter collectively referred to as the doors 300 when there is no need to distinguish them). As shown in fig. 4, the trash pit lies beyond the doors 300. That is, when a door 300 is opened, a drop opening for dropping garbage into the garbage pit appears. As shown in fig. 3, the garbage collection vehicle 200 drops garbage into the garbage pit from this drop opening.
The garbage imaging device 2 is installed at a position from which it can photograph the garbage flowing along the slope 600 of fig. 4. For example, the garbage imaging device 2 may be mounted at the mounting position 400 shown in figs. 3 and 4. Since the mounting position 400 is on the surface of each door 300, when the garbage imaging device 2 is mounted there, it is located above the slope 600 while the door 300 is open, which is suitable for photographing the garbage. Of course, the mounting position of the garbage imaging device 2 may be any position from which the garbage flowing along the slope 600 can be photographed.
When the vehicle information collection device 3 is an imaging device, it may also be attached at the mounting position 400. When the garbage collection vehicle 200 approaches the door 300, the door 300 is closed, so the license plate or the like of the garbage collection vehicle 200 can be imaged by the vehicle information collection device 3 attached at the mounting position 400. Of course, the mounting position of the vehicle information collection device 3 may be any position from which the garbage collection vehicle 200 can be imaged, and may differ from the position of the garbage imaging device 2. The vehicle information collection device 3 may also be an information input device; in that case, it may be installed in an operation room and configured to receive input of the identification information of the garbage collection vehicle 200 from an operator.
[ device Structure ]
The configuration of the information processing apparatus 1 will be described with reference to fig. 1. Fig. 1 is an example of a functional block diagram of the control unit 10 of the information processing apparatus 1. The control unit 10 shown in fig. 1 includes a first detection unit 101, a second detection unit 102, a learning unit 103, a selection display control unit 104, an incoming vehicle specifying unit 105, and an inappropriate object display control unit 106. The large-capacity storage unit 12 shown in fig. 1 includes an input data storage unit 121, a detection result storage unit 122, a learned model storage unit 123, and a teaching data storage unit 124.
The first detection unit 101 inputs input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects, and detects the first detection objects. Detecting a plurality of types of detection objects means that a plurality of classes (also referred to as categories) have been learned in the first learned model.
The second detection unit 102 inputs the input data to a second learned model that is machine-learned so as to be able to detect a second detection object that is at least a part of the plurality of types of first detection objects, and detects the second detection object. In the present embodiment, an example will be described in which a plurality of types of first detection targets are a plurality of types of inappropriate objects, and a second detection target is also an inappropriate object. In addition, the first detection object and the second detection object may include an object that is similar in appearance to the inappropriate object but is not an inappropriate object.
The first learned model and the second learned model are read from the learned model storage unit 123, and the input data is read from the input data storage unit 121. As will be described in detail later, the first learned model and the second learned model are constructed by machine learning using the original teaching data 124a stored in the teaching data storage unit 124. The detection results of the first detection unit 101 and the second detection unit 102 are stored in the detection result storage unit 122. These detection results are used as the additional teaching data 122a. The additional teaching data 124b is obtained by copying the additional teaching data 122a to the teaching data storage unit 124. The first learned model and the second learned model are relearned using the original teaching data 124a and the additional teaching data 124b stored in the teaching data storage unit 124.
The first learned model and the second learned model may be any models that are constructed by machine learning. In this embodiment, an example will be described in which the first learned model and the second learned model are learned models of a neural network constructed by deep learning. More specifically, these learned models output object information of a detection target object appearing on an image using the image as input data. The object information may include an identifier indicating the classification of the object, and information indicating the position, size, shape, and the like. The object information may include a probability value indicating the certainty of the detection result. The probability value may be, for example, a value of 0 to 1.
The first detection unit 101 and the second detection unit 102 may detect objects based on the probability value. In this case, a detection threshold may be set in advance, and an object whose probability value exceeds the threshold may be treated as detected. A larger detection threshold gives higher detection accuracy but more missed detections, while a smaller detection threshold increases false detections, so an appropriate detection threshold may be set according to the required detection accuracy and the like. The construction of the learned models will be described later based on fig. 6.
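As a rough illustration of this threshold-based detection, the following Python sketch filters object information by its probability value; the data layout, field names, and threshold value are assumptions made for illustration and are not part of this disclosure.

```python
# Minimal sketch, assuming each detection is a dict output by a learned model,
# e.g. {"class": "board", "bbox": (x, y, w, h), "probability": 0.87}.
DETECTION_THRESHOLD = 0.5  # hypothetical value, chosen per the required accuracy

def filter_by_probability(detections, threshold=DETECTION_THRESHOLD):
    """Treat only objects whose probability value exceeds the threshold as detected."""
    return [d for d in detections if d["probability"] > threshold]
```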
The learning unit 103 relearns the first learned model and the second learned model. The learning unit 103 may also construct the first learned model and the second learned model. The construction and relearning of the learned models will be described later based on fig. 6. A learning unit 103 for the first learned model and a learning unit 103 for the second learned model may also be provided separately.
The selection display control unit 104 causes the selection display device 4 to display the detection results determined based on the detection results of the first detection unit 101 and the second detection unit 102 (for example, images determined to show an inappropriate object). The person in charge of visual confirmation checks whether an inappropriate object appears in each displayed image and selects the images in which an inappropriate object appears. The selection display control unit 104 then receives the image selection made by the person in charge of visual confirmation. In this way, erroneous detection can be avoided substantially reliably.
The incoming vehicle specifying unit 105 specifies the vehicle carrying in garbage (for example, the garbage collection vehicle 200 of fig. 3) using the identification information received from the vehicle information collection device 3. When the information processing device 1 has detected an inappropriate object in garbage that the incoming vehicle specified by the incoming vehicle specifying unit 105 carried in in the past, the inappropriate object display control unit 106 causes the inappropriate object display device 5 to display an image of that inappropriate object. This makes it possible to present the image of the inappropriate object to the person in charge of the vehicle carrying in the garbage and to call attention.
As described above, the information processing apparatus 1 includes: a first detection unit 101 that detects first detection objects by inputting input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects; and a second detection unit 102 that detects a second detection object by inputting the input data to a second learned model that has been machine-learned so as to be able to detect a second detection object that is at least a part of the plurality of types of first detection objects. The information processing apparatus 1 then determines a final detection result based on the detection result of the first detection unit 101 and the detection result of the second detection unit 102. Specifically, in the information processing apparatus 1, the first detection unit 101 and the second detection unit 102 output their detection results to a common output destination, and the detection results output to that common output destination are taken as the final detection result.
According to the above configuration, at least a part of the detection objects of the first learned model and the second learned model overlap, so the possibility of erroneous detection for the overlapping part can be reduced. For example, a board, which is an inappropriate object, and corrugated paper, which is not, are similar in appearance, so a board may be erroneously detected as corrugated paper or corrugated paper may be erroneously detected as a board. With the above configuration, even if such an erroneous detection of a board or corrugated paper occurs in one of the first learned model and the second learned model, the object can still be detected correctly by the other, and an accurate detection result can ultimately be output for that object. Therefore, the detection accuracy of detection using a machine-learned model can be improved.
Further, according to the above configuration, since the first detection unit 101 uses the first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects, it is possible to efficiently detect a plurality of types of first detection objects at once.
The method of determining the final detection result is not limited to the method of sharing the output destination of the detection results of the first detection unit 101 and the second detection unit 102. For example, a module for integrating the detection results may be added to the control unit 10, and the detection results of the first detection unit 101 and the second detection unit 102 may be integrated by that module to obtain the final detection result. The following integration methods can be given as examples (a rough sketch of these methods follows the list).
(1) The detection result of the second detection unit 102 is added to the detection result of the first detection unit 101 to obtain the final detection result (for example, when the first detection unit 101 detects inappropriate objects A and B in a certain image and the second detection unit 102 detects inappropriate objects B and C in the same image, the final detection result for that image is the inappropriate objects A, B, and C).
(2) The part common to the detection result of the first detection unit 101 and the detection result of the second detection unit 102 is taken as the final detection result (for example, when the first detection unit 101 detects inappropriate objects A, B, and C in a certain image and the second detection unit 102 detects inappropriate object B in the same image, the final detection result for that image is the inappropriate object B).
(3) The detection result of the second detection unit 102 is added to the detection result of the first detection unit 101, but any part in which the two detection results do not match is excluded from the final detection result (for example, for an image in which object X and object Y appear, when the first detection unit 101 determines that object X is inappropriate object A and object Y is inappropriate object B, while the second detection unit 102 determines that object X is inappropriate object A and object Y is inappropriate object C, the final detection result is the inappropriate object A).
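A minimal Python sketch of the three integration methods above is shown below; it assumes that each detection result is held either as a set of detected classes or as a mapping from an object identifier to its detected class, and all names are illustrative assumptions rather than part of this disclosure.

```python
def integrate_add(first_labels, second_labels):
    """Method (1): the final result is everything detected by either unit,
    e.g. {A, B} and {B, C} give {A, B, C}."""
    return set(first_labels) | set(second_labels)

def integrate_common(first_labels, second_labels):
    """Method (2): the final result is only what both units detected,
    e.g. {A, B, C} and {B} give {B}."""
    return set(first_labels) & set(second_labels)

def integrate_without_conflicts(first_by_object, second_by_object):
    """Method (3): add both results per object, but exclude objects that the two
    units classified differently (e.g. an object judged B by one unit and C by
    the other is dropped from the final result)."""
    final = {}
    for obj in set(first_by_object) | set(second_by_object):
        a = first_by_object.get(obj)
        b = second_by_object.get(obj)
        if a is not None and b is not None and a != b:
            continue  # classification mismatch between the two detection units
        final[obj] = a if a is not None else b
    return final
```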
[ flow of processing ]
Fig. 5 is a diagram illustrating a flow of processing executed by the information processing apparatus 1 according to the present embodiment. The process executed by the information processing apparatus 1 of the present embodiment and the execution order thereof can be defined by, for example, a setting file F1 shown in fig. 5, and can also be represented by a flowchart of the figure.
(about setting files)
The setting file F1 shown in fig. 5 has a data structure divided into segments. A segment starts with a segment name. In the example of fig. 5, the character strings enclosed by "[" and "]" are segment names; specifically, [EX1] and [EX2] are segment names. A segment ends at the start of the next segment or at the end of the setting file. The content to be executed at each stage is defined in one segment.
In this example, the segments are executed in the order in which they appear, but the present invention is not limited to this, and the execution order may be defined, for example, as part of the segment name. A different algorithm may also be executed for each segment. For example, each segment may execute an algorithm that uses a neural network with a different number of intermediate layers, or an algorithm that performs different internal processing. These definitions may also be made by a method other than a setting file (for example, by arguments given when the flowchart of fig. 5 is executed).
In the setting file F1, each variable <KEY> is defined in the following form, which means that the value of <KEY> is <VALUE>:
<KEY>=<VALUE>
In this example, three keys are defined for each segment: "script", "src", and "dst". Here, "script" indicates the file name of the script to be executed. The script file is, for example, an algorithm for detecting inappropriate objects, and the algorithm may include processing other than the detection of inappropriate objects.
In addition, "src" indicates the input data for executing the script. For example, in the case of a script that detects objects in images, "src" may indicate the storage location in the large-capacity storage unit 12 of the images to be processed, or a list of those images. "dst" indicates the output destination of the script execution result; for example, "dst" may indicate a location in the large-capacity storage unit 12. Other parameters used when executing the script may also be defined, for example, in "script".
The control unit 10 reads the setting file F1 and executes the script file with the file name "ex1" defined in [EX1] of the setting file F1, whereby the control unit 10 functions as the first detection unit 101. The first detection unit 101 then specifies the images to be processed by referring to "ex_src", performs object detection on the specified images, and records the results in "ex_dst". Next, the control unit 10 executes the script file with the file name "ex2" defined in [EX2], whereby the control unit 10 functions as the second detection unit 102. The second detection unit 102 then specifies the images to be processed by referring to "ex_src", performs object detection on the specified images, and records the results in "ex_dst".
The script "ex1" in [EX1] of the setting file F1 performs object detection on the images given as input data using the above-described first learned model. The script "ex2" in [EX2] performs object detection on the images given as input data using the above-described second learned model.
In addition, the values of "src" and "dst" in [EX1] and [EX2] of the setting file F1 are common. That is, the images processed by the first detection unit 101 and the second detection unit 102 are common, and the output destination of the object detection results for those images is also common.
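For reference, a setting file of the form described above might look like the following; the specific file names and values are placeholders assumed for illustration and are not taken from fig. 5.

```
[EX1]
script=ex1
src=ex_src
dst=ex_dst

[EX2]
script=ex2
src=ex_src
dst=ex_dst
```

Because both segments share the same "src" and "dst" values, the two scripts read the same input data and write their detection results to a common output destination.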
As the script of [EX1] in the setting file F1, for example, the script SC11 (script file name: ex1) shown in fig. 5 can be applied. This script file is a shell script for UNIX (registered trademark) systems, including Linux (registered trademark), but another format such as the batch format of DOS (Disk Operating System) may also be used.
When this script file is used, (1) the entry in the setting file and (2) the command line that executes the script are as shown in fig. 5. Here, ex1 is the script file name executed as described above, and ex_src and ex_dst are passed to the script as arguments. The ex1 script (SC11) consists of a single line that executes the execution file (first detection command) of the first detection unit 101. In this example, the command takes three arguments: the argument ex_src of ex1 is passed as the first argument $1, the argument ex_dst of ex1 is passed as the second argument $2, and the third argument is the file name of the learned model (the first learned model). As the first learned model, for example, a learned model machine-learned on all of the detection target objects is used. The detection objects may be, for example, corrugated paper, boards, wood, mats, and long objects. Among these detection objects, boards, wood, mats, and long objects are inappropriate objects. Corrugated paper is not an inappropriate object, but it is similar in appearance to a board and is therefore included among the detection objects. By including garbage whose appearance resembles an inappropriate object among the detection objects in this way, the possibility of erroneous detection or missed detection of inappropriate objects can be reduced.
Although the execution file in this example uses three arguments, other setting items (such as a detection threshold) may be used as arguments, if necessary. In addition, the order of the arguments may be changed as necessary. Although the SC11 has a configuration with only one line, for example, instructions before and after processing (for example, instructions for preprocessing and post-processing) by the first detection unit 101 may be added as necessary. For example, an instruction for sorting input data, an instruction for extracting necessary information from executed log data, and the like may be added.
In the case of using the script SC11, for example, a file of a moving image captured by the garbage imaging device 2 (hereinafter referred to as a moving image file) or a plurality of still image files can be specified by src=ex_src. The detection results for the moving image file (for example, the images in which objects were detected and the position and size information of those objects) are then stored in "ex_dst".
The script SC12 can be applied as [EX2] in the setting file F1. SC11 and SC12 differ in the file name executed and the learned model used. Further, in [EX2], an object for which missed detection is particularly to be avoided can be set as the detection object from among the detection objects of [EX1]. For example, when missed detection of wood is to be avoided, [EX2] can take wood as its detection object. Thus, even if wood is missed in [EX1], it is not missed in the overall result as long as it can be detected in [EX2]. In addition, when [EX1] is set to detect a part of all the detection objects, [EX2] may be set to detect another part of them. In this case, at least a part of the detection objects of [EX1] and [EX2] may be made to overlap in advance. Further, although different execution files are used in SC11 and SC12, the first detection command and the second detection command may be the same when only the learned models differ. That is, the execution files of the first detection unit 101 and the second detection unit 102 may be the same.
(for flow chart)
The processing (information processing method) of the flowchart shown in fig. 5 will be described. This flowchart shows the flow of processing according to the setting file F1 shown in the same figure. Before the processing of the flowchart is performed, a moving image file captured by the garbage imaging device 2 is assumed to be stored in "ex_src" of the input data storage unit 121. Instead of the moving image file, a plurality of frame images extracted from a moving image file, or a plurality of still image files captured in time series by the garbage imaging device 2, may be stored.
In S11 (first detection step), the first detection unit 101 performs object detection using the first learned model on the images stored in the input data storage unit 121. Specifically, the first detection unit 101 takes as input data a frame image extracted from the moving image file in "ex_src" of the input data storage unit 121, inputs it to the learned model used by the script "ex1", and has the model output object information. The first detection unit 101 then determines from the object information whether an object has been detected and, if so, associates the frame image with the object information as a detection result and records it in "ex_dst" of the detection result storage unit 122. This processing is performed for each frame image extracted from the moving image file stored in "ex_src".
In S12 (second detection step, determination step), the second detection unit 102 performs object detection using the second learned model on the same frame images as in S11. Because the input data to the first learned model and the second learned model are the same data, missed detections can be suppressed: even when an object is missed by one learned model, it is not missed in the overall result as long as it can be detected by the other learned model.
In S12, specifically, the second detection unit 102 takes as input data a frame image extracted from the moving image file in "ex_src" of the input data storage unit 121. The second detection unit 102 inputs the frame image to the learned model used by the script "ex2" and has it output object information. The second detection unit 102 then determines from the object information whether an object has been detected and, if so, associates the frame image with the object information as a detection result. The second detection unit 102 records the detection result in "ex_dst" of the detection result storage unit 122, which is the output destination shared with the detection result of the first detection unit 101. This processing is performed for each frame image extracted from the moving image file stored in "ex_src". At the end of the processing of S12, the data recorded in "ex_dst" of the detection result storage unit 122 is the final detection result. That is, the final detection result is determined by the processing of S12.
In S13, the detection result is output, and the processing ends. For example, the selection display control unit 104 may output the detection result. In this case, the selection display control unit 104 may cause the selection display device 4 to display the frame images and object information recorded in "ex_dst" by the processing of S11 and S12. The user can thereby visually confirm whether the detection results of the information processing apparatus 1 are correct and input the confirmation results to the information processing apparatus 1. Based on the input confirmation results, the selection display control unit 104 can specify the images in which the presence of an inappropriate object has been confirmed. The selection display control unit 104 may also correct the object information based on the results of the visual confirmation. An image in which the presence of an inappropriate object has been confirmed by visual inspection is then displayed on the inappropriate object display device 5 by the inappropriate object display control unit 106, for example, when the party who carried in that inappropriate object carries in garbage again.
Further, the execution order of the process of S11 and the process of S12 is not limited to the example of fig. 5, and the process of S11 may be executed after the process of S12 is executed, or these processes may be performed in parallel. In either case, the final detection result is determined at the point of time when both the processing of S11 and S12 are finished. In the example of fig. 5, two learned models are used, but three or more learned models may be used. In this case, a segment corresponding to the third and subsequent learned models may be added to the setting file F1.
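The S11 to S13 flow can be pictured with the following Python sketch, in which both learned models process the same frame images and record their detections at a common output destination corresponding to "ex_dst". The inference function, data layout, and threshold are assumptions for illustration and are not an actual implementation of the scripts ex1 and ex2.

```python
def run_pipeline(frames, first_model, second_model, run_inference, threshold=0.5):
    """Hypothetical sketch of S11-S13; run_inference(model, frame) is assumed to
    return a list of object-information dicts containing a "probability" field."""
    final_result = []                              # common output destination ("ex_dst")
    for model in (first_model, second_model):      # S11 with the first model, S12 with the second
        for frame in frames:
            objects = [o for o in run_inference(model, frame)
                       if o["probability"] > threshold]
            if objects:                            # record the frame image with its object information
                final_result.append({"frame": frame, "objects": objects})
    return final_result                            # output as the final detection result (S13)
```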
[ resolution with respect to input data ]
In the case where the input data to the learned model is image data as in the present embodiment, the first detection unit 101 may reduce the resolution of the image data and input the image data to the first learned model. Alternatively, the second detection section 102 may reduce the resolution of the image data and input the image data to the second learned model. By reducing the resolution of the input image data, the amount of computation of the object detection process using the learned model can be reduced, and the time required for the object detection process can be shortened.
Whether or not to reduce the resolution of the input data for each learned model may be determined in advance according to the detection targets of that learned model. For example, a large object such as wood or a board is easy to detect even in low-resolution image data, whereas high-resolution image data is preferable for detecting a small object such as a can. Therefore, among the learned models used, a model whose detection targets are large in size may receive, as input data, the captured garbage image with its resolution reduced. In this way, the detection processing can be sped up without lowering the detection accuracy for large objects. A learned model that takes low-resolution image data as input may be constructed in advance using teaching data of the same low resolution as the input data. Further, object detection may be performed on the low-resolution image data, and the result may then be converted back to the original resolution before being output. The processing for changing the resolution may be performed by the first detection unit 101 or the second detection unit 102, or a separate module for changing the resolution may be added to the control unit 10.
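A minimal sketch of this resolution reduction, assuming OpenCV is available and the learned model is exposed as a callable that returns bounding boxes, is shown below; the scale factor is an arbitrary example.

```python
import cv2  # assumed to be available

def detect_at_reduced_resolution(model, image, scale=0.5):
    """Run detection on a downscaled copy, then map boxes back to the original resolution."""
    small = cv2.resize(image, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    detections = model(small)  # "model" is a hypothetical callable returning detection dicts
    for d in detections:
        x, y, w, h = d["bbox"]
        d["bbox"] = (x / scale, y / scale, w / scale, h / scale)  # restore the original scale
    return detections
```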
In addition, the number of intermediate layers of each learned model may differ depending on the detection object. For example, the number of intermediate layers of the learned model for detection of large-sized objects may be less than the number of intermediate layers of the learned model for detection of smaller-sized objects. In such a configuration, similarly to the above-described example, the detection process can be speeded up without lowering the detection accuracy for a large-sized object.
[ construction and relearning of learned models ]
The construction and relearning of the first learned model and the second learned model described above will be described based on fig. 6. Fig. 6 is a diagram illustrating construction and relearning of a learned model. Here, an example of constructing a learned model of a neural network will be described. In the case of using a neural network, the number of intermediate layers can be made plural, and machine learning in this case is deep learning. Of course, the number of intermediate layers may be one, and machine learning algorithms other than neural networks can also be applied.
As shown, the learned model is constructed by initial learning. Further, object detection is performed using the learned model constructed by the initial learning, relearning is performed using the object detection result, and the learned model is updated.
In the initial learning, images showing the detection target objects are used as teaching images, and the object information of the detection target objects appearing in the teaching images (for example, an identifier indicating the classification of an object, and information indicating its position, size, shape, and the like) is used as correct data. The teaching data is assumed to be stored in the teaching data storage unit 124 of fig. 1. The teaching data used in the initial learning is only the original teaching data 124a. In the machine learning, the learning unit 103 inputs a teaching image to the neural network and repeats the process of updating the weight values, while changing the teaching image, so that the output values of the neural network approach the correct data.
In machine learning, basically, the weight value approaches the optimal value as the number of repetitions increases, but the weight value may deviate from the optimal value after repetition due to over-learning or the like. In addition, when a learned model for detecting a plurality of detection objects is constructed, the detection accuracy of a certain detection object is high when a certain weight value is applied, but the detection accuracy of other detection objects may be low.
Therefore, in the example of fig. 6, a plurality of learned models 1 to I having different weight values are generated. Then, a model applied as the first learned model and a model applied as the second learned model are selected from these learned models. For example, an image showing the detection target object may be input as test data to each of the learned models 1 to I, the detection accuracy of each detection target object may be calculated from the output value, and the above-described selection may be performed using the calculated detection accuracy as a reference. These learned models are stored in the learned model storage unit 123.
In this selection, it is preferable that no detection target object has low detection accuracy in both the first learned model and the second learned model. For example, when the first learned model has low detection accuracy for long objects, it is preferable to use a model with high detection accuracy for long objects as the second learned model. Further, it is not necessary to select both the first learned model and the second learned model from the learned models 1 to I. For example, the first learned model may be selected from the learned models 1 to I, and the second learned model may be selected from a plurality of separately constructed learned models.
The learned model may be selected manually or may be selected by the information processing apparatus 1. In the latter case, the selection criterion of the learned model may be set in advance, and information (for example, information indicating the detection accuracy of each detection target object in each learned model) necessary for determining whether or not the selection criterion is satisfied may be input to the information processing device 1 or calculated.
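The selection criterion described above (no detection target should have low accuracy in both selected models) could be checked automatically along the following lines; the per-class accuracy table and the accuracy threshold are assumed inputs, not values from this disclosure.

```python
def select_model_pair(per_class_accuracy, min_accuracy=0.8):
    """per_class_accuracy: {model_name: {detection_target: accuracy}}.
    Returns the first pair of models in which every detection target reaches
    min_accuracy in at least one of the two, or None if no such pair exists."""
    models = list(per_class_accuracy)
    targets = set().union(*per_class_accuracy.values())
    for i, m1 in enumerate(models):
        for m2 in models[i + 1:]:
            if all(max(per_class_accuracy[m1].get(t, 0.0),
                       per_class_accuracy[m2].get(t, 0.0)) >= min_accuracy
                   for t in targets):
                return m1, m2
    return None
```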
By performing object detection on the detection target images using the first learned model and the second learned model selected in this way, the images in which objects were detected and the corresponding object information are recorded in the detection result storage unit 122, as described with reference to fig. 5. The learning unit 103 performs relearning using teaching data obtained by adding these images as teaching images, with their object information as correct data, to the original teaching data 124a used in the initial learning. It is preferable that the images and object information used as teaching data are those confirmed to be correct by visual inspection via the selection display device 4. The teaching data selected in this way is set as the additional teaching data 122a and can be copied to the teaching data storage unit 124 as the additional teaching data 124b for relearning. Although the additional teaching data 124b is here a copy of the additional teaching data 122a, the additional teaching data 122a may also be used directly as the additional teaching data 124b without being copied.
Although the image and the object information are recorded in the detection result storage unit 122, when the input data is a still image, it is not necessary to save the image, and only the image file name of the input data may be recorded.
In the relearning, the learning unit 103 performs learning using the original teaching data 124a and the additional teaching data 124b stored in the teaching data storage unit 124, and constructs a plurality of relearning-performed models 1 to J having different weight values. Also, a first learned model and a second learned model are selected therefrom. The difference from the initial learning is that additional teaching data 124b is added to teaching data for machine learning. By adding teaching data, it is expected to improve the detection accuracy of the learned model. In the relearning, the learning unit 103 does not need to use all the object information recorded as a result of the object detection, and may select and use a part of the object information. In addition, the learning unit 103 may repeat the relearning a plurality of times. In addition, it is also possible to construct each learned model of embodiment 2 and thereafter in the same manner as described above, and to relearn it.
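As a simple sketch of how the teaching data for relearning could be assembled, assuming each teaching data item is an (image, object information) pair; the training routine itself is only indicated as a placeholder and is not part of this disclosure.

```python
def build_relearning_data(original_teaching_data, confirmed_detections):
    """Combine the original teaching data (124a) with additional teaching data (124b)
    built from detection results confirmed as correct by visual inspection."""
    additional = [(d["image"], d["objects"]) for d in confirmed_detections]
    return list(original_teaching_data) + additional

# Hypothetical usage: build several relearned models 1..J with different weight values.
# relearned_models = [train_model(build_relearning_data(original, confirmed), seed=k)
#                     for k in range(num_models)]
```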
[ embodiment 2]
Another embodiment of the present invention will be described below. For convenience of description, the same reference numerals are used for the members having the same functions as those described in the above embodiments, and the description thereof will not be repeated. The same applies to embodiment 3 described below.
[ example of the Structure of the control section ]
A configuration example of the control unit 10 of the information processing device 1 according to the present embodiment will be described with reference to fig. 7. Fig. 7 is a block diagram showing a configuration example of the control unit 10 provided in the information processing device 1 according to embodiment 2. Fig. 7 also shows the mass storage unit 12.
As shown in fig. 7, the control unit 10 includes a first detection unit 201, a second detection unit 202A, and a second detection unit 202B. The learning unit 103, the selection display control unit 104, the incoming vehicle specifying unit 105, and the inappropriate object display control unit 106 are the same as in embodiment 1 and are therefore not illustrated.
The first detection unit 201 inputs input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects, and detects the first detection objects. The first detection unit 201 has the same function as the first detection unit 101 of embodiment 1, but is different from the first detection unit 101 in that input data used by the second detection unit 202A and the second detection unit 202B is specified based on the detection result of the first detection unit 201.
The second detection unit 202A inputs the input data (the detection result of the first detection unit 201) to the second learned model a that has been machine-learned so as to be able to detect the second detection object a that is at least a part of the plurality of types of first detection objects, and detects the second detection object a. The detection result of the second detection unit 202A is the final detection result for the second detection object a. The second detection unit 202A is different from the second detection unit 102 of embodiment 1 in that, of the input data to the first detection unit 201, the input data in which the first detection unit 201 detects the detection target is used as the input data to the second learned model a.
Similarly to the second detection unit 202A, the second detection unit 202B inputs the input data of the detection object detected by the first detection unit 201 to the second learned model B, and detects the second detection object B, which is at least a part of the plurality of types of first detection objects, from the input data (the detection result of the first detection unit 201). The detection result of the second detection unit 202B is the final detection result for the second detection target B. Further, the second detection object a is a different object from the second detection object B.
As described above, the information processing apparatus 1 includes: a first detection unit 201 that detects a plurality of types of first detection objects by inputting input data to a first learned model that has been machine-learned so as to be able to detect them; and a second detection unit 202A that detects a second detection object, which is at least a part of the plurality of types of first detection objects, by inputting the input data to a second learned model that has been machine-learned so as to be able to detect it. The information processing apparatus 1 then determines a final detection result based on the detection result of the first detection unit 201 and the detection result of the second detection unit 202A. Specifically, in the information processing apparatus 1 of the present embodiment, the second detection unit 202A inputs, to the second learned model A, those pieces of input data, among the plurality of input data, in which the first detection unit 201 detected a first detection object, detects the second detection object A, and takes its detection result as the final detection result for the second detection object A. Likewise, the second detection unit 202B inputs those pieces of input data to the second learned model B, detects the second detection object B, and takes its detection result as the final detection result for the second detection object B.
According to the above configuration, the detection objects of the first learned model and the second learned model A overlap at least in part, so the possibility of false detection in the overlapping part can be reduced. Similarly, since the detection objects of the first learned model and the second learned model B also overlap at least in part, the possibility of false detection in that overlapping part can also be reduced. Therefore, the accuracy of detection using a machine-learned model can be improved. Further, since the first detection unit 201 uses the first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects, the plurality of types of first detection objects can be detected efficiently in a single pass.
The method of determining the final detection result is not limited to the above. For example, as described in embodiment 1, a module for integrating detection results may be added to the control unit 10, and the detection results of the first detection unit 201 and the second detection unit 202A may be integrated by that module to obtain the final detection result; the same applies to integrating the detection results of the first detection unit 201 and the second detection unit 202B. As in the example described in "resolution of input data" in embodiment 1, image data with reduced resolution may be used as the input data of the first learned model, of the second learned model A, of the second learned model B, or of any combination of these.
In the example of fig. 7, two second detection units 202A and 202B are described as means for detecting a part of the detection object of the first detection unit 201. However, the number of modules for detecting a part of the detection target object of the first detection unit 201 may be only one, or may be three or more.
[ Flow of processing ]
Fig. 8 is a diagram illustrating the flow of processing executed by the information processing apparatus 1 according to the present embodiment. The processes executed by the information processing apparatus 1 of the present embodiment and their execution order can be defined by, for example, the setting file F2 shown in fig. 8, and are also represented by the flowchart in the same figure.
(Regarding the setting file)
In the setting file F2, three sections, [EX_all], [EX_goza], and [EX_tree], are defined in this order. Details of each section are omitted here, but they are structured in the same way as SC11 and SC12. [EX_all] is a section whose learned model detects all objects to be detected (for example, corrugated paper, board, wood, mat, and long object), similarly to SC11. [EX_goza] is a section dedicated to detecting mats among the objects to be detected, and [EX_tree] is a section dedicated to detecting wood among the objects to be detected.
The src of [EX_goza] and [EX_tree] is "all_res", which is the dst of [EX_all]. That is, [EX_goza] detects mats in the images in which [EX_all] detected at least one detection object, and [EX_tree] detects wood in those same images. The mat detection results of [EX_goza] are output to "goza_res", and the wood detection results of [EX_tree] are output to "tree_res". These detection results are the final detection results.
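The exact syntax of the setting file is not given in this document. The sketch below assumes an INI-style layout that can be read with Python's configparser; only the section names, script names, and src/dst values follow the description above, while the "script" key and the file syntax itself are assumptions.

import configparser

# Assumed INI-style layout for setting file F2; not taken from this document.
F2_TEXT = """
[EX_all]
script = all    ; detects all objects to be detected
src = all_src
dst = all_res

[EX_goza]
script = goza   ; detects only mats
src = all_res
dst = goza_res

[EX_tree]
script = tree   ; detects only wood
src = all_res
dst = tree_res
"""

config = configparser.ConfigParser(inline_comment_prefixes=(";",))
config.read_string(F2_TEXT)
for section in config.sections():  # sections are processed in the defined order
    print(section, config[section]["src"], "->", config[section]["dst"])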
(Regarding the flowchart)
The control unit 10 functions as the first detection unit 201 by executing the script file with the file name "all" defined in [EX_all]. After the execution of that script file is completed, the control unit 10 functions as the second detection unit 202A by executing the script file with the file name "goza" defined in [EX_goza]. After the execution of that script file is completed, the control unit 10 functions as the second detection unit 202B by executing the script file with the file name "tree" defined in [EX_tree].
The processing (information processing method) executed by these processing units will be described below based on the flowchart. It is assumed that, before the processing of this flowchart is performed, the moving image file captured by the garbage image capturing device 2 has been stored in "all_src" of the input data storage unit 121. Instead of the moving image file, a plurality of frame images extracted from the moving image file, or a plurality of still image files captured in time series by the garbage image capturing device 2, may be stored.
In S21 (first detection step), the first detection unit 201 performs object detection on all detection targets from the processing target images stored in the input data storage unit 121 using the first learned model. Specifically, the first detection unit 201 sets, as input data, a frame image extracted from the moving image file of "all_src" stored in the input data storage unit 121, inputs the frame image to the learned model with the script name "all", and outputs object information. When an object is detected from the frame image, the first detection unit 201 associates the frame image with the object information as a detection result, and records the detection result in "all_res" of the detection result storage unit 122. These processes are performed for each frame image extracted from the moving image file.
As described above, the frame images recorded in "all_res" become the input data of [EX_goza] and [EX_tree], in which detection of mats and wood is attempted again. Therefore, even at the cost of more false detections of mats and wood, it is preferable that the first detection unit 201 avoid overlooking them, that is, avoid failing to detect mats and wood that appear in the frame images. For this purpose, in the object detection based on the output values of the first learned model, the detection threshold value compared with the probability value included in the output values may be set low.
In S22 (second detection step, determination step), the second detection unit 202A detects the mat, which is the second detection object A in this example, from the images in which an object was detected in S21, using the second learned model A. Specifically, the second detection unit 202A sets the frame images of "all_res" recorded in the detection result storage unit 122 as input data, inputs them to the learned model with the script name "goza", and outputs the object information. When determining that a mat is detected, the second detection unit 202A associates the frame image with the object information as a detection result, and records the detection result in "goza_res" of the detection result storage unit 122. These processes are performed for each of the frame images stored in "all_res". The data recorded in "goza_res" of the detection result storage unit 122 at the end of S22 is the final detection result for the mat. That is, the final detection result for the mat is determined by the process of S22.
In S23, the second detection unit 202B detects wood, which is the second detection object B in this example, from the images in which an object was detected in S21, using the second learned model B. Specifically, the second detection unit 202B sets the frame images of "all_res" recorded in the detection result storage unit 122 as input data, inputs them to the learned model with the script name "tree", and outputs the object information. When determining that wood is detected, the second detection unit 202B associates the frame image with the object information as a detection result, and records the detection result in "tree_res" of the detection result storage unit 122. These processes are performed for each of the frame images stored in "all_res". The data recorded in "tree_res" of the detection result storage unit 122 at the end of S23 is the final detection result for wood. That is, the final detection result for wood is determined by the process of S23.
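Condensed into code, S21 to S23 amount to a two-stage filtering loop. The sketch below merges S22 and S23 into a single loop for brevity; the helper functions (extract_frames, detect_with_model) and the threshold values are assumptions and do not appear in this document.

# Hedged sketch of the S21-S23 flow; all helpers and thresholds are illustrative.
def run_two_stage_detection(video_file, model_all, model_goza, model_tree,
                            extract_frames, detect_with_model):
    all_res, goza_res, tree_res = [], [], []

    # S21: detect every kind of object with a deliberately low threshold so that
    # mats and wood are not overlooked, even at the cost of more false detections.
    for frame in extract_frames(video_file):                 # frames from "all_src"
        objects = detect_with_model(model_all, frame, threshold=0.3)
        if objects:
            all_res.append((frame, objects))

    # S22 / S23: re-examine only the frames in which something was detected.
    for frame, _ in all_res:
        mats = detect_with_model(model_goza, frame, threshold=0.5)
        if mats:
            goza_res.append((frame, mats))                   # final result for mats
        wood = detect_with_model(model_tree, frame, threshold=0.5)
        if wood:
            tree_res.append((frame, wood))                   # final result for wood

    return all_res, goza_res, tree_res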
In S24, the detection results are output in the same manner as in S13 of fig. 5, and the process ends. The mat detection results may be read from "goza_res" of the detection result storage unit 122, and the wood detection results from "tree_res" of the detection result storage unit 122. The detection results for the objects to be detected other than mats and wood may be read from "all_res" of the detection result storage unit 122.
The detection results of [EX_all] may include false detections in which an object other than wood is detected as wood, or an object other than a mat is detected as a mat. However, according to the above-described processing, the frame images in which some object was detected by [EX_all] are subjected to the object detection of [EX_tree] and [EX_goza], so mats and wood can be detected with high accuracy.
Further, according to the above configuration, the processing can be sped up compared with the case where object detection is performed on all images using both [EX_tree] and [EX_goza]. For example, when 200 frame images are extracted from one moving image file, if only [EX_tree] and [EX_goza] are used, all 200 frame images are processed by [EX_tree] and by [EX_goza], so the object detection processing is performed 400 times in total. With the above configuration, on the other hand, the 200 frame images are first processed by [EX_all]. If an object is detected in 30 of those frame images, [EX_tree] and [EX_goza] each process only those 30 frames, and the object detection processing is performed 260 times in total. The number of executions of the object detection processing, and hence the time required for the processing, can therefore be reduced significantly.
[ embodiment 3 ]
A configuration example of the control unit 10 of the information processing device 1 according to the present embodiment will be described with reference to fig. 9. Fig. 9 is a block diagram showing a configuration example of the control unit 10 provided in the information processing device 1 according to embodiment 3. Fig. 9 also shows the mass storage unit 12.
As shown in fig. 9, the control unit 10 includes a garbage image extracting unit 301, a first detecting unit 302, a second detecting unit 303, and a detection result integrating unit 304. The learning unit 103, the selection display control unit 104, the incoming vehicle specifying unit 105, and the inappropriate object display control unit 106 are the same as those in embodiment 1, and therefore, illustration thereof is omitted.
The garbage image extraction unit 301 extracts images showing garbage from the images to be processed (for example, frame images extracted from a moving image file). The images subjected to detection by the first detection unit 302 and the second detection unit 303 can thereby be narrowed down to images showing garbage, so the number of executions of the object detection processing can be reduced and the processing can be sped up. For example, when no garbage appears in three quarters of the frame images extracted from a moving image file, the first detection unit 302 and the second detection unit 303 only need to perform the object detection processing on the frame images in which garbage appears (one quarter of all frame images). Compared with performing the object detection processing on all frame images, the number of executions of the object detection processing, and hence the time required for it, can therefore be reduced significantly.
The garbage image extraction unit 301 can perform the above extraction using a learned model constructed using, as teaching data, for example, images in which garbage is flowing down the slope 600 and images in which no garbage is flowing. This learned model only needs to recognize whether or not garbage is present on the slope 600, and does not need to determine the classification of the garbage. Therefore, when this learned model is a neural network model, the number of intermediate layers can be made smaller than in the learned models used by the first detection unit 302 and the second detection unit 303 described below.
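As one illustration of this point, a binary garbage-present/absent classifier can be much shallower than an object-detection model. The sketch below uses Keras purely as an example; the actual architecture of the model used by the garbage image extraction unit 301 is not specified in this document.

# Hedged sketch: a shallow presence/absence classifier (not from this document).
from tensorflow import keras

def build_garbage_presence_model(input_shape=(128, 128, 3)):
    return keras.Sequential([
        keras.layers.Input(shape=input_shape),
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(1, activation="sigmoid"),  # garbage present or not
    ])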
The images used as input data of the learned model used by the garbage image extraction unit 301 may have a lower resolution than the images used as input data of the learned models used by the first detection unit 302 and the second detection unit 303, for which higher-resolution images are preferable. Therefore, when the resolution of a garbage image is reduced before it is given to the garbage image extraction unit 301, the garbage image extraction unit 301 preferably stores the image at its original, pre-reduction resolution as its output result.
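A minimal sketch of that resolution handling follows: the downscaled copy is used only for the presence check, while the stored frame keeps its original resolution. Pillow is used here only as an example, and the has_garbage predicate stands in for the learned model of the extraction unit.

# Hedged sketch of low-resolution screening with full-resolution output.
from PIL import Image

def extract_garbage_frames(frames: "list[Image.Image]", has_garbage, scale: float = 0.25):
    kept = []
    for frame in frames:                                       # full-resolution frames
        small = frame.resize((int(frame.width * scale), int(frame.height * scale)))
        if has_garbage(small):                                 # low resolution suffices here
            kept.append(frame)                                 # keep the pre-reduction image
    return kept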
The garbage image extraction unit 301 is not an essential component of the information processing device 1; it is provided so that the object detection by the first detection unit 302 and the second detection unit 303 can be performed efficiently. The garbage image extraction unit 301 can also be applied to the information processing apparatuses 1 of embodiments 1 and 2.
The first detection unit 302 inputs input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects, and detects the first detection objects. The second detection unit 303 inputs the input data to a third learned model that is machine-learned so as to be able to detect a third detection object different from the first detection object, and detects the third detection object.
As in the above-described embodiments, the information processing device 1 of the present embodiment also determines the final detection result based on the detection result of the first detection unit 302 and the detection result of the second detection unit 303. Specifically, the detection result integrating unit 304 determines the final detection result based on the detection result of the first detecting unit 302 and the detection result of the second detecting unit 303.
More specifically, the detection result integration unit 304 takes, as the detection result for the first detection objects, the remainder obtained by removing the objects detected as third detection objects by the second detection unit 303 from the objects detected as first detection objects by the first detection unit 302. In other words, when a detection result based on the first learned model is inconsistent with a detection result based on the third learned model (the same object is detected both as a first detection object and as a third detection object), the detection result integration unit 304 invalidates the detection result based on the first learned model.
According to the above configuration, the first learned model and the third learned model have different detection objects. Therefore, when the first detection unit 302 detects a certain object as a first detection object, the second detection unit 303 may detect the same object as a third detection object; in such a case, it can be determined that a false detection has occurred in either the first detection unit 302 or the second detection unit 303. According to the above configuration, such false detections can be reduced, because the detection result for the first detection objects is the remainder obtained by removing the objects detected as third detection objects by the second detection unit 303 from the objects detected as first detection objects by the first detection unit 302. Therefore, the accuracy of detection using a machine-learned model can be improved. Further, since the first detection unit 302 uses the first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects, the plurality of types of first detection objects can be detected efficiently in a single pass.
In the present embodiment, similarly to the example described in "resolution of input data" in embodiment 1, image data with a reduced resolution may be used as the image data input to the first learned model or the third learned model.
[ Flow of processing ]
Fig. 10 is a diagram illustrating the flow of processing executed by the information processing apparatus 1 according to the present embodiment. The processes executed by the information processing apparatus 1 of the present embodiment and their execution order can be defined by, for example, the setting file F3 shown in fig. 10, and are also represented by the flowchart in the same figure.
(Regarding the setting file)
In the setting file F3, four sections, [EX_hash], [EX_all], [EX_bag], and [EX_final], are defined in this order. [EX_all] is a section that, like the section [EX1] of fig. 5, detects all objects to be detected (for example, corrugated paper, board, wood, mat, and long object). [EX_hash] is a section that extracts the images showing garbage from among images showing garbage and images not showing garbage. [EX_bag] is a section that detects trash bags and the slope 600 (see fig. 4), and [EX_final] is a section that outputs the result of removing the [EX_bag] detections from the [EX_all] detections.
The dst of [EX_hash] is "hash_res", and "hash_res" is the src of [EX_all]. That is, [EX_all] performs object detection on the images extracted by [EX_hash].
The dst of [EX_all] is "all_res", and "all_res" is the src of [EX_bag]. That is, [EX_bag] detects trash bags and the slope 600 in the images in which [EX_all] detected at least one detection object.
Furthermore, the src of [EX_final] is "all_res" and its dst is "final_res". Since both the detection results of [EX_all] and the detection results of [EX_bag] are recorded in "all_res", [EX_final] outputs the final detection result based on these results to "final_res".
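Under the same INI-style assumption as the F2 sketch above, setting file F3 might look as follows; only the section names, script names, and the src/dst chaining come from the description, and everything else is assumed.

# Assumed INI-style layout for setting file F3; not taken from this document.
F3_TEXT = """
[EX_hash]
script = hash    ; keeps only the frames in which garbage appears
src = all_src
dst = hash_res

[EX_all]
script = all     ; detects all objects to be detected
src = hash_res
dst = all_res

[EX_bag]
script = bag     ; detects trash bags and the slope 600
src = all_res
dst = all_res

[EX_final]
script = final   ; removes detections that overlap a trash bag or the slope
src = all_res
dst = final_res
"""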
Here, the reason why the present embodiment uses the combination of [EX_all], [EX_bag], and [EX_final] will be described. Since trash bags filled with trash come in countless shapes and colors, the shapes of the gaps between trash bags appearing in a garbage image are also highly varied. For example, when a gap between trash bags looks like a long bar, it may be erroneously detected as an inappropriate object even though no inappropriate object is actually present. To avoid such false detections it would be preferable to learn the trash bags themselves, but the number of trash bags appearing in garbage images is very large, for example several tens of bags in a single image, so producing correct-answer data for all teaching images takes a great deal of time and effort. Similarly, since the appearance of the slope 600 in an image changes depending on the state of the garbage on the slope 600, producing correct-answer data for the slope for all teaching images also takes a great deal of time and effort.
Therefore, in the present embodiment, in addition to [EX_all] for detecting inappropriate objects and the like, [EX_bag], which is dedicated to detecting trash bags and the slope 600, is used. The third learned model used in [EX_bag] can be constructed by machine learning using teaching data of only trash bags and the slope 600.
[EX_final] is a section that confirms whether [EX_bag] detected a trash bag or the slope 600 at a position where [EX_all] detected an inappropriate object. Specifically, [EX_final] outputs the detection result of an inappropriate object to "final_res" only when neither a trash bag nor the slope 600 was detected at the position where [EX_all] detected the inappropriate object. Using the detection results of [EX_bag], it is thus possible to quickly sort out, from the inappropriate-object detection results of [EX_all], those results from which the parts where a trash bag or the slope 600 may have been falsely detected as an inappropriate object have been excluded. That is, false detection of a trash bag or the slope 600 as an inappropriate object can be reduced effectively with little processing.
(Regarding the flowchart)
The control unit 10 functions as the garbage image extraction unit 301 by executing the script file with the file name "hash" defined in [EX_hash]. After the execution of that script file is completed, the control unit 10 functions as the first detection unit 302 by executing the script file with the file name "all" defined in [EX_all]. After the execution of that script file is completed, the control unit 10 functions as the second detection unit 303 by executing the script file with the file name "bag" defined in [EX_bag]. After the execution of that script file is completed, the control unit 10 functions as the detection result integration unit 304 by executing the script file with the file name "final" defined in [EX_final].
The processing (information processing method) executed by these processing units will be described below based on the flowchart. It is assumed that, before the processing of this flowchart is performed, the moving image file captured by the garbage image capturing device 2 has been stored in "all_src" of the input data storage unit 121. Instead of the moving image file, a plurality of frame images extracted from the moving image file, or a plurality of still image files captured in time series by the garbage image capturing device 2, may be stored.
In S31, the garbage image extraction unit 301 extracts the images showing garbage from the processing target images. Specifically, the garbage image extraction unit 301 inputs a frame image extracted from the moving image file of "all_src" stored in the input data storage unit 121 to the learned model with the script name "hash" and outputs object information. Then, the garbage image extraction unit 301 records the frame images determined, based on the object information, to show garbage in "hash_res" of the detection result storage unit 122. These processes are performed for each frame image extracted from the moving image file.
In S32 (first detection step), the first detection unit 302 performs object detection on all the detection targets from the frame images extracted in S31. Specifically, the first detection unit 302 takes the frame images of "hash_res" stored in the detection result storage unit 122 as input data, inputs them to the learned model with the script name "all", and outputs object information. Then, the first detection unit 302 associates each frame image determined to have an object detected based on the object information with that object information as a detection result, and records the detection result in "all_res" of the detection result storage unit 122. These processes are performed for each of the frame images stored in "hash_res".
In S33 (second detection step), the second detection unit 303 detects the trash bags and the slope 600 from the images extracted in S31 using the third learned model. Specifically, the second detection unit 303 takes the frame images of "hash_res" recorded in the detection result storage unit 122 as input data, inputs them to the learned model with the script name "bag", and outputs object information. The second detection unit 303 then determines, based on the object information, whether or not an object, that is, a trash bag or the slope 600, has been detected. When determining that one has been detected, the second detection unit 303 associates the frame image with the object information as a detection result, and records the detection result in "all_res" of the detection result storage unit 122. These processes are performed for each of the frame images stored in "hash_res".
A configuration in which the trash bags and the slope 600 are detected using separate learned models may also be adopted. The detection target of the second detection unit 303 is not limited to trash bags and the slope 600, as long as it differs from the detection targets of the first detection unit 302. However, the detection target of the second detection unit 303 is preferably an object whose appearance is similar to that of a detection target of the first detection unit 302. For example, the detection target of the second detection unit 303 may be an object (for example, corrugated paper or the like) that looks similar to an inappropriate object but is not one.
In S34 (determination step), the detection result integration unit 304 determines the final detection result based on the detection results of S32 and S33. More specifically, the detection result integration unit 304 takes, as the final detection result for the detection objects, the remainder obtained by removing, from the objects detected by the first detection unit 302 in S32, the objects detected by the second detection unit 303 in S33 as trash bags or the slope.
Specifically, the detection result integration unit 304 specifies the range of each detection object on the image based on the object information stored in "all_res" by the first detection unit 302. Next, the detection result integration unit 304 determines whether or not a trash bag or the slope 600 was detected within that range, based on the object information stored in "all_res" by the second detection unit 303. When it determines that no trash bag or slope 600 was detected within that range, the detection result integration unit 304 associates the object information of the detection object with the frame image as a final detection result, and records it in "final_res" of the detection result storage unit 122. On the other hand, when it determines that a trash bag or the slope 600 was detected within that range, the detection result integration unit 304 does not record the object information and frame image of that detection object; that is, the detection result for that detection object is invalidated as a false detection.
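A minimal sketch of this integration step follows, assuming that each piece of object information carries an axis-aligned bounding box (x1, y1, x2, y2); the box format, the overlap test, and all names are assumptions for illustration only.

# Hedged sketch of S34: drop detections that overlap a trash bag / slope detection.
def boxes_overlap(a, b):
    # a, b: (x1, y1, x2, y2) axis-aligned boxes
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def integrate(all_res, bag_res):
    # all_res: list of (frame_id, box) detected by the first detection unit
    # bag_res: list of (frame_id, box) detected as trash bag / slope by the second unit
    final_res = []
    for frame_id, obj_box in all_res:
        bags_in_frame = [box for fid, box in bag_res if fid == frame_id]
        if any(boxes_overlap(obj_box, bag_box) for bag_box in bags_in_frame):
            continue                      # invalidated as a false detection
        final_res.append((frame_id, obj_box))
    return final_res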
In S35, the detection results are output in the same manner as in S13 of fig. 5, and the process ends. The detection results to be output may be read from "final_res" of the detection result storage unit 122.
[ modified example ]
For the object detection, object classification, and the like in the above-described embodiments, machine-learning algorithms and artificial intelligence other than a machine-learned neural network (including deep learning) may also be used.
The execution body of each process described in each of the above embodiments can be appropriately changed. For example, at least one of the blocks shown in fig. 1, 8, or 10 may be omitted, and the omitted processing unit may be provided in one or more other apparatuses. In this case, the processes of the above-described embodiments are executed by one or more information processing apparatuses.
In the above embodiments, the example of detecting an inappropriate object or the like from a spam image has been described, but the detection target object is arbitrary and is not limited to an inappropriate object or the like. The input data to the learned model used by the information processing device 1 is not limited to image data, and may be, for example, audio data. In this case, the information processing device 1 may be configured to detect a component of a predetermined sound included in the input sound data as a detection target.
The present invention is not limited to the above embodiments, and various modifications can be made within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

Claims (7)

1. An information processing apparatus, comprising:
a first detection unit that detects a first detection object by inputting input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of first detection objects; and
a second detection unit that inputs the input data to a second learned model that is machine-learned so as to be able to detect a second detection object that is at least a part of the plurality of types of first detection objects and detects the second detection object, or that inputs the input data to a third learned model that is machine-learned so as to be able to detect a third detection object that is different from the first detection object and detects the third detection object,
the information processing apparatus determines a final detection result based on the detection result of the first detection portion and the detection result of the second detection portion.
2. The information processing apparatus according to claim 1,
the second detection unit inputs the input data to the second learned model and detects the second detection target,
the first detection unit and the second detection unit output detection results to a common output destination,
the detection result of the first detection section and the detection result of the second detection section, which are output to the common output destination, are taken as final detection results.
3. The information processing apparatus according to claim 1,
the first detection unit inputs a plurality of input data to the first learned model and detects the first detection target from each input data,
the second detection unit inputs, of the plurality of input data, input data in which the first detection target is detected by the first detection unit into the second learned model and detects the second detection target,
the information processing apparatus takes the detection result of the second detection section as a final detection result.
4. The information processing apparatus according to claim 1,
the second detection unit inputs the input data to the third learned model and detects the third detection target,
the information processing apparatus includes a detection result integration unit configured to determine, as a detection result of the first detection object, a remaining portion of the detection object detected as the first detection object by the first detection unit excluding an object detected as the third detection object by the second detection unit.
5. The information processing apparatus according to any one of claims 1 to 4,
the input data is image data,
the first detection unit inputs data in which the resolution of the image data is reduced to the first learned model, or,
the second detection unit inputs data in which the resolution of the image data is reduced to the second learned model or the third learned model.
6. An information processing method executed by one or more information processing apparatuses, characterized by comprising:
a first detection step of inputting input data to a first learned model that has been machine-learned so as to be able to detect a plurality of types of detection objects, and detecting the detection objects from the input data;
a second detection step of inputting the input data to a second learned model that is machine-learned so that at least a part of the plurality of types of detection objects can be detected, or to a third learned model that is machine-learned so that a detection object different from the first learned model can be detected, and detecting the detection object from the input data,
the information processing method further includes:
a determination step of determining a final detection result based on the detection result of the first detection step and the detection result of the second detection step.
7. A computer-readable recording medium in which an information processing program for causing a computer to function as the information processing apparatus according to claim 1 and causing a computer to function as the first detection unit and the second detection unit is recorded.