WO2020149242A1 - Work assistance device, work assistance method, program, and object detection model - Google Patents

Publication number
WO2020149242A1
WO2020149242A1 PCT/JP2020/000731 JP2020000731W WO2020149242A1 WO 2020149242 A1 WO2020149242 A1 WO 2020149242A1 JP 2020000731 W JP2020000731 W JP 2020000731W WO 2020149242 A1 WO2020149242 A1 WO 2020149242A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
work support
probability
detection model
support device
Application number
PCT/JP2020/000731
Other languages
French (fr)
Japanese (ja)
Inventor
清水 秀樹
亮介 田嶋
Original Assignee
Arithmer株式会社
Priority claimed from JP2019006450A (JP6508797B1)
Priority claimed from JP2019066075A (JP6756961B1)
Application filed by Arithmer株式会社
Publication of WO2020149242A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • the present disclosure relates to a work support device, a work support method, a program, and an object detection model.
  • Patent Document 1: Japanese Patent Laid-Open No. 2018-169672
  • Patent Document 1 discloses a technique that identifies an insufficient pattern, that is, a pattern with a small number of teacher images, and generates a new teacher image belonging to that pattern by spatially inverting a certain teacher image or changing its color tone.
  • the work support device is a work support device that supports a work for setting whether or not an object is shown in an area in an image.
  • the work support device includes an extraction unit and a generation unit.
  • the extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value.
  • the generation unit generates coordinate information of a region in the candidate image in which the probability that the object is captured is equal to or higher than the first threshold value.
  • the work support device includes an extraction unit and an output unit.
  • the extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value.
  • the output unit outputs an image in which a bounding box corresponding to a region in which the probability that the object is captured is equal to or higher than the first threshold is combined with the candidate image.
  • FIG. 3 is a partially enlarged view of FIG. 2.
  • FIG. 4 is a schematic diagram for explaining the concept of information processing by the processing unit 24.
  • FIG. 5 is a flowchart for explaining the operation of the work support device 20.
  • FIG. 6 is a flowchart for explaining the operation of the work support device 20 according to Modification A.
  • FIG. 7 is a schematic diagram for explaining end-to-end learning according to Modification B.
  • FIG. 8 is a schematic diagram showing an example of the display screen of the work support device 20 according to Modifications C and D.
  • FIG. 1 is a schematic diagram showing the configuration of the work support device 20 according to the present embodiment.
  • FIG. 2 is a diagram showing an example of the candidate image Gk output by the work support device 20.
  • FIG. 3 is a partially enlarged view of the broken-line portion of FIG. 2.
  • FIG. 4 is a schematic diagram for explaining the concept of information processing by the processing unit 24 described later.
  • the work support device 20 is a device that supports the work of generating the teacher image Gt used for the object detection model 21M.
  • the symbol Gt is used when a plurality of teacher images are described collectively, and a subscripted symbol such as Gt1 is used when the individual teacher images are described separately.
  • the “object detection model 21M” is constructed by a neural network whose weights are adjusted based on the teacher images Gt in which the object O is captured, and performs region extraction of the object O in an image and object recognition of the object O. Specifically, when an image is input, the object detection model 21M calculates the probability that the object O appears in the image and, if the calculated probability is a predetermined value or more, outputs the region in which the object O appears.
  • the target object detection model 21M can detect a plurality of preset target objects.
  • the area of the object O is defined by coordinate information b1 to b4 corresponding to the four vertices of the bounding box B combined in the image Gk. Therefore, in the teacher image Gt of the object detection model 21M, the object O is imaged in the area corresponding to the coordinate information b1 to b4 of the bounding box B.
  • a “traffic light” is shown as the object O, but the object O is not limited to this, and any object can be adopted as the object O.
  • the object O can be set not only by the type of object but also by its state or the like.
  • the object O may be set not simply as a traffic light but separately as a traffic light displaying a red light and a traffic light displaying a green light.
  • the work support device 20 can be realized by any computer, and includes a storage unit 21, an input unit 22, an output unit 23, and a processing unit 24.
  • the work support device 20 may be realized as hardware using an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or the like.
  • the storage unit 21 stores various kinds of information, and is realized by an arbitrary storage device such as a memory and a hard disk.
  • the storage unit 21 stores information such as the weight of the neural network that constructs the object detection model 21M.
  • the storage unit 21 stores a plurality of teacher images Gt, and stores a plurality of teacher images Gt1 to Gtp (p is a natural number other than 1) in the initial state.
  • the storage unit 21 also stores a moving image GD for generating a new teacher image Gtq (q is a value other than 1 to p).
  • the moving image GD is captured by an arbitrary image capturing device.
  • the input unit 22 is realized by an arbitrary input device such as a keyboard, a mouse, a touch panel, etc., and inputs various information to a computer.
  • the output unit 23 is realized by any output device such as a display, a touch panel, a speaker, etc., and outputs various information from the computer.
  • the processing unit 24 executes various types of information processing, and is realized by a processor such as a CPU or GPU and a memory.
  • by reading one or more programs stored in the storage unit 21 into the CPU, GPU, or the like of the computer, the processing unit 24 functions as the extraction unit 24A, the generation unit 24B, the combining unit 24C, the setting unit 24D, and the updating unit 24E.
  • each function of the processing unit 24 will be described with reference to FIG.
  • the extraction unit 24A uses the object detection model 21M to extract, from the images Gd1 to Gdj of the frames of an arbitrary moving image GD, an image including a region in which the probability that the object O is captured is equal to or greater than the first threshold P1 and equal to or less than the second threshold P2, as a candidate image Gk (denoted as Gk1 to Gk3 in FIG. 4).
  • the first threshold P1 is set to about 10%, and the second threshold P2 is set to about 60%.
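The selection between the two thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Detection` record, the `extract_candidates` function, and the representation of the detector output as a per-frame list of detections are hypothetical names introduced for this example. Only the thresholds P1 (about 10%) and P2 (about 60%) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

P1 = 0.10  # first threshold (about 10% in the text)
P2 = 0.60  # second threshold (about 60% in the text)


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output for one region of one frame."""
    label: str
    probability: float  # probability that the object is captured, 0.0 to 1.0


def extract_candidates(
    detections_per_frame: Dict[int, List[Detection]],
    p1: float = P1,
    p2: float = P2,
) -> Dict[int, List[Detection]]:
    """Keep frames with at least one region whose probability is in [p1, p2]."""
    candidates: Dict[int, List[Detection]] = {}
    for frame_index, detections in detections_per_frame.items():
        in_band = [d for d in detections if p1 <= d.probability <= p2]
        if in_band:
            candidates[frame_index] = in_band
    return candidates


frames = {
    0: [Detection("traffic_light", 0.03)],  # below P1: treated as noise
    1: [Detection("traffic_light", 0.43)],  # inside the band: candidate
    2: [Detection("traffic_light", 0.95)],  # above P2: already detected well
}
selected = extract_candidates(frames)
```

In this sketch only frame 1 is kept. Frames whose regions all score above P2 are skipped because, as the description argues later, they add little to the detection accuracy of the model.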
  • the generation unit 24B generates coordinate information of an area in the candidate image Gk in which the probability that the object O is captured is equal to or higher than the first threshold P1.
  • the generation unit 24B also generates a file describing the coordinate information b1 to b4 of the vertices of the bounding box B (denoted as B1 to B3 in FIG. 4) corresponding to the area in which the object O is captured.
  • the coordinate information b1 to b4 can be defined by the two-dimensional coordinates of each vertex. However, the coordinate information is not limited to this and may instead be defined by the two-dimensional coordinates of one vertex together with the width and height from that vertex, assuming that the bounding box B is a square or a rectangle.
  • in the former case, a file describing eight values corresponding to the four vertices in two-dimensional coordinates is generated.
  • in the latter case, a file describing a total of four values, namely two values corresponding to one vertex in two-dimensional coordinates and two values indicating the width and height from that vertex, is generated.
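The two coordinate representations carry the same information for an axis-aligned box, which the following sketch illustrates. The function names, the choice of the top-left vertex as the reference vertex, and the image coordinate convention (origin at the top-left, y increasing downward) are assumptions made for this example; the text does not specify them.

```python
from typing import List, Tuple

Vertices = List[Tuple[float, float]]  # four (x, y) pairs: eight values in total


def vertices_to_xywh(vertices: Vertices) -> Tuple[float, float, float, float]:
    """Convert four axis-aligned vertices to (x, y, width, height)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


def xywh_to_vertices(x: float, y: float, width: float, height: float) -> Vertices:
    """Expand (x, y, width, height) into top-left, top-right, bottom-right, bottom-left."""
    return [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]


# Example box b1 to b4 given as four vertices (eight values).
b1_to_b4 = [(120, 40), (180, 40), (180, 160), (120, 160)]
compact = vertices_to_xywh(b1_to_b4)   # four values
restored = xywh_to_vertices(*compact)  # back to eight values
```

The four-value form is more compact, while the eight-value form can be used as-is by tools that expect explicit polygon vertices.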
  • the combining unit 24C combines the bounding box B with the candidate image Gk and displays it on the display of the output unit 23.
  • the setting unit 24D sets, via the input unit 22, whether or not the object O is shown in the bounding box B combined with the candidate image Gk.
  • the object O is shown in the bounding boxes B1 and B3 of the candidate images Gk1 and Gk3, but the object P other than the object O is shown in the bounding box B2 of the candidate image Gk2.
  • the setting unit 24D sets, via the input unit 22, "candidate images Gk1 and Gk3 are target images in which the target O is captured" (U1, U3).
  • the setting unit 24D makes a setting via the input unit 22 that "the candidate image Gk2 is not a target image in which the target O is captured" (U2).
  • the setting unit 24D can also generate an arbitrary bounding box B in the image by designating coordinates, and use that bounding box B to set that the image is a target image in which the object O is captured.
  • the updating unit 24E updates the object detection model 21M by adding, to the current teacher images Gt, an image for which it has been set whether or not the object O is captured, and readjusting the weights of the neural network.
  • FIG. 5 is a flowchart for explaining the operation of the work support device 20 according to the present embodiment.
  • a moving image GD of the surrounding environment of the image capturing device is captured by an arbitrary image capturing device.
  • these moving images GD are stored in the storage unit 21 of the work support device 20 in a timely manner (S1).
  • when the probability that the object O is captured is within the above range, the work support apparatus 20 outputs the image of that frame as a “candidate image Gk” (S4-Yes, S5). Specifically, as shown in FIGS. 2 and 3, an image in which the bounding box B is combined with the area where the object O appears is displayed on the display constituting the output unit 23.
  • the worker determines whether the candidate image Gk is a target image in which the target O is captured (S6).
  • the work support apparatus 20 accepts the setting as to whether the candidate image Gk is a target image via the input unit 22 and the setting unit 24D. For example, when the images shown in FIGS. 2 and 3 are displayed as the candidate image Gk, the “traffic light”, which is the object O, is shown in the bounding box B. The operator therefore does not need to perform additional work, and the setting that the image is a target image is accepted (S6-Yes, S7).
  • on the other hand, when the object O is not shown in the candidate image Gk, an object other than the object O has been erroneously recognized as a “traffic light” and extracted. In that case, through an operation on the bounding box B of the erroneously recognized object, such as deleting it, the setting that the image is not a target image is accepted (S6-No, S8).
  • the extraction unit 24A uses the object detection model 21M to extract from an arbitrary moving image GD, as a candidate image Gk, an image including a region in which the probability that the object O is captured is equal to or greater than the first threshold P1 and equal to or less than the second threshold P2.
  • the generation unit 24B generates coordinate information b1 to b4 of the region in the candidate image Gk in which the probability that the object O is captured is not less than the first threshold P1 and not more than the second threshold P2.
  • the work support device 20 also includes a setting unit 24D.
  • the setting unit 24D accepts, through the operator's operation of the input unit 22, a setting that the object O is captured in the bounding box B or that the object O is not captured in the bounding box B.
  • the worker can efficiently perform the work of setting whether or not the object O is imaged in the image Gdi of each frame of the moving image GD.
  • the worker only needs to confirm whether or not the object O is imaged in the area corresponding to the bounding box B in the candidate image Gk in which the object O is displayed with a certain probability of appearance.
  • the image showing the object O can be used as a new teacher image Gtq (see FIG. 4).
  • with the work support device 20, it is possible to efficiently collect a large number of teacher images Gt.
  • the extraction unit 24A extracts, as the candidate image Gk, an image including a region in which the probability that the object O is captured is equal to or higher than the first threshold P1. Images in which the object O is not captured are therefore excluded. In other words, the extraction unit 24A does not extract noise images as candidate images Gk. As a result, the work of setting whether or not an image is one in which the object O is captured becomes more efficient.
  • the extraction unit 24A extracts, as a candidate image Gk, an image in which the probability that the object O is captured is equal to or less than the second threshold value P2. As a result, it is possible to efficiently collect a new teacher image that contributes to the improvement of the detection accuracy of the object detection model 21M.
  • when a target image that can already be detected with high probability using the current teacher image group Gt1 to Gtp is added to that group as a new teacher image and the weights of the object detection model 21M are updated, such a teacher image often does not contribute to improving the detection accuracy of the object detection model 21M.
  • in contrast, when a target image that cannot be detected with high probability using the current teacher image group Gt1 to Gtp is added as a new teacher image and the weights of the object detection model 21M are updated, a significant change occurs in the feature amounts of the object O extracted from the teacher image group, and the detection accuracy of the object detection model 21M improves.
  • the work support apparatus 20 therefore extracts, as candidate images Gk, images that are not detected with high probability using the current teacher image group Gt1 to Gtp, and can thereby efficiently collect new teacher images that contribute to improving the detection accuracy of the object detection model 21M.
  • the work support device 20 further includes an updating unit 24E.
  • the updating unit 24E adds, to the current teacher images Gt, an image for which it has been set whether or not the object O is captured, adjusts the weights of the neural network, and updates the object detection model 21M.
  • the accuracy of detecting the object O in the object detection model 21M is improved according to the use of the work support device 20. As a result, it becomes possible to provide an object detection model with high detection accuracy.
  • the object detection model 21M can detect a plurality of objects.
  • the setting unit 24D can also accept a change in the setting of the object corresponding to the bounding box B. Specifically, for a candidate image that was output because the probability that a first object is captured is greater than or equal to the first threshold value, it can be set that a second object, not the first object, is captured. For example, when a bounding box B is displayed indicating that the probability that a traffic light displaying a green light is captured is greater than or equal to the first threshold value and less than or equal to the second threshold value, but a traffic light displaying a red light is actually captured in the candidate image Gk, the user can set, through the input unit 22 and the setting unit 24D, that the candidate image Gk includes a traffic light displaying a red light.
  • the extraction unit 24A may stop extracting the candidate image Gk when the amount of change from the previously extracted image is equal to or less than a predetermined amount.
  • the extraction unit 24A according to the modification A stores the previously extracted image as the reference image Gc.
  • even for an image including a region in which the probability that the object O is captured is equal to or more than the first threshold P1 and equal to or less than the second threshold P2, the extraction unit 24A does not extract the image as a candidate image Gk when the amount of change of the frame image Gdi of the moving image GD from the reference image Gc is equal to or less than a predetermined amount.
  • the work support device 20 according to this modification A executes the operation shown in the flowchart of FIG.
  • steps T1 to T4, T6 to T8, and T10 to T12 execute the same processing as steps S1 to S9 described above, respectively.
  • steps T5 and T9 are added.
  • in step T5, the candidate image Gk is extracted only when the amount of change from the reference image Gc is larger than the predetermined amount.
  • in step T9, when a target image is newly set, that target image is set as the new reference image Gc.
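Steps T5 and T9 can be sketched as a small filter around a stored reference image. The text does not say how the amount of change is measured, so the mean absolute per-pixel difference used here is an assumption, as are the `ChangeFilter` class, its method names, and the representation of an image as a flat sequence of pixel intensities.

```python
from typing import Optional, Sequence


def mean_abs_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed change metric: mean absolute per-pixel difference."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class ChangeFilter:
    """Skips frames that are too similar to the reference image Gc."""

    def __init__(self, min_change: float) -> None:
        self.min_change = min_change  # the "predetermined amount"
        self.reference: Optional[Sequence[float]] = None

    def should_extract(self, frame: Sequence[float]) -> bool:
        """Step T5: extract only if the change from Gc exceeds the amount."""
        if self.reference is None:
            return True  # no reference yet, so nothing to compare against
        return mean_abs_difference(frame, self.reference) > self.min_change

    def set_reference(self, frame: Sequence[float]) -> None:
        """Step T9: a newly set target image becomes the new reference Gc."""
        self.reference = frame


change_filter = ChangeFilter(min_change=5.0)
frame_a = [10.0, 10.0, 10.0, 10.0]
frame_b = [11.0, 10.0, 12.0, 10.0]  # mean change 0.75 from frame_a
frame_c = [40.0, 10.0, 60.0, 10.0]  # mean change 20.0 from frame_a

decisions = []
for frame in (frame_a, frame_b, frame_c):
    keep = change_filter.should_extract(frame)
    decisions.append(keep)
    if keep:
        # Assume the worker confirmed this frame as a target image.
        change_filter.set_reference(frame)
```

Here `frame_b` is skipped as a near-duplicate of `frame_a`, while `frame_c` differs enough to be extracted. In practice this check would be combined with the probability thresholds P1 and P2.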
  • the work support apparatus 20 does not collect the candidate images Gk that do not contribute to the improvement in the detection accuracy of the object detection model 21M.
  • an image in which the amount of change from the reference image Gc is less than or equal to a predetermined amount is an image similar to the reference image Gc. Therefore, even if such an image is added as a new teacher image to the current teacher image group Gt1 to Gtp and the weights of the object detection model 21M are updated, there is often no significant change in the feature amounts of the object O extracted from the current teacher image group Gt1 to Gtp. That is, such a teacher image often does not contribute to the improvement of the detection accuracy of the object detection model 21M.
  • the object detection model 21M can be quickly constructed while reducing the calculation load.
  • the work support device 20 according to the modification A can efficiently collect the candidate images Gk that contribute to the improvement of the detection accuracy of the object detection model 21M.
  • the object detection model 21M may be constructed by a neural network that performs region extraction of the object O and object recognition of the object O end-to-end. With such a configuration, the object O can be detected at high speed and in real time.
  • here, “end-to-end” means, as shown conceptually in FIG. 7A, that the input/output relationship for the processing of region extraction of the object O and object recognition of the object O is learned directly by a neural network having an appropriate structure.
  • an object detection model 21M can be realized by using an algorithm such as YOLO (You Only Look Once) or SSD (Single Shot Multi Box Detector).
  • the object detection model 21M is not limited to this and, as shown conceptually in FIG. 7B, may be constructed by a combination of algorithms and neural networks that separately perform region extraction of the object O and object recognition of the object O.
  • the work support device 20 may combine, with the candidate image Gk, an image showing the type of the object and the value of the probability that the object O is captured, and output the result. Accordingly, the worker can easily recognize what the object O shown in the candidate image Gk is.
  • in the example of FIG. 8, the display indicates that the image corresponding to the bounding box B is a traffic light displaying a red light (denoted as Red_light in FIG. 8), and that the probability that it is a traffic light displaying a red light is 43.21%. This information is displayed in the area indicated by the symbol M, near the corresponding bounding box B.
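The text placed near the bounding box can be produced with a one-line formatter. The function name and the exact layout of the string are assumptions for illustration; the source only states that the object type (Red_light) and the probability (43.21%) are shown.

```python
def format_overlay_label(object_type: str, probability: float) -> str:
    """Build the text shown near a bounding box, e.g. 'Red_light 43.21%'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # The ".2%" format multiplies by 100 and keeps two decimal places.
    return f"{object_type} {probability:.2%}"


label_text = format_overlay_label("Red_light", 0.4321)
```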
  • the work support device 20 may use the object detection model 21M to extract from an arbitrary moving image GD, as candidate images Gk, images including a region in which the probability that the object O is captured is equal to or higher than the first threshold value and equal to or lower than the second threshold value, and store them in folders divided for each object. Further, the work support device 20 may output the candidate images Gk stored in each folder together with the bounding box B.
  • by opening a folder in which a plurality of candidate images showing a traffic light displaying a green light are accumulated and continuously outputting the images in that folder, an operator can efficiently determine whether or not a traffic light displaying a green light is captured in those candidate images. Further, when continuously checking the images in the folder, the operator can display the next image by clicking the icon indicated by symbol I2 in FIG. 8. Here, when the icon I2 is clicked, the next image is displayed and, at the same time, the candidate image Gk being displayed is set as including a traffic light displaying a green light. In short, the worker can perform the annotation work for generating the teacher images Gt used for the object detection model 21M simply by clicking the icon I2 while continuously checking the images. Note that the symbol I1 in FIG. 8 denotes an icon for returning to the previous image; when the icon I1 is clicked, the previously displayed candidate image is displayed.
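The per-object folders and the click-through review can be sketched as below. Everything named here is hypothetical: the `candidates` root folder, the `group_by_object` helper, and the `ReviewSession` class with its `click_next` (icon I2) and `click_back` (icon I1) methods. The sketch only models the bookkeeping and does not touch the file system or draw a screen.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def group_by_object(
    candidates: List[Tuple[str, str]], root: str = "candidates"
) -> Dict[PurePosixPath, List[str]]:
    """Group (file_name, label) pairs into one folder path per object label."""
    folders: Dict[PurePosixPath, List[str]] = defaultdict(list)
    for file_name, label in candidates:
        folders[PurePosixPath(root) / label].append(file_name)
    return dict(folders)


class ReviewSession:
    """Models an operator stepping through the images of one folder."""

    def __init__(self, file_names: List[str], label: str) -> None:
        self.file_names = list(file_names)
        self.label = label
        self.position = 0
        self.confirmed: Dict[str, str] = {}

    def current(self) -> str:
        return self.file_names[self.position]

    def click_next(self) -> None:
        """Icon I2: confirm the displayed image, then show the next one."""
        self.confirmed[self.current()] = self.label
        if self.position < len(self.file_names) - 1:
            self.position += 1

    def click_back(self) -> None:
        """Icon I1: return to the previously displayed image."""
        if self.position > 0:
            self.position -= 1


folders = group_by_object(
    [("f1.png", "Green_light"), ("f2.png", "Red_light"), ("f3.png", "Green_light")]
)
green_folder = PurePosixPath("candidates") / "Green_light"
session = ReviewSession(folders[green_folder], "Green_light")
session.click_next()  # confirms f1.png and moves to f3.png
session.click_next()  # confirms f3.png (last image, position stays)
```

Because every image in a folder shares the same predicted label, one click per image is enough to both confirm the annotation and advance, which is the efficiency gain the passage describes.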
  • the present disclosure is not limited to the above embodiments as they are.
  • the present disclosure can be embodied by modifying the constituent elements within the scope not departing from the gist of the present invention at the implementation stage.
  • the present disclosure can form various disclosures by appropriately combining a plurality of constituent elements disclosed in each of the above-described embodiments. For example, some components may be deleted from all the components shown in the embodiment. Further, the constituent elements may be appropriately combined with different embodiments.
  • the work support apparatus of the first aspect is a work support apparatus that supports a work for setting whether or not an object is imaged in a region within an image.
  • the work support device includes an extraction unit and a generation unit.
  • the extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or higher than a first threshold value.
  • the generation unit generates coordinate information of an area in the candidate image in which the probability that the object is captured is equal to or higher than the first threshold value.
  • the work support apparatus of the second aspect is the work support apparatus of the first aspect, in which the object detection model is constructed by a neural network that performs region extraction of the object and object recognition of the object end-to-end. With such a configuration, it is possible to speed up the detection of the object.
  • the work support apparatus of the third aspect is the work support apparatus of the first or second aspect, in which the extraction unit extracts, as a candidate image, an image including a region in which the probability that the object is captured is equal to or less than a second threshold value. With such a configuration, it is possible to efficiently collect candidate images that contribute to improving the detection accuracy of the object detection model.
  • the work support apparatus of the fourth aspect is the work support apparatus of any of the first to third aspects, in which the extraction unit stops extracting candidate images when the amount of change from the previously extracted candidate image is less than or equal to a predetermined amount. This stops the collection of candidate images that do not contribute to improving the detection accuracy of the object detection model, so that candidate images that do contribute can be collected efficiently.
  • the work support apparatus of the fifth aspect is the work support apparatus of any of the first to fourth aspects, and further includes an updating unit.
  • the updating unit updates the object detection model by adding, to the teacher images, an image for which it has been set whether or not the object is captured. With such a configuration, candidate images that contribute to improving the detection accuracy of the object can be collected efficiently as the apparatus is used.
  • the object detection model of the sixth aspect is the object detection model updated by the work support apparatus of the fifth aspect. Therefore, an object detection model with high detection accuracy can be provided.
  • the program according to the seventh aspect causes a computer to function as a work support device that supports a work for setting whether or not an object is shown in an area in an image.
  • This program causes a computer to function as an extraction unit and a generation unit.
  • the extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value.
  • the generation unit generates coordinate information of a region in the candidate image in which the probability that the object is captured is equal to or higher than the first threshold value.
  • the work support method of the eighth aspect is a method of using a computer to support work for setting whether or not an object is shown in an area within an image.
  • in this work support method, an object detection model constructed using teacher images in which the object is captured is used to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or higher than a first threshold value.
  • coordinate information of the region in the candidate image in which the probability that the object is captured is equal to or higher than the first threshold value is then generated. Therefore, according to this work support method, whether or not an image is a target image in which the object is captured can be set efficiently. As a result, a large number of teacher images can be collected efficiently.
  • the work support apparatus of the ninth aspect includes an extraction unit and an output unit.
  • the extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value. Further, the output unit outputs an image in which a bounding box corresponding to the region in which the probability that the object is captured is equal to or higher than the first threshold value is combined with the candidate image.
  • the work support apparatus of the tenth aspect is the work support apparatus of the ninth aspect, and further includes a setting unit that accepts a setting that the object is captured in the bounding box or that the object is not captured in the bounding box. Through this setting unit, whether or not an image is a target image in which the object is captured can be set efficiently.
  • the work support apparatus of the eleventh aspect is the work support apparatus of the ninth or tenth aspect, and the object detection model detects a plurality of objects.
  • the work support device further includes a setting unit that accepts, for an image combined with a bounding box corresponding to a region in which the probability that a first object is captured is equal to or higher than the first threshold value, a setting that a second object is captured instead of the first object. With such a configuration, correction can be easily performed when the object is erroneously detected.
  • the work support apparatus of the twelfth aspect is the work support apparatus of any of the ninth to eleventh aspects, and the object detection model detects a plurality of objects.
  • the work support device uses the object detection model to extract from an arbitrary moving image, as candidate images, images including a region in which the probability that an object is captured is equal to or higher than the first threshold value, and stores them in folders divided for each object. By continuously displaying the images accumulated in each folder, the operator only needs to confirm whether or not the object is captured, which reduces the operator's burden.
  • the work support apparatus of the thirteenth aspect is the work support apparatus of any of the ninth to twelfth aspects, and combines the value of the probability that the object is captured with the candidate image and outputs the result. This allows the operator to easily recognize what the object in the candidate image is.
  • the work support apparatus of the fourteenth aspect is the work support apparatus of any of the ninth to thirteenth aspects, in which the extraction unit extracts, as a candidate image, an image including a region in which the probability that the object is captured is equal to or less than the second threshold value. With such a configuration, it is possible to efficiently collect candidate images that contribute to improving the detection accuracy of the object detection model.
  • the work support apparatus of the fifteenth aspect is the work support apparatus of any of the ninth to fourteenth aspects, in which the extraction unit stops extracting candidate images when the amount of change from the previously extracted candidate image is less than or equal to a predetermined amount. As a result, it is possible to efficiently collect candidate images that contribute to improving the detection accuracy of the object detection model.
  • the work support apparatus of the sixteenth aspect is the work support apparatus of any of the ninth to fifteenth aspects, and further includes an updating unit that updates the object detection model by adding, to the teacher images, an image for which it has been set whether or not the object is captured. With such a configuration, candidate images that contribute to improving the detection accuracy of the object can be collected efficiently as the apparatus is used.
  • the program of the seventeenth aspect causes a computer to function as an extraction unit and an output unit.
  • the extraction unit uses an object detection model constructed using teacher images in which the object is captured to extract, from an arbitrary moving image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value as a candidate image.
  • the output unit outputs an image in which a bounding box corresponding to the region in which the probability that the object is captured is equal to or greater than the first threshold value is combined with the candidate image.
  • the work support method is a method of using a computer to support work for setting whether or not an object (O) is imaged in a region within an image.
  • an object detection model constructed using teacher images in which the object is captured is used to extract, from an arbitrary moving image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value as a candidate image.
  • an image in which a bounding box corresponding to the region in which the probability that the object is captured is equal to or greater than the first threshold value is combined with the candidate image is output. Therefore, according to this work support method, by keeping or deleting the displayed bounding box, it can be set efficiently whether or not the image is an object image in which the object is captured. As a result, a large number of teacher images can be collected efficiently.

Abstract

The present invention can efficiently set whether an image is one in which an object appears. A work assistance device 20 is provided with an extraction unit 24A and an output unit 23. The extraction unit 24A uses an object detection model 21M, which is constructed using a teaching image Gt in which an object O appears, to extract as a candidate image Gk from an arbitrary moving image GD an image containing a region for which the probability that the object appears is equal to or greater than a first threshold value. The output unit 23 outputs an image in which a bounding box B corresponding to the region for which the probability that the object appears is equal to or greater than the first threshold value has been combined with the relevant candidate image.

Description

Work support device, work support method, program, and object detection model
The present disclosure relates to a work support device, a work support method, a program, and an object detection model.
Methods of generating teacher images for machine learning have been studied. For example, Patent Document 1 (Japanese Patent Laid-Open No. 2018-169672) discloses a technique that identifies a deficient pattern, that is, a pattern for which few teacher images exist, and generates new teacher images belonging to that pattern by spatially flipping an existing teacher image or changing its color tone.
As described in Patent Document 1, it is desirable to collect a large number of teacher images for effective learning.
The work support device according to a first aspect of the present disclosure supports the work of setting whether or not an object is captured in a region within an image. The work support device includes an extraction unit and a generation unit. The extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value. The generation unit generates coordinate information of the region in the candidate image in which the probability that the object is captured is equal to or greater than the first threshold value. With such a configuration, by designating or cancelling the coordinate region in the candidate image, it can be set efficiently whether or not the candidate image is an object image in which the object is captured. As a result, a large number of teacher images can be collected efficiently.
The work support device according to a second aspect of the present disclosure includes an extraction unit and an output unit. The extraction unit uses an object detection model, constructed using teacher images in which the object is captured, to extract from an arbitrary moving image, as a candidate image, an image including a region in which the probability that the object is captured is equal to or greater than a first threshold value. The output unit outputs an image in which a bounding box corresponding to the region in which the probability that the object is captured is equal to or greater than the first threshold value is combined with the candidate image. With such a configuration, by keeping or deleting the displayed bounding box, it can be set efficiently whether or not the candidate image is an object image in which the object is captured. As a result, a large number of teacher images can be collected efficiently.
FIG. 1 is a schematic diagram showing the configuration of the work support device 20.
FIG. 2 is a diagram showing an example of the candidate image Gk.
FIG. 3 is a partially enlarged view of FIG. 2.
FIG. 4 is a schematic diagram for explaining the concept of information processing by the processing unit 24.
FIG. 5 is a flowchart for explaining the operation of the work support device 20.
FIG. 6 is a flowchart for explaining the operation of the work support device 20 according to Modification A.
FIG. 7 is a schematic diagram for explaining end-to-end learning according to Modification B.
FIG. 8 is a schematic diagram showing an example of the display screen of the work support device 20 according to Modifications C and D.
(1) Configuration of Work Support Device
The configuration of a work support device according to an embodiment of the present disclosure is described below with reference to the drawings.
FIG. 1 is a schematic diagram showing the configuration of the work support device 20 according to the present embodiment. FIG. 2 is a diagram showing an example of the candidate image Gk output by the work support device 20. FIG. 3 is a partially enlarged view of the broken-line portion of FIG. 2. FIG. 4 is a schematic diagram for explaining the concept of information processing by the processing unit 24, which is described later.
The work support device 20 is a device that supports the work of generating the teacher images Gt used for the object detection model 21M. In the following description, teacher images are denoted Gt when referred to collectively, and with a subscript, as in Gt1, when individual teacher images are distinguished.
The "object detection model 21M" is constructed from a neural network whose weights have been adjusted based on teacher images Gt in which the object O is captured, and performs both region extraction of the object O in an image and object recognition of the object O. Specifically, when an image is input, the object detection model 21M calculates the probability that the object O appears in the image and, if the calculated probability is equal to or greater than a predetermined value, outputs the region in which the object O appears.
The object detection model 21M can detect a plurality of preset objects.
As shown in FIGS. 2 and 3, the region of the object O is defined by coordinate information b1 to b4 corresponding to the four vertices of the bounding box B combined into the image Gk. Accordingly, a teacher image Gt for the object detection model 21M is an image in which the object O is captured in the region corresponding to the coordinate information b1 to b4 of the bounding box B.
In the examples of FIGS. 2 and 3, a "traffic light" is shown as the object O, but the object O is not limited to this; any object can be adopted. The object O can also be set so as to distinguish not only the type of object but also its state. For example, rather than simply setting "traffic light" as the object O, a traffic light showing a red signal and a traffic light showing a green signal can be set as distinct objects.
The work support device 20 can be realized by any computer, and includes a storage unit 21, an input unit 22, an output unit 23, and a processing unit 24. The work support device 20 may also be realized as hardware using an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or the like.
The storage unit 21 stores various kinds of information and is realized by an arbitrary storage device such as a memory or a hard disk. Here, the storage unit 21 stores information such as the weights of the neural network that constitutes the object detection model 21M. The storage unit 21 also stores a plurality of teacher images Gt; in the initial state it stores teacher images Gt1 to Gtp (p is a natural number other than 1). In addition, the storage unit 21 stores a moving image GD used to generate a new teacher image Gtq (q is a value other than 1 to p). The moving image GD is captured by an arbitrary imaging device and consists of still images Gdi of a plurality of frames (i = 1 to j, where j is a natural number).
The input unit 22 is realized by an arbitrary input device such as a keyboard, a mouse, or a touch panel, and inputs various kinds of information to the computer.
The output unit 23 is realized by an arbitrary output device such as a display, a touch panel, or a speaker, and outputs various kinds of information from the computer.
The processing unit 24 executes various kinds of information processing and is realized by a processor such as a CPU or GPU and a memory. Here, one or more programs stored in the storage unit 21 are loaded into the CPU, GPU, or the like of the computer, whereby the processing unit 24 functions as an extraction unit 24A, a generation unit 24B, a combining unit 24C, a setting unit 24D, and an updating unit 24E. Each function of the processing unit 24 is described below with reference to FIG. 4.
The extraction unit 24A uses the object detection model 21M to extract, from the images Gd1 to Gdj of the frames of an arbitrary moving image GD, images that include a region in which the probability that the object O is captured is equal to or greater than a first threshold P1 and equal to or less than a second threshold P2, as candidate images Gk (denoted Gk1 to Gk3 in FIG. 4). Here, the first threshold P1 is set to about 10% and the second threshold P2 to about 60%.
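The two-threshold rule applied by the extraction unit 24A can be sketched as follows. This is a minimal illustration, not the implementation in the disclosure: the `Detection` record, the function names, and the representation of a frame as a list of detections are all assumptions, and the detector itself is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

P1 = 0.10  # first threshold (about 10% in the text)
P2 = 0.60  # second threshold (about 60% in the text)


@dataclass
class Detection:
    """Hypothetical record for one detected region."""
    label: str
    prob: float                      # probability that the object is captured
    box: Tuple[int, int, int, int]   # (x, y, width, height)


def is_candidate(detections: Iterable[Detection],
                 p1: float = P1, p2: float = P2) -> bool:
    """True if any region has a probability within [p1, p2]."""
    return any(p1 <= d.prob <= p2 for d in detections)


def extract_candidates(frames: Iterable[Tuple[int, List[Detection]]]) -> List[int]:
    """Return the indices of frames that qualify as candidate images."""
    return [index for index, detections in frames if is_candidate(detections)]
```

Under these assumptions, a frame whose only detection has probability 0.05 or 0.95 is skipped, while a frame containing a detection at 0.43 is kept as a candidate.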
The generation unit 24B generates coordinate information of the region in the candidate image Gk in which the probability that the object O is captured is equal to or greater than the first threshold P1. The generation unit 24B also generates a file describing the coordinate information b1 to b4 of the vertices of the bounding box B (denoted B1 to B3 in FIG. 4) corresponding to the region in which the object O is captured.
The coordinate information b1 to b4 can be defined by the two-dimensional coordinates of each vertex. Alternatively, on the premise that the bounding box B is a square or a rectangle, it can be defined by the two-dimensional coordinates of one vertex together with the width and height measured from that vertex. In the former case, the generated file describes eight values corresponding to the four vertices in two-dimensional coordinates. In the latter case, the generated file describes four values in total: two values for the one vertex and two values for the width and height from it.
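The two encodings of the coordinate information can be converted into each other when the box is axis-aligned. The sketch below assumes that the reference vertex is the top-left corner and that vertices are listed clockwise from it; the source does not fix either convention, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def corners_from_xywh(x: int, y: int, w: int, h: int) -> List[Point]:
    """Four-value form (one vertex plus width and height) to four vertices."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def xywh_from_corners(corners: List[Point]) -> Tuple[int, int, int, int]:
    """Four vertices of an axis-aligned box to the four-value form."""
    xs = [cx for cx, _ in corners]
    ys = [cy for _, cy in corners]
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x, max(ys) - y)


def flatten(corners: List[Point]) -> List[int]:
    """Eight-value form: x and y of each of the four vertices in order."""
    return [value for point in corners for value in point]
```

The four-value form is more compact, while the eight-value form does not depend on the assumption that the box is a square or a rectangle.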
The combining unit 24C combines the bounding box B with the candidate image Gk and displays the result on the display of the output unit 23.
The setting unit 24D sets, via the input unit 22, whether or not the object O is captured in the bounding box B combined with the candidate image Gk. In the example shown in FIG. 4, the object O is captured in the bounding boxes B1 and B3 of the candidate images Gk1 and Gk3, whereas an object P other than the object O is captured in the bounding box B2 of the candidate image Gk2. In this case, the setting unit 24D sets, via the input unit 22, that "the candidate images Gk1 and Gk3 are object images in which the object O is captured" (U1, U3). The setting unit 24D also sets, via the input unit 22, that "the candidate image Gk2 is not an object image in which the object O is captured" (U2).
The setting unit 24D can also generate an arbitrary bounding box B in an image by designating coordinates, and use that bounding box B to set that the image is an object image in which the object O is captured.
The updating unit 24E adds the images for which whether or not the object O is captured has been set to the current teacher images Gt, readjusts the weights of the neural network, and updates the object detection model 21M.
(1-2) Operation of Work Support Device
FIG. 5 is a flowchart for explaining the operation of the work support device 20 according to the present embodiment.
First, an arbitrary imaging device captures a moving image GD of its surrounding environment. The moving images GD are stored in the storage unit 21 of the work support device 20 as appropriate (S1).
Next, using the object detection model 21M constructed from the initial teacher image group Gt1 to Gtp, the work support device 20 determines whether one frame image Gdi (i = 1 to j, where j is a natural number) of the moving image GD stored in the storage unit 21 includes a region in which the probability that the object O is captured is equal to or greater than the first threshold P1 and equal to or less than the second threshold P2 (S2 to S4).
When the probability that the object O is captured is within the above range, the work support device 20 outputs the image of that frame as a "candidate image Gk" (S4-Yes, S5). Specifically, as shown in FIGS. 2 and 3, an image in which the bounding box B is combined with the region where the object O is captured is displayed on the display of the output unit 23.
The operator then judges whether the candidate image Gk is an object image in which the object O is captured (S6). At this time, the work support device 20 accepts, via the input unit 22 and the setting unit 24D, the setting of whether or not the candidate image Gk is an object image. For example, when an image such as that shown in FIGS. 2 and 3 is displayed as the candidate image Gk, the "traffic light" that is the object O is captured in the bounding box B, so the setting that the image is an object image can be accepted without requiring any additional work from the operator (S6-Yes, S7). On the other hand, when the object O is not captured in the candidate image Gk, an object other than the object O has been misrecognized as a "traffic light" and extracted. In that case, the operator is made to delete the bounding box B of the misrecognized object, after which the setting that the image is not an object image is accepted (S6-No, S8).
Thereafter, the above processing is repeated frame by frame until the image Gdj of the last frame of the moving image GD is reached (S9, S10).
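The flow of steps S2 to S10 can be summarized as a loop over frames. The sketch below is an assumption-laden outline: `detect` stands in for the object detection model 21M, `ask_operator` stands in for the operator's decision received through the input unit 22 and setting unit 24D, and both names and the dictionary layout of a region are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Region = Dict[str, object]  # hypothetical: {"label": str, "prob": float, "box": tuple}


def review_frames(
    frames: Sequence[object],
    detect: Callable[[object], List[Region]],
    ask_operator: Callable[[int, List[Region]], bool],
    p1: float = 0.10,
    p2: float = 0.60,
) -> List[Tuple[int, List[Region], bool]]:
    """Walk through every frame and record the operator's setting for candidates."""
    results = []
    for index, frame in enumerate(frames):              # S2, S9, S10: iterate frames
        regions = [r for r in detect(frame)
                   if p1 <= r["prob"] <= p2]            # S3, S4: probability in range?
        if not regions:
            continue                                    # S4-No: not a candidate image
        if ask_operator(index, regions):                # S5, S6: display and judge
            results.append((index, regions, True))      # S7: object image, boxes kept
        else:
            results.append((index, [], False))          # S8: boxes deleted, not an object image
    return results
```

In this outline a confirmed candidate needs no further operator input, which mirrors the point made above that keeping the displayed bounding box is enough to accept the setting.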
(3) Features of Work Support Device
(3-1)
As described above, in the work support device 20 according to the present embodiment, the extraction unit 24A uses the object detection model 21M to extract, from an arbitrary moving image GD, an image including a region in which the probability that the object O is captured is equal to or greater than the first threshold P1 and equal to or less than the second threshold P2 as a candidate image Gk. The generation unit 24B generates the coordinate information b1 to b4 of the region in the candidate image Gk in which the probability that the object O is captured is equal to or greater than the first threshold P1 and equal to or less than the second threshold P2. The combining unit 24C generates the bounding box B corresponding to the coordinate information b1 to b4 and combines it with the candidate image Gk. The candidate image Gk with the bounding box B is then displayed on the display of the output unit 23.
The work support device 20 also includes the setting unit 24D. Through the operator's operation of the input unit 22, the setting unit 24D accepts the setting that the object O is captured in the bounding box B, or that the object O is not captured in the bounding box B.
Therefore, by using such a work support device 20, the operator can efficiently carry out the work of setting whether or not the object O is captured in the image Gdi of each frame of the moving image GD. Specifically, the operator only needs to check whether the object O is captured in the region corresponding to the bounding box B in a candidate image Gk, in which the object O appears with a certain probability. Images in which the object O is captured can then be used as new teacher images Gtq (see FIG. 4). In short, using the work support device 20 makes it possible to collect a large number of teacher images Gt efficiently.
(3-2)
In particular, because the extraction unit 24A extracts as candidate images Gk only images that include a region in which the probability that the object O is captured is equal to or greater than the first threshold P1, images in which the object O is not captured are excluded. In other words, the extraction unit 24A avoids extracting noise images as candidate images Gk. This makes the work of setting whether or not an image captures the object O more efficient.
(3-3)
The extraction unit 24A also extracts, as candidate images Gk, images in which the probability that the object O is captured is equal to or less than the second threshold P2. This makes it possible to efficiently collect new teacher images that contribute to improving the detection accuracy of the object detection model 21M.
To elaborate, suppose an object image that can already be detected with high probability using the current teacher image group Gt1 to Gtp is added to that group as a new teacher image and the weights of the object detection model 21M are updated. In many cases this produces no significant change in the feature quantities of the object O extracted from the current teacher image group Gt1 to Gtp. That is, such a teacher image often does not contribute to improving the detection accuracy of the object detection model 21M. In contrast, when an object image that cannot be detected with high probability using the current teacher image group Gt1 to Gtp is added to that group as a new teacher image Gtq and the weights of the object detection model 21M are updated, a significant change occurs in the feature quantities of the object O extracted from the current teacher image group Gt1 to Gtp, and the detection accuracy of the object detection model 21M improves.
In this way, the work support device 20 according to the present embodiment extracts as candidate images Gk those images for which the current teacher image group Gt1 to Gtp does not yield high detection accuracy, and thereby efficiently collects new teacher images that contribute to improving the detection accuracy of the object detection model 21M.
(3-4)
The work support device 20 according to the present embodiment further includes the updating unit 24E. The updating unit 24E adds the images for which whether or not the object O is captured has been set to the current teacher images Gt, adjusts the weights of the neural network, and updates the object detection model 21M. With such a configuration, the accuracy with which the object detection model 21M detects the object O improves as the work support device 20 is used. As a result, an object detection model with high detection accuracy can be provided.
(3-5)
The object detection model 21M according to the present embodiment can detect a plurality of objects. In addition, the setting unit 24D can accept a change to the object associated with the bounding box B. Specifically, for a candidate image that was output because the probability that a first object is captured is equal to or greater than the first threshold, it can be set that a second object, not the first object, is captured.
For example, suppose a bounding box B is displayed indicating that the probability that a traffic light showing a green signal is captured is between the first threshold and the second threshold, but the actual candidate image Gk shows a red signal. In that case, the user can set, via the input unit 22 and the setting unit 24D, that a traffic light showing a red signal is captured in the candidate image Gk.
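A label correction of this kind can be modelled as replacing the label of an annotation while keeping its bounding box. The following sketch is purely illustrative: the `Annotation` record, its `confirmed` flag, and the label strings are assumptions rather than anything specified in the disclosure.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record tying a bounding box to an object label."""
    label: str
    box: Tuple[int, int, int, int]  # (x, y, width, height)
    confirmed: bool = False          # True once the operator has set the label


def relabel(annotation: Annotation, new_label: str) -> Annotation:
    """Return a copy with the operator's corrected label; the box is unchanged."""
    return replace(annotation, label=new_label, confirmed=True)


# Example: the model proposed a green signal, the operator corrects it to red.
proposed = Annotation(label="Green_light", box=(120, 40, 32, 64))
corrected = relabel(proposed, "Red_light")
```

Keeping the original annotation immutable makes it easy to retain both the model's proposal and the operator's correction if a record of the change is wanted.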
(4) Modifications
(4-1) Modification A
In the work support device 20 according to the present embodiment, the extraction unit 24A may stop extracting candidate images Gk when the amount of change from the previously extracted image is equal to or less than a predetermined amount. Specifically, the extraction unit 24A according to Modification A stores the previously extracted image as a reference image Gc. When the amount of change from the reference image Gc is equal to or less than the predetermined amount, the extraction unit 24A stops extracting images that include a region in which the probability that the object O is captured is equal to or greater than the first threshold P1 and equal to or less than the second threshold P2. In other words, when the extraction unit 24A extracts a candidate image Gk, that candidate image is set as the reference image Gc. When the amount of change of a frame image Gdi of the moving image GD from the reference image Gc is equal to or less than the predetermined amount, the extraction unit 24A does not extract that image as a candidate image Gk.
The work support device 20 according to Modification A executes the operation shown in the flowchart of FIG. 6. In this device, steps T1 to T4, T6 to T8, and T10 to T12 perform the same processing as steps S1 to S9 described above, and steps T5 and T9 are added. In step T5, a candidate image Gk is extracted only when the amount of change from the reference image Gc is larger than the predetermined amount. In step T9, when an image is newly set as an object image, that object image is set as the new reference image Gc.
With such a configuration, the work support device 20 according to Modification A does not collect candidate images Gk that do not contribute to improving the detection accuracy of the object detection model 21M. To elaborate, an image whose amount of change from the reference image Gc is equal to or less than the predetermined amount is similar to the reference image Gc. Even if such an image is added to the current teacher image group Gt1 to Gtp as a new teacher image and the weights of the object detection model 21M are updated, there is often no significant change in the feature quantities of the object O extracted from the current teacher image group Gt1 to Gtp. That is, such a teacher image often does not contribute to improving the detection accuracy of the object detection model 21M. Ignoring such images therefore makes it possible to construct the object detection model 21M quickly while reducing the computational load.
In other words, the work support device 20 according to Modification A can efficiently collect candidate images Gk that contribute to improving the detection accuracy of the object detection model 21M.
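The reference-image check of Modification A can be sketched as below. The source does not say how the amount of change is measured, so the mean absolute pixel difference used here is an assumption, as are the function names and the flat-list image representation. For simplicity the reference image is updated whenever a candidate is kept; in the flowchart of FIG. 6 the update happens in step T9, after the operator sets the image as an object image.

```python
from typing import List, Optional, Sequence, Tuple


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed change measure: mean absolute difference of pixel values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def filter_by_change(
    candidates: Sequence[Tuple[int, Sequence[int]]],
    min_change: float,
) -> List[int]:
    """Keep a candidate only if it differs enough from the reference image Gc."""
    reference: Optional[Sequence[int]] = None
    kept: List[int] = []
    for index, pixels in candidates:
        if reference is not None and mean_abs_diff(pixels, reference) <= min_change:
            continue          # too similar to Gc: do not extract (step T5)
        kept.append(index)
        reference = pixels    # this image becomes the new reference image Gc
    return kept
```

With a moving image from a stationary camera, a filter of this kind would drop most consecutive frames and keep only those where the scene has changed noticeably.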
(4-2) Modification B
In the work support device 20 according to the present embodiment, the object detection model 21M may be constructed by a neural network that performs region extraction of the object O and object recognition of the object O end to end. This configuration speeds up detection of the object O and allows the object O to be detected in real time.
Here, "end to end" means that a single neural network, with a structure suited to the processing of region extraction and object recognition of the object O, learns the input/output relationship directly, as shown conceptually in FIG. 7(a). Such an object detection model 21M can be realized with an algorithm such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector).
However, the object detection model 21M is not limited to this. As shown conceptually in FIG. 7(b), it may instead be constructed by combining an algorithm and a neural network that perform region extraction of the object O and object recognition of the object O separately.
(4-3) Modification C
As shown in FIG. 8, the work support device 20 according to the present embodiment may combine, with the candidate image Gk, an image that shows the type of the object and the value of the probability that the object O is shown, and output the result. This lets the worker easily recognize what the object O shown in the candidate image Gk is. In FIG. 8, for example, the area indicated by symbol M states that the image corresponding to the bounding box B is a traffic light displaying a red signal (written as Red_light in FIG. 8) and that the probability of it being a traffic light displaying a red signal is 43.21%. The area indicated by symbol M is displayed near the corresponding bounding box B.
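The text shown in area M can be sketched as follows. The function names, the box layout `(x_min, y_min, x_max, y_max)`, and the choice to place the label just above the box are assumptions made for illustration; the source only states that area M appears near the bounding box B.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def format_label(object_type: str, probability: float) -> str:
    """Build the label text, e.g. the object type and its probability in percent."""
    return f"{object_type} {probability * 100:.2f}%"


def label_anchor(box: Box, margin: int = 4) -> Tuple[int, int]:
    """Pick a position just above the box's top-left corner, clamped to the image."""
    x_min, y_min, _, _ = box
    return (x_min, max(0, y_min - margin))
```

For the example of FIG. 8, `format_label("Red_light", 0.4321)` yields the text `Red_light 43.21%`.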
(4-4) Modification D
The work support device 20 according to the present embodiment may use the object detection model 21M to take, from an arbitrary moving image GD, images that include a region in which the probability that the object O is shown is equal to or greater than the first threshold and equal to or less than the second threshold, and store them as candidate images Gk in folders divided by object. The work support device 20 may further output the candidate images Gk stored in each folder together with the bounding box B.
This makes it possible to judge efficiently whether or not the object is shown in a candidate image Gk. To elaborate, the images accumulated in each folder are associated with one predetermined object, so when the worker displays the images in a folder one after another, the worker only has to check whether or not that object is shown.
For example, the worker opens a folder that holds a number of candidate images expected to show a traffic light displaying a green signal, and outputs the images in that folder one after another. The worker can then judge efficiently whether each candidate image shows a traffic light displaying a green signal. When checking the images in the folder in sequence, the worker can display the next image by using the pointer P to click the icon that means "go to next", indicated by symbol I2 in FIG. 8. Clicking icon I2 displays the next image and, at the same time, sets that the candidate image Gk currently displayed shows a traffic light displaying a green signal. In short, the worker can carry out the annotation work that generates the teacher images Gt used for the object detection model 21M simply by clicking icon I2 while checking the images in sequence. Symbol I1 in FIG. 8 is the icon that means "go back"; when icon I1 is clicked, the previously displayed candidate image is displayed again.
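The review flow with icons I1 and I2 can be sketched as a small state holder. The class name `FolderReview` and its fields are hypothetical. The source does not say whether going back with I1 cancels an earlier confirmation, so this sketch assumes the confirmation is kept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FolderReview:
    """Sequential review of the candidate images stored in one object folder."""
    label: str                      # object type tied to the folder
    images: List[str]               # candidate image identifiers, in display order
    index: int = 0                  # position of the image currently displayed
    confirmed: Dict[str, str] = field(default_factory=dict)

    def current(self) -> Optional[str]:
        """Return the displayed image, or None once the folder is exhausted."""
        return self.images[self.index] if self.index < len(self.images) else None

    def click_next(self) -> Optional[str]:
        """Icon I2: confirm the displayed image as showing the object, then advance."""
        image = self.current()
        if image is None:
            return None
        self.confirmed[image] = self.label
        self.index += 1
        return self.current()

    def click_back(self) -> Optional[str]:
        """Icon I1: show the previously displayed image again."""
        if self.index > 0:
            self.index -= 1
        return self.current()
```

Each call to `click_next` performs one annotation, so reviewing a folder of n images takes n clicks when every image does show the object.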
When a candidate image Gk shows a plurality of objects of the same type, it is stored as it is in the folder corresponding to that object. When a candidate image Gk shows a plurality of objects of different types, it is stored in a folder that indicates an exception.
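The folder rule of Modification D can be sketched as below. The function name, the `(object type, probability)` tuple layout, and the folder name `"exception"` are hypothetical. The sketch also assumes that only detections whose probability lies between the two thresholds count when deciding whether an image shows different object types; the source does not settle that point.

```python
from typing import List, Optional, Tuple

# Assumed detection layout: (object type, probability that the object is shown).
Detection = Tuple[str, float]


def choose_folder(
    detections: List[Detection],
    first_threshold: float,
    second_threshold: float,
    exception_folder: str = "exception",
) -> Optional[str]:
    """Return the folder for a frame, or None if the frame is not a candidate."""
    # Keep only object types detected with a probability inside the band.
    in_band = {
        object_type
        for object_type, probability in detections
        if first_threshold <= probability <= second_threshold
    }
    if not in_band:
        return None                  # not extracted as a candidate image
    if len(in_band) == 1:
        return next(iter(in_band))   # one type, possibly several instances
    return exception_folder          # several different types in one image
```

Two red traffic lights in one frame go to the `Red_light` folder, while a frame with one red and one green light goes to the exception folder.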
The present disclosure is not limited to the embodiments described above as they are. At the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the disclosure. Various disclosures can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in an embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.
The embodiments described above also disclose the configurations of the following aspects.
The work support device of the first aspect supports work for setting whether or not an object is shown in a region within an image. The work support device includes an extraction unit and a generation unit. The extraction unit uses an object detection model, constructed using teacher images in which the object is shown, to extract from an arbitrary moving image, as a candidate image, an image that includes a region in which the probability that the object is shown is equal to or greater than a first threshold. The generation unit generates coordinate information of the region in the candidate image in which that probability is equal to or greater than the first threshold. With this configuration, whether or not an image is an object image showing the object can be set efficiently by designating or cancelling the coordinate region in the candidate image. As a result, a large number of teacher images can be collected efficiently.
The work support device of the second aspect is the work support device of the first aspect, in which the object detection model is constructed by a neural network that performs region extraction and object recognition of the object end to end. This configuration speeds up detection of the object.
The work support device of the third aspect is the work support device of the first or second aspect, in which the extraction unit extracts, as the candidate image, an image including a region in which the probability that the object is shown is equal to or less than a second threshold. With this configuration, candidate images that contribute to improving the detection accuracy of the object detection model can be collected efficiently.
The work support device of the fourth aspect is the work support device of any of the first to third aspects, in which the extraction unit stops extracting candidate images when the amount of change from the previously extracted candidate image is equal to or less than a predetermined amount. This stops the collection of candidate images that do not contribute to improving the detection accuracy of the object detection model. As a result, candidate images that do contribute to improving that accuracy can be collected efficiently.
The work support device of the fifth aspect is the work support device of any of the first to fourth aspects, further including an updating unit. The updating unit adds to the teacher images an image for which it has been set whether or not the object is shown, and updates the object detection model. With this configuration, candidate images that contribute to improving the detection accuracy for the object can be collected efficiently as the device is used.
The object detection model of the sixth aspect is an object detection model updated by the work support device of the fifth aspect. An object detection model with high detection accuracy can therefore be provided.
The program of the seventh aspect causes a computer to function as a work support device that supports work for setting whether or not an object is shown in a region within an image. The program causes the computer to function as an extraction unit and a generation unit. The extraction unit uses an object detection model, constructed using teacher images in which the object is shown, to extract from an arbitrary moving image, as a candidate image, an image that includes a region in which the probability that the object is shown is equal to or greater than a first threshold. The generation unit generates coordinate information of the region in the candidate image in which that probability is equal to or greater than the first threshold. With this configuration, whether or not an image is an object image showing the object can be set efficiently by designating or cancelling the coordinates in the candidate image. As a result, a large number of teacher images can be collected efficiently.
The work support method of the eighth aspect uses a computer to support work for setting whether or not an object is shown in a region within an image. In this method, an object detection model constructed using teacher images in which the object is shown is used to extract from an arbitrary moving image, as a candidate image, an image that includes a region in which the probability that the object is shown is equal to or greater than a first threshold. The method then generates coordinate information of the region in the candidate image in which that probability is equal to or greater than the first threshold. This method therefore makes it possible to set efficiently whether or not an image is an object image showing the object. As a result, a large number of teacher images can be collected efficiently.
The work support device of the ninth aspect includes an extraction unit and an output unit. The extraction unit uses an object detection model, constructed using teacher images in which the object is shown, to extract from an arbitrary moving image, as a candidate image, an image that includes a region in which the probability that the object is shown is equal to or greater than a first threshold. The output unit outputs an image in which a bounding box corresponding to that region is combined with the candidate image. With this configuration, whether or not an image is an object image showing the object can be set efficiently by keeping or deleting the displayed bounding box. As a result, a large number of teacher images can be collected efficiently.
The work support device of the tenth aspect is the work support device of the ninth aspect, further including a setting unit that accepts a setting that the object is shown in the bounding box or a setting that the object is not shown in the bounding box. Through this setting unit, whether or not an image is an object image showing the object can be set efficiently.
The work support device of the eleventh aspect is the work support device of the ninth or tenth aspect, in which the object detection model detects a plurality of objects. The work support device further includes a setting unit that accepts, for an image combined with a bounding box corresponding to a region in which the probability that a first object is shown is equal to or greater than the first threshold, a setting that a second object is shown instead of the first object. With this configuration, an erroneous detection of an object can be corrected easily.
The work support device of the twelfth aspect is the work support device of any of the ninth to eleventh aspects, in which the object detection model detects a plurality of objects. The work support device uses the object detection model to store, in folders divided by object, images from an arbitrary moving image that include a region in which the probability that an object is shown is equal to or greater than the first threshold, as candidate images. When the images accumulated in each folder are displayed one after another, the worker only has to check whether or not the corresponding object is shown, which reduces the worker's burden.
The work support device of the thirteenth aspect is the work support device of any of the ninth to twelfth aspects, which combines the value of the probability that the object is shown with the candidate image and outputs the result. This lets the worker easily recognize what the object shown in the candidate image is.
The work support device of the fourteenth aspect is the work support device of any of the ninth to thirteenth aspects, in which the extraction unit extracts, as the candidate image, an image including a region in which the probability that the object is shown is equal to or less than a second threshold. With this configuration, candidate images that contribute to improving the detection accuracy of the object detection model can be collected efficiently.
The work support device of the fifteenth aspect is the work support device of any of the ninth to fourteenth aspects, in which the extraction unit stops extracting candidate images when the amount of change from the previously extracted candidate image is equal to or less than a predetermined amount. As a result, candidate images that contribute to improving the detection accuracy of the object detection model can be collected efficiently.
The work support device of the sixteenth aspect is the work support device of any of the ninth to fifteenth aspects, further including an updating unit that adds to the teacher images an image for which it has been set whether or not the object is shown, and updates the object detection model. With this configuration, candidate images that contribute to improving the detection accuracy for the object can be collected efficiently as the device is used.
The program of the seventeenth aspect causes a computer to function as an extraction unit and an output unit. The extraction unit uses an object detection model, constructed using teacher images in which the object is shown, to extract from an arbitrary moving image, as a candidate image, an image that includes a region in which the probability that the object is shown is equal to or greater than a first threshold. The output unit outputs an image in which a bounding box corresponding to that region is combined with the candidate image. With this configuration, whether or not an image is an object image showing the object can be set efficiently by keeping or deleting the displayed bounding box. As a result, a large number of teacher images can be collected efficiently.
The work support method of the eighteenth aspect uses a computer to support work for setting whether or not an object (O) is shown in a region within an image. In this method, an object detection model constructed using teacher images in which the object is shown is used to extract from an arbitrary moving image, as a candidate image, an image that includes a region in which the probability that the object is shown is equal to or greater than a first threshold. The method also outputs an image in which a bounding box corresponding to that region is combined with the candidate image. With this method, whether or not an image is an object image showing the object can therefore be set efficiently by keeping or deleting the displayed bounding box. As a result, a large number of teacher images can be collected efficiently.
20   Work support device
21   Storage unit
21M  Object detection model
22   Input unit
23   Output unit
24   Processing unit
24A  Extraction unit
24B  Generation unit
24C  Combining unit
24D  Setting unit
24E  Updating unit
GD   Moving image
Gd   Frame image
Gk   Candidate image
Gt   Teacher image
Gc   Reference image
M    Area in which the value of the probability that the object is shown is displayed
I1   Icon (go back)
I2   Icon (go to next)
Japanese Patent Laid-Open No. 2018-169672 (JP 2018-169672 A)

Claims (18)

  1.  A work support device (20) that supports work for setting whether or not an object (O) is shown in a region within an image, the work support device comprising:
     an extraction unit (24A) that uses an object detection model (21M), constructed using teacher images (Gt) in which the object is shown, to extract from an arbitrary moving image (GD), as a candidate image (Gk), an image including a region in which the probability that the object is shown is equal to or greater than a first threshold; and
     a generation unit (24B) that generates coordinate information of the region in the candidate image in which the probability that the object is shown is equal to or greater than the first threshold.
  2.  The work support device according to claim 1, wherein the object detection model is constructed by a neural network that performs region extraction of the object and object recognition of the object end to end.
  3.  The work support device according to claim 1 or 2, wherein the extraction unit extracts, as the candidate image, an image including a region in which the probability that the object is shown is equal to or less than a second threshold.
  4.  The work support device according to any one of claims 1 to 3, wherein the extraction unit stops extracting the candidate image when the amount of change from the previously extracted candidate image is equal to or less than a predetermined amount.
  5.  The work support device according to any one of claims 1 to 4, further comprising an updating unit (24E) that adds to the teacher images an image for which it has been set whether or not the object is shown, and updates the object detection model.
  6.  An object detection model updated by the work support device according to claim 5.
  7.  A program that causes a computer to function as a work support device (20) that supports work for setting whether or not an object (O) is shown in a region within an image, the program causing the computer to function as:
     an extraction unit (24A) that uses an object detection model (21M), constructed using teacher images (Gt) in which the object is shown, to extract from an arbitrary moving image (GD), as a candidate image (Gk), an image including a region in which the probability that the object is shown is equal to or greater than a first threshold; and
     a generation unit (24B) that generates coordinate information of the region in the candidate image in which the probability that the object is shown is equal to or greater than the first threshold.
  8.  A work support method that uses a computer to support work for setting whether or not an object (O) is shown in a region within an image, the method comprising:
     using an object detection model (21M), constructed using teacher images (Gt) in which the object is shown, to extract from an arbitrary moving image (GD), as a candidate image (Gk), an image including a region in which the probability that the object is shown is equal to or greater than a first threshold; and
     generating coordinate information of the region in the candidate image in which the probability that the object is shown is equal to or greater than the first threshold.
  9.  対象物(O)が写された教師画像(Gt)を用いて構築された対象物検知モデル(21M)を用いて、任意の動画像(GD)から、前記対象物が写されている確率が第1閾値以上である領域を含む画像を候補画像(Gk)として抽出する抽出部(24A)と、
     当該候補画像に、前記対象物が写されている確率が第1閾値以上である領域に対応するバウンディングボックス(B)が合成された画像を出力する出力部(23)と、
     を備える、作業支援装置(20)。
    Using the object detection model (21M) constructed using the teacher image (Gt) in which the object (O) is copied, the probability that the object is copied from an arbitrary moving image (GD) is calculated. An extraction unit (24A) that extracts an image including a region that is equal to or larger than the first threshold value as a candidate image (Gk);
    An output unit (23) that outputs an image in which a bounding box (B) corresponding to an area in which the probability that the object is imaged is equal to or higher than a first threshold is combined with the candidate image,
    A work support device (20) comprising:
  10.  前記バウンディングボックス内に前記対象物が写されていること、又は、前記バウンディングボックス内に前記対象物が写されていないことの設定を受け付ける設定部(22,24D)、
    をさらに備える。請求項9に記載の作業支援装置。
    A setting unit (22, 24D) that accepts a setting that the object is imaged in the bounding box, or that the object is not imaged in the bounding box,
    Is further provided. The work support device according to claim 9.
  11.  前記対象物検知モデルは、複数の対象物を検知するものであり、
     第1対象物が写されている確率が第1閾値以上である領域に対応するバウンディングボックスが合成された画像に対し、当該第1対象物に代えて第2対象物が写されていることの設定を受け付ける設定部(22,24D)、
    をさらに備える。請求項9又は10に記載の作業支援装置。
    The object detection model is for detecting a plurality of objects,
    For the image in which the bounding box corresponding to the area in which the probability that the first object is copied is equal to or greater than the first threshold value is combined, the second object is copied instead of the first object. A setting unit (22, 24D) that receives the setting,
    Is further provided. The work support device according to claim 9.
  12.  前記対象物検知モデルは、複数の対象物を検知するものであり、
     前記対象物検知モデル(21M)を用いて、任意の動画像(GD)から、対象物が写されている確率が第1閾値以上である領域を含む画像を候補画像(Gk)として、対象物毎に区分けされたフォルダに格納する、
     請求項9から11のいずれか1項に記載の作業支援装置。
    The object detection model is for detecting a plurality of objects,
    Using the object detection model (21M), an image including an area in which the probability that the object is imaged is equal to or higher than a first threshold value is set as a candidate image (Gk) from an arbitrary moving image (GD). Store in a folder that is divided for each
    The work support device according to any one of claims 9 to 11.
  13.  前記対象物が写されている確率の値を前記候補画像に合成して出力する、
     請求項9から12のいずれか1項に記載の作業支援装置。
    A value of the probability that the object is imaged is combined with the candidate image and output.
    The work support device according to any one of claims 9 to 12.
  14.  前記抽出部は、前記対象物が写されている確率が第2閾値以下である領域を含む画像を前記候補画像として抽出する、
     請求項9から13のいずれか1項に記載の作業支援装置。
    The extraction unit extracts, as the candidate image, an image including a region in which the probability that the object is captured is a second threshold value or less,
    The work support device according to any one of claims 9 to 13.
  15.  前記抽出部は、前回抽出された候補画像からの変化量が所定量以下である場合、前記候補画像の抽出を停止する、
     請求項9から14のいずれか1項に記載の作業支援装置。
    The extraction unit stops extraction of the candidate image when the amount of change from the previously extracted candidate image is less than or equal to a predetermined amount.
    The work support device according to any one of claims 9 to 14.
  16.  前記対象物が
    写されているか否かが設定された画像を前記教師画像に加えて、前記対象物検知モデルを更新する更新部(24E)をさらに備える、
     請求項9から15のいずれか1項に記載の作業支援装置。
    An update unit (24E) that updates the target object detection model by adding an image in which whether or not the target object is captured to the teacher image is further provided.
    The work support device according to any one of claims 9 to 15.
  17.  コンピュータを、
     対象物(O)が写された教師画像(Gt)を用いて構築された対象物検知モデル(21M)を用いて、任意の動画像(GD)から、前記対象物が写されている確率が第1閾値以上である領域を含む画像を候補画像(Gk)として抽出する抽出部(24A)、
     当該候補画像に、前記対象物が写されている確率が第1閾値以上である領域に対応するバウンディングボックス(B)が合成された画像を出力する出力部(23)、
    として機能させるプログラム。
    Computer,
    Using the object detection model (21M) constructed using the teacher image (Gt) in which the object (O) is copied, the probability that the object is copied from an arbitrary moving image (GD) is calculated. An extraction unit (24A) that extracts an image including a region that is equal to or larger than the first threshold value as a candidate image (Gk),
    An output unit (23) that outputs an image in which a bounding box (B) corresponding to an area in which the probability that the object is imaged is equal to or higher than a first threshold is combined with the candidate image,
    A program to function as.
  18.  A work support method that uses a computer to support work for setting whether or not an object (O) is captured in a region in an image, the method comprising:
    using an object detection model (21M), constructed from teacher images (Gt) in which the object (O) is captured, to extract from an arbitrary moving image (GD), as a candidate image (Gk), an image including a region in which the probability that the object is captured is equal to or greater than a first threshold; and
    outputting an image in which a bounding box (B), corresponding to the region in which the probability that the object is captured is equal to or greater than the first threshold, is combined with the candidate image.
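The extraction behaviour recited in claims 14, 15, 17 and 18 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `Detection`, `Candidate`, `frame_change` and `extract_candidates`, the representation of a frame as a flat list of pixel values, and the mean absolute pixel difference used as the "amount of change" are all assumptions made for illustration. The stop condition of claim 15 is read here as skipping a frame that barely differs from the previously extracted candidate. Drawing the bounding boxes onto the image is left to the display layer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical layout


@dataclass
class Detection:
    box: Box
    probability: float  # model's estimate that the object is captured in the box


@dataclass
class Candidate:
    frame_index: int
    high_boxes: List[Box]  # regions at or above the first threshold (to be overlaid)
    low_boxes: List[Box]   # regions at or below the second threshold


def frame_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference; one assumed measure of 'amount of change'."""
    return sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)


def extract_candidates(
    frames: Iterable[Sequence[float]],
    detect: Callable[[Sequence[float]], List[Detection]],
    first_threshold: float = 0.8,
    second_threshold: Optional[float] = None,
    min_change: float = 0.0,
) -> List[Candidate]:
    """Pick frames worth showing to the annotator, with their boxes."""
    candidates: List[Candidate] = []
    last_frame: Optional[Sequence[float]] = None
    for index, frame in enumerate(frames):
        detections = detect(frame)
        # Claims 17 and 18: regions whose probability is >= the first threshold.
        high = [d.box for d in detections if d.probability >= first_threshold]
        # Claim 14: regions whose probability is <= the second threshold.
        low = (
            []
            if second_threshold is None
            else [d.box for d in detections if d.probability <= second_threshold]
        )
        if not high and not low:
            continue
        # Claim 15 (as read here): skip frames that hardly differ from the
        # previously extracted candidate.
        if last_frame is not None and frame_change(frame, last_frame) <= min_change:
            continue
        candidates.append(Candidate(index, high, low))
        last_frame = frame
    return candidates
```

In this sketch `detect` stands in for the object detection model (21M). Any callable that returns per-region probabilities for a frame could be passed in its place.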

PCT/JP2020/000731 2019-01-17 2020-01-10 Work assistance device, work assistance method, program, and object detection model WO2020149242A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2019-006450 2019-01-17
JP2019006450A JP6508797B1 (en) 2019-01-17 2019-01-17 Work support apparatus, work support method, program, and object detection model.
JP2019-066075 2019-03-29
JP2019066075A JP6756961B1 (en) 2019-03-29 2019-03-29 Work support devices, work support methods, programs, and object detection models.

Publications (1)

Publication Number Publication Date
WO2020149242A1 (en) 2020-07-23

Family

ID=71613028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/000731 WO2020149242A1 (en) 2019-01-17 2020-01-10 Work assistance device, work assistance method, program, and object detection model

Country Status (1)

Country Link
WO (1) WO2020149242A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023170912A1 (en) * 2022-03-11 2023-09-14 日本電気株式会社 Information processing device, generation method, information processing method, and computer-readable medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07160661A (en) * 1993-12-02 1995-06-23 Hitachi Ltd Automatic teacher data extraction method for neural network, neural network system using the same and plant operation supporting device
JP2013232181A (en) * 2012-04-06 2013-11-14 Canon Inc Image processing apparatus, and image processing method
JP2015191334A (en) * 2014-03-27 2015-11-02 キヤノン株式会社 Information processor and information processing method
JP2017162025A (en) * 2016-03-07 2017-09-14 株式会社東芝 Classification label allocation device, classification label allocation method, and program
JP2018151833A (en) * 2017-03-13 2018-09-27 パナソニック株式会社 Identifier learning device and identifier learning method
JP2018151843A (en) * 2017-03-13 2018-09-27 ファナック株式会社 Apparatus and method for image processing to calculate a likelihood of an image of an object detected from an input image
US20180348346A1 (en) * 2017-05-31 2018-12-06 Uber Technologies, Inc. Hybrid-View Lidar-Based Object Detection

Similar Documents

Publication Publication Date Title
US20110242130A1 (en) Image processing apparatus, image processing method, and computer-readable medium
CN106648571B (en) Method and device for calibrating application interface
WO2020149242A1 (en) Work assistance device, work assistance method, program, and object detection model
JP6623851B2 (en) Learning method, information processing device and learning program
US20140310248A1 (en) Verification support program, verification support apparatus, and verification support method
JP6756961B1 (en) Work support devices, work support methods, programs, and object detection models.
CN108156504B (en) Video display method and device
JP6508797B1 (en) Work support apparatus, work support method, program, and object detection model.
JP7135314B2 (en) Display program, display method and display device
JP2011048639A (en) Apparatus and method for detecting and evaluating clone code, and program
JP2015099564A (en) Visual programming device and control method therefor
JP2011141664A (en) Device, method and program for comparing document
JP6487100B1 (en) Form processing apparatus and form processing method
WO2021116810A1 (en) Apparatus, method and computer-readable storage medium for detecting objects in a video signal based on visual evidence using an output of a machine learning model
JP2011198006A (en) Object detecting apparatus, object detecting method, and object detecting program
JPWO2020085374A1 (en) Proficiency index providing device, proficiency index providing method, and program
JP4960188B2 (en) Screen transition diagram display method and system
CN113312125B (en) Multi-window adjusting method, system, readable storage medium and electronic equipment
JPWO2015093231A1 (en) Image processing device
US20230196566A1 (en) Image annotation system and method
US20230196725A1 (en) Image annotation system and method
JP6996618B2 (en) Adaptive interface providing device, adaptive interface providing method, and program
JP6490852B1 (en) Unread determination threshold setting method and unread determination threshold setting apparatus
US20140351685A1 (en) Method and apparatus for interactive review of a dataset
JP6836349B2 (en) Information processing equipment, information processing methods, and programs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20741973

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20741973

Country of ref document: EP

Kind code of ref document: A1