WO2022185899A1 - Information processing device, information processing method, method for manufacturing detection model, and program - Google Patents

Publication number
WO2022185899A1
WO2022185899A1 PCT/JP2022/005877 JP2022005877W WO2022185899A1 WO 2022185899 A1 WO2022185899 A1 WO 2022185899A1 JP 2022005877 W JP2022005877 W JP 2022005877W WO 2022185899 A1 WO2022185899 A1 WO 2022185899A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
inference
threshold
pseudo
detection model
Prior art date
Application number
PCT/JP2022/005877
Other languages
French (fr)
Japanese (ja)
Inventor
勇貴 田中
周平 吉田
真 寺尾
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2023503690A priority Critical patent/JPWO2022185899A1/ja
Publication of WO2022185899A1 publication Critical patent/WO2022185899A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present invention relates to a technique for associating pseudo-labels with one or more images included in a dataset used for re-learning a detection model.
  • a detection model that detects objects contained in images becomes a highly accurate detection model by learning using a large number of correct data.
  • The process of collecting a large amount of data and associating correct labels with it is expensive. For this reason, in order to generate a highly accurate detection model from a small amount of correctly labeled data, a technique is known for associating pseudo-labels with data that has no correct labels.
  • A pseudo-label is a highly reliable inference result obtained by running a detection model, trained only on a data set with correct labels, on an image from a data set without correct labels.
  • Non-Patent Document 1 discloses a method of adopting an inference result whose reliability is equal to or higher than a threshold value as a pseudo label.
  • Since the method described in Non-Patent Document 1 requires adjustment to set an appropriate threshold, there is room to reduce the time and computational cost required for this adjustment. In other words, there is room to further reduce the cost of generating highly accurate detection models using pseudo-labels.
  • an object of one aspect of the present invention is to provide a technology capable of generating a highly accurate detection model while suppressing generation costs.
  • An information processing apparatus includes: learning means for learning a detection model using a first data set; threshold determination means for determining a first threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference means, and associating the pseudo-label with the corresponding image.
  • An information processing apparatus includes: first learning means for learning a first detection model using a first data set; second learning means for learning a second detection model using a second data set; first threshold determination means for determining a first threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in a first evaluation data set into the first detection model, and one or more correct labels attached to each of the one or more images; second threshold determination means for determining a second threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images into the second detection model, and one or more correct labels attached to each of the one or more images; first inference means for obtaining one or more inference results for each of the one or more images included in the second data set by inputting each of those images into the first detection model; second inference means for obtaining one or more inference results for each of the one or more images included in the first data set by inputting each of those images into the second detection model.
  • a first data set generation means for generating a second data set after pseudo-labeling by setting an inference result having a reliability of to a pseudo-label and associating the pseudo-label with the corresponding image
  • An information processing apparatus includes: acquisition means for acquiring a target image; and detection means for detecting an object included in the target image using a target image detection model. The target image detection model is learned by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
  • An information processing method includes: a learning step of learning a detection model using a first data set; a threshold determination step of determining a first threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo-label with the corresponding image.
  • An information processing method includes acquiring a target image, and detecting an object included in the target image using a target image detection model. The target image detection model is learned by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
  • A detection model manufacturing method includes: a learning step of learning a detection model using a first data set; a threshold determination step of determining a first threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning step of learning, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
  • A program according to an aspect of the present invention is a program for causing a computer to function as an information processing device, the program causing the computer to function as: learning means for learning a detection model using a first data set; threshold determination means for determining a threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference means, and associating the pseudo-label with the corresponding image.
  • A program according to an aspect of the present invention is a program for causing a computer to function as an information processing apparatus, the program causing the computer to function as: acquisition means for acquiring a target image; and detection means for detecting an object included in the target image using a target image detection model. The target image detection model is learned by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a threshold with reference to a comparison result between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
  • FIG. 1 is a block diagram showing the configuration of an information processing device according to exemplary Embodiment 1 of the present invention
  • FIG. 2 is a flow chart showing the flow of an information processing method executed by the information processing apparatus shown in FIG. 1
  • FIG. 3 is a block diagram showing the configuration of an information processing device according to exemplary Embodiment 1 of the present invention
  • FIG. 4 is a flow chart showing the flow of an information processing method executed by the information processing apparatus shown in FIG. 3
  • FIG. 5 is a block diagram showing the configuration of an information processing apparatus according to exemplary Embodiment 2 of the present invention
  • FIG. 6 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 2 of the present invention
  • FIG. 7 is a graph showing the relationship between precision (relevance rate) and recall calculated by the information processing apparatus shown in FIG. 5
  • FIG. 8 is a diagram showing a specific example of data included in a pseudo-labeled data set according to illustrative embodiment 2 of the present invention
  • FIG. 9 is a flowchart showing the flow of an information processing method executed by the information processing apparatus shown in FIG. 5
  • FIG. 10 is a block diagram showing the configuration of an information processing apparatus according to exemplary Embodiment 2 of the present invention
  • FIG. 11 is a block diagram showing the configuration of an information processing apparatus according to exemplary Embodiment 3 of the present invention
  • FIG. 12 is a diagram showing specific examples of data contained in the first data set and the second data set according to exemplary embodiment 3 of the present invention
  • FIG. 13 shows an example of data contained in a second dataset and a pseudo-labeled dataset generated from the second dataset according to illustrative embodiment 3 of the present invention
  • FIG. 14 is a block diagram showing the configuration of an information processing apparatus according to exemplary Embodiment 4 of the present invention
  • FIG. 15 is a diagram showing specific examples of data contained in the first data set and the second data set according to illustrative embodiment 4 of the present invention
  • FIG. 16 is a diagram showing an example of data contained in a pseudo-labeled data set according to illustrative embodiment 4 of the present invention
  • FIG. 17 is a block diagram showing the configuration of an information processing apparatus according to exemplary Embodiment 5 of the present invention
  • FIG. 18 is a flowchart showing a flow of an information processing method executed by the information processing apparatus shown in FIG. 17
  • FIG. 19 is a block diagram showing the configuration of an information processing device according to exemplary Embodiment 6 of the present invention
  • FIG. 20 is a block diagram showing an example of a hardware configuration of an information processing device in each exemplary embodiment of the present invention
  • The information processing device 10 functions as a data set generation device that generates a pseudo-labeled data set by attaching pseudo-labels to a target data set.
  • The information processing device 10 first learns the detection model using the first data set. Next, the information processing device 10 determines the first threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with the one or more correct labels attached to each of those images. Furthermore, the information processing device 10 obtains one or more inference results for each of the one or more images included in the second data set by inputting each of those images into the detection model.
  • The information processing device 10 then sets, as a pseudo-label, each inference result whose reliability is equal to or higher than the first threshold among the inference results for the images included in the second data set, and generates a pseudo-labeled data set by associating each pseudo-label with the corresponding image.
  • FIG. 1 is a block diagram showing the configuration of the information processing device 10. As shown in FIG. 1, the information processing device 10 includes a learning unit 101, a threshold determination unit 102, an inference unit 103, and a dataset generation unit 104.
  • the learning unit 101 is a configuration that implements learning means in this exemplary embodiment.
  • the threshold determination unit 102 is a configuration that implements threshold determination means in this exemplary embodiment.
  • the inference unit 103 is a configuration that realizes inference means in this exemplary embodiment.
  • the data set generation unit 104 is a configuration that implements data set generation means in this exemplary embodiment.
  • The learning unit 101 learns a detection model using the first data set. Specifically, the learning unit 101 uses a first data set including one or more images to learn a detection model for detecting objects included in images. Here, detection means that, when an image is input into the detection model, the model outputs an inference result regarding at least one of: the presence or absence of an object in the image, the position of an object included in the image, the size of an object included in the image, and the category of an object included in the image. The learning unit 101 learns a detection model that receives an image as input and outputs such an inference result.
  • The threshold determination unit 102 determines a first threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with the one or more correct labels given to each of those images.
  • The correct label is a label that contains, for one or more objects included in each of the one or more images of the evaluation data set, ground truth data regarding at least one of the position of the object in the image, the size of the object in the image, and the category of the object in the image.
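  • The threshold determination described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an inference result counts as correct when its category matches a correct label and the two bounding boxes overlap with an IoU of at least 0.5, and that the first threshold is the lowest reliability at which precision on the evaluation data set still meets a target value. The IoU criterion, the target-precision rule, the simplified matching (ties and duplicate matches are ignored), and the names `iou`, `is_correct`, and `determine_threshold` are all assumptions, not taken from the source.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, Box, float]       # (category, box, reliability)
Label = Tuple[str, Box]                  # (category, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(det: Detection, labels: List[Label], min_iou: float = 0.5) -> bool:
    """Assumed comparison rule: same category and IoU >= min_iou with some label."""
    return any(det[0] == cat and iou(det[1], box) >= min_iou for cat, box in labels)


def determine_threshold(
    results: List[Tuple[List[Detection], List[Label]]],
    target_precision: float = 0.9,
) -> float:
    """Return the lowest reliability whose precision meets the target (assumed rule)."""
    scored = sorted(
        ((det[2], is_correct(det, labels)) for dets, labels in results for det in dets),
        key=lambda s: s[0],
        reverse=True,
    )
    best = 1.0  # fall back to the strictest threshold if the target is never met
    true_positives = 0
    for rank, (score, correct) in enumerate(scored, start=1):
        true_positives += correct
        if true_positives / rank >= target_precision:
            best = score
    return best
```

    Each element of `results` pairs the inference results for one evaluation image with that image's correct labels, so the function needs no access to the model itself.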
  • the inference unit 103 obtains one or more inference results for each of the one or more images included in the second data set by inputting each of the one or more images into the detection model described above.
  • the second data set includes one or more images different from the first data set.
  • The data set generation unit 104 generates a pseudo-labeled data set by setting, as a pseudo-label, each inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference unit 103, and associating the pseudo-label with the corresponding image.
  • For each of the one or more images included in the second data set, the pseudo-label is a label that includes data on at least one of the position, the size, and the category of each of the one or more objects that the inference unit 103 inferred to be objects.
  • the pseudo label assigned to the object may or may not match the correct label.
  • For example, one or more of the position, size, and category contained in the correct data for an object may match the corresponding items in the pseudo-label while the other items do not match.
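  • The pseudo-label selection described above can be sketched as a simple filter. The sketch assumes each inference result is a `(category, box, reliability)` tuple keyed by an image identifier; this data layout and the name `generate_pseudo_labeled_dataset` are hypothetical and not taken from the source.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, Box, float]       # (category, box, reliability)
PseudoLabel = Tuple[str, Box]            # (category, box)


def generate_pseudo_labeled_dataset(
    inference_results: Dict[str, List[Detection]],
    first_threshold: float,
) -> Dict[str, List[PseudoLabel]]:
    """Keep inference results whose reliability is >= the first threshold
    and associate them, as pseudo-labels, with the image they came from."""
    dataset: Dict[str, List[PseudoLabel]] = {}
    for image_id, detections in inference_results.items():
        dataset[image_id] = [
            (category, box)
            for category, box, reliability in detections
            if reliability >= first_threshold
        ]
    return dataset
```

    In this sketch an image whose inference results all fall below the threshold is kept with an empty label list; whether such images should remain in the pseudo-labeled data set is a design choice the source does not settle here.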
  • the accuracy of pseudo-labels can generally be adjusted by adjusting the first threshold described above, but adjusting the first threshold generally requires time and computational costs.
  • In the information processing apparatus 10 according to this exemplary embodiment, a configuration is adopted in which the first threshold, used to decide whether the inference result for each image included in the second data set is adopted as a pseudo-label, is determined automatically. Therefore, the information processing apparatus 10 according to this exemplary embodiment can reduce the cost of adjusting the first threshold. As a result, it is possible to generate a highly accurate detection model while suppressing the generation cost.
  • FIG. 2 is a flow diagram showing the flow of the information processing method S10.
  • the information processing device 10 performs the information processing method S10 to generate a second data set including images with associated pseudo-labels.
  • the information processing method S10 includes steps S101 to S104.
  • In step S101, the learning unit 101 learns a detection model. Specifically, the learning unit 101 learns the detection model using the first data set. Step S101 is the learning step in this exemplary embodiment.
  • In step S102, the threshold determination unit 102 determines a first threshold. Specifically, the threshold determination unit 102 determines the first threshold, which is used to decide pseudo-labels, by referring to the result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set into the detection model, with the one or more correct labels attached to each of those images. Step S102 is the threshold determination step in this exemplary embodiment.
  • In step S103, the inference unit 103 performs inference. Specifically, the inference unit 103 obtains one or more inference results for each of the one or more images included in the second data set by inputting each of those images into the detection model. Step S103 is the inference step in this exemplary embodiment.
  • In step S104, the data set generation unit 104 generates a pseudo-labeled data set. Specifically, the data set generation unit 104 sets, as a pseudo-label, each inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in step S103, and generates the pseudo-labeled data set by associating the pseudo-label with the corresponding image in the second data set. Step S104 is the data set generation step in this exemplary embodiment.
  • The execution timing of step S103 is not limited to after step S102. Step S103 may be executed at any time after step S101 and before step S104, for example, before step S102.
  • The information processing method S10 provides the same effects as the information processing apparatus 10. That is, the information processing method S10 according to this exemplary embodiment adopts a configuration in which the first threshold, used to decide whether the inference result for each image included in the second data set is adopted as a pseudo-label, is determined automatically. Therefore, the information processing method S10 according to this exemplary embodiment can reduce the cost of adjusting the first threshold. As a result, it is possible to generate a highly accurate detection model while suppressing the generation cost.
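  • The flow of steps S101 to S104 can be summarized in a short Python sketch. The training routine and the threshold rule are passed in as callables because the text above does not fix them; the function name `run_information_processing_method_s10`, the tuple layouts, and the use of image identifiers in place of image data are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[str, Box, float]   # (category, box, reliability)
Label = Tuple[str, Box]              # (category, box)
Model = Callable[[str], List[Detection]]


def run_information_processing_method_s10(
    train: Callable[[Sequence], Model],
    determine_threshold: Callable[[List[Tuple[List[Detection], List[Label]]]], float],
    first_dataset: Sequence,
    evaluation_dataset: Sequence[Tuple[str, List[Label]]],
    second_images: Sequence[str],
) -> Dict[str, List[Label]]:
    # S101: learn the detection model using the first data set.
    model = train(first_dataset)

    # S102: compare inference results on the evaluation data set with the
    # correct labels and determine the first threshold from that comparison.
    eval_results = [(model(image), labels) for image, labels in evaluation_dataset]
    first_threshold = determine_threshold(eval_results)

    # S103: infer on every image of the second data set. This step only
    # needs the model, so it could also run before S102.
    inference = {image: model(image) for image in second_images}

    # S104: keep results at or above the first threshold as pseudo-labels
    # and associate them with the corresponding image.
    return {
        image: [(cat, box) for cat, box, rel in dets if rel >= first_threshold]
        for image, dets in inference.items()
    }
```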
  • the information processing device 20 acquires a target image and uses the target image detection model to detect an object included in the image.
  • The target image detection model is a detection model obtained by re-learning the detection model learned by the information processing apparatus 10 described above (specifically, by the learning unit 101); that is, it is a detection model that has been re-learned with reference to the pseudo-labeled data set.
  • However, the target image detection model is not limited to this.
  • The target image detection model may be any detection model trained using the pseudo-labeled data set, for example, a new detection model trained using the pseudo-labeled data set.
  • the new detection model is a detection model different from the detection model learned by the learning unit 101 .
  • FIG. 3 is a block diagram showing the configuration of the information processing device 20. As shown in FIG. 3, the information processing device 20 includes an acquisition unit 201 and a detection unit 202.
  • the acquisition unit 201 is a configuration that implements acquisition means in this exemplary embodiment.
  • the detection unit 202 is a configuration that realizes detection means in this exemplary embodiment.
  • the acquisition unit 201 acquires the target image.
  • the target image is an image input to the detection model in order to detect an object included in the image.
  • the acquisition unit 201 may acquire the target image by reading the target image stored in the information processing device 20, or may acquire the target image supplied from the imaging device.
  • the acquisition unit 201 may acquire the target image via an input device (not shown).
  • the acquiring unit 201 may acquire the target image from another device (not shown) communicably connected to the information processing device 20 .
  • the detection unit 202 detects an object included in the target image using the target image detection model.
  • The target image detection model is a detection model used to detect an object included in the target image, and the target image detection model according to the present exemplary embodiment is the above-described re-learned detection model.
  • the detection unit 202 acquires an inference result output from the target image detection model by inputting the target image into the target image detection model.
  • As one example, the detection unit 202 holds a target image detection model and inputs the target image into that model.
  • As another example, the detection unit 202 accesses a target image detection model stored in a storage device (not shown) and inputs the target image into that model.
  • In the information processing apparatus 20 according to the present exemplary embodiment, a configuration is adopted in which an object is detected using a target image detection model trained on a data set that includes images associated with pseudo-labels, the pseudo-labels being determined using the automatically determined first threshold. Therefore, the information processing apparatus 20 according to the present exemplary embodiment can detect an object included in an image using a target image detection model for which the cost of adjusting the first threshold has been reduced.
  • FIG. 4 is a flow diagram showing the flow of the information processing method S20.
  • the information processing device 20 executes the information processing method S20 in order to detect an object included in the target image.
  • the information processing method S20 includes steps S201 and S202.
  • In step S201, the acquisition unit 201 acquires a target image.
  • In step S202, the detection unit 202 detects an object. Specifically, the detection unit 202 detects an object included in the target image using the target image detection model. More specifically, the detection unit 202 inputs the target image acquired by the acquisition unit 201 into the target image detection model and acquires the inference result output by that model.
  • The information processing method S20 provides the same effects as the information processing apparatus 20. That is, the information processing method S20 according to the present exemplary embodiment adopts a configuration in which an object is detected using a target image detection model trained on a data set that includes images associated with pseudo-labels, the pseudo-labels being determined using the automatically determined first threshold. Therefore, the information processing method S20 according to the present exemplary embodiment can detect an object included in an image using a target image detection model for which the cost of adjusting the first threshold has been reduced.
  • The information processing apparatus 10a is a modification of the first exemplary embodiment. Specifically, the information processing device 10a acquires the first data set and, as described in the first exemplary embodiment, learns the detection model, determines the threshold, performs inference, and generates the pseudo-labeled data set. Further, the information processing apparatus 10a learns the target image detection model using the generated pseudo-labeled data set. Typically, the target image detection model is obtained by re-learning the above-mentioned detection model with reference to the pseudo-labeled data set. As described above, however, the target image detection model is not limited to a re-learned detection model and may be any detection model learned using the pseudo-labeled data set.
  • FIG. 5 is a block diagram showing the configuration of the information processing device 10a.
  • the information processing device 10a includes a control unit 100a and a storage unit 150a.
  • the control unit 100a centrally controls each unit of the information processing device 10a.
  • the storage unit 150a stores various programs and data used by the information processing apparatus 10a.
  • the storage unit 150a stores an evaluation data set DSE, a data set 1 (DS1), a data set 2 (DS2), a data set 2' (DS2'), and an object detection model DM.
  • the evaluation data set DSE is the evaluation data set in this exemplary embodiment.
  • Data set 1 (DS1) is the first data set in this exemplary embodiment.
  • Data set 2 (DS2) is the second data set in this exemplary embodiment.
  • Data set 2' (DS2') is the pseudo-labeled data set in this exemplary embodiment.
  • FIG. 6 is a diagram showing a specific example of data included in data set 1 (DS1) and data set 2 (DS2). Specifically, FIG. 6 shows one of the images contained in dataset 1 (DS1) and one of the images contained in dataset 2 (DS2).
  • each of these images contains five objects, specifically three people and two bags.
  • each of the five objects is associated with a correct label.
  • the correct label is a label containing categories and bounding boxes as shown in FIG.
  • the category is category information indicating the category of the object included in the image associated with the correct label, and specifically, correct data regarding the category of the object.
  • each of the three persons is associated with a "person" category
  • each of the two bags is associated with a "bag" category.
  • the bounding box is area information indicating the area of the object included in the image associated with the correct label, and specifically, correct data regarding the position and size of the object included in the image.
  • One bounding box is associated with one object, and a typical example of the bounding box is data indicating the minimum rectangle enclosing the object, as shown in FIG.
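  • As a small illustration of the "minimum rectangle enclosing the object", the following Python sketch computes an axis-aligned bounding box from a set of points belonging to an object. Representing an object by a list of `(x, y)` points and returning `(x_min, y_min, width, height)` are assumptions made for this example; the source does not specify how the rectangle is obtained or encoded.

```python
from typing import Iterable, Tuple


def minimal_bounding_box(
    points: Iterable[Tuple[float, float]],
) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, width, height) of the smallest axis-aligned
    rectangle that encloses every given (x, y) point."""
    pts = list(points)
    if not pts:
        raise ValueError("an object must contain at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)
```

    The first two values give the position of the object and the last two give its size, which corresponds to the position and size data the bounding box is said to carry.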
  • the “image”, “dataset” and “correct label” described in each exemplary embodiment can be expressed as follows.
  • the image x input to the detection model is an element of the data space X.
  • the data space X corresponds to the data set containing the image x.
  • the number of objects included in one image x is arbitrary.
  • a correct label can be represented by a pair (y,b) of category y and bounding box b.
  • category y is an element of the category set Y; in the example of FIG. 6, Y = {person, bag}.
  • the data set D associated with correct labels can be expressed as a set of pairs, each consisting of an image x and the set of correct labels (y, b) of all objects included in that image x, that is, D = {(x, {(y, b)})}.
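  • The notation above maps naturally onto a few small data types. The following Python sketch is one possible representation; the class names, the field layout, the image identifier, and the coordinates are hypothetical, and only the object counts (three persons and two bags) follow the FIG. 6 example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Area information b: position (x_min, y_min) and size (width, height)."""
    x_min: float
    y_min: float
    width: float
    height: float


@dataclass(frozen=True)
class CorrectLabel:
    """A correct label (y, b): category y paired with bounding box b."""
    category: str
    box: BoundingBox


@dataclass
class LabeledImage:
    """An image x together with the correct labels of all objects it contains."""
    image_id: str
    labels: List[CorrectLabel]


CATEGORIES = {"person", "bag"}  # category set Y in the FIG. 6 example

# Hypothetical coordinates; only the object counts follow the FIG. 6 example.
example = LabeledImage(
    image_id="ds1_0001",
    labels=[
        CorrectLabel("person", BoundingBox(10, 20, 40, 120)),
        CorrectLabel("person", BoundingBox(70, 25, 38, 115)),
        CorrectLabel("person", BoundingBox(130, 22, 42, 118)),
        CorrectLabel("bag", BoundingBox(45, 90, 20, 25)),
        CorrectLabel("bag", BoundingBox(160, 95, 18, 22)),
    ],
)

# The data set D is then a collection of such (image, labels) pairs.
dataset_d: List[LabeledImage] = [example]
```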
  • control unit 100a includes a learning unit 101, a threshold value determination unit 102, an inference unit 103, a data set generation unit 104, and a relearning unit 105.
  • the threshold determination unit 102 includes an evaluation data set inference unit 1021, an evaluation value calculation unit 1022, and a threshold determination unit 1023.
  • the dataset generator 104 includes a pseudo-label generator 1041 and an association unit 1042, as shown in FIG.
  • the evaluation data set inference unit 1021, the evaluation value calculation unit 1022, and the threshold determination unit 1023 correspond to the threshold determination unit 102 in exemplary embodiment 1, and are configured to implement the threshold determination means in this exemplary embodiment.
  • the pseudo-label generating unit 1041 and the associating unit 1042 correspond to the dataset generating unit 104 in exemplary embodiment 1, and are configured to implement the dataset generating means in this exemplary embodiment.
  • the re-learning unit 105 is a configuration that implements pseudo-label reference learning means in this exemplary embodiment.
  • the learning unit 101 acquires data set 1 (DS1) and uses it to train an object detection model for pseudo-label generation. That is, the learning unit 101 also functions as an acquisition unit that acquires the first data set. Specifically, the learning unit 101 reads data set 1 (DS1) stored in the storage unit 150a and trains the object detection model for pseudo-label generation using data set 1 (DS1), that is, a data set in which a correct label is associated with each of one or more images. Then, the learning unit 101 outputs the trained object detection model for pseudo-label generation to the evaluation data set inference unit 1021 and the inference unit 103.
  • the evaluation data set inference unit 1021 generates an inference result based on the evaluation data set. Specifically, the evaluation data set inference unit 1021 acquires the evaluation data set DSE and the object detection model for pseudo-label generation, inputs each of the one or more images included in the evaluation data set DSE to that model, and obtains inference results. More specifically, the evaluation data set inference unit 1021 reads the evaluation data set DSE stored in the storage unit 150a and inputs it to the object detection model for pseudo-label generation acquired from the learning unit 101. Then, the evaluation data set inference unit 1021 acquires the inference result output by the model and outputs it to the evaluation value calculation unit 1022.
  • the evaluation data set DSE is a data set in which a correct label is associated with each object included in each image, similar to the data set 1 (DS1).
  • the images contained in the evaluation data set DSE may be part of the images contained in the data set 1 (DS1).
  • the images included in the evaluation data set DSE may be generated by giving correct labels to part of the images included in the data set 2 (DS2).
  • the images included in the evaluation data set DSE may be generated by assigning correct labels to images included in neither data set 1 (DS1) nor data set 2 (DS2).
  • the inference result by the evaluation data set inference unit 1021 contains, for each of the one or more images included in the evaluation data set DSE, data on the position, the size, and/or the category of each of the one or more objects inferred to be present in the image.
  • the inference result by the evaluation data set inference unit 1021 includes a category, a bounding box, and a reliability.
  • the reliability is an example of data related to the certainty of inference, and is a numerical value with 0 as the minimum value and 1 as the maximum value, for example.
  • the evaluation value calculation unit 1022 calculates an evaluation value based on the inference result. Specifically, the evaluation value calculation unit 1022 compares the inference result for each of the one or more images included in the evaluation data set DSE with the correct label for that image, and calculates an evaluation value based on the comparison result.
  • the evaluation value is the harmonic mean of the precision and the recall, that is, the F value.
  • F value calculation processing executed by the evaluation value calculation unit 1022 will be described.
  • the evaluation value calculation unit 1022 executes the following processes (1) to (6).
  • the reference value is, for example, 0.9.
  • a plurality of F values are calculated in the F value calculation process. Then, the reference value becomes a different value for each F value. That is, the value 0.9 described above can be expressed as the initial value of the reference value.
  • TP (true positive) is an inference result whose bounding box overlaps the bounding box of a correct label by a degree equal to or greater than a predetermined value and whose category matches that correct label.
  • FP (false positive) is any of the following: (A) an inference result whose category matches the correct label but whose bounding box overlaps the bounding box of the correct label by a degree less than the predetermined value; (B) an inference result whose category differs from that of the correct label whose bounding box it overlaps; (C) an inference result for which there is no correct label with an overlapping bounding box.
  • an example of the degree of overlap is IOU (Intersection Over Union).
  • FN (false negative) is either of the following: (D) a correct label for which there is no inference result with an overlapping bounding box; (E) a correct label whose category differs from that of the inference result with an overlapping bounding box.
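  • The TP, FP and FN definitions above can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: boxes are `(x_min, y_min, x_max, y_max)` tuples, the overlap measure is IOU with a hypothetical predetermined value of 0.5, and each correct label is matched to at most one inference result by a simple greedy rule.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(predictions, truths, iou_threshold=0.5):
    """Count TP, FP and FN for one image.

    predictions and truths are lists of (category, box) pairs.
    The greedy one-to-one matching is an assumption of this sketch.
    """
    matched = set()  # indices of correct labels already explained by a TP
    tp = fp = 0
    for category, box in predictions:
        best_idx, best_iou = None, 0.0
        for idx, (true_category, true_box) in enumerate(truths):
            if idx in matched or true_category != category:
                continue
            overlap = iou(box, true_box)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)  # same category and enough overlap: TP
            tp += 1
        else:
            fp += 1  # covers cases (A), (B) and (C)
    fn = len(truths) - len(matched)  # covers cases (D) and (E)
    return tp, fp, fn


truths = [("person", (0, 0, 10, 10)), ("bag", (20, 20, 30, 30))]
predictions = [
    ("person", (0, 0, 10, 10)),    # matches the person label
    ("person", (20, 20, 30, 30)),  # overlaps the bag label, wrong category
    ("bag", (50, 50, 60, 60)),     # overlaps no correct label
]
counts = count_tp_fp_fn(predictions, truths)
```

  • Precision is then TP / (TP + FP) and recall is TP / (TP + FN), which is what the subsequent F value calculation consumes.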
  • FIG. 7 is a graph showing the relationship between precision and recall. As shown in FIG. 7, the higher the reliability, the higher the precision, but the lower the recall. On the other hand, the lower the reliability, the lower the precision, but the higher the recall. In this way, there is a trade-off relationship between precision and recall.
  • the reliability is the reference value set in the process (2).
  • the evaluation value calculation unit 1022 decreases the reference value and executes the processes (2) to (6) again. For example, the evaluation value calculation unit 1022 sets the next reference value to 0.8. In other words, the evaluation value calculation unit 1022 calculates the F value based on the next reference value. The evaluation value calculation unit 1022 repeats the processes (2) to (6) to calculate the F value based on each reference value. Thereby, a plurality of F values are calculated, one for each of the different reference values.
  • the evaluation value calculation unit 1022 repeats the processes (2) to (6) until the F value is calculated with a reference value equal to or lower than the minimum reliability.
  • the F value is calculated for all inference results.
  • the processes (3) and (4) may be omitted, and the results identified in past executions of the processes (3) and (4) may be used instead.
  • the evaluation value calculation unit 1022 associates each calculated evaluation value, that is, the F value with the reference value used in calculating each F value, and outputs it to the threshold determination unit 1023 .
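  • The repeated F value calculation can be sketched as a sweep over reference values. The sketch assumes that each inference result has already been marked as TP or not (so the identification is reused across iterations), and the function names, the sample reliabilities, and the list of reference values are all hypothetical.

```python
def f_value(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def sweep_reference_values(results, num_truths, reference_values):
    """Return (reference_value, F value) pairs.

    results is a list of (reliability, is_tp) pairs for all inference
    results on the evaluation data set; num_truths is the number of
    correct labels in that data set.
    """
    pairs = []
    for reference in reference_values:
        # Keep only inference results whose reliability reaches the reference value.
        kept = [is_tp for reliability, is_tp in results if reliability >= reference]
        tp = sum(1 for is_tp in kept if is_tp)
        precision = tp / len(kept) if kept else 0.0
        recall = tp / num_truths if num_truths else 0.0
        pairs.append((reference, f_value(precision, recall)))
    return pairs


# Made-up inference results: (reliability, whether the result is a TP).
results = [(0.95, True), (0.85, True), (0.75, False), (0.65, True), (0.30, False)]
scores = sweep_reference_values(results, num_truths=4,
                                reference_values=[0.9, 0.8, 0.7, 0.6, 0.3])
```

  • Each returned pair keeps the F value together with the reference value that produced it, which mirrors the association that is passed on to the threshold determination unit 1023.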
  • the evaluation value calculation unit 1022 may calculate the precision and the recall for each category when there are multiple categories in the inference result and the correct label.
  • the evaluation value calculation unit 1022 calculates the F value for each category. As a result, each reference value is associated with a plurality of F values calculated for each category.
  • the evaluation value calculated by the evaluation value calculation unit 1022 is not limited to the F value.
  • the evaluation value may be a value that emphasizes precision or recall.
  • the evaluation value calculation unit 1022 can calculate the evaluation value by, for example, $\{(1+\beta^2) \times \mathrm{precision} \times \mathrm{recall}\} / \{(\beta^2 \times \mathrm{precision}) + \mathrm{recall}\}$.
  • here, β is a value for adjusting the importance of precision relative to recall. With this formula, setting β to a value smaller than 1 yields an evaluation value that emphasizes precision.
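  • The weighted evaluation value can be sketched as a small function, assuming the formula has the standard F-beta form (1 + β²) × precision × recall / (β² × precision + recall). The function name and the sample precision and recall values are illustrative only.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily, beta > 1 weights recall more
    heavily, and beta == 1 gives the ordinary F value.
    """
    denominator = beta ** 2 * precision + recall
    if denominator <= 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# With precision 0.8 and recall 0.4, a smaller beta pulls the score toward precision.
precision_weighted = f_beta(0.8, 0.4, beta=0.5)
balanced = f_beta(0.8, 0.4, beta=1.0)
recall_weighted = f_beta(0.8, 0.4, beta=2.0)
```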
  • the method of specifying at least part of the inference result in the process (2) when calculating a plurality of evaluation values is not limited to the above example.
  • the evaluation value calculation unit 1022 may identify a predetermined number of inference results in descending order of reliability. In this example, the evaluation value calculation unit 1022 increases the predetermined number by a predetermined number each time the process (6) is completed and the next processes (2) to (6) are performed. Then, the evaluation value calculation unit 1022 repeats the processes (2) to (6) until all the inference results are specified by the process (2) and the evaluation values are calculated. It should be noted that the amount of increase in the predetermined number in the last process (2) should be 1 or more and the predetermined number or less. In this example, each calculated evaluation value is associated with the minimum reliability among the reliability of the specified inference result and output to the threshold determination unit 1023 .
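  • The count-based variant above can be sketched as follows. It assumes each inference result is already marked as TP or not, and the function name, the `step` parameter (the predetermined increase), and the sample data are hypothetical.

```python
def sweep_top_k(results, num_truths, step):
    """Evaluate the top-k inference results for growing k.

    results is a list of (reliability, is_tp) pairs. Returns a list of
    (minimum reliability among the kept results, F value) pairs.
    """
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    pairs = []
    k = 0
    while k < len(ordered):
        # The final increase may be smaller than step so that k ends at len(ordered).
        k = min(k + step, len(ordered))
        kept = ordered[:k]
        tp = sum(1 for _, is_tp in kept if is_tp)
        precision = tp / k
        recall = tp / num_truths if num_truths else 0.0
        total = precision + recall
        f = 2 * precision * recall / total if total > 0 else 0.0
        pairs.append((kept[-1][0], f))
    return pairs


results = [(0.95, True), (0.85, True), (0.75, False), (0.65, True), (0.30, False)]
top_k_scores = sweep_top_k(results, num_truths=4, step=2)
```

  • The minimum reliability among the kept results plays the role that the reference value plays in the reference-value sweep, so the later threshold selection can treat both variants the same way.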
  • alternatively, the evaluation value calculation unit 1022 may perform the following processing:
    - for every inference result, specify TP, FP and FN;
    - set a plurality of thresholds for the reliability, and specify the number of TPs with reliability equal to or higher than each threshold;
    - calculate precision and recall for each of the specified numbers of TPs;
    - calculate an evaluation value (typical example: F value) for each of the calculated combinations of precision and recall.
  • note that the number of specified TPs is proportional to the value of the recall. In this example, each calculated evaluation value is associated with the threshold used to specify the number of TPs and is output to the threshold determination unit 1023.
  • the threshold determination unit 1023 determines the threshold based on the evaluation value. Specifically, the threshold determination unit 1023 identifies the maximum value among the acquired F values, and sets the reference value associated with the identified F value as the threshold.
  • the maximum value of the F values can be expressed as a value that balances the precision and the recall.
  • the F value is calculated by a formula including the precision and the recall. Therefore, it can also be said that the threshold determination unit 1023 determines the threshold by referring to the precision and the recall indicated by the comparison result of the evaluation value calculation unit 1022.
  • the precision and recall at which the F value is maximized correspond not to the point of maximum precision or maximum recall in the graph of FIG. 7 but, for example, to the point indicated by a star in that graph.
  • the threshold determination unit 1023 outputs the determined threshold to the pseudo-label generation unit 1041.
  • the threshold determination unit 1023 sets the threshold for each category. That is, the threshold determination unit 1023 determines a plurality of thresholds for each category, associates the plurality of thresholds with information indicating the corresponding category, and outputs the information to the pseudo-label generation unit 1041 .
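  • Per-category threshold determination can be sketched as picking, for each category, the reference value with the largest F value. The function name and the sample F values are hypothetical, and the tie-breaking rule (the first maximum wins) is an assumption of this sketch.

```python
def determine_thresholds(f_values_by_category):
    """Map each category to the reference value whose F value is largest.

    f_values_by_category maps a category name to a list of
    (reference_value, f_value) pairs. On ties, the pair listed first wins.
    """
    return {
        category: max(pairs, key=lambda pair: pair[1])[0]
        for category, pairs in f_values_by_category.items()
    }


# Made-up F values for two categories.
f_values = {
    "person": [(0.9, 0.40), (0.8, 0.67), (0.7, 0.57)],
    "bag": [(0.9, 0.50), (0.8, 0.45), (0.7, 0.30)],
}
thresholds = determine_thresholds(f_values)
```

  • Keeping the result as a mapping from category to threshold matches the description that each threshold is output together with information indicating its category.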
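  • the inference unit 103 reads data set 2 (DS2) stored in the storage unit 150a, inputs each of the one or more images included in data set 2 (DS2) to the object detection model for pseudo-label generation acquired from the learning unit 101, and obtains one or more inference results for each of the images.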
  • the inference unit 103 outputs the acquired inference result to the pseudo-label generation unit 1041 .
  • the pseudo-label generation unit 1041 generates pseudo-labels. Specifically, the pseudo-label generation unit 1041 sets an inference result having reliability equal to or higher than the threshold determined by the threshold determination unit 1023 among one or more inference results by the inference unit 103 as a pseudo-label. The pseudo-label generation unit 1041 outputs the inference result set in the pseudo-label to the association unit 1042 .
  • the pseudo-label generation unit 1041 sets, as a pseudo-label, each inference result that, among the one or more inference results from the inference unit 103, has a reliability equal to or higher than the threshold set for its category. Specifically, the pseudo-label generation unit 1041 classifies the inference results by the inference unit 103 by category and specifies, for each class, the corresponding threshold, in other words, the threshold that matches the category. Then, for each class, the pseudo-label generation unit 1041 compares the reliability of each inference result with the specified threshold and sets an inference result having a reliability equal to or higher than the threshold as a pseudo-label.
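  • Pseudo-label generation with per-category thresholds can be sketched as a filter over the inference results. The dictionary keys (`category`, `box`, `reliability`), the function name, and the choice to skip categories that have no threshold are assumptions of this sketch.

```python
def generate_pseudo_labels(inference_results, thresholds):
    """Keep inference results whose reliability reaches their category's threshold.

    inference_results is a list of dicts with the keys 'category', 'box'
    and 'reliability'. thresholds maps a category to its threshold.
    Results for categories without a threshold are skipped (an assumption).
    """
    pseudo_labels = []
    for result in inference_results:
        category = result["category"]
        if category not in thresholds:
            continue
        if result["reliability"] >= thresholds[category]:
            # A pseudo-label carries the category and the bounding box.
            pseudo_labels.append({"category": category, "box": result["box"]})
    return pseudo_labels


inference_results = [
    {"category": "person", "box": (10, 20, 60, 180), "reliability": 0.85},
    {"category": "person", "box": (80, 25, 130, 185), "reliability": 0.70},
    {"category": "bag", "box": (55, 120, 85, 160), "reliability": 0.92},
    {"category": "bag", "box": (205, 130, 235, 170), "reliability": 0.88},
]
pseudo_labels = generate_pseudo_labels(inference_results, {"person": 0.8, "bag": 0.9})
```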
  • the association unit 1042 associates the pseudo label set by the pseudo label generation unit 1041 with the corresponding image. This produces a dataset 2' (DS2') in which each of the one or more images contained in the dataset 2 (DS2) is associated with a pseudo-label.
  • the association unit 1042 stores the generated data set 2 ′ (DS2′) in the storage unit 150 a and notifies the relearning unit 105 of it.
  • FIG. 8 is a diagram showing a specific example of data included in data set 2' (DS2'). Specifically, FIG. 8 shows one of the images contained in dataset 2' (DS2'). The image is an image included in data set 2 (DS2) shown in FIG. 6, and a pseudo label is associated with each of the five objects included in the image.
  • pseudo-labels are labels that include categories and bounding boxes as shown in FIG.
  • the category is category information indicating the category of objects included in the image associated with the pseudo label.
  • each of the three persons is associated with a "person" category
  • each of the two bags is associated with a "bag" category.
  • a bounding box is area information indicating the area of an object contained in an image associated with a pseudo label.
  • One bounding box is associated with one object, and a typical example of the bounding box is data indicating the minimum rectangle enclosing the object, as shown in FIG.
  • the re-learning unit 105 learns the detection model for the target image using the data set after the pseudo-labeling. As an example, the re-learning unit 105 re-learns the detection model learned by the learning unit 101 as learning of the target image detection model. Specifically, the relearning unit 105 reads the data set 2' (DS2') from the storage unit 150a, and uses the data set 2' (DS2') to learn the object detection model DM. Then, the relearning unit 105 stores the learned object detection model DM in the storage unit 150a. As another example, the relearning unit 105 may learn a new detection model as learning of the target image detection model, and store the new detection model in the storage unit 150a.
  • the information processing apparatus 10a according to the present exemplary embodiment adopts a configuration in which the target image detection model is learned using the data set to which pseudo labels have been assigned. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to reduce the cost of adjusting the threshold value and generate the target image detection model. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to generate a highly accurate target image detection model while suppressing the generation cost.
  • the information processing apparatus 10a according to this exemplary embodiment adopts a configuration in which the detection model learned by the learning unit 101 is re-learned as the learning of the target image detection model. Therefore, according to the information processing apparatus 10a according to the present exemplary embodiment, it is possible to reduce the cost of re-learning and improve the accuracy of the detection model.
  • the information processing apparatus 10a according to this exemplary embodiment adopts a configuration in which the threshold for determining whether the inference result for each of the images included in the second data set is to be used as a pseudo label is determined automatically. Therefore, according to the information processing apparatus 10a according to the present exemplary embodiment, re-learning, which would otherwise be required each time the threshold is adjusted, needs to be performed only once. As a result, the time required for re-learning, and hence for generating the detection model, can be reduced.
  • the correct label and the pseudo label include area information and category information. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to improve the accuracy of detecting an object included in an image using a re-learned detection model.
  • the information processing apparatus 10a according to the exemplary embodiment adopts a configuration in which the threshold is determined by referring to the calculated precision and recall. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to improve the accuracy of pseudo label setting. Further, because pseudo labels can be set in consideration of both the quality of the learning data (precision) and the amount of learning data (recall), a highly accurate target image detection model can be generated.
  • in the information processing apparatus 10a according to the present exemplary embodiment, a configuration may be employed in which a threshold is set for each category of the inference results obtained by the pseudo label generation object detection model for the images included in the evaluation data set and of the correct labels associated with those images. According to the information processing apparatus 10a that employs this configuration, it is possible to improve the accuracy of setting pseudo labels.
  • a configuration may be adopted in which the images included in the evaluation data set DSE are included in the first data set. According to the information processing apparatus 10a according to the present exemplary embodiment that employs this configuration, there is no need to newly perform a high-cost correct-label assignment task in order to generate the evaluation data set DSE. This configuration also makes it possible to reduce the number of images to be prepared in advance.
  • a configuration may be employed in which the images included in the evaluation data set DSE are generated by giving correct labels to part of the second data set. According to the information processing apparatus 10a according to the present exemplary embodiment that employs this configuration, the threshold is determined using, as the evaluation data set DSE, a part of the data set to which the pseudo labels are assigned, so the accuracy of the assigned pseudo labels can be improved. This configuration also makes it possible to reduce the number of images to be prepared in advance.
  • FIG. 9 is a flowchart showing the flow of the information processing method S10a.
  • the information processing device 10a performs the information processing method S10a to generate a second data set including images with associated pseudo-labels.
  • in step S101, the learning unit 101 learns a detection model. Specifically, the learning unit 101 reads data set 1 (DS1) stored in the storage unit 150a and trains an object detection model for pseudo-label generation using data set 1 (DS1), that is, a data set in which a correct label is associated with each of one or more images. Then, the learning unit 101 outputs the trained object detection model for pseudo-label generation to the evaluation data set inference unit 1021 and the inference unit 103.
  • in step S1021, the evaluation data set inference unit 1021 generates an inference result based on the evaluation data set. Specifically, the evaluation data set inference unit 1021 reads the evaluation data set DSE stored in the storage unit 150a and inputs it to the object detection model for pseudo-label generation acquired from the learning unit 101. Then, the evaluation data set inference unit 1021 acquires the inference result output by the model and outputs it to the evaluation value calculation unit 1022.
  • in step S1022, the evaluation value calculation unit 1022 calculates an evaluation value based on the inference result. Specifically, the evaluation value calculation unit 1022 compares the inference results specified based on the reference value, among the inference results by the evaluation data set inference unit 1021, with the correct labels of each of the one or more images included in the evaluation data set DSE, calculates the precision and the recall from the comparison result, and calculates the F value as the evaluation value from the precision and the recall. The evaluation value calculation unit 1022 repeats the F value calculation while changing the reference value, thereby calculating a plurality of F values, one for each reference value. The evaluation value calculation unit 1022 associates each of the calculated F values with the corresponding reference value and outputs them to the threshold determination unit 1023.
  • in step S1023, the threshold determination unit 1023 determines a threshold based on the evaluation value. Specifically, the threshold determination unit 1023 identifies the maximum value among the acquired F values and sets the reference value associated with the identified F value as the threshold. The threshold determination unit 1023 outputs the determined threshold to the pseudo-label generation unit 1041.
  • steps S1021 to S1023 correspond to step S102 described in the first exemplary embodiment.
  • in step S103, the inference unit 103 makes an inference. Specifically, the inference unit 103 reads data set 2 (DS2) stored in the storage unit 150a, inputs each of the one or more images included in data set 2 (DS2) to the object detection model for pseudo-label generation acquired from the learning unit 101, and obtains one or more inference results for each of the images. The inference unit 103 outputs the acquired inference results to the pseudo-label generation unit 1041.
  • in step S1041, the pseudo-label generation unit 1041 generates pseudo-labels. Specifically, the pseudo-label generation unit 1041 sets, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold determined by the threshold determination unit 1023 among the one or more inference results by the inference unit 103. The pseudo-label generation unit 1041 outputs the inference results set as pseudo-labels to the association unit 1042.
  • in step S1042, the associating unit 1042 associates the image with the pseudo label. Specifically, the associating unit 1042 associates each of the one or more images included in data set 2 (DS2) with a corresponding pseudo label to generate data set 2' (DS2').
  • the pseudo-label generation unit 1041 stores the data set 2′ (DS2′) generated by the association unit 1042 in the storage unit 150a and notifies the relearning unit 105 of it.
  • steps S1041 and S1042 correspond to step S104 described in the first exemplary embodiment.
  • the re-learning unit 105 learns the target image detection model using the data set after the pseudo-labeling.
  • the relearning unit 105 performs relearning of the detection model learned by the learning unit 101 as the learning. Specifically, the relearning unit 105 reads the data set 2' (DS2') from the storage unit 150a, and uses the data set 2' (DS2') to learn the object detection model DM. Then, the relearning unit 105 stores the learned object detection model DM in the storage unit 150a.
  • the relearning unit 105 may learn a new detection model as learning of the target image detection model, and store the new detection model in the storage unit 150a.
  • the same effects as those of the information processing apparatus 10a can be obtained. That is, in the information processing method S10a according to the present exemplary embodiment, a configuration is adopted in which the target image detection model is learned using the data set to which pseudo labels have been added. Therefore, according to the information processing method S10a according to the present exemplary embodiment, it is possible to reduce the cost of adjusting the threshold value when generating the target image detection model used by the information processing apparatus. Consequently, according to the information processing method S10a according to this exemplary embodiment, it is possible to generate a highly accurate detection model while suppressing the generation cost.
  • FIG. 10 is a block diagram showing the configuration of the information processing device 20a.
  • the information processing device 20a includes a control section 200a, a storage section 250a and an output section 260a.
  • the control unit 200a centrally controls each unit of the information processing device 20a.
  • the storage unit 250a stores various programs and data used by the information processing device 20a.
  • the output unit 260a outputs information processing results by the information processing device 20a.
  • the storage unit 250a stores the target data set TDS and the object detection model DM.
  • the target data set TDS is a data set containing one or more target images that are object detection targets.
  • the object detection model DM is a target image detection model, specifically, an object detection model DM generated by the re-learning unit 105 of the information processing apparatus 10a.
  • the object detection model DM is manufactured by a method including the following processes:
    - a learning process of learning a detection model using the first data set;
    - a threshold determination process of determining a threshold with reference to the result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with one or more correct labels given to each of the one or more images;
    - an inference process of obtaining one or more inference results for each of the one or more images included in the second data set by inputting each of those images into the detection model;
    - a data set generation process of setting, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results from the inference process, and associating the pseudo-label with the corresponding image to create a data set after pseudo-labeling; and
    - a pseudo-label reference learning process of re-learning the detection model for the target image by referring to the data set after pseudo-labeling.
  • As shown in FIG. 10, the control unit 200a includes an acquisition unit 201 and a detection unit 202.
  • the acquisition unit 201 acquires the target image. Specifically, the acquisition unit 201 reads the target data set TDS from the storage unit 250 a and outputs it to the detection unit 202 .
  • the detection unit 202 detects an object included in the target image using the target image detection model. Specifically, the detection unit 202 inputs the target image included in the target data set TDS acquired from the acquisition unit 201 to the object detection model DM, and acquires the inference result output from the object detection model DM. The detection unit 202 outputs the obtained inference result to the output unit 260a. As a result, the output unit 260a outputs, for each target image, at least one of the presence or absence of an object included in the target image, the position of the object, the size of the object, and the category of the object.
  • the output unit 260a causes the display device to display the target image in which at least a part of the object is assigned a category and a bounding box.
  • the display device may be the output unit 260a, or may be a display device (not shown) communicably connected to the information processing device 20a.
  • the information processing apparatus 20a according to the present exemplary embodiment adopts a configuration in which an object is detected using a target image detection model trained with a data set including images associated with pseudo labels, the pseudo labels having been determined using an automatically determined threshold. For this reason, according to the information processing apparatus 20a according to the present exemplary embodiment, it is possible to detect an object included in an image using a target image detection model for which the cost of adjusting the threshold has been reduced.
  • the information processing apparatus 20a according to the present exemplary embodiment also adopts a configuration in which the inference result of the target image detection model for the target image is output. Therefore, the user of the information processing device 20a can recognize the inference result.
  • FIG. 11 is a block diagram showing the configuration of the information processing device 10b.
  • the information processing device 10b includes a control section 100b and a storage section 150b.
  • the control unit 100b centrally controls each unit of the information processing device 10b.
  • the storage unit 150b stores various programs and data used by the information processing device 10b.
  • the difference between the storage unit 150b and the storage unit 150a described in the second exemplary embodiment is the data included in the data set 2 (DS2). Details of the data will be described with reference to FIG.
  • FIG. 12 is a diagram showing a specific example of data included in dataset 1 (DS1) and dataset 2 (DS2). Specifically, FIG. 12 shows one of the images contained in dataset 1 (DS1) and one of the images contained in dataset 2 (DS2).
  • the images contained in dataset 1 (DS1) contain five objects, specifically three dogs and two cows.
  • the images contained in dataset 2 (DS2) also contain five objects, specifically two dogs and three cows.
  • Data set 1 (DS1) and data set 2 (DS2) according to this exemplary embodiment are a plurality of data sets (also called expert data sets) with different categories (responsibility ranges) assigned as correct answers.
  • the evaluation data set DSE is a data set including images in which correct labels are associated with dogs, similar to data set 1 (DS1).
  • control unit 100b differs from the control unit 100a described in the second exemplary embodiment in that it includes an associating unit 1042b instead of the associating unit 1042.
  • the pseudo-label generating unit 1041 and the associating unit 1042b correspond to the dataset generating unit 104 in exemplary embodiment 1, and are configured to implement the dataset generating means in this exemplary embodiment.
  • the associating unit 1042b has the following function in addition to the functions of the associating unit 1042. That is, when a correct label is assigned to an object included in an image associated with a pseudo label, the associating unit 1042b deletes the pseudo label if the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is greater than or equal to a predetermined degree.
  • FIG. 13 is a diagram showing specific examples of data included in data set 2 (DS2), data set 2′ (DS2′), and data set 2′′. Specifically, FIG. 13 shows one of the images contained in each of data set 2 (DS2), data set 2′ (DS2′), and data set 2′′. Note that data set 2 (DS2) according to this exemplary embodiment has already been described and will not be repeated here.
  • Data set 2' is a data set in which each of the one or more images included in data set 2 (DS2) is associated with a pseudo-label, as described in exemplary embodiment 2. Since the pseudo-label can be said to be based on the data set 1 (DS1) and the evaluation data set DSE, in the example of FIG. 13, a pseudo-label whose category is "dog" is associated with some of the objects.
  • the object Ob1 is associated with a pseudo-label including the category "dog", that is, an incorrect pseudo-label.
  • the correct label is associated with the object Ob1.
  • object Ob1 is associated with the correct label in addition to the pseudo label.
  • the associating unit 1042b identifies the corresponding image from the images included in the dataset 2 (DS2) for each image included in the dataset 2' (DS2').
  • the associating unit 1042b selects one of the images included in the data set 2′ (DS2′) and, for each bounding box of the pseudo labels associated with that image, calculates the IOU with the bounding box of each correct label included in the identified image.
  • the IOU corresponds to the degree of overlap described above.
  • the associating unit 1042b performs this process for all images included in dataset 2' (DS2').
  • the associating unit 1042b deletes the pseudo label when there is a correct label with an IOU equal to or greater than a predetermined value.
  • the IOU between the pseudo label associated with the object Ob1 and the correct label associated with the object Ob1 is greater than or equal to a predetermined value. Therefore, the associating unit 1042b deletes the pseudo label associated with the object Ob1.
  • the image included in the data set 2′′ shown in FIG. 13 is the image after the pseudo label is deleted. As shown in FIG. 13, in the image, the pseudo label associated with the object Ob1 has been deleted, and only the correct label is associated with the object Ob1.
  • in the information processing apparatus 10b according to the present exemplary embodiment, a configuration is adopted in which, for a pseudo label and a correct label attached to an image, the pseudo label is deleted if the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is greater than or equal to a predetermined degree. Therefore, according to the information processing apparatus 10b according to the present exemplary embodiment, if the pseudo label is not appropriate, the pseudo label is deleted and the correct label remains, which has the effect of improving accuracy. Note that a pseudo label that is not appropriate refers to, for example, a case where (1) the category of the pseudo label differs from the category of the object, or (2) the bounding box of the pseudo label does not enclose part of the object.
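  • as a rough illustration of the deletion performed by the associating unit 1042b, the following minimal sketch computes the IOU between each pseudo-label bounding box and each correct-label bounding box and drops the pseudo labels whose IOU reaches a predetermined value. The `Label` class, the function names, the `(x_min, y_min, x_max, y_max)` box format, and the default value 0.5 are assumptions made for illustration and do not appear in the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical label: a category plus a bounding box."""
    category: str
    box: tuple  # (x_min, y_min, x_max, y_max), an assumed format


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping_pseudo_labels(pseudo_labels, correct_labels, iou_threshold=0.5):
    """Keep only the pseudo labels whose IOU with every correct label is below the threshold."""
    return [
        p for p in pseudo_labels
        if all(iou(p.box, c.box) < iou_threshold for c in correct_labels)
    ]
```

  • in this sketch the category is not compared, so a pseudo label is removed whenever it overlaps a correct label sufficiently, which matches the case of the object Ob1 where an incorrect "dog" pseudo label overlaps a correct label.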
  • the information processing device 10c determines a threshold based on each of the expert datasets, and assigns a pseudo-label to each of the plurality of datasets based on the threshold.
  • FIG. 14 is a block diagram showing the configuration of the information processing device 10c.
  • the information processing device 10c includes a first control section 100c, a first storage section 150c, a second control section 110c, and a second storage section 160c.
  • the first control unit 100c and the second control unit 110c collectively control each unit of the information processing device 10c.
  • the first storage unit 150c and the second storage unit 160c store various programs and data used by the information processing device 10c.
  • the first control unit 100c and the second control unit 110c may be integrated.
  • the first storage unit 150c and the second storage unit 160c may be integrated.
  • the second control unit 110c and the second storage unit 160c may be provided in another device communicably connected to the information processing device 10c.
  • the first storage unit 150c stores data set 1 (DS1), data set 2 (DS2), evaluation data set 1 (DSE1), and evaluation data set 2 (DSE2).
  • Data set 1 (DS1) is the first data set in this exemplary embodiment.
  • Data set 2 (DS2) is the second data set in this exemplary embodiment.
  • Data set 1 (DS1) and data set 2 (DS2) are the expert data sets described above.
  • Evaluation dataset 1 (DSE1) is the first evaluation dataset in this exemplary embodiment.
  • Evaluation dataset 2 (DSE2) is the second evaluation dataset in this exemplary embodiment.
  • FIG. 15 is a diagram showing a specific example of data included in dataset 1 (DS1) and dataset 2 (DS2). Specifically, FIG. 15 shows one of the images contained in dataset 1 (DS1) and one of the images contained in dataset 2 (DS2).
  • Each of these images contains five objects, specifically three people and two bags.
  • each of the two bags is associated with a correct label. That is, data set 1 (DS1) is an expert data set whose responsibility is "bag”.
  • data set 2 (DS2) is an expert data set whose scope of responsibility is "person”.
  • evaluation dataset 1 (DSE1) is a dataset in which a correct label is associated with each of the objects within the scope of responsibility included in each image.
  • evaluation data set 1 (DSE1) is a data set that includes images in which correct labels are associated with bags.
  • the images contained in evaluation dataset 1 (DSE1) may be part of the images contained in dataset 1 (DS1).
  • alternatively, the images included in the evaluation data set 1 (DSE1) may be images that are not included in the data set 1 (DS1) and in which correct labels are associated with objects within the scope of responsibility of the data set 1 (DS1).
  • evaluation dataset 2 (DSE2) is a dataset in which a correct label is associated with each of the objects within the scope of responsibility included in each image.
  • evaluation data set 2 (DSE2) is a data set containing images in which correct labels are associated with people.
  • the images contained in evaluation dataset 2 (DSE2) may be part of the images contained in dataset 2 (DS2).
  • alternatively, the images included in the evaluation data set 2 (DSE2) may be images that are not included in the data set 2 (DS2) and in which correct labels are associated with objects within the scope of responsibility of the data set 2 (DS2).
  • the first control unit 100c includes a first learning unit 101-1, a second learning unit 101-2, a first threshold determination unit 102-1, a second threshold determination unit 102 -2, a first inference unit 103-1, a second inference unit 103-2, a first data set generation unit 104-1, and a second data set generation unit 104-2.
  • the first learning unit 101-1 is configured to implement the first learning means in this exemplary embodiment.
  • the second learning unit 101-2 is configured to implement the second learning means in this exemplary embodiment.
  • the first threshold determination unit 102-1 is a configuration that implements the first threshold determination means in this exemplary embodiment.
  • the second threshold determination unit 102-2 is a configuration that implements the second threshold determination means in this exemplary embodiment.
  • the first inference unit 103-1 is a configuration that implements the first inference means in this exemplary embodiment.
  • the second inference unit 103-2 is a configuration that implements the second inference means in this exemplary embodiment.
  • the first data set generation unit 104-1 is a configuration that implements the first data set generation means in this exemplary embodiment.
  • the second data set generation unit 104-2 is a configuration that implements the second data set generation means in this exemplary embodiment.
  • the first learning unit 101-1 uses the first data set to learn the first detection model. Specifically, the first learning unit 101-1 acquires the data set 1 (DS1) and uses the data set 1 (DS1) to perform learning of the first pseudo label generation object detection model PDM1. More specifically, the first learning unit 101-1 reads the data set 1 (DS1) stored in the first storage unit 150c and uses the data set 1 (DS1) to perform learning of the first pseudo label generation object detection model PDM1. Then, the first learning unit 101-1 outputs the learned first pseudo label generation object detection model PDM1 to the first threshold determination unit 102-1 and the first inference unit 103-1.
  • the first threshold determination unit 102-1 determines the first threshold by referring to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the first evaluation data set into the first detection model, with one or more correct labels attached to each of the one or more images.
  • specifically, the first threshold determination unit 102-1 reads the evaluation data set 1 (DSE1) stored in the first storage unit 150c and inputs it to the first pseudo label generation object detection model PDM1 acquired from the first learning unit 101-1. Then, the first threshold determination unit 102-1 acquires the inference results that the first pseudo label generation object detection model PDM1 outputs for the evaluation data set 1 (DSE1).
  • the first threshold determination unit 102-1 compares each inference result in each of the one or more images included in the evaluation data set 1 (DSE1) with the correct label in each of the images. Based on this, the evaluation value of each inference result is calculated.
  • the evaluation value is, for example, the F value. Note that the details of the calculation process of the F value in the example where the evaluation value is the F value have been described in the second exemplary embodiment, and thus the description will not be repeated here.
  • the first threshold determination unit 102-1 identifies the maximum value among the plurality of F values calculated for each reference value, and sets the reference value linked to the identified F value as the threshold. This threshold is the above-described first threshold.
  • the first threshold determination unit 102-1 outputs the determined first threshold to the first data set generation unit 104-1.
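  • the threshold determination described above can be sketched roughly as follows. Because the details of the F value calculation are given in the second exemplary embodiment and are not repeated here, this sketch makes its own assumptions: each inference result on the evaluation data set is reduced to a pair of its reliability and a flag saying whether it matched a correct label, and the F value is the ordinary F1 value. The function names and the input format are hypothetical.

```python
def f_value(tp, fp, fn):
    """F1 value computed from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def determine_threshold(results, num_correct_labels, reference_values):
    """Return (reference value, F value) for the reference value with the largest F value.

    results: list of (reliability, matched) pairs, where matched is True when the
    inference result agrees with a correct label (assumed to be precomputed).
    num_correct_labels: total number of correct labels in the evaluation data set.
    """
    best_ref, best_f = None, -1.0
    for ref in reference_values:
        # Only inference results at or above the reference value are kept.
        tp = sum(1 for rel, matched in results if rel >= ref and matched)
        fp = sum(1 for rel, matched in results if rel >= ref and not matched)
        fn = num_correct_labels - tp
        f = f_value(tp, fp, fn)
        if f > best_f:
            best_ref, best_f = ref, f
    return best_ref, best_f
```

  • calling `determine_threshold` once with the results of PDM1 on DSE1 and once with the results of PDM2 on DSE2 would give one threshold per expert data set under these assumptions.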
  • the first inference unit 103-1 inputs each of the one or more images included in the second data set to the first detection model, thereby obtaining one or more inference results for each of the one or more images. Specifically, the first inference unit 103-1 reads the data set 2 (DS2) stored in the first storage unit 150c, inputs the one or more images included in the data set 2 (DS2) to the first pseudo label generation object detection model PDM1 obtained from the first learning unit 101-1, and obtains one or more inference results PR1 for each of the images.
  • First inference unit 103-1 outputs obtained inference result PR1 to first data set generation unit 104-1.
  • the first data set generation unit 104-1 sets, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results by the first inference unit 103-1, and generates a pseudo-labeled second data set by associating the pseudo label with the corresponding image. Specifically, the first data set generation unit 104-1 sets, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results PR1 by the first inference unit 103-1. The first data set generation unit 104-1 then associates the pseudo label with the corresponding image.
  • the first data set generation unit 104-1 stores the generated data set 2' (DS2') in the second storage unit 160c.
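  • the selection of pseudo labels by reliability can be sketched as follows. The dictionary layout of an inference result (keys `category`, `box`, `reliability`) and the function name are assumptions made for illustration, not the format used in the original.

```python
def generate_pseudo_labels(inference_results, threshold):
    """Keep, per image, the inference results whose reliability is at or above the threshold.

    inference_results: {image_id: [{"category": ..., "box": ..., "reliability": ...}, ...]}
    Returns a dictionary with the same keys whose values are the pseudo labels.
    """
    return {
        image_id: [r for r in results if r["reliability"] >= threshold]
        for image_id, results in inference_results.items()
    }
```

  • an image for which no inference result reaches the threshold keeps an empty list in this sketch, so every image of the data set stays present in the output.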
  • the second learning unit 101-2 learns the second detection model using the second data set. Specifically, the second learning unit 101-2 acquires the data set 2 (DS2) and uses the data set 2 (DS2) to perform learning of the second pseudo label generation object detection model PDM2. More specifically, the second learning unit 101-2 reads the data set 2 (DS2) stored in the first storage unit 150c and uses the data set 2 (DS2) to perform learning of the second pseudo label generation object detection model PDM2. Then, the second learning unit 101-2 outputs the learned second pseudo label generation object detection model PDM2 to the second threshold determination unit 102-2 and the second inference unit 103-2.
  • the second threshold determination unit 102-2 determines the second threshold by referring to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the second evaluation data set into the second detection model, with one or more correct labels attached to each of the one or more images.
  • specifically, the second threshold determination unit 102-2 reads the evaluation data set 2 (DSE2) stored in the first storage unit 150c and inputs it to the second pseudo label generation object detection model PDM2 obtained from the second learning unit 101-2. Then, the second threshold determination unit 102-2 acquires the inference results that the second pseudo label generation object detection model PDM2 outputs for the evaluation data set 2 (DSE2).
  • the second threshold determination unit 102-2 compares each inference result in each of the one or more images included in the evaluation data set 2 (DSE2) with the correct label in each of the images. Based on this, the evaluation value of each inference result is calculated.
  • the evaluation value is, for example, the F value. Note that the details of the calculation process of the F value in the example where the evaluation value is the F value have been described in the second exemplary embodiment, and thus the description will not be repeated here.
  • the second threshold determination unit 102-2 identifies the maximum value among the plurality of F values calculated for each reference value, and sets the reference value linked to the identified F value as the threshold. This threshold is the above-described second threshold. The second threshold determination unit 102-2 outputs the determined second threshold to the second data set generation unit 104-2.
  • the second inference unit 103-2 inputs each of the one or more images included in the first data set to the second detection model, thereby obtaining one or more inference results for each of the one or more images. Specifically, the second inference unit 103-2 reads the data set 1 (DS1) stored in the first storage unit 150c, inputs the one or more images included in the data set 1 (DS1) to the second pseudo label generation object detection model PDM2 obtained from the second learning unit 101-2, and obtains one or more inference results PR2 for each of the images.
  • Second inference unit 103-2 outputs obtained inference result PR2 to second data set generation unit 104-2.
  • the second data set generation unit 104-2 sets, as a pseudo label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results by the second inference unit 103-2, and generates a pseudo-labeled first data set by associating the pseudo label with the corresponding image. Specifically, the second data set generation unit 104-2 sets, as a pseudo label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results PR2 by the second inference unit 103-2. The second data set generation unit 104-2 then associates the pseudo label with the corresponding image.
  • the second data set generation unit 104-2 stores the generated data set 1' (DS1') in the second storage unit 160c.
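  • the crosswise flow described above, in which the model learned on data set 1 (DS1) labels data set 2 (DS2) and the model learned on data set 2 (DS2) labels data set 1 (DS1), can be sketched as follows. The models are represented as plain callables that return a list of inference results for an image; this representation, the record layout, and all names are assumptions made for illustration.

```python
def cross_pseudo_label(ds1, ds2, model1, model2, first_threshold, second_threshold):
    """Return (ds1_prime, ds2_prime), each image carrying correct and pseudo labels.

    ds1, ds2: lists of (image, correct_labels) pairs (assumed layout).
    model1, model2: callables mapping an image to a list of inference results,
    each a dict with a "reliability" key (assumed layout).
    """
    def attach(dataset, model, threshold):
        labeled = []
        for image, correct_labels in dataset:
            # Only sufficiently reliable inference results become pseudo labels.
            pseudo = [r for r in model(image) if r["reliability"] >= threshold]
            labeled.append({"image": image, "correct": correct_labels, "pseudo": pseudo})
        return labeled

    # The model learned on DS1 labels DS2, using the first threshold.
    ds2_prime = attach(ds2, model1, first_threshold)
    # The model learned on DS2 labels DS1, using the second threshold.
    ds1_prime = attach(ds1, model2, second_threshold)
    return ds1_prime, ds2_prime
```

  • each threshold is paired with the model it was determined for, so the first threshold is applied to results of the first model and the second threshold to results of the second model.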
  • FIG. 16 is a diagram showing specific examples of data included in data set 1' (DS1') and data set 2' (DS2'). Specifically, FIG. 16 shows one of the images contained in data set 1' (DS1') and one of the images contained in data set 2' (DS2').
  • the images included in dataset 1' (DS1') shown in FIG. 16 are the same as the images included in dataset 1 (DS1) (see FIG. 15).
  • each of the two bags is associated with a correct label and each of the three persons with a pseudo-label.
  • the correct label is the correct label associated with the bag that is the responsibility of the dataset 1 (DS1) in the images included in the dataset 1 (DS1) that is the source of the dataset 1′ (DS1′).
  • the pseudo-label is a pseudo-label set by the second data set generator 104-2 based on the inference result PR2.
  • the inference result PR2 is an inference result obtained using the second pseudo-label generation object detection model PDM2, which has been trained using the data set 2 (DS2) whose scope of responsibility is a person. The pseudo-label is therefore associated with each person.
  • the images included in dataset 2' (DS2') shown in FIG. 16 are the same as the images included in dataset 2 (DS2) (see FIG. 15).
  • each of the three persons is associated with a correct label and each of the two bags with a pseudo-label.
  • the correct label is the correct label associated with the person, which is the scope of responsibility of the dataset 2 (DS2), in the images included in the dataset 2 (DS2) that is the source of the dataset 2′ (DS2′).
  • the pseudo-label is a pseudo-label set by the first data set generator 104-1 based on the inference result PR1.
  • the inference result PR1 is an inference result obtained using the first pseudo-label generation object detection model PDM1, which has been trained using the data set 1 (DS1) whose scope of responsibility is the bag. The pseudo-label is therefore associated with each bag.
  • the second storage unit 160c stores data set 1' (DS1'), data set 2' (DS2'), and object detection model DM.
  • Data set 1' (DS1') and data set 2' (DS2') are data sets generated by the second data set generator 104-2 and the first data set generator 104-1, respectively.
  • the object detection model DM is a target image detection model, and the details thereof will be described later.
  • the second control unit 110c includes a re-learning unit 105.
  • the relearning unit 105 is a re-learning unit.
  • the re-learning unit 105 is a configuration that implements pseudo-label reference learning means in this exemplary embodiment.
  • the re-learning unit 105 learns the target image detection model using the data set to which the pseudo label has been assigned. Specifically, the re-learning unit 105 re-learns the first pseudo-label generation object detection model PDM1 or the second pseudo-label generation object detection model PDM2 as the learning.
  • the relearning unit 105 reads the data set 1′ (DS1′) and the data set 2′ (DS2′) from the second storage unit 160c, and uses the data set 1′ (DS1′) and the data set 2′ (DS2′) to relearn the first pseudo-label generation object detection model PDM1 or the second pseudo-label generation object detection model PDM2.
  • the relearning unit 105 stores the object detection model DM generated by the relearning in the second storage unit 160c.
  • the relearning unit 105 may learn a new object detection model DM using data set 1′ (DS1′) and data set 2′ (DS2′).
  • the new object detection model DM is a target image detection model that is different from both the first pseudo-label generation object detection model PDM1 and the second pseudo-label generation object detection model PDM2.
  • in this exemplary embodiment, a configuration is adopted in which a threshold is determined based on each of a plurality of expert datasets, and a pseudo label is assigned to each of the plurality of datasets based on the threshold.
  • therefore, a plurality of datasets each assigned a pseudo-label, specifically dataset 1′ (DS1′) and dataset 2′ (DS2′), can be used to re-learn the detection model, which has the effect of further improving the detection accuracy for objects included in an image when the re-learned detection model is used.
  • in addition, according to the information processing apparatus 10c, even when a plurality of data sets each assigned a pseudo label are generated, that is, when a plurality of thresholds for determining pseudo labels are required, the plurality of thresholds can be determined automatically, which has the effect of reducing the cost of adjusting the thresholds.
  • in this exemplary embodiment, an example in which the number of expert data sets is "2" has been described, but the number of expert data sets is not limited to this example.
  • the number of data sets and evaluation data sets stored in the information processing device 10c, and the number of components realizing the learning means, threshold determination means, inference means, and data set generation means in the information processing device 10c, depend on the number of expert data sets. For example, when the number of expert data sets is "3", the information processing device 10c further stores a third data set and a third evaluation data set, and further includes a third learning unit, a third threshold determination unit, a third inference unit, and a third data set generation unit.
  • the first data set generation unit 104-1 and the second data set generation unit 104-2 may have the function of the associating unit 1042b described in the third exemplary embodiment. That is, when generating data set 2′ (DS2′), the first data set generation unit 104-1 may delete a pseudo label if a correct label is assigned to an object included in the image associated with the pseudo label and the degree of overlap between the region indicated by the region information included in the pseudo label and the region indicated by the region information included in the correct label is greater than or equal to a predetermined degree.
  • similarly, when generating data set 1′ (DS1′), the second data set generation unit 104-2 may delete a pseudo label if a correct label is assigned to an object included in the image associated with the pseudo label and the degree of overlap between the region indicated by the region information included in the pseudo label and the region indicated by the region information included in the correct label is greater than or equal to a predetermined degree.
  • FIG. 17 is a block diagram showing the configuration of the information processing device 10d.
  • the information processing device 10d includes a control section 100d and a storage section 150d.
  • the control unit 100d centrally controls each unit of the information processing device 10d.
  • the storage unit 150d stores various programs and data used by the information processing device 10d.
  • the control unit 100d includes a non-learning region determination unit 106 in addition to the learning unit 101, the threshold determination unit 102, the inference unit 103, the data set generation unit 104, and the relearning unit 105 according to the second exemplary embodiment described above.
  • the non-learning area determination unit 106 is a configuration that implements non-learning area determination means in this exemplary embodiment.
  • the threshold determination unit 102 determines a first threshold by referring to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set DSE into the detection model, with one or more correct labels attached to each of the one or more images.
  • the method by which the threshold determination unit 102 determines the first threshold has already been described in the second exemplary embodiment above, so the description will not be repeated here.
  • the threshold determination unit 102 further determines a second threshold that is smaller than the first threshold by referring to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set DSE into the detection model, with one or more correct labels attached to each of the one or more images.
  • the second threshold is a value smaller than the first threshold.
  • as an example, the first threshold may be a value that emphasizes precision, and the second threshold may be a value that emphasizes recall. For instance, the first threshold is the reliability at which the F0.5-score, an F value that emphasizes precision, takes its maximum value, and the second threshold is the reliability at which the F2-score, an F value that emphasizes recall, takes its maximum value.
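  • for reference, an F value weighted by a parameter β is commonly defined as follows, where P denotes precision and R denotes recall. Setting β = 0.5 weights precision more heavily and setting β = 2 weights recall more heavily. The symbols P, R, and β are notation introduced here for illustration and do not appear in the original text.

```latex
F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R},
\qquad
F_{0.5} = \frac{1.25\, P R}{0.25\, P + R},
\qquad
F_{2} = \frac{5\, P R}{4 P + R}
```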
  • the non-learning region determination unit 106 determines, in the pseudo-labeled dataset 2′ (DS2′) generated by the dataset generation unit 104, a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, among the one or more inference results by the inference unit 103, as a non-learning region that is not subject to learning by the relearning unit 105.
  • FIG. 18 is a flowchart showing the flow of the information processing method S10d.
  • the information processing method S10d includes steps S101 to S1022, S1023d, S103 to S1041, S1041d, and S1042. Among these steps, steps S101-S1022, S103-S1041, and S1042 have already been described in the exemplary embodiment 2 above, so the description will not be repeated here.
  • in step S1023d, the threshold determination unit 1023 determines the first threshold and the second threshold based on the evaluation value. Specifically, the threshold determination unit 1023 identifies, for example, the maximum value among the plurality of acquired F values that emphasize precision, and sets the reference value associated with the identified F value as the first threshold. In addition, the threshold determination unit 1023 identifies, for example, the maximum value among the plurality of acquired F values that emphasize recall, and sets the reference value associated with the identified F value as the second threshold. The threshold determination unit 1023 outputs the determined first threshold and second threshold to the pseudo-label generation unit 1041.
  • in step S1041d, the non-learning region determination unit 106 determines, in the pseudo-labeled dataset 2′ (DS2′) generated by the dataset generation unit 104, a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, among the one or more inference results by the inference unit 103, as a non-learning region that is not subject to learning by the relearning unit 105.
  • as described above, in this exemplary embodiment, a configuration is adopted in which a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning by the relearning unit 105.
  • a region corresponding to an inference result having a reliability less than the first threshold and greater than or equal to the second threshold tends to yield a low-reliability pseudo-label even if a pseudo-label is assigned to the region.
  • by excluding such a region from learning, the detection accuracy of the target image detection model can be improved. In addition, because the non-learning region is determined using the first threshold and the second threshold, the cost of generating it can be reduced.
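  • the two-threshold scheme can be sketched as follows. The sketch picks the first threshold as the reference value that maximizes an F value with β = 0.5 and the second threshold as the one that maximizes an F value with β = 2, then sorts inference results into pseudo labels and non-learning regions. The reduction of evaluation results to (reliability, matched) pairs, the dictionary layout, and all function names are assumptions made for illustration.

```python
def f_beta(tp, fp, fn, beta):
    """F value that weights recall beta times as much as precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0


def best_reference_value(results, num_correct_labels, reference_values, beta):
    """Return the reference value whose F-beta value is largest.

    results: (reliability, matched) pairs obtained on the evaluation data set,
    where matched says whether the result agrees with a correct label.
    """
    def score(ref):
        tp = sum(1 for rel, ok in results if rel >= ref and ok)
        fp = sum(1 for rel, ok in results if rel >= ref and not ok)
        return f_beta(tp, fp, num_correct_labels - tp, beta)

    return max(reference_values, key=score)


def split_inference_results(inference_results, first_threshold, second_threshold):
    """Separate pseudo labels from non-learning regions.

    reliability >= first_threshold                     -> pseudo label
    second_threshold <= reliability < first_threshold  -> non-learning region
    reliability < second_threshold                     -> discarded
    """
    pseudo_labels, non_learning_regions = [], []
    for r in inference_results:
        if r["reliability"] >= first_threshold:
            pseudo_labels.append(r)
        elif r["reliability"] >= second_threshold:
            non_learning_regions.append(r["box"])
    return pseudo_labels, non_learning_regions
```

  • this sketch does not itself enforce that the second threshold is smaller than the first; the text above states that relationship as a property of the determined thresholds.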
  • FIG. 19 is a block diagram showing the configuration of the information processing device 10e.
  • the information processing apparatus 10e includes a first control section 100e, a second control section 110e, a first storage section 150e, and a second storage section 160e.
  • the first control unit 100e and the second control unit 110e collectively control each unit of the information processing device 10e.
  • the first storage unit 150e and the second storage unit 160e store various programs and data used by the information processing device 10e.
  • the first control unit 100e includes, in addition to the configuration of the first control unit 100c of the information processing apparatus 10c shown in the above-described fourth exemplary embodiment, a first non-learning region determination unit 106-1 and a second non-learning region determination unit 106-2.
  • the first non-learning area determining section 106-1 is a configuration that implements first non-learning area determining means in this exemplary embodiment.
  • the second non-learning area determination unit 106-2 is a configuration that implements second non-learning area determination means in this exemplary embodiment.
  • similarly to the above-described exemplary embodiment 4, the first threshold determination unit 102-1 determines the first threshold by referring to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 1 (DSE1) into the detection model, with one or more correct labels attached to each of the one or more images.
  • the method of determining the first threshold by the first threshold determination unit 102-1 has already been described in the above-described exemplary embodiment 4, so the description will not be repeated here.
  • the first threshold determination unit 102-1 further determines a third threshold that is smaller than the first threshold by referring to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 1 (DSE1) into the detection model, with one or more correct labels attached to each of the one or more images.
  • the third threshold is a value smaller than the first threshold.
  • as an example, the first threshold may be a value that emphasizes precision, and the third threshold may be a value that emphasizes recall. For instance, the first threshold is the reliability at which the F0.5-score, an F value that emphasizes precision, takes its maximum value, and the third threshold is the reliability at which the F2-score, an F value that emphasizes recall, takes its maximum value.
  • similarly to the above-described exemplary embodiment 4, the second threshold determination unit 102-2 determines the second threshold by referring to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 2 (DSE2) into the detection model, with one or more correct labels attached to each of the one or more images.
  • the method of determining the second threshold by the second threshold determining unit 102-2 has already been described in the above-described exemplary embodiment 4, so the description will not be repeated here.
  • The second threshold determination unit 102-2 further determines a fourth threshold, smaller than the second threshold, by referring to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in evaluation data set 2 (DSE2) into the detection model, with the one or more correct labels attached to each of those images.
  • the fourth threshold is a value smaller than the second threshold.
  • For example, the second threshold may be a value that emphasizes precision, and the fourth threshold may be a value that emphasizes recall.
  • As one example, the second threshold is the confidence at which the F0.5-score, an F-measure that emphasizes precision, takes its maximum value, and the fourth threshold is the confidence at which the F2-score, an F-measure that emphasizes recall, takes its maximum value.
  • In the pseudo-labeled dataset 2′ (DS2′) generated by the first dataset generation unit 104-1, the first learning non-implementation region determination unit 106-1 determines a region corresponding to an inference result that, among the one or more inference results from the first inference unit 103-1, has a reliability less than the first threshold and equal to or greater than the third threshold, as a non-learning region that is not subject to learning by the relearning unit 105.
  • In the pseudo-labeled dataset 1′ (DS1′) generated by the second dataset generation unit 104-2, the second learning non-implementation region determination unit 106-2 determines a region corresponding to an inference result that, among the one or more inference results from the second inference unit 103-2, has a reliability less than the second threshold and equal to or greater than the fourth threshold, as a non-learning region that is not subject to learning by the relearning unit 105.
  • The relearning unit 105 does not use a non-learning region as a learning target.
  • Likewise, among the one or more inference results from the second inference unit 103-2, a region corresponding to an inference result having a reliability less than the second threshold and equal to or greater than the fourth threshold is determined as a non-learning region that is not subject to learning by the relearning unit 105.
  • Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the third threshold, that pseudo label tends to have low reliability.
  • Similarly, even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the second threshold and equal to or greater than the fourth threshold, that pseudo label tends to have low reliability.
  • the detection accuracy of the target image detection model can be improved.
  • the second threshold, the third threshold, and the fourth threshold can be improved.
  • Some or all of the functions of the information processing devices 10, 10a to 10e, 20 and 20a may be realized by hardware such as integrated circuits (IC chips) or by software.
  • the information processing devices 10, 10a to 10e, 20 and 20a are implemented by computers that execute program instructions, which are software that implements each function, for example.
  • An example of such a computer (hereinafter referred to as computer C) is shown in FIG.
  • Computer C comprises at least one processor C1 and at least one memory C2.
  • a program P for operating the computer C as the information processing apparatuses 10, 10a to 10e, 20 and 20a is recorded in the memory C2.
  • the processor C1 reads the program P from the memory C2 and executes it, thereby realizing each function of the information processing devices 10, 10a to 10e, 20 and 20a.
  • As the processor C1, for example, a CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating point number Processing Unit), PPU (Physics Processing Unit), a microcontroller, or a combination thereof can be used.
  • As the memory C2, for example, a flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination thereof can be used.
  • the computer C may further include a RAM (Random Access Memory) for expanding the program P during execution and temporarily storing various data.
  • Computer C may further include a communication interface for sending and receiving data to and from other devices.
  • Computer C may further include an input/output interface for connecting input/output devices such as a keyboard, mouse, display, and printer.
  • The program P can be recorded on a non-transitory tangible recording medium M that is readable by the computer C.
  • As such a recording medium M, for example, a tape, disk, card, semiconductor memory, programmable logic circuit, or the like can be used.
  • the computer C can acquire the program P via such a recording medium M.
  • the program P can be transmitted via a transmission medium.
  • As such a transmission medium, for example, a communication network or broadcast waves can be used.
  • Computer C can also obtain program P via such a transmission medium.
  • (Appendix 1) An information processing apparatus comprising: learning means for learning a detection model using a first data set; threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results by the inference means, and associating the pseudo label with the corresponding image.
  • the inference result of the image included in the evaluation data set by the detection model trained using the first data set is compared with the correct label associated with the image. Based on this, the first threshold for setting the pseudo-label is automatically determined. Therefore, according to the configuration of Supplementary Note 1, it is possible to reduce the cost for adjusting the first threshold. Then, according to the configuration of Supplementary Note 1, from the inference result of the image included in the second data set by the detection model, the inference result having a confidence level equal to or higher than the automatically determined first threshold is set as the pseudo label. set to associate the pseudo-label with the corresponding image. Therefore, according to the configuration of Supplementary Note 1, it is possible to reduce the cost of generating a data set including images to which pseudo labels are assigned.
  • (Appendix 2) The information processing device according to Appendix 1, further comprising pseudo label reference learning means for learning a target image detection model for detecting an object included in a target image, using the data set to which the pseudo label has been assigned.
  • (Appendix 3) The information processing device according to Appendix 2, wherein the pseudo label reference learning means re-learns the detection model as the learning of the target image detection model.
  • the detection model is re-learned using the data set after pseudo-labeling. Therefore, according to the configuration of Supplementary Note 3, it is possible to reduce the cost of re-learning the detection model. Also, if an appropriate value can be determined as the threshold, the number of times the threshold is adjusted can be reduced, and the number of re-learning of the detection model required each time the threshold is adjusted can be reduced. As a result, it is possible to reduce the time until re-learning of the detection model is completed.
  • The threshold determination means determines a second threshold that is less than the first threshold by referring to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set into the detection model, with the one or more correct labels attached to each of the one or more images; and the information processing device further comprises non-learning area determination means for determining, in the pseudo-labeled data set generated by the data set generation means, an area corresponding to an inference result that, among the one or more inference results by the inference means, has a reliability less than the first threshold and equal to or greater than the second threshold, as a non-learning area that is not subject to learning by the pseudo label reference learning means.
  • Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, that pseudo label tends to have low reliability.
  • In the information processing device, the correct label and the pseudo label each include area information indicating the area of an object included in the associated image, and category information indicating the category of the object.
  • The correct label and the pseudo label include area information and category information. Therefore, according to the configuration of Supplementary Note 5, it is possible to improve the accuracy of detecting an object included in an image using a detection model that has been re-learned using a data set after pseudo-labeling.
  • (Appendix 6) The information processing device according to Appendix 5, wherein at least part of the one or more images included in the second data set is labeled with one or more correct labels, and the data set generation means, when a correct label is assigned to an object included in an image associated with the pseudo label, deletes the pseudo label if the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is equal to or greater than a predetermined degree.
  • According to the configuration of Supplementary Note 6, for the pseudo labels and the correct labels attached to the images included in the second data set, a pseudo label is deleted if the degree of overlap between the region indicated by the region information included in the pseudo label and the region indicated by the region information included in a correct label is greater than or equal to a predetermined degree. Thus, when a pseudo label is not appropriate, the pseudo label is deleted and the correct label remains, which makes it possible to improve the accuracy of object detection using the re-learned detection model. In particular, for data sets in which correct labels are attached to visually similar objects, there is a high probability that pseudo labels with incorrect categories will be associated with the objects. Because the configuration of Supplementary Note 6 can delete such erroneous pseudo labels, pseudo labels can be generated with high accuracy, and the object detection accuracy of the target image detection model can be improved.
  • Here, a pseudo label being "not appropriate" refers to, for example, a case where (1) the category of the pseudo label differs from the category of the object, or (2) the bounding box of the pseudo label does not enclose part of the object.
  • (Appendix 7) The information processing device according to Appendix 5 or 6, wherein the threshold determination means sets the first threshold for each category, and the data set generation means generates the pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold set for each category among the one or more inference results by the inference means, and associating the pseudo label with the corresponding image.
  • According to the configuration of Supplementary Note 7, a first threshold is set for each category based on a comparison between the inference results, produced for the images included in the evaluation data set by the detection model trained using the first data set, and the correct labels associated with those images, and an inference result with a reliability equal to or greater than the first threshold is set as a pseudo label. Therefore, according to the configuration of Supplementary Note 7, the accuracy of setting pseudo labels can be improved.
  • (Appendix 8) The information processing device according to any one of Appendices 1 to 7, wherein the threshold determination means determines the first threshold by referring to the precision and the recall indicated by the comparison result.
  • According to the configuration of Supplementary Note 8, the first threshold is determined by referring to the precision and recall calculated from the comparison result. Therefore, it is possible to improve the accuracy of setting pseudo labels.
  • Further, according to the configuration of Supplementary Note 8, pseudo labels can be set in consideration of both the quality of the learning data (precision) and the amount of learning data (recall), so a highly accurate target image detection model can be generated.
  • (Appendix 9) The information processing device according to any one of Appendices 1 to 8, wherein the evaluation data set is included in the first data set.
  • the images included in the evaluation data set are included in the first data set. Therefore, according to the configuration of Supplementary Note 9, there is no need to newly perform a high-cost correct answer assignment work for generating the evaluation data set. Further, according to the configuration of Supplementary Note 9, it is possible to reduce the number of images prepared in advance.
  • (Appendix 10) The information processing device according to any one of Appendices 1 to 8, wherein the evaluation data set is generated by giving a correct label to a part of the second data set.
  • The images included in the evaluation data set are generated by giving correct labels to part of the second data set. Therefore, according to the configuration of Supplementary Note 10, a part of the data set to which the pseudo labels are assigned is used as the evaluation data set to determine the threshold value, so the accuracy of the assigned pseudo labels can be improved. Further, according to the configuration of Supplementary Note 10, it is possible to reduce the number of images prepared in advance.
  • According to the configuration of Supplementary Note 11, a first threshold for setting pseudo labels on the second data set is automatically determined based on a comparison between the inference results, produced for the images included in the first evaluation data set by the detection model trained using the first data set, and the correct labels associated with those images.
  • the inference result of the image included in the second evaluation data set by the detection model trained using the second data set, and the correct answer associated with the image A second threshold for setting the pseudo-labels to the first data set is automatically determined based on the comparison with the labels. Therefore, according to the configuration of Supplementary Note 11, it is possible to reduce the cost for adjusting the first threshold and the second threshold.
  • (Appendix 12) The information processing device according to Appendix 11, further comprising pseudo label reference learning means for learning a target image detection model for detecting an object included in a target image, using the first data set after the pseudo labeling and the second data set after the pseudo labeling.
  • (Appendix 13) The information processing device according to Appendix 12, wherein the pseudo label reference learning means re-learns the first detection model and the second detection model as the learning of the target image detection model.
  • The first detection model and the second detection model are re-learned using the pseudo-labeled first data set and the pseudo-labeled second data set. Therefore, according to the configuration of Supplementary Note 13, it is possible to reduce the cost of re-learning the first detection model and the second detection model. Also, if appropriate values can be determined for the first threshold and the second threshold, the number of threshold adjustments can be reduced, and the number of times the detection models must be re-learned for each threshold adjustment can be reduced. As a result, it is possible to reduce the time until re-learning of the detection models is completed.
  • The first threshold determination means determines a third threshold that is smaller than the first threshold with reference to a result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the first evaluation data set into the first detection model, with the one or more correct labels attached to each of the one or more images.
  • The second threshold determination means determines a fourth threshold that is smaller than the second threshold with reference to a result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the second evaluation data set into the second detection model, with the one or more correct labels attached to each of the one or more images.
  • The information processing device further comprises first non-learning region determination means for determining, in the pseudo-labeled second data set generated by the first data set generation means, a region corresponding to an inference result that, among the one or more inference results by the first inference means, has a reliability less than the first threshold and equal to or higher than the third threshold, as a non-learning region.
  • Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the third threshold, that pseudo label tends to have low reliability.
  • Similarly, even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the second threshold and equal to or greater than the fourth threshold, that pseudo label tends to have low reliability.
  • A pseudo label is determined using an automatically determined threshold, and a target image detection model trained using a data set including images associated with the pseudo label is used to detect objects included in the target image. Therefore, according to the configuration of Supplementary Note 15, it is possible to detect an object included in the target image using a target image detection model for which the cost of adjusting the threshold value has been reduced.
  • Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, that pseudo label tends to have low reliability.
  • re-learning can be performed using pseudo labels with relatively high reliability. The detection accuracy of the target image detection model can be improved.
  • (Appendix 17) a learning step of learning a detection model using the first data set; One or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images
  • Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, that pseudo label tends to have low reliability.
  • the target image detection model includes: A learning process for learning a detection model using the first data set; One or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images A threshold determination process for determining a threshold with reference to the result of comparison with Inference processing for obtaining one or more inference results for each of the one or more images by inputting each of the one or more images included in the second data set into the detection model; By setting an inference result having a reliability equal to or higher than the threshold among the one or more inference results by the inference process as a pseudo label and associating the pseudo label with the corresponding image, the data set after pseudo labeling and a pseudo-label reference learning process for learning the target image detection model by referring to the data set after the pseudo-labeling. Processing method.
  • (Appendix 20) a learning step of learning a detection model using the first data set; One or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images
  • the inference result of the image included in the evaluation data set by the detection model trained using the first data set is compared with the correct label associated with the image. Based on this, the threshold value for pseudo-label setting is automatically determined. Therefore, according to the configuration of Supplementary Note 20, it is possible to reduce the cost for adjusting the threshold. Then, according to the configuration of Supplementary Note 20, learning of the target image detection model is performed using the data set to which the pseudo label has been assigned. Therefore, according to the configuration of Supplementary Note 20, it is possible to manufacture the target image detection model while reducing the cost for adjusting the threshold value. As a result, it is possible to reduce the cost of learning the target image detection model.
  • the number of times the threshold value is adjusted can be reduced, and the number of times of learning required each time the threshold value is adjusted can be reduced. As a result, it is possible to reduce the time until the learning of the target image detection model is completed.
  • (Appendix 21) The method for manufacturing a detection model according to Appendix 20, wherein, in the threshold determination step, a second threshold smaller than the first threshold is also determined with reference to the comparison result, and, in the data set generation step, a region in the pseudo-labeled data set corresponding to an inference result that, among the one or more inference results obtained in the inference step, has a confidence less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning in the pseudo label reference learning step.
  • Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, that pseudo label tends to have low reliability.
  • a program for causing a computer to function as an information processing device comprising: a learning means for learning a detection model using the first data set; One or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images
  • a program for causing a computer to function as an information processing device comprising: acquisition means for acquiring a target image; detection means for detecting an object included in the target image using a target image detection model; function as The target image detection model includes: A learning process for learning a detection model using the first data set; One or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images A threshold determination process for determining a threshold with reference to the result of comparison with Inference processing for obtaining one or more inference results for each of the one or more images by inputting each of the one or more images included in the second data set into the detection model; By setting an inference result having a reliability equal to or higher than the threshold among the one or more inference results by the inference process as a pseudo label and associating the pseudo label with the corresponding image, the data set after pseudo labeling and a pseudo-label reference learning process for learning the target image detection model by
  • An information processing device comprising at least one processor, the processor executing: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a confidence equal to or higher than the first threshold among the one or more inference results from the inference process, and associating the pseudo label with the corresponding image.
  • The information processing apparatus may further include a memory, and the memory may store a program for causing the processor to execute the learning process, the threshold determination process, the inference process, and the data set generation process. This program may also be recorded on a computer-readable non-transitory tangible recording medium.
  • An information processing device comprising at least one processor, the processor executing an acquisition process of acquiring a target image and a detection process of detecting an object included in the target image using a target image detection model, wherein the target image detection model is trained by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results from the inference process, and associating the pseudo label with the corresponding image; and a pseudo label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
  • The information processing apparatus may further include a memory, and the memory may store a program for causing the processor to execute the acquisition process and the detection process. This program may also be recorded on a computer-readable non-transitory tangible recording medium.
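The threshold pairs described in the items above (a precision-oriented upper threshold taken where the F0.5-score peaks, a recall-oriented lower threshold taken where the F2-score peaks, and the band between them treated as a non-learning region) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiments: the function names, the `(confidence, is_true_positive)` input format, and the rule used upstream to decide whether an inference result matches a correct label are all hypothetical.

```python
from typing import List, Tuple


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta < 1 weights precision more, beta > 1 weights recall more."""
    denom = beta * beta * precision + recall
    if denom == 0.0:
        return 0.0
    return (1 + beta * beta) * precision * recall / denom


def best_threshold(scored: List[Tuple[float, bool]], num_gt: int, beta: float) -> float:
    """Return the confidence at which F-beta is maximal on the evaluation set.

    scored: one (confidence, is_true_positive) pair per inference result
            (how a result is matched to a correct label is an assumption
            left to the caller).
    num_gt: number of correct labels in the evaluation data set.
    """
    ordered = sorted(scored, key=lambda s: s[0], reverse=True)
    best_t, best_f = 1.0, -1.0
    tp = 0
    for i, (conf, is_tp) in enumerate(ordered, start=1):
        tp += int(is_tp)
        # Evaluate only after the last result sharing this confidence value.
        if i < len(ordered) and ordered[i][0] == conf:
            continue
        precision = tp / i
        recall = tp / num_gt if num_gt else 0.0
        score = f_beta(precision, recall, beta)
        if score > best_f:
            best_t, best_f = conf, score
    return best_t


def split_by_band(confidences: List[float], upper: float, lower: float):
    """Split results into pseudo labels (>= upper) and non-learning regions
    (>= lower and < upper); anything below lower is simply dropped."""
    pseudo = [c for c in confidences if c >= upper]
    no_learn = [c for c in confidences if lower <= c < upper]
    return pseudo, no_learn
```

Calling `best_threshold(scored, num_gt, 0.5)` gives a candidate for the first (or second) threshold and `best_threshold(scored, num_gt, 2.0)` a candidate for the third (or fourth) threshold. Nothing in this sketch forces the recall-oriented value to be the smaller of the two, so a real implementation would need to handle that case explicitly.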

Abstract

In order to generate a high-accuracy detection model while suppressing generation costs, this information processing device (10) comprises a training unit (101), a threshold value determination unit (102), an inference unit (103), and a dataset generation unit (104). The training unit trains a detection model using a first dataset. The threshold value determination unit compares an inference result obtained by inputting an image included in a dataset for evaluation to the detection model and a correct-answer label attached to the image, thereby determining a threshold value. The inference unit inputs an image included in a second dataset to the detection model and acquires an inference result for the image. The dataset generation unit: associates an inference result that is produced by the inference unit and that has a reliability greater than or equal to the threshold value with the corresponding image, the inference result being associated as a pseudo-label; and generates a dataset after addition of the pseudo-label.
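The data set generation step summarized in the abstract, which keeps only inference results whose reliability is at or above the determined threshold and attaches them to their images as pseudo labels, might look like the following sketch. The `Inference` record and the dictionary layout keyed by image identifier are assumptions made for illustration; the publication does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Inference:
    """One inference result (hypothetical record layout)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    category: str
    confidence: float


def generate_pseudo_labeled_dataset(
    inferences_per_image: Dict[str, List[Inference]],
    threshold: float,
) -> Dict[str, List[Inference]]:
    """Associate each image with the inference results whose confidence is
    greater than or equal to the threshold; these become its pseudo labels."""
    return {
        image_id: [r for r in results if r.confidence >= threshold]
        for image_id, results in inferences_per_image.items()
    }
```

Images whose inference results all fall below the threshold stay in the output with an empty pseudo-label list, which keeps the image set of the second data set unchanged.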

Description

Information processing device, information processing method, detection model manufacturing method, and program
 The present invention relates to a technique for associating pseudo labels with one or more images included in a data set used for re-learning a detection model.
 A detection model that detects objects contained in images becomes highly accurate when it is trained on a large amount of labeled data (data with correct answers). On the other hand, collecting a large amount of data and associating correct labels with it is expensive. For this reason, in order to generate a highly accurate detection model from a small amount of labeled data, techniques are known that associate pseudo labels with unlabeled data (data without correct answers).
 A pseudo label is a reliable inference result among the inference results obtained by running a detection model, trained only on a labeled data set, on the images of an unlabeled data set. For example, Non-Patent Document 1 discloses a method of adopting, as pseudo labels, inference results whose reliability is equal to or higher than a threshold.
 Since the method described in Non-Patent Document 1 requires adjustment to set an appropriate threshold, there is room for reducing the time and computational costs required for this adjustment. In other words, there is room for further reducing the cost of generating a highly accurate detection model using pseudo labels.
 One aspect of the present invention has been made in view of the above problem. That is, an object of one aspect of the present invention is to provide a technique capable of generating a highly accurate detection model while suppressing generation costs.
 An information processing apparatus according to an aspect of the present invention includes: learning means for learning a detection model using a first data set; threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results by the inference means, and associating the pseudo label with the corresponding image.
An information processing device according to one aspect of the present invention includes: first learning means for training a first detection model using a first data set; second learning means for training a second detection model using a second data set; first threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a first evaluation data set into the first detection model, with one or more correct labels attached to each of those images; second threshold determination means for determining a second threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a second evaluation data set into the second detection model, with one or more correct labels attached to each of those images; first inference means for obtaining one or more inference results for each of one or more images included in the second data set by inputting each of those images into the first detection model; second inference means for obtaining one or more inference results for each of one or more images included in the first data set by inputting each of those images into the second detection model; first data set generation means for generating a pseudo-labeled second data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained by the first inference means, and associating the pseudo-label with the corresponding image; and second data set generation means for generating a pseudo-labeled first data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the second threshold among the one or more inference results obtained by the second inference means, and associating the pseudo-label with the corresponding image.
An information processing device according to one aspect of the present invention includes: acquisition means for acquiring a target image; and detection means for detecting an object included in the target image using a target image detection model. The target image detection model has been trained by: a learning process of training a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
An information processing method according to one aspect of the present invention includes: a learning step of training a detection model using a first data set; a threshold determination step of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images; an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo-label with the corresponding image.
An information processing method according to one aspect of the present invention includes: acquiring a target image; and detecting an object included in the target image using a target image detection model. The target image detection model has been trained by: a learning process of training a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
A method for manufacturing a detection model according to one aspect of the present invention includes: a learning step of training a detection model using a first data set; a threshold determination step of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images; an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning step of training, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
A program according to one aspect of the present invention is a program for causing a computer to function as an information processing device. The program causes the computer to function as: learning means for training a detection model using a first data set; threshold determination means for determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the threshold among the one or more inference results obtained by the inference means, and associating the pseudo-label with the corresponding image.
A program according to one aspect of the present invention is a program for causing a computer to function as an information processing device. The program causes the computer to function as: acquisition means for acquiring a target image; and detection means for detecting an object included in the target image using a target image detection model. The target image detection model has been trained by: a learning process of training a detection model using a first data set; a threshold determination process of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
According to one aspect of the present invention, a highly accurate detection model can be generated while keeping the generation cost low.
FIG. 1 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 1 of the present invention.
FIG. 2 is a flow diagram showing the flow of an information processing method executed by the information processing device shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 1 of the present invention.
FIG. 4 is a flow diagram showing the flow of an information processing method executed by the information processing device shown in FIG. 3.
FIG. 5 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 2 of the present invention.
FIG. 6 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 2 of the present invention.
FIG. 7 is a graph showing the relationship between precision and recall calculated by the information processing device shown in FIG. 5.
FIG. 8 is a diagram showing a specific example of data included in a pseudo-labeled data set according to exemplary embodiment 2 of the present invention.
FIG. 9 is a flow diagram showing the flow of an information processing method executed by the information processing device shown in FIG. 5.
FIG. 10 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 2 of the present invention.
FIG. 11 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 3 of the present invention.
FIG. 12 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 3 of the present invention.
FIG. 13 is a diagram showing specific examples of data included in the second data set and in a pseudo-labeled data set generated from the second data set according to exemplary embodiment 3 of the present invention.
FIG. 14 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 4 of the present invention.
FIG. 15 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 4 of the present invention.
FIG. 16 is a diagram showing a specific example of data included in a pseudo-labeled data set according to exemplary embodiment 4 of the present invention.
FIG. 17 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 5 of the present invention.
FIG. 18 is a flow diagram showing the flow of an information processing method executed by the information processing device shown in FIG. 17.
FIG. 19 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 6 of the present invention.
FIG. 20 is a block diagram showing an example of the hardware configuration of the information processing device in each exemplary embodiment of the present invention.
[Exemplary embodiment 1]
A first exemplary embodiment of the present invention will be described in detail with reference to the drawings. This exemplary embodiment is the basic form on which the exemplary embodiments described later are built.
<Overview of information processing device 10>
The information processing device 10 according to this exemplary embodiment functions as a data set generation device that generates a pseudo-labeled data set by assigning pseudo-labels to a target data set.
More specifically, the information processing device 10 first trains a detection model using a first data set. The information processing device 10 then determines a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of those images. Next, the information processing device 10 obtains one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model. Finally, among the one or more inference results for the images included in the second data set, the information processing device 10 sets each inference result that has a confidence equal to or higher than the first threshold as a pseudo-label and associates the pseudo-label with the corresponding image, thereby generating a pseudo-labeled data set.
<Configuration of information processing device 10>
The configuration of the information processing device 10 according to this exemplary embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the information processing device 10.
As shown in FIG. 1, the information processing device 10 includes a learning unit 101, a threshold determination unit 102, an inference unit 103, and a data set generation unit 104. In this exemplary embodiment, the learning unit 101 implements the learning means, the threshold determination unit 102 implements the threshold determination means, the inference unit 103 implements the inference means, and the data set generation unit 104 implements the data set generation means.
The learning unit 101 trains a detection model using the first data set. Specifically, the learning unit 101 uses a first data set including one or more images to train a detection model for detecting objects included in those images. Here, detection means inputting an image into the detection model and thereby outputting an inference result regarding at least one of the following:
- the presence or absence of an object included in the image
- the position of an object included in the image
- the size of an object included in the image
- the category of an object included in the image
The learning unit 101 trains a detection model that takes an image as input and outputs an inference result as described above.
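As a rough illustration, the inference result described above could be represented by a small record type. This is only a sketch: the class name `InferenceResult`, the centre-point convention for `position`, and the `to_box` helper are assumptions made for this example and do not come from the source. The `confidence` field anticipates the confidence that is compared against the first threshold when pseudo-labels are selected.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InferenceResult:
    """One inferred object (hypothetical layout, not taken from the source).

    Position, size and category are optional because the text only requires
    an inference result to cover at least one of these items.
    """
    confidence: float                               # assumed to lie in [0, 1]
    position: Optional[Tuple[float, float]] = None  # assumed: (x, y) of the box centre
    size: Optional[Tuple[float, float]] = None      # assumed: (width, height)
    category: Optional[str] = None

    def to_box(self) -> Optional[Tuple[float, float, float, float]]:
        """Return (x_min, y_min, x_max, y_max), or None if position or size is missing."""
        if self.position is None or self.size is None:
            return None
        (x, y), (w, h) = self.position, self.size
        return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


result = InferenceResult(confidence=0.8, position=(5.0, 5.0), size=(4.0, 2.0), category="car")
box = result.to_box()
```

Under this layout, the absence of any object in an image would simply be an empty list of `InferenceResult` values for that image.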
The threshold determination unit 102 determines a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with one or more correct labels attached to each of those images. Here, a correct label is a label that contains ground truth data, for one or more objects included in each of the one or more images in the evaluation data set, regarding at least one of the following:
- the position of the object included in the image
- the size of the object included in the image
- the category of the object included in the image
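The text states only that the first threshold is determined with reference to the comparison between inference results and correct labels; it does not fix a particular rule here. One possible rule, sketched below under stated assumptions, marks an inference result as correct when it overlaps an unused correct label of the same category with IoU of at least 0.5, and then picks the lowest confidence at which precision on the evaluation data set still reaches a target value. The IoU criterion, the target precision, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mark_correct(preds, truths, min_iou: float = 0.5) -> List[Tuple[float, bool]]:
    """Compare one image's inference results with its correct labels.

    preds:  list of (box, category, confidence)
    truths: list of (box, category)
    Returns (confidence, is_correct) pairs; each correct label is matched at most once.
    """
    used = set()
    scored = []
    for box, category, confidence in sorted(preds, key=lambda p: -p[2]):
        hit = False
        for i, (t_box, t_category) in enumerate(truths):
            if i not in used and category == t_category and iou(box, t_box) >= min_iou:
                used.add(i)
                hit = True
                break
        scored.append((confidence, hit))
    return scored


def determine_threshold(scored, target_precision: float = 0.9) -> Optional[float]:
    """Lowest confidence t such that results with confidence >= t meet the target precision."""
    ranked = sorted(scored, key=lambda s: -s[0])
    best = None
    true_positives = 0
    for n, (confidence, is_correct) in enumerate(ranked, start=1):
        true_positives += is_correct
        if n < len(ranked) and ranked[n][0] == confidence:
            continue  # evaluate a group of tied confidences only once, at its end
        if true_positives / n >= target_precision:
            best = confidence
    return best


evaluation = [  # (inference results, correct labels) per evaluation image
    ([((0, 0, 10, 10), "car", 0.9), ((50, 50, 60, 60), "car", 0.4)], [((1, 1, 10, 10), "car")]),
    ([((0, 0, 4, 4), "dog", 0.7)], [((0, 0, 4, 4), "dog")]),
]
scored = [pair for preds, truths in evaluation for pair in mark_correct(preds, truths)]
first_threshold = determine_threshold(scored, target_precision=0.9)
```

Because the rule is computed from the evaluation data set, no manual sweep over candidate thresholds is needed, which is the cost the text aims to avoid.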
The inference unit 103 obtains one or more inference results for each of one or more images included in the second data set by inputting each of those images into the detection model described above. The second data set is a data set that includes one or more images different from those in the first data set.
The data set generation unit 104 generates a pseudo-labeled data set by setting, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained by the inference unit 103, and associating the pseudo-label with the corresponding image. Here, a pseudo-label is a label that contains, for each of the one or more images included in the second data set, data regarding at least one of the following:
- the position of each of the one or more objects that the inference unit 103 inferred to be objects
- the size of each of those objects
- the category of each of those objects
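The selection step described above can be sketched as a simple filter. The dictionary layout, the file names, and the function name `generate_pseudo_labeled_dataset` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical inference-result layout: (box, category, confidence)
Detection = Tuple[Tuple[float, float, float, float], str, float]


def generate_pseudo_labeled_dataset(
    inferences: Dict[str, List[Detection]],
    first_threshold: float,
) -> Dict[str, List[dict]]:
    """Keep results whose confidence is >= first_threshold and attach them to their image."""
    dataset = {}
    for image_id, results in inferences.items():
        dataset[image_id] = [
            {"box": box, "category": category}  # the pseudo-label for one object
            for box, category, confidence in results
            if confidence >= first_threshold    # "equal to or higher than", so inclusive
        ]
    return dataset


inferences = {
    "img_001.jpg": [((0, 0, 10, 10), "car", 0.92), ((3, 3, 8, 8), "dog", 0.35)],
    "img_002.jpg": [((5, 5, 9, 9), "car", 0.50)],
}
pseudo_labeled = generate_pseudo_labeled_dataset(inferences, first_threshold=0.5)
```

In this sketch an image whose results all fall below the threshold is kept with an empty pseudo-label list; whether such images should be kept or dropped is a design choice the text does not specify.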
Note that, supposing a correct label existed for a given object, the pseudo-label assigned to that object may or may not match the correct label. For example, one or more of the position, size, and category of the object in the correct data may match the corresponding items in the pseudo-label while the remaining items do not.
The accuracy of the pseudo-labels can generally be controlled by adjusting the first threshold described above, but adjusting the first threshold generally incurs time and computational costs.
As described above, the information processing device 10 according to this exemplary embodiment adopts a configuration that automatically determines the first threshold used to decide whether the inference result for each image included in the second data set is to be used as a pseudo-label. The information processing device 10 according to this exemplary embodiment therefore reduces the cost of adjusting the first threshold. Consequently, the information processing device 10 according to this exemplary embodiment makes it possible to generate a highly accurate detection model while keeping the generation cost low.
<Flow of information processing method>
The flow of the information processing method S10 executed by the information processing device 10 configured as described above will be described with reference to FIG. 2. FIG. 2 is a flow diagram showing the flow of the information processing method S10. The information processing device 10 executes the information processing method S10 in order to generate a second data set that includes images with associated pseudo-labels.
As shown in FIG. 2, the information processing method S10 includes steps S101 to S104.
(Step S101)
In step S101, the learning unit 101 trains the detection model. Specifically, the learning unit 101 trains the detection model using the first data set. Step S101 is the learning step in this exemplary embodiment.
(Step S102)
In step S102, the threshold determination unit 102 determines the first threshold. Specifically, the threshold determination unit 102 determines the first threshold, which is used to decide the pseudo-labels, with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with one or more correct labels attached to each of those images. Step S102 is the threshold determination step in this exemplary embodiment.
(Step S103)
In step S103, the inference unit 103 performs inference. Specifically, the inference unit 103 obtains one or more inference results for each of one or more images included in the second data set by inputting each of those images into the detection model. Step S103 is the inference step in this exemplary embodiment.
(Step S104)
In step S104, the data set generation unit 104 generates the pseudo-labeled data set. Specifically, the data set generation unit 104 sets, as a pseudo-label, each inference result that has a confidence equal to or higher than the first threshold among the one or more inference results obtained in step S103, and associates the pseudo-label with the corresponding image in the second data set, thereby generating the pseudo-labeled data set. Step S104 is the data set generation step in this exemplary embodiment.
Note that step S103 does not have to be executed after step S102. It may be executed at any time after step S101 and before step S104; for example, it may be executed before step S102.
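The ordering freedom noted above follows from the data dependencies: steps S102 and S103 both need only the trained detection model, and only step S104 needs both of their outputs. The sketch below makes this explicit. The function `run_method_s10`, its parameters, and the toy stand-ins for training, threshold determination, and inference are all hypothetical.

```python
def run_method_s10(train, determine_threshold, infer,
                   first_dataset, evaluation_dataset, second_images,
                   infer_before_threshold=False):
    """Run steps S101 to S104 with caller-supplied (hypothetical) callables."""
    model = train(first_dataset)                                    # S101
    if infer_before_threshold:
        results = {name: infer(model, image)                        # S103 first
                   for name, image in second_images.items()}
        threshold = determine_threshold(model, evaluation_dataset)  # then S102
    else:
        threshold = determine_threshold(model, evaluation_dataset)  # S102 first
        results = {name: infer(model, image)                        # then S103
                   for name, image in second_images.items()}
    # S104: keep inference results whose confidence is at least the first threshold
    return {name: [r for r in detections if r["confidence"] >= threshold]
            for name, detections in results.items()}


# Toy stand-ins so that the sketch runs; a real system would use an actual detector.
def toy_train(dataset):
    return {"trained_on": len(dataset)}


def toy_threshold(model, evaluation_dataset):
    return 0.6


def toy_infer(model, image):
    # Each toy "image" is just a list of confidences to be echoed back.
    return [{"category": "car", "confidence": c} for c in image]


second = {"a.jpg": [0.9, 0.3], "b.jpg": [0.6]}
default_order = run_method_s10(toy_train, toy_threshold, toy_infer, ["x"], ["y"], second)
swapped_order = run_method_s10(toy_train, toy_threshold, toy_infer, ["x"], ["y"], second,
                               infer_before_threshold=True)
```

Both call orders yield the same pseudo-labeled data set in this sketch, since neither S102 nor S103 consumes the other's output.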
As described above, the information processing method S10 according to this exemplary embodiment provides the same effects as the information processing device 10. That is, the information processing method S10 adopts a configuration that automatically determines the first threshold used to decide whether the inference result for each image included in the second data set is to be used as a pseudo-label. The information processing method S10 according to this exemplary embodiment therefore reduces the cost of adjusting the first threshold. Consequently, the information processing method S10 according to this exemplary embodiment makes it possible to generate a highly accurate detection model while keeping the generation cost low.
<Overview of information processing device 20>
The information processing device 20 acquires a target image and detects an object included in that image using a target image detection model. Typically, the target image detection model is obtained by retraining the detection model trained by the information processing device 10 described above (specifically, by the learning unit 101), where the retraining refers to the pseudo-labeled data set generated by the information processing device 10. However, the target image detection model is not limited to this. It may be any detection model trained using the pseudo-labeled data set; for example, it may be a new detection model trained using the pseudo-labeled data set. Here, a new detection model is a detection model different from the one trained by the learning unit 101.
<Configuration of information processing device 20>
The configuration of the information processing device 20 according to this exemplary embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the configuration of the information processing device 20.
As shown in FIG. 3, the information processing device 20 includes an acquisition unit 201 and a detection unit 202. In this exemplary embodiment, the acquisition unit 201 implements the acquisition means and the detection unit 202 implements the detection means.
The acquisition unit 201 acquires a target image. Here, a target image is an image that is input to a detection model in order to detect an object included in the image. For example, the acquisition unit 201 may acquire the target image by reading a target image stored in the information processing device 20, or may acquire a target image supplied from an imaging device. The acquisition unit 201 may also acquire the target image via an input device (not shown), or from another device (not shown) communicably connected to the information processing device 20.
 検知部202は、対象画像用検知モデルを用いて、対象画像に含まれるオブジェクトの検知を行う。対象画像用検知モデルは、対象画像に含まれるオブジェクトを検知するために用いる検知モデルであり、本例示的実施形態に係る対象画像用検知モデルは、上述した再学習が行われた検知モデルである。検知部202は、対象画像を対象画像用検知モデルに入力することにより、対象画像用検知モデルから出力された推論結果を取得する。例えば、検知部202は、対象画像用検知モデルを保持しており、当該対象画像用検知モデルに対象画像を入力する。また、例えば、検知部202は、記憶装置(図示せず)に記憶されている対象画像用検知モデルにアクセスし、対象画像を入力する。 The detection unit 202 detects an object included in the target image using the target image detection model. The target image detection model is a detection model used to detect an object included in the target image, and the target image detection model according to the present exemplary embodiment is the above-described relearned detection model. . The detection unit 202 acquires an inference result output from the target image detection model by inputting the target image into the target image detection model. For example, the detection unit 202 holds a target image detection model, and inputs the target image to the target image detection model. Further, for example, the detection unit 202 accesses a target image detection model stored in a storage device (not shown) and inputs a target image.
 上述のように、本例示的実施形態に係る情報処理装置20においては、自動で決定された第1の閾値を用いて擬似ラベルが決定され、当該擬似ラベルが関連付けられた画像を含むデータセットを用いて学習が行われた対象画像用検知モデルを用いてオブジェクトを検知する構成が採用されている。このため、本例示的実施形態に係る情報処理装置20によれば、第1の閾値の調整に関するコストを削減した対象画像用検知モデルを用いて、画像に含まれるオブジェクトを検知することができるという効果が得られる。 As described above, in the information processing apparatus 20 according to the present exemplary embodiment, a pseudo-label is determined using the automatically determined first threshold, and a data set including images associated with the pseudo-label is generated. A configuration is adopted in which an object is detected using a target image detection model that has been trained using the target image detection model. Therefore, according to the information processing apparatus 20 according to the present exemplary embodiment, it is possible to detect an object included in an image using a target image detection model in which the cost for adjusting the first threshold is reduced. effect is obtained.
 <情報処理方法の流れ>
 以上のように構成された情報処理装置20が実行する情報処理方法S20の流れについて、図4を参照して説明する。図4は、情報処理方法S20の流れを示すフロー図である。情報処理装置20は、対象画像に含まれるオブジェクトを検知するために、情報処理方法S20を実行する。
<Flow of information processing method>
The flow of the information processing method S20 executed by the information processing apparatus 20 configured as described above will be described with reference to FIG. FIG. 4 is a flow diagram showing the flow of the information processing method S20. The information processing device 20 executes the information processing method S20 in order to detect an object included in the target image.
 図4に示すように、情報処理方法S20は、ステップS201およびS202を含む。 As shown in FIG. 4, the information processing method S20 includes steps S201 and S202.
(Step S201)
In step S201, the acquisition unit 201 acquires a target image.
(Step S202)
In step S202, the detection unit 202 detects an object. Specifically, the detection unit 202 detects an object included in the target image using the target image detection model. More specifically, the detection unit 202 inputs the target image acquired by the acquisition unit 201 to the target image detection model and acquires the inference result output by that detection model.
As described above, the information processing method S20 according to this exemplary embodiment provides the same effects as the information processing device 20. That is, the information processing method S20 adopts a configuration in which pseudo-labels are determined using an automatically determined first threshold, and objects are detected using a target image detection model trained with a data set that includes images associated with those pseudo-labels. The information processing method S20 can therefore detect objects included in an image using a target image detection model for which the cost of adjusting the first threshold has been reduced.
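The two steps of the information processing method S20 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Detection` record, the `DetectionModel` callable type, and the function names are all hypothetical, and the model is represented by any callable that maps an image to a list of detections.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One inference result (hypothetical layout)."""
    category: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    reliability: float


# Assumption: the target image detection model is any callable image -> detections.
DetectionModel = Callable[[object], List[Detection]]


def acquire_target_image(source: Callable[[], object]) -> object:
    """Step S201: obtain the target image from storage, a camera, another device, etc."""
    return source()


def detect_objects(model: DetectionModel, image: object) -> List[Detection]:
    """Step S202: input the target image to the model and take its inference result."""
    return model(image)


def information_processing_method_s20(source: Callable[[], object],
                                      model: DetectionModel) -> List[Detection]:
    image = acquire_target_image(source)
    return detect_objects(model, image)
```

Passing the image source and the model as callables keeps the sketch independent of where the image comes from and of whether the model is held locally or accessed in external storage, both of which the description leaves open.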
[Exemplary embodiment 2]
A second exemplary embodiment of the present invention will be described in detail with reference to the drawings. Components having the same functions as components described in exemplary embodiment 1 are denoted by the same reference numerals, and their description is omitted as appropriate.
<Overview of information processing device 10a>
The information processing device 10a according to this exemplary embodiment is a modification of exemplary embodiment 1. Specifically, the information processing device 10a acquires the first data set and performs the training of the detection model, the determination of the threshold, the inference, and the creation of the pseudo-labeled data set described in exemplary embodiment 1. The information processing device 10a further trains the target image detection model using the generated pseudo-labeled data set. Typically, the target image detection model is the above detection model after it has been retrained with reference to the pseudo-labeled data set. As noted above, however, the target image detection model is not limited to a retrained detection model; it may be any detection model trained using the pseudo-labeled data set.
<Configuration of information processing device 10a>
The configuration of the information processing device 10a will be described with reference to FIG. 5. FIG. 5 is a block diagram showing the configuration of the information processing device 10a. As shown in FIG. 5, the information processing device 10a includes a control unit 100a and a storage unit 150a. The control unit 100a centrally controls each unit of the information processing device 10a. The storage unit 150a stores various programs and data used by the information processing device 10a.
The storage unit 150a stores an evaluation data set DSE, a data set 1 (DS1), a data set 2 (DS2), a data set 2' (DS2'), and an object detection model DM. The evaluation data set DSE is the evaluation data set in this exemplary embodiment. Data set 1 (DS1) is the first data set in this exemplary embodiment. Data set 2 (DS2) is the second data set in this exemplary embodiment. Data set 2' (DS2') is the pseudo-labeled data set in this exemplary embodiment.
Data set 1 (DS1) and data set 2 (DS2) are described in detail below. FIG. 6 is a diagram showing a specific example of the data included in data set 1 (DS1) and data set 2 (DS2). Specifically, FIG. 6 shows one of the images included in data set 1 (DS1) and one of the images included in data set 2 (DS2).
Each of these images contains five objects, specifically three persons and two bags. In the image included in data set 1 (DS1), a correct label is associated with each of the five objects. Typically, a correct label consists of a category and a bounding box, as shown in FIG. 6. The category is category information indicating the category of the object included in the image with which the correct label is associated; specifically, it is the correct data for the category of that object. In the example of FIG. 6, the category "person" is associated with each of the three persons, and the category "bag" with each of the two bags. The bounding box is region information indicating the region of the object included in the image with which the correct label is associated; specifically, it is the correct data for the position and size of that object. One bounding box is associated with one object. A typical bounding box, as shown in FIG. 6, is data representing the smallest rectangle that encloses the object.
In the image included in data set 2 (DS2), on the other hand, no correct label is associated with the objects.
Based on the above, the "image", "data set", and "correct label" described in each exemplary embodiment can be expressed as follows.
・An image x input to the detection model is an element of a data space X. Here, the data space X corresponds to the data set that contains the image x. The number of objects included in one image x is arbitrary.
・A correct label can be represented by a pair (y, b) of a category y and a bounding box b. The category y is an element of the set Y of categories; in the example of FIG. 6, Y = {person, bag}.
・From the above, writing the set of all objects included in an image x as
$$\{(y_k, b_k)\}_{k=1}^{K}$$
and the pair of the image x and that set as
$$\left(x, \{(y_k, b_k)\}_{k=1}^{K}\right),$$
a data set D associated with correct labels can be expressed as the set of such pairs:
$$D = \left\{\left(x_n, \{(y_{n,k}, b_{n,k})\}_{k=1}^{K_n}\right)\right\}_{n=1}^{N}$$
where K_n is the number of objects included in the n-th image and N is the number of images.
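The representation above maps naturally onto a small data structure. The following is a sketch under stated assumptions: the class names `CorrectLabel` and `Sample`, the `(x_min, y_min, x_max, y_max)` box layout, the image identifiers, and the coordinate values are all hypothetical and chosen only to mirror the example of FIG. 6 (three persons and two bags).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)
CATEGORIES = frozenset({"person", "bag"})  # the set Y in the example of FIG. 6


@dataclass(frozen=True)
class CorrectLabel:
    """A correct label (y, b): category y and bounding box b."""
    category: str
    box: Box

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass
class Sample:
    """An image x paired with the labels of all objects it contains."""
    image_id: str  # stands in for the image x itself
    labels: List[CorrectLabel] = field(default_factory=list)


# Data set 1 (DS1): every object carries a correct label (values are made up).
ds1 = [Sample("img_001", [
    CorrectLabel("person", (10, 20, 60, 180)),
    CorrectLabel("person", (70, 25, 120, 185)),
    CorrectLabel("person", (130, 30, 180, 190)),
    CorrectLabel("bag", (55, 120, 75, 150)),
    CorrectLabel("bag", (115, 125, 135, 155)),
])]

# Data set 2 (DS2): images only, with no correct labels attached.
ds2 = [Sample("img_101")]
```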
(Configuration of control unit 100a)
As shown in FIG. 5, the control unit 100a includes a learning unit 101, a threshold determination unit 102, an inference unit 103, a data set generation unit 104, and a relearning unit 105. As also shown in FIG. 5, the threshold determination unit 102 includes an evaluation data set inference unit 1021, an evaluation value calculation unit 1022, and a threshold determination unit 1023, and the data set generation unit 104 includes a pseudo-label generation unit 1041 and an association unit 1042.
The evaluation data set inference unit 1021, the evaluation value calculation unit 1022, and the threshold determination unit 1023 together correspond to the threshold determination unit 102 of exemplary embodiment 1 and implement the threshold determination means in this exemplary embodiment. The pseudo-label generation unit 1041 and the association unit 1042 together correspond to the data set generation unit 104 of exemplary embodiment 1 and implement the data set generation means in this exemplary embodiment. The relearning unit 105 implements the pseudo-label reference learning means in this exemplary embodiment.
The learning unit 101 acquires data set 1 (DS1) and uses it to train an object detection model for pseudo-label generation. That is, the learning unit 101 also functions as an acquisition unit that acquires the first data set. Specifically, the learning unit 101 reads data set 1 (DS1) stored in the storage unit 150a and trains the object detection model for pseudo-label generation using that data set, that is, a data set in which a correct label is associated with each of one or more images. The learning unit 101 then outputs the trained object detection model for pseudo-label generation to the evaluation data set inference unit 1021 and the inference unit 103.
The evaluation data set inference unit 1021 generates inference results for the evaluation data set. Specifically, it acquires the evaluation data set DSE and the object detection model for pseudo-label generation, and inputs each of the one or more images included in the evaluation data set DSE to the model to obtain inference results. More specifically, the evaluation data set inference unit 1021 reads the evaluation data set DSE stored in the storage unit 150a and inputs it to the object detection model for pseudo-label generation acquired from the learning unit 101. It then acquires the inference results output by the model and outputs them to the evaluation value calculation unit 1022.
Like data set 1 (DS1), the evaluation data set DSE is a data set in which a correct label is associated with each object included in each image. For example, the images included in the evaluation data set DSE may be a subset of the images included in data set 1 (DS1). They may instead be generated by assigning correct labels to a subset of the images included in data set 2 (DS2), or by assigning correct labels to images included in neither data set 1 (DS1) nor data set 2 (DS2).
The inference result produced by the evaluation data set inference unit 1021 includes, for each of the one or more images included in the evaluation data set DSE, data on at least one of the following:
・the position of each of the one or more objects inferred to be an object
・the size of each such object
・the category of each such object
It further includes, for each such object, data on the certainty of the inference. Typically, the inference result produced by the evaluation data set inference unit 1021 includes a category, a bounding box, and a reliability. The reliability is an example of data on the certainty of the inference (that is, a confidence score) and is, for example, a numerical value with a minimum of 0 and a maximum of 1.
The evaluation value calculation unit 1022 calculates evaluation values based on the inference results. Specifically, it calculates an evaluation value for the inference results based on a comparison between each inference result for each of the one or more images included in the evaluation data set DSE and the correct labels for that image.
For example, the evaluation value is the harmonic mean of precision and recall, that is, the F value. The F value calculation process executed by the evaluation value calculation unit 1022 is described below.
Specifically, the evaluation value calculation unit 1022 executes the following processes (1) to (6).
(1) Sort all inference results in descending order of reliability.
(2) Identify the inference results whose reliability is equal to or greater than a reference value. The reference value is, for example, 0.9. As described later, the F value calculation process calculates a plurality of F values, and a different reference value is used for each of them. The value 0.9 above can therefore be regarded as the initial reference value.
(3) For each identified inference result, determine whether it is a TP (true positive) or an FP (false positive). A TP is an inference result whose bounding box overlaps the bounding box of a correct label to a degree equal to or greater than a predetermined value and whose category matches that correct label. An FP is any of the following:
(A) an inference result whose category matches a correct label but whose bounding box overlaps the bounding box of that correct label to a degree less than the predetermined value
(B) an inference result whose category differs from that of the correct label whose bounding box it overlaps
(C) an inference result for which no correct label with an overlapping bounding box exists
As the value indicating the degree of overlap between bounding boxes, IoU (Intersection over Union) is used, for example.
(4) Among the correct labels, identify those that are FNs (false negatives). An FN is either of the following:
(D) a correct label for which no inference result with an overlapping bounding box exists
(E) a correct label whose category differs from that of the inference result whose bounding box it overlaps
(5) Calculate the precision and the recall. The precision is the fraction of inference results that are correct and is calculated, for example, as precision = number of TPs / (number of TPs + number of FPs). The recall is the fraction of correct labels that were correctly inferred and is calculated, for example, as recall = number of TPs / (number of TPs + number of FNs). FIG. 7 is a graph showing the relationship between precision and recall. As shown in FIG. 7, the higher the reliability, the higher the precision but the lower the recall. Conversely, the lower the reliability, the lower the precision but the higher the recall. Precision and recall are thus in a trade-off relationship. The reliability referred to here is the reference value set in process (2).
(6) Calculate the F value. The F value is calculated as (2 × precision × recall) / (precision + recall).
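Processes (1) to (6) for a single reference value can be sketched as below. This is an illustrative sketch, not the embodiment's implementation. Several points are assumptions that the description does not fix: the box layout `(x_min, y_min, x_max, y_max)`, the IoU cut-off `min_iou=0.5`, the greedy one-to-one matching of inference results to correct labels (best IoU first, in descending reliability), and the handling of a single image at a time. The names `iou` and `evaluate_at` are hypothetical.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_at(results: List[Tuple[str, Box, float]],
                labels: List[Tuple[str, Box]],
                reference: float,
                min_iou: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F value) for one reference value.

    results: (category, box, reliability) per inference result.
    labels:  (category, box) per correct label.
    """
    # (1)-(2): sort by reliability and keep results at or above the reference value.
    kept = sorted((r for r in results if r[2] >= reference),
                  key=lambda r: r[2], reverse=True)
    matched: Set[int] = set()
    tp = 0
    for category, box, _ in kept:
        # (3): a TP needs a not-yet-matched correct label of the same category
        # whose box overlaps with IoU >= min_iou; anything else counts as an FP.
        best_index, best_iou = None, 0.0
        for index, (label_category, label_box) in enumerate(labels):
            if index in matched or label_category != category:
                continue
            overlap = iou(box, label_box)
            if overlap >= min_iou and (best_index is None or overlap > best_iou):
                best_index, best_iou = index, overlap
        if best_index is not None:
            matched.add(best_index)
            tp += 1
    fp = len(kept) - tp
    fn = len(labels) - tp  # (4): correct labels with no matching inference result
    # (5): precision and recall, guarding against empty denominators.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # (6): F value as the harmonic mean of precision and recall.
    denominator = precision + recall
    f_value = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_value
```

With two correct labels and three inference results (one of them spurious), lowering the reference value from 0.9 to 0.5 admits a second true positive, so recall rises from 0.5 to 1.0 while precision rises only from 0.5 to 2/3.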
When the above processing is complete, the evaluation value calculation unit 1022 decreases the reference value and executes processes (2) to (6) again. For example, it sets the next reference value to 0.8 and calculates the F value based on that reference value. By repeating processes (2) to (6) in this way, the evaluation value calculation unit 1022 calculates one F value for each of the different reference values.
As an example, the evaluation value calculation unit 1022 repeats processes (2) to (6) until it has calculated an F value with a reference value equal to or less than the minimum reliability. In this case, the final iteration of processes (2) to (6) calculates the F value over all inference results. For inference results identified in process (2) of a second or later iteration that were already identified in an earlier iteration, processes (3) and (4) may be omitted and the earlier determinations reused.
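The schedule of reference values described above can be sketched as a generator. The starting value 0.9 comes from the description; the fixed step of 0.1 generalizes the single example "0.9, then 0.8" and is an assumption, as is the function name `reference_values`. Rounding is used only to keep floating-point drift out of the comparison with the minimum reliability.

```python
from typing import Iterable, Iterator


def reference_values(reliabilities: Iterable[float],
                     start: float = 0.9,
                     step: float = 0.1) -> Iterator[float]:
    """Yield decreasing reference values, stopping after the first one that is
    less than or equal to the minimum reliability among the inference results."""
    if step <= 0:
        raise ValueError("step must be positive")
    lowest = min(reliabilities)  # raises ValueError if there are no results
    n = 0
    while True:
        value = round(start - n * step, 10)
        yield value
        if value <= lowest:
            break  # this iteration covered every inference result
        n += 1
```

For reliabilities 0.95, 0.72, and 0.31 the generator yields 0.9, 0.8, and so on down to 0.3, the first reference value that no longer excludes any inference result.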
The evaluation value calculation unit 1022 associates each calculated evaluation value, that is, each F value, with the reference value used to calculate it, and outputs them to the threshold determination unit 1023.
In process (5), when the inference results and the correct labels contain a plurality of categories, the evaluation value calculation unit 1022 may calculate the precision and the recall for each category. In this case, the evaluation value calculation unit 1022 calculates an F value for each category in process (6). As a result, each reference value is associated with a plurality of F values, one per category.
The evaluation value calculated by the evaluation value calculation unit 1022 is not limited to the F value. For example, the evaluation value may be a value that gives more weight to either precision or recall. In this case, in process (6), the evaluation value calculation unit 1022 may calculate the evaluation value as, for example, {(1 + β²) × precision × recall} / {(β² × precision) + recall}. Here, β is a value that adjusts the relative importance of precision and recall. With this formula, a value of β in the range 0 < β < 1 yields an evaluation value that emphasizes precision, and a value in the range 1 < β yields an evaluation value that emphasizes recall.
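A weighted evaluation value of this form can be computed as in the sketch below. The function name `f_beta` is hypothetical. The behavior follows directly from the formula (1 + β²) × precision × recall / (β² × precision + recall): as β approaches 0 the value approaches the precision, β = 1 gives the ordinary F value, and large β pulls the value toward the recall.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Computes (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 is the plain F value.
    """
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator
```

For a detector with high precision and low recall (for example 0.9 and 0.5), `f_beta(0.9, 0.5, 0.5)` is larger than `f_beta(0.9, 0.5, 1.0)`, which in turn is larger than `f_beta(0.9, 0.5, 2.0)`, so the choice of β shifts which reference value ends up with the highest evaluation value.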
When calculating a plurality of evaluation values, the method of identifying at least some of the inference results in process (2) is not limited to the example above. For example, the evaluation value calculation unit 1022 may identify a predetermined number of inference results in descending order of reliability. In this example, the evaluation value calculation unit 1022 increases that number by a predetermined amount each time process (6) ends and the next round of processes (2) to (6) begins. It repeats processes (2) to (6) until all inference results have been identified in process (2) and the corresponding evaluation value has been calculated. In the final round of process (2), the increase may be any amount from 1 up to the predetermined amount. In this example, each calculated evaluation value is associated with the lowest reliability among the identified inference results and is output to the threshold determination unit 1023.
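The count-based variant can be sketched as a generator that reports, for each round, how many inference results are selected and the lowest reliability among them (the value that would be associated with that round's evaluation value). The function name `top_k_rounds` and the parameter name `step` are hypothetical; computing the evaluation value itself is left out.

```python
from typing import Iterable, Iterator, Tuple


def top_k_rounds(reliabilities: Iterable[float], step: int) -> Iterator[Tuple[int, float]]:
    """Yield (k, lowest reliability among the top k results) for each round.

    k grows by `step` per round; the final round may grow by less so that it
    stops exactly when every inference result has been selected.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    ordered = sorted(reliabilities, reverse=True)
    k = 0
    while k < len(ordered):
        k = min(k + step, len(ordered))
        yield k, ordered[k - 1]
```

With five inference results and a step of 2, the rounds select 2, 4, and finally 5 results, the last round growing by only 1.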
As another example, instead of processes (2) to (6), the evaluation value calculation unit 1022 may execute the following processes:
・Determine TP, FP, and FN for all inference results.
・Set a plurality of thresholds for the reliability in advance, and count the TPs whose reliability is equal to or greater than each threshold.
・Calculate the precision and the recall for each of the TP counts.
・Calculate an evaluation value (typically the F value) for each of the resulting combinations of precision and recall.
Note that the TP count is proportional to the recall. In this example, each calculated evaluation value is associated with the threshold used to count the TPs and is output to the threshold determination unit 1023.
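This alternative can be sketched as follows, assuming every inference result has already been judged once as TP or not. The description only states that TPs are counted per threshold; counting FPs per threshold in the same way and deriving the FN count as (number of correct labels minus TP count) are assumptions of this sketch, valid when each TP corresponds to a distinct correct label. The function name `f_values_by_threshold` is hypothetical.

```python
from typing import List, Sequence, Tuple


def f_values_by_threshold(judged: Sequence[Tuple[float, bool]],
                          num_labels: int,
                          thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (threshold, F value) pairs.

    judged:     (reliability, is_tp) for every inference result.
    num_labels: total number of correct labels in the evaluation data set.
    """
    scored = []
    for threshold in thresholds:
        tp = sum(1 for reliability, is_tp in judged if is_tp and reliability >= threshold)
        fp = sum(1 for reliability, is_tp in judged if not is_tp and reliability >= threshold)
        fn = num_labels - tp  # assumption: each TP covers exactly one correct label
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f_value = 2 * precision * recall / denominator if denominator else 0.0
        scored.append((threshold, f_value))
    return scored
```

Because the TP/FP judgment is done once up front, each additional threshold costs only a pass over the judged list rather than a new round of bounding-box matching.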
The threshold determination unit 1023 determines the threshold based on the evaluation values. Specifically, it identifies the maximum among the acquired F values and adopts the reference value associated with that F value as the threshold. The maximum F value can be regarded as the value at which precision and recall are balanced. Because the F value is calculated from the precision and the recall, as described above, the threshold determination unit 1023 can be said to determine the threshold with reference to the precision and the recall indicated by the comparison results of the evaluation value calculation unit 1022. Also, because precision and recall are in a trade-off relationship, the precision and recall at which the F value is maximized correspond not to the point in the graph of FIG. 7 where precision or recall is maximized but, for example, to the point marked with a star in that graph. The threshold determination unit 1023 outputs the determined threshold to the pseudo-label generation unit 1041.
When an F value is calculated for each category, the threshold determination unit 1023 sets a threshold for each category. That is, it determines a plurality of thresholds, one per category, associates each threshold with information indicating the corresponding category, and outputs them to the pseudo-label generation unit 1041.
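The selection rule amounts to an argmax over (reference value, F value) pairs, applied once overall or once per category. The sketch below uses hypothetical function names. The description does not say how ties between equal F values are resolved; here the first pair encountered wins, which is simply the behavior of Python's `max`.

```python
from typing import Dict, Sequence, Tuple


def select_threshold(scored: Sequence[Tuple[float, float]]) -> float:
    """Return the reference value whose associated F value is largest."""
    if not scored:
        raise ValueError("no evaluation values to choose from")
    best_reference, _ = max(scored, key=lambda pair: pair[1])
    return best_reference


def select_thresholds_per_category(
        scored_by_category: Dict[str, Sequence[Tuple[float, float]]]) -> Dict[str, float]:
    """Apply the same rule independently to each category's F values."""
    return {category: select_threshold(scored)
            for category, scored in scored_by_category.items()}
```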
The inference unit 103 reads data set 2 (DS2) stored in the storage unit 150a, inputs each of the one or more images included in data set 2 (DS2) to the object detection model for pseudo-label generation acquired from the learning unit 101, and acquires one or more inference results for each of those images. The inference unit 103 outputs the acquired inference results to the pseudo-label generation unit 1041.
The pseudo-label generation unit 1041 generates pseudo-labels. Specifically, among the one or more inference results produced by the inference unit 103, it sets as pseudo-labels those inference results whose reliability is equal to or greater than the threshold determined by the threshold determination unit 1023. The pseudo-label generation unit 1041 outputs the inference results set as pseudo-labels to the association unit 1042.
When the pseudo-label generation unit 1041 has acquired a plurality of thresholds set per category, it sets as pseudo-labels those inference results produced by the inference unit 103 whose reliability is equal to or greater than the threshold set for their category. Specifically, the pseudo-label generation unit 1041 groups the inference results by category and, for each group, identifies the corresponding threshold, that is, the threshold whose category matches. It then compares the reliability of each inference result in the group with the identified threshold and sets as pseudo-labels the inference results whose reliability is equal to or greater than that threshold.
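The filtering performed by the pseudo-label generation unit 1041 can be sketched as below, covering both a single threshold and per-category thresholds. The tuple layout of an inference result and the function names are hypothetical. In the per-category case this sketch raises `KeyError` for a category without a threshold; how such a case should be handled is not stated in the description.

```python
from typing import Dict, List, Tuple, Union

Box = Tuple[float, float, float, float]
InferenceResult = Tuple[str, Box, float]  # (category, bounding box, reliability)
PseudoLabel = Tuple[str, Box]             # (category, bounding box)


def threshold_for(category: str, thresholds: Union[float, Dict[str, float]]) -> float:
    """Look up the threshold that applies to a category."""
    if isinstance(thresholds, dict):
        return thresholds[category]
    return thresholds


def generate_pseudo_labels(results: List[InferenceResult],
                           thresholds: Union[float, Dict[str, float]]) -> List[PseudoLabel]:
    """Keep inference results whose reliability is at or above the applicable threshold."""
    return [(category, box)
            for category, box, reliability in results
            if reliability >= threshold_for(category, thresholds)]
```

Per-category thresholds matter when categories differ in difficulty: in the usage below, a single threshold of 0.7 discards every "bag" result, whereas a lower "bag" threshold keeps the more reliable one.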
The association unit 1042 associates the pseudo-labels set by the pseudo-label generation unit 1041 with the corresponding images. This generates data set 2' (DS2'), in which pseudo-labels are associated with each of the one or more images included in data set 2 (DS2). The association unit 1042 stores the generated data set 2' (DS2') in the storage unit 150a and notifies the relearning unit 105.
FIG. 8 is a diagram showing a specific example of the data included in data set 2' (DS2'). Specifically, FIG. 8 shows one of the images included in data set 2' (DS2'). This image is the image from data set 2 (DS2) shown in FIG. 6, with a pseudo-label associated with each of the five objects it contains. Typically, a pseudo-label consists of a category and a bounding box, as shown in FIG. 8. The category is category information indicating the category of the object included in the image with which the pseudo-label is associated. In the example of FIG. 8, the category "person" is associated with each of the three persons, and the category "bag" with each of the two bags. The bounding box is region information indicating the region of the object included in the image with which the pseudo-label is associated. One bounding box is associated with one object. A typical bounding box, as shown in FIG. 8, is data representing the smallest rectangle that encloses the object.
 再学習部105は、擬似ラベル付与後のデータセットを用いて、対象画像用検知モデルの学習を行う。一例として、再学習部105は、対象画像用検知モデルの学習として、学習部101により学習された検知モデルの再学習を行う。具体的には、再学習部105は、データセット2’(DS2’)を記憶部150aから読み出し、当該データセット2’(DS2’)を用いて、物体検知モデルDMの学習を行う。そして、再学習部105は、学習済みの物体検知モデルDMを記憶部150aに記憶する。また、他の例として、再学習部105は、対象画像用検知モデルの学習として、新たな検知モデルの学習を行い、当該新たな検知モデルを記憶部150aに記憶してもよい。 The re-learning unit 105 learns the detection model for the target image using the data set after the pseudo-labeling. As an example, the re-learning unit 105 re-learns the detection model learned by the learning unit 101 as learning of the target image detection model. Specifically, the relearning unit 105 reads the data set 2' (DS2') from the storage unit 150a, and uses the data set 2' (DS2') to learn the object detection model DM. Then, the relearning unit 105 stores the learned object detection model DM in the storage unit 150a. As another example, the relearning unit 105 may learn a new detection model as learning of the target image detection model, and store the new detection model in the storage unit 150a.
As described above, the information processing device 10a according to this exemplary embodiment adopts a configuration in which the target image detection model is trained using the pseudo-labeled data set. The information processing device 10a can therefore generate the target image detection model at a reduced cost for threshold adjustment. Consequently, the information processing device 10a can generate a highly accurate target image detection model while keeping the generation cost down.
The information processing device 10a according to this exemplary embodiment also adopts a configuration in which, as the training of the target image detection model, the detection model trained by the learning unit 101 is retrained. The information processing device 10a can therefore make the detection model more accurate while keeping the cost of retraining down.
The information processing device 10a also adopts a configuration that automatically determines the threshold used to decide whether the inference result for each image included in the second data set is to be used as a pseudo-label. As a result, retraining, which was previously required each time the threshold was adjusted, needs to be performed only once. This reduces the time required for retraining and hence the time required to generate the detection model.
As described above, the information processing device 10a also adopts a configuration in which the correct labels and the pseudo-labels include area information and category information. This improves the accuracy with which the retrained detection model detects objects included in an image.
As described above, the information processing device 10a also adopts a configuration in which the threshold is determined with reference to the calculated precision and recall. This improves the accuracy of pseudo-label setting. Moreover, because pseudo-labels can be set in consideration of both the quality of the training data (precision) and its quantity (recall), a highly accurate target image detection model can be generated.
As described above, the information processing device 10a may also adopt a configuration in which a threshold is set for each category appearing in the inference results that the pseudo-label generation object detection model produces for the images included in the evaluation data set and in the correct labels associated with those images. The information processing device 10a adopting this configuration can improve the accuracy of pseudo-label setting.
As described above, the information processing device 10a may also adopt a configuration in which the images included in the evaluation data set DSE are included in the first data set. With this configuration, generating the evaluation data set DSE requires no additional annotation of correct labels, which is costly work. This configuration also keeps down the number of images that must be prepared in advance.
As described above, the information processing device 10a may also adopt a configuration in which the images included in the evaluation data set DSE are generated by assigning correct labels to part of the second data set. With this configuration, the threshold is determined using, as the evaluation data set DSE, part of the data set to which pseudo-labels are to be assigned, which improves the accuracy of the assigned pseudo-labels. This configuration also keeps down the number of images that must be prepared in advance.
<Flow of information processing method>
The flow of the information processing method S10a executed by the information processing device 10a configured as described above will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the flow of the information processing method S10a. The information processing device 10a executes the information processing method S10a to generate a second data set including images with which pseudo-labels are associated.
(Step S101)
In step S101, the learning unit 101 trains a detection model. Specifically, the learning unit 101 reads data set 1 (DS1) stored in the storage unit 150a and trains the pseudo-label generation object detection model using data set 1 (DS1), that is, a data set in which a correct label is associated with each of one or more images. The learning unit 101 then outputs the trained pseudo-label generation object detection model to the evaluation data set inference unit 1021 and the inference unit 103.
(Step S1021)
In step S1021, the evaluation data set inference unit 1021 generates inference results for the evaluation data set. Specifically, the evaluation data set inference unit 1021 reads the evaluation data set DSE stored in the storage unit 150a and inputs it to the pseudo-label generation object detection model obtained from the learning unit 101. The evaluation data set inference unit 1021 then obtains the inference results output by the pseudo-label generation object detection model and outputs them to the evaluation value calculation unit 1022.
(Step S1022)
In step S1022, the evaluation value calculation unit 1022 calculates evaluation values based on the inference results. Specifically, the evaluation value calculation unit 1022 calculates precision and recall by comparing those inference results of the evaluation data set inference unit 1021 that are selected based on a reference value with the correct labels of each of the one or more images included in the evaluation data set DSE, and calculates an F-value as the evaluation value from the precision and recall. The evaluation value calculation unit 1022 repeats the F-value calculation while changing the reference value, thereby calculating a plurality of F-values, one for each reference value. The evaluation value calculation unit 1022 then associates each calculated F-value with the corresponding reference value and outputs them to the threshold determination unit 1023.
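The sweep over reference values in step S1022 can be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: each inference result on the evaluation data set is assumed to have already been compared with the correct labels and reduced to a pair `(confidence, matches_correct_label)`, and the F-value is taken to be the usual harmonic mean of precision and recall. The function names and the sample numbers are hypothetical.

```python
def f_value(scored, num_correct_labels, reference):
    """F-value of the inference results whose confidence is >= reference.

    scored: list of (confidence, matches_correct_label) pairs (assumed input form).
    num_correct_labels: total number of correct labels in the evaluation data set.
    """
    kept = [ok for conf, ok in scored if conf >= reference]
    true_positives = sum(kept)
    if true_positives == 0:
        return 0.0  # precision or recall is zero, so the F-value is zero
    precision = true_positives / len(kept)
    recall = true_positives / num_correct_labels
    return 2 * precision * recall / (precision + recall)


def f_values_by_reference(scored, num_correct_labels, references):
    """Repeat the F-value calculation for each reference value."""
    return {ref: f_value(scored, num_correct_labels, ref) for ref in references}


# Hypothetical evaluation results: five detections, four correct labels in total.
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.3, False)]
table = f_values_by_reference(scored, 4, [0.25, 0.5, 0.75])
```

In this sample, the low reference value keeps more false detections (lower precision) and the high one discards correct detections (lower recall), so the middle value yields the largest F-value.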
(Step S1023)
In step S1023, the threshold determination unit 1023 determines the threshold based on the evaluation values. Specifically, the threshold determination unit 1023 identifies the maximum of the obtained F-values and sets the reference value associated with that F-value as the threshold. The threshold determination unit 1023 outputs the determined threshold to the pseudo-label generation unit 1041.
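As a minimal sketch of step S1023, the threshold can be taken as the reference value whose associated F-value is largest. The function name `decide_threshold` and the sample values are hypothetical. The specification does not say how ties are handled; this sketch simply keeps the first maximum encountered.

```python
def decide_threshold(f_by_reference):
    """Return the reference value associated with the maximum F-value.

    f_by_reference: mapping from reference value to F-value.
    Assumption: on a tie, the first reference value in the mapping wins.
    """
    return max(f_by_reference, key=f_by_reference.get)


threshold = decide_threshold({0.25: 0.667, 0.5: 0.75, 0.75: 0.667})
```

When thresholds are set per category, the same selection would be run once per category on that category's F-values.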
Note that steps S1021 to S1023 correspond to step S102 described in exemplary embodiment 1.
(Step S103)
In step S103, the inference unit 103 performs inference. Specifically, the inference unit 103 reads data set 2 (DS2) stored in the storage unit 150a, inputs each of the one or more images included in data set 2 (DS2) to the pseudo-label generation object detection model obtained from the learning unit 101, and obtains one or more inference results for each image. The inference unit 103 outputs the obtained inference results to the pseudo-label generation unit 1041.
(Step S1041)
In step S1041, the pseudo-label generation unit 1041 generates pseudo-labels. Specifically, from among the one or more inference results produced by the inference unit 103, the pseudo-label generation unit 1041 sets, as pseudo-labels, those inference results whose reliability is equal to or higher than the threshold determined by the threshold determination unit 1023. The pseudo-label generation unit 1041 outputs the inference results set as pseudo-labels to the association unit 1042.
(Step S1042)
In step S1042, the association unit 1042 associates images with pseudo-labels. Specifically, the association unit 1042 associates the corresponding pseudo-label with each of the one or more images included in data set 2 (DS2) to generate data set 2' (DS2'). The association unit 1042 stores the generated data set 2' (DS2') in the storage unit 150a and notifies the relearning unit 105.
Note that steps S1041 to S1042 correspond to step S104 described in exemplary embodiment 1.
Although not shown in FIG. 9, the relearning unit 105 trains the target image detection model using the pseudo-labeled data set. As one example, the relearning unit 105 retrains, as this training, the detection model trained by the learning unit 101. Specifically, the relearning unit 105 reads data set 2' (DS2') from the storage unit 150a and uses it to train the object detection model DM. The relearning unit 105 then stores the trained object detection model DM in the storage unit 150a. As another example, the relearning unit 105 may train a new detection model as the training of the target image detection model and store the new detection model in the storage unit 150a.
As described above, the information processing method S10a according to this exemplary embodiment provides the same effects as the information processing device 10a. That is, the information processing method S10a adopts a configuration in which the target image detection model is trained using the pseudo-labeled data set. The information processing method S10a can therefore generate the target image detection model used by an information processing device at a reduced cost for threshold adjustment. Consequently, the information processing method S10a can generate a highly accurate detection model while keeping the generation cost down.
<Configuration of information processing device 20a>
The configuration of the information processing device 20a according to this exemplary embodiment will be described with reference to FIG. 10. FIG. 10 is a block diagram showing the configuration of the information processing device 20a.
As shown in FIG. 10, the information processing device 20a includes a control unit 200a, a storage unit 250a, and an output unit 260a. The control unit 200a performs overall control of each unit of the information processing device 20a. The storage unit 250a stores various programs and data used by the information processing device 20a. The output unit 260a outputs the results of information processing by the information processing device 20a.
The storage unit 250a stores a target data set TDS and the object detection model DM. The target data set TDS is a data set including one or more target images in which objects are to be detected. The object detection model DM is the target image detection model, specifically the object detection model DM generated by the relearning unit 105 of the information processing device 10a.
That is, the object detection model DM has been trained by:
- a learning process of training a detection model using a first data set;
- a threshold determination process of determining a threshold with reference to the result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set to the detection model, with one or more correct labels attached to each of the one or more images;
- an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images to the detection model;
- a data set generation process of generating a pseudo-labeled data set by setting, as pseudo-labels, those inference results of the inference process whose reliability is equal to or higher than the threshold and associating the pseudo-labels with the corresponding images; and
- a pseudo-label reference learning process of retraining the target image detection model with reference to the pseudo-labeled data set.
In other words, the object detection model DM is manufactured by a method including steps of performing each of the above processes.
(Configuration of control unit 200a)
As shown in FIG. 10, the control unit 200a includes an acquisition unit 201 and a detection unit 202.
The acquisition unit 201 acquires target images. Specifically, the acquisition unit 201 reads the target data set TDS from the storage unit 250a and outputs it to the detection unit 202.
The detection unit 202 detects objects included in a target image using the target image detection model. Specifically, the detection unit 202 inputs the target images included in the target data set TDS obtained from the acquisition unit 201 to the object detection model DM and obtains the inference results output from the object detection model DM. The detection unit 202 outputs the obtained inference results to the output unit 260a. The output unit 260a thereby outputs, for each target image, at least one of:
- the presence or absence of an object included in the target image;
- the position of an object included in the target image;
- the size of an object included in the target image; and
- the category of an object included in the target image.
Typically, the output unit 260a causes a display device to display the target image in which a category and a bounding box are attached to at least some of the objects. The display device may be the output unit 260a itself or a display device (not shown) communicably connected to the information processing device 20a.
As described above, the information processing device 20a according to this exemplary embodiment adopts a configuration in which objects are detected using a target image detection model trained on a data set including images associated with pseudo-labels that were determined using an automatically determined threshold. The information processing device 20a can therefore detect objects included in an image using a target image detection model generated at a reduced cost for threshold adjustment.
The information processing device 20a according to this exemplary embodiment also adopts a configuration in which the inference results of the target image detection model for the target images are output. This allows a user of the information processing device 20a to see the inference results.
[Exemplary embodiment 3]
A third exemplary embodiment of the present invention will be described in detail with reference to the drawings. Components having the same functions as components described in exemplary embodiment 1 or 2 are given the same reference numerals, and their description is not repeated.
<Configuration of information processing device 10b>
The configuration of the information processing device 10b will be described with reference to FIG. 11. FIG. 11 is a block diagram showing the configuration of the information processing device 10b. As shown in FIG. 11, the information processing device 10b includes a control unit 100b and a storage unit 150b. The control unit 100b performs overall control of each unit of the information processing device 10b. The storage unit 150b stores various programs and data used by the information processing device 10b.
The storage unit 150b differs from the storage unit 150a described in exemplary embodiment 2 in the data included in data set 2 (DS2). Details of this data will be described with reference to FIG. 12.
FIG. 12 is a diagram showing a specific example of data included in data set 1 (DS1) and data set 2 (DS2). Specifically, FIG. 12 shows one of the images included in data set 1 (DS1) and one of the images included in data set 2 (DS2).
In this exemplary embodiment, the image included in data set 1 (DS1) contains five objects, specifically three dogs and two cows. The image included in data set 2 (DS2) also contains five objects, specifically two dogs and three cows. Data set 1 (DS1) and data set 2 (DS2) according to this exemplary embodiment are data sets that differ in the categories for which correct labels are given (their scope of responsibility); such data sets are also called expert data sets.
In this exemplary embodiment, unlike exemplary embodiment 2, one or more correct labels are attached to at least some of the one or more images included in data set 2 (DS2). In the image included in data set 1 (DS1), a correct label is associated with each of the three dogs. That is, data set 1 (DS1) is an expert data set whose scope of responsibility is "dog". In the image included in data set 2 (DS2), a correct label is associated with each of the three cows, including object Ob1. That is, data set 2 (DS2) is an expert data set whose scope of responsibility is "cow". Although not shown, the evaluation data set DSE according to this exemplary embodiment is, like data set 1 (DS1), a data set including images in which correct labels are associated with dogs.
(Configuration of control unit 100b)
As shown in FIG. 11, the control unit 100b differs from the control unit 100a described in exemplary embodiment 2 in that it includes an association unit 1042b instead of the association unit 1042. The pseudo-label generation unit 1041 and the association unit 1042b correspond to the data set generation unit 104 in exemplary embodiment 1 and implement the data set generation means in this exemplary embodiment.
The association unit 1042b has the following function in addition to the functions of the association unit 1042. When a correct label has been given to an object included in the image associated with a pseudo-label, and the degree of overlap between the area indicated by the area information included in the pseudo-label and the area indicated by the area information included in the correct label is equal to or greater than a predetermined degree, the association unit 1042b deletes the pseudo-label.
FIG. 13 is a diagram showing a specific example of data included in data set 2 (DS2), data set 2' (DS2'), and data set 2". Specifically, FIG. 13 shows one of the images included in each of data set 2 (DS2), data set 2' (DS2'), and data set 2". Data set 2 (DS2) according to this exemplary embodiment has already been described, so its description is not repeated here.
Data set 2' (DS2') is, as described in exemplary embodiment 2, a data set in which a pseudo-label is associated with each of the one or more images included in data set 2 (DS2). Because these pseudo-labels are derived from data set 1 (DS1) and the evaluation data set DSE, in the example of FIG. 13 labels whose category is "dog" are associated with some of the objects. In the example of FIG. 13, a pseudo-label with the category "dog", that is, an incorrect pseudo-label, is associated with object Ob1. Although not shown in FIG. 13, a correct label is associated with object Ob1 in the image of data set 2 (DS2) from which data set 2' (DS2') is generated, so in the image included in data set 2' (DS2') object Ob1 carries that correct label in addition to the pseudo-label.
For each image included in data set 2' (DS2'), the association unit 1042b identifies the corresponding image among the images included in data set 2 (DS2).
 続いて、関連付け部1042bは、データセット2’(DS2’)に含まれる画像の1つを選択し、当該画像に関連付けられた擬似ラベルのバウンディングボックスの各々について、特定した画像に含まれる正解ラベルのバウンディングボックスとのIOUを算出する。当該IOUが、上述した重なりの度合いに相当する。関連付け部1042bは、データセット2’(DS2’)に含まれる画像のすべてについてこの処理を実行する。 Subsequently, the associating unit 1042b selects one of the images included in the data set 2′ (DS2′), and for each of the bounding boxes of the pseudo labels associated with the image, the correct label included in the identified image. Calculate the IOU with the bounding box of . The IOU corresponds to the degree of overlap described above. The associating unit 1042b performs this process for all images included in dataset 2' (DS2').
When there is a correct label whose IOU with a pseudo-label is equal to or greater than a predetermined value, the associating unit 1042b deletes that pseudo-label. In the example of FIG. 13, the IOU between the pseudo-label associated with object Ob1 and the correct label associated with object Ob1 is equal to or greater than the predetermined value. The associating unit 1042b therefore deletes the pseudo-label associated with object Ob1. The image included in data set 2" shown in FIG. 13 is the image after this pseudo-label has been deleted. As shown in FIG. 13, in this image the pseudo-label that was associated with object Ob1 has been deleted, and only the correct label is associated with object Ob1.
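The IOU-based deletion described above can be illustrated with a minimal sketch. The names `Label`, `iou`, and `drop_overlapping_pseudo_labels`, the corner-format bounding boxes, and the value 0.5 for the predetermined value are all assumptions made for illustration. The specification does not fix any of them.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Label:
    """Hypothetical label record: a category plus a bounding box."""
    category: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def drop_overlapping_pseudo_labels(
    pseudo: List[Label], correct: List[Label], iou_min: float = 0.5
) -> List[Label]:
    """Keep only pseudo-labels whose IOU with every correct label is below iou_min."""
    return [p for p in pseudo if all(iou(p.box, c.box) < iou_min for c in correct)]
```

In this sketch a pseudo-label is removed regardless of its category, which matches the description: the overlap alone decides, so a "dog" pseudo-label lying on an object that already carries a correct label is discarded.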
As described above, the information processing device 10b according to this exemplary embodiment adopts a configuration in which, for a pseudo-label and a correct label attached to an image, the pseudo-label is deleted when the degree of overlap between the region indicated by the region information included in the pseudo-label and the region indicated by the region information included in the correct label is equal to or greater than a predetermined degree. Therefore, according to the information processing device 10b, an inappropriate pseudo-label is deleted and the correct label remains, which provides the effect of improving the accuracy of object detection using the target image detection model. A pseudo-label is inappropriate when, for example, (1) the category of the pseudo-label differs from the category of the object, or (2) the bounding box of the pseudo-label does not enclose part of the object.
In particular, in the case of expert data sets in which correct labels are attached to objects that look alike, such as the dogs and cows shown in this exemplary embodiment, there is a high possibility that a pseudo-label with an incorrect category is associated with an object. In contrast, the information processing device 10b according to this exemplary embodiment can delete such an incorrect pseudo-label, so that pseudo-labels can be generated accurately, which provides the effect of improving the accuracy of object detection using the target image detection model.
[Exemplary embodiment 4]
A fourth exemplary embodiment of the present invention will now be described in detail with reference to the drawings. Components having the same functions as those described in exemplary embodiments 1 to 3 are denoted by the same reference numerals, and their description is not repeated.
<Overview of information processing device 10c>
The information processing device 10c according to this exemplary embodiment determines a threshold based on each of the expert data sets, and assigns pseudo-labels to each of the plurality of data sets based on those thresholds.
<Configuration of information processing device 10c>
The configuration of the information processing device 10c will be described with reference to FIG. 14. FIG. 14 is a block diagram showing the configuration of the information processing device 10c. As shown in FIG. 14, the information processing device 10c includes a first control unit 100c, a first storage unit 150c, a second control unit 110c, and a second storage unit 160c. The first control unit 100c and the second control unit 110c centrally control each unit of the information processing device 10c. The first storage unit 150c and the second storage unit 160c store various programs and data used by the information processing device 10c.
Note that the first control unit 100c and the second control unit 110c may be integrated. Likewise, the first storage unit 150c and the second storage unit 160c may be integrated. Alternatively, the second control unit 110c and the second storage unit 160c may be provided in a separate device communicably connected to the information processing device 10c.
The first storage unit 150c stores data set 1 (DS1), data set 2 (DS2), evaluation data set 1 (DSE1), and evaluation data set 2 (DSE2). Data set 1 (DS1) is the first data set in this exemplary embodiment. Data set 2 (DS2) is the second data set in this exemplary embodiment. Data set 1 (DS1) and data set 2 (DS2) are the expert data sets described above. Evaluation data set 1 (DSE1) is the first evaluation data set in this exemplary embodiment. Evaluation data set 2 (DSE2) is the second evaluation data set in this exemplary embodiment.
Data set 1 (DS1) and data set 2 (DS2) are now described in detail. FIG. 15 is a diagram showing a specific example of the data included in data set 1 (DS1) and data set 2 (DS2). Specifically, FIG. 15 shows one of the images included in data set 1 (DS1) and one of the images included in data set 2 (DS2).
Each of these images contains five objects, specifically three persons and two bags. In the image included in data set 1 (DS1), a correct label is associated with each of the two bags. That is, data set 1 (DS1) is an expert data set whose scope of responsibility is "bag". In the image included in data set 2 (DS2), a correct label is associated with each of the three persons. That is, data set 2 (DS2) is an expert data set whose scope of responsibility is "person".
Evaluation data set 1 (DSE1) is, like data set 1 (DS1), a data set in which a correct label is associated with each object within the scope of responsibility included in each image. In the example of FIG. 15, evaluation data set 1 (DSE1) is a data set containing images in which correct labels are associated with bags. For example, the images included in evaluation data set 1 (DSE1) may be a subset of the images included in data set 1 (DS1). Alternatively, the images included in evaluation data set 1 (DSE1) may be images that are not included in data set 1 (DS1) and in which correct labels are associated with objects within the scope of responsibility of data set 1 (DS1).
Evaluation data set 2 (DSE2) is, like data set 2 (DS2), a data set in which a correct label is associated with each object within the scope of responsibility included in each image. In the example of FIG. 15, evaluation data set 2 (DSE2) is a data set containing images in which correct labels are associated with persons. For example, the images included in evaluation data set 2 (DSE2) may be a subset of the images included in data set 2 (DS2). Alternatively, the images included in evaluation data set 2 (DSE2) may be images that are not included in data set 2 (DS2) and in which correct labels are associated with objects within the scope of responsibility of data set 2 (DS2).
(Configuration of first control unit 100c)
As shown in FIG. 14, the first control unit 100c includes a first learning unit 101-1, a second learning unit 101-2, a first threshold determination unit 102-1, a second threshold determination unit 102-2, a first inference unit 103-1, a second inference unit 103-2, a first data set generation unit 104-1, and a second data set generation unit 104-2.
In this exemplary embodiment, the first learning unit 101-1 implements the first learning means, and the second learning unit 101-2 implements the second learning means. The first threshold determination unit 102-1 implements the first threshold determination means, and the second threshold determination unit 102-2 implements the second threshold determination means. The first inference unit 103-1 implements the first inference means, and the second inference unit 103-2 implements the second inference means. The first data set generation unit 104-1 implements the first data set generation means, and the second data set generation unit 104-2 implements the second data set generation means.
The first learning unit 101-1 performs learning of the first detection model using the first data set. Specifically, the first learning unit 101-1 acquires data set 1 (DS1) and uses it to perform learning of the first pseudo-label generation object detection model PDM1. More specifically, the first learning unit 101-1 reads data set 1 (DS1) stored in the first storage unit 150c and uses it to perform learning of the first pseudo-label generation object detection model PDM1. The first learning unit 101-1 then outputs the learned first pseudo-label generation object detection model PDM1 to the first threshold determination unit 102-1 and the first inference unit 103-1.
The first threshold determination unit 102-1 determines the first threshold by referring to the result of comparing one or more inference results, obtained by inputting each of the one or more images included in the first evaluation data set into the first detection model, with the one or more correct labels attached to each of those images.
Specifically, the first threshold determination unit 102-1 reads evaluation data set 1 (DSE1) stored in the first storage unit 150c and inputs it into the first pseudo-label generation object detection model PDM1 acquired from the first learning unit 101-1. The first threshold determination unit 102-1 then acquires the inference results output by the first pseudo-label generation object detection model PDM1.
Subsequently, the first threshold determination unit 102-1 calculates an evaluation value of each inference result based on the result of comparing each inference result for each of the one or more images included in evaluation data set 1 (DSE1) with the correct labels of that image. The evaluation value is, for example, an F value. The details of the F value calculation in the case where the evaluation value is an F value have been described in exemplary embodiment 2 and are not repeated here.
Subsequently, the first threshold determination unit 102-1 identifies the maximum of the plurality of F values calculated for the respective reference values, and sets the reference value associated with the identified F value as the threshold. This threshold is the first threshold described above. The first threshold determination unit 102-1 outputs the determined first threshold to the first data set generation unit 104-1.
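As a rough illustration of selecting the reference value with the maximum F value, the following sketch may help. It rests on several assumptions that are not taken from this section: each inference result has already been compared with the correct labels and reduced to a pair `(confidence, is_match)`, each correct label is matched by at most one inference result, and the F value is the usual harmonic mean of precision and recall. The names `f_value` and `choose_threshold` are hypothetical. The actual calculation is the one described in exemplary embodiment 2.

```python
from typing import Iterable, List, Optional, Tuple


def f_value(tp: int, fp: int, fn: int) -> float:
    """F value as the harmonic mean of precision and recall (assumed definition)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def choose_threshold(
    detections: List[Tuple[float, bool]],  # (confidence, matches a correct label)
    num_correct: int,                      # number of correct labels in the evaluation data set
    candidates: Iterable[float],           # candidate reference values
) -> Tuple[Optional[float], float]:
    """Return the reference value with the largest F value, and that F value."""
    best_t, best_f = None, -1.0
    for t in candidates:
        kept = [match for conf, match in detections if conf >= t]
        tp = sum(kept)           # kept results that match a correct label
        fp = len(kept) - tp      # kept results that match nothing
        fn = num_correct - tp    # correct labels left undetected
        f = f_value(tp, fp, fn)
        if f > best_f:           # on ties, the earlier candidate is kept
            best_t, best_f = t, f
    return best_t, best_f
```

A low reference value keeps more false detections and a high one discards true detections, so the F value typically peaks somewhere in between. That peak is what gets used as the threshold.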
The first inference unit 103-1 acquires one or more inference results for each of the one or more images included in the second data set by inputting each of those images into the first detection model. Specifically, the first inference unit 103-1 reads data set 2 (DS2) stored in the first storage unit 150c, inputs each of the one or more images included in data set 2 (DS2) into the first pseudo-label generation object detection model PDM1 acquired from the first learning unit 101-1, and acquires one or more inference results PR1 for each of the images. The first inference unit 103-1 outputs the acquired inference results PR1 to the first data set generation unit 104-1.
The first data set generation unit 104-1 generates the pseudo-labeled second data set by setting, as pseudo-labels, those inference results among the one or more inference results from the first inference unit 103-1 that have a confidence equal to or greater than the first threshold, and associating the pseudo-labels with the corresponding images. Specifically, the first data set generation unit 104-1 sets, as pseudo-labels, those inference results among the one or more inference results PR1 from the first inference unit 103-1 that have a confidence equal to or greater than the first threshold. The first data set generation unit 104-1 then associates each pseudo-label with the corresponding image. This generates data set 2' (DS2'), in which pseudo-labels are associated with each of the one or more images included in data set 2 (DS2). The first data set generation unit 104-1 stores the generated data set 2' (DS2') in the second storage unit 160c.
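The confidence filtering performed by the data set generation unit can be sketched as follows. The dictionary-based data layout, the `pseudo` flag, and the function name `make_pseudo_labeled_dataset` are assumptions made for this example only. The specification does not prescribe a storage format.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]
Inference = Tuple[str, Box, float]  # (category, bounding box, confidence)


def make_pseudo_labeled_dataset(
    dataset: Dict[str, List[dict]],               # image id -> existing correct labels
    inference_results: Dict[str, List[Inference]],  # image id -> inference results
    threshold: float,
) -> Dict[str, List[dict]]:
    """Attach inference results with confidence >= threshold as pseudo-labels."""
    labeled = {}
    for image_id, correct_labels in dataset.items():
        pseudo = [
            {"category": category, "box": box, "pseudo": True}
            for category, box, confidence in inference_results.get(image_id, [])
            if confidence >= threshold
        ]
        # Correct labels are kept as they are; pseudo-labels are added next to them.
        labeled[image_id] = list(correct_labels) + pseudo
    return labeled
```

The input data set is left unmodified, so the original data set 2 (DS2) and the generated data set 2' (DS2') can coexist, as they do in the first and second storage units.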
The second learning unit 101-2 performs learning of the second detection model using the second data set. Specifically, the second learning unit 101-2 acquires data set 2 (DS2) and uses it to perform learning of the second pseudo-label generation object detection model PDM2. More specifically, the second learning unit 101-2 reads data set 2 (DS2) stored in the first storage unit 150c and uses it to perform learning of the second pseudo-label generation object detection model PDM2. The second learning unit 101-2 then outputs the learned second pseudo-label generation object detection model PDM2 to the second threshold determination unit 102-2 and the second inference unit 103-2.
The second threshold determination unit 102-2 determines the second threshold by referring to the result of comparing one or more inference results, obtained by inputting each of the one or more images included in the second evaluation data set into the second detection model, with the one or more correct labels attached to each of those images.
Specifically, the second threshold determination unit 102-2 reads evaluation data set 2 (DSE2) stored in the first storage unit 150c and inputs it into the second pseudo-label generation object detection model PDM2 acquired from the second learning unit 101-2. The second threshold determination unit 102-2 then acquires the inference results output by the second pseudo-label generation object detection model PDM2.
Subsequently, the second threshold determination unit 102-2 calculates an evaluation value of each inference result based on the result of comparing each inference result for each of the one or more images included in evaluation data set 2 (DSE2) with the correct labels of that image. The evaluation value is, for example, an F value. The details of the F value calculation in the case where the evaluation value is an F value have been described in exemplary embodiment 2 and are not repeated here.
Subsequently, the second threshold determination unit 102-2 identifies the maximum of the plurality of F values calculated for the respective reference values, and sets the reference value associated with the identified F value as the threshold. This threshold is the second threshold described above. The second threshold determination unit 102-2 outputs the determined second threshold to the second data set generation unit 104-2.
The second inference unit 103-2 acquires one or more inference results for each of the one or more images included in the first data set by inputting each of those images into the second detection model. Specifically, the second inference unit 103-2 reads data set 1 (DS1) stored in the first storage unit 150c, inputs each of the one or more images included in data set 1 (DS1) into the second pseudo-label generation object detection model PDM2 acquired from the second learning unit 101-2, and acquires one or more inference results PR2 for each of the images. The second inference unit 103-2 outputs the acquired inference results PR2 to the second data set generation unit 104-2.
The second data set generation unit 104-2 generates the pseudo-labeled first data set by setting, as pseudo-labels, those inference results among the one or more inference results from the second inference unit 103-2 that have a confidence equal to or greater than the second threshold, and associating the pseudo-labels with the corresponding images. Specifically, the second data set generation unit 104-2 sets, as pseudo-labels, those inference results among the one or more inference results PR2 from the second inference unit 103-2 that have a confidence equal to or greater than the second threshold. The second data set generation unit 104-2 then associates each pseudo-label with the corresponding image. This generates data set 1' (DS1'), in which pseudo-labels are associated with each of the one or more images included in data set 1 (DS1). The second data set generation unit 104-2 stores the generated data set 1' (DS1') in the second storage unit 160c.
Data set 1' (DS1') and data set 2' (DS2') are now described in detail. FIG. 16 is a diagram showing a specific example of the data included in data set 1' (DS1') and data set 2' (DS2'). Specifically, FIG. 16 shows one of the images included in data set 1' (DS1') and one of the images included in data set 2' (DS2').
The image included in data set 1' (DS1') shown in FIG. 16 is identical to the image included in data set 1 (DS1) (see FIG. 15). Among the objects in the image included in data set 1' (DS1'), a correct label is associated with each of the two bags and a pseudo-label is associated with each of the three persons. The correct labels are those that were associated with the bags, which are the scope of responsibility of data set 1 (DS1), in the image included in data set 1 (DS1), from which data set 1' (DS1') is generated. The pseudo-labels are those set by the second data set generation unit 104-2 based on the inference results PR2. Because the inference results PR2 are obtained using the second pseudo-label generation object detection model PDM2, which was learned with data set 2 (DS2) whose scope of responsibility is persons, the pseudo-labels are associated with the persons.
The image included in data set 2' (DS2') shown in FIG. 16 is identical to the image included in data set 2 (DS2) (see FIG. 15). Among the objects in the image included in data set 2' (DS2'), a correct label is associated with each of the three persons and a pseudo-label is associated with each of the two bags. The correct labels are those that were associated with the persons, which are the scope of responsibility of data set 2 (DS2), in the image included in data set 2 (DS2), from which data set 2' (DS2') is generated. The pseudo-labels are those set by the first data set generation unit 104-1 based on the inference results PR1. Because the inference results PR1 are obtained using the first pseudo-label generation object detection model PDM1, which was learned with data set 1 (DS1) whose scope of responsibility is bags, the pseudo-labels are associated with the bags.
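The cross-labeling flow between the two expert data sets (the model learned on DS1 labels DS2, and the model learned on DS2 labels DS1) can be summarized in a short sketch. The callables `train`, `decide_threshold`, and `infer`, the dictionary layout, and the `confidence` key are placeholders assumed for illustration. They stand in for the learning units, threshold determination units, and inference units and do not correspond to any particular library.

```python
from typing import Callable, Dict, List, Tuple

Dataset = Dict[str, List[dict]]  # image id -> labels (assumed layout)


def cross_pseudo_label(
    ds1: Dataset,
    ds2: Dataset,
    dse1: Dataset,
    dse2: Dataset,
    train: Callable,             # data set -> detection model
    decide_threshold: Callable,  # (model, evaluation data set) -> threshold
    infer: Callable,             # (model, image id) -> list of dicts with "confidence"
) -> Tuple[Dataset, Dataset]:
    """Return (DS1', DS2') where each data set is labeled by the other's model."""
    pdm1, pdm2 = train(ds1), train(ds2)
    th1 = decide_threshold(pdm1, dse1)  # first threshold, from DSE1
    th2 = decide_threshold(pdm2, dse2)  # second threshold, from DSE2

    # PDM1 (learned on DS1) supplies pseudo-labels for the images of DS2.
    ds2_prime = {
        img: labels + [r for r in infer(pdm1, img) if r["confidence"] >= th1]
        for img, labels in ds2.items()
    }
    # PDM2 (learned on DS2) supplies pseudo-labels for the images of DS1.
    ds1_prime = {
        img: labels + [r for r in infer(pdm2, img) if r["confidence"] >= th2]
        for img, labels in ds1.items()
    }
    return ds1_prime, ds2_prime
```

Each threshold is derived from the evaluation data set that shares the scope of responsibility of the model being evaluated, so the bag model and the person model can end up with different thresholds.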
The second storage unit 160c stores data set 1' (DS1'), data set 2' (DS2'), and the object detection model DM. Data set 1' (DS1') and data set 2' (DS2') are the data sets generated by the second data set generation unit 104-2 and the first data set generation unit 104-1, respectively. The object detection model DM is the target image detection model and is described in detail later.
(Configuration of second control unit 110c)
As shown in FIG. 14, the second control unit 110c includes a re-learning unit 105. The re-learning unit 105 implements the pseudo-label reference learning means in this exemplary embodiment. The re-learning unit 105 performs learning of the target image detection model using the pseudo-labeled data sets. Specifically, as this learning, the re-learning unit 105 re-learns the first pseudo-label generation object detection model PDM1 or the second pseudo-label generation object detection model PDM2. More specifically, the re-learning unit 105 reads data set 1' (DS1') and data set 2' (DS2') from the second storage unit 160c and uses them to re-learn the first pseudo-label generation object detection model PDM1 or the second pseudo-label generation object detection model PDM2. The re-learning unit 105 then stores the object detection model DM generated by the re-learning in the second storage unit 160c. Note that the re-learning unit 105 may instead perform learning of a new object detection model DM using data set 1' (DS1') and data set 2' (DS2'). The new object detection model DM is a target image detection model that differs from both the first pseudo-label generation object detection model PDM1 and the second pseudo-label generation object detection model PDM2.
As described above, the information processing device 10c according to this exemplary embodiment adopts a configuration in which a threshold is determined based on each of a plurality of expert data sets and pseudo-labels are assigned to each of the plurality of data sets based on those thresholds. Therefore, according to the information processing device 10c, the detection model can be re-learned using a plurality of data sets each assigned pseudo-labels, specifically data set 1' (DS1') and data set 2' (DS2'), which provides the effect of further improving the accuracy of detecting objects included in images using the re-learned detection model. In addition, even when a plurality of data sets each assigned pseudo-labels are generated, that is, even when a plurality of thresholds for determining pseudo-labels are required, the information processing device 10c can determine those thresholds automatically, which provides the effect of reducing the cost of threshold adjustment. Furthermore, the information processing device 10c provides the effect that a single highly accurate target image detection model can be learned from a plurality of data sets with different scopes of responsibility.
Although this exemplary embodiment describes an example in which the number of expert data sets is two, the number of expert data sets is not limited to this example. The number of data sets and evaluation data sets stored in the information processing device 10c, and the number of members implementing the learning means, threshold determination means, inference means, and data set generation means in the information processing device 10c, depend on the number of expert data sets. For example, when the number of expert data sets is three, the information processing device 10c further stores a third data set and a third evaluation data set, and further includes a third learning unit, a third threshold determination unit, a third inference unit, and a third data set generation unit.
In addition, although this exemplary embodiment describes the expert data sets as having mutually different scopes of responsibility, the scopes of responsibility may overlap between expert data sets.
In addition, the first data set generation unit 104-1 and the second data set generation unit 104-2 according to this exemplary embodiment may have the function of the associating unit 1042b described in exemplary embodiment 3. That is, for data set 2' (DS2'), when a correct label has been assigned to an object included in an image associated with a pseudo-label, and the degree of overlap between the region indicated by the region information included in the pseudo-label and the region indicated by the region information included in the correct label is equal to or greater than a predetermined degree, the first data set generation unit 104-1 may delete the pseudo-label. Likewise, for data set 1' (DS1'), when a correct label has been assigned to an object included in an image associated with a pseudo-label, and the degree of overlap between the region indicated by the region information included in the pseudo-label and the region indicated by the region information included in the correct label is equal to or greater than a predetermined degree, the second data set generation unit 104-2 may delete the pseudo-label.
[Exemplary embodiment 5]
A fifth exemplary embodiment of the present invention will now be described in detail with reference to the drawings. Components having the same functions as components described in exemplary embodiments 1 to 4 are denoted by the same reference numerals, and their description is not repeated.
<Configuration of information processing device 10d>
The configuration of an information processing device 10d according to this exemplary embodiment will be described with reference to FIG. 17. FIG. 17 is a block diagram showing the configuration of the information processing device 10d. As shown in FIG. 17, the information processing device 10d includes a control unit 100d and a storage unit 150d. The control unit 100d centrally controls each unit of the information processing device 10d. The storage unit 150d stores various programs and data used by the information processing device 10d.
The control unit 100d includes a non-learning region determination unit 106 in addition to the learning unit 101, the threshold determination unit 102, the inference unit 103, the data set generation unit 104, and the relearning unit 105 according to exemplary embodiment 2 described above. The non-learning region determination unit 106 is the component that implements the non-learning region determination means in this exemplary embodiment.
As in exemplary embodiment 2 described above, the threshold determination unit 102 determines a first threshold with reference to the result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set DSE into the detection model, with the one or more correct labels attached to each of those images. The method by which the threshold determination unit 102 determines the first threshold has already been described in exemplary embodiment 2, so the description is not repeated here.
In this exemplary embodiment, the threshold determination unit 102 further determines a second threshold, smaller than the first threshold, with reference to the result of comparing the one or more inference results obtained by inputting each of the one or more images included in the evaluation data set DSE into the detection model with the one or more correct labels attached to each of those images.
The second threshold is smaller than the first threshold. As an example, the first threshold may be a value that emphasizes precision, and the second threshold may be a value that emphasizes recall. For example, the first threshold may be the confidence at which the F0.5-score, an F-value that emphasizes precision, takes its maximum, and the second threshold may be the confidence at which the F2-score, an F-value that emphasizes recall, takes its maximum.
In the pseudo-labeled data set 2′ (DS2′) generated by the data set generation unit 104, the non-learning region determination unit 106 determines, as non-learning regions that are excluded from learning by the relearning unit 105, the regions corresponding to those inference results from the inference unit 103 whose confidence is less than the first threshold and greater than or equal to the second threshold.
<Flow of information processing method>
The flow of the information processing method S10d executed by the information processing device 10d configured as described above will be described with reference to FIG. 18. FIG. 18 is a flowchart showing the flow of the information processing method S10d. The information processing method S10d includes steps S101 to S1022, S1023d, S103 to S1041, S1041d, and S1042. Of these, steps S101 to S1022, S103 to S1041, and S1042 have already been described in exemplary embodiment 2 above, so their description is not repeated here.
(Step S1023d)
In step S1023d, the threshold determination unit 1023 determines the first threshold and the second threshold based on the evaluation values. Specifically, the threshold determination unit 1023 identifies the maximum among the acquired F-values that emphasize, for example, precision, and sets the reference value associated with the identified F-value as the first threshold. Likewise, the threshold determination unit 1023 identifies the maximum among the acquired F-values that emphasize, for example, recall, and sets the reference value associated with the identified F-value as the second threshold. The threshold determination unit 1023 outputs the determined first and second thresholds to the pseudo-label generation unit 1041.
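The selection in step S1023d can be sketched as follows. This is a minimal illustration, not the implementation of the threshold determination unit 1023: the function names, the candidate reference values, and the precision/recall figures are all hypothetical, and the use of β = 0.5 and β = 2 follows the F0.5-score and F2-score example given above.

```python
from typing import List, Tuple


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta < 1 weights precision more, beta > 1 weights recall more."""
    denom = beta * beta * precision + recall
    if denom == 0.0:
        return 0.0
    return (1.0 + beta * beta) * precision * recall / denom


def pick_threshold(candidates: List[Tuple[float, float, float]], beta: float) -> float:
    """Return the reference value whose F-beta score is the largest.

    candidates holds (reference_value, precision, recall) triples, assumed to
    have been measured on the evaluation data set beforehand.
    """
    best = max(candidates, key=lambda c: f_beta(c[1], c[2], beta))
    return best[0]


# Hypothetical measurements for four candidate reference values.
candidates = [
    (0.3, 0.40, 0.95),
    (0.5, 0.70, 0.85),
    (0.7, 0.90, 0.60),
    (0.9, 0.98, 0.20),
]

first_threshold = pick_threshold(candidates, beta=0.5)   # precision-oriented
second_threshold = pick_threshold(candidates, beta=2.0)  # recall-oriented
```

With these made-up figures the precision-oriented choice lands on a higher reference value than the recall-oriented one. The sketch does not enforce that ordering, so a real implementation would need to check that the second threshold is smaller than the first.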
(Step S1041d)
In step S1041d, in the pseudo-labeled data set 2′ (DS2′) generated by the data set generation unit 104, the non-learning region determination unit 106 determines, as non-learning regions that are excluded from learning by the relearning unit 105, the regions corresponding to those inference results from the inference unit 103 whose confidence is less than the first threshold and greater than or equal to the second threshold.
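As a rough sketch of step S1041d, the band between the two thresholds can be extracted with a simple filter. The `Inference` record, its field names, and the `(x1, y1, x2, y2)` box format are assumptions made for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format


@dataclass
class Inference:
    box: Box            # region information
    category: str       # category information
    confidence: float   # confidence of the inference result


def non_learning_regions(
    inferences: List[Inference],
    first_threshold: float,
    second_threshold: float,
) -> List[Box]:
    """Regions whose confidence is in [second_threshold, first_threshold)."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    return [
        r.box
        for r in inferences
        if second_threshold <= r.confidence < first_threshold
    ]


results = [
    Inference((0, 0, 10, 10), "car", 0.92),    # at or above the first threshold
    Inference((20, 20, 40, 40), "car", 0.61),  # inside the band
    Inference((50, 50, 60, 60), "bus", 0.50),  # exactly the second threshold
    Inference((70, 70, 80, 80), "bus", 0.10),  # below the second threshold
]
regions = non_learning_regions(results, first_threshold=0.7, second_threshold=0.5)
```

How the relearning unit 105 actually skips these regions (for example, by ignoring them when computing the loss) is not specified in this section, so the sketch stops at producing the list of regions.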
As described above, in the information processing device 10d according to this exemplary embodiment, in the pseudo-labeled data set generated by the data set generation unit 104, the regions corresponding to those inference results from the inference unit 103 whose confidence is less than the first threshold and greater than or equal to the second threshold are determined as non-learning regions that are excluded from learning by the relearning unit 105. A pseudo label assigned to such a region would tend to be a low-reliability pseudo label. Setting such regions as non-learning regions allows re-learning to be performed using pseudo labels of relatively high reliability, which can improve the detection accuracy of the target image detection model obtained by the relearning unit 105 (the detection model learned by the learning unit 101).
Further, because the information processing device 10d adopting this configuration can improve the detection accuracy of the target image detection model, it can reduce the cost of generating, using the first and second thresholds, a data set that includes pseudo-labeled images.
[Exemplary embodiment 6]
A sixth exemplary embodiment of the present invention will now be described in detail with reference to the drawings. Components having the same functions as components described in exemplary embodiments 1 to 5 are denoted by the same reference numerals, and their description is not repeated.
<Configuration of information processing device 10e>
The configuration of an information processing device 10e according to this exemplary embodiment will be described with reference to FIG. 19. FIG. 19 is a block diagram showing the configuration of the information processing device 10e. As shown in FIG. 19, the information processing device 10e includes a first control unit 100e, a second control unit 110e, a first storage unit 150e, and a second storage unit 160e. The first control unit 100e and the second control unit 110e centrally control each unit of the information processing device 10e. The first storage unit 150e and the second storage unit 160e store various programs and data used by the information processing device 10e.
The first control unit 100e includes, in addition to the configuration of the first control unit 100c of the information processing device 10c described in exemplary embodiment 4 above, a first non-learning region determination unit 106-1 and a second non-learning region determination unit 106-2. The first non-learning region determination unit 106-1 is the component that implements the first non-learning region determination means in this exemplary embodiment. The second non-learning region determination unit 106-2 is the component that implements the second non-learning region determination means in this exemplary embodiment.
As in exemplary embodiment 4 above, the first threshold determination unit 102-1 determines a first threshold with reference to the result of comparing one or more inference results, obtained by inputting each of the one or more images included in evaluation data set 1 (DSE1) into the detection model, with the one or more correct labels attached to each of those images. The method by which the first threshold determination unit 102-1 determines the first threshold has already been described in exemplary embodiment 4, so the description is not repeated here.
In this exemplary embodiment, the first threshold determination unit 102-1 further determines a third threshold, smaller than the first threshold, with reference to the result of comparing the one or more inference results obtained by inputting each of the one or more images included in evaluation data set 1 (DSE1) into the detection model with the one or more correct labels attached to each of those images.
The third threshold is smaller than the first threshold. As an example, the first threshold may be a value that emphasizes precision, and the third threshold may be a value that emphasizes recall. For example, the first threshold may be the confidence at which the F0.5-score, an F-value that emphasizes precision, takes its maximum, and the third threshold may be the confidence at which the F2-score, an F-value that emphasizes recall, takes its maximum.
As in exemplary embodiment 4 above, the second threshold determination unit 102-2 determines a second threshold with reference to the result of comparing one or more inference results, obtained by inputting each of the one or more images included in evaluation data set 2 (DSE2) into the detection model, with the one or more correct labels attached to each of those images. The method by which the second threshold determination unit 102-2 determines the second threshold has already been described in exemplary embodiment 4, so the description is not repeated here.
In this exemplary embodiment, the second threshold determination unit 102-2 further determines a fourth threshold, smaller than the second threshold, with reference to the result of comparing the one or more inference results obtained by inputting each of the one or more images included in evaluation data set 2 (DSE2) into the detection model with the one or more correct labels attached to each of those images.
The fourth threshold is smaller than the second threshold. As an example, the second threshold may be a value that emphasizes precision, and the fourth threshold may be a value that emphasizes recall. For example, the second threshold may be the confidence at which the F0.5-score, an F-value that emphasizes precision, takes its maximum, and the fourth threshold may be the confidence at which the F2-score, an F-value that emphasizes recall, takes its maximum.
In the pseudo-labeled data set 2′ (DS2′) generated by the first data set generation unit 104-1, the first non-learning region determination unit 106-1 determines, as non-learning regions that are excluded from learning by the relearning unit 105, the regions corresponding to those inference results from the first inference unit 103-1 whose confidence is less than the first threshold and greater than or equal to the third threshold.
In the pseudo-labeled data set 1′ (DS1′) generated by the second data set generation unit 104-2, the second non-learning region determination unit 106-2 determines, as non-learning regions that are excluded from learning by the relearning unit 105, the regions corresponding to those inference results from the second inference unit 103-2 whose confidence is less than the second threshold and greater than or equal to the fourth threshold.
As described above, in the information processing device 10e according to this exemplary embodiment, in the pseudo-labeled second data set 2′ (DS2′) generated by the first data set generation unit 104-1, the regions corresponding to those inference results from the first inference unit 103-1 whose confidence is less than the first threshold and greater than or equal to the third threshold are determined as non-learning regions that are excluded from learning by the relearning unit 105. Likewise, in the pseudo-labeled first data set 1′ (DS1′) generated by the second data set generation unit 104-2, the regions corresponding to those inference results from the second inference unit 103-2 whose confidence is less than the second threshold and greater than or equal to the fourth threshold are determined as non-learning regions that are excluded from learning by the relearning unit 105.
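Because this embodiment uses four thresholds, it may help to see which pair applies to which data set. The following sketch only illustrates that pairing; the names `ThresholdPair`, `PLAN`, and `classify`, the numeric values, and the label given to results below the lower threshold are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class ThresholdPair(NamedTuple):
    upper: float  # pseudo-label threshold (first or second threshold)
    lower: float  # lower bound of the non-learning band (third or fourth threshold)


# Which expert's inference results label which data set, and with which thresholds.
# The numeric values are placeholders, not values from the embodiment.
PLAN: Dict[str, Tuple[str, ThresholdPair]] = {
    "inference_unit_103_1": ("DS2", ThresholdPair(upper=0.7, lower=0.5)),  # first, third
    "inference_unit_103_2": ("DS1", ThresholdPair(upper=0.8, lower=0.4)),  # second, fourth
}


def classify(confidence: float, t: ThresholdPair) -> str:
    """Sort one inference result into pseudo label, non-learning region, or neither."""
    if t.lower >= t.upper:
        raise ValueError("lower threshold must be smaller than upper threshold")
    if confidence >= t.upper:
        return "pseudo_label"
    if confidence >= t.lower:
        return "non_learning_region"
    return "neither"  # handling of these results is not specified in this section


_, pair_for_ds2 = PLAN["inference_unit_103_1"]
_, pair_for_ds1 = PLAN["inference_unit_103_2"]
same_score_ds2 = classify(0.75, pair_for_ds2)
same_score_ds1 = classify(0.75, pair_for_ds1)
```

The same confidence of 0.75 yields a pseudo label on DS2 but a non-learning region on DS1, which is the practical consequence of determining the thresholds separately from DSE1 and DSE2.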
A pseudo label assigned to a region corresponding to an inference result whose confidence is less than the first threshold and greater than or equal to the third threshold would tend to be a low-reliability pseudo label. The same holds for a region corresponding to an inference result whose confidence is less than the second threshold and greater than or equal to the fourth threshold. Setting such regions as non-learning regions allows re-learning to be performed using pseudo labels of relatively high reliability, which can improve the detection accuracy of the target image detection model obtained by the relearning unit 105 (the detection models learned by the first learning unit 101-1 and the second learning unit 101-2).
Further, because the information processing device 10e adopting this configuration can improve the detection accuracy of the target image detection model, it can reduce the cost of generating, using the first, second, third, and fourth thresholds, a data set that includes pseudo-labeled images.
[Example of implementation by software]
Some or all of the functions of the information processing devices 10, 10a to 10e, 20, and 20a may be implemented by hardware such as an integrated circuit (IC chip), or by software.
In the latter case, the information processing devices 10, 10a to 10e, 20, and 20a are implemented, for example, by a computer that executes the instructions of a program, which is software that implements each function. An example of such a computer (hereinafter referred to as computer C) is shown in FIG. 20. Computer C includes at least one processor C1 and at least one memory C2. A program P for operating computer C as the information processing devices 10, 10a to 10e, 20, and 20a is recorded in the memory C2. In computer C, the processor C1 reads the program P from the memory C2 and executes it, thereby implementing the functions of the information processing devices 10, 10a to 10e, 20, and 20a.
As the processor C1, for example, a CPU (Central Processing Unit), GPU (Graphic Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating point number Processing Unit), PPU (Physics Processing Unit), microcontroller, or a combination of these can be used. As the memory C2, for example, a flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination of these can be used.
Computer C may further include a RAM (Random Access Memory) into which the program P is loaded at execution time and in which various data are temporarily stored. Computer C may further include a communication interface for sending and receiving data to and from other devices. Computer C may further include an input/output interface for connecting input/output devices such as a keyboard, mouse, display, and printer.
The program P can be recorded on a non-transitory tangible recording medium M that is readable by computer C. As such a recording medium M, for example, a tape, disk, card, semiconductor memory, or programmable logic circuit can be used. Computer C can acquire the program P via such a recording medium M. The program P can also be transmitted via a transmission medium. As such a transmission medium, for example, a communication network or broadcast waves can be used. Computer C can also acquire the program P via such a transmission medium.
[Additional remarks 1]
The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining the technical means disclosed in the embodiments described above are also included in the technical scope of the present invention.
[Additional remarks 2]
Some or all of the embodiments described above may also be described as follows. However, the present invention is not limited to the aspects described below.
(Supplementary Note 1)
An information processing device comprising:
learning means for learning a detection model using a first data set;
threshold determination means for determining a first threshold with reference to the result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a confidence greater than or equal to the first threshold among the one or more inference results obtained by the inference means, and associating the pseudo label with the corresponding image.
According to the configuration of Supplementary Note 1, the first threshold for setting pseudo labels is determined automatically, based on a comparison between the inference results that the detection model learned with the first data set produces for the images in the evaluation data set and the correct labels associated with those images. The configuration of Supplementary Note 1 therefore makes it possible to reduce the cost of adjusting the first threshold. Furthermore, among the inference results of the detection model for the images included in the second data set, those with a confidence greater than or equal to the automatically determined first threshold are set as pseudo labels and associated with the corresponding images. The configuration of Supplementary Note 1 therefore makes it possible to reduce the cost of generating a data set that includes pseudo-labeled images.
(Supplementary Note 2)
The information processing device according to Supplementary Note 1,
further comprising pseudo-label reference learning means for learning, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
According to the configuration of Supplementary Note 2, the target image detection model is learned using the pseudo-labeled data set. This makes it possible to generate the target image detection model while reducing the cost of threshold adjustment, and as a result reduces the cost incurred before the target image detection model is learned. In addition, if an appropriate value can be determined as the threshold, the number of threshold adjustments can be reduced, and with it the number of times the target image detection model must be learned (re-learned) after each adjustment. As a result, the time until learning of the target image detection model is completed can be reduced.
(Supplementary Note 3)
The information processing device according to Supplementary Note 2,
wherein the pseudo-label reference learning means performs re-learning of the detection model as the learning of the target image detection model.
According to the configuration of Supplementary Note 3, the detection model is re-learned using the pseudo-labeled data set. This reduces the cost incurred before the detection model is re-learned. In addition, if an appropriate value can be determined as the threshold, the number of threshold adjustments can be reduced, and with it the number of times the detection model must be re-learned after each adjustment. As a result, the time until re-learning of the detection model is completed can be reduced.
(Supplementary Note 4)
The information processing device according to Supplementary Note 2 or 3,
wherein the threshold determination means determines a second threshold smaller than the first threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set into the detection model, with the one or more correct labels attached to each of the one or more images, and
the information processing device further comprises non-learning region determination means for determining, in the pseudo-labeled data set generated by the data set generation means, a region corresponding to an inference result having a confidence less than the first threshold and greater than or equal to the second threshold, among the one or more inference results obtained by the inference means, as a non-learning region that is excluded from learning by the pseudo-label reference learning means.
A pseudo label assigned to a region corresponding to an inference result whose confidence is less than the first threshold and greater than or equal to the second threshold would tend to be a low-reliability pseudo label. Setting such regions as non-learning regions allows re-learning to be performed using pseudo labels of relatively high reliability, so the configuration of Supplementary Note 4 can improve the detection accuracy of the target image detection model obtained by the pseudo-label reference learning means.
(Supplementary Note 5)
The information processing device according to any one of Supplementary Notes 1 to 4,
wherein the correct label includes region information indicating the region of an object included in the image associated with the correct label, and category information indicating the category of the object, and
the pseudo label includes region information indicating the region of an object included in the image associated with the pseudo label, and category information indicating the category of the object.
According to the configuration of Supplementary Note 5, the correct labels and the pseudo labels include region information and category information. This makes it possible to improve the accuracy with which objects included in images are detected by a detection model re-learned using the pseudo-labeled data set.
(Appendix 6)
The information processing device according to Appendix 5, wherein
one or more correct labels are attached to at least some of the one or more images included in the second data set, and
the data set generation means deletes a pseudo label when a correct label has been assigned to an object included in the image associated with the pseudo label and the degree of overlap between the region indicated by the region information included in the pseudo label and the region indicated by the region information included in the correct label is equal to or greater than a predetermined degree.
According to the configuration of Appendix 6, for the pseudo labels and correct labels attached to the images included in the second data set, a pseudo label is deleted when the degree of overlap between the region indicated by its region information and the region indicated by the region information of a correct label is equal to or greater than a predetermined degree. As a result, an inappropriate pseudo label is deleted and the correct label remains, which makes it possible to improve the accuracy of object detection using the re-learned detection model. In particular, in a data set in which correct labels are attached to objects that look alike, a pseudo label with the wrong category is likely to be associated with an object. The configuration of Appendix 6 can delete such erroneous pseudo labels, so pseudo labels can be generated accurately and the object detection accuracy of the target image detection model can be improved.
A pseudo label is inappropriate when, for example, (1) the category of the pseudo label differs from the category of the object, or (2) the bounding box of the pseudo label fails to contain part of the object.
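The deletion rule of Appendix 6 can be illustrated with a short sketch. The text only speaks of a "degree of overlap"; using intersection over union (IoU) as that degree, the default of 0.5, and the function names below are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_overlapping_pseudo_labels(pseudo_boxes, correct_boxes, min_overlap=0.5):
    """Keep only pseudo-label boxes that overlap no correct-label box by min_overlap or more."""
    return [
        p for p in pseudo_boxes
        if all(iou(p, c) < min_overlap for c in correct_boxes)
    ]
```

For example, a pseudo-label box `(0, 0, 10, 10)` and a correct-label box `(1, 1, 10, 10)` have an IoU of 0.81, so the pseudo label would be deleted and the correct label would remain.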
(Appendix 7)
The information processing device according to Appendix 5 or 6, wherein
the threshold determination means sets the first threshold for each category, and
the data set generation means generates the pseudo-labeled data set by setting, as a pseudo label, each inference result that, among the one or more inference results by the inference means, has a reliability equal to or greater than the first threshold set for its category, and associating the pseudo label with the corresponding image.
According to the configuration of Appendix 7, a first threshold is set for each category appearing in the inference results that the detection model trained using the first data set produces for the images included in the evaluation data set and in the correct labels associated with those images, and each inference result at or above the first threshold for its category is set as a pseudo label. This makes it possible to improve the accuracy of pseudo label setting.
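A per-category threshold lookup of the kind described in Appendix 7 might look like the sketch below. The tuple layout `(box, category, confidence)`, the function name, and the choice to skip categories that have no threshold are assumptions introduced for this illustration.

```python
def select_pseudo_labels(inferences, thresholds_by_category):
    """Keep inference results whose confidence meets the threshold of their own category.

    inferences: iterable of (box, category, confidence) tuples.
    thresholds_by_category: dict mapping category name to its first threshold.
    Categories without a threshold are skipped (an assumption of this sketch).
    """
    kept = []
    for box, category, confidence in inferences:
        threshold = thresholds_by_category.get(category)
        if threshold is not None and confidence >= threshold:
            kept.append((box, category, confidence))
    return kept
```

With thresholds such as `{"car": 0.8, "dog": 0.5}`, a dog detection at 0.6 is kept while a car detection at 0.7 is not, which a single shared threshold could not express.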
(Appendix 8)
The information processing device according to any one of Appendices 1 to 7, wherein
the threshold determination means determines the first threshold by referring to the precision and the recall indicated by the comparison result.
According to the configuration of Appendix 8, the first threshold is determined by referring to the precision and the recall calculated from the result of comparing the inference results that the detection model trained using the first data set produces for the images included in the evaluation data set with the correct labels associated with those images. This makes it possible to improve the accuracy of pseudo label setting. In addition, because pseudo labels can be set in consideration of both the quality of the learning data (precision) and its quantity (recall), a highly accurate target image detection model can be generated.
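One way to turn precision and recall into a threshold is sketched below. This passage does not say how the two quantities are combined; maximizing the F1 score (their harmonic mean) is an assumption chosen for illustration, as are the function name and the input format.

```python
def choose_threshold(scored_results, num_correct_labels):
    """Pick the confidence threshold that maximizes F1 on the evaluation data set.

    scored_results: list of (confidence, matches_correct_label) pairs, one per
        inference result on the evaluation images.
    num_correct_labels: total number of correct labels in the evaluation data set.
    """
    if num_correct_labels <= 0:
        raise ValueError("num_correct_labels must be positive")
    best_threshold, best_f1 = None, -1.0
    # Every distinct confidence value is a candidate threshold.
    for candidate in sorted({conf for conf, _ in scored_results}):
        kept = [ok for conf, ok in scored_results if conf >= candidate]
        true_positives = sum(kept)
        precision = true_positives / len(kept)          # quality of kept labels
        recall = true_positives / num_correct_labels    # quantity of kept labels
        if true_positives == 0:
            f1 = 0.0
        else:
            f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:  # ties keep the lower threshold
            best_threshold, best_f1 = candidate, f1
    return best_threshold
```

A stricter criterion (for example, the lowest threshold whose precision exceeds a target) would fit the same interface; the point is only that the threshold is derived from evaluation data rather than tuned by hand.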
(Appendix 9)
The information processing device according to any one of Appendices 1 to 8, wherein
the evaluation data set is included in the first data set.
According to the configuration of Appendix 9, the images included in the evaluation data set are included in the first data set. This removes the need to perform additional, costly annotation work to generate the evaluation data set. It also makes it possible to reduce the number of images that must be prepared in advance.
(Appendix 10)
The information processing device according to any one of Appendices 1 to 8, wherein
the evaluation data set is generated by assigning correct labels to part of the second data set.
According to the configuration of Appendix 10, the evaluation data set is generated by assigning correct labels to part of the second data set. Because part of the data set to which pseudo labels will be assigned is used as the evaluation data set for determining the threshold, the accuracy of the assigned pseudo labels can be improved. It also makes it possible to reduce the number of images that must be prepared in advance.
(Appendix 11)
An information processing device comprising:
a first learning means for learning a first detection model using a first data set;
a second learning means for learning a second detection model using a second data set;
a first threshold determination means for determining a first threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in a first evaluation data set into the first detection model, with one or more correct labels attached to each of the one or more images;
a second threshold determination means for determining a second threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in a second evaluation data set into the second detection model, with one or more correct labels attached to each of the one or more images;
a first inference means for obtaining one or more inference results for each of one or more images included in the second data set by inputting each of the one or more images into the first detection model;
a second inference means for obtaining one or more inference results for each of one or more images included in the first data set by inputting each of the one or more images into the second detection model;
a first data set generation means for generating a pseudo-labeled second data set by setting, as a pseudo label, each inference result that, among the one or more inference results by the first inference means, has a reliability equal to or greater than the first threshold, and associating the pseudo label with the corresponding image; and
a second data set generation means for generating a pseudo-labeled first data set by setting, as a pseudo label, each inference result that, among the one or more inference results by the second inference means, has a reliability equal to or greater than the second threshold, and associating the pseudo label with the corresponding image.
According to the configuration of Appendix 11, the first threshold for setting pseudo labels on the second data set is determined automatically, based on a comparison between the inference results that the detection model trained using the first data set produces for the images included in the first evaluation data set and the correct labels associated with those images. Likewise, the second threshold for setting pseudo labels on the first data set is determined automatically, based on a comparison between the inference results that the detection model trained using the second data set produces for the images included in the second evaluation data set and the correct labels associated with those images. This makes it possible to reduce the cost of adjusting the first threshold and the second threshold. In other words, even when two pseudo-labeled data sets are generated, the cost of adjusting the two thresholds used to set pseudo labels on each of them can be reduced. Furthermore, because the detection model is re-learned using the two pseudo-labeled data sets, the accuracy of detecting objects included in an image with the re-learned detection model can be improved further.
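The cross-labeling arrangement of Appendix 11, in which each model labels the data set the other model was trained on, can be sketched as below. The models are represented as plain callables returning `(box, category, confidence)` tuples, and data sets as dictionaries from image name to image; these representations and the function names are assumptions made for this sketch.

```python
def pseudo_label(images, model, threshold):
    """Attach to each image the inference results at or above threshold.

    images: dict mapping an image name to image data.
    model: callable returning a list of (box, category, confidence) per image.
    """
    return {
        name: [result for result in model(image) if result[2] >= threshold]
        for name, image in images.items()
    }


def cross_pseudo_label(dataset_1, dataset_2, model_1, model_2,
                       threshold_1, threshold_2):
    """Label dataset_2 with model_1 and dataset_1 with model_2.

    model_1 is assumed to have been trained on dataset_1 and model_2 on
    dataset_2; threshold_1 and threshold_2 are the first and second thresholds.
    Returns (pseudo_labeled_dataset_1, pseudo_labeled_dataset_2).
    """
    labeled_2 = pseudo_label(dataset_2, model_1, threshold_1)
    labeled_1 = pseudo_label(dataset_1, model_2, threshold_2)
    return labeled_1, labeled_2
```

Keeping one threshold per model matters because the two models are trained on different data and their confidence scores need not be comparable.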
(Appendix 12)
The information processing device according to Appendix 11, further comprising
a pseudo label reference learning means for learning a target image detection model for detecting an object included in a target image, using the pseudo-labeled first data set and the pseudo-labeled second data set.
According to the configuration of Appendix 12, the target image detection model is learned using the pseudo-labeled first data set and the pseudo-labeled second data set. The target image detection model can therefore be generated while reducing the cost of adjusting the first threshold and the second threshold. As a result, the cost incurred before the target image detection model can be learned is reduced. Moreover, if appropriate values can be determined for the first threshold and the second threshold, the number of threshold adjustments can be reduced, and so can the number of times the target image detection model must be learned (re-learned) after each adjustment. As a result, the time until learning of the target image detection model is completed can be shortened.
(Appendix 13)
The information processing device according to Appendix 12, wherein
the pseudo label reference learning means re-learns the first detection model and the second detection model as the learning of the target image detection model.
According to the configuration of Appendix 13, the first detection model and the second detection model are re-learned using the pseudo-labeled first data set and the pseudo-labeled second data set. The cost incurred before the first detection model and the second detection model can be re-learned is therefore reduced. Moreover, if appropriate values can be determined for the first threshold and the second threshold, the number of threshold adjustments can be reduced, and so can the number of times the detection models must be re-learned after each adjustment. As a result, the time until re-learning of the detection models is completed can be shortened.
(Appendix 14)
The information processing device according to Appendix 12 or 13, wherein
the first threshold determination means determines a third threshold smaller than the first threshold by referring to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the first evaluation data set into the first detection model, with the one or more correct labels attached to each of the one or more images,
the second threshold determination means determines a fourth threshold smaller than the second threshold by referring to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the second evaluation data set into the second detection model, with the one or more correct labels attached to each of the one or more images, and
the information processing device further comprises:
a first non-learning region determination means for determining, in the pseudo-labeled second data set generated by the first data set generation means, a region corresponding to an inference result that, among the one or more inference results by the first inference means, has a reliability less than the first threshold and equal to or greater than the third threshold, as a non-learning region that is not subject to learning by the pseudo label reference learning means; and
a second non-learning region determination means for determining, in the pseudo-labeled first data set generated by the second data set generation means, a region corresponding to an inference result that, among the one or more inference results by the second inference means, has a reliability less than the second threshold and equal to or greater than the fourth threshold, as a non-learning region that is not subject to learning by the pseudo label reference learning means.
A region corresponding to an inference result whose reliability is less than the first threshold and equal to or greater than the third threshold tends to yield a low-reliability pseudo label even if a pseudo label is assigned to it. The same holds for a region corresponding to an inference result whose reliability is less than the second threshold and equal to or greater than the fourth threshold. By setting such regions as non-learning regions, the configuration of Appendix 14 allows re-learning to be performed using pseudo labels of relatively high reliability, and can therefore improve the detection accuracy of the target image detection model trained by the pseudo label reference learning means.
(Appendix 15)
An information processing device comprising:
an acquisition means for acquiring a target image; and
a detection means for detecting an object included in the target image using a target image detection model, wherein
the target image detection model has been trained by:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, each inference result that, among the one or more inference results of the inference process, has a reliability equal to or greater than the threshold, and associating the pseudo label with the corresponding image; and
a pseudo label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
According to the configuration of Appendix 15, pseudo labels are determined using an automatically determined threshold, and an object included in the target image is detected using a target image detection model trained on a data set containing the images associated with those pseudo labels. It is therefore possible to detect an object included in the target image using a target image detection model for which the cost of threshold adjustment has been reduced.
(Appendix 16)
The information processing device according to Appendix 15, wherein
in the threshold determination process, a second threshold smaller than the first threshold is also determined by referring to the comparison result,
in the data set generation process, a region of the pseudo-labeled data set corresponding to an inference result that, among the one or more inference results of the inference process, has a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning by the pseudo label reference learning process, and
in the pseudo label reference learning process, the target image detection model is learned by referring to the pseudo-labeled data set including the non-learning region.
A region corresponding to an inference result whose reliability is less than the first threshold and equal to or greater than the second threshold tends to yield a low-reliability pseudo label even if a pseudo label is assigned to it. Setting such a region as a non-learning region allows re-learning to be performed using pseudo labels of relatively high reliability. The configuration of Appendix 16 can therefore improve the detection accuracy of the target image detection model in the pseudo label reference learning process.
(Appendix 17)
An information processing method comprising:
a learning step of learning a detection model using a first data set;
a threshold determination step of determining a threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, each inference result that, among the one or more inference results of the inference step, has a reliability equal to or greater than the threshold, and associating the pseudo label with the corresponding image.
According to the configuration of Appendix 17, the same effects as those of the information processing device described in Appendix 1 are obtained.
(Appendix 18)
The information processing method according to Appendix 17, wherein
in the threshold determination step, a second threshold smaller than the first threshold is also determined by referring to the comparison result, and
in the data set generation step, a region of the pseudo-labeled data set corresponding to an inference result that, among the one or more inference results of the inference step, has a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning in the pseudo label reference learning step.
A region corresponding to an inference result whose reliability is less than the first threshold and equal to or greater than the second threshold tends to yield a low-reliability pseudo label even if a pseudo label is assigned to it. Setting such a region as a non-learning region allows re-learning to be performed using pseudo labels of relatively high reliability. The configuration of Appendix 18 can therefore improve the detection accuracy of the target image detection model in the pseudo label reference learning step.
(Appendix 19)
An information processing method comprising:
acquiring a target image; and
detecting an object included in the target image using a target image detection model, wherein
the target image detection model has been trained by:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, each inference result that, among the one or more inference results of the inference process, has a reliability equal to or greater than the threshold, and associating the pseudo label with the corresponding image; and
a pseudo label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
According to the configuration of Appendix 19, the same effects as those of the information processing device described in Appendix 15 are obtained.
(Appendix 20)
A method for manufacturing a detection model, comprising:
a learning step of learning a detection model using a first data set;
a threshold determination step of determining a threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, each inference result that, among the one or more inference results of the inference step, has a reliability equal to or greater than the threshold, and associating the pseudo label with the corresponding image; and
a pseudo label reference learning step of learning, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
According to the configuration of Appendix 20, the threshold for setting pseudo labels is determined automatically, based on a comparison between the inference results that the detection model trained using the first data set produces for the images included in the evaluation data set and the correct labels associated with those images. This makes it possible to reduce the cost of adjusting the threshold. The target image detection model is then learned using the pseudo-labeled data set, so the target image detection model can be manufactured while reducing the cost of threshold adjustment. As a result, the cost incurred before the target image detection model can be learned is reduced. Moreover, if an appropriate value can be determined for the threshold, the number of threshold adjustments can be reduced, and so can the number of times learning must be repeated after each adjustment. As a result, the time until learning of the target image detection model is completed can be shortened.
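The five steps of Appendix 20 can be strung together as in the sketch below. It is a schematic outline under stated assumptions, not the claimed method itself: `train` and `decide_threshold` are hypothetical callables supplied by the caller, a trained model is assumed to be a callable returning `(box, category, confidence)` tuples, and the final learning step here uses only the pseudo-labeled data set.

```python
def manufacture_detection_model(first_dataset, evaluation_dataset,
                                second_dataset, train, decide_threshold):
    """Run the learning, threshold, inference, generation, and re-learning steps.

    train(dataset) -> model, where model(image) -> list of
        (box, category, confidence) tuples (hypothetical interface).
    decide_threshold(model, evaluation_dataset) -> float, derived from a
        comparison of inference results with correct labels.
    """
    # Learning step: train a detection model on the first data set.
    model = train(first_dataset)
    # Threshold determination step: no manual tuning is involved.
    threshold = decide_threshold(model, evaluation_dataset)
    pseudo_labeled = []
    for image in second_dataset:
        # Inference step on an image of the second data set.
        results = model(image)
        # Data set generation step: keep results at or above the threshold.
        kept = [r for r in results if r[2] >= threshold]
        pseudo_labeled.append((image, kept))
    # Pseudo label reference learning step.
    return train(pseudo_labeled)
```

Passing the training and threshold routines in as arguments keeps the outline independent of any particular detector or framework.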
(Supplementary Note 21)
The method for manufacturing a detection model according to Supplementary Note 20, wherein
in the threshold determination step, a second threshold smaller than the first threshold is also determined with reference to the comparison result, and
in the data set generation step, in the pseudo-labeled data set, a region corresponding to an inference result that, among the one or more inference results obtained in the inference step, has a confidence less than the first threshold and greater than or equal to the second threshold is determined as a non-learning region that is excluded from training in the pseudo-label reference learning step.
A region corresponding to an inference result whose confidence is less than the first threshold and greater than or equal to the second threshold tends to yield an unreliable pseudo-label even if one is assigned. Setting such a region as a non-learning region allows retraining to use only pseudo-labels of relatively high reliability. The configuration of Supplementary Note 21 can therefore improve the detection accuracy of the target image detection model trained in the pseudo-label reference learning step.
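The two-threshold rule can be sketched as a simple partition of inference results. This is a minimal sketch under assumptions: each inference result is taken to be a `(box, category, confidence)` tuple, and the function name `partition_by_confidence` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed layout
InferenceResult = Tuple[Box, str, float]  # (box, category, confidence)


def partition_by_confidence(
    inference_results: List[InferenceResult],
    first_threshold: float,
    second_threshold: float,
) -> Tuple[List[Tuple[Box, str]], List[Box]]:
    """Split results into pseudo-labels and non-learning regions.

    confidence >= first_threshold                     -> pseudo-label
    second_threshold <= confidence < first_threshold  -> non-learning region
    confidence < second_threshold                     -> ignored
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be smaller than first_threshold")
    pseudo_labels: List[Tuple[Box, str]] = []
    non_learning_regions: List[Box] = []
    for box, category, confidence in inference_results:
        if confidence >= first_threshold:
            pseudo_labels.append((box, category))
        elif confidence >= second_threshold:
            non_learning_regions.append(box)
    return pseudo_labels, non_learning_regions
```

How a training loop would exclude the non-learning regions (for example, by masking their loss contribution) is not specified here and is left to the implementation.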
(Supplementary Note 22)
A program for causing a computer to function as an information processing device, the program causing the computer to function as:
learning means for training a detection model using a first data set;
threshold determination means for determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and
data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference means, has a confidence greater than or equal to the threshold, and associating the pseudo-label with the corresponding image.
The configuration of Supplementary Note 22 provides the same effects as the information processing device according to Supplementary Note 1.
(Supplementary Note 23)
A program for causing a computer to function as an information processing device, the program causing the computer to function as:
acquisition means for acquiring a target image; and
detection means for detecting an object included in the target image using a target image detection model,
wherein the target image detection model has been trained by:
a learning process of training a detection model using a first data set;
a threshold determination process of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference process, has a confidence greater than or equal to the threshold, and associating the pseudo-label with the corresponding image; and
a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
The configuration of Supplementary Note 23 provides the same effects as the information processing device according to Supplementary Note 15.
[Additional Remarks 3]
Some or all of the embodiments described above can also be expressed as follows.
An information processing device comprising at least one processor, the processor executing: a learning process of training a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference process, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image.
The information processing device may further comprise a memory, and the memory may store a program for causing the processor to execute the learning process, the threshold determination process, the inference process, and the data set generation process. The program may be recorded on a computer-readable, non-transitory, tangible recording medium.
An information processing device comprising at least one processor, the processor executing an acquisition process of acquiring a target image and a detection process of detecting an object included in the target image using a target image detection model, wherein the target image detection model has been trained by: a learning process of training a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference process, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
The information processing device may further comprise a memory, and the memory may store a program for causing the processor to execute the acquisition process and the detection process. The program may be recorded on a computer-readable, non-transitory, tangible recording medium.
10, 10a, 10b, 10c, 20, 20a Information processing device
101 Learning unit
101-1 First learning unit
101-2 Second learning unit
102 Threshold determination unit
102-1 First threshold determination unit
102-2 Second threshold determination unit
103 Inference unit
103-1 First inference unit
103-2 Second inference unit
104 Data set generation unit
104-1 First data set generation unit
104-2 Second data set generation unit
105 Re-learning unit
106 Non-learning region determination unit
106-1 First non-learning region determination unit
106-2 Second non-learning region determination unit
201 Acquisition unit
202 Detection unit
DS1 Data set 1
DS1' Data set 1'
DS2 Data set 2
DS2' Data set 2'
DSE Evaluation data set
DSE1 Evaluation data set 1
DSE2 Evaluation data set 2
DM Object detection model

Claims (23)

1. An information processing device comprising:
    learning means for training a detection model using a first data set;
    threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
    inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and
    data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference means, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image.
2. The information processing device according to claim 1, further comprising pseudo-label reference learning means for training, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
3. The information processing device according to claim 2, wherein the pseudo-label reference learning means retrains the detection model as the training of the target image detection model.
4. The information processing device according to claim 2 or 3, wherein
    the threshold determination means determines a second threshold smaller than the first threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set into the detection model, with the one or more correct labels attached to each of the one or more images, and
    the information processing device further comprises non-learning region determination means for determining, in the pseudo-labeled data set generated by the data set generation means, a region corresponding to an inference result that, among the one or more inference results obtained by the inference means, has a confidence less than the first threshold and greater than or equal to the second threshold, as a non-learning region that is excluded from training by the pseudo-label reference learning means.
5. The information processing device according to any one of claims 1 to 4, wherein
    each correct label includes region information indicating the region of an object included in the image associated with the correct label, and category information indicating the category of the object, and
    each pseudo-label includes region information indicating the region of an object included in the image associated with the pseudo-label, and category information indicating the category of the object.
6. The information processing device according to claim 5, wherein
    one or more correct labels are attached to at least some of the one or more images included in the second data set, and
    the data set generation means deletes a pseudo-label when a correct label has been assigned to an object included in the image associated with the pseudo-label and the degree of overlap between the region indicated by the region information included in the pseudo-label and the region indicated by the region information included in the correct label is greater than or equal to a predetermined degree.
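For illustration, the deletion of pseudo-labels that overlap existing correct labels could look like the sketch below. The claim speaks only of a "degree of overlap"; using intersection over union (IoU) as that measure, the default value 0.5, and the names `iou` and `drop_overlapping_pseudo_labels` are assumptions of this sketch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def drop_overlapping_pseudo_labels(pseudo_boxes: List[Box],
                                   correct_boxes: List[Box],
                                   min_overlap: float = 0.5) -> List[Box]:
    """Keep only pseudo-label boxes that do not overlap any correct-label
    box by min_overlap or more (the assumed 'predetermined degree')."""
    return [
        p for p in pseudo_boxes
        if all(iou(p, c) < min_overlap for c in correct_boxes)
    ]
```

Dropping the overlapping pseudo-label leaves the correct label as the single annotation for that object, so the same object is not annotated twice.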
7. The information processing device according to claim 5 or 6, wherein
    the threshold determination means sets the first threshold for each category, and
    the data set generation means generates the pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference means, has a confidence greater than or equal to the first threshold set for each category, and associating the pseudo-label with the corresponding image.
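A per-category threshold can be applied with a simple lookup, as in the following sketch. The tuple layout `(box, category, confidence)`, the function name `select_pseudo_labels`, and the decision to skip categories that have no threshold are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]
InferenceResult = Tuple[Box, str, float]  # (box, category, confidence)


def select_pseudo_labels(
    inference_results: List[InferenceResult],
    thresholds_by_category: Dict[str, float],
) -> List[Tuple[Box, str]]:
    """Keep results whose confidence meets the threshold of their category."""
    selected: List[Tuple[Box, str]] = []
    for box, category, confidence in inference_results:
        threshold = thresholds_by_category.get(category)
        # Assumption: a category without a threshold yields no pseudo-label.
        if threshold is not None and confidence >= threshold:
            selected.append((box, category))
    return selected
```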
8. The information processing device according to any one of claims 1 to 7, wherein the threshold determination means determines the first threshold with reference to a precision and a recall indicated by the comparison result.
9. The information processing device according to any one of claims 1 to 8, wherein the evaluation data set is included in the first data set.
10. The information processing device according to any one of claims 1 to 8, wherein the evaluation data set is generated by assigning correct labels to a part of the second data set.
11. An information processing device comprising:
    first learning means for training a first detection model using a first data set;
    second learning means for training a second detection model using a second data set;
    first threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a first evaluation data set into the first detection model, with one or more correct labels attached to each of the one or more images;
    second threshold determination means for determining a second threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a second evaluation data set into the second detection model, with one or more correct labels attached to each of the one or more images;
    first inference means for obtaining one or more inference results for each of one or more images included in the second data set by inputting each of those images into the first detection model;
    second inference means for obtaining one or more inference results for each of one or more images included in the first data set by inputting each of those images into the second detection model;
    first data set generation means for generating a pseudo-labeled second data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the first inference means, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image; and
    second data set generation means for generating a pseudo-labeled first data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the second inference means, has a confidence greater than or equal to the second threshold, and associating the pseudo-label with the corresponding image.
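The cross-labeling arrangement, in which each model labels the data set the other model was trained on, can be sketched as follows. This is a hedged sketch: models are represented as plain callables returning `(box, category, confidence)` tuples, data sets as dictionaries from image ID to image, and the names `pseudo_label` and `cross_pseudo_label` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]
Detector = Callable[[Any], List[Tuple[Box, str, float]]]
Labels = Dict[str, List[Tuple[Box, str]]]


def pseudo_label(images: Dict[str, Any], detect: Detector,
                 threshold: float) -> Labels:
    """For each image, keep detections with confidence >= threshold."""
    return {
        image_id: [(box, cat) for box, cat, conf in detect(image)
                   if conf >= threshold]
        for image_id, image in images.items()
    }


def cross_pseudo_label(ds1: Dict[str, Any], ds2: Dict[str, Any],
                       model1: Detector, model2: Detector,
                       threshold1: float,
                       threshold2: float) -> Tuple[Labels, Labels]:
    """Model 1 (trained on ds1) labels ds2; model 2 (trained on ds2) labels ds1."""
    labeled_ds2 = pseudo_label(ds2, model1, threshold1)
    labeled_ds1 = pseudo_label(ds1, model2, threshold2)
    return labeled_ds1, labeled_ds2
```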
12. The information processing device according to claim 11, further comprising pseudo-label reference learning means for training, using the pseudo-labeled first data set and the pseudo-labeled second data set, a target image detection model for detecting an object included in a target image.
13. The information processing device according to claim 12, wherein the pseudo-label reference learning means retrains the first detection model and the second detection model as the training of the target image detection model.
14. The information processing device according to claim 12 or 13, wherein
    the first threshold determination means determines a third threshold smaller than the first threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the first evaluation data set into the first detection model, with the one or more correct labels attached to each of the one or more images,
    the second threshold determination means determines a fourth threshold smaller than the second threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the second evaluation data set into the second detection model, with the one or more correct labels attached to each of the one or more images, and
    the information processing device further comprises:
    first non-learning region determination means for determining, in the pseudo-labeled second data set generated by the first data set generation means, a region corresponding to an inference result that, among the one or more inference results obtained by the first inference means, has a confidence less than the first threshold and greater than or equal to the third threshold, as a non-learning region that is excluded from training by the pseudo-label reference learning means; and
    second non-learning region determination means for determining, in the pseudo-labeled first data set generated by the second data set generation means, a region corresponding to an inference result that, among the one or more inference results obtained by the second inference means, has a confidence less than the second threshold and greater than or equal to the fourth threshold, as a non-learning region that is excluded from training by the pseudo-label reference learning means.
15. An information processing device comprising:
    acquisition means for acquiring a target image; and
    detection means for detecting an object included in the target image using a target image detection model,
    wherein the target image detection model has been trained by:
    a learning process of training a detection model using a first data set;
    a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
    an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model;
    a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference process, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image; and
    a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
16. The information processing device according to claim 15, wherein
    in the threshold determination process, a second threshold smaller than the first threshold is also determined with reference to the comparison result,
    in the data set generation process, in the pseudo-labeled data set, a region corresponding to an inference result that, among the one or more inference results obtained by the inference process, has a confidence less than the first threshold and greater than or equal to the second threshold is determined as a non-learning region that is excluded from training by the pseudo-label reference learning process, and
    in the pseudo-label reference learning process, the target image detection model is trained with reference to the pseudo-labeled data set including the non-learning region.
17. An information processing method comprising:
    a learning step of training a detection model using a first data set;
    a threshold determination step of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
    an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model; and
    a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained in the inference step, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image.
18. The information processing method according to claim 17, further comprising a pseudo-label reference learning step of training, using the pseudo-labeled first data set and the pseudo-labeled second data set, a target image detection model for detecting an object included in a target image, wherein
    in the threshold determination step, a second threshold smaller than the first threshold is also determined with reference to the comparison result, and
    in the data set generation step, in the pseudo-labeled data set, a region corresponding to an inference result that, among the one or more inference results obtained in the inference step, has a confidence less than the first threshold and greater than or equal to the second threshold is determined as a non-learning region that is excluded from training in the pseudo-label reference learning step.
19. An information processing method comprising:
    acquiring a target image; and
    detecting an object included in the target image using a target image detection model,
    wherein the target image detection model has been trained by:
    a learning process of training a detection model using a first data set;
    a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
    an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of those images into the detection model;
    a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, each inference result that, among the one or more inference results obtained by the inference process, has a confidence greater than or equal to the first threshold, and associating the pseudo-label with the corresponding image; and
    a pseudo-label reference learning process of training the target image detection model with reference to the pseudo-labeled data set.
  20.  A method for manufacturing a detection model, comprising:
     a learning step of training a detection model using a first data set;
     a threshold determination step of determining a first threshold with reference to a result of comparison between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images;
     an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
     a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo label with the corresponding image; and
     a pseudo-label reference learning step of training, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
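The threshold determination and data set generation steps of claim 20 can be illustrated with a minimal Python sketch. The claim only requires that the first threshold be determined with reference to the comparison result; the precision-target rule used here, and the names `Detection`, `determine_first_threshold`, `generate_pseudo_labeled_dataset`, and `target_precision`, are assumptions made for illustration and are not recited in the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One inference result (hypothetical structure, not taken from the claims)."""
    image_id: str
    box: tuple      # (x1, y1, x2, y2)
    label: str
    score: float    # reliability of the inference result


def determine_first_threshold(scored_matches, target_precision=0.9):
    """Pick the smallest score cut-off whose precision meets the target.

    scored_matches: list of (score, is_correct) pairs, i.e. the result of
    comparing evaluation-set inference results with the correct labels.
    Returns None when no cut-off reaches the target precision.
    The precision criterion is an assumption of this sketch.
    """
    ranked = sorted(scored_matches, key=lambda m: m[0], reverse=True)
    threshold = None
    true_positives = 0
    for rank, (score, is_correct) in enumerate(ranked, start=1):
        true_positives += int(is_correct)
        # Only evaluate a cut-off once all results sharing this score are counted.
        last_of_tie = rank == len(ranked) or ranked[rank][0] != score
        if last_of_tie and true_positives / rank >= target_precision:
            threshold = score
    return threshold


def generate_pseudo_labeled_dataset(detections, first_threshold):
    """Associate each sufficiently reliable inference result with its image."""
    dataset = {}
    for det in detections:
        if det.score >= first_threshold:
            dataset.setdefault(det.image_id, []).append(det)
    return dataset
```

In this sketch, results from the second data set whose reliability falls below the returned threshold are simply left out of the pseudo-labeled data set.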
  21.  The method for manufacturing a detection model according to claim 20, wherein
     in the threshold determination step, a second threshold smaller than the first threshold is also determined with reference to the comparison result, and
     in the data set generation step, a region of the pseudo-labeled data set that corresponds to an inference result having, among the one or more inference results obtained in the inference step, a reliability lower than the first threshold and equal to or higher than the second threshold is determined as a non-learning region that is excluded from learning in the pseudo-label reference learning step.
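The two-threshold rule of claim 21 can be sketched as follows. This is an illustrative sketch only: the tuple layout of an inference result, the function names, and the way a non-learning region is excluded from the loss (dropping loss terms at locations that fall inside such a region) are assumptions, since the claim does not say how the exclusion is implemented.

```python
def partition_by_reliability(detections, first_threshold, second_threshold):
    """Split inference results into pseudo labels and non-learning regions.

    detections: list of (box, label, score), with box = (x1, y1, x2, y2).
    Results below the second threshold fall into neither group.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be smaller than first_threshold")
    pseudo_labels, non_learning_regions = [], []
    for box, label, score in detections:
        if score >= first_threshold:
            pseudo_labels.append((box, label))
        elif score >= second_threshold:
            # Too uncertain to trust as a label, too likely an object to ignore.
            non_learning_regions.append(box)
    return pseudo_labels, non_learning_regions


def masked_mean_loss(per_location_loss, locations, non_learning_regions):
    """Average the loss over locations outside every non-learning region.

    per_location_loss: loss values, one per location (hypothetical).
    locations: (x, y) points the loss values belong to.
    """
    kept = [
        loss
        for loss, (x, y) in zip(per_location_loss, locations)
        if not any(x1 <= x <= x2 and y1 <= y <= y2
                   for x1, y1, x2, y2 in non_learning_regions)
    ]
    return sum(kept) / len(kept) if kept else 0.0
```

With thresholds 0.8 and 0.4, a result scored 0.9 becomes a pseudo label, one scored 0.5 marks a non-learning region, and one scored 0.2 is dropped.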
  22.  A program for causing a computer to function as an information processing device, the program causing the computer to function as:
     learning means for training a detection model using a first data set;
     threshold determination means for determining a threshold with reference to a result of comparison between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images;
     inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
     data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference means, and associating the pseudo label with the corresponding image.
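The comparison between inference results and correct labels, which the threshold determination means refers to, could be produced along the following lines. The claim does not specify the comparison criterion; matching by intersection over union (IoU) with a cut-off of 0.5 and greedy one-to-one assignment is a common convention assumed here, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def compare_with_correct_labels(predictions, correct_labels, iou_threshold=0.5):
    """Mark each inference result for one image as correct or incorrect.

    predictions: list of (box, label, score).
    correct_labels: list of (box, label).
    Returns (score, is_correct) pairs, highest score first. Each correct
    label can be matched by at most one inference result.
    """
    matched = set()
    results = []
    for box, label, score in sorted(predictions, key=lambda p: p[2], reverse=True):
        best_index, best_overlap = None, iou_threshold
        for index, (gt_box, gt_label) in enumerate(correct_labels):
            if index in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_overlap:
                best_index, best_overlap = index, overlap
        if best_index is not None:
            matched.add(best_index)
        results.append((score, best_index is not None))
    return results
```

Pooling the returned pairs over all evaluation images gives a reliability-versus-correctness record from which a threshold can then be chosen.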
  23.  A program for causing a computer to function as an information processing device, the program causing the computer to function as:
     acquisition means for acquiring a target image; and
     detection means for detecting an object included in the target image using a target image detection model,
     wherein the target image detection model has been trained by:
      learning processing of training a detection model using a first data set;
      threshold determination processing of determining a threshold with reference to a result of comparison between one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, and one or more correct labels attached to each of the one or more images;
      inference processing of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
      data set generation processing of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference processing, and associating the pseudo label with the corresponding image; and
      pseudo-label reference learning processing of training the target image detection model with reference to the pseudo-labeled data set.
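The order of the five processes by which the target image detection model is trained can be summarized in a short Python sketch. The training, inference, and threshold routines are passed in as callables because the claim does not fix any particular model or library. The function name, the argument names, and the choice to keep images that receive no pseudo label are all assumptions of this sketch.

```python
def build_target_image_detector(train, infer, determine_threshold,
                                first_dataset, evaluation_dataset,
                                second_dataset):
    """Run the claimed processes in order and return the retrained model.

    train(dataset) -> model
    infer(model, image) -> list of (box, label, score)
    determine_threshold(model, evaluation_dataset) -> float
    All three callables are hypothetical placeholders supplied by the caller.
    """
    # Learning processing: train the initial detection model.
    detection_model = train(first_dataset)

    # Threshold determination processing: uses the labeled evaluation set.
    threshold = determine_threshold(detection_model, evaluation_dataset)

    # Inference processing and data set generation processing.
    pseudo_labeled_dataset = []
    for image in second_dataset:
        pseudo_labels = [
            (box, label)
            for box, label, score in infer(detection_model, image)
            if score >= threshold
        ]
        pseudo_labeled_dataset.append((image, pseudo_labels))

    # Pseudo-label reference learning processing.
    target_model = train(pseudo_labeled_dataset)
    return target_model, pseudo_labeled_dataset
```

Whether the final training also reuses the first data set is left open by the claim; this sketch trains on the pseudo-labeled data set alone.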

PCT/JP2022/005877 2021-03-05 2022-02-15 Information processing device, information processing method, method for manufacturing detection model, and program WO2022185899A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023503690A JPWO2022185899A1 (en) 2021-03-05 2022-02-15

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2021/008696 2021-03-05
PCT/JP2021/008696 WO2022185531A1 (en) 2021-03-05 2021-03-05 Information processing device, information processing method, manufacturing method for detection model, and program

Publications (1)

Publication Number Publication Date
WO2022185899A1 true WO2022185899A1 (en) 2022-09-09

Family

ID=83154107

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2021/008696 WO2022185531A1 (en) 2021-03-05 2021-03-05 Information processing device, information processing method, manufacturing method for detection model, and program
PCT/JP2022/005877 WO2022185899A1 (en) 2021-03-05 2022-02-15 Information processing device, information processing method, method for manufacturing detection model, and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/008696 WO2022185531A1 (en) 2021-03-05 2021-03-05 Information processing device, information processing method, manufacturing method for detection model, and program

Country Status (2)

Country Link
JP (1) JPWO2022185899A1 (en)
WO (2) WO2022185531A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020225923A1 (en) * 2019-05-09 2020-11-12 日本電信電話株式会社 Analysis device, analysis method, and analysis program
US20200410388A1 (en) * 2019-06-25 2020-12-31 International Business Machines Corporation Model training using a teacher-student learning paradigm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KE MEI; CHUANG ZHU; JIAQI ZOU; SHANGHANG ZHANG: "Instance Adaptive Self-Training for Unsupervised Domain Adaptation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 August 2020 (2020-08-27), XP081749759 *
RHEE HOCHANG; CHO NAM IK: "Efficient and Robust Pseudo-Labeling for Unsupervised Domain Adaptation", 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), IEEE, 18 November 2019 (2019-11-18), pages 980 - 985, XP033733147, DOI: 10.1109/APSIPAASC47483.2019.9023239 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343050A (en) * 2023-05-26 2023-06-27 成都理工大学 Target detection method for remote sensing image noise annotation based on self-adaptive weight
CN116343050B (en) * 2023-05-26 2023-08-01 成都理工大学 Target detection method for remote sensing image noise annotation based on self-adaptive weight

Also Published As

Publication number Publication date
JPWO2022185899A1 (en) 2022-09-09
WO2022185531A1 (en) 2022-09-09

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22762956; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 2023503690; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 22762956; Country of ref document: EP; Kind code of ref document: A1)