WO2022185899A1 - Information processing device, information processing method, method for manufacturing detection model, and program

Info

Publication number: WO2022185899A1
Application number: PCT/JP2022/005877
Authority: WO
Inventors: 勇貴田中; 周平吉田; 真寺尾
Original assignee: 日本電気株式会社
Priority date: 2021-03-05
Filing date: 2022-02-15
Publication date: 2022-09-09
Also published as: JPWO2022185899A1; WO2022185531A1

Abstract

In order to generate a high-accuracy detection model while suppressing generation costs, this information processing device (10) comprises a training unit (101), a threshold value determination unit (102), an inference unit (103), and a dataset generation unit (104). The training unit trains a detection model using a first dataset. The threshold value determination unit compares an inference result obtained by inputting an image included in a dataset for evaluation to the detection model and a correct-answer label attached to the image, thereby determining a threshold value. The inference unit inputs an image included in a second dataset to the detection model and acquires an inference result for the image. The dataset generation unit: associates an inference result that is produced by the inference unit and that has a reliability greater than or equal to the threshold value with the corresponding image, the inference result being associated as a pseudo-label; and generates a dataset after addition of the pseudo-label.

Description

Information processing device, information processing method, detection model manufacturing method, and program

The present invention relates to a technique for associating pseudo-labels with one or more images included in a dataset used for re-learning a detection model.

A detection model that detects objects contained in images becomes highly accurate when it is trained on a large amount of correctly labeled data. However, collecting a large amount of data and associating correct labels with that data is expensive. For this reason, in order to generate a highly accurate detection model from a small amount of correctly labeled data, a technique of associating pseudo-labels with unlabeled data is known.

A pseudo-label is a highly reliable inference result obtained by running inference on an image from an unlabeled dataset using a detection model trained only on a dataset with correct labels. For example, Non-Patent Document 1 discloses a method of adopting, as a pseudo-label, an inference result whose reliability is equal to or higher than a threshold value.

Since the method described in Non-Patent Document 1 requires adjustment to set an appropriate threshold, there is room for reducing the time and computational costs required for this adjustment. In other words, there is room for further reducing the cost of generating highly accurate detection models using pseudo labels.

One aspect of the present invention has been made in view of the above problems. That is, an object of one aspect of the present invention is to provide a technology capable of generating a highly accurate detection model while suppressing generation costs.

An information processing apparatus according to an aspect of the present invention includes: learning means for learning a detection model using a first data set; threshold determination means for determining a first threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference means, and associating the pseudo-label with the corresponding image.

An information processing apparatus according to an aspect of the present invention includes: first learning means for learning a first detection model using a first data set; second learning means for learning a second detection model using a second data set; first threshold determination means for determining a first threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in a first evaluation data set into the first detection model and one or more correct labels attached to each of the one or more images; second threshold determination means for determining a second threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in a second evaluation data set into the second detection model and one or more correct labels attached to each of the one or more images; first inference means for obtaining one or more inference results for each of one or more images included in the second data set by inputting each of the one or more images into the first detection model; second inference means for obtaining one or more inference results for each of one or more images included in the first data set by inputting each of the one or more images into the second detection model; first data set generation means for generating a pseudo-labeled second data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the first inference means, and associating the pseudo-label with the corresponding image; and second data set generation means for generating a pseudo-labeled first data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results obtained by the second inference means, and associating the pseudo-label with the corresponding image.

An information processing apparatus according to an aspect of the present invention includes: acquisition means for acquiring a target image; and detection means for detecting an object included in the target image using a target image detection model. The target image detection model has been learned by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

An information processing method according to an aspect of the present invention includes: a learning step of learning a detection model using a first data set; a threshold determination step of determining a first threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo-label with the corresponding image.

An information processing method according to an aspect of the present invention includes: acquiring a target image; and detecting an object included in the target image using a target image detection model. The target image detection model has been learned by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

A detection model manufacturing method according to an aspect of the present invention includes: a learning step of learning a detection model using a first data set; a threshold determination step of determining a first threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning step of learning, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.

A program according to an aspect of the present invention is a program for causing a computer to function as an information processing device, the program causing the computer to function as: learning means for learning a detection model using a first data set; threshold determination means for determining a threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and data set generation means for generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference means, and associating the pseudo-label with the corresponding image.

A program according to an aspect of the present invention is a program for causing a computer to function as an information processing apparatus, the program causing the computer to function as: acquisition means for acquiring a target image; and detection means for detecting an object included in the target image using a target image detection model. The target image detection model has been learned by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a threshold with reference to a comparison result between one or more inference results obtained by inputting each of one or more images included in an evaluation data set into the detection model and one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo-label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

According to one aspect of the present invention, it is possible to generate a highly accurate detection model while suppressing the generation cost.

FIG. 1 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the flow of an information processing method executed by the information processing device shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the flow of an information processing method executed by the information processing device shown in FIG. 3.
FIG. 5 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 2 of the present invention.
FIG. 6 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 2 of the present invention.
FIG. 7 is a graph showing the relationship between the precision and the recall calculated by the information processing device shown in FIG. 5.
FIG. 8 is a diagram showing a specific example of data included in a pseudo-labeled data set according to exemplary embodiment 2 of the present invention.
FIG. 9 is a flowchart showing the flow of an information processing method executed by the information processing device shown in FIG. 5.
FIG. 10 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 2 of the present invention.
FIG. 11 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 3 of the present invention.
FIG. 12 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 3 of the present invention.
FIG. 13 is a diagram showing an example of data included in a second data set and in a pseudo-labeled data set generated from the second data set according to exemplary embodiment 3 of the present invention.
FIG. 14 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 4 of the present invention.
FIG. 15 is a diagram showing specific examples of data included in the first data set and the second data set according to exemplary embodiment 4 of the present invention.
FIG. 16 is a diagram showing an example of data included in a pseudo-labeled data set according to exemplary embodiment 4 of the present invention.
FIG. 17 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 5 of the present invention.
FIG. 18 is a flowchart showing the flow of an information processing method executed by the information processing device shown in FIG. 17.
FIG. 19 is a block diagram showing the configuration of an information processing device according to exemplary embodiment 6 of the present invention.
FIG. 20 is a block diagram showing an example of a hardware configuration of the information processing device in each exemplary embodiment of the present invention.

[Exemplary embodiment 1]
A first exemplary embodiment of the invention will now be described in detail with reference to the drawings. This exemplary embodiment is the basis for the exemplary embodiments described later.

<Overview of Information Processing Device 10>
The information processing device 10 according to this exemplary embodiment functions as a dataset generation device that generates a pseudo-labeled dataset by assigning pseudo-labels to a target dataset.

More specifically, the information processing device 10 first learns the detection model using the first data set. Next, the information processing device 10 determines the first threshold by referring to a comparison result between one or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model and one or more correct labels attached to each of the images. The information processing device 10 then obtains one or more inference results for each of the one or more images included in the second data set by inputting each of those images into the detection model. Finally, the information processing device 10 sets, as a pseudo-label, each inference result having a reliability equal to or higher than the first threshold among the inference results for the images included in the second data set, and generates a pseudo-labeled data set by associating the pseudo-labels with the corresponding images.

<Configuration of information processing device 10>
The configuration of the information processing device 10 according to this exemplary embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the information processing device 10.

As shown in FIG. 1, the information processing device 10 includes a learning unit 101, a threshold determination unit 102, an inference unit 103, and a dataset generation unit 104. The learning unit 101 is a configuration that implements learning means in this exemplary embodiment. The threshold determination unit 102 is a configuration that implements threshold determination means in this exemplary embodiment. The inference unit 103 is a configuration that realizes inference means in this exemplary embodiment. The data set generation unit 104 is a configuration that implements data set generation means in this exemplary embodiment.

The learning unit 101 learns a detection model using the first data set. Specifically, the learning unit 101 uses a first data set including one or more images to learn a detection model for detecting objects included in the images. Here, detection means that, when an image is input into the detection model, the model outputs an inference result regarding at least one of the following:
- the presence or absence of an object included in the image;
- the position of the object included in the image;
- the size of the object included in the image; and
- the category of the object included in the image.
The learning unit 101 learns a detection model that receives an image as an input and outputs an inference result as described above.

The threshold determination unit 102 determines a first threshold by referring to a comparison result between one or more inference results obtained by inputting each of one or more images included in the evaluation data set into the detection model and one or more correct labels attached to each of the images. Here, a correct label is a label that contains, for one or more objects included in each of the one or more images in the evaluation data set, ground truth data regarding at least one of the position of the object, the size of the object, and the category of the object.
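The passage above does not state how the comparison result is turned into a threshold. The following is a minimal sketch of one possible reading, not the method of this publication. It assumes that an inference result counts as correct when its category matches a correct label and the two boxes overlap with an intersection-over-union of at least 0.5, and that the first threshold is the lowest reliability cut-off at which the precision on the evaluation data set reaches a target value. The names `Detection`, `iou`, `is_correct`, `determine_threshold`, and `target_precision` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Detection:
    """Hypothetical record used for both inference results and correct labels."""
    box: Box
    category: str
    reliability: float = 1.0  # correct labels are treated as fully reliable


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Detection, labels: List[Detection], min_iou: float = 0.5) -> bool:
    """Assumed comparison rule: same category and sufficient overlap.

    Simplification: several predictions may match the same correct label.
    """
    return any(pred.category == gt.category and iou(pred.box, gt.box) >= min_iou
               for gt in labels)


def determine_threshold(results: List[List[Detection]],
                        labels: List[List[Detection]],
                        target_precision: float = 0.9) -> Optional[float]:
    """Return the lowest reliability cut-off whose precision meets the target.

    results[i] and labels[i] hold the inference results and correct labels
    for the i-th evaluation image. Returns None if no cut-off qualifies.
    """
    scored = [(p.reliability, is_correct(p, gts))
              for preds, gts in zip(results, labels) for p in preds]
    scored.sort(key=lambda s: s[0], reverse=True)
    best = None
    hits = 0
    for kept, (reliability, correct) in enumerate(scored, start=1):
        hits += correct
        # Evaluate only once per group of equal reliabilities, because a
        # cut-off at this value keeps every result in the group.
        at_group_end = kept == len(scored) or scored[kept][0] < reliability
        if at_group_end and hits / kept >= target_precision:
            best = reliability
    return best
```

Other criteria, such as a target recall or a balance of precision and recall, would fit the same structure by changing the condition inside the loop.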

The inference unit 103 obtains one or more inference results for each of the one or more images included in the second data set by inputting each of the one or more images into the detection model described above. The second data set includes one or more images different from the first data set.

The data set generation unit 104 sets, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference unit 103, and generates a pseudo-labeled data set by associating the pseudo-label with the corresponding image. Here, a pseudo-label is a label that contains, for each of the one or more images included in the second data set, data regarding at least one of the position, the size, and the category of each of the one or more objects that the inference unit 103 has inferred to be objects.
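The filtering and association performed by the data set generation unit can be sketched as follows. This is an illustration under assumptions rather than the implementation of this publication: inference results are modeled as hypothetical `(category, bounding box, reliability)` tuples keyed by an image identifier, and images with no surviving result are kept with an empty label list.

```python
from typing import Dict, List, Tuple

# Hypothetical inference result: (category, bounding box, reliability).
Inference = Tuple[str, Tuple[int, int, int, int], float]


def generate_pseudo_labeled_dataset(
    inferences: Dict[str, List[Inference]], first_threshold: float
) -> Dict[str, List[Inference]]:
    """Associate each image with the inference results kept as pseudo-labels.

    A result is kept when its reliability is equal to or higher than the
    first threshold. Keeping images whose list becomes empty is an assumption.
    """
    return {
        image_id: [r for r in results if r[2] >= first_threshold]
        for image_id, results in inferences.items()
    }
```

For example, with a first threshold of 0.5, a result with reliability 0.5 is kept and a result with reliability 0.49 is dropped.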

It should be noted that, when a correct label exists for a certain object, the pseudo-label assigned to that object may or may not match the correct label. For example, one or more of the position, size, and category in the correct data for the object may match the corresponding items in the pseudo-label while the remaining items do not.
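A partial match of this kind can be illustrated with a small sketch. The `Label` record and the `matching_items` helper are hypothetical, and exact equality is used only for simplicity; a practical comparison would likely allow some tolerance for position and size.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Label:
    """Hypothetical label holding the three items named in the text."""
    position: Tuple[int, int]  # e.g. top-left corner (x, y)
    size: Tuple[int, int]      # (width, height)
    category: str


def matching_items(pseudo: Label, correct: Label) -> Dict[str, bool]:
    """Report, item by item, whether the pseudo-label agrees with the correct label."""
    return {
        "position": pseudo.position == correct.position,
        "size": pseudo.size == correct.size,
        "category": pseudo.category == correct.category,
    }
```

A pseudo-label that places and sizes an object correctly but assigns it the wrong category would yield a match for position and size and a mismatch for category.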

The accuracy of pseudo-labels can generally be adjusted by adjusting the first threshold described above, but adjusting the first threshold generally requires time and computational costs.

As described above, the information processing device 10 according to this exemplary embodiment adopts a configuration that automatically determines the first threshold used to decide whether or not the inference result for each of the images included in the second data set is to be used as a pseudo-label. Therefore, the information processing device 10 according to this exemplary embodiment can reduce the cost of adjusting the first threshold, and consequently makes it possible to generate a highly accurate detection model while suppressing the generation cost.

<Flow of information processing method>
The flow of the information processing method S10 executed by the information processing device 10 configured as described above will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the information processing method S10. The information processing device 10 executes the information processing method S10 in order to generate, from the second data set, a data set including images with associated pseudo-labels.

As shown in FIG. 2, the information processing method S10 includes steps S101 to S104.

(Step S101)
In step S101, the learning unit 101 learns a detection model. Specifically, the learning unit 101 learns the detection model using the first data set. Step S101 is a learning step in this exemplary embodiment.

(Step S102)
In step S102, the threshold determination unit 102 determines a first threshold. Specifically, the threshold determination unit 102 obtains one or more inference results by inputting one or more images included in the evaluation data set into the detection model, and determines a first threshold for deciding pseudo-labels by referring to the result of comparing those inference results with the one or more correct labels attached to the images. Step S102 is the threshold determination step in this exemplary embodiment.

(Step S103)
In step S103, the inference unit 103 makes an inference. Specifically, the inference unit 103 obtains one or more inference results for each of the one or more images included in the second data set by inputting each of the one or more images into the detection model. Step S103 is an inference step in this exemplary embodiment.

(Step S104)
In step S104, the data set generation unit 104 generates a pseudo-labeled data set. Specifically, the data set generation unit 104 sets, as a pseudo-label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in step S103, and generates the pseudo-labeled data set by associating the pseudo-label with the corresponding image in the second data set. Step S104 is the data set generation step in this exemplary embodiment.

Note that the execution timing of step S103 is not limited to after execution of step S102. The execution timing may be after execution of step S101 and before execution of step S104, for example, before execution of step S102.
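The ordering constraint on steps S101 to S104 can be sketched as follows. This is an illustrative outline only: `run_s10`, `train`, `determine_threshold`, and the `(category, reliability)` result pairs are hypothetical stand-ins supplied by the caller, and the `infer_first` flag exists solely to show that S103 may run before S102 without changing the outcome.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Result = Tuple[str, float]  # hypothetical (category, reliability) pair
Model = Callable[[str], List[Result]]


def run_s10(
    train: Callable[[], Model],
    determine_threshold: Callable[[Model], float],
    second_dataset: Sequence[str],
    infer_first: bool = False,
) -> Dict[str, List[Result]]:
    """Run S101 to S104; S103 may come before or after S102."""
    model = train()  # S101: learn the detection model

    def infer() -> Dict[str, List[Result]]:
        # S103: obtain inference results for each image in the second data set
        return {image: model(image) for image in second_dataset}

    if infer_first:
        results = infer()
        threshold = determine_threshold(model)  # S102
    else:
        threshold = determine_threshold(model)  # S102
        results = infer()

    # S104: keep results whose reliability is at or above the first threshold
    return {img: [r for r in rs if r[1] >= threshold] for img, rs in results.items()}
```

Because S102 and S103 both depend only on the model produced in S101, either order yields the same pseudo-labeled data set.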

As described above, the information processing method S10 according to this exemplary embodiment provides the same effects as the information processing device 10. That is, the information processing method S10 according to this exemplary embodiment adopts a configuration in which the first threshold, which is used to decide whether or not the inference result for each of the images included in the second data set is to be used as a pseudo-label, is determined automatically. Therefore, the information processing method S10 according to this exemplary embodiment can reduce the cost of adjusting the first threshold, and consequently makes it possible to generate a highly accurate detection model while suppressing the generation cost.

<Overview of Information Processing Device 20>
The information processing device 20 acquires a target image and uses the target image detection model to detect an object included in the image. Typically, the target image detection model is obtained by re-learning the detection model learned by the information processing device 10 described above (specifically, by the learning unit 101); that is, it is a detection model that has been re-learned with reference to the pseudo-labeled data set. Note that the target image detection model is not limited to this. The target image detection model may be any detection model trained using the pseudo-labeled data set; for example, it may be a new detection model trained using the pseudo-labeled data set. Here, the new detection model is a detection model different from the detection model learned by the learning unit 101.

<Configuration of information processing device 20>
The configuration of the information processing device 20 according to this exemplary embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the configuration of the information processing device 20.

As shown in FIG. 3, the information processing device 20 includes an acquisition unit 201 and a detection unit 202. The acquisition unit 201 is a configuration that implements acquisition means in this exemplary embodiment. The detection unit 202 is a configuration that implements detection means in this exemplary embodiment.

The acquisition unit 201 acquires the target image. Here, the target image is an image input to the detection model in order to detect an object included in the image. For example, the acquisition unit 201 may acquire the target image by reading a target image stored in the information processing device 20, or may acquire a target image supplied from an imaging device. Also, for example, the acquisition unit 201 may acquire the target image via an input device (not shown). Further, for example, the acquisition unit 201 may acquire the target image from another device (not shown) communicably connected to the information processing device 20.

The detection unit 202 detects an object included in the target image using the target image detection model. The target image detection model is a detection model used to detect an object included in the target image, and the target image detection model according to this exemplary embodiment is the re-learned detection model described above. The detection unit 202 acquires the inference result output from the target image detection model by inputting the target image into the target image detection model. For example, the detection unit 202 may hold the target image detection model and input the target image into it. Alternatively, for example, the detection unit 202 may access a target image detection model stored in a storage device (not shown) and input the target image into it.
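The division of roles between the acquisition unit and the detection unit can be sketched as two small classes. This is an illustration only: the class names, the `source` and `model` callables, and the `(category, reliability)` result pairs are hypothetical, and the image is modeled as raw bytes for simplicity.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, float]  # hypothetical (category, reliability) pair


class AcquisitionUnit:
    """Obtains a target image from any source: storage, camera, or another device."""

    def __init__(self, source: Callable[[], bytes]) -> None:
        self._source = source

    def acquire(self) -> bytes:
        return self._source()


class DetectionUnit:
    """Feeds the target image to the target image detection model."""

    def __init__(self, model: Callable[[bytes], List[Result]]) -> None:
        # The model is held by the unit here; it could equally be loaded
        # from an external storage device on each call.
        self._model = model

    def detect(self, image: bytes) -> List[Result]:
        return self._model(image)
```

Passing the source and the model in as callables keeps both units independent of where the image comes from and where the model is stored.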

As described above, the information processing device 20 according to this exemplary embodiment adopts a configuration in which an object is detected using a target image detection model that has been trained using a data set including images associated with pseudo-labels, the pseudo-labels having been determined using the automatically determined first threshold. Therefore, the information processing device 20 according to this exemplary embodiment can detect an object included in an image using a target image detection model for which the cost of adjusting the first threshold has been reduced.

<Flow of information processing method>
The flow of the information processing method S20 executed by the information processing device 20 configured as described above will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the flow of the information processing method S20. The information processing device 20 executes the information processing method S20 in order to detect an object included in the target image.

As shown in FIG. 4, the information processing method S20 includes steps S201 and S202.

(Step S201)
In step S201, the acquisition unit 201 acquires a target image.

(Step S202)
In step S202, the detection unit 202 detects an object. Specifically, the detection unit 202 detects an object included in the target image using the target image detection model. More specifically, the detection unit 202 inputs the target image acquired by the acquisition unit 201 into the target image detection model, and acquires the inference result output by the detection model.

As described above, the information processing method S20 according to this exemplary embodiment provides the same effects as the information processing device 20. That is, the information processing method S20 according to this exemplary embodiment adopts a configuration in which an object is detected using a target image detection model that has been trained using a data set including images associated with pseudo-labels, the pseudo-labels having been determined using the automatically determined first threshold. Therefore, the information processing method S20 according to this exemplary embodiment can detect an object included in an image using a target image detection model for which the cost of adjusting the first threshold has been reduced.

[Exemplary embodiment 2]
A second exemplary embodiment of the invention will now be described in detail with reference to the drawings. Components having the same functions as the components described in the exemplary embodiment 1 are denoted by the same reference numerals, and descriptions thereof are omitted as appropriate.

<Overview of Information Processing Device 10a>
The information processing device 10a according to this exemplary embodiment is a modification of the first exemplary embodiment. Specifically, the information processing device 10a acquires the first data set and, as described in the first exemplary embodiment, performs the learning of the detection model, the determination of the threshold, the inference, and the generation of the pseudo-labeled data set. Further, the information processing device 10a learns the target image detection model using the generated pseudo-labeled data set. Typically, the target image detection model is obtained by re-learning the above-mentioned detection model with reference to the pseudo-labeled data set. As described above, the target image detection model is not limited to a re-learned detection model, and may be any detection model learned using the pseudo-labeled data set.

<Configuration of Information Processing Device 10a>
The configuration of the information processing device 10a will be described with reference to FIG. 5. FIG. 5 is a block diagram showing the configuration of the information processing device 10a. As shown in FIG. 5, the information processing device 10a includes a control unit 100a and a storage unit 150a. The control unit 100a centrally controls each unit of the information processing device 10a. The storage unit 150a stores various programs and data used by the information processing apparatus 10a.

The storage unit 150a stores an evaluation data set DSE, a data set 1 (DS1), a data set 2 (DS2), a data set 2' (DS2'), and an object detection model DM. The evaluation data set DSE is the evaluation data set in this exemplary embodiment. Data set 1 (DS1) is the first data set in this exemplary embodiment. Data set 2 (DS2) is the second data set in this exemplary embodiment. Data set 2' (DS2') is the pseudo-labeled data set in this exemplary embodiment.

Here, the details of dataset 1 (DS1) and dataset 2 (DS2) will be described. FIG. 6 is a diagram showing a specific example of data included in data set 1 (DS1) and data set 2 (DS2). Specifically, FIG. 6 shows one of the images contained in dataset 1 (DS1) and one of the images contained in dataset 2 (DS2).

Each of these images contains five objects, specifically three people and two bags. In the images contained in dataset 1 (DS1), each of the five objects is associated with a correct label. Typically, the correct label is a label containing a category and a bounding box, as shown in FIG. 6. The category is category information indicating the category of the object included in the image associated with the correct label; specifically, it is correct data regarding the category of the object. In the example of FIG. 6, each of the three persons is associated with a "person" category, and each of the two bags is associated with a "bag" category. The bounding box is area information indicating the area of the object included in the image associated with the correct label; specifically, it is correct data regarding the position and size of the object in the image. One bounding box is associated with one object, and a typical example of the bounding box is data indicating the minimum rectangle enclosing the object, as shown in FIG. 6.

On the other hand, in the images included in dataset 2 (DS2), no correct label is associated with the object.

Based on the above, the “image”, “dataset” and “correct label” described in each exemplary embodiment can be expressed as follows.
• The image x input to the detection model is an element of the data space X. Here, the data space X corresponds to the data set containing the image x. Note that the number of objects included in one image x is arbitrary.
• A correct label can be represented by a pair (y, b) of a category y and a bounding box b. Note that the category y is an element of the category set Y, and in the example of FIG. 6, the set Y is {person, bag}.
• From the above, the data set D associated with correct labels can be expressed as a set of pairs, each consisting of an image x and the set of correct labels (y, b) of all objects included in that image:

D = \{ (x, \{ (y_k, b_k) \}_{k=1}^{K_x}) \mid x \in X \}

where K_x denotes the number of objects included in the image x.
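The representation above can be sketched as plain data structures. This is only an illustration: the class names `Label` and `Sample`, the `(x_min, y_min, x_max, y_max)` box layout, the image identifiers, and all coordinates are assumptions that do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Label:
    category: str  # y, an element of the category set Y
    box: Box       # b, the bounding box of one object


@dataclass
class Sample:
    image_id: str  # stands in for the image x
    labels: List[Label] = field(default_factory=list)


# A DS1-style sample: three people and two bags, as in FIG. 6.
# The coordinates are invented for illustration.
ds1 = [
    Sample("img_0001", [
        Label("person", (10, 20, 60, 180)),
        Label("person", (80, 25, 130, 185)),
        Label("person", (150, 30, 200, 190)),
        Label("bag", (55, 120, 85, 160)),
        Label("bag", (125, 125, 155, 165)),
    ])
]

# A DS2-style sample: the same kind of image, but with no correct labels.
ds2 = [Sample("img_1001")]
```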

(Configuration of control unit 100a)
As shown in FIG. 5, the control unit 100a includes a learning unit 101, a threshold value determination unit 102, an inference unit 103, a data set generation unit 104, and a relearning unit 105. As shown in FIG. 5, the threshold determination unit 102 includes an evaluation data set inference unit 1021, an evaluation value calculation unit 1022, and a threshold determination unit 1023. In addition, the data set generation unit 104 includes a pseudo-label generation unit 1041 and an association unit 1042, as shown in FIG. 5.

The evaluation data set inference unit 1021, the evaluation value calculation unit 1022, and the threshold determination unit 1023 correspond to the threshold determination unit 102 in exemplary embodiment 1, and are configured to implement the threshold determination means in this exemplary embodiment. The pseudo-label generating unit 1041 and the associating unit 1042 correspond to the dataset generating unit 104 in exemplary embodiment 1, and are configured to implement the dataset generating means in this exemplary embodiment. The re-learning unit 105 is a configuration that implements pseudo-label reference learning means in this exemplary embodiment.

The learning unit 101 acquires the data set 1 (DS1) and uses it to train an object detection model for pseudo-label generation. That is, the learning unit 101 also functions as an acquisition unit that acquires the first data set. Specifically, the learning unit 101 reads the data set 1 (DS1) stored in the storage unit 150a, and trains the object detection model for pseudo-label generation using the data set 1 (DS1), that is, a data set in which a correct label is associated with each of one or more images. Then, the learning unit 101 outputs the trained pseudo-label generation object detection model to the evaluation data set inference unit 1021 and the inference unit 103.

The evaluation data set inference unit 1021 generates an inference result based on the evaluation data set. Specifically, the evaluation data set inference unit 1021 acquires the evaluation data set DSE and the pseudo-label generation object detection model, inputs each of the one or more images included in the evaluation data set DSE to the pseudo-label generation object detection model, and obtains inference results. More specifically, the evaluation data set inference unit 1021 reads the evaluation data set DSE stored in the storage unit 150a and inputs it to the pseudo-label generation object detection model acquired from the learning unit 101. Then, the evaluation data set inference unit 1021 acquires the inference result output by the pseudo-label generation object detection model, and outputs the inference result to the evaluation value calculation unit 1022.

The evaluation data set DSE is a data set in which a correct label is associated with each object included in each image, similar to the data set 1 (DS1). For example, the images contained in the evaluation data set DSE may be part of the images contained in the data set 1 (DS1). Further, for example, the images included in the evaluation data set DSE may be generated by giving correct labels to part of the images included in the data set 2 (DS2). Also, for example, the images included in the evaluation data set DSE may be generated by assigning correct labels to images included in neither the data set 1 (DS1) nor the data set 2 (DS2).

The inference result by the evaluation data set inference unit 1021 includes, for each of the one or more images included in the evaluation data set DSE, data on at least one of the position, the size, and the category of each of the one or more objects inferred to be objects. Typically, the inference result includes a category, a bounding box, and a reliability. The reliability is an example of data related to the certainty of inference, and is, for example, a numerical value with 0 as the minimum value and 1 as the maximum value.

The evaluation value calculation unit 1022 calculates an evaluation value based on the inference result. Specifically, the evaluation value calculation unit 1022 compares the inference results for each of the one or more images included in the evaluation data set DSE with the correct labels of that image, and calculates the evaluation value of the inference results based on the comparison.

For example, the evaluation value is the harmonic mean of the precision and the recall, that is, the F value. Here, the F value calculation processing executed by the evaluation value calculation unit 1022 will be described.

Specifically, the evaluation value calculation unit 1022 executes the following processes (1) to (6).

(1) Sort all inference results in descending order of reliability.

(2) Identify inference results whose reliability is greater than or equal to the reference value. The reference value is, for example, 0.9. As will be described later, a plurality of F values are calculated in the F value calculation process. Then, the reference value becomes a different value for each F value. That is, the value 0.9 described above can be expressed as the initial value of the reference value.

(3) Specify whether each specified inference result is a TP (True Positive) or an FP (False Positive). Here, a TP is an inference result whose bounding box overlaps the bounding box of a correct label to a degree equal to or greater than a predetermined value and whose category matches that correct label. An FP is any of the following:
(A) An inference result whose category matches the correct label, but whose bounding box overlaps the bounding box of the correct label to a degree less than the predetermined value.
(B) An inference result whose category differs from that of the correct label that its bounding box overlaps.
(C) An inference result for which there is no correct label that its bounding box overlaps.
As a value indicating the degree of overlap of bounding boxes, for example, IOU (Intersection Over Union) is used.
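The degree of overlap mentioned above can be computed as follows. This is a minimal sketch that assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`; the function name `iou` and the box layout are assumptions, not taken from the source.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Degenerate boxes with zero area yield an overlap of 0.
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1, disjoint boxes give 0, and a predetermined value such as 0.5 is a common (but here assumed) choice for deciding whether two boxes overlap sufficiently.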

(4) Among the correct labels, identify those that are FN (False Negative). An FN is either of the following:
(D) A correct label for which there is no inference result with an overlapping bounding box.
(E) A correct label whose category differs from that of the inference result with an overlapping bounding box.
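Processes (3) and (4) can be sketched for a single image as below. The source does not state how an inference result is paired with a correct label when several candidates overlap, so this sketch assumes greedy one-to-one matching in descending order of reliability; the function names, the tuple layouts, and the default overlap value of 0.5 are likewise assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(preds, truths, iou_min=0.5):
    """Count TP, FP and FN for one image.

    preds:  list of (category, box, reliability) inference results
    truths: list of (category, box) correct labels
    Assumption: each correct label can be matched by at most one inference result.
    """
    matched = set()  # indices of correct labels already matched
    tp = fp = 0
    for cat, box, _ in sorted(preds, key=lambda p: p[2], reverse=True):
        best_idx, best_iou = None, 0.0
        for i, (t_cat, t_box) in enumerate(truths):
            if i in matched or t_cat != cat:
                continue  # a category mismatch can never be a TP, cases (B)/(E)
            overlap = iou(box, t_box)
            if overlap > best_iou:
                best_idx, best_iou = i, overlap
        if best_idx is not None and best_iou >= iou_min:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1  # cases (A), (B) or (C)
    fn = len(truths) - len(matched)  # cases (D) and (E)
    return tp, fp, fn
```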

(5) Calculate precision and recall. Precision is the proportion of inference results that are correct, and is calculated by, for example, precision = number of TPs / (number of TPs + number of FPs). Recall is the proportion of correct labels that were correctly inferred, and is calculated by, for example, recall = number of TPs / (number of TPs + number of FNs). FIG. 7 is a graph showing the relationship between precision and recall. As shown in FIG. 7, the higher the reliability, the higher the precision but the lower the recall. Conversely, the lower the reliability, the lower the precision but the higher the recall. In this way, there is a trade-off between precision and recall. The reliability here means the reference value set in the process (2).

(6) Calculate the F value. The F value is calculated by (2×precision×recall)/(precision+recall).
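The formulas in processes (5) and (6) translate directly into code. The sketch below is illustrative; the function name and the convention of returning 0 when a denominator is zero are assumptions, since the source does not address that case.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F value) from TP, FP and FN counts.

    A ratio whose denominator is zero is reported as 0.0 (an assumption).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value
```

For instance, 8 TPs, 2 FPs and 8 FNs give a precision of 0.8 and a recall of 0.5, and the F value lies between the two, closer to the smaller one, as expected of a harmonic mean.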

When the above processing is completed, the evaluation value calculation unit 1022 decreases the reference value and executes the processes (2) to (6) again. For example, the evaluation value calculation unit 1022 sets the next reference value to 0.8. In other words, the evaluation value calculation unit 1022 calculates the F value based on the next reference value. By repeating the processes (2) to (6) in this way, the evaluation value calculation unit 1022 calculates a plurality of F values, one for each of the different reference values.

As an example, the evaluation value calculation unit 1022 repeats the processes (2) to (6) until the F value is calculated with a reference value equal to or lower than the minimum reliability. In this example, in the last round of the processes (2) to (6), the F value is calculated for all inference results. For inference results specified in the process (2) in the second and subsequent rounds that were already specified in the process (2) of an earlier round, the processes (3) and (4) may be omitted and the earlier results of the processes (3) and (4) may be reused.

The evaluation value calculation unit 1022 associates each calculated evaluation value, that is, each F value, with the reference value used in calculating it, and outputs them to the threshold determination unit 1023.
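The repetition of processes (2) to (6) can be sketched as a loop over decreasing reference values. This sketch assumes that every inference result has already been judged TP or FP once (the reuse of earlier results that the text permits), that FN equals the number of correct labels minus the number of TPs, and that the reference value falls in fixed steps of 0.1 from 0.9. The step size, function names, and data layout are assumptions.

```python
def f_value(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def sweep(results, num_correct, start=0.9, step=0.1):
    """Compute one F value per reference value.

    results:     list of (reliability, is_tp) pairs, judged beforehand
    num_correct: total number of correct labels in the evaluation data set
    Returns [(reference_value, f_value), ...] in order of decreasing reference value.
    """
    if step <= 0 or not results:
        raise ValueError("need a positive step and at least one inference result")
    lowest = min(reliability for reliability, _ in results)
    table, k = [], 0
    while True:
        ref = round(start - k * step, 10)  # rounding avoids floating-point drift
        kept = [is_tp for reliability, is_tp in results if reliability >= ref]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = num_correct - tp  # correct labels not recovered at this reference value
        table.append((ref, f_value(tp, fp, fn)))
        if ref <= lowest:  # all inference results have now been included
            break
        k += 1
    return table
```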

In addition, in the process of (5), the evaluation value calculation unit 1022 may calculate the precision and the recall for each category when there are multiple categories in the inference result and the correct label. In this example, in the process (6), the evaluation value calculation unit 1022 calculates the F value for each category. As a result, each reference value is associated with a plurality of F values calculated for each category.

Also, the evaluation value calculated by the evaluation value calculation unit 1022 is not limited to the F value. For example, the evaluation value may be a value that emphasizes either precision or recall. In this example, in the process (6), the evaluation value calculation unit 1022 can calculate the evaluation value by, for example, {(1+β²)×precision×recall}/{(β²×precision)+recall}. Here, β is a value for adjusting the relative importance of precision and recall. For example, when β is smaller than 1, the evaluation value emphasizes precision.
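The weighted evaluation value can be sketched as follows. The function name `f_beta` and the zero-denominator convention are assumptions; with β = 1 the expression reduces to the ordinary F value.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted evaluation value {(1+b^2)*P*R} / {(b^2*P) + R}.

    beta < 1 weights precision more heavily; beta > 1 weights recall more heavily.
    Returns 0.0 when the denominator is zero (an assumption).
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

With β = 0.5, a detector with precision 0.9 and recall 0.5 scores higher than one with precision 0.5 and recall 0.9, which illustrates the emphasis on precision.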

It should be noted that the method of specifying at least part of the inference result in the process (2) when calculating a plurality of evaluation values is not limited to the above example. For example, the evaluation value calculation unit 1022 may identify a predetermined number of inference results in descending order of reliability. In this example, the evaluation value calculation unit 1022 increases the predetermined number by a predetermined number each time the process (6) is completed and the next processes (2) to (6) are performed. Then, the evaluation value calculation unit 1022 repeats the processes (2) to (6) until all the inference results are specified by the process (2) and the evaluation values are calculated. It should be noted that the amount of increase in the predetermined number in the last process (2) should be 1 or more and the predetermined number or less. In this example, each calculated evaluation value is associated with the minimum reliability among the reliability of the specified inference result and output to the threshold determination unit 1023 .

Further, for example, instead of the processes (2) to (6), the evaluation value calculation unit 1022 may perform the following processing:
- Specify TP, FP, and FN for every inference result.
- Set a plurality of thresholds for the reliability, and specify the number of TPs with reliability equal to or higher than each threshold.
- Calculate precision and recall for each of the specified numbers of TPs.
- Calculate an evaluation value (typical example: F value) for each of the calculated combinations of precision and recall.
Note that the number of specified TPs is proportional to the recall. In this example, each calculated evaluation value is associated with the threshold used to specify the number of TPs, and is output to the threshold determination unit 1023.

The threshold determination unit 1023 determines the threshold based on the evaluation value. Specifically, the threshold determination unit 1023 identifies the maximum value among the acquired F values, and sets the reference value associated with the identified F value as the threshold. Here, the maximum F value can be regarded as the value at which precision and recall are balanced. As described above, the F value is calculated by a formula including precision and recall. Therefore, the threshold determination unit 1023 can be said to determine the threshold by referring to the precision and the recall indicated by the comparison result of the evaluation value calculation unit 1022. In addition, as described above, there is a trade-off between precision and recall. Therefore, the point at which the F value is maximized is not the point at which precision or recall is maximized in the graph of FIG. 7, but, for example, the point indicated by a star in the graph of FIG. 7. The threshold determination unit 1023 outputs the determined threshold to the pseudo-label generation unit 1041.
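Selecting the threshold then amounts to taking the reference value paired with the largest F value. In this sketch the function name and the tie-breaking rule (the first entry wins) are assumptions; the source does not say what happens when several reference values share the maximum F value.

```python
def choose_threshold(table):
    """Return the reference value associated with the largest F value.

    table: iterable of (reference_value, f_value) pairs.
    Ties are resolved in favour of the earliest entry (an assumption).
    """
    rows = list(table)
    if not rows:
        raise ValueError("no evaluation values were supplied")
    best_reference, _ = max(rows, key=lambda row: row[1])
    return best_reference
```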

In addition, in the example in which the F value is calculated for each category, the threshold determination unit 1023 sets the threshold for each category. That is, the threshold determination unit 1023 determines a plurality of thresholds, one for each category, associates each threshold with information indicating the corresponding category, and outputs them to the pseudo-label generation unit 1041.

The inference unit 103 reads the data set 2 (DS2) stored in the storage unit 150a, inputs each of the one or more images included in the data set 2 (DS2) to the pseudo-label generation object detection model acquired from the learning unit 101, and obtains one or more inference results for each of the images. The inference unit 103 outputs the acquired inference results to the pseudo-label generation unit 1041.

The pseudo-label generation unit 1041 generates pseudo-labels. Specifically, the pseudo-label generation unit 1041 sets, as a pseudo-label, each inference result that has a reliability equal to or higher than the threshold determined by the threshold determination unit 1023, among the one or more inference results by the inference unit 103. The pseudo-label generation unit 1041 outputs the inference results set as pseudo-labels to the association unit 1042.
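The selection performed by the pseudo-label generation unit 1041 is a simple filter. The sketch below assumes each inference result is a dictionary with a `"reliability"` key; that layout and the function name are assumptions.

```python
def select_pseudo_labels(inference_results, threshold):
    """Keep the inference results whose reliability is at least the threshold.

    inference_results: list of dicts with a "reliability" entry (assumed layout).
    """
    return [r for r in inference_results if r["reliability"] >= threshold]
```

Note that the comparison is inclusive, matching "equal to or higher than the threshold" in the text.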

Note that when obtaining a plurality of thresholds set for each category, the pseudo-label generation unit 1041 selects one or more inference results from the inference unit 103 that have a reliability equal to or higher than the threshold set for each category. Set the result to a pseudo-label. Specifically, the pseudo-label generation unit 1041 classifies the inference result by the inference unit 103 for each category, and specifies a corresponding threshold for each classification, in other words, a threshold that matches the category. Then, the pseudo-label generation unit 1041 compares the reliability of each inference result with the specified threshold for each classification, and sets an inference result having a reliability equal to or higher than the threshold as a pseudo-label.
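With per-category thresholds, each inference result is compared with the threshold that matches its own category. In this sketch the thresholds are held in a dictionary keyed by category name, and results whose category has no threshold are discarded; both choices, along with the function name, are assumptions.

```python
def select_pseudo_labels_by_category(inference_results, thresholds):
    """Filter inference results using one threshold per category.

    inference_results: list of dicts with "category" and "reliability" entries
    thresholds:        dict mapping category name to its threshold
    Results of a category without a threshold are dropped (an assumption).
    """
    selected = []
    for result in inference_results:
        threshold = thresholds.get(result["category"])
        if threshold is not None and result["reliability"] >= threshold:
            selected.append(result)
    return selected
```

This lets a category that the model detects reliably use a lower threshold than a category for which high-reliability errors are common.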

The association unit 1042 associates the pseudo label set by the pseudo label generation unit 1041 with the corresponding image. This produces a dataset 2' (DS2') in which each of the one or more images contained in the dataset 2 (DS2) is associated with a pseudo-label. The association unit 1042 stores the generated data set 2' (DS2') in the storage unit 150a and notifies the relearning unit 105 of it.
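The association step can be sketched as pairing every image of DS2 with its selected pseudo-labels. The source does not say whether an image with no pseudo-label is kept; this sketch keeps it with an empty list, which is an assumption, as are the function name and the dictionary layout.

```python
def build_pseudo_labeled_dataset(image_ids, labels_by_image):
    """Build a DS2'-style data set.

    image_ids:       list of identifiers of the images in DS2
    labels_by_image: dict mapping an image identifier to its pseudo-labels
    Images without any pseudo-label are kept with an empty list (an assumption).
    """
    return [
        {"image": image_id, "pseudo_labels": list(labels_by_image.get(image_id, []))}
        for image_id in image_ids
    ]
```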

FIG. 8 is a diagram showing a specific example of data included in data set 2' (DS2'). Specifically, FIG. 8 shows one of the images contained in dataset 2' (DS2'). The image is the image included in data set 2 (DS2) shown in FIG. 6, and a pseudo label is associated with each of the five objects included in the image. Typically, a pseudo-label is a label that includes a category and a bounding box, as shown in FIG. 8. The category is category information indicating the category of the object included in the image associated with the pseudo label. In the example of FIG. 8, each of the three persons is associated with a "person" category, and each of the two bags is associated with a "bag" category. A bounding box is area information indicating the area of an object contained in the image associated with the pseudo label. One bounding box is associated with one object, and a typical example of the bounding box is data indicating the minimum rectangle enclosing the object, as shown in FIG. 8.

The re-learning unit 105 learns the detection model for the target image using the data set after the pseudo-labeling. As an example, the re-learning unit 105 re-learns the detection model learned by the learning unit 101 as learning of the target image detection model. Specifically, the relearning unit 105 reads the data set 2' (DS2') from the storage unit 150a, and uses the data set 2' (DS2') to learn the object detection model DM. Then, the relearning unit 105 stores the learned object detection model DM in the storage unit 150a. As another example, the relearning unit 105 may learn a new detection model as learning of the target image detection model, and store the new detection model in the storage unit 150a.

As described above, the information processing apparatus 10a according to the present exemplary embodiment adopts a configuration in which the target image detection model is learned using the data set to which pseudo labels have been assigned. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to reduce the cost of adjusting the threshold value and generate the target image detection model. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to generate a highly accurate target image detection model while suppressing the generation cost.

Further, in the information processing apparatus 10a according to this exemplary embodiment, a configuration is adopted in which the detection model learned by the learning unit 101 is re-learned as the learning of the target image detection model. Therefore, according to the information processing apparatus 10a according to the present exemplary embodiment, it is possible to reduce the cost of re-learning and improve the accuracy of the detection model.

Further, the information processing apparatus 10a according to the present exemplary embodiment adopts a configuration in which the threshold for determining whether an inference result for each of the images included in the second data set is to be used as a pseudo label is determined automatically. Therefore, according to the information processing apparatus 10a according to the present exemplary embodiment, the re-learning that would otherwise be required each time the threshold is adjusted needs to be performed only once. As a result, the time required for re-learning can be reduced, and the time required for generating the detection model can be reduced.

Further, as described above, in the information processing apparatus 10a according to this exemplary embodiment, a configuration is adopted in which the correct label and the pseudo label include area information and category information. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to improve the accuracy of detecting an object included in an image using a re-learned detection model.

Further, as described above, the information processing apparatus 10a according to the exemplary embodiment adopts a configuration in which the threshold is determined by referring to the calculated precision and recall. Therefore, according to the information processing apparatus 10a according to the exemplary embodiment, it is possible to improve the accuracy of pseudo label setting. Further, according to the information processing apparatus 10a according to the present exemplary embodiment, pseudo labels can be set in consideration of both the quality of the learning data (precision) and the amount of learning data (recall), so that a highly accurate target image detection model can be generated.

Further, as described above, the information processing apparatus 10a according to the present exemplary embodiment may adopt a configuration in which the threshold is set for each category, based on the inference results that the pseudo-label generation object detection model produces for the images included in the evaluation data set and the correct labels associated with those images. Therefore, according to the information processing apparatus 10a according to the present exemplary embodiment, which employs this configuration, it is possible to improve the accuracy of setting pseudo labels.

Further, as described above, the information processing device 10a according to the present exemplary embodiment may adopt a configuration in which the images included in the evaluation data set DSE are included in the first data set. According to the information processing apparatus 10a that employs this configuration, there is no need to newly perform the costly task of assigning correct labels in order to generate the evaluation data set DSE. Further, according to the information processing apparatus 10a that employs this configuration, it is possible to reduce the number of images to be prepared in advance.

Further, as described above, the information processing apparatus 10a according to the present exemplary embodiment may adopt a configuration in which the images included in the evaluation data set DSE are generated by giving correct labels to part of the second data set. According to the information processing apparatus 10a that employs this configuration, part of the data set to which the pseudo labels are assigned is used as the evaluation data set DSE to determine the threshold, so that the accuracy of the assigned pseudo-labels can be improved. Further, according to the information processing apparatus 10a that employs this configuration, it is possible to reduce the number of images to be prepared in advance.

<Flow of information processing method>
The flow of the information processing method S10a executed by the information processing apparatus 10a configured as described above will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the flow of the information processing method S10a. The information processing device 10a performs the information processing method S10a to generate a second data set including images with associated pseudo-labels.

(Step S101)
In step S101, the learning unit 101 learns a detection model. Specifically, the learning unit 101 reads the data set 1 (DS1) stored in the storage unit 150a, and trains the object detection model for pseudo-label generation using the data set 1 (DS1), that is, a data set in which a correct label is associated with each of one or more images. Then, the learning unit 101 outputs the trained pseudo-label generation object detection model to the evaluation data set inference unit 1021 and the inference unit 103.

(Step S1021)
In step S1021, the evaluation data set inference unit 1021 generates an inference result based on the evaluation data set. Specifically, the evaluation data set inference unit 1021 reads the evaluation data set DSE stored in the storage unit 150a and inputs it to the pseudo-label generation object detection model acquired from the learning unit 101. Then, the evaluation data set inference unit 1021 acquires the inference result output by the pseudo-label generation object detection model, and outputs the inference result to the evaluation value calculation unit 1022.

(Step S1022)
In step S1022, the evaluation value calculation unit 1022 calculates an evaluation value based on the inference result. Specifically, the evaluation value calculation unit 1022 compares the inference results specified based on the reference value, among the inference results by the evaluation data set inference unit 1021, with the correct labels of each of the one or more images included in the evaluation data set DSE, calculates the precision and the recall based on the comparison result, and calculates an F value as the evaluation value from the precision and the recall. The evaluation value calculation unit 1022 repeats the calculation of the F value while changing the reference value, and thereby calculates a plurality of F values, one for each reference value. The evaluation value calculation unit 1022 associates each of the calculated F values with the corresponding reference value and outputs them to the threshold determination unit 1023.

(Step S1023)
In step S1023, the threshold determination unit 1023 determines a threshold based on the evaluation value. Specifically, the threshold determination unit 1023 identifies the maximum value among the acquired F values, and sets the reference value associated with the identified F value as the threshold. The threshold determination unit 1023 outputs the determined threshold to the pseudo-label generation unit 1041.

Note that steps S1021 to S1023 correspond to step S102 described in the first exemplary embodiment.

(Step S103)
In step S103, the inference unit 103 makes an inference. Specifically, the inference unit 103 reads the data set 2 (DS2) stored in the storage unit 150a, inputs each of the one or more images included in the data set 2 (DS2) to the pseudo-label generation object detection model acquired from the learning unit 101, and obtains one or more inference results for each of the images. The inference unit 103 outputs the acquired inference results to the pseudo-label generation unit 1041.

(Step S1041)
In step S1041, the pseudo-label generation unit 1041 generates pseudo-labels. Specifically, the pseudo-label generation unit 1041 sets, as a pseudo-label, each inference result that has a reliability equal to or higher than the threshold determined by the threshold determination unit 1023, among the one or more inference results by the inference unit 103. The pseudo-label generation unit 1041 outputs the inference results set as pseudo-labels to the association unit 1042.

(Step S1042)
In step S1042, the association unit 1042 associates the image with the pseudo label. Specifically, the association unit 1042 associates each of the one or more images included in dataset 2 (DS2) with a corresponding pseudo label to generate dataset 2' (DS2'). The association unit 1042 stores the generated data set 2' (DS2') in the storage unit 150a and notifies the relearning unit 105 of it.

Note that steps S1041 and S1042 correspond to step S104 described in the first exemplary embodiment.

Although not shown in FIG. 9, the re-learning unit 105 learns the target image detection model using the data set after the pseudo-labeling. As an example, the relearning unit 105 performs relearning of the detection model learned by the learning unit 101 as the learning. Specifically, the relearning unit 105 reads the data set 2' (DS2') from the storage unit 150a, and uses the data set 2' (DS2') to learn the object detection model DM. Then, the relearning unit 105 stores the learned object detection model DM in the storage unit 150a. As another example, the relearning unit 105 may learn a new detection model as learning of the target image detection model, and store the new detection model in the storage unit 150a.
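The overall flow, from step S101 through re-learning, can be summarized in a short orchestration sketch. The training, threshold determination, inference, and re-learning routines are passed in as callables because the source does not specify a concrete detection model; all function and parameter names here are hypothetical.

```python
def run_pipeline(ds1, dse, ds2, train, determine_threshold, infer, retrain):
    """Sketch of steps S101 to S1042 followed by re-learning.

    train(ds1)                      -> model           (step S101)
    determine_threshold(model, dse) -> threshold       (steps S1021 to S1023)
    infer(model, image)             -> list of dicts with a "reliability" entry (step S103)
    retrain(model, ds2_prime)       -> target image detection model
    All four callables are hypothetical placeholders supplied by the caller.
    """
    model = train(ds1)
    threshold = determine_threshold(model, dse)

    ds2_prime = []
    for image in ds2:
        results = infer(model, image)
        # Step S1041: keep results whose reliability reaches the threshold.
        pseudo_labels = [r for r in results if r["reliability"] >= threshold]
        # Step S1042: associate the pseudo-labels with the image.
        ds2_prime.append({"image": image, "pseudo_labels": pseudo_labels})

    # Re-learning with reference to the pseudo-labeled data set DS2'.
    return retrain(model, ds2_prime), ds2_prime
```

Because the threshold is determined once from the evaluation data set, the re-learning step in this flow runs a single time rather than once per candidate threshold.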

As described above, the information processing method S10a according to this exemplary embodiment provides the same effects as the information processing apparatus 10a. That is, the information processing method S10a according to the present exemplary embodiment adopts a configuration in which the target image detection model is learned using the data set to which pseudo labels have been added. Therefore, according to the information processing method S10a according to the present exemplary embodiment, it is possible to generate the target image detection model used by the information processing apparatus while reducing the cost of adjusting the threshold. Consequently, according to the information processing method S10a according to this exemplary embodiment, it is possible to generate a highly accurate detection model while suppressing the generation cost.

<Configuration of information processing device 20a>
The configuration of the information processing device 20a according to this exemplary embodiment will be described with reference to FIG. 10. FIG. 10 is a block diagram showing the configuration of the information processing device 20a.

As shown in FIG. 10, the information processing device 20a includes a control section 200a, a storage section 250a and an output section 260a. The control unit 200a centrally controls each unit of the information processing device 20a. The storage unit 250a stores various programs and data used by the information processing device 20a. The output unit 260a outputs information processing results by the information processing device 20a.

The storage unit 250a stores the target data set TDS and the object detection model DM. The target data set TDS is a data set containing one or more target images that are object detection targets. The object detection model DM is a target image detection model, specifically, an object detection model DM generated by the re-learning unit 105 of the information processing apparatus 10a.

That is, the object detection model DM is generated by the following processes:
- a learning process for learning a detection model using the first data set;
- a threshold determination process for determining a threshold with reference to the result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with one or more correct labels given to each of the one or more images;
- an inference process for obtaining one or more inference results for each of the one or more images included in the second data set by inputting each of the one or more images into the detection model;
- a data set generation process for setting, as a pseudo-label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results from the inference process, and associating the pseudo-label with the corresponding image to generate a data set after pseudo-labeling; and
- a pseudo-label reference learning process for re-learning the target image detection model by referring to the data set after pseudo-labeling.
In other words, the object detection model DM is manufactured by a method including the steps of performing each of the above processes.
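The processes listed above can be read as one pipeline. The following is a minimal sketch under stated assumptions, not the implementation of this disclosure: the function name `manufacture_detection_model`, the callables `train`, `determine_threshold`, and `retrain`, and the dictionary key `"score"` (standing in for the reliability) are all hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types, introduced only for this illustration.
Label = Dict[str, Any]                      # e.g. {"category": "dog", "score": 0.9}
Image = str                                 # stand-in for image data (an image id)
Dataset = List[Tuple[Image, List[Label]]]   # (image, labels) pairs
Model = Callable[[Image], List[Label]]      # image -> inference results


def manufacture_detection_model(
    first_ds: Dataset,
    eval_ds: Dataset,
    second_ds: Dataset,
    train: Callable[[Dataset], Model],
    determine_threshold: Callable[[Model, Dataset], float],
    retrain: Callable[[Model, Dataset], Any],
) -> Tuple[Any, Dataset]:
    # Learning process: learn a detection model from the first data set.
    model = train(first_ds)
    # Threshold determination process: compare inference results on the
    # evaluation data set with its correct labels (delegated to the callable).
    threshold = determine_threshold(model, eval_ds)
    # Inference process and data set generation process: keep only inference
    # results whose reliability is at or above the threshold as pseudo-labels.
    pseudo_labeled: Dataset = []
    for image, labels in second_ds:
        results = model(image)
        pseudo = [dict(r, pseudo=True) for r in results if r["score"] >= threshold]
        pseudo_labeled.append((image, list(labels) + pseudo))
    # Pseudo-label reference learning process: re-learn with the new data set.
    return retrain(model, pseudo_labeled), pseudo_labeled
```

Passing the individual steps in as callables keeps the sketch independent of any particular detector or training framework.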

(Configuration of control unit 200a)
As shown in FIG. 10, the control unit 200a includes an acquisition unit 201 and a detection unit 202.

The acquisition unit 201 acquires the target image. Specifically, the acquisition unit 201 reads the target data set TDS from the storage unit 250a and outputs it to the detection unit 202.

The detection unit 202 detects an object included in the target image using the target image detection model. Specifically, the detection unit 202 inputs the target image included in the target data set TDS acquired from the acquisition unit 201 to the object detection model DM, and acquires the inference result output from the object detection model DM. The detection unit 202 outputs the obtained inference result to the output unit 260a. As a result, the output unit 260a outputs, for each target image, at least one of the presence/absence of an object included in the target image, the position of the object included in the target image, the size of the object included in the target image, and the category of the object included in the target image. Typically, the output unit 260a causes a display device to display the target image in which a category and a bounding box are assigned to at least some of the objects. The display device may be the output unit 260a, or may be a display device (not shown) communicably connected to the information processing device 20a.

As described above, the information processing apparatus 20a according to the present exemplary embodiment adopts a configuration in which an object is detected using a target image detection model that has been learned using a data set including images associated with pseudo labels, the pseudo labels being determined using an automatically determined threshold. For this reason, according to the information processing apparatus 20a, it is possible to detect an object included in an image using a target image detection model for which the cost of adjusting the threshold value has been reduced.

In addition, in the information processing apparatus 20a according to the present exemplary embodiment, a configuration is adopted in which an inference result for the target image is output by the target image detection model. Therefore, according to the information processing device 20a according to the exemplary embodiment, an effect is obtained that the user of the information processing device 20a can recognize the inference result.

[Exemplary embodiment 3]
A third exemplary embodiment of the invention will now be described in detail with reference to the drawings. Components having the same functions as the components described in the exemplary embodiments 1 and 2 are denoted by the same reference numerals, and description thereof will not be repeated.

<Configuration of information processing device 10b>
The configuration of the information processing device 10b will be described with reference to FIG. 11. FIG. 11 is a block diagram showing the configuration of the information processing device 10b. As shown in FIG. 11, the information processing device 10b includes a control unit 100b and a storage unit 150b. The control unit 100b centrally controls each unit of the information processing device 10b. The storage unit 150b stores various programs and data used by the information processing device 10b.

The storage unit 150b differs from the storage unit 150a described in the second exemplary embodiment in the data included in data set 2 (DS2). Details of the data will be described with reference to FIG. 12.

FIG. 12 is a diagram showing a specific example of data included in dataset 1 (DS1) and dataset 2 (DS2). Specifically, FIG. 12 shows one of the images contained in dataset 1 (DS1) and one of the images contained in dataset 2 (DS2).

In this exemplary embodiment, the images contained in dataset 1 (DS1) contain five objects, specifically three dogs and two cows. The images contained in dataset 2 (DS2) also contain five objects, specifically two dogs and three cows. Data set 1 (DS1) and data set 2 (DS2) according to this exemplary embodiment are a plurality of data sets (also called expert data sets) with different categories (responsibility ranges) assigned as correct answers.

In this exemplary embodiment, unlike exemplary embodiment 2, at least some of the one or more images included in dataset 2 (DS2) are labeled with one or more correct labels. In the images contained in dataset 1 (DS1), each of the three dogs is associated with a correct label. That is, dataset 1 (DS1) is an expert dataset whose responsibility is "dog". In the images included in data set 2 (DS2), each of the three cows including object Ob1 is associated with a correct label. That is, data set 2 (DS2) is an expert data set whose responsibility is "cow". Although not shown, the evaluation data set DSE according to this exemplary embodiment is a data set including images in which correct labels are associated with dogs, similar to data set 1 (DS1).

(Configuration of control unit 100b)
As shown in FIG. 11, the control unit 100b differs from the control unit 100a described in the second exemplary embodiment in that it includes an associating unit 1042b instead of the associating unit 1042. The pseudo-label generating unit 1041 and the associating unit 1042b correspond to the dataset generating unit 104 in exemplary embodiment 1, and are configured to implement the dataset generating means in this exemplary embodiment.

The associating unit 1042b has the following function in addition to the functions of the associating unit 1042. That is, when a correct label is given to an object included in an image associated with a pseudo label, the associating unit 1042b deletes the pseudo label if the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is greater than or equal to a predetermined degree.

FIG. 13 is a diagram showing specific examples of data included in data set 2 (DS2), data set 2' (DS2'), and data set 2''. Specifically, FIG. 13 shows one of the images contained in each of data set 2 (DS2), data set 2' (DS2'), and data set 2''. Note that data set 2 (DS2) according to this exemplary embodiment has already been described, and the description will not be repeated here.

Data set 2' (DS2') is a data set in which each of the one or more images included in data set 2 (DS2) is associated with a pseudo-label, as described in exemplary embodiment 2. Since the pseudo-labels can be said to be based on data set 1 (DS1) and the evaluation data set DSE, in the example of FIG. 13, pseudo-labels whose category is "dog" are associated with some of the objects. Here, in the example of FIG. 13, the object Ob1 is associated with a pseudo-label including the category "dog", that is, an incorrect pseudo-label. Although not shown in FIG. 13, in the image included in data set 2 (DS2), which is the source of data set 2' (DS2'), a correct label is associated with the object Ob1. Accordingly, in the image included in data set 2' (DS2'), the object Ob1 is associated with the correct label in addition to the pseudo-label.

The associating unit 1042b identifies the corresponding image from the images included in the dataset 2 (DS2) for each image included in the dataset 2' (DS2').

Subsequently, the associating unit 1042b selects one of the images included in data set 2' (DS2') and, for each bounding box of the pseudo labels associated with that image, calculates the IOU (Intersection over Union) with the bounding box of each correct label included in the identified image. The IOU corresponds to the degree of overlap described above. The associating unit 1042b performs this process for all images included in data set 2' (DS2').

The associating unit 1042b deletes a pseudo label when there is a correct label whose IOU with that pseudo label is equal to or greater than a predetermined value. In the example of FIG. 13, the IOU between the pseudo label associated with the object Ob1 and the correct label associated with the object Ob1 is greater than or equal to the predetermined value. Therefore, the associating unit 1042b deletes the pseudo label associated with the object Ob1. The image included in data set 2'' shown in FIG. 13 is the image after the pseudo label is deleted. As shown in FIG. 13, in this image, the pseudo label associated with the object Ob1 has been deleted, and only the correct label is associated with the object Ob1.
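The IOU-based deletion described above can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the box format `(x_min, y_min, x_max, y_max)`, the function names, and the default value `0.5` for the predetermined value are assumptions, since the text does not specify them.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def delete_overlapping_pseudo_labels(
    pseudo_boxes: List[Box],
    correct_boxes: List[Box],
    min_iou: float = 0.5,  # hypothetical "predetermined value"
) -> List[Box]:
    """Keep only pseudo-label boxes that do not overlap any correct-label box
    with an IOU at or above min_iou."""
    return [
        p for p in pseudo_boxes
        if all(iou(p, c) < min_iou for c in correct_boxes)
    ]
```

In this sketch only the bounding boxes are compared; the categories of the pseudo label and the correct label are deliberately ignored, which matches the case of object Ob1, where the pseudo label "dog" overlaps a correct label of a different category.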

As described above, the information processing apparatus 10b according to the present exemplary embodiment adopts a configuration in which, for a pseudo label and a correct label attached to an image, the pseudo label is deleted when the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is greater than or equal to a predetermined degree. Therefore, according to the information processing apparatus 10b, if a pseudo label is not appropriate, the pseudo label is deleted and the correct label remains, which provides the effect that accuracy can be improved. Note that a pseudo label being "not appropriate" refers to, for example, (1) the category of the pseudo label being different from the category of the object, or (2) the bounding box of the pseudo label not enclosing part of the object.

In particular, in the case of expert data sets in which correct labels are assigned to similar-looking objects, such as the dogs and cows shown in this exemplary embodiment, there is a high possibility that pseudo labels with incorrect categories will be associated with objects. In contrast, the information processing apparatus 10b according to the present exemplary embodiment can delete erroneous pseudo labels, so that pseudo labels can be generated with high accuracy and the accuracy of object detection using the target image detection model can be improved.

[Exemplary embodiment 4]
A fourth exemplary embodiment of the invention will now be described in detail with reference to the drawings. Components having the same functions as those described in exemplary embodiments 1 to 3 are denoted by the same reference numerals, and description thereof will not be repeated.

<Overview of Information Processing Device 10c>
The information processing device 10c according to this exemplary embodiment determines a threshold based on each of the expert datasets, and assigns a pseudo-label to each of the plurality of datasets based on the threshold.

<Configuration of information processing device 10c>
The configuration of the information processing device 10c will be described with reference to FIG. 14. FIG. 14 is a block diagram showing the configuration of the information processing device 10c. As shown in FIG. 14, the information processing device 10c includes a first control unit 100c, a first storage unit 150c, a second control unit 110c, and a second storage unit 160c. The first control unit 100c and the second control unit 110c collectively control each unit of the information processing device 10c. The first storage unit 150c and the second storage unit 160c store various programs and data used by the information processing device 10c.

Note that the first control unit 100c and the second control unit 110c may be integrated. Also, the first storage unit 150c and the second storage unit 160c may be integrated. Alternatively, the second control unit 110c and the second storage unit 160c may be provided in another device communicably connected to the information processing device 10c.

The first storage unit 150c stores data set 1 (DS1), data set 2 (DS2), evaluation data set 1 (DSE1), and evaluation data set 2 (DSE2). Data set 1 (DS1) is the first data set in this exemplary embodiment. Data set 2 (DS2) is the second data set in this exemplary embodiment. Data set 1 (DS1) and data set 2 (DS2) are the expert data sets described above. Evaluation dataset 1 (DSE1) is the first evaluation dataset in this exemplary embodiment. Evaluation dataset 2 (DSE2) is the second evaluation dataset in this exemplary embodiment.

Here, the details of dataset 1 (DS1) and dataset 2 (DS2) will be described. FIG. 15 is a diagram showing a specific example of data included in dataset 1 (DS1) and dataset 2 (DS2). Specifically, FIG. 15 shows one of the images contained in dataset 1 (DS1) and one of the images contained in dataset 2 (DS2).

Each of these images contains five objects, specifically three people and two bags. In the images contained in dataset 1 (DS1), each of the two bags is associated with a correct label. That is, data set 1 (DS1) is an expert data set whose responsibility is "bag". In the images contained in dataset 2 (DS2), the three persons are associated with correct labels. That is, data set 2 (DS2) is an expert data set whose scope of responsibility is "person".

Similar to dataset 1 (DS1), evaluation dataset 1 (DSE1) is a dataset in which a correct label is associated with each of the objects within the scope of responsibility included in each image. Based on the example of FIG. 15, evaluation data set 1 (DSE1) is a data set that includes images in which correct labels are associated with bags. For example, the images contained in evaluation dataset 1 (DSE1) may be part of the images contained in dataset 1 (DS1). Alternatively, for example, the images included in evaluation data set 1 (DSE1) may be images that are not included in data set 1 (DS1) and in which correct labels are associated with objects within the scope of responsibility of data set 1 (DS1).

Similar to dataset 2 (DS2), evaluation dataset 2 (DSE2) is a dataset in which a correct label is associated with each of the objects within the scope of responsibility included in each image. Based on the example of FIG. 15, evaluation data set 2 (DSE2) is a data set containing images in which correct labels are associated with people. For example, the images contained in evaluation dataset 2 (DSE2) may be part of the images contained in dataset 2 (DS2). Alternatively, for example, the images included in evaluation data set 2 (DSE2) may be images that are not included in data set 2 (DS2) and in which correct labels are associated with objects within the scope of responsibility of data set 2 (DS2).

(Configuration of first control unit 100c)
As shown in FIG. 14, the first control unit 100c includes a first learning unit 101-1, a second learning unit 101-2, a first threshold determination unit 102-1, a second threshold determination unit 102-2, a first inference unit 103-1, a second inference unit 103-2, a first data set generation unit 104-1, and a second data set generation unit 104-2.

The first learning unit 101-1 is configured to implement the first learning means in this exemplary embodiment. The second learning unit 101-2 is configured to implement the second learning means in this exemplary embodiment. The first threshold determination unit 102-1 is a configuration that implements the first threshold determination means in this exemplary embodiment. The second threshold determination unit 102-2 is a configuration that implements the second threshold determination means in this exemplary embodiment. The first inference unit 103-1 is a configuration that implements the first inference means in this exemplary embodiment. The second inference unit 103-2 is a configuration that implements the second inference means in this exemplary embodiment. The first data set generation unit 104-1 is a configuration that implements the first data set generation means in this exemplary embodiment. The second data set generation unit 104-2 is a configuration that implements the second data set generation means in this exemplary embodiment.

The first learning unit 101-1 learns the first detection model using the first data set. Specifically, the first learning unit 101-1 acquires data set 1 (DS1) and uses data set 1 (DS1) to learn the first pseudo-label generation object detection model PDM1. More specifically, the first learning unit 101-1 reads data set 1 (DS1) stored in the first storage unit 150c and uses data set 1 (DS1) to learn the first pseudo-label generation object detection model PDM1. Then, the first learning unit 101-1 outputs the learned first pseudo-label generation object detection model PDM1 to the first threshold determination unit 102-1 and the first inference unit 103-1.

The first threshold determination unit 102-1 determines the first threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in the first evaluation data set into the first detection model, with one or more correct labels attached to each of the one or more images.

Specifically, the first threshold determination unit 102-1 reads evaluation data set 1 (DSE1) stored in the first storage unit 150c and inputs each of the one or more images included in it to the first pseudo-label generation object detection model PDM1 acquired from the first learning unit 101-1. Then, the first threshold determination unit 102-1 acquires the inference results output by the first pseudo-label generation object detection model PDM1.

Subsequently, the first threshold determination unit 102-1 calculates an evaluation value for each inference result based on a comparison between each inference result for each of the one or more images included in evaluation data set 1 (DSE1) and the correct label for each of the images. The evaluation value is, for example, the F value. The details of the calculation of the F value in the example where the evaluation value is the F value have been described in the second exemplary embodiment, and the description will not be repeated here.

Next, the first threshold determination unit 102-1 identifies the maximum value among the plurality of F values calculated for each reference value, and sets the reference value linked to the identified F value as the threshold. This threshold is the above-described first threshold. The first threshold determination unit 102-1 outputs the determined first threshold to the first data set generation unit 104-1.
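The selection of the reference value with the maximum F value can be sketched as follows. The actual F value calculation is the one described in the second exemplary embodiment; this sketch substitutes the standard F1 score (the harmonic mean of precision and recall) and assumes that each inference result has already been matched against the correct labels, so that it carries a reliability and a flag saying whether it is correct. The function names and the input layout are hypothetical.

```python
from typing import Sequence, Tuple


def f_value(tp: int, fp: int, fn: int) -> float:
    """Standard F1 score; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold_by_f_value(
    scored_results: Sequence[Tuple[float, bool]],  # (reliability, matches a correct label)
    num_correct_labels: int,
    reference_values: Sequence[float],
) -> float:
    """Return the reference value whose F value is the largest.

    Assumes each correct label is matched by at most one inference result.
    Ties are resolved in favor of the earlier reference value.
    """
    best_ref, best_f = reference_values[0], -1.0
    for ref in reference_values:
        kept = [ok for score, ok in scored_results if score >= ref]
        tp = sum(kept)                 # kept results that match a correct label
        fp = len(kept) - tp            # kept results that match nothing
        fn = num_correct_labels - tp   # correct labels left undetected
        f = f_value(tp, fp, fn)
        if f > best_f:
            best_ref, best_f = ref, f
    return best_ref
```

A low reference value keeps many false positives and a high one drops true positives, so the F value typically peaks somewhere in between; that peak is what the sketch returns as the threshold.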

The first inference unit 103-1 obtains one or more inference results for each of the one or more images included in the second data set by inputting each of the one or more images to the first detection model. Specifically, the first inference unit 103-1 reads data set 2 (DS2) stored in the first storage unit 150c, inputs each of the one or more images included in data set 2 (DS2) to the first pseudo-label generation object detection model PDM1 obtained from the first learning unit 101-1, and obtains one or more inference results PR1 for each of the images. The first inference unit 103-1 outputs the obtained inference results PR1 to the first data set generation unit 104-1.

The first data set generation unit 104-1 sets, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results from the first inference unit 103-1, and generates a second post-pseudo-labeling data set by associating the pseudo label with the corresponding image. Specifically, the first data set generation unit 104-1 sets, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results PR1 from the first inference unit 103-1. The first data set generation unit 104-1 then associates the pseudo label with the corresponding image. This generates data set 2' (DS2'), in which each of the one or more images contained in data set 2 (DS2) is associated with a pseudo label. The first data set generation unit 104-1 stores the generated data set 2' (DS2') in the second storage unit 160c.

The second learning unit 101-2 learns the second detection model using the second data set. Specifically, the second learning unit 101-2 acquires data set 2 (DS2) and uses data set 2 (DS2) to learn the second pseudo-label generation object detection model PDM2. More specifically, the second learning unit 101-2 reads data set 2 (DS2) stored in the first storage unit 150c and uses data set 2 (DS2) to learn the second pseudo-label generation object detection model PDM2. Then, the second learning unit 101-2 outputs the learned second pseudo-label generation object detection model PDM2 to the second threshold determination unit 102-2 and the second inference unit 103-2.

The second threshold determination unit 102-2 determines the second threshold by referring to the result of comparing one or more inference results, obtained by inputting each of one or more images included in the second evaluation data set into the second detection model, with one or more correct labels attached to each of the one or more images.

Specifically, the second threshold determination unit 102-2 reads evaluation data set 2 (DSE2) stored in the first storage unit 150c and inputs each of the one or more images included in it to the second pseudo-label generation object detection model PDM2 obtained from the second learning unit 101-2. Then, the second threshold determination unit 102-2 acquires the inference results output by the second pseudo-label generation object detection model PDM2.

Subsequently, the second threshold determination unit 102-2 calculates an evaluation value for each inference result based on a comparison between each inference result for each of the one or more images included in evaluation data set 2 (DSE2) and the correct label for each of the images. The evaluation value is, for example, the F value. The details of the calculation of the F value in the example where the evaluation value is the F value have been described in the second exemplary embodiment, and the description will not be repeated here.

Subsequently, the second threshold determination unit 102-2 identifies the maximum value among the plurality of F values calculated for each reference value, and sets the reference value linked to the identified F value as the threshold. This threshold is the above-described second threshold. The second threshold determination unit 102-2 outputs the determined second threshold to the second data set generation unit 104-2.

The second inference unit 103-2 obtains one or more inference results for each of the one or more images included in the first data set by inputting each of the one or more images to the second detection model. Specifically, the second inference unit 103-2 reads data set 1 (DS1) stored in the first storage unit 150c, inputs each of the one or more images included in data set 1 (DS1) to the second pseudo-label generation object detection model PDM2 obtained from the second learning unit 101-2, and obtains one or more inference results PR2 for each of the images. The second inference unit 103-2 outputs the obtained inference results PR2 to the second data set generation unit 104-2.

The second data set generation unit 104-2 sets, as a pseudo label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results from the second inference unit 103-2, and generates a first post-pseudo-labeling data set by associating the pseudo label with the corresponding image. Specifically, the second data set generation unit 104-2 sets, as a pseudo label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results PR2 from the second inference unit 103-2. The second data set generation unit 104-2 then associates the pseudo label with the corresponding image. This generates data set 1' (DS1'), in which each of the one or more images contained in data set 1 (DS1) is associated with a pseudo label. The second data set generation unit 104-2 stores the generated data set 1' (DS1') in the second storage unit 160c.

Here, the details of dataset 1' (DS1') and dataset 2' (DS2') will be described. FIG. 16 is a diagram showing specific examples of data included in data set 1' (DS1') and data set 2' (DS2'). Specifically, FIG. 16 shows one of the images contained in data set 1' (DS1') and one of the images contained in data set 2' (DS2').

The images included in dataset 1' (DS1') shown in FIG. 16 are the same as the images included in dataset 1 (DS1) (see FIG. 15). Among the objects in the image contained in dataset 1' (DS1'), each of the two bags is associated with a correct label and each of the three persons is associated with a pseudo-label. The correct label is the one associated with the bag, which is within the scope of responsibility of dataset 1 (DS1), in the image included in dataset 1 (DS1), the source of dataset 1' (DS1'). The pseudo-label is a pseudo-label set by the second data set generation unit 104-2 based on the inference result PR2. The inference result PR2 is an inference result obtained using the second pseudo-label generation object detection model PDM2, which has been trained using data set 2 (DS2), whose scope of responsibility is "person". The pseudo-labels are therefore associated with the persons.

The images included in dataset 2' (DS2') shown in FIG. 16 are the same as the images included in dataset 2 (DS2) (see FIG. 15). Among the objects in the image contained in dataset 2' (DS2'), each of the three persons is associated with a correct label and each of the two bags is associated with a pseudo-label. The correct label is the one associated with the person, who is within the scope of responsibility of dataset 2 (DS2), in the image included in dataset 2 (DS2), the source of dataset 2' (DS2'). The pseudo-label is a pseudo-label set by the first data set generation unit 104-1 based on the inference result PR1. The inference result PR1 is an inference result obtained using the first pseudo-label generation object detection model PDM1, which has been trained using data set 1 (DS1), whose scope of responsibility is "bag". The pseudo-labels are therefore associated with the bags.
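The mutual pseudo-labeling of the two expert data sets described above can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: data sets are represented as dictionaries from an image id to its correct labels, models as callables returning inference results with a `"score"` key (standing in for the reliability), and the function names and the `"source"` key are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types, introduced only for this illustration.
Label = Dict[str, Any]
Model = Callable[[str], List[Label]]   # image id -> inference results
Dataset = Dict[str, List[Label]]       # image id -> correct labels


def add_pseudo_labels(dataset: Dataset, other_model: Model, other_threshold: float) -> Dataset:
    """Keep each image's correct labels and append, as pseudo-labels, the
    other expert's inference results at or above the other expert's threshold."""
    labeled: Dataset = {}
    for image_id, correct_labels in dataset.items():
        pseudo = [
            dict(result, source="pseudo")
            for result in other_model(image_id)
            if result["score"] >= other_threshold
        ]
        labeled[image_id] = [dict(c, source="correct") for c in correct_labels] + pseudo
    return labeled


def cross_pseudo_label(
    ds1: Dataset, ds2: Dataset,
    pdm1: Model, pdm2: Model,
    threshold1: float, threshold2: float,
) -> Tuple[Dataset, Dataset]:
    # PDM1 (trained on DS1) and the first threshold produce DS2'.
    ds2_prime = add_pseudo_labels(ds2, pdm1, threshold1)
    # PDM2 (trained on DS2) and the second threshold produce DS1'.
    ds1_prime = add_pseudo_labels(ds1, pdm2, threshold2)
    return ds1_prime, ds2_prime
```

Each data set is labeled by the model trained on the other data set, so DS1' gains "person" pseudo-labels next to its "bag" correct labels and DS2' gains "bag" pseudo-labels next to its "person" correct labels.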

The second storage unit 160c stores data set 1' (DS1'), data set 2' (DS2'), and object detection model DM. Data set 1' (DS1') and data set 2' (DS2') are data sets generated by the second data set generator 104-2 and the first data set generator 104-1, respectively. The object detection model DM is a target image detection model, and the details thereof will be described later.

(Configuration of the second control unit 110c)
As shown in FIG. 14, the second control unit 110c includes a re-learning unit 105. The re-learning unit 105 is a configuration that implements the pseudo-label reference learning means in this exemplary embodiment. The re-learning unit 105 learns the target image detection model using the data sets to which the pseudo labels have been assigned. Specifically, as this learning, the re-learning unit 105 re-learns the first pseudo-label generation object detection model PDM1 or the second pseudo-label generation object detection model PDM2. More specifically, the re-learning unit 105 reads data set 1' (DS1') and data set 2' (DS2') from the second storage unit 160c, and uses data set 1' (DS1') and data set 2' (DS2') to re-learn the first pseudo-label generation object detection model PDM1 or the second pseudo-label generation object detection model PDM2. Then, the re-learning unit 105 stores the object detection model DM generated by the re-learning in the second storage unit 160c. Note that the re-learning unit 105 may instead learn a new object detection model DM using data set 1' (DS1') and data set 2' (DS2'). The new object detection model DM is a target image detection model that is different from both the first pseudo-label generation object detection model PDM1 and the second pseudo-label generation object detection model PDM2.

As described above, the information processing device 10c according to the present exemplary embodiment adopts a configuration in which a threshold is determined based on each of a plurality of expert data sets, and pseudo labels are assigned to each of the plurality of data sets based on the thresholds. For this reason, according to the information processing device 10c, the detection model can be re-learned using a plurality of data sets each assigned pseudo labels, specifically data set 1' (DS1') and data set 2' (DS2'), so that the accuracy of detecting objects included in images using the re-learned detection model can be further improved. Further, even when a plurality of data sets each assigned pseudo labels are generated, that is, even when a plurality of thresholds for determining pseudo labels are required, the plurality of thresholds can be determined automatically, so that the cost of adjusting the thresholds can be reduced. Furthermore, according to the information processing device 10c, one highly accurate target image detection model can be learned from a plurality of data sets with different scopes of responsibility.

In this exemplary embodiment, an example in which the number of expert data sets is "2" has been described, but the number of expert data sets is not limited to this example. The number of data sets and evaluation data sets stored in the information processing device 10c, and the number of members realizing the learning means, threshold determination means, inference means, and data set generation means in the information processing device 10c, depend on the number of expert data sets. For example, when the number of expert data sets is "3", the information processing device 10c further stores a third data set and a third evaluation data set, and further includes a third learning unit, a third threshold determination unit, a third inference unit, and a third data set generation unit.

Also, in the present exemplary embodiment, the scope of responsibility of each expert dataset has been described as different, but the scope of responsibility may overlap between expert datasets.

Also, the first data set generation unit 104-1 and the second data set generation unit 104-2 according to this exemplary embodiment may have the function of the associating unit 1042b described in the third exemplary embodiment. That is, when generating data set 2′ (DS2′), if a correct label is assigned to an object included in an image associated with a pseudo label, and the degree of overlap between the region indicated by the region information included in the pseudo label and the region indicated by the region information included in the correct label is greater than or equal to a predetermined degree, the first data set generation unit 104-1 may delete the pseudo label. Similarly, when generating data set 1′ (DS1′), if a correct label is assigned to an object included in an image associated with a pseudo label, and the degree of overlap between the region indicated by the region information included in the pseudo label and the region indicated by the region information included in the correct label is greater than or equal to a predetermined degree, the second data set generation unit 104-2 may delete the pseudo label.
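
The overlap-based deletion described above can be sketched as follows. This is only an illustrative sketch: the source does not specify how the degree of overlap is measured, so intersection over union (IoU) is assumed here, and the names Label, iou, drop_overlapping_pseudo_labels and the default value 0.5 for the predetermined degree are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Label:
    box: Box        # region information
    category: str   # category information


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (assumed overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def drop_overlapping_pseudo_labels(
    pseudo_labels: List[Label],
    correct_labels: List[Label],
    min_overlap: float = 0.5,  # hypothetical "predetermined degree"
) -> List[Label]:
    """Keep only pseudo labels that do not overlap any correct label
    by min_overlap or more; the correct labels themselves are untouched."""
    return [
        p for p in pseudo_labels
        if all(iou(p.box, c.box) < min_overlap for c in correct_labels)
    ]
```

In this sketch the category is deliberately not compared, so a pseudo label with a wrong category that covers an already labeled object is also removed.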

[Exemplary embodiment 5]
A fifth exemplary embodiment of the present invention will now be described in detail with reference to the drawings. Components having the same functions as the components described in exemplary embodiments 1 to 4 are denoted by the same reference numerals, and description thereof will not be repeated.

<Configuration of information processing device 10d>
The configuration of an information processing device 10d according to this exemplary embodiment will be described with reference to FIG. 17. FIG. 17 is a block diagram showing the configuration of the information processing device 10d. As shown in FIG. 17, the information processing device 10d includes a control unit 100d and a storage unit 150d. The control unit 100d centrally controls each unit of the information processing device 10d. The storage unit 150d stores various programs and data used by the information processing device 10d.

The control unit 100d includes a non-learning region determination unit 106 in addition to the learning unit 101, the threshold determination unit 102, the inference unit 103, the data set generation unit 104, and the relearning unit 105 according to the second exemplary embodiment described above. The non-learning region determination unit 106 is a configuration that implements the non-learning region determination means in this exemplary embodiment.

As in the second exemplary embodiment described above, the threshold determination unit 102 determines a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set DSE into the detection model, with one or more correct labels attached to each of the one or more images. The method by which the threshold determination unit 102 determines the first threshold has already been described in the second exemplary embodiment above, so the description will not be repeated here.

In addition, in this exemplary embodiment, the threshold determination unit 102 further determines a second threshold that is smaller than the first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set DSE into the detection model, with one or more correct labels attached to each of the one or more images.

The second threshold is a value smaller than the first threshold. For example, the first threshold may be a value that emphasizes precision, and the second threshold may be a value that emphasizes recall. For example, the first threshold is the confidence at which the F0.5-score, an F value that emphasizes precision, takes its maximum value, and the second threshold is the confidence at which the F2-score, an F value that emphasizes recall, takes its maximum value.

The non-learning region determination unit 106 determines, in the pseudo-labeled dataset 2′ (DS2′) generated by the dataset generation unit 104, a region corresponding to an inference result that, among the one or more inference results by the inference unit 103, has a reliability less than the first threshold and equal to or greater than the second threshold, as a non-learning region that is not subject to learning by the relearning unit 105.

<Flow of information processing method>
The flow of the information processing method S10d executed by the information processing device 10d configured as described above will be described with reference to FIG. 18. FIG. 18 is a flowchart showing the flow of the information processing method S10d. The information processing method S10d includes steps S101 to S1022, S1023d, S103 to S1041, S1041d, and S1042. Among these steps, steps S101 to S1022, S103 to S1041, and S1042 have already been described in the exemplary embodiment 2 above, so the description will not be repeated here.

(Step S1023d)
In step S1023d, the threshold determination unit 1023 determines the first threshold and the second threshold based on the evaluation values. Specifically, the threshold determination unit 1023 identifies, for example, the maximum value among the plurality of acquired F values that emphasize precision, and sets the reference value associated with the identified F value as the first threshold. In addition, the threshold determination unit 1023 identifies, for example, the maximum value among the plurality of acquired F values that emphasize recall, and sets the reference value associated with the identified F value as the second threshold. The threshold determination unit 1023 outputs the determined first threshold and second threshold to the pseudo-label generation unit 1041.
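
As a minimal sketch of this step, the following code picks the reference value that maximizes an F-beta score, with beta = 0.5 for the first threshold and beta = 2 for the second. It assumes that each inference result on the evaluation data set has already been compared with the correct labels and reduced to a pair (confidence, is_correct); that representation, the function names, and the list of candidate reference values are assumptions made for illustration, not details taken from the source.

```python
from typing import List, Sequence, Tuple

# (confidence, is_correct): is_correct is True when the inference result
# matched a correct label in the comparison (assumed to be done beforehand).
EvalResult = Tuple[float, bool]


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta < 1 emphasizes precision, beta > 1 emphasizes recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def precision_recall(results: List[EvalResult], num_correct_labels: int,
                     reference_value: float) -> Tuple[float, float]:
    """Precision and recall when only results with confidence >= reference_value are kept."""
    kept = [ok for conf, ok in results if conf >= reference_value]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / num_correct_labels if num_correct_labels else 0.0
    return precision, recall


def best_reference_value(results: List[EvalResult], num_correct_labels: int,
                         reference_values: Sequence[float], beta: float) -> float:
    """Reference value whose F-beta score is the maximum among the candidates."""
    def score(value: float) -> float:
        p, r = precision_recall(results, num_correct_labels, value)
        return f_beta(p, r, beta)
    return max(reference_values, key=score)


def determine_thresholds(results: List[EvalResult], num_correct_labels: int,
                         reference_values: Sequence[float]) -> Tuple[float, float]:
    """First threshold from the F0.5-score, second threshold from the F2-score."""
    first = best_reference_value(results, num_correct_labels, reference_values, beta=0.5)
    second = best_reference_value(results, num_correct_labels, reference_values, beta=2.0)
    return first, second
```

Maximizing the two F values independently does not by itself guarantee that the second threshold comes out smaller than the first for every data set, so a real implementation would presumably check that ordering before using the pair.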

(Step S1041d)
In step S1041d, the non-learning region determination unit 106 determines, in the pseudo-labeled dataset 2′ (DS2′) generated by the dataset generation unit 104, a region corresponding to an inference result that, among the one or more inference results by the inference unit 103, has a reliability less than the first threshold and equal to or greater than the second threshold, as a non-learning region that is not subject to learning by the relearning unit 105.
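
The way the two thresholds divide the inference results can be sketched as follows. The InferenceResult type and the function name are hypothetical, and how the non-learning regions are then excluded during relearning is not shown because the source does not specify it here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class InferenceResult:
    box: Box
    category: str
    confidence: float  # reliability of the inference result


def split_inference_results(
    results: List[InferenceResult],
    first_threshold: float,
    second_threshold: float,
) -> Tuple[List[InferenceResult], List[Box]]:
    """Return (pseudo labels, non-learning regions) for one image.

    confidence >= first_threshold                     -> pseudo label
    second_threshold <= confidence < first_threshold  -> non-learning region
    confidence < second_threshold                     -> ignored
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than first threshold")
    pseudo_labels = [r for r in results if r.confidence >= first_threshold]
    non_learning_regions = [
        r.box for r in results
        if second_threshold <= r.confidence < first_threshold
    ]
    return pseudo_labels, non_learning_regions
```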

As described above, in the information processing device 10d according to the present exemplary embodiment, in the pseudo-labeled dataset generated by the dataset generation unit 104, a region corresponding to an inference result that, among the one or more inference results by the inference unit 103, has a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning by the relearning unit 105. Even if a pseudo label is assigned to a region corresponding to such an inference result, the pseudo label tends to have low reliability. By setting such a region as a non-learning region, relearning can be performed using pseudo labels with relatively high reliability, which can improve the detection accuracy of the relearned detection model.

Further, according to the information processing device 10d adopting this configuration, the cost of generating a target image detection model with improved detection accuracy can be reduced by using the first threshold and the second threshold.

[Exemplary embodiment 6]
A sixth exemplary embodiment of the invention will now be described in detail with reference to the drawings. Components having the same functions as the components described in the exemplary embodiments 1 to 5 are denoted by the same reference numerals, and the description thereof will not be repeated.

<Configuration of information processing device 10e>
A configuration of an information processing device 10e according to this exemplary embodiment will be described with reference to FIG. 19. FIG. 19 is a block diagram showing the configuration of the information processing device 10e. As shown in FIG. 19, the information processing device 10e includes a first control unit 100e, a second control unit 110e, a first storage unit 150e, and a second storage unit 160e. The first control unit 100e and the second control unit 110e collectively control each unit of the information processing device 10e. The first storage unit 150e and the second storage unit 160e store various programs and data used by the information processing device 10e.

The first control unit 100e includes, in addition to the configuration of the first control unit 100c of the information processing device 10c shown in the above-described fourth exemplary embodiment, a first non-learning region determination unit 106-1 and a second non-learning region determination unit 106-2. The first non-learning region determination unit 106-1 is a configuration that implements the first non-learning region determination means in this exemplary embodiment. The second non-learning region determination unit 106-2 is a configuration that implements the second non-learning region determination means in this exemplary embodiment.

As in the above-described exemplary embodiment 4, the first threshold determination unit 102-1 determines the first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 1 (DSE1) into the detection model, with the one or more correct labels attached to each of the one or more images. The method of determining the first threshold by the first threshold determination unit 102-1 has already been described in the above-described exemplary embodiment 4, so the description will not be repeated here.

Also, in this exemplary embodiment, the first threshold determination unit 102-1 further determines a third threshold that is smaller than the first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 1 (DSE1) into the detection model, with the one or more correct labels attached to each of the one or more images.

The third threshold is a value smaller than the first threshold. For example, the first threshold may be a value that emphasizes precision, and the third threshold may be a value that emphasizes recall. For example, the first threshold is the confidence at which the F0.5-score, an F value that emphasizes precision, takes its maximum value, and the third threshold is the confidence at which the F2-score, an F value that emphasizes recall, takes its maximum value.

As in the above-described exemplary embodiment 4, the second threshold determination unit 102-2 determines the second threshold with reference to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 2 (DSE2) into the detection model, with the one or more correct labels attached to each of the one or more images. The method of determining the second threshold by the second threshold determination unit 102-2 has already been described in the above-described exemplary embodiment 4, so the description will not be repeated here.

Also, in this exemplary embodiment, the second threshold determination unit 102-2 further determines a fourth threshold that is smaller than the second threshold with reference to a result of comparing one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set 2 (DSE2) into the detection model, with the one or more correct labels attached to each of the one or more images.

The fourth threshold is a value smaller than the second threshold. For example, the second threshold may be a value that emphasizes precision, and the fourth threshold may be a value that emphasizes recall. For example, the second threshold is the confidence at which the F0.5-score, an F value that emphasizes precision, takes its maximum value, and the fourth threshold is the confidence at which the F2-score, an F value that emphasizes recall, takes its maximum value.

The first non-learning region determination unit 106-1 determines, in the pseudo-labeled dataset 2′ (DS2′) generated by the first dataset generation unit 104-1, a region corresponding to an inference result that, among the one or more inference results by the first inference unit 103-1, has a reliability less than the first threshold and equal to or greater than the third threshold, as a non-learning region that is not subject to learning by the relearning unit 105.

The second non-learning region determination unit 106-2 determines, in the pseudo-labeled dataset 1′ (DS1′) generated by the second dataset generation unit 104-2, a region corresponding to an inference result that, among the one or more inference results by the second inference unit 103-2, has a reliability less than the second threshold and equal to or greater than the fourth threshold, as a non-learning region that is not subject to learning by the relearning unit 105.

As described above, in the information processing device 10e according to the present exemplary embodiment, in the pseudo-labeled dataset 2′ (DS2′) generated by the first dataset generation unit 104-1, a region corresponding to an inference result that, among the one or more inference results by the first inference unit 103-1, has a reliability less than the first threshold and equal to or greater than the third threshold is determined as a non-learning region that is not subject to learning by the relearning unit 105. Further, in the information processing device 10e, in the pseudo-labeled dataset 1′ (DS1′) generated by the second dataset generation unit 104-2, a region corresponding to an inference result that, among the one or more inference results by the second inference unit 103-2, has a reliability less than the second threshold and equal to or greater than the fourth threshold is determined as a non-learning region that is not subject to learning by the relearning unit 105.

Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the third threshold, the pseudo label tends to have low reliability. Likewise, even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the second threshold and equal to or greater than the fourth threshold, the pseudo label tends to have low reliability. By setting such regions as non-learning regions, relearning can be performed using pseudo labels with relatively high reliability, so that the detection accuracy of the detection models learned by the first learning unit 101-1 and the second learning unit 101-2 can be improved.

Further, according to the information processing device 10e adopting this configuration, the cost of generating a target image detection model with improved detection accuracy can be reduced by using the first threshold, the second threshold, the third threshold, and the fourth threshold.

[Example of realization by software]
Some or all of the functions of the information processing devices 10, 10a to 10e, 20 and 20a may be realized by hardware such as integrated circuits (IC chips) or by software.

In the latter case, the information processing devices 10, 10a to 10e, 20 and 20a are implemented by, for example, a computer that executes instructions of a program, which is software that implements each function. An example of such a computer (hereinafter referred to as computer C) is shown in FIG. Computer C comprises at least one processor C1 and at least one memory C2. A program P for operating the computer C as the information processing devices 10, 10a to 10e, 20 and 20a is recorded in the memory C2. In the computer C, the processor C1 reads the program P from the memory C2 and executes it, thereby realizing each function of the information processing devices 10, 10a to 10e, 20 and 20a.

As the processor C1, for example, a CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating point number Processing Unit), PPU (Physics Processing Unit), a microcontroller, or a combination thereof can be used. As the memory C2, for example, a flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination thereof can be used.

Note that the computer C may further include a RAM (Random Access Memory) for expanding the program P during execution and temporarily storing various data. Computer C may further include a communication interface for sending and receiving data to and from other devices. Computer C may further include an input/output interface for connecting input/output devices such as a keyboard, mouse, display, and printer.

In addition, the program P can be recorded on a non-transitory tangible recording medium M that is readable by the computer C. As such a recording medium M, for example, a tape, disk, card, semiconductor memory, programmable logic circuit, or the like can be used. The computer C can acquire the program P via such a recording medium M. Also, the program P can be transmitted via a transmission medium. As such a transmission medium, for example, a communication network or broadcast waves can be used. Computer C can also obtain program P via such a transmission medium.

[Appendix 1]
The present invention is not limited to the above-described embodiments, and various modifications are possible within the scope of the claims. For example, embodiments obtained by appropriately combining the technical means disclosed in the embodiments described above are also included in the technical scope of the present invention.

[Appendix 2]
Some or all of the above-described embodiments may also be described as follows. However, the present invention is not limited to the embodiments described below.

(Appendix 1)
An information processing device comprising:
a learning means for learning a detection model using a first data set;
a threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
a data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results by the inference means, and associating the pseudo label with the corresponding image.

According to the configuration of Supplementary Note 1, the first threshold for setting pseudo labels is automatically determined based on a comparison between the inference results for the images included in the evaluation data set by the detection model trained using the first data set and the correct labels associated with those images. Therefore, according to the configuration of Supplementary Note 1, it is possible to reduce the cost of adjusting the first threshold. Then, according to the configuration of Supplementary Note 1, among the inference results for the images included in the second data set by the detection model, an inference result having a reliability equal to or higher than the automatically determined first threshold is set as a pseudo label, and the pseudo label is associated with the corresponding image. Therefore, according to the configuration of Supplementary Note 1, it is possible to reduce the cost of generating a data set including images to which pseudo labels are assigned.

(Appendix 2)
The information processing device according to Supplementary Note 1,
An information processing apparatus, further comprising pseudo-label reference learning means for learning a target image detection model for detecting an object included in a target image, using the data set to which the pseudo label has been assigned.

According to the configuration of Supplementary Note 2, learning of the target image detection model is performed using the data set after the pseudo-labeling. Therefore, according to the configuration of Supplementary Note 2, it is possible to generate a target image detection model while reducing the cost associated with threshold adjustment. As a result, it is possible to reduce the cost of learning the target image detection model. Also, if an appropriate value can be determined as the threshold, the number of threshold adjustments can be reduced, and the number of times the target image detection model must be learned (re-learned) each time the threshold is adjusted can also be reduced. As a result, it is possible to reduce the time until the learning of the target image detection model is completed.

(Appendix 3)
The information processing device according to appendix 2,
The information processing apparatus, wherein the pseudo label reference learning means re-learns the detection model as the learning of the target image detection model.

According to the configuration of Supplementary Note 3, the detection model is re-learned using the data set after pseudo-labeling. Therefore, according to the configuration of Supplementary Note 3, it is possible to reduce the cost of re-learning the detection model. Also, if an appropriate value can be determined as the threshold, the number of times the threshold is adjusted can be reduced, and the number of re-learning of the detection model required each time the threshold is adjusted can be reduced. As a result, it is possible to reduce the time until re-learning of the detection model is completed.

(Appendix 4)
The information processing device according to appendix 2 or 3,
wherein the threshold determination means further determines a second threshold that is smaller than the first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in the evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images, and
the information processing device further comprises non-learning region determination means for determining, in the pseudo-labeled data set generated by the data set generation means, a region corresponding to an inference result that, among the one or more inference results by the inference means, has a reliability less than the first threshold and equal to or greater than the second threshold, as a non-learning region that is not subject to learning by the pseudo-label reference learning means.

Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, the pseudo label tends to have low reliability. By setting such a region as a non-learning region, re-learning can be performed using pseudo labels with relatively high reliability, so that the detection accuracy of the target image detection model can be improved.

(Appendix 5)
The information processing device according to any one of Appendices 1 to 4,
wherein the correct label includes area information indicating the area of an object included in the image associated with the correct label, and category information indicating the category of the object, and
the pseudo label includes area information indicating the area of an object included in the image associated with the pseudo label, and category information indicating the category of the object.

According to the configuration of Supplementary Note 5, the correct label and the pseudo label include area information and category information. Therefore, according to the configuration of Supplementary Note 5, it is possible to improve the accuracy of detecting an object included in an image using a detection model that has been re-learned using a data set after pseudo-labeling.

(Appendix 6)
The information processing device according to appendix 5,
wherein at least part of the one or more images included in the second data set is assigned one or more correct labels, and
when a correct label is assigned to an object included in an image associated with the pseudo label, and the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is greater than or equal to a predetermined degree, the data set generation means deletes the pseudo label.

According to the configuration of Supplementary Note 6, for the pseudo labels and the correct labels attached to the images included in the second data set, a pseudo label is deleted if the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is greater than or equal to a predetermined degree. Therefore, according to the configuration of Supplementary Note 6, when a pseudo label is not appropriate, the pseudo label is deleted and the correct label remains, so that the accuracy of object detection using the re-learned detection model can be improved. In particular, in the case of data sets in which correct labels are attached to visually similar objects, there is a high probability that pseudo labels with incorrect categories will be associated with the objects. In contrast, according to the configuration of Supplementary Note 6, such erroneous pseudo labels can be deleted, so that pseudo labels can be generated with high accuracy and the object detection accuracy using the target image detection model can be improved.

Note that a pseudo label being "not appropriate" refers, for example, to cases where (1) the category of the pseudo label is different from the category of the object, or (2) the bounding box of the pseudo label does not enclose part of the object.

(Appendix 7)
The information processing device according to appendix 5 or 6,
wherein the threshold determination means sets the first threshold for each category, and
the data set generation means generates the pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold set for the corresponding category among the one or more inference results by the inference means, and associating the pseudo label with the corresponding image.

According to the configuration of Supplementary Note 7, a first threshold is set for each category based on a comparison between the inference results for the images included in the evaluation data set by the detection model trained using the first data set and the correct labels associated with those images, and an inference result having a reliability equal to or greater than the first threshold is set as a pseudo label. Therefore, according to the configuration of Supplementary Note 7, the accuracy of setting pseudo labels can be improved.
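
A per-category variant might look like the following sketch. The tuple layouts, the function names, and the choose_threshold callback are hypothetical; lowest_correct_confidence is only a placeholder rule for the example, standing in for whatever comparison-based rule actually determines the threshold for one category.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]
EvalResult = Tuple[str, float, bool]   # (category, confidence, is_correct)
Detection = Tuple[Box, str, float]     # (box, category, confidence)


def thresholds_per_category(
    eval_results: List[EvalResult],
    choose_threshold: Callable[[List[Tuple[float, bool]]], float],
) -> Dict[str, float]:
    """Group evaluation results by category and determine one threshold per category."""
    grouped: Dict[str, List[Tuple[float, bool]]] = defaultdict(list)
    for category, confidence, is_correct in eval_results:
        grouped[category].append((confidence, is_correct))
    return {category: choose_threshold(rs) for category, rs in grouped.items()}


def select_pseudo_labels(results: List[Detection],
                         thresholds: Dict[str, float]) -> List[Detection]:
    """Keep results whose confidence reaches the threshold of their own category.

    Results of a category without a threshold are dropped (an assumption)."""
    return [r for r in results if r[1] in thresholds and r[2] >= thresholds[r[1]]]


def lowest_correct_confidence(results: List[Tuple[float, bool]]) -> float:
    """Placeholder rule for illustration only: lowest confidence among correct results."""
    return min(conf for conf, ok in results if ok)
```

With this layout, a category that the detection model tends to score low can still contribute pseudo labels, because it is not held to the threshold of a category that is scored high.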

(Appendix 8)
The information processing device according to any one of Appendices 1 to 7,
The information processing apparatus, wherein the threshold determination means determines the first threshold by referring to the precision and the recall indicated by the comparison result.

According to the configuration of Supplementary Note 8, the first threshold is determined by referring to the precision and recall calculated from the result of comparing the inference results for the images included in the evaluation data set by the detection model trained using the first data set with the correct labels associated with those images. Therefore, according to the configuration of Supplementary Note 8, it is possible to improve the accuracy of setting pseudo labels. In addition, according to the configuration of Supplementary Note 8, pseudo labels can be set in consideration of both the quality of the learning data (precision) and the amount of the learning data (recall), so that a highly accurate target image detection model can be generated.
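
One possible way to obtain precision and recall from such a comparison is sketched below. The matching rule is an assumption rather than something stated in the source: an inference result counts as correct when it has the same category as a not-yet-matched correct label and an intersection over union (IoU) of at least min_iou with it, and results are matched greedily in descending order of confidence. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, str, float]       # (box, category, confidence)
Correct = Tuple[Box, str]                # (box, category)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_at(results: List[Detection], correct_labels: List[Correct],
                        reference_value: float, min_iou: float = 0.5
                        ) -> Tuple[float, float]:
    """Precision and recall for one image when results below reference_value are discarded."""
    kept = sorted((r for r in results if r[2] >= reference_value),
                  key=lambda r: r[2], reverse=True)
    unmatched = list(correct_labels)
    true_positives = 0
    for box, category, _ in kept:
        best, best_iou = None, min_iou
        for label in unmatched:
            overlap = iou(box, label[0])
            if label[1] == category and overlap >= best_iou:
                best, best_iou = label, overlap
        if best is not None:
            unmatched.remove(best)  # each correct label is matched at most once
            true_positives += 1
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / len(correct_labels) if correct_labels else 0.0
    return precision, recall
```

Evaluating this function over a range of reference values gives the precision and recall pairs from which a threshold can then be chosen.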

(Appendix 9)
The information processing device according to any one of Appendices 1 to 8,
The information processing apparatus, wherein the evaluation data set is included in the first data set.

According to the configuration of appendix 9, the images included in the evaluation data set are included in the first data set. Therefore, according to the configuration of Supplementary Note 9, there is no need to newly perform a high-cost correct answer assignment work for generating the evaluation data set. Further, according to the configuration of Supplementary Note 9, it is possible to reduce the number of images prepared in advance.

(Appendix 10)
The information processing device according to any one of Appendices 1 to 8,
The information processing apparatus, wherein the evaluation data set is generated by giving a correct label to a part of the second data set.

According to the configuration of Supplementary Note 10, the evaluation data set is generated by giving correct labels to part of the second data set. Therefore, according to the configuration of Supplementary Note 10, part of the data set to which pseudo labels are to be assigned is used as the evaluation data set for determining the threshold, so that the accuracy of the assigned pseudo labels can be improved. Further, according to the configuration of Supplementary Note 10, it is possible to reduce the number of images prepared in advance.

(Appendix 11)
An information processing device comprising:
a first learning means for learning a first detection model using a first data set;
a second learning means for learning a second detection model using a second data set;
a first threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a first evaluation data set into the first detection model, with one or more correct labels attached to each of the one or more images;
a second threshold determination means for determining a second threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a second evaluation data set into the second detection model, with one or more correct labels attached to each of the one or more images;
a first inference means for obtaining one or more inference results for each of the one or more images included in the second data set by inputting each of the one or more images into the first detection model;
a second inference means for obtaining one or more inference results for each of the one or more images included in the first data set by inputting each of the one or more images into the second detection model;
a first data set generation means for generating a pseudo-labeled second data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results by the first inference means, and associating the pseudo label with the corresponding image; and
a second data set generation means for generating a pseudo-labeled first data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results by the second inference means, and associating the pseudo label with the corresponding image.

According to the configuration of Supplementary Note 11, a first threshold for setting pseudo-labels in the second data set is automatically determined based on a comparison between the inference results that the detection model trained using the first data set produces for the images included in the first evaluation data set and the correct labels associated with those images. Likewise, a second threshold for setting pseudo-labels in the first data set is automatically determined based on a comparison between the inference results that the detection model trained using the second data set produces for the images included in the second evaluation data set and the correct labels associated with those images. Therefore, according to the configuration of Supplementary Note 11, the cost of adjusting the first threshold and the second threshold can be reduced. In other words, even when two pseudo-labeled data sets are generated, the cost of adjusting the two thresholds used to set the pseudo-labels in the respective data sets can be reduced. Furthermore, because re-learning of the detection model is performed using the two pseudo-labeled data sets, the accuracy of object detection can be further improved.
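The cross-wise flow described above, in which the first detection model labels the second data set and the second detection model labels the first data set, can be sketched as follows. This is a minimal illustration only: the two model functions are hypothetical stand-ins that return fixed inference results, the dictionary keys (`box`, `category`, `score`) are an assumed representation, and the threshold values are assumed to have been determined already.

```python
def select_pseudo_labels(inference_results, threshold):
    """Keep only inference results whose reliability is at or above the threshold."""
    return [r for r in inference_results if r["score"] >= threshold]


def pseudo_label_data_set(model, images, threshold):
    """Run the model on every image and attach the kept results as pseudo labels."""
    return {image_id: select_pseudo_labels(model(image_id), threshold)
            for image_id in images}


# Hypothetical stand-ins for the trained first and second detection models
def first_model(image_id):
    return [{"box": (0, 0, 10, 10), "category": "car", "score": 0.9},
            {"box": (5, 5, 20, 20), "category": "dog", "score": 0.4}]


def second_model(image_id):
    return [{"box": (1, 1, 8, 8), "category": "cat", "score": 0.7}]


first_threshold, second_threshold = 0.6, 0.8  # assumed, already determined
data_set_1 = ["a.jpg", "b.jpg"]
data_set_2 = ["c.jpg"]

# The first model labels the second data set; the second model labels the first.
labeled_2 = pseudo_label_data_set(first_model, data_set_2, first_threshold)
labeled_1 = pseudo_label_data_set(second_model, data_set_1, second_threshold)
```

Keeping the two thresholds separate matters because each one is tied to the reliability scale of its own model; a value that suits the first detection model need not suit the second.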

(Appendix 12)
The information processing device according to Appendix 11,
further comprising pseudo label reference learning means for learning a target image detection model for detecting an object included in a target image, using the pseudo-labeled first data set and the pseudo-labeled second data set.

According to the configuration of Supplementary Note 12, the target image detection model is learned using the pseudo-labeled first data set and the pseudo-labeled second data set. Therefore, according to the configuration of Supplementary Note 12, the target image detection model can be generated while reducing the cost of adjusting the first threshold and the second threshold. As a result, the cost of learning the target image detection model can be reduced. Also, if appropriate values can be determined for the first threshold and the second threshold, the number of threshold adjustments can be reduced, and so can the number of times the target image detection model must be learned (re-learned) each time a threshold is adjusted. As a result, the time until the learning of the target image detection model is completed can be reduced.

(Appendix 13)
The information processing device according to Appendix 12,
wherein the pseudo label reference learning means re-learns the first detection model and the second detection model as the learning of the target image detection model.

According to the configuration of Supplementary Note 13, the first detection model and the second detection model are re-learned using the pseudo-labeled first data set and the pseudo-labeled second data set. Therefore, according to the configuration of Supplementary Note 13, the cost of re-learning the first detection model and the second detection model can be reduced. Also, if appropriate values can be determined for the first threshold and the second threshold, the number of threshold adjustments can be reduced, and so can the number of times the detection models must be re-learned each time a threshold is adjusted. As a result, the time until re-learning of the detection models is completed can be reduced.

(Appendix 14)
The information processing device according to appendix 12 or 13,
wherein the first threshold determination means further determines a third threshold smaller than the first threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the first evaluation data set into the first detection model, with the one or more correct labels attached to each of the one or more images,
the second threshold determination means further determines a fourth threshold smaller than the second threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the second evaluation data set into the second detection model, with the one or more correct labels attached to each of the one or more images, and
the information processing device further comprises:
a first non-learning region determination means for determining, in the pseudo-labeled second data set generated by the first data set generation means, a region corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the third threshold, among the one or more inference results obtained by the first inference means, as a non-learning region that is not subject to learning by the pseudo label reference learning means; and
a second non-learning region determination means for determining, in the pseudo-labeled first data set generated by the second data set generation means, a region corresponding to an inference result having a reliability less than the second threshold and equal to or higher than the fourth threshold, among the one or more inference results obtained by the second inference means, as a non-learning region that is not subject to learning by the pseudo label reference learning means.

Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the third threshold, that pseudo label tends to have low reliability. The same holds for a region corresponding to an inference result having a reliability less than the second threshold and equal to or higher than the fourth threshold. By setting such regions as non-learning regions, the configuration of Supplementary Note 14 allows re-learning to be performed using pseudo labels with relatively high reliability, so the detection accuracy of the target image detection model can be improved.
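The two-threshold rule above can be sketched as a simple partition of inference results by reliability. The function name, the `score` key, and the numeric thresholds are assumptions for illustration; what happens to results below the lower threshold is not stated here, so this sketch merely collects them separately.

```python
def partition_by_reliability(inference_results, upper, lower):
    """Split results into pseudo labels, non-learning regions, and the rest.

    upper: threshold for pseudo labels (e.g. the first threshold)
    lower: smaller threshold bounding the non-learning band (e.g. the third)
    """
    if lower >= upper:
        raise ValueError("lower threshold must be smaller than upper threshold")
    pseudo_labels, non_learning, remainder = [], [], []
    for result in inference_results:
        if result["score"] >= upper:
            pseudo_labels.append(result)   # reliable enough to train on
        elif result["score"] >= lower:
            non_learning.append(result)    # ambiguous: excluded from learning
        else:
            remainder.append(result)       # below both thresholds
    return pseudo_labels, non_learning, remainder


# Hypothetical inference results for one image
results = [{"box": (0, 0, 10, 10), "score": s} for s in (0.9, 0.8, 0.79, 0.5, 0.2)]
pseudo, ignored, rest = partition_by_reliability(results, upper=0.8, lower=0.5)
```

Both boundaries are inclusive on the lower side, matching the wording "equal to or higher than" used for each threshold.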

(Appendix 15)
An information processing device comprising:
acquisition means for acquiring a target image; and
detection means for detecting an object included in the target image using a target image detection model,
wherein the target image detection model is a model trained by:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

According to the configuration of Supplementary Note 15, pseudo labels are determined using an automatically determined threshold, and an object included in the target image is detected using a target image detection model trained on a data set including images associated with those pseudo labels. Therefore, according to the configuration of Supplementary Note 15, an object included in the target image can be detected using a target image detection model for which the cost of adjusting the threshold has been reduced.

(Appendix 16)
The information processing device according to appendix 15,
wherein, in the threshold determination process, a second threshold smaller than the threshold (the first threshold) is also determined with reference to the comparison result,
in the data set generation process, a region in the pseudo-labeled data set corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the second threshold, among the one or more inference results obtained by the inference process, is determined as a non-learning region that is not subject to learning by the pseudo label reference learning process, and
in the pseudo-label reference learning process, the target image detection model is learned with reference to the pseudo-labeled data set including the non-learning region.

Even if a pseudo label is assigned to a region corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the second threshold, that pseudo label tends to have low reliability. By setting such a region as a non-learning region, re-learning can be performed using pseudo labels with relatively high reliability, so the detection accuracy of the target image detection model can be improved.
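One way a non-learning region might be kept out of learning is to skip any training-time prediction that overlaps such a region when the loss is computed. The text above does not specify the mechanism, so the intersection-over-union (IoU) criterion, the 0.5 cut-off, the `(x1, y1, x2, y2)` box format, and both function names below are assumptions of this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def excluded_from_loss(predicted_box, non_learning_boxes, min_iou=0.5):
    """True if the prediction overlaps any non-learning region strongly enough
    that it should count neither as a positive nor as a negative example."""
    return any(iou(predicted_box, region) >= min_iou
               for region in non_learning_boxes)


# Hypothetical non-learning region recorded in a pseudo-labeled image
non_learning_regions = [(1, 1, 10, 10)]
skip_near = excluded_from_loss((0, 0, 10, 10), non_learning_regions)
skip_far = excluded_from_loss((20, 20, 30, 30), non_learning_regions)
```

Under this design, an ambiguous region neither rewards nor penalizes the model, which avoids training on a low-reliability pseudo label and also avoids treating a possible object as background.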

(Appendix 17)
An information processing method comprising:
a learning step of learning a detection model using a first data set;
a threshold determination step of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained in the inference step, and associating the pseudo label with the corresponding image.

According to the configuration of Supplementary Note 17, the same effect as that of the information processing apparatus described in Supplementary Note 1 is achieved.

(Appendix 18)
The information processing method according to appendix 17,
wherein, in the threshold determination step, a second threshold smaller than the threshold (the first threshold) is also determined with reference to the comparison result, and
in the data set generation step, a region in the pseudo-labeled data set corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the second threshold, among the one or more inference results obtained in the inference step, is determined as a non-learning region that is not subject to learning in the pseudo label reference learning step.

(Appendix 19)
An information processing method comprising:
acquiring a target image; and
detecting an object included in the target image using a target image detection model,
wherein the target image detection model is a model trained by:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

According to the configuration of Supplementary Note 19, the same effects as those of the information processing apparatus described in Supplementary Note 15 are obtained.

(Appendix 20)
A method for manufacturing a detection model, comprising:
a learning step of learning a detection model using a first data set;
a threshold determination step of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained in the inference step, and associating the pseudo label with the corresponding image; and
a pseudo label reference learning step of learning a target image detection model for detecting an object included in a target image, using the pseudo-labeled data set.

According to the configuration of Supplementary Note 20, the inference result of the image included in the evaluation data set by the detection model trained using the first data set is compared with the correct label associated with the image. Based on this, the threshold value for pseudo-label setting is automatically determined. Therefore, according to the configuration of Supplementary Note 20, it is possible to reduce the cost for adjusting the threshold. Then, according to the configuration of Supplementary Note 20, learning of the target image detection model is performed using the data set to which the pseudo label has been assigned. Therefore, according to the configuration of Supplementary Note 20, it is possible to manufacture the target image detection model while reducing the cost for adjusting the threshold value. As a result, it is possible to reduce the cost of learning the target image detection model. Also, if an appropriate value can be determined as the threshold value, the number of times the threshold value is adjusted can be reduced, and the number of times of learning required each time the threshold value is adjusted can be reduced. As a result, it is possible to reduce the time until the learning of the target image detection model is completed.
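The automatic threshold determination described above can be illustrated with a small sweep over candidate thresholds on the evaluation data set. This is a sketch under stated assumptions: each evaluated inference result is assumed to carry a precomputed flag `matches_correct_label` (the outcome of comparing it with the correct labels), and choosing the candidate that maximizes the F1 score is only one possible selection criterion based on precision and recall, not necessarily the one intended here.

```python
def precision_recall(evaluated, num_correct_labels, threshold):
    """Precision and recall of the results kept at the given threshold."""
    kept = [e for e in evaluated if e["score"] >= threshold]
    true_positives = sum(1 for e in kept if e["matches_correct_label"])
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_correct_labels if num_correct_labels else 0.0
    return precision, recall


def determine_threshold(evaluated, num_correct_labels):
    """Return the candidate threshold with the highest F1 score.

    Candidates are the distinct reliability values seen on the evaluation set;
    ties are resolved in favor of the lowest candidate.
    """
    best_threshold, best_f1 = None, -1.0
    for candidate in sorted({e["score"] for e in evaluated}):
        p, r = precision_recall(evaluated, num_correct_labels, candidate)
        f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold


# Hypothetical evaluation results: (reliability, matched a correct label?)
evaluated = [{"score": s, "matches_correct_label": m}
             for s, m in [(0.9, True), (0.8, True), (0.7, False),
                          (0.6, True), (0.3, False), (0.2, False)]]
threshold = determine_threshold(evaluated, num_correct_labels=4)
```

With these example values the sweep selects 0.6, where precision and recall are both 0.75. Because the threshold comes from the evaluation data rather than from manual tuning, the re-learning loop does not have to be repeated for each hand-picked value.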

(Appendix 21)
A method for manufacturing a detection model according to Appendix 20,
wherein, in the threshold determination step, a second threshold smaller than the threshold (the first threshold) is also determined with reference to the comparison result, and
in the data set generation step, a region in the pseudo-labeled data set corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the second threshold, among the one or more inference results obtained in the inference step, is determined as a non-learning region that is not subject to learning in the pseudo label reference learning step.

(Appendix 22)
A program for causing a computer to function as an information processing device, the program causing the computer to function as:
a learning means for learning a detection model using a first data set;
a threshold determination means for determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
a data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference means, and associating the pseudo label with the corresponding image.

According to the configuration of Supplementary Note 22, the same effects as those of the information processing apparatus described in Supplementary Note 1 are achieved.

(Appendix 23)
A program for causing a computer to function as an information processing device, the program causing the computer to function as:
acquisition means for acquiring a target image; and
detection means for detecting an object included in the target image using a target image detection model,
wherein the target image detection model is a model trained by:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

According to the configuration of Supplementary Note 23, the same effect as the information processing apparatus described in Supplementary Note 15 is achieved.

[Appendix 3]
Some or all of the embodiments described above can also be expressed as follows.

An information processing device including at least one processor, wherein the processor executes: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image.

Note that the information processing device may further include a memory, and the memory may store a program for causing the processor to execute the learning process, the threshold determination process, the inference process, and the data set generation process. This program may also be recorded on a computer-readable non-transitory tangible recording medium.

An information processing device including at least one processor, wherein the processor executes an acquisition process of acquiring a target image and a detection process of detecting an object included in the target image using a target image detection model, and the target image detection model is a model trained by: a learning process of learning a detection model using a first data set; a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images; an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.

Note that the information processing device may further include a memory, and the memory may store a program for causing the processor to execute the acquisition process and the detection process. This program may also be recorded on a computer-readable non-transitory tangible recording medium.

10, 10a, 10b, 10c, 20, 20a Information processing device
101 Learning unit
101-1 First learning unit
101-2 Second learning unit
102 Threshold determination unit
102-1 First threshold determination unit
102-2 Second threshold determination unit
103 Inference unit
103-1 First inference unit
103-2 Second inference unit
104 Data set generation unit
104-1 First data set generation unit
104-2 Second data set generation unit
105 Re-learning unit
106 Non-learning area determination unit
106-1 First non-learning area determination unit
106-2 Second non-learning area determination unit
201 Acquisition unit
202 Detection unit
DS1 Data set 1
DS1' Data set 1'
DS2 Data set 2
DS2' Data set 2'
DSE Evaluation data set
DSE1 Evaluation data set 1
DSE2 Evaluation data set 2
DM Object detection model

Claims

1. An information processing apparatus comprising:
a learning means for learning a detection model using a first data set;
a threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
a data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference means, and associating the pseudo label with the corresponding image.
2. The information processing device according to claim 1, further comprising pseudo label reference learning means for learning a target image detection model for detecting an object included in a target image, using the pseudo-labeled data set.
3. The information processing apparatus according to claim 2, wherein the pseudo label reference learning means re-learns the detection model as the learning of the target image detection model.
4. The information processing apparatus according to claim 2, wherein the threshold determination means further determines a second threshold smaller than the first threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the evaluation data set into the detection model, with the one or more correct labels attached to each of the one or more images, and
the information processing apparatus further comprises non-learning area determination means for determining, in the pseudo-labeled data set generated by the data set generation means, an area corresponding to an inference result having a reliability less than the first threshold and equal to or greater than the second threshold, among the one or more inference results obtained by the inference means, as a non-learning area that is not subject to learning by the pseudo label reference learning means.
5. The information processing apparatus according to any one of claims 1 [...], wherein the correct label includes area information indicating the area of an object included in the image associated with the correct label and category information indicating the category of the object, and
the pseudo label includes area information indicating the area of an object included in the image associated with the pseudo label and category information indicating the category of the object.
6. The information processing apparatus according to claim 5, wherein one or more correct labels are attached to at least a part of the one or more images included in the second data set, and
the data set generation means deletes the pseudo label when a correct label is attached to an object included in the image associated with the pseudo label and the degree of overlap between the area indicated by the area information included in the pseudo label and the area indicated by the area information included in the correct label is equal to or greater than a predetermined degree.
7. The information processing apparatus according to claim 5, wherein the threshold determination means sets the first threshold for each category, and
the data set generation means generates the pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold set for each category among the one or more inference results obtained by the inference means, and associating the pseudo label with the corresponding image.
8. The information processing apparatus according to any one of claims 1 to 7, wherein said threshold determination means determines said first threshold by referring to a precision and a recall indicated by said comparison result.
9. The information processing apparatus according to claim 1, wherein said evaluation data set is included in said first data set.
10. The information processing apparatus according to any one of claims 1 to 8, wherein the evaluation data set is generated by giving correct labels to a part of the second data set.
11. An information processing device comprising:
a first learning means for learning a first detection model using a first data set;
a second learning means for learning a second detection model using a second data set;
a first threshold determination means for determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a first evaluation data set into the first detection model, with one or more correct labels attached to each of the one or more images;
a second threshold determination means for determining a second threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in a second evaluation data set into the second detection model, with one or more correct labels attached to each of the one or more images;
a first inference means for obtaining one or more inference results for each of one or more images included in the second data set by inputting each of the one or more images into the first detection model;
a second inference means for obtaining one or more inference results for each of one or more images included in the first data set by inputting each of the one or more images into the second detection model;
a first data set generation means for generating a pseudo-labeled second data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the first inference means, and associating the pseudo label with the corresponding image; and
a second data set generation means for generating a pseudo-labeled first data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the second threshold among the one or more inference results obtained by the second inference means, and associating the pseudo label with the corresponding image.
12. The information processing apparatus according to claim 11, further comprising pseudo label reference learning means for learning a target image detection model for detecting an object included in a target image, using the pseudo-labeled first data set and the pseudo-labeled second data set.
13. The information processing device according to claim 12, wherein said pseudo label reference learning means re-learns said first detection model and said second detection model as the learning of said target image detection model.
14. The information processing apparatus according to claim 12 or 13, wherein the first threshold determination means further determines a third threshold smaller than the first threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the first evaluation data set into the first detection model, with the one or more correct labels attached to each of the one or more images,
the second threshold determination means further determines a fourth threshold smaller than the second threshold with reference to the result of comparing the one or more inference results, obtained by inputting each of the one or more images included in the second evaluation data set into the second detection model, with the one or more correct labels attached to each of the one or more images, and
the information processing apparatus further comprises:
a first non-learning region determination means for determining, in the pseudo-labeled second data set generated by the first data set generation means, a region corresponding to an inference result having a reliability less than the first threshold and equal to or higher than the third threshold, among the one or more inference results obtained by the first inference means, as a non-learning region that is not subject to learning by the pseudo label reference learning means; and
a second non-learning region determination means for determining, in the pseudo-labeled first data set generated by the second data set generation means, a region corresponding to an inference result having a reliability less than the second threshold and equal to or higher than the fourth threshold, among the one or more inference results obtained by the second inference means, as a non-learning region that is not subject to learning by the pseudo label reference learning means.
15. An information processing apparatus comprising:
acquisition means for acquiring a target image; and
detection means for detecting an object included in the target image using a target image detection model,
wherein the target image detection model is a model trained by:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning process of learning the target image detection model with reference to the pseudo-labeled data set.
16. The information processing apparatus according to claim 15, wherein
in the threshold determination process, a second threshold smaller than the first threshold is also determined with reference to the comparison result,
in the data set generation process, a region in the pseudo-labeled data set corresponding to an inference result that is among the one or more inference results obtained by the inference process and that has a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning by the pseudo-label reference learning process, and
in the pseudo-label reference learning process, the target image detection model is learned by referring to the pseudo-labeled data set including the non-learning region.
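The two-threshold split recited above can be illustrated with a minimal Python sketch. The `Detection` type, the function name `split_inference_results`, and the numeric thresholds are hypothetical names and values chosen for illustration; the claims do not prescribe any data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Detection:
    """One inference result: a box, a class name, and a reliability score."""
    box: Box
    label: str
    reliability: float


def split_inference_results(
    detections: List[Detection],
    first_threshold: float,
    second_threshold: float,
) -> Tuple[List[Detection], List[Box]]:
    """Return (pseudo_labels, non_learning_regions) for one image.

    reliability >= first_threshold                     -> pseudo label
    second_threshold <= reliability < first_threshold  -> non-learning region
    reliability < second_threshold                     -> neither (dropped)
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be smaller than first_threshold")
    pseudo_labels = [d for d in detections if d.reliability >= first_threshold]
    non_learning_regions = [
        d.box for d in detections
        if second_threshold <= d.reliability < first_threshold
    ]
    return pseudo_labels, non_learning_regions


# Illustrative values only (assumed, not taken from the specification).
detections = [
    Detection((10, 10, 50, 50), "car", 0.92),
    Detection((60, 20, 90, 70), "car", 0.55),
    Detection((5, 80, 30, 95), "person", 0.12),
]
pseudo_labels, non_learning_regions = split_inference_results(detections, 0.8, 0.4)
```

In this sketch the first detection becomes a pseudo label, the second only marks a region to be excluded from learning, and the third is dropped.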
An information processing method comprising:
a learning step of learning a detection model using a first data set;
a threshold determination step of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo label with the corresponding image,
the information processing method further comprising a pseudo-label reference learning step of learning a target image detection model for detecting an object included in a target image, using the first data set after the pseudo labeling and the second data set after the pseudo labeling.
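One possible reading of the threshold determination step is sketched below in Python. The claim only requires that the threshold be determined with reference to the comparison between inference results and correct labels. The IoU-based matching, the `min_iou` value of 0.5, and the rule "smallest cut-off whose retained results reach a target precision" are assumptions made for this sketch, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mark_correct(
    results: Sequence[Tuple[Box, float]],
    correct_boxes: Sequence[Box],
    min_iou: float = 0.5,  # assumed matching criterion
) -> List[Tuple[float, bool]]:
    """Compare one image's inference results with its correct labels.

    Returns (reliability, is_correct) pairs. Results are visited from the
    highest reliability down, and each correct box is matched at most once.
    """
    unmatched = list(correct_boxes)
    marked = []
    for box, reliability in sorted(results, key=lambda r: -r[1]):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        hit = best is not None and iou(box, best) >= min_iou
        if hit:
            unmatched.remove(best)
        marked.append((reliability, hit))
    return marked


def determine_threshold(
    marked: Sequence[Tuple[float, bool]], target_precision: float
) -> float:
    """Smallest reliability cut-off whose retained results reach target_precision."""
    ranked = sorted(marked, key=lambda m: -m[0])
    threshold = float("inf")  # fallback: keep nothing if the target is never met
    true_positives = 0
    for count, (reliability, hit) in enumerate(ranked, start=1):
        true_positives += hit
        # Evaluate only at the end of a run of equal reliabilities, because a
        # cut-off at that value keeps the whole run.
        if count < len(ranked) and ranked[count][0] == reliability:
            continue
        if true_positives / count >= target_precision:
            threshold = reliability
    return threshold


# Illustrative evaluation image: four inference results, two correct labels.
results = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.7),
           ((50, 50, 60, 60), 0.6), ((0, 0, 9, 9), 0.3)]
correct = [(0, 0, 10, 10), (50, 50, 60, 60)]
marked = mark_correct(results, correct)
first_threshold = determine_threshold(marked, target_precision=0.6)
```

With a real evaluation data set, the marked pairs from all images would be concatenated before calling `determine_threshold`.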
18. The information processing method according to claim 17, wherein
in the threshold determination step, a second threshold smaller than the first threshold is also determined with reference to the comparison result, and
in the data set generation step, a region in the pseudo-labeled data set corresponding to an inference result that is among the one or more inference results obtained in the inference step and that has a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning in the pseudo-label reference learning step.
An information processing method including:
obtaining a target image; and
detecting an object included in the target image using a target image detection model,
wherein the target image detection model is a model obtained by processing that includes:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.
A detection model manufacturing method comprising:
a learning step of learning a detection model using a first data set;
a threshold determination step of determining a first threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference step of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation step of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the first threshold among the one or more inference results obtained in the inference step, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning step of learning, using the pseudo-labeled data set, a target image detection model for detecting an object included in a target image.
21. The detection model manufacturing method according to claim 20, wherein
in the threshold determination step, a second threshold smaller than the first threshold is also determined with reference to the comparison result, and
in the data set generation step, a region in the pseudo-labeled data set corresponding to an inference result that is among the one or more inference results obtained in the inference step and that has a reliability less than the first threshold and equal to or greater than the second threshold is determined as a non-learning region that is not subject to learning in the pseudo-label reference learning step.
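The claims state that a non-learning region is not subject to learning, but do not say how this is realized. One plausible realization, shown here purely as an assumption, is to give zero weight to the loss contributions that fall inside such a region. The grid-cell formulation and the names `loss_mask` and `masked_mean_loss` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def loss_mask(
    width: int, height: int, cell: int, non_learning_regions: Sequence[Box]
) -> List[List[float]]:
    """Weight 0.0 for grid cells whose centre lies in a non-learning region, else 1.0."""
    rows, cols = height // cell, width // cell
    mask = [[1.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell
            if any(x0 <= cx <= x1 and y0 <= cy <= y1
                   for x0, y0, x1, y1 in non_learning_regions):
                mask[r][c] = 0.0
    return mask


def masked_mean_loss(
    per_cell_loss: Sequence[Sequence[float]], mask: Sequence[Sequence[float]]
) -> float:
    """Mean of the per-cell losses over the cells that remain subject to learning."""
    total = sum(l * m for lr, mr in zip(per_cell_loss, mask) for l, m in zip(lr, mr))
    kept = sum(m for mr in mask for m in mr)
    return total / kept if kept else 0.0


# A 40x20 image on a 10-pixel grid, with one non-learning region.
mask = loss_mask(40, 20, 10, [(10, 0, 30, 10)])
per_cell_loss = [[1.0, 9.0, 9.0, 1.0],
                 [1.0, 1.0, 1.0, 1.0]]
loss = masked_mean_loss(per_cell_loss, mask)
```

In this sketch the two large loss values inside the non-learning region do not influence the result, so an uncertain detection neither rewards nor penalizes the model during re-learning.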
A program for causing a computer to function as an information processing apparatus, the program causing the computer to function as:
learning means for learning a detection model using a first data set;
threshold determination means for determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
inference means for obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model; and
data set generation means for generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results by the inference means, and associating the pseudo label with the corresponding image.
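How the four means might be chained in a program can be sketched as follows. This is a data-flow illustration only: the training and threshold functions are passed in as callables, the inference result is simplified to a (label, reliability) pair, and the toy stand-ins at the bottom are hypothetical and perform no real learning.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, float]             # simplified inference result: (label, reliability)
Model = Callable[[str], List[Result]]  # maps an image id to its inference results
Labeled = Dict[str, List[str]]         # image id -> labels


def generate_pseudo_labeled_dataset(
    train: Callable[[Labeled], Model],
    determine_threshold: Callable[[Model, Labeled], float],
    first_dataset: Labeled,
    evaluation_dataset: Labeled,
    second_dataset: List[str],
) -> Labeled:
    model = train(first_dataset)                                # learning means
    threshold = determine_threshold(model, evaluation_dataset)  # threshold determination means
    pseudo_labeled: Labeled = {}
    for image in second_dataset:
        results = model(image)                                  # inference means
        # data set generation means: keep results at or above the threshold
        pseudo_labeled[image] = [label for label, rel in results if rel >= threshold]
    return pseudo_labeled


# Toy stand-ins (assumptions for illustration only).
def toy_train(first_dataset: Labeled) -> Model:
    table = {"img_a": [("car", 0.9), ("person", 0.3)], "img_b": [("car", 0.4)]}
    return lambda image: table.get(image, [])


def toy_threshold(model: Model, evaluation_dataset: Labeled) -> float:
    return 0.5


pseudo_labeled = generate_pseudo_labeled_dataset(
    toy_train, toy_threshold, {"img_0": ["car"]}, {"img_e": ["car"]}, ["img_a", "img_b"]
)
```

Passing the units in as callables keeps the sketch independent of any particular detector or training framework.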
A program for causing a computer to function as an information processing apparatus, the program causing the computer to function as:
acquisition means for acquiring a target image; and
detection means for detecting an object included in the target image using a target image detection model,
wherein the target image detection model is a model obtained by processing that includes:
a learning process of learning a detection model using a first data set;
a threshold determination process of determining a threshold with reference to a result of comparing one or more inference results, obtained by inputting each of one or more images included in an evaluation data set into the detection model, with one or more correct labels attached to each of the one or more images;
an inference process of obtaining one or more inference results for each of one or more images included in a second data set by inputting each of the one or more images into the detection model;
a data set generation process of generating a pseudo-labeled data set by setting, as a pseudo label, an inference result having a reliability equal to or higher than the threshold among the one or more inference results obtained by the inference process, and associating the pseudo label with the corresponding image; and
a pseudo-label reference learning process of learning the target image detection model by referring to the pseudo-labeled data set.