CN114494787A - Image tag determination method and device, electronic equipment and storage medium - Google Patents

Image tag determination method and device, electronic equipment and storage medium

Info

Publication number
CN114494787A
CN114494787A
Authority
CN
China
Prior art keywords
image data
data
determining
image
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210143921.8A
Other languages
Chinese (zh)
Inventor
王粟瑶
梁俪倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Information Technology Co Ltd
Original Assignee
Beijing Horizon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Information Technology Co Ltd filed Critical Beijing Horizon Information Technology Co Ltd
Priority to CN202210143921.8A priority Critical patent/CN114494787A/en
Publication of CN114494787A publication Critical patent/CN114494787A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure discloses a method and a device for determining an image tag, an electronic device and a storage medium. The method comprises the following steps: acquiring first image data; predicting the first image data based on a trained first semi-supervised semantic segmentation model to obtain a corresponding first prediction result, wherein the first prediction result comprises first probability data corresponding to the first image data, and the first probability data comprises the probability that each first pixel in the first image data belongs to each type; determining first pseudo tag data corresponding to the first image data based on the first probability data, the first pseudo tag data comprising the type to which each first pixel belongs; determining a first confidence corresponding to each first pixel based on the first probability data; and determining first label data corresponding to the first image data based on the first confidence corresponding to each first pixel and the first pseudo label data. Automatic labeling of images is thus achieved, which effectively reduces the workload of manual labeling and improves work efficiency.

Description

Image tag determination method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a method and an apparatus for determining an image tag, an electronic device, and a storage medium.
Background
In the field of computer vision, semantic segmentation of images is realized with various models, and training these models requires a large amount of labeled image data. To reduce the manual effort of data labeling, semi-supervised training of semantic segmentation models has gradually become an important technique: it trains a semantic segmentation model on a combination of labeled and unlabeled data. To further optimize the performance of such a model, image data of hard samples that are more valuable to model training is usually identified and labeled manually, and the hard-sample image data together with the manual labels is then used for optimization training of the model. However, labeling the hard samples manually increases the labeling workload and leads to low work efficiency.
Disclosure of Invention
The embodiments of the present disclosure solve technical problems such as the low work efficiency of manually labeling hard samples. The embodiments of the disclosure provide a method and a device for determining an image tag, an electronic device and a storage medium.
According to an aspect of an embodiment of the present disclosure, there is provided an image tag determination method including: acquiring first image data; predicting the first image data based on a first semi-supervised semantic segmentation model obtained by training to obtain a corresponding first prediction result; the first prediction result comprises first probability data corresponding to the first image data, wherein the first probability data comprises probabilities that first pixels in the first image data belong to different types respectively; determining, based on the first probability data, first pseudo tag data corresponding to the first image data, the first pseudo tag data including a type to which each of the first pixels belongs; determining a first confidence corresponding to each first pixel based on the first probability data; and determining first label data corresponding to the first image data based on a first confidence degree corresponding to each first pixel and the first pseudo label data.
According to another aspect of the embodiments of the present disclosure, there is provided an image tag determination apparatus including: the first acquisition module is used for acquiring first image data; the first processing module is used for predicting the first image data based on a first semi-supervised semantic segmentation model obtained by training to obtain a corresponding first prediction result; the first prediction result comprises first probability data corresponding to the first image data, wherein the first probability data comprises probabilities that first pixels in the first image data belong to different types respectively; a second processing module, configured to determine, based on the first probability data, first pseudo tag data corresponding to the first image data, where the first pseudo tag data includes a type to which each of the first pixels belongs; a third processing module, configured to determine, based on the first probability data, first confidence levels corresponding to the first pixels, respectively; and the fourth processing module is configured to determine first tag data corresponding to the first image data based on the first confidence degree and the first pseudo tag data respectively corresponding to each first pixel.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the method for determining an image tag according to any one of the above embodiments of the present disclosure.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for determining an image tag according to any of the above embodiments of the present disclosure.
Based on the image tag determination method and device, the electronic device and the storage medium provided by the embodiments of the present disclosure, the first image data to be labeled can be predicted based on the semi-supervised semantic segmentation model; the pseudo labels corresponding to the first image data and the confidence corresponding to each pixel in the first image data are determined based on the prediction result; and the label data corresponding to the first image data is determined based on the confidence corresponding to each pixel and the pseudo labels. Automatic labeling of images is thus achieved, which effectively reduces the workload of manual labeling and improves work efficiency.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is an exemplary application scenario of the image tag determination method provided by the present disclosure;
fig. 2 is a flowchart illustrating a method for determining an image tag according to an exemplary embodiment of the disclosure;
FIG. 3 is a flowchart of step 205 provided by an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of step 201 provided by an exemplary embodiment of the present disclosure;
FIG. 5 is a flowchart of step 2012 provided by an exemplary embodiment of the present disclosure;
fig. 6 is a flowchart of step 2012 provided by another exemplary embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a method for determining an image tag according to another exemplary embodiment of the present disclosure;
fig. 8 is a flowchart illustrating a method for determining an image tag according to still another exemplary embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an apparatus for determining an image tag according to an exemplary embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a fourth processing module 505 according to an exemplary embodiment of the disclosure;
fig. 11 is a schematic structural diagram of a first obtaining module 501 provided in an exemplary embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a third processing unit 5012 provided in an exemplary embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of a third processing unit 5012 provided in another exemplary embodiment of the present disclosure;
fig. 14 is a schematic structural diagram of an apparatus for determining an image tag according to another exemplary embodiment of the present disclosure;
fig. 15 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In the process of implementing the present disclosure, the inventors found that, in the field of computer vision, training a semantic segmentation model requires a large amount of labeled image data. To reduce the manual effort of data labeling, semi-supervised training of semantic segmentation models has gradually become an important technique, training a semantic segmentation model on a combination of labeled and unlabeled data. To further optimize the performance of the model, image data of hard samples that are more valuable to model training is usually identified and labeled manually, and the hard-sample image data together with the manual labels is then used for optimization training of the model. However, labeling the hard samples manually increases the labeling workload and leads to low work efficiency.
Brief description of the drawings
Fig. 1 is an exemplary application scenario of the image tag determination method provided in the present disclosure. For mined hard sample image data, the tag data can be determined automatically by executing the image tag determination method of the present disclosure with the image tag determination device of the present disclosure. Specifically, the hard sample image data to be labeled is predicted based on the semi-supervised semantic segmentation model; the pseudo labels corresponding to the hard sample image data and the confidence corresponding to each pixel are determined based on the prediction result; and the label data corresponding to the hard sample image data is determined based on the pseudo labels and the confidence corresponding to each pixel. Automatic labeling of the images is thus achieved, which effectively reduces the workload of manual labeling and improves work efficiency.
The image label determination method provided by the disclosure can be applied to any application field related to image semantic segmentation, including but not limited to the fields of automatic driving, geographic information systems, medical image analysis, robots and the like.
Exemplary method
Fig. 2 is a flowchart illustrating a method for determining an image tag according to an exemplary embodiment of the present disclosure. This embodiment can be applied to an electronic device such as a server or a terminal. As shown in Fig. 2, the method includes the following steps:
step 201, first image data is acquired.
The first image data may be any image data that needs to be labeled, for example image data of hard samples with higher value mined by any implementable method; this is not specifically limited. The first image data may be obtained in advance and stored at a storage location, from which it is retrieved when labeling is needed. The first image data may include one or more frames of images.
Step 202, predicting first image data based on a first semi-supervised semantic segmentation model obtained by training to obtain a corresponding first prediction result; the first prediction result includes first probability data corresponding to the first image data, and the first probability data includes probabilities that the first pixels in the first image data belong to the respective types.
The first semi-supervised semantic segmentation model may be any practicable semi-supervised semantic segmentation model, for example one based on DeepLabV3+ and its related series; the disclosure is not limited in this respect. One or more types can be set according to actual requirements; in the field of automatic driving, for example, the types may include obstacles such as people, animals, vehicles and road edges. The first probability data includes the probability that each first pixel in the first image data belongs to each type; for example, for a certain first pixel, the probability of belonging to a person is 0.1, to an animal 0.2, to a vehicle 0.6, and to a road edge 0.1. Predicting the first image data based on the first semi-supervised semantic segmentation model means taking the first image data as the input of the model and obtaining the output through the model's inference; the specific principle is not repeated here.
Step 203, determining first pseudo tag data corresponding to the first image data based on the first probability data, wherein the first pseudo tag data comprises a type to which each first pixel belongs.
The first probability data is encoded, for example, one-hot (one-hot) encoded, and the encoding result is used as the first pseudo tag data corresponding to the first image data.
Illustratively, the probability of a first pixel belonging to a person is 0.1, the probability of belonging to an animal is 0.2, the probability of belonging to a vehicle is 0.6, and the probability of belonging to a road edge is 0.1, and the label of the first pixel obtained after encoding is 0010.
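As an illustrative sketch of this step (an editorial addition, not part of the disclosure), the pseudo tag data can be derived by taking, per pixel, the type with the highest probability; taking the arg-max is equivalent to the one-hot encoding described above. All names below are hypothetical:

```python
import numpy as np

def make_pseudo_labels(prob_map: np.ndarray) -> np.ndarray:
    """Derive pseudo tag data from probability data.

    prob_map: shape (num_types, H, W); prob_map[c, i, j] is the probability
    that pixel (i, j) belongs to type c. Returns an (H, W) map holding the
    index of the most probable type per pixel.
    """
    return np.argmax(prob_map, axis=0)

# Example matching the text, with types (person, animal, vehicle, road edge):
pixel_probs = np.array([0.1, 0.2, 0.6, 0.1]).reshape(4, 1, 1)
print(make_pseudo_labels(pixel_probs))  # [[2]] -> "vehicle", i.e. one-hot 0010
```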
Step 204, determining a first confidence corresponding to each first pixel based on the first probability data.
The first confidence of a first pixel is information representing the credibility of the prediction result for that pixel, and it can be determined in any practicable manner, for example by calculating the difference between the two largest (top-2) probability values.
Illustratively, if the probability that a first pixel belongs to a person is 0.1, to an animal 0.2, to a vehicle 0.6, and to a road edge 0.1, the first confidence corresponding to the first pixel is 0.6 - 0.2 = 0.4.
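A minimal sketch of this top-2-margin confidence, under the same assumptions as the snippet above (shapes and names are illustrative):

```python
import numpy as np

def top2_margin_confidence(prob_map: np.ndarray) -> np.ndarray:
    """First confidence per pixel: the largest type probability minus the
    second-largest one (top-2 margin). prob_map: (num_types, H, W)."""
    sorted_probs = np.sort(prob_map, axis=0)[::-1]  # descending along the type axis
    return sorted_probs[0] - sorted_probs[1]

pixel_probs = np.array([0.1, 0.2, 0.6, 0.1]).reshape(4, 1, 1)
print(top2_margin_confidence(pixel_probs))  # [[0.4]], as in the example above
```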
Step 205, determining first label data corresponding to the first image data based on the first confidence degree and the first pseudo label data corresponding to each first pixel.
Specifically, whether the type corresponding to each first pixel is valid for supervision is determined based on the first confidence corresponding to that pixel. For a first pixel whose first confidence is smaller than the second threshold, the credibility of the classification result is poor, and the type to which the pixel belongs may be set as invalid for supervision; that is, when the classification result is used for supervised learning, the type of that pixel is not learned as a supervision label, so as to avoid learning an incorrect classification experience. For a first pixel whose first confidence is greater than the second threshold, the credibility of the classification result is better, and its type can be used as a label for supervised learning. The specific second threshold may be set according to actual requirements, and this embodiment is not limited.
In an optional example, a second confidence corresponding to the first image data may be determined by averaging the first confidences corresponding to the first pixels. The first image data may then be screened based on the second confidence: images whose second confidence is smaller than the first threshold are selected as image data of higher-value hard samples for the subsequent label determination. This may be set according to actual requirements.
The method for determining the image tag provided by this embodiment can predict the first image data to be labeled based on the semi-supervised semantic segmentation model, further determine the pseudo tag corresponding to the first image data and the confidence coefficient corresponding to each pixel in the first image data based on the prediction result, and further determine the tag data corresponding to the first image data based on the confidence coefficient corresponding to each pixel and the pseudo tag, thereby implementing automatic labeling of the image, effectively reducing the workload of manual labeling, and improving the work efficiency.
In an optional example, fig. 3 is a flowchart illustrating step 205 provided in an exemplary embodiment of the present disclosure, and in this example, step 205 may specifically include the following steps:
step 2051 determines, based on the first confidence degrees corresponding to the first pixels, second confidence degrees corresponding to the first image data.
Optionally, a certain rule may be set according to actual requirements to determine the second confidence, for example, the first confidence corresponding to each first pixel is summed and averaged, and the average is taken as the second confidence corresponding to the first image data.
Step 2052, if the second confidence is smaller than the first threshold, updating the type of the first pixel in the first pseudo tag data, whose first confidence is smaller than the second threshold, to the first type, to obtain second pseudo tag data corresponding to the first pseudo tag data, where the first type indicates that the first pixel is invalid for supervision.
Specifically, the first image data is screened based on the second confidence and the first threshold: a target image whose second confidence is smaller than the first threshold is selected as a high-value hard sample image. The types of the first pixels of the target image in the first pseudo tag data are then updated according to the first confidence and the second threshold: the type of each first pixel whose first confidence is smaller than the second threshold is updated to the first type. After the update, the second pseudo tag data is obtained; it includes the types of the first pixels that are valid for supervision and the first type for the first pixels that are invalid for supervision.
For the images with the second confidence degree greater than the first threshold, the type of the first pixel may not be updated, the first pseudo tag data may be directly used as the corresponding tag data, or the tags of the images may not be determined, which may be specifically set according to actual requirements.
Step 2053 is to use the second pseudo tag data as the first tag data corresponding to the first image data.
Specifically, a first pixel whose first confidence is smaller than the second threshold has a classification result of poor credibility, so its type is set as invalid for supervision; that is, in supervised learning the type of that pixel is not used as a supervision label, which avoids learning an incorrect classification experience. A first pixel whose first confidence is greater than the second threshold has a classification result of better credibility, and its type can be used as a label for supervised learning. Therefore, the updated second pseudo label data can be used as the first label data corresponding to the first image data.
The second pseudo tag data may be the updated pseudo tag data of only the target images in the first image data whose second confidence is smaller than the first threshold, or it may include both the updated pseudo tag data of the target images and the first pseudo tag data corresponding to the other images in the first image data; this may be set according to actual requirements.
In this way, the first image data is screened based on the second confidence corresponding to the first image data and the first threshold, and a target image whose second confidence is smaller than the first threshold is selected as a high-value hard sample image. The types of the first pixels of the target image in the first pseudo label data are updated according to the first confidence and the second threshold, the type of each first pixel whose first confidence is smaller than the second threshold being updated to the first type, and the second pseudo label data is obtained after the update. As a result, the types of first pixels with low confidence are not used for supervised learning, which effectively avoids learning an incorrect classification experience and improves the accuracy and effectiveness of the determined labels.
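Steps 2051 to 2053 can be sketched as follows; this is an editorial illustration in which the thresholds, the ignore value 255 and all names are assumptions rather than values fixed by the disclosure:

```python
import numpy as np

IGNORE_TYPE = 255  # assumed marker for the first type (invalid for supervision)

def build_first_label_data(pseudo_labels, confidence,
                           first_threshold=0.5, second_threshold=0.3):
    """Sketch of steps 2051-2053; both threshold values are illustrative.

    pseudo_labels: (H, W) first pseudo tag data (type index per pixel).
    confidence:    (H, W) first confidence per pixel.
    Returns the second pseudo tag data if the image-level (second) confidence
    is below first_threshold, i.e. the image is a high-value hard sample;
    otherwise returns None.
    """
    second_confidence = confidence.mean()        # average over all first pixels
    if second_confidence >= first_threshold:
        return None                              # not screened as a hard sample
    labels = pseudo_labels.copy()
    labels[confidence < second_threshold] = IGNORE_TYPE  # invalid for supervision
    return labels
```

The value 255 matches the ignore_index convention of common segmentation losses, which is one way such a marker keeps low-confidence pixels out of supervised learning.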
In an alternative example, fig. 4 is a flowchart of step 201 provided by an exemplary embodiment of the present disclosure, in this example, the first image data is hard sample image data; step 201 may specifically include the following steps:
in step 2011, unlabeled image data is obtained.
The non-label image data may be image data without a label acquired by any method, and the non-label image data may include one or more images, for example, a large number of non-label images may be acquired as the non-label image data.
Step 2012, the first image data is determined based on the unlabeled image data.
First image data belonging to a difficult sample can be mined from unlabeled image data in any practicable manner.
For example, prediction is performed based on a plurality of semantic segmentation models, an image with large difference of prediction results is obtained as a difficult sample, and first image data is formed; for another example, a semantic segmentation model is used for prediction, and an image with a low confidence coefficient of a prediction result is obtained and is used as a difficult sample to form first image data; it is also possible to combine difficult samples obtained in various ways as the first image data. The method can be specifically set according to actual requirements.
In this way, high-value hard sample image data is mined from a large amount of unlabeled image data and used as the first image data for subsequent automatic labeling, which further improves work efficiency.
In an alternative example, fig. 5 is a flowchart of step 2012 provided by an exemplary embodiment of the present disclosure, and in this example, the determining the first image data based on the unlabeled image data in step 2012 includes:
step 20121a, predicting the unlabeled image data based on the trained first semantic segmentation model to obtain a second prediction result corresponding to the unlabeled image data; the second prediction result includes a first probability that each second pixel in the unlabeled image data belongs to each type.
The first semantic segmentation model may be any implementable semantic segmentation model, and the specific network architecture is not limited; for example, the FCN (Fully Convolutional Network)-based semantic segmentation models and their series, the UNet-based models and their series, the DeepLab-based models and their series, and the like may be set according to actual requirements. The detailed prediction principle is not described here.
Step 20122a, predicting the unlabeled image data based on the trained second semantic segmentation model to obtain a third prediction result corresponding to the unlabeled image data; the third prediction result comprises second probabilities that second pixels in the unlabeled image data belong to the types respectively.
The second semantic segmentation model is a semantic segmentation model different from the first semantic segmentation model, and the difference may be that the network structures are different, or the network structures are the same and the network parameters are different, and specifically, the second semantic segmentation model may be set according to actual requirements, and the disclosure is not limited. The detailed prediction principle is not described herein.
Steps 20121a and 20122a may be performed in any order.
Step 20123a, determining corresponding third pseudo tag data according to the second prediction result, and determining corresponding fourth pseudo tag data according to the third prediction result; the third pseudo label data comprise the type of each second pixel corresponding to the second prediction result; the fourth pseudo tag data includes a type to which each second pixel corresponding to the third prediction result belongs.
The determination principle of the third pseudo tag data and the fourth pseudo tag data is similar to that of the first pseudo tag data, and is not described herein again.
Step 20124a, determining a difference value between the third pseudo tag data and the fourth pseudo tag data according to the third pseudo tag data and the fourth pseudo tag data; the difference value is an IOU value and/or a pixel difference number.
The IOU value is a value obtained by an IOU (Intersection over Union) calculation, and the pixel difference number refers to the number of pixels at the same position that do not belong to the same type.
In an alternative example, determining the IOU value between the third pseudo tag data and the fourth pseudo tag data based on the third pseudo tag data and the fourth pseudo tag data includes:
For each type: a first number is determined as the number of second pixels at the same position that belong to the type in both the third pseudo tag data and the fourth pseudo tag data, and a second number is determined as the number of pixels in the union of the second pixels belonging to the type in the third pseudo tag data and the fourth pseudo tag data; the ratio of the first number to the second number is taken as the IOU value corresponding to the type, and the IOU values of all types are averaged to obtain the IOU value between the third pseudo tag data and the fourth pseudo tag data.
Illustratively, the types include person, animal, vehicle and road edge. Taking the type "person" as an example, suppose the number of second pixels belonging to a person in the third pseudo tag data is 40 and the number in the fourth pseudo tag data is 50, of which 35 are located at the same positions in both. Then the first number is 35, the second number is (40 - 35) + 35 + (50 - 35) = 55, and the IOU value corresponding to the type "person" is 35/55 ≈ 0.636. The IOU values corresponding to the other types are obtained similarly.
In an alternative example, determining the number of pixel differences between the third pseudo tag data and the fourth pseudo tag data based on the third pseudo tag data and the fourth pseudo tag data includes:
and determining the different pixel difference quantity of the second pixel type at the same position under each type, and further obtaining the pixel difference quantity between the third pseudo label data and the fourth pseudo label data based on the average value of the pixel difference quantity of each type.
Illustratively, the types include person, animal, vehicle and road edge. Taking "person" as an example, suppose the number of second pixels belonging to a person in the third pseudo tag data is 40 and the number in the fourth pseudo tag data is 50, of which 35 are located at the same positions in both. The pixel difference number for this type at the same positions is then (40 - 35) + (50 - 35) = 20; that is, the pixel difference number corresponding to the type "person" is 20. The pixel difference numbers corresponding to the other types are obtained similarly, and the pixel difference number between the third pseudo tag data and the fourth pseudo tag data is obtained by averaging.
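Both difference values can be computed in one pass over the two pseudo label maps; the sketch below reproduces the worked examples above (35/55 ≈ 0.636 and 20 for the type "person") and is an editorial illustration with hypothetical names:

```python
import numpy as np

def disparity_values(labels_a, labels_b, num_types):
    """Mean IOU value and mean pixel difference number between two pseudo
    label maps (third and fourth pseudo tag data), each of shape (H, W)."""
    ious, diffs = [], []
    for t in range(num_types):
        a, b = labels_a == t, labels_b == t
        inter = np.logical_and(a, b).sum()  # first number: same position and type
        union = np.logical_or(a, b).sum()   # second number: union of the type's pixels
        if union > 0:
            ious.append(inter / union)
        diffs.append(np.logical_xor(a, b).sum())  # this type in only one of the maps
    return float(np.mean(ious)), float(np.mean(diffs))
```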
Step 20125a, determining the first image data according to the difference value.
Specifically, after the difference value between the third pseudo tag data and the fourth pseudo tag data is determined, image screening may be performed according to the difference value based on a preset screening rule, so as to determine the first image data from the non-tag image data. The preset screening rule can be set according to actual requirements, and the embodiment is not limited.
According to the method and the device, the first image data of the difficult sample is determined from the non-label image data based on the difference value of the prediction result of the non-label image data of different semantic segmentation models, so that the automatic mining of the difficult sample is realized, and the working efficiency is further improved.
In one optional example, the unlabeled image data includes at least one unlabeled image, and the difference values comprise a difference value corresponding to each unlabeled image. In this case, determining the first image data according to the difference value in step 20125a includes: for any unlabeled image in the unlabeled image data, taken as a first unlabeled image, if the IOU value corresponding to the first unlabeled image is smaller than the IOU threshold and/or the pixel difference number corresponding to it is larger than the difference number threshold, the first unlabeled image is added to the first image data.
Specifically, if the IOU value corresponding to the first unlabeled image is smaller than the IOU threshold, the prediction results of the different semantic segmentation models differ considerably; that is, the first unlabeled image is not easily segmented semantically and has a high learning value, so it can be added to the first image data as a hard sample image.
In this way, hard sample images with larger prediction differences and higher learning value are mined based on the differences between the prediction results of different semantic segmentation models, and form the first image data entering the subsequent automatic labeling process. This further improves the effectiveness of the automatic labeling work, provides more valuable labeled data for further optimization training of the semantic segmentation models, and improves model performance.
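Composed with the disparity sketch above, the screening rule of this example reduces to a simple threshold check; both threshold values below are illustrative assumptions:

```python
def is_hard_sample(iou_value, pixel_diff, iou_threshold=0.7, diff_threshold=1000):
    """Select a first unlabeled image as first image data when the IOU value
    is below the IOU threshold and/or the pixel difference number exceeds
    the difference number threshold."""
    return iou_value < iou_threshold or pixel_diff > diff_threshold
```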
In an alternative example, fig. 6 is a flowchart of step 2012 provided by another exemplary embodiment of the present disclosure, in which example the non-label image data includes at least one non-label image; determining first image data based on the unlabeled image data of step 2012 includes:
step 20121b, predicting the unlabeled image data based on the third semantic segmentation model to obtain a fourth prediction result; the fourth prediction result includes a probability that each third pixel of each unlabeled image in the unlabeled image data belongs to each type.
The third semantic segmentation model may be the first semantic segmentation model or the second semantic segmentation model, or may be any other implementable semantic segmentation model, and is also a semantic segmentation model obtained by pre-training. The detailed prediction principle is not described in detail.
Step 20122b, determining confidence degrees corresponding to the third pixels of the label-free images respectively based on the fourth prediction result.
The principle of determining confidence level in this step is referred to the above, and will not be described herein again.
Step 20123b, determining the confidence corresponding to each non-label image based on the confidence corresponding to each third pixel of each non-label image.
The confidence degrees corresponding to the non-labeled images in this step may be obtained by averaging the confidence degrees corresponding to the third pixels, which is not described in detail herein.
Step 20124b, determining the first image data based on the confidence degree and the preset confidence degree threshold value corresponding to each unlabeled image.
Specifically, when the confidence corresponding to the unlabeled image is lower than the preset confidence threshold, it indicates that the semantic segmentation confidence of the unlabeled image is poor, and therefore, the unlabeled image can be used as a difficult sample and added to the first image data.
According to the method, the first image data of the difficult samples are mined from a large number of unlabeled images through the confidence coefficient of the semantic segmentation model prediction result, so that the automatic mining of the difficult samples is realized, and the working efficiency of related mining work is improved.
In an optional example, the determining, based on the fourth prediction result in step 20122b, the confidence levels corresponding to the third pixels of the unlabeled images respectively includes: and regarding a third pixel, taking the difference value between the maximum probability and the second maximum probability corresponding to the third pixel as the confidence corresponding to the third pixel.
Specifically, when the difference between the maximum probability and the second maximum probability of a third pixel is large, the type to which the pixel belongs can be determined relatively definitely, that is, the pixel is highly identifiable; when the difference is small, the type to which the pixel belongs is not very definite and the identifiability is weak. Therefore, if the confidence of an unlabeled image, determined from the confidences of its third pixels, is smaller than the preset confidence threshold, the segmentability of the unlabeled image is poor; such an image has a high learning value for optimizing the semantic segmentation model and can be used as a hard sample.
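Steps 20121b to 20124b can be sketched as follows, reusing the top-2-margin confidence from earlier; the threshold and all names are illustrative assumptions:

```python
import numpy as np

def mine_hard_samples(prob_maps, confidence_threshold=0.5):
    """Keep unlabeled images whose mean top-2-margin confidence falls below
    a preset confidence threshold.

    prob_maps: iterable of (num_types, H, W) probability maps produced by
    the third semantic segmentation model, one per unlabeled image.
    Returns the indices of the images selected as first image data.
    """
    hard = []
    for idx, probs in enumerate(prob_maps):
        sorted_probs = np.sort(probs, axis=0)[::-1]
        margin = sorted_probs[0] - sorted_probs[1]  # confidence per third pixel
        if margin.mean() < confidence_threshold:    # image-level confidence
            hard.append(idx)
    return hard
```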
In an alternative example, the first semi-supervised semantic segmentation model is obtained by:
and training the initial semi-supervised semantic segmentation model based on the first image data, the pre-obtained second image data and the second label data corresponding to the second image data to obtain a first semi-supervised semantic segmentation model.
The first image data is unlabeled image data, and the second image data is labeled image data, obtained in advance in any implementable manner, with corresponding second label data; how the second image data and its second label data are obtained may be set according to actual requirements, and this embodiment is not limited. The initial semi-supervised segmentation model is trained in a semi-supervised manner on the unlabeled data and the labeled data to obtain the first semi-supervised semantic segmentation model. The initial semi-supervised semantic segmentation model may be a semantic segmentation model obtained by initial supervised training on labeled data.
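As an editorial sketch of such semi-supervised training, one common formulation mixes a supervised loss on the labeled data with a pseudo-label loss on the unlabeled data; the disclosure does not fix a specific loss, and the weighting, the ignore value and all names below are assumptions:

```python
import torch
import torch.nn as nn

# 255 is assumed to mark pixel types that are invalid for supervision,
# matching the ignore convention sketched earlier.
criterion = nn.CrossEntropyLoss(ignore_index=255)

def semi_supervised_step(model, optimizer,
                         labeled_images, labels,           # second image/label data
                         unlabeled_images, pseudo_labels,  # first image data + pseudo tags
                         unlabeled_weight=0.5):
    """One optimization step combining supervised and pseudo-label losses."""
    optimizer.zero_grad()
    sup_loss = criterion(model(labeled_images), labels)
    unsup_loss = criterion(model(unlabeled_images), pseudo_labels)
    loss = sup_loss + unlabeled_weight * unsup_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```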
In an optional example, fig. 7 is a flowchart of a method for determining an image tag according to another exemplary embodiment of the present disclosure, in this example, after determining, in step 205, first tag data corresponding to first image data based on a first confidence degree and first pseudo tag data respectively corresponding to each first pixel, the method of the present disclosure further includes:
and step 206, optimizing the fourth semantic segmentation model based on the first image data and the first label data to obtain an optimized model corresponding to the fourth semantic segmentation model.
The fourth semantic segmentation model can be any practicable semantic segmentation model, the fourth semantic segmentation model can be an untrained or trained semantic segmentation model, and can be specifically set according to actual requirements.
The method and the device can be used for further optimizing the semantic segmentation model after the first label data corresponding to the first image data of the hard sample is obtained, so that the model performance is effectively improved.
In an alternative example, fig. 8 is a flowchart illustrating a method for determining an image tag according to still another exemplary embodiment of the disclosure. In this example, the method of the present disclosure comprises:
1. label-free image data is acquired.
2. Based on the unlabeled image data, first image data is determined.
3. And training the initial semi-supervised semantic segmentation model based on the first image data, the pre-obtained second image data and the second label data corresponding to the second image data to obtain a first semi-supervised semantic segmentation model.
4. Predicting first image data based on a first semi-supervised semantic segmentation model obtained by training to obtain a corresponding first prediction result; the first prediction result includes first probability data corresponding to the first image data, and the first probability data includes probabilities that the first pixels in the first image data belong to the respective types.
5. And determining first pseudo tag data corresponding to the first image data and first confidence degrees corresponding to the first pixels respectively based on the first probability data, wherein the first pseudo tag data comprises types of the first pixels.
6. And determining a second confidence degree corresponding to the first image data based on the first confidence degree corresponding to each first pixel.
7. If the second confidence coefficient is smaller than the first threshold value, updating the type of the first pixel of the first pseudo tag data, of which the first confidence coefficient is smaller than the second threshold value, to the first type, and obtaining second pseudo tag data corresponding to the first pseudo tag data, wherein the first type represents that the first pixel is invalid for supervision.
8. And taking the second pseudo label data as the first label data corresponding to the first image data.
9. And optimizing the fourth semantic segmentation model based on the first image data and the first label data to obtain an optimization model corresponding to the fourth semantic segmentation model.
For specific operations of each step in this example, refer to the foregoing embodiments or examples, which are not described in detail again.
In this way, hard samples are mined automatically based on semi-supervised semantic segmentation, labeled automatically, and then used to further optimize and train the semantic segmentation model. Valuable data is thus mined automatically from mass data to obtain hard sample data for model training; semi-supervised semantic segmentation training is performed with the mined hard sample data and the existing labeled data, so the semi-supervised semantic segmentation model iterates automatically. After the updated semantic segmentation model is obtained, pseudo labels are derived through confidence analysis, and the pseudo labels are combined with the existing labeled data for optimization training of the supervised semantic segmentation model, realizing automatic optimization and update of the model and obtaining a better model without increasing labeling cost. The secondarily updated model can then be run on mass unlabeled data again, realizing a closed loop of data mining and model training and continuously improving the performance of the model.
Any of the image tag determination methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: terminal equipment, a server and the like. Alternatively, any image tag determination method provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor may execute any image tag determination method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. This is not described in detail below.
Exemplary devices
Fig. 9 is a schematic structural diagram of an apparatus for determining an image tag according to an exemplary embodiment of the present disclosure. The apparatus of this embodiment may be used to implement the corresponding method embodiment of the present disclosure, and the apparatus shown in fig. 9 includes: a first obtaining module 501, a first processing module 502, a second processing module 503, a third processing module 504 and a fourth processing module 505.
A first obtaining module 501, configured to obtain first image data.
The first processing module 502 is configured to predict, based on a first semi-supervised semantic segmentation model obtained through training, first image data obtained by the first obtaining module 501, and obtain a corresponding first prediction result; the first prediction result includes first probability data corresponding to the first image data, and the first probability data includes probabilities that the first pixels in the first image data belong to the respective types.
A second processing module 503, configured to determine, based on the first probability data obtained by the first processing module 502, first pseudo tag data corresponding to the first image data, where the first pseudo tag data includes a type to which each first pixel belongs.
A third processing module 504, configured to determine, based on the first probability data obtained by the first processing module 502, first confidence levels corresponding to the first pixels, respectively.
The fourth processing module 505 is configured to determine first label data corresponding to the first image data based on the first confidence degree corresponding to each first pixel obtained by the third processing module 504 and the first pseudo label data obtained by the second processing module 503.
In an alternative example, fig. 10 is a schematic structural diagram of a fourth processing module 505 according to an exemplary embodiment of the disclosure. In this example, the fourth processing module 505 includes: a first determination unit 5051, a first processing unit 5052, and a second processing unit 5053.
A first determining unit 5051, configured to determine, based on the first confidence degrees respectively corresponding to the first pixels obtained by the third processing module 504, a second confidence degree corresponding to the first image data; a first processing unit 5052, configured to, if the second confidence degree obtained by the first determining unit 5051 is smaller than the first threshold, update the type of the first pixel in the first pseudo tag data, where the first confidence degree is smaller than the second threshold, to the first type, and obtain second pseudo tag data corresponding to the first pseudo tag data, where the first type indicates that the first pixel is invalid for supervision; a second processing unit 5053 is configured to take the second pseudo tag data obtained by the first processing unit 5052 as first tag data corresponding to the first image data.
In an alternative example, fig. 11 is a schematic structural diagram of the first obtaining module 501 according to an exemplary embodiment of the present disclosure. In this example, the first image data is hard sample image data; the first obtaining module 501 includes: a first acquiring unit 5011 and a third processing unit 5012. A first acquiring unit 5011 for acquiring the label-free image data; the third processing unit 5012 determines the first image data based on the non-label image data acquired by the first acquiring unit 5011.
In an alternative example, fig. 12 is a schematic structural diagram of a third processing unit 5012 provided in an exemplary embodiment of the present disclosure. In this example, the third processing unit 5012 includes: the first predictor subunit 50121, the second predictor subunit 50122, the first processing subunit 50123, the second processing subunit 50124 and the first determining subunit 50125.
The first predicting subunit 50121 is configured to predict the unlabeled image data obtained by the first obtaining unit 5011 based on the first semantic segmentation model obtained through training, and obtain a second prediction result corresponding to the unlabeled image data; the second prediction result comprises first probabilities that second pixels in the unlabeled image data belong to different types respectively; the second predicting subunit 50122 is configured to predict the unlabeled image data obtained by the first obtaining unit 5011 based on the trained second semantic segmentation model, and obtain a third prediction result corresponding to the unlabeled image data; the third prediction result comprises second probabilities that second pixels in the unlabeled image data belong to different types respectively; a first processing subunit 50123 configured to determine corresponding third pseudo tag data based on the second prediction result obtained by the first predicting subunit 50121, and determine corresponding fourth pseudo tag data based on the third prediction result obtained by the second predicting subunit 50122; the third pseudo label data comprise the type of each second pixel corresponding to the second prediction result; the fourth pseudo label data comprises the type of each second pixel corresponding to the third prediction result; a second processing subunit 50124 configured to determine a difference value between the third pseudo tag data and the fourth pseudo tag data obtained by the first processing subunit 50123; the difference value is an IOU value and/or a pixel difference quantity; the first determining subunit 50125 is configured to determine the first image data based on the difference value obtained by the second processing subunit 50124.
In one optional example, the unlabeled image data includes at least one unlabeled image; the difference values comprise difference values corresponding to the label-free images respectively; the first determining subunit 50125 is specifically configured to: regarding any one of the unlabelled image data as a first unlabelled image, if the IOU value corresponding to the first unlabelled image is smaller than the IOU threshold and/or the pixel difference number corresponding to the first unlabelled image is larger than the difference number threshold, the first unlabelled image is used as the first image data.
In an alternative example, fig. 13 is a schematic structural diagram of a third processing unit 5012 provided in another exemplary embodiment of the present disclosure. In this example, the unlabeled image data includes at least one unlabeled image; the third processing unit 5012 includes: the third predicting sub-unit 50126, the second determining sub-unit 50127, the third determining sub-unit 50128 and the fourth determining sub-unit 50129.
The third predicting subunit 50126 is configured to predict the unlabeled image data obtained by the first obtaining unit 5011 based on the third semantic segmentation model, and obtain a fourth prediction result; the fourth prediction result comprises the probability that each third pixel of each unlabeled image in the unlabeled image data belongs to each type; a second determining subunit 50127 configured to determine, based on the fourth prediction result obtained by the third predicting subunit 50126, confidence levels respectively corresponding to the third pixels of the respective unlabeled images; a third determining subunit 50128, configured to determine confidence levels corresponding to the respective unlabeled images based on the confidence levels corresponding to the respective third pixels of the respective unlabeled images obtained by the second determining subunit 50127; the fourth determining subunit 50129 is configured to determine the first image data based on the confidence level and the preset confidence level threshold value respectively corresponding to each of the unlabeled images obtained by the third determining subunit 50128.
In an alternative example, the second determining subunit 50127 is specifically configured to:
for any third pixel, take the difference between the maximum probability and the second-largest probability corresponding to the third pixel as the confidence corresponding to the third pixel.
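The margin-based confidence just described, together with one plausible image-level aggregation for the third and fourth determining subunits, can be sketched in NumPy as follows. The mean aggregation, the threshold value, and the assumption that hard samples are the low-confidence images are illustrative choices, not details fixed by the disclosure.

```python
import numpy as np

def pixel_margin_confidence(probs: np.ndarray) -> np.ndarray:
    """probs: (C, H, W) per-type probabilities for one image.
    Returns an (H, W) map of top-1 minus top-2 probabilities."""
    top2 = np.sort(probs, axis=0)[-2:]  # two largest probabilities per pixel
    return top2[1] - top2[0]

def image_confidence(probs: np.ndarray) -> float:
    # Assumed aggregation: mean pixel margin over the whole image.
    return float(pixel_margin_confidence(probs).mean())

def select_low_confidence(prob_maps, conf_threshold=0.3):
    """Treat images below the preset confidence threshold as first image data."""
    return [i for i, p in enumerate(prob_maps) if image_confidence(p) < conf_threshold]
```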
In an alternative example, the first semi-supervised semantic segmentation model is obtained by: training an initial semi-supervised semantic segmentation model based on the first image data, pre-obtained second image data, and second label data corresponding to the second image data, to obtain the first semi-supervised semantic segmentation model.
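As a hedged illustration of this training step, the PyTorch sketch below follows a common self-training formulation: the second image data is supervised by its second label data, while pseudo labels generated on the fly supervise the unlabeled first image data. The disclosure does not fix the semi-supervised objective, so the pseudo-label cross-entropy term and the weight `unsup_weight` are assumptions.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, optimizer, labeled_batch, hard_images,
                         unsup_weight=0.5):
    images, labels = labeled_batch  # second image data and second label data
    with torch.no_grad():
        # Pseudo labels for the first image data: per-pixel argmax of the
        # current model's prediction.
        pseudo = model(hard_images).argmax(dim=1)
    optimizer.zero_grad()
    sup_loss = F.cross_entropy(model(images), labels)
    unsup_loss = F.cross_entropy(model(hard_images), pseudo)
    loss = sup_loss + unsup_weight * unsup_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```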
In an alternative example, Fig. 14 is a schematic structural diagram of an apparatus for determining an image tag according to another exemplary embodiment of the present disclosure. In this example, the apparatus of the present disclosure further includes: a model optimization module 506 configured to optimize the fourth semantic segmentation model based on the first image data and the first label data, to obtain an optimized model corresponding to the fourth semantic segmentation model.
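As one sketch of the optimization performed by the model optimization module 506, the snippet below fine-tunes a segmentation model on the first image data with its first label data. It assumes pixels of the first type (invalid for supervision) are encoded with an ignore index so that they contribute no gradient; the optimizer, learning rate, and ignore index value are illustrative assumptions.

```python
import torch

def finetune(model, loader, epochs=1, lr=1e-4, ignore_index=255):
    """loader yields (images, first_labels) built from the first image data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss(ignore_index=ignore_index)
    model.train()
    for _ in range(epochs):
        for images, first_labels in loader:
            opt.zero_grad()
            loss_fn(model(images), first_labels).backward()
            opt.step()
    return model
```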
Exemplary electronic device
An embodiment of the present disclosure further provides an electronic device, including: a memory for storing a computer program;
a processor configured to execute the computer program stored in the memory, wherein the computer program, when executed, implements the image tag determination method according to any of the above embodiments of the present disclosure.
Fig. 15 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure. In this embodiment, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer readable storage medium and executed by the processor 11 to implement the image tag determination methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may be, for example, a microphone or a microphone array for capturing an input signal from a sound source.
The input device 13 may also include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present disclosure are shown in fig. 15, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the methods according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps of the methods according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. However, it should be noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples, not limitations, and should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are provided for the purposes of illustration and description only, not limitation; the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of the devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The term "such as" as used herein means, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A method of image tag determination, comprising:
acquiring first image data;
predicting the first image data based on a first semi-supervised semantic segmentation model obtained by training to obtain a corresponding first prediction result; the first prediction result comprises first probability data corresponding to the first image data, wherein the first probability data comprises probabilities that first pixels in the first image data belong to different types respectively;
determining, based on the first probability data, first pseudo tag data corresponding to the first image data, the first pseudo tag data including a type to which each of the first pixels belongs;
determining a first confidence corresponding to each first pixel based on the first probability data;
and determining first label data corresponding to the first image data based on the first confidence corresponding to each first pixel and the first pseudo tag data.
2. The method according to claim 1, wherein the determining first label data corresponding to the first image data based on the first confidence corresponding to each first pixel and the first pseudo tag data comprises:
determining a second confidence corresponding to the first image data based on the first confidence corresponding to each first pixel;
if the second confidence is smaller than a first threshold, updating, in the first pseudo tag data, the type to which each first pixel whose first confidence is smaller than a second threshold belongs to a first type, to obtain second pseudo tag data corresponding to the first pseudo tag data, wherein the first type indicates that the first pixel is invalid for supervision;
and taking the second pseudo tag data as the first label data corresponding to the first image data.
3. The method of claim 1, wherein the first image data is hard sample image data; the acquiring first image data includes:
acquiring unlabeled image data;
determining the first image data based on the unlabeled image data.
4. The method of claim 3, wherein the determining the first image data based on the unlabeled image data comprises:
predicting the unlabeled image data based on a first semantic segmentation model obtained by training to obtain a second prediction result corresponding to the unlabeled image data; the second prediction result comprises first probabilities that second pixels in the unlabeled image data belong to different types respectively;
predicting the unlabeled image data based on a second semantic segmentation model obtained by training to obtain a third prediction result corresponding to the unlabeled image data; the third prediction result comprises second probabilities that the second pixels in the unlabeled image data belong to different types respectively;
determining corresponding third pseudo tag data according to the second prediction result, and determining corresponding fourth pseudo tag data according to the third prediction result; the third pseudo tag data comprises a type to which each second pixel belongs according to the second prediction result, and the fourth pseudo tag data comprises a type to which each second pixel belongs according to the third prediction result;
determining a difference value between the third pseudo tag data and the fourth pseudo tag data, wherein the difference value is an IOU value and/or a pixel difference number;
and determining the first image data according to the difference value.
5. The method of claim 4, wherein the unlabeled image data includes at least one unlabeled image, and the difference values comprise a difference value corresponding to each unlabeled image;
the determining the first image data according to the difference value includes:
for any unlabeled image in the unlabeled image data, taken as a first unlabeled image: if the IOU value corresponding to the first unlabeled image is smaller than an IOU threshold and/or the pixel difference number corresponding to the first unlabeled image is larger than a difference number threshold, taking the first unlabeled image as the first image data.
6. The method of claim 3, wherein the unlabeled image data includes at least one unlabeled image; the determining the first image data based on the unlabeled image data includes:
predicting the unlabeled image data based on a third semantic segmentation model to obtain a fourth prediction result; the fourth prediction result comprises the probability that each third pixel of each unlabeled image in the unlabeled image data belongs to each type;
determining, based on the fourth prediction result, a confidence corresponding to each third pixel of each unlabeled image;
determining a confidence corresponding to each unlabeled image based on the confidences corresponding to the third pixels of that unlabeled image;
and determining the first image data based on the confidence corresponding to each unlabeled image and a preset confidence threshold.
7. The method of claim 6, wherein the determining, based on the fourth prediction result, a confidence corresponding to each third pixel of each unlabeled image comprises:
for any third pixel, taking the difference between the maximum probability and the second-largest probability corresponding to the third pixel as the confidence corresponding to the third pixel.
8. The method of claim 1, wherein the first semi-supervised semantic segmentation model is obtained by:
training an initial semi-supervised semantic segmentation model based on the first image data, pre-obtained second image data and second label data corresponding to the second image data to obtain the first semi-supervised semantic segmentation model.
9. The method according to any one of claims 1 to 8, further comprising, after the determining first label data corresponding to the first image data based on the first confidence corresponding to each first pixel and the first pseudo tag data:
and optimizing a fourth semantic segmentation model based on the first image data and the first label data to obtain an optimized model corresponding to the fourth semantic segmentation model.
10. An image tag determination apparatus comprising:
the first acquisition module is used for acquiring first image data;
the first processing module is used for predicting the first image data based on a first semi-supervised semantic segmentation model obtained by training to obtain a corresponding first prediction result; the first prediction result comprises first probability data corresponding to the first image data, wherein the first probability data comprises probabilities that first pixels in the first image data belong to different types respectively;
a second processing module, configured to determine, based on the first probability data, first pseudo tag data corresponding to the first image data, where the first pseudo tag data includes a type to which each of the first pixels belongs;
a third processing module, configured to determine, based on the first probability data, a first confidence corresponding to each first pixel;
and a fourth processing module, configured to determine first label data corresponding to the first image data based on the first confidence corresponding to each first pixel and the first pseudo tag data.
11. A computer-readable storage medium storing a computer program for executing the method for determining an image tag according to any one of claims 1 to 9.
12. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for determining an image tag according to any one of claims 1 to 9.
CN202210143921.8A 2022-02-16 2022-02-16 Image tag determination method and device, electronic equipment and storage medium Pending CN114494787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210143921.8A CN114494787A (en) 2022-02-16 2022-02-16 Image tag determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210143921.8A CN114494787A (en) 2022-02-16 2022-02-16 Image tag determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114494787A true CN114494787A (en) 2022-05-13

Family

ID=81482852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210143921.8A Pending CN114494787A (en) 2022-02-16 2022-02-16 Image tag determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114494787A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058489A (en) * 2023-10-09 2023-11-14 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of multi-label recognition model
CN117058489B (en) * 2023-10-09 2023-12-29 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of multi-label recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination