WO2023226606A1

WO2023226606A1 - Image segmentation sample generation method and apparatus, method and apparatus for pre-training image segmentation model, and device and medium

Info

Publication number: WO2023226606A1
Application number: PCT/CN2023/087460
Authority: WO
Inventors: 李徐泓; 熊昊一; 刘毅; 窦德景
Original assignee: 北京百度网讯科技有限公司
Priority date: 2022-05-23
Filing date: 2023-04-11
Publication date: 2023-11-30
Also published as: CN114882315A; CN114882315B

Abstract

Provided in the present disclosure are an image segmentation sample generation method and apparatus, a method and apparatus for pre-training an image segmentation model, and a device and a medium. The image segmentation sample generation method comprises: acquiring an image classification sample, wherein the image classification sample comprises a sample image and a classification label of the sample image; determining, by means of an interpretable algorithm, a weight value of each pixel point of the sample image under the action of an image classification model; selecting a forward label pixel point from the sample image according to the weight value of each pixel point; and forming, according to the forward label pixel point and the classification label of the sample image, an image segmentation sample corresponding to the image classification sample.

Description

Image segmentation sample generation method and device, image segmentation model pre-training method and device, equipment, media

This application claims priority to Chinese patent application No. 202210567293.6, filed with the China Patent Office on May 23, 2022, the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of computer vision and image segmentation, for example, to methods and devices for generating image segmentation samples, pre-training methods and devices for image segmentation models, equipment, and media.

Background technique

Image semantic segmentation is a traditional task in the field of computer vision, aiming to identify and classify every pixel in the image. Image segmentation technology can be applied to a variety of scenarios, such as driving scene understanding and medical image analysis. This technology mainly achieves image semantic segmentation by training image segmentation models.

Theoretically, an image segmentation model can be obtained by training on a large number of image segmentation samples. However, because labeling image segmentation samples is difficult, costly and time-consuming, related technologies mainly use existing image classification samples to pre-train the backbone parameters of a deep learning model, and then use image segmentation samples to fine-tune the deep learning model to finally obtain the desired image segmentation model.

However, since the above pre-training process of the image segmentation model can only pre-train the backbone parameters of the model, it cannot effectively pre-train all model parameters, resulting in poor performance of the entire pre-training process.

Contents of the invention

The present disclosure provides methods and devices for generating image segmentation samples, pre-training methods and devices for image segmentation models, equipment, and media.

According to one aspect of the present disclosure, a method for generating image segmentation samples is provided, including:

Obtain image classification samples, where the image classification samples include sample images and classification labels of the sample images;

Through the interpretable algorithm, determine the weight value of each pixel of the sample image under the action of the image classification model;

According to the weight value of each pixel, select the forward label pixel in the sample image;

According to the forward label pixels and classification labels of the sample image, an image segmentation sample corresponding to the image classification sample is formed.

According to another aspect of the present disclosure, a pre-training method for an image segmentation model is provided, including:

Obtain image classification sample set;

Using the above method for generating image segmentation samples, each image classification sample in the image classification sample set is processed to generate an image segmentation sample set corresponding to the image classification sample set;

According to the image segmentation sample set, all model parameters included in the preset machine learning model are trained to obtain a pre-trained image segmentation model.

According to another aspect of the present disclosure, a device for generating image segmentation samples is provided, including:

The classification sample acquisition module is configured to obtain image classification samples, where the image classification samples include sample images and classification labels of the sample images;

The weight value determination module is set to determine the weight value of each pixel of the sample image under the action of the image classification model through an interpretable algorithm;

The forward label pixel filtering module is set to select forward label pixels in the sample image based on the weight value of each pixel;

The image segmentation sample generation module is configured to form an image segmentation sample corresponding to the image classification sample based on the forward label pixels and classification labels of the sample image.

According to another aspect of the present disclosure, a pre-training device for an image segmentation model is provided, including:

The sample set acquisition module is set to obtain the image classification sample set;

The image segmentation sample set generation module is configured to use the above-mentioned image segmentation sample generation method to process each image classification sample in the image classification sample set, and generate an image segmentation sample set corresponding to the image classification sample set;

The pre-trained image segmentation model acquisition module is configured to train all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model.

According to another aspect of the present disclosure, an electronic device is provided, including:

one or more processors;

a storage device configured to store one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above-mentioned method for generating image segmentation samples, or the above-mentioned pre-training method for the image segmentation model.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to execute the above-mentioned method for generating image segmentation samples, or the above-mentioned pre-training method for the image segmentation model.

Description of the drawings

Figure 1 is a flow chart of a method for generating image segmentation samples provided by an embodiment of the present disclosure;

Figure 2a is a flow chart of another method for generating image segmentation samples provided by an embodiment of the present disclosure;

Figure 2b is a flow chart of another method for generating image segmentation samples provided by an embodiment of the present disclosure;

Figure 3 is a flow chart of a pre-training method for an image segmentation model provided by an embodiment of the present disclosure;

Figure 4a is a flow chart of another pre-training method for an image segmentation model provided by an embodiment of the present disclosure;

Figure 4b is a flow chart of another pre-training method for an image segmentation model provided by an embodiment of the present disclosure;

Figure 5 is a logical schematic diagram of pixel weight value denoising provided by an embodiment of the present disclosure;

Figure 6 is a schematic diagram before binarization processing of an average result provided by an embodiment of the present disclosure;

Figure 7 is a schematic diagram after binarization processing of an average result provided by an embodiment of the present disclosure;

Figure 8 is a schematic diagram of a device for generating image segmentation samples provided by the present disclosure;

Figure 9 is a schematic diagram of a pre-training device for an image segmentation model provided by the present disclosure;

Figure 10 is a schematic diagram of another pre-training device for an image segmentation model provided by the present disclosure;

Figure 11 is a schematic diagram of another pre-training device for an image segmentation model provided by the present disclosure;

Figure 12 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered to be exemplary only. For the sake of clarity and conciseness, descriptions of well-known functions and structures as well as functions and structures that are less relevant to the embodiments described below are omitted from the following description.

In one example, Figure 1 is a flow chart of a method for generating image segmentation samples provided by an embodiment of the present disclosure. This embodiment can be applied to the case of adding classification labels to pixels in a sample image. The method can be performed by a device for generating image segmentation samples; the device can be implemented by at least one of software and hardware, and can generally be integrated in an electronic device. Correspondingly, as shown in Figure 1, the method includes the following operations:

S110. Obtain image classification samples.

Image classification samples can be understood as images labeled with classification labels. Image classification samples can generally be used as training samples for training image classification models.

The image classification sample may include sample images and classification labels of the sample images. Sample images can be images in any scene. Classification labels can be used to characterize the category to which a sample image belongs. For example, the classification label of a sample image containing zebras can be zebra or animal, and the classification label of a sample image containing apples can be apples or fruits, etc. The embodiments of the present disclosure do not limit the sample images and the classification labels of the sample images.

In the embodiment of the present disclosure, a sample image of any scene can be obtained, and a classification label corresponding to the sample image can be determined, so that the sample image and the classification label matching the sample image are used as image classification samples.

Generally speaking, there are a large number of public image classification sample data sets on the Internet, involving multiple scenarios, and the required image classification samples can be obtained from the above image classification sample data sets.

S120. Determine the weight value of each pixel of the sample image under the action of the image classification model through an interpretable algorithm.

The interpretable algorithm can be an algorithm that makes the model prediction results interpretable and is used to establish the correlation between the model input data and the model prediction process. The image classification model can be any machine learning model that can identify the category to which the image belongs. The types of image classification models can include implementations based on convolutional neural networks (Convolutional Neural Networks, CNN) or recurrent neural networks (Recurrent Neural Networks, RNN).

In the embodiment of the present disclosure, an interpretable algorithm can be selected from known interpretable algorithms, such as a gradient-based interpretable algorithm or an interpretable algorithm based on integrated gradients. After the sample image is input to the image classification model, the interpretable algorithm can be used to determine, for each pixel in the sample image, the degree to which that pixel contributes when the image classification model determines the category to which the sample image belongs. This degree of contribution is quantified and used as the weight value of the pixel.

The higher the weight value of a pixel in the sample image, the more the image classification model relies on that pixel when determining the category to which the sample image belongs (that is, the classification label). In turn, the greater the probability that the pixel belongs to the classification object corresponding to the classification label.

For example, in a sample image containing a zebra, the higher the weight value of a pixel, the greater the probability that the pixel is a pixel used to form a zebra pattern in the sample image.
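As a rough illustration of a gradient-based weight value, the sketch below approximates the magnitude of the gradient of a class score with respect to each pixel by central differences. It is a minimal sketch under stated assumptions, not the disclosed implementation: `pixel_weights`, `toy_class_score`, the grayscale list-of-rows image format and the step size `eps` are all hypothetical, and a real image classification model would normally supply gradients through automatic differentiation.

```python
from typing import Callable, List

Image = List[List[float]]  # assumed format: grayscale image as rows of intensities


def pixel_weights(score_fn: Callable[[Image], float], image: Image,
                  eps: float = 1e-4) -> List[List[float]]:
    """Approximate |d score / d pixel| for every pixel by central differences."""
    weights = []
    for r, row in enumerate(image):
        weight_row = []
        for c, value in enumerate(row):
            # Perturb one pixel up and down while keeping all others fixed.
            plus = [list(x) for x in image]
            minus = [list(x) for x in image]
            plus[r][c] = value + eps
            minus[r][c] = value - eps
            grad = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            weight_row.append(abs(grad))
        weights.append(weight_row)
    return weights


def toy_class_score(img: Image) -> float:
    """Hypothetical stand-in for a classifier's score for the labeled class.

    It only looks at the two centre pixels of a 1x4 image.
    """
    return 3.0 * img[0][1] + 1.0 * img[0][2]


weights = pixel_weights(toy_class_score, [[0.2, 0.5, 0.7, 0.1]])
```

In this toy case the two centre pixels receive non-zero weight values and the outer pixels receive zero, mirroring the idea that pixels the classifier relies on get higher weight values.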

S130. Select the forward label pixel in the sample image according to the weight value of each pixel.

The forward label pixels can be a subset of the pixels in the sample image, determined from all pixels in the sample image according to the weight values.

In the embodiment of the present disclosure, multiple pixels in the sample image can be filtered according to the preset filtering conditions of the pixels and the weight value of each pixel, and the forward label pixels in the sample image can be obtained. That is, the pixels that determine the sample label of the sample image are obtained, and the pixels in the sample image that play an important role in image classification are obtained based on the interpretable algorithm.

The filtering conditions for pixels may include the number of pixels to be filtered out, or the weight value of the pixels being greater than a set threshold, etc.
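The threshold form of the filtering condition could look like the following sketch. The function name `select_by_threshold`, the `(row, col)` position format and the example threshold are assumptions made for illustration only.

```python
from typing import List, Tuple


def select_by_threshold(weights: List[List[float]],
                        threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) positions whose weight value exceeds the threshold."""
    return [(r, c)
            for r, row in enumerate(weights)
            for c, w in enumerate(row)
            if w > threshold]


# Hypothetical 2x2 weight map; 0.5 is an arbitrary example threshold.
selected = select_by_threshold([[0.1, 0.8], [0.6, 0.2]], threshold=0.5)
```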

S140: Form an image segmentation sample corresponding to the image classification sample based on the forward label pixels and classification labels of the sample image.

Image segmentation samples can be understood as images marked with segmentation labels, and segmentation labels can be understood as classification labels that mark segmentation objects (for example, the zebra pattern in the aforementioned example image) in the image in units of pixels. Image segmentation samples can generally be used as training samples for training image segmentation models. The image segmentation model that has completed training has image segmentation capabilities, that is, it can identify objects with different attributes (classification labels) in the image.

In the embodiment of the present disclosure, the forward label pixels and the classification labels matching the forward label pixels can be used as the image segmentation sample corresponding to the image classification sample. An image segmentation model can then be trained with the image segmentation samples, so that the trained image segmentation model can be used to segment input images. In this way, image segmentation samples are generated simply and conveniently from image classification samples.

The technical solution of the embodiment of the present disclosure obtains an image classification sample including a sample image and a classification label of the sample image, determines the weight value of each pixel of the sample image under the action of the image classification model through an interpretable algorithm, selects the forward label pixels in the sample image according to the weight value of each pixel, and forms an image segmentation sample corresponding to the image classification sample according to the forward label pixels and the classification label of the sample image. By these technical means, the interpretable algorithm is used to obtain the pixels in the sample image that play an important role in image classification. Corresponding classification labels can therefore be added to those pixels simply, conveniently and accurately to form an image segmentation sample. A large number of image segmentation samples can be obtained at a very small cost, which improves the generation efficiency of image segmentation samples and improves the pre-training performance of the image segmentation model to a certain extent.

In one example, Figure 2a is a flow chart of another method for generating image segmentation samples provided by an embodiment of the present disclosure. This embodiment provides an implementation of determining, through an interpretable algorithm, the weight value of each pixel of the sample image under the action of the image classification model. Correspondingly, as shown in Figure 2a, the method includes the following operations:

S210. Obtain image classification samples.

Image classification samples include sample images and classification labels of the sample images.

S220. Input the sample image and the model parameters of the image classification model into an algorithm model that matches the interpretable algorithm, and obtain the weight value of each pixel in the sample image.

The weight value is used to measure the importance of pixels in the classification process of sample images by the image classification model. The model parameters may be configuration parameters of the image classification model, for example, weight coefficients.

In the embodiment of the present disclosure, after the sample image is obtained, the model parameters of the image classification model can be determined and an interpretable algorithm can be selected from known interpretable algorithms. The sample image and the model parameters of the image classification model can then be input to the algorithm model that matches the selected interpretable algorithm. The algorithm model calculates the importance of each pixel in the image classification model's classification of the sample image, and this importance is quantified to obtain the weight value of each pixel in the sample image. That is, through the algorithm model of the interpretable algorithm, the action process of the image classification model can be explained, and a quantitative value of the degree to which the image classification model adjusts the model parameters according to the pixels in the sample image is given, which facilitates quickly locating the important pixels in the classification process of the sample image.

In one embodiment of the present disclosure, inputting the sample image and the model parameters of the image classification model into an algorithm model that matches the interpretable algorithm to obtain the weight value of each pixel in the sample image may include: obtaining multiple image classification models; inputting the sample image and the model parameters of each image classification model into the algorithm model respectively, to obtain the single model weight value of each pixel in the sample image under the action of each image classification model; and performing a weighted average of the multiple single model weight values at the same pixel position in the sample image to obtain the weight value of that pixel in the sample image.

The single model weight value can be the weight value of the pixel determined by the algorithm model of the interpretable algorithm based on a single image classification model and the sample image.

In the embodiment of the present disclosure, multiple trained image classification models can be obtained, and the model parameters of each image classification model can then be obtained. The sample image and the model parameters of the current image classification model are input to the algorithm model of the interpretable algorithm to obtain the single model weight value of each pixel of the sample image during the classification of the sample image by the current image classification model. By analogy, after the single model weight value of each pixel of the sample image under the action of each image classification model has been obtained, the multiple single model weight values at the same pixel position of the sample image are determined according to the pixel position, that is, the multiple single model weight values corresponding to the same pixel of the sample image are determined. A weighted average of the multiple single model weight values matching that pixel is then computed to obtain the weight value of the pixel in the sample image. Since a single image classification model may have poor accuracy, multiple image classification models are used to obtain single model weight values for each pixel, and the weighted average of those single model weight values is used as the weight value of the pixel. This avoids large errors in the weight value calculation caused by the limited accuracy of a single image classification model.
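The per-position weighted average can be sketched as follows. This is an illustrative sketch only: the name `fuse_weight_maps`, the per-model coefficients `model_coeffs` and the default of equal coefficients are assumptions, since the text does not state how the averaging weights are chosen.

```python
from typing import List, Optional, Sequence

WeightMap = List[List[float]]  # one weight value per pixel position


def fuse_weight_maps(single_model_maps: Sequence[WeightMap],
                     model_coeffs: Optional[Sequence[float]] = None) -> WeightMap:
    """Weighted average of single model weight values at each pixel position."""
    if not single_model_maps:
        raise ValueError("at least one single model weight map is required")
    if model_coeffs is None:
        # Assumption: without explicit coefficients, all models count equally.
        model_coeffs = [1.0] * len(single_model_maps)
    if len(model_coeffs) != len(single_model_maps):
        raise ValueError("one coefficient per model is required")
    total = sum(model_coeffs)
    rows, cols = len(single_model_maps[0]), len(single_model_maps[0][0])
    return [[sum(k * m[r][c] for k, m in zip(model_coeffs, single_model_maps)) / total
             for c in range(cols)]
            for r in range(rows)]


# Two hypothetical single model weight maps for the same 2x2 sample image.
map_a = [[0.0, 1.0], [0.5, 0.5]]
map_b = [[1.0, 1.0], [0.0, 0.5]]
fused = fuse_weight_maps([map_a, map_b], model_coeffs=[3.0, 1.0])
```

Here the first model is trusted three times as much as the second, so positions where the two maps disagree end up closer to the first model's value.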

S230. Select the forward label pixel in the sample image according to the weight value of each pixel.

In one embodiment of the present disclosure, selecting forward label pixels in the sample image based on the weight value of each pixel may include: determining the number of pixels to be selected based on the total number of pixels in the sample image and a preset selection ratio; and selecting, in descending order of weight value, the forward label pixels that match the number of pixels to be selected.

The total number of pixels may be the total number of pixels included in the image. The selection ratio can be preset and describes the proportion of pixels to be picked out of the pixels of the sample image. The number of pixels to be selected is the number of pixels that need to be picked out of the sample image, and can be determined based on the total number of pixels in the sample image and the selection ratio.

In the embodiment of the present disclosure, the total number of pixels of the sample image can be determined first, and the preset selection ratio can then be obtained. The number of pixels to be selected is determined as the product of the total number of pixels of the sample image and the selection ratio. The weight values of the pixels in the sample image are sorted from large to small, the pixels matching the number of pixels to be selected are taken in that order, and the selected pixels are used as forward label pixels. In an actual image classification process, the category of an image can be determined from its key pixels. Selecting the forward label pixels in descending order of weight value picks out the pixels that have a greater impact on image classification, which reduces the data processing time for subsequent operations on the forward label pixels.
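A minimal sketch of the ratio-based selection follows. The name `select_top_ratio` and the choice to round the product down to an integer are assumptions; the text only says the count is based on the product of the total number of pixels and the selection ratio.

```python
from typing import List, Tuple


def select_top_ratio(weights: List[List[float]],
                     ratio: float) -> List[Tuple[int, int]]:
    """Pick the highest-weight pixel positions, total_pixels * ratio of them."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    positions = [(r, c) for r, row in enumerate(weights) for c in range(len(row))]
    # Number of pixels to select; rounding down is an assumption of this sketch.
    count = int(len(positions) * ratio)
    # Sort positions by weight value, largest first.
    ranked = sorted(positions, key=lambda p: weights[p[0]][p[1]], reverse=True)
    return ranked[:count]


# Hypothetical 2x3 weight map; half of the six pixels are kept.
forward = select_top_ratio([[0.1, 0.9, 0.3], [0.7, 0.2, 0.4]], ratio=0.5)
```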

S240: Form an image segmentation sample corresponding to the image classification sample based on the forward label pixels and classification labels of the sample image.

In one embodiment of the present disclosure, forming an image segmentation sample corresponding to the image classification sample based on the forward label pixels and the classification label of the sample image may include: labeling each forward label pixel in the sample image with the classification label to form the image segmentation sample.

In the embodiment of the present disclosure, the classification label of the sample image can be obtained, and each forward label pixel in the sample image is then labeled with the classification label of the sample image to which the forward label pixel belongs, forming an image segmentation sample. Since forward label pixels are the pixels that have a greater impact in the image classification process, using forward label pixels labeled with classification labels as image segmentation samples can minimize the amount of training data while ensuring the training effect, and improve the training speed of the model.
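The labeling step can be sketched as building a per-pixel label map. The dictionary layout, the name `make_segmentation_sample` and the use of `None` for pixels that are not forward label pixels are assumptions of this sketch; this passage does not say how the remaining pixels are represented.

```python
from typing import Dict, Iterable, List, Tuple


def make_segmentation_sample(image: List[List[float]],
                             forward_pixels: Iterable[Tuple[int, int]],
                             class_label: str) -> Dict[str, object]:
    """Attach the image-level classification label to each forward label pixel."""
    # Assumption: pixels that are not forward label pixels stay unlabeled (None).
    label_map = [[None] * len(row) for row in image]
    for r, c in forward_pixels:
        label_map[r][c] = class_label
    return {"image": image, "label_map": label_map}


# Hypothetical 2x2 sample image whose classification label is "zebra".
sample = make_segmentation_sample(
    image=[[0.1, 0.9], [0.7, 0.2]],
    forward_pixels=[(0, 1), (1, 0)],
    class_label="zebra",
)
```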

The technical solution of the embodiment of the present disclosure obtains the image classification sample, inputs the sample image and the model parameters of the image classification model into an algorithm model that matches the interpretable algorithm to obtain the weight value of each pixel in the sample image, selects the forward label pixels in the sample image according to the weight value of each pixel, and forms an image segmentation sample corresponding to the image classification sample based on the forward label pixels and the classification label of the sample image. This makes it convenient to quickly locate the important pixels in the classification process of the sample image, and to generate the image segmentation sample from those key pixels and the classification label. By these technical means, an interpretable algorithm is used to obtain the pixels in the sample image that play an important role in image classification. Corresponding classification labels can therefore be added to those pixels simply, conveniently and with high accuracy to form image segmentation samples. A large number of image segmentation samples can be obtained at a very small cost, which improves the generation efficiency of image segmentation samples and improves the pre-training performance of the image segmentation model to a certain extent.

In one example, Figure 2b is a flow chart of another method for generating image segmentation samples provided by an embodiment of the present disclosure. As shown in Figure 2b, the method includes:

S2100. Obtain image classification samples.

S2110. Input the sample image and the model parameters of the image classification model into an algorithm model that matches the interpretable algorithm, and obtain the weight value of each pixel in the sample image.

In one embodiment of the present disclosure, S2110 may include:

S2111. Obtain multiple image classification models.

S2112. Input the sample image and the model parameters of each image classification model into the algorithm model respectively, and obtain the single model weight value of each pixel of the sample image under the action of each image classification model.

S2113. Perform a weighted average of multiple single model weight values at the same pixel position in the sample image to obtain the weight value of the same pixel point in the sample image.

S2120. Determine the number of pixels to be selected based on the total number of pixels in the sample image and the preset selection ratio.

S2130. Select the forward label pixels that match the selected number of pixels in order from large to small weight values.

S2140. Use the classification label of the sample image to label each forward label pixel in the sample image to form an image segmentation sample.

The technical solution of the embodiment of the present disclosure obtains image classification samples, obtains multiple image classification models, and inputs the sample image and the model parameters of each image classification model into the algorithm model respectively to obtain the single model weight value of each pixel of the sample image under the action of each image classification model. The multiple single model weight values at the same pixel position are weighted and averaged to obtain the weight value of that pixel in the sample image. The number of pixels to be selected is then determined based on the total number of pixels in the sample image and the preset selection ratio, and the forward label pixels that match the number of pixels to be selected are chosen in descending order of weight value, so that an image segmentation sample corresponding to the image classification sample is formed based on the forward label pixels and the classification label of the sample image. By these technical means, an interpretable algorithm is used to obtain the pixels in the sample image that play an important role in image classification. Corresponding classification labels can therefore be added to those pixels simply, conveniently and with high accuracy to form image segmentation samples. A large number of image segmentation samples can be obtained at a very small cost, which improves the generation efficiency of image segmentation samples and improves the pre-training performance of the image segmentation model to a certain extent.

In one example, FIG. 3 is a flow chart of a pre-training method for an image segmentation model provided by an embodiment of the present disclosure. This embodiment can be applied to the case of pre-training an image segmentation model. The method can be performed by a pre-training device for an image segmentation model; the device can be implemented by at least one of software and hardware, and can generally be integrated into an electronic device. Correspondingly, as shown in Figure 3, the method includes the following operations:

S310. Obtain the image classification sample set.

The image classification sample set may be a collection of image classification samples.

In the embodiment of the present disclosure, multiple image classification samples in any scenario can be obtained to obtain an image classification sample set including multiple image classification samples.

S320. Use the image segmentation sample generation method to process each image classification sample in the image classification sample set, and generate an image segmentation sample set corresponding to the image classification sample set.

The image segmentation sample set may be a set of image segmentation samples generated according to the method for generating image segmentation samples in any of the above embodiments.

In the embodiments of the present disclosure, the forward label pixels and classification labels of the sample images of each image classification sample in the image classification sample set can be determined according to the method for generating image segmentation samples in any of the above embodiments, so that according to each The forward label pixels and classification labels of the sample images of the image classification samples respectively form image segmentation samples corresponding to the image classification samples, that is, the image segmentation sample set corresponding to the image classification sample set is obtained.

S330. According to the image segmentation sample set, train all model parameters included in the preset machine learning model to obtain a pre-trained image segmentation model.

The pre-trained image segmentation model may be a model obtained by training a preset machine learning model through an image segmentation sample set.

In the embodiment of the present disclosure, a preset machine learning model can be obtained and the preset machine learning model can be determined Learn all model parameters included in the model, and then use the image segmentation sample set to train all model parameters included in the preset machine learning model. Use the trained machine learning model as a pre-trained image segmentation model, and then use the pre-trained image The segmentation model performs image segmentation on the input image.
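To make the idea of training every model parameter on generated segmentation samples concrete, the sketch below fits a deliberately tiny per-pixel logistic model whose only parameters are `w` and `b`, and updates both on labeled pixels only. Everything here is a hypothetical stand-in: the preset machine learning model in the disclosure is unspecified, and the function names, the 0/1 class encoding, the learning rate and the rule of skipping unlabeled (`None`) pixels in the loss are assumptions of this sketch.

```python
import math
from typing import List, Optional, Tuple

# Assumed sample format: (image, label map); label is 0, 1 or None (unlabeled).
Sample = Tuple[List[List[float]], List[List[Optional[int]]]]


def pretrain(samples: List[Sample], epochs: int = 500,
             lr: float = 0.5) -> Tuple[float, float]:
    """Train all parameters (w, b) of a toy per-pixel logistic model."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        n = 0
        for image, label_map in samples:
            for img_row, lab_row in zip(image, label_map):
                for x, y in zip(img_row, lab_row):
                    if y is None:  # unlabeled pixel: excluded from the loss
                        continue
                    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
                    grad_w += (p - y) * x  # cross-entropy gradient w.r.t. w
                    grad_b += (p - y)      # cross-entropy gradient w.r.t. b
                    n += 1
        if n == 0:
            break
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w: float, b: float, x: float) -> int:
    """Predicted class (0 or 1) for a pixel of intensity x."""
    return int(w * x + b > 0.0)


samples = [
    ([[0.9, 0.8], [0.1, 0.5]], [[1, 1], [None, None]]),  # class 1: bright pixels labeled
    ([[0.1, 0.2], [0.9, 0.5]], [[0, 0], [None, None]]),  # class 0: dark pixels labeled
]
w, b = pretrain(samples)
```

After training, `predict(w, b, 0.85)` and `predict(w, b, 0.15)` fall on opposite sides of the learned decision boundary. A realistic segmentation network would replace the two-parameter model, but the same principle applies: the loss is computed on labeled pixels and gradients flow to all parameters rather than to a backbone alone.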

In the technical solution of the embodiment of the present disclosure, an image classification sample set is obtained, each image classification sample in the set is processed with the image segmentation sample generation method to generate the corresponding image segmentation sample set, and all model parameters included in the preset machine learning model are trained on that image segmentation sample set to obtain a pre-trained image segmentation model. Because the image segmentation sample generation method in any of the above embodiments uses an interpretable algorithm to find the pixels in the sample image that have an important impact on image classification, the corresponding classification labels can be added to those pixels simply, conveniently and with high accuracy to form image segmentation samples. A large image segmentation sample set can therefore be obtained at very small cost, which improves the efficiency of generating image segmentation samples and, to a certain extent, improves the pre-training performance of the image segmentation model and the training effect of the pre-trained model.

In one example, Figure 4a is a flow chart of another pre-training method for an image segmentation model provided by an embodiment of the present disclosure. Correspondingly, as shown in Figure 4a, the method includes the following operations:

S410. Obtain the image classification sample set.

S420: Use an image segmentation sample generation method to process each image classification sample in the image classification sample set, and generate an image segmentation sample set corresponding to the image classification sample set.

S430. According to the image segmentation sample set, train all model parameters included in the preset machine learning model to obtain a pre-trained image segmentation model.

In one embodiment of the present disclosure, before all model parameters included in the preset machine learning model are trained according to the image segmentation sample set to obtain the pre-trained image segmentation model, the method may further include: selecting, based on all classification labels corresponding to the image classification sample set, a heterogeneous label that is different from all of the classification labels; and labeling, with the heterogeneous label, all pixels in the image segmentation samples of the image segmentation sample set that are not labeled with a classification label.

The heterogeneous label may be a label that does not appear among the classification labels corresponding to the image segmentation sample set and is unrelated to the classification of the sample image. For example, the heterogeneous label can be used to identify the image background.

In the embodiment of the present disclosure, all classification labels corresponding to the sample images in the image classification sample set can be obtained and parsed to determine a label that is different from all of them as the heterogeneous label. The pixels in each image segmentation sample of the image segmentation sample set other than the forward label pixels (that is, the pixels that are not labeled with a classification label) are then labeled with the heterogeneous label. Training the machine learning model with image segmentation samples labeled with heterogeneous labels can give it a better pre-training effect, so that the pre-trained image segmentation model can perform accurate image segmentation.
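The heterogeneous-label step can be illustrated with a minimal sketch. It assumes that classification labels are non-negative integers and that unlabeled pixels are stored as `None`; both conventions, and the function names `choose_heterogeneous_label` and `fill_unlabeled`, are hypothetical and not taken from the disclosure. Picking the smallest unused integer is only one possible selection rule.

```python
from typing import List, Optional, Set


def choose_heterogeneous_label(class_labels: Set[int]) -> int:
    """Return the smallest non-negative integer not used as a classification label."""
    candidate = 0
    while candidate in class_labels:
        candidate += 1
    return candidate


def fill_unlabeled(seg: List[List[Optional[int]]], hetero: int) -> List[List[int]]:
    """Label every pixel that has no classification label (None) with the heterogeneous label."""
    return [[hetero if px is None else px for px in row] for row in seg]


# All classification labels that occur in the (toy) classification sample set.
class_labels = {0, 1, 2}
hetero = choose_heterogeneous_label(class_labels)

# A 2 x 2 segmentation sample: two forward label pixels carry class 2,
# the other two pixels are still unlabeled.
sparse_sample = [[None, 2], [2, None]]
dense_sample = fill_unlabeled(sparse_sample, hetero)
```

After this step every pixel carries a label, so the sample can be used directly as dense supervision.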

S440. Obtain a standard image segmentation sample set matching the image segmentation task scenario.

The image segmentation task scene may be the scene to which the picture for image segmentation belongs. The standard image segmentation sample set may be an image segmentation sample set matching the image segmentation task scenario.

In the embodiments of the present disclosure, the image segmentation task scene can be determined first, and an image classification sample set matching that scene can then be obtained. According to the method for generating image segmentation samples in any of the above embodiments, a standard image segmentation sample set matching the image segmentation task scene can then be generated.

S450: Use the standard image segmentation sample set to fine-tune the pre-trained image segmentation model to obtain a target image segmentation model that matches the image segmentation task scenario.

The target image segmentation model may be a model obtained by fine-tuning a pre-trained image segmentation model using a standard image segmentation sample set.

In embodiments of the present disclosure, the pre-trained image segmentation model can be trained using the standard image segmentation sample set, thereby adjusting its model parameters to obtain a target image segmentation model that matches the image segmentation task scenario. That is, the target image segmentation model can perform high-precision image segmentation on images that match the image segmentation task scene.

Training the pre-trained image segmentation model on the standard image segmentation sample set gives the model stronger image segmentation capabilities for images in a specific image segmentation task scenario while requiring only fine-tuning of the model parameters.

Image segmentation task scenarios may include at least one of the following: driving scenarios, medical imaging scenarios, robot perception scenarios, and remote sensing satellite image segmentation scenarios. The image segmentation sample sets of different image segmentation task scenarios have image characteristics unique to their scenario, so training the pre-trained image segmentation model on the image segmentation sample set of a given scenario gives the model stronger image segmentation capabilities for that specific scenario.

In the technical solution of the embodiment of the present disclosure, an image classification sample set is obtained, each image classification sample in the set is processed with the image segmentation sample generation method to generate the corresponding image segmentation sample set, all model parameters included in the preset machine learning model are trained on that image segmentation sample set to obtain a pre-trained image segmentation model, and a standard image segmentation sample set matching the image segmentation task scenario is obtained and used to fine-tune the pre-trained model. Because the image segmentation samples are formed as in any of the above embodiments, with an interpretable algorithm finding the pixels in the sample image that have an important impact on image classification, the corresponding classification labels can be added to those pixels simply, conveniently and with high accuracy. A large number of image segmentation samples can therefore be obtained at very small cost, which improves the generation efficiency of image segmentation samples, improves the pre-training performance of the image segmentation model to a certain extent, and gives the pre-trained image segmentation model stronger image segmentation capabilities for images in a specific image segmentation task scenario while requiring only fine-tuning of the model parameters.

In one example, Figure 4b is a flow chart of another pre-training method for an image segmentation model provided by an embodiment of the present disclosure. Correspondingly, as shown in Figure 4b, the method includes the following operations:

S4100. Obtain the image classification sample set.

S4110. Based on all classification labels corresponding to the image classification sample set, select a heterogeneous label that is different from all classification labels.

S4120. Label all pixels in the image segmentation samples in the image segmentation sample set that are not labeled with classification labels using heterogeneous labels.

S4130. Use the image segmentation sample generation method to process each image classification sample in the image classification sample set, and generate an image segmentation sample set corresponding to the image classification sample set.

S4140: Train all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model.

S4150. Obtain a standard image segmentation sample set matching the image segmentation task scene.

S4160: Use the standard image segmentation sample set to fine-tune the pre-trained image segmentation model to obtain a target image segmentation model that matches the image segmentation task scenario.

In the technical solution of the embodiment of the present disclosure, an image classification sample set is obtained, a heterogeneous label that is different from all classification labels corresponding to the image classification sample set is selected, and all pixels in the image segmentation samples of the image segmentation sample set that are not labeled with a classification label are labeled with the heterogeneous label. The image segmentation sample generation method is used to process each image classification sample in the image classification sample set to generate the corresponding image segmentation sample set, and all model parameters included in the preset machine learning model are trained on that image segmentation sample set to obtain a pre-trained image segmentation model. After the pre-trained image segmentation model is obtained, a standard image segmentation sample set matching the image segmentation task scene is obtained and used to fine-tune the pre-trained image segmentation model, yielding a target image segmentation model that matches the image segmentation task scene. Because the image segmentation samples are formed as in any of the above embodiments, with an interpretable algorithm finding the pixels in the sample image that have an important impact on image classification, the corresponding classification labels can be added to those pixels simply, conveniently and with high accuracy. A large number of image segmentation samples can therefore be obtained at very small cost, which improves the generation efficiency of image segmentation samples, improves the pre-training performance of the image segmentation model to a certain extent, and gives the pre-trained image segmentation model stronger image segmentation capabilities for images in a specific image segmentation task scenario while requiring only fine-tuning of the model parameters.

The complete training process of the image segmentation model in the embodiment of the present disclosure can be divided into two parts: pre-training and downstream task fine-tuning. The pre-training of traditional image segmentation models is performed only on the image classification sample set, and only part of the backbone of the model is trained. Downstream task fine-tuning uses the pre-trained model to perform fine-tuning training on a specific image segmentation sample set (a standard image segmentation sample set that matches the image segmentation task scenario) in order to solve the image segmentation task in that specific scenario. The complete training process of the image segmentation model is as follows:

(1) Select an image classification sample set (such as ImageNet), an interpretable algorithm (such as an algorithm based on input gradients), and multiple trained image classification models. The interpretable algorithm determines the important pixels (forward label pixels) in the sample image input to the image classification model, and these important pixels are relatively consistent with image segmentation labels. For each image classification model (such as a deep learning model), the interpretable algorithm is applied to the model: the gradient is calculated for the three primary color (RGB) channels of each sample image, and its modulus (magnitude) is taken to obtain the weight value of each pixel.
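As a minimal sketch of the last step in (1), the snippet below collapses a per-channel input gradient into one weight per pixel. It assumes the gradient has already been computed by some deep learning framework and is given as an H x W x 3 nested list, and it assumes that "taking the modulus" means the Euclidean norm across the three color channels; the function name `pixel_weights` is hypothetical.

```python
import math
from typing import List, Sequence


def pixel_weights(grad: List[List[Sequence[float]]]) -> List[List[float]]:
    """Collapse an H x W x 3 input gradient into an H x W map of gradient magnitudes."""
    return [[math.sqrt(sum(c * c for c in px)) for px in row] for row in grad]


# Toy 2 x 2 gradient: each pixel holds the gradient for its (R, G, B) channels.
grad = [
    [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 2.0, 2.0), (0.0, -5.0, 12.0)],
]
weights = pixel_weights(grad)
```

Because the magnitude discards the sign of the gradient, a pixel counts as important whether changing it would raise or lower the classification score.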

(2) To reduce the noise in the generated pixel weight values, the single model weight values obtained with the interpretable algorithm under multiple image classification models can be combined by a weighted average. The average result is the final weight value of the pixel and contains considerably less noise. The logic diagram of pixel weight value denoising is shown in Figure 5.
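The averaging in (2) can be sketched as follows. The per-model coefficients `model_weights` are an assumption added for illustration (the text does not say how the weighted average is weighted); when they are omitted the function falls back to a plain mean. The name `average_weight_maps` is hypothetical.

```python
from typing import List, Optional, Sequence

WeightMap = List[List[float]]


def average_weight_maps(
    maps: Sequence[WeightMap],
    model_weights: Optional[Sequence[float]] = None,
) -> WeightMap:
    """Combine K single model weight maps of equal shape into one averaged map."""
    if not maps:
        raise ValueError("at least one weight map is required")
    k = len(maps)
    coeffs = list(model_weights) if model_weights is not None else [1.0] * k
    if len(coeffs) != k:
        raise ValueError("one coefficient per model is required")
    total = sum(coeffs)
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(c * m[y][x] for c, m in zip(coeffs, maps)) / total for x in range(width)]
        for y in range(height)
    ]


# Three models, each producing a 1 x 2 weight map for the same sample image.
maps = [[[1.0, 0.0]], [[3.0, 2.0]], [[2.0, 4.0]]]
mean_map = average_weight_maps(maps)
weighted_map = average_weight_maps(maps, model_weights=[1.0, 1.0, 2.0])
```

Averaging over several independently trained models tends to cancel model-specific artifacts while keeping the pixels that all models agree are important.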

(3) The average result could be used directly as an image segmentation pseudo-label. For efficiency of use, however, the average result is binarized and the binarized result is used as the image segmentation pseudo-label. Figure 6 shows the result before binarization and Figure 7 the result after binarization. The top 10% of pixels by weight value are selected as positive label pixels, and the remaining pixels (negative pixels) are labeled with the heterogeneous label.
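A minimal sketch of the binarization in (3), assuming the averaged weights are given as a nested list. The threshold is taken as the smallest weight among the top fraction of pixels, so ties at the threshold can select slightly more than that fraction; the rounding rule and the name `binarize_top_fraction` are assumptions for illustration.

```python
import math
from typing import List, Tuple


def binarize_top_fraction(
    weights: List[List[float]], fraction: float = 0.10
) -> Tuple[List[List[int]], float]:
    """Return a 0/1 mask marking pixels whose weight reaches the top-fraction threshold."""
    flat = sorted((w for row in weights for w in row), reverse=True)
    # The small epsilon guards against floating-point error in len * fraction.
    k = max(1, math.ceil(len(flat) * fraction - 1e-9))
    threshold = flat[k - 1]
    mask = [[1 if w >= threshold else 0 for w in row] for row in weights]
    return mask, threshold


# Toy 2 x 5 averaged weight map (10 pixels, so the top 10% is a single pixel).
weights = [
    [0.1, 0.9, 0.3, 0.2, 0.5],
    [0.4, 0.8, 0.05, 0.6, 0.7],
]
mask, threshold = binarize_top_fraction(weights)
```

Pixels with mask value 1 are the positive label pixels; pixels with mask value 0 would receive the heterogeneous label.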

(4) Use the above image segmentation sample generation method to calculate all sample images in the image classification sample set to obtain the corresponding image segmentation sample set.

(5) Traditional pre-training uses the classification labels of sample images as supervisory information to train part of the model parameters of an image classification model (i.e., the parameters of part of the backbone architecture of the image segmentation model). In contrast, the pre-training method of the image segmentation model proposed in the embodiment of the present disclosure uses the image segmentation sample set as supervision information to train the entire image segmentation model. The image segmentation labels in the image segmentation sample set come from two parts: image classification labels and binarized image segmentation pseudo-labels. All positive label pixels are assigned the classification label of the sample image, and all negative pixels are assigned a background category (the heterogeneous label). Because the image segmentation sample set is built from the image classification sample set, each image has both a corresponding classification label and a corresponding binarized image segmentation pseudo-label.

(6) Use the above image segmentation labels as supervision information to pre-train the image segmentation model. After training, this model can also be used directly as an image segmentation model, but the effect is limited.

(7) Use the pre-trained model to perform fine-tuning training on downstream tasks. The difference is that the pre-trained model used in traditional operations provides only the backbone parameters as initialization, while the method of the embodiment of the present disclosure effectively initializes all model parameters of the entire image segmentation model.
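The difference in initialization described in (7) can be illustrated with a toy sketch in which a model is just a dictionary from parameter names to values. The parameter names (`backbone.conv1`, `decoder.conv`, `head.classifier`, `fc`), the checkpoint contents and the helper `load_pretrained` are all hypothetical; a real framework would use its own checkpoint-loading API.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def new_segmentation_model() -> Params:
    """Toy segmentation model: backbone, decoder and head, all at a placeholder init."""
    return {"backbone.conv1": [0.0], "decoder.conv": [0.0], "head.classifier": [0.0]}


def load_pretrained(model: Params, checkpoint: Params) -> List[str]:
    """Copy matching parameters from the checkpoint; return names left uninitialized."""
    missing = []
    for name in model:
        if name in checkpoint:
            model[name] = list(checkpoint[name])
        else:
            missing.append(name)
    return missing


# Traditional route: a classification checkpoint only matches the backbone.
classification_ckpt = {"backbone.conv1": [0.5], "fc": [0.1]}
missing_traditional = load_pretrained(new_segmentation_model(), classification_ckpt)

# Route described here: pre-training on pseudo-labels covers the whole model.
segmentation_ckpt = {
    "backbone.conv1": [0.5],
    "decoder.conv": [0.2],
    "head.classifier": [0.3],
}
missing_proposed = load_pretrained(new_segmentation_model(), segmentation_ckpt)
```

In this toy setting the traditional checkpoint leaves the decoder and head at their placeholder values, whereas the segmentation pre-training checkpoint initializes every parameter before fine-tuning starts.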

The calculation logic of (1)-(4) is as follows.

Input: an image classification sample set D, K deep image classification models f_k, and an interpretable algorithm A.

S1. For each sample image I_i in D, use A to calculate the single model weight value of each pixel under each of the K models.

S2. For each sample image I_i in D, calculate the mean of the single model weight values over the K deep image classification models f_k to obtain the weight value of each pixel.

S3. For each sample image I_i in D, calculate the threshold that separates the top 10% of weight values from the rest, use this threshold to filter the pixel weight values, and binarize them (for example, a weight value greater than or equal to the threshold is set to 1 and a weight value smaller than the threshold is set to 0). The binarization result is used as the image segmentation pseudo-label.

Output: the image segmentation sample set corresponding to the image classification sample set.
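The same steps can be written compactly in symbols. The notation below is an assumption introduced for illustration and does not appear in the disclosure: $A(f_k, I_i)(p)$ denotes the single model weight value of pixel $p$ of image $I_i$ under model $f_k$, $w_i(p)$ the averaged weight value, $\tau_i$ the per-image threshold, $N_i$ the number of pixels in $I_i$, and $m_i(p)$ the binarized pseudo-label.

```latex
\begin{align}
  w_i(p)  &= \frac{1}{K} \sum_{k=1}^{K} A(f_k, I_i)(p), \\
  \tau_i  &= \text{the } \lceil 0.1\, N_i \rceil\text{-th largest value of } \{\, w_i(p) \,\}_{p \in I_i}, \\
  m_i(p)  &= \begin{cases} 1, & w_i(p) \ge \tau_i, \\ 0, & w_i(p) < \tau_i. \end{cases}
\end{align}
```

The first line corresponds to S1 and S2, and the last two lines to S3. The rounding in the threshold rank is one possible reading of "top 10%".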

The calculation logic of (5)-(7) is as follows.

Input: an image classification sample set D (the number of classification label categories is Nc), an image segmentation sample set P corresponding to the image classification sample set D, a preset machine learning model f, and an image segmentation task H.

S1. For each sample image I_i in D, whose classification label is d_i, set the image segmentation label of each pixel corresponding to 1 in the binarization result to d_i, and set the image segmentation label of each pixel corresponding to 0 to the heterogeneous label.

S2. Using the labels from S1 and a conventional deep learning optimization algorithm, train the preset machine learning model f to obtain the pre-trained image segmentation model f'.

S3. Fine-tune f', obtained by the training in S2, on the image segmentation task H.

Output: the target image segmentation model trained on H.

Figure 8 is a schematic diagram of a device for generating image segmentation samples provided by an embodiment of the present disclosure. As shown in Figure 8, the device for generating image segmentation samples includes a classification sample acquisition module 510, a weight value determination module 520, a forward label pixel filtering module 530 and an image segmentation sample generation module 540, wherein:

The classification sample acquisition module 510 is configured to obtain an image classification sample, where the image classification sample includes a sample image and a classification label of the sample image; the weight value determination module 520 is configured to determine, through an interpretable algorithm, the weight value of each pixel of the sample image under the action of the image classification model; the forward label pixel filtering module 530 is configured to select the forward label pixels in the sample image according to the weight value of each pixel; and the image segmentation sample generation module 540 is configured to form, according to the forward label pixels and the classification label of the sample image, an image segmentation sample corresponding to the image classification sample.

In one embodiment, the weight value determination module 520 is configured to input the model parameters of the sample image and the image classification model into an algorithm model that matches the interpretable algorithm, and obtain the weight value of each pixel in the sample image; wherein, The weight value is used to measure the importance of each pixel in the classification process of the sample image by the image classification model.

In one embodiment, the weight value determination module 520 is configured to obtain multiple image classification models; input the sample image and the model parameters of each image classification model into the algorithm model respectively to obtain, for each image classification model, the single model weight value of each pixel of the sample image under the action of that model; and perform a weighted average of the multiple single model weight values at the same pixel position in the sample image to obtain the weight value of that pixel.

In one embodiment, the forward label pixel filtering module 530 is configured to determine the number of forward label pixels to be selected based on the total number of pixels in the sample image and a preset selection ratio; and to select, in descending order of weight value, forward label pixels matching that number.
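A minimal sketch of this count-based selection follows. It assumes the weights are a nested list and that the pixel count is obtained by truncating total pixels times the selection ratio; the rounding rule and the name `select_forward_pixels` are assumptions, not part of the disclosure.

```python
from typing import List, Tuple


def select_forward_pixels(
    weights: List[List[float]], ratio: float
) -> List[Tuple[int, int]]:
    """Return (row, col) coordinates of the highest-weight pixels, largest first."""
    height, width = len(weights), len(weights[0])
    total = height * width
    # The small epsilon guards against floating-point error in total * ratio.
    count = int(total * ratio + 1e-9)
    coords = [(y, x) for y in range(height) for x in range(width)]
    # list.sort is stable, so pixels with equal weight keep row-major order.
    coords.sort(key=lambda p: weights[p[0]][p[1]], reverse=True)
    return coords[:count]


# Toy 2 x 5 weight map: 10 pixels and a 20% selection ratio give 2 pixels.
weights = [
    [0.1, 0.9, 0.3, 0.2, 0.5],
    [0.4, 0.8, 0.05, 0.6, 0.7],
]
selected = select_forward_pixels(weights, ratio=0.2)
```

Unlike a threshold test, this variant always returns exactly the computed number of pixels, even when several pixels share the same weight.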

In one embodiment, the image segmentation sample generation module 540 is configured to use the classification label to label each forward label pixel in the sample image to form the image segmentation sample.

The above-mentioned device for generating image segmentation samples can execute the method for generating image segmentation samples provided by any embodiment of the present disclosure, and has corresponding functional modules and effects for executing the method for generating image segmentation samples.

Figure 9 is a schematic diagram of a pre-training device for an image segmentation model provided by an embodiment of the present disclosure. As shown in Figure 9, the pre-training device for the image segmentation model includes a sample set acquisition module 610, an image segmentation sample set generation module 620 and a pre-training image segmentation model acquisition module 630, wherein:

The sample set acquisition module 610 is configured to obtain an image classification sample set; the image segmentation sample set generation module 620 is configured to use the image segmentation sample generation method in any of the above embodiments to process each image classification sample in the image classification sample set and generate an image segmentation sample set corresponding to the image classification sample set; and the pre-training image segmentation model acquisition module 630 is configured to train all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model.

Figure 10 is a schematic diagram of another pre-training device for an image segmentation model provided by an embodiment of the present disclosure. The pre-training device for the image segmentation model also includes a heterogeneous label labeling module 640. The heterogeneous label labeling module 640 is configured to select, based on all classification labels corresponding to the image classification sample set, a heterogeneous label that is different from all of the classification labels; and to label, with the heterogeneous label, all pixels in the image segmentation samples of the image segmentation sample set that are not labeled with a classification label.

Figure 11 is a schematic diagram of another pre-training device for an image segmentation model provided by an embodiment of the present disclosure. The pre-training device for the image segmentation model also includes a target image segmentation model 650. The target image segmentation model 650 is configured to obtain a standard image segmentation sample set matching the image segmentation task scene; and to use the standard image segmentation sample set to fine-tune the pre-trained image segmentation model to obtain a target image segmentation model that matches the image segmentation task scenario.

In one embodiment, the image segmentation task scene includes at least one of the following: driving scene, medical imaging scene, robot perception scene, and remote sensing satellite image segmentation scene.

The above-mentioned image segmentation model pre-training device can execute the image segmentation model pre-training method provided by any embodiment of the present disclosure, and has corresponding functional modules and effects for executing the image segmentation model pre-training method.

In the technical solution of this disclosure, the acquisition, storage and application of various data involved are in compliance with relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a computer-readable storage medium, and a computer program product to implement the methods in the above-mentioned embodiments.

Figure 12 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure. Electronic device 10 is intended to represent many forms of digital computers, including desktop computers, workstations, personal digital assistants, servers, mainframe computers, and other suitable computers. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in Figure 12, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (Read-Only Memory, ROM) 12, a random access memory (Random Access Memory, RAM) 13, etc., wherein the memory stores a computer program that can be executed by the at least one processor. The processor 11 can perform a variety of appropriate actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. Various programs and data required for the operation of the electronic device 10 can also be stored in the RAM 13. The processor 11, the ROM 12 and the RAM 13 are connected to each other via the bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Multiple components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard, a mouse, etc.; an output unit 17, such as various types of displays, speakers, etc.; a storage unit 18, such as a magnetic disk, an optical disk, etc.; and a communication unit 19, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.

Processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (Artificial Intelligence, AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (Digital Signal Processor, DSP), and any appropriate processor, controller, microcontroller, etc. The processor 11 performs the methods and processes described above, such as the method for generating image segmentation samples given in any embodiment, or the method for pre-training the image segmentation model. In some embodiments, the method for generating image segmentation samples, or the method for pre-training the image segmentation model, can be implemented as a computer program that is tangibly included in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more operations of the above-described method for generating image segmentation samples, or of the method for pre-training the image segmentation model, may be performed. Alternatively, in other embodiments, the processor 11 may be configured in any other suitable manner (e.g., by means of firmware) to perform the method for generating image segmentation samples or the method for pre-training the image segmentation model.

Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Parts (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, RAM, ROM, Erasable Programmable Read-Only Memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (Compact Disc Read-Only Memory, CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a cathode ray tube (CRT) or Liquid Crystal Display (LCD) monitor) configured to display information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other types of devices can also be configured to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN), and the Internet.

Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on the corresponding computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical host and virtual private server (VPS) services. The server can also be a server of a distributed system or a server combined with a blockchain.

Artificial intelligence is the discipline of using computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning). It involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.

Cloud computing refers to a technical system in which a flexible and scalable pool of shared physical or virtual resources is accessed over a network, and in which the resources are deployed and managed on demand and in a self-service manner. The resources may include servers, operating systems, networks, software, applications, storage devices, and the like. Cloud computing technology can provide efficient and powerful data processing capabilities for the application and model training of artificial intelligence, blockchain, and other technologies.

Operations may be reordered, added, or deleted using the various forms of the processes shown above. For example, the operations recorded in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution provided by this disclosure can be achieved; no limitation is imposed herein.

Claims

A method for generating image segmentation samples, including:

Obtain an image classification sample, wherein the image classification sample includes a sample image and a classification label of the sample image;

Determine the weight value of each pixel of the sample image under the action of the image classification model through an interpretable algorithm;

According to the weight value of each pixel, select the forward label pixel in the sample image;

According to the forward label pixels and the classification label of the sample image, form an image segmentation sample corresponding to the image classification sample.
The method according to claim 1, wherein determining the weight value of each pixel of the sample image under the action of the image classification model through an interpretable algorithm includes:

Input the sample image and the model parameters of the image classification model into an algorithm model that matches the interpretable algorithm, and obtain the weight value of each pixel in the sample image;

The weight value is used to measure the importance of each pixel in the classification process of the sample image by the image classification model.
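As a non-limiting illustration of a per-pixel weight value that measures importance to the classification result, the following minimal sketch uses occlusion sensitivity, which is swapped in here only as a simple stand-in and is not stated in this disclosure to be the interpretable algorithm used. The names `occlusion_weights` and `toy_score`, the zero baseline, and the toy linear classifier are all hypothetical. Unlike the claimed step, this stand-in only queries the model's score and does not read its model parameters.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_weights(image: Image,
                      score_fn: Callable[[Image], float],
                      baseline: float = 0.0) -> Image:
    """Weight of a pixel = drop in the class score when that pixel is occluded.

    score_fn is a hypothetical callable returning the classification model's
    score for the labeled class; baseline is the assumed occlusion value.
    """
    full_score = score_fn(image)
    weights: Image = []
    for r, row in enumerate(image):
        weight_row = []
        for c in range(len(row)):
            occluded = [list(values) for values in image]  # copy the image
            occluded[r][c] = baseline                      # occlude one pixel
            weight_row.append(full_score - score_fn(occluded))
        weights.append(weight_row)
    return weights


def toy_score(image: Image) -> float:
    """Toy stand-in for a classifier: a fixed linear score over a 2x2 image."""
    coeffs = [[0.0, 1.0], [2.0, 0.0]]
    return sum(coeffs[r][c] * image[r][c] for r in range(2) for c in range(2))


image = [[1.0, 1.0], [1.0, 1.0]]
weights = occlusion_weights(image, toy_score)
```

In this toy case the pixels that the linear score depends on receive the largest weight values, and pixels the score ignores receive zero.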
The method according to claim 2, wherein inputting the sample image and the model parameters of the image classification model into an algorithm model that matches the interpretable algorithm, and obtaining the weight value of each pixel in the sample image, includes:

Obtain multiple image classification models;

Input the sample image and the model parameters of each image classification model into the algorithm model respectively, and obtain the single-model weight value of each pixel of the sample image under the action of each image classification model;

Compute a weighted average of the multiple single-model weight values at the same pixel position in the sample image to obtain the weight value of the pixel at that position.
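As a non-limiting illustration, the weighted average of single-model weight values at each pixel position might be sketched as follows. The function name `fuse_weight_maps`, the nested-list representation of a weight map, and the default of equal per-model coefficients are assumptions made for this sketch; the disclosure does not specify how the averaging coefficients are chosen.

```python
from typing import List, Optional, Sequence

WeightMap = List[List[float]]


def fuse_weight_maps(maps: Sequence[WeightMap],
                     model_coeffs: Optional[Sequence[float]] = None) -> WeightMap:
    """Weighted average of per-model weight maps, pixel position by pixel position."""
    if not maps:
        raise ValueError("at least one single-model weight map is required")
    if model_coeffs is None:
        model_coeffs = [1.0] * len(maps)  # assumption: equal coefficients
    if len(model_coeffs) != len(maps):
        raise ValueError("one coefficient per model is required")
    total = sum(model_coeffs)
    if total <= 0:
        raise ValueError("coefficients must sum to a positive value")

    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for weight_map, coeff in zip(maps, model_coeffs):
        if len(weight_map) != rows or any(len(row) != cols for row in weight_map):
            raise ValueError("all weight maps must share the sample image's shape")
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += coeff * weight_map[r][c] / total
    return fused


# Two hypothetical models; the second is trusted three times as much.
fused = fuse_weight_maps([[[0.2, 0.8]], [[0.6, 0.4]]], model_coeffs=[1.0, 3.0])
```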
The method according to claim 1, wherein selecting forward label pixels in the sample image according to the weight value of each pixel includes:

Determine the number of pixels to be selected based on the total number of pixels in the sample image and the preset selection ratio;

In descending order of weight value, select a number of forward label pixels equal to the number of pixels to be selected.
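As a non-limiting illustration, selecting forward label pixels by a preset selection ratio might be sketched as follows. The function name `select_forward_pixels`, the rounding down of the pixel count, and the breaking of ties by row-major position are assumptions of this sketch and are not specified in the claims.

```python
from typing import List, Tuple

WeightMap = List[List[float]]


def select_forward_pixels(weights: WeightMap, ratio: float) -> List[Tuple[int, int]]:
    """Return (row, col) positions of the highest-weight pixels."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    coords = [(r, c) for r, row in enumerate(weights) for c in range(len(row))]
    # Number of pixels to select = total pixel count * preset ratio
    # (rounded down here, which is an assumption).
    count = int(len(coords) * ratio)
    # Sort from the largest weight value to the smallest; Python's sort is
    # stable, so equal weights keep their row-major order.
    ranked = sorted(coords, key=lambda rc: weights[rc[0]][rc[1]], reverse=True)
    return ranked[:count]


forward = select_forward_pixels([[0.1, 0.9], [0.5, 0.3]], ratio=0.5)
```

With four pixels and a ratio of 0.5, the two positions with the largest weight values are returned.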
The method according to claim 1, wherein forming an image segmentation sample corresponding to the image classification sample based on the forward label pixels and classification labels of the sample image includes:

Use the classification label to label each forward label pixel in the sample image to form the image segmentation sample.
A pre-training method for image segmentation models, including:

Obtain an image classification sample set;

Using the method for generating image segmentation samples according to any one of claims 1 to 5, each image classification sample in the image classification sample set is processed to generate an image segmentation sample set corresponding to the image classification sample set;

According to the image segmentation sample set, all model parameters included in the preset machine learning model are trained to obtain a pre-trained image segmentation model.
The method according to claim 6, wherein, before training all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model, the method further includes:

According to all classification labels corresponding to the image classification sample set, select a heterogeneous label that is different from all classification labels;

Use the heterogeneous label to label all pixels, in the image segmentation samples of the image segmentation sample set, that are not labeled with a classification label.
The method according to claim 7, wherein, after training all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model, the method further includes:
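As a non-limiting illustration, labeling the forward label pixels with the classification label and filling every remaining pixel with a heterogeneous label might be sketched as follows. The function names, the integer encoding of labels, and the choice of "largest classification label plus one" as the heterogeneous label are assumptions of this sketch; the claims only require that the heterogeneous label differ from all classification labels.

```python
from typing import Iterable, List, Tuple


def choose_heterogeneous_label(class_labels: Iterable[int]) -> int:
    """Pick an integer label that differs from every classification label."""
    return max(class_labels) + 1  # assumption: labels are integers


def build_segmentation_mask(height: int, width: int,
                            forward_pixels: Iterable[Tuple[int, int]],
                            class_label: int,
                            other_label: int) -> List[List[int]]:
    """Per-pixel label mask for one image segmentation sample."""
    # Start with every pixel carrying the heterogeneous label ...
    mask = [[other_label] * width for _ in range(height)]
    # ... then overwrite the forward label pixels with the image's class label.
    for r, c in forward_pixels:
        mask[r][c] = class_label
    return mask


other = choose_heterogeneous_label({0, 1, 2})
mask = build_segmentation_mask(2, 2, forward_pixels=[(0, 1)],
                               class_label=2, other_label=other)
```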

Obtain a standard image segmentation sample set that matches the image segmentation task scenario;

The pre-trained image segmentation model is fine-tuned using the standard image segmentation sample set to obtain a target image segmentation model that matches the image segmentation task scenario.
The method according to claim 8, wherein the image segmentation task scenario includes at least one of the following: a driving scenario, a medical imaging scenario, a robot perception scenario, and a remote sensing satellite image segmentation scenario.
A device for generating image segmentation samples, including:

A classification sample acquisition module configured to obtain an image classification sample, wherein the image classification sample includes a sample image and a classification label of the sample image;

The weight value determination module is configured to determine the weight value of each pixel of the sample image under the action of the image classification model through an interpretable algorithm;

A forward label pixel screening module is configured to select forward label pixels in the sample image based on the weight value of each pixel;

The image segmentation sample generation module is configured to form an image segmentation sample corresponding to the image classification sample based on the forward label pixels and the classification label of the sample image.
A pre-training device for image segmentation models, including:

The sample set acquisition module is configured to obtain an image classification sample set;

The image segmentation sample set generation module is configured to use the method for generating image segmentation samples according to any one of claims 1 to 5 to process each image classification sample in the image classification sample set and generate an image segmentation sample set corresponding to the image classification sample set;

The pre-trained image segmentation model acquisition module is configured to train all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model.
An electronic device including:

at least one processor;

a storage device configured to store at least one program;

When the at least one program is executed by the at least one processor, the at least one processor implements the method for generating image segmentation samples according to any one of claims 1-5, or the pre-training method for image segmentation models according to any one of claims 6-9.
A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method for generating image segmentation samples according to any one of claims 1-5, or the pre-training method for image segmentation models according to any one of claims 6-9.
A computer program product, including a computer program that, when executed by a processor, implements the method for generating image segmentation samples according to any one of claims 1-5, or the pre-training method for image segmentation models according to any one of claims 6-9.