CN114882315A - Sample generation method, model training method, device, equipment and medium - Google Patents


Info

Publication number
CN114882315A
CN114882315A
Authority
CN
China
Prior art keywords
image
sample
classification
model
image segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210567293.6A
Other languages
Chinese (zh)
Other versions
CN114882315B (en)
Inventor
李徐泓
熊昊一
刘毅
窦德景
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210567293.6A priority Critical patent/CN114882315B/en
Publication of CN114882315A publication Critical patent/CN114882315A/en
Priority to PCT/CN2023/087460 priority patent/WO2023226606A1/en
Application granted granted Critical
Publication of CN114882315B publication Critical patent/CN114882315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The disclosure provides a sample generation method, a model training method, an apparatus, a device and a medium, and relates to the field of computer vision, in particular to the field of image segmentation. The method for generating an image segmentation sample comprises the following steps: obtaining an image classification sample, wherein the image classification sample comprises a sample image and a classification label of the sample image; determining, through an interpretable algorithm, the weight value of each pixel point of the sample image under the action of an image classification model; selecting forward label pixel points from the sample image according to the weight values of the pixel points; and forming an image segmentation sample corresponding to the image classification sample according to the forward label pixel points and the classification label. The method can simply, conveniently and with high precision add a corresponding classification label to each pixel point of the sample image to form an image segmentation sample.

Description

Sample generation method, model training method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to the field of image segmentation, and more particularly, to a sample generation method, a model training method, an apparatus, a device, and a medium.
Background
Image semantic segmentation is a classic task in the field of computer vision whose goal is to identify and classify each pixel point in an image. Image segmentation technology can be applied in many scenarios, such as driving-scene understanding and medical image analysis. At present, image semantic segmentation is mainly realized by training an image segmentation model.
In theory, an image segmentation model can be obtained by training on a large number of image segmentation samples. However, because annotating image segmentation samples is difficult, costly and time-consuming, the prior art mainly uses existing image classification samples to pre-train the backbone parameters of a deep learning model, and then continues to fine-tune the deep learning model with image segmentation samples, so as to finally obtain the required image segmentation model.
However, this pre-training process can only pre-train the backbone parameters of the model and cannot effectively pre-train all model parameters, so the performance of the overall pre-training process is poor.
Disclosure of Invention
The disclosure provides a sample generation method, a model training method, a device, equipment and a medium.
According to an aspect of the present disclosure, there is provided a method for generating an image segmentation sample, including:
obtaining an image classification sample, wherein the image classification sample comprises: a sample image, and a classification label for the sample image;
determining the weight value of each pixel point of the sample image under the action of the image classification model through an interpretable algorithm;
selecting forward label pixel points from the sample image according to the weighted values of the pixel points;
and forming image segmentation samples corresponding to the image classification samples according to the forward label pixel points and the classification labels.
According to another aspect of the present disclosure, there is provided a pre-training method of an image segmentation model, including:
acquiring an image classification sample set;
processing each image classification sample in an image classification sample set by adopting the image segmentation sample generation method in any embodiment of the disclosure to generate an image segmentation sample set corresponding to the image classification sample set;
and training all model parameters included in a preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
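As an illustration only, the three steps of the pre-training method above can be sketched as a minimal pipeline. The disclosure does not prescribe any particular framework; `make_segmentation_sample` and `update_step` below are hypothetical placeholders for the sample-generation method and the training step.

```python
# Minimal sketch of the pre-training pipeline (hypothetical names, not from
# the disclosure).
def generate_segmentation_set(classification_set, make_segmentation_sample):
    """Apply the sample-generation method to every image classification sample."""
    return [make_segmentation_sample(image, label)
            for image, label in classification_set]

def pretrain_all_parameters(model_params, segmentation_set, update_step):
    """Train *all* parameters of a preset machine learning model (not only the
    backbone) on the generated image segmentation sample set."""
    for sample in segmentation_set:
        model_params = update_step(model_params, sample)
    return model_params
```

With toy stand-ins, `generate_segmentation_set` pairs each image with a per-pixel label map and `pretrain_all_parameters` folds an update step over the whole set.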
According to another aspect of the present disclosure, there is provided an image segmentation sample generation apparatus including:
the classified sample acquisition module is used for acquiring image classified samples, and the image classified samples comprise: a sample image, and a classification label for the sample image;
the weight value determining module is used for determining the weight value of each pixel point of the sample image under the action of the image classification model through an interpretable algorithm;
the forward label pixel point screening module is used for selecting forward label pixel points from the sample image according to the weight values of all the pixel points;
and the image segmentation sample generation module is used for forming image segmentation samples corresponding to the image classification samples according to the forward label pixel points and the classification labels.
According to another aspect of the present disclosure, there is provided a pre-training apparatus for an image segmentation model, including:
the sample set acquisition module is used for acquiring an image classification sample set;
the image segmentation sample set generation module is used for processing each image classification sample in the image classification sample set by adopting the image segmentation sample generation method in any embodiment of the disclosure to generate an image segmentation sample set corresponding to the image classification sample set;
and the pre-training image segmentation model acquisition module is used for training various model parameters included in a preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
According to another aspect of the present disclosure, there is provided an electronic device including:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of generating image segmentation samples in any embodiment of the present disclosure, or the pre-training method of an image segmentation model in any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method of generating an image segmentation sample in any embodiment of the present disclosure, or the pre-training method of an image segmentation model in any embodiment of the present disclosure.
According to the technical solution of the embodiments of the present disclosure, a corresponding classification label can be added to each pixel point of a sample image simply, conveniently and with high precision, to form an image segmentation sample.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flowchart of a method for generating an image segmentation sample according to an embodiment of the present disclosure;
fig. 2a is a flowchart of a method for generating an image segmentation sample according to an embodiment of the present disclosure;
fig. 2b is a flowchart of another method for generating an image segmentation sample according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a pre-training method of an image segmentation model provided by an embodiment of the present disclosure;
FIG. 4a is a flowchart of a pre-training method of an image segmentation model according to an embodiment of the present disclosure;
FIG. 4b is a flowchart of a pre-training method of an image segmentation model according to an embodiment of the present disclosure;
fig. 5 is a logic diagram illustrating denoising of pixel weight values according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram before the binarization of an averaged result, provided by an embodiment of the present disclosure;
fig. 7 is a schematic diagram after the binarization of an averaged result, provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an image segmentation sample generation apparatus provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a pre-training apparatus for an image segmentation model provided by an embodiment of the present disclosure;
FIG. 10 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In an example, fig. 1 is a flowchart of a method for generating an image segmentation sample according to an embodiment of the present disclosure, where the embodiment is applicable to a case where a classification label is added to each pixel point in a sample image, and the method may be performed by an apparatus for generating an image segmentation sample, where the apparatus may be implemented by at least one of software and hardware, and may be generally integrated in an electronic device. Accordingly, as shown in fig. 1, the method comprises the following operations:
and step 110, obtaining an image classification sample.
The image classification sample can be understood as an image labeled with a classification label. The image classification samples can be generally used as training samples for training an image classification model.
The image classification sample may include a sample image and a classification label of the sample image. The sample image may be an image of an arbitrary scene. The classification label may be used to characterize the class to which the sample image belongs. For example, the classification label of a sample image containing a zebra may be "zebra" or "animal", and the classification label of a sample image containing an apple may be "apple" or "fruit". The embodiment of the present disclosure does not limit the sample image or the classification label of the sample image.
In the embodiment of the disclosure, a sample image of an arbitrary scene may be acquired, and a classification label corresponding to the sample image may be determined, so that the sample image and the classification label matching with the sample image are used as an image classification sample.
Generally, a large number of public image classification sample data sets are available on the internet, and relate to each scene, so that required image classification samples can be obtained from the image classification sample data sets.
And 120, determining the weight value of each pixel point of the sample image under the action of the image classification model through an interpretable algorithm.
The interpretable algorithm can be an algorithm which enables model prediction results to be interpretable and is used for establishing association between model input data and a model prediction process. The image classification model may be any machine learning model that is capable of identifying the class to which the image belongs. The type of image classification model may include implementation based on CNN (Convolutional Neural Networks), RNN (Recurrent Neural Networks), or the like.
In the disclosed embodiment, an interpretable algorithm may be selected from known interpretable algorithms. After the sample image is input into the image classification model, the degree to which each pixel point in the sample image contributes to the model's decision on the class of the sample image is determined according to the interpretable algorithm, and this degree of contribution is quantized as the weight value of each pixel point.
It can be understood that the higher the weight value of a pixel point in the sample image, the more the image classification model relies on that pixel point when determining the class (that is, the classification label) of the sample image, and therefore the higher the probability that the pixel point belongs to the classification object corresponding to the classification label.
For example, in a sample image containing a zebra, the higher the weight value of a pixel point, the higher the probability that the pixel point is one of the pixel points forming the zebra pattern in the sample image.
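The disclosure does not name a specific interpretable algorithm. As one hedged illustration, a simple occlusion-sensitivity scheme assigns each pixel a weight equal to the drop in the classifier's score for the labelled class when the region covering that pixel is blanked out; `class_score` below is a stand-in scoring function for the image classification model.

```python
import numpy as np

def occlusion_weight_map(image, class_score, patch=2):
    """Occlusion sensitivity (an assumption -- the disclosure does not name a
    specific interpretable algorithm): the weight of a pixel is the drop in the
    class score when the patch covering that pixel is blanked out."""
    h, w = image.shape
    base = class_score(image)          # score with the full image
    weights = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0  # blank out one patch
            weights[y:y + patch, x:x + patch] = base - class_score(occluded)
    return weights
```

Pixels whose occlusion causes a large score drop receive large weight values, matching the intuition in the zebra example above.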
Step 130, selecting forward label pixel points from the sample image according to the weight values of the pixel points.
The forward label pixel points can be partial pixel points determined from all pixel points of the sample image according to the weight value.
In the embodiment of the present disclosure, the pixel points in the sample image may be screened according to a preset pixel-point screening condition and the weight value of each pixel point, so as to obtain the forward label pixel points in the sample image, that is, the pixel points that determine the classification label of the sample image. In this way, the pixel points that play an important role in image classification are obtained from the sample image based on the interpretable algorithm.
Optionally, the screening condition may include, but is not limited to, the number of pixel points that need to be selected, or a requirement that the weight value of a pixel point be greater than a set threshold.
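A minimal sketch of the threshold-based screening condition mentioned above (the threshold value itself is a free parameter; the disclosure only requires the weight value to be greater than a set threshold):

```python
import numpy as np

def forward_label_pixels_by_threshold(weights, threshold):
    """Keep the (row, col) positions whose weight value exceeds the threshold."""
    ys, xs = np.nonzero(weights > threshold)
    return list(zip(ys.tolist(), xs.tolist()))
```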
Step 140, forming an image segmentation sample corresponding to the image classification sample according to the forward label pixel points and the classification label.
The image segmentation sample can be understood as an image labeled with segmentation labels, and a segmentation label can be understood as a classification label that annotates a segmentation object in the image (for example, the zebra pattern in the foregoing example) pixel point by pixel point. Image segmentation samples may be used as training samples for training an image segmentation model. A trained image segmentation model has the image segmentation capability, that is, it can identify objects with different attributes (classification labels) in an image.
In the embodiment of the disclosure, the forward label pixel points and the classification label matching them can be used as the image segmentation sample corresponding to the image classification sample, and the image segmentation model is then trained with such image segmentation samples so that it performs image segmentation on input images. In this way, image segmentation samples are generated simply and conveniently on the basis of image classification samples.
According to the technical solution of the embodiments of the present disclosure, an image classification sample including a sample image and its classification label is obtained, and the weight value of each pixel point of the sample image under the action of the image classification model is determined through an interpretable algorithm. Forward label pixel points are then selected from the sample image according to the weight values, and an image segmentation sample corresponding to the image classification sample is formed from the forward label pixel points and the classification label. Because the interpretable algorithm identifies the pixel points that play an important role in image classification, a corresponding classification label can be added to each of these pixel points simply, conveniently and with high precision to form an image segmentation sample. A large number of image segmentation samples can therefore be obtained at very low cost, which improves the generation efficiency of image segmentation samples and, to a certain extent, the pre-training performance of the image segmentation model.
In an example, fig. 2a is a flowchart of a method for generating an image segmentation sample according to an embodiment of the present disclosure, and this embodiment provides an optional implementation manner for determining a weight value of each pixel point of a sample image under the effect of an image classification model through an interpretable algorithm. Accordingly, as shown in fig. 2a, the method comprises the following operations:
and step 210, obtaining an image classification sample.
Step 220, inputting the sample image and the model parameters of the image classification model into an algorithm model matched with the interpretable algorithm, and obtaining the weight value of each pixel point in the sample image.
The weight value is used to measure the importance of each pixel point in the classification of the sample image by the image classification model. The model parameters may be the configuration parameters of the image classification model, for example its weighting coefficients.
In the embodiment of the present disclosure, after the sample image is obtained, the model parameters of the image classification model may be determined, and an interpretable algorithm may be selected from known interpretable algorithms. The sample image and the model parameters of the image classification model are then input into the algorithm model matched with the selected interpretable algorithm, and the algorithm model is used to calculate and quantize the importance of each pixel point in the classification of the sample image, thereby obtaining the weight value of each pixel point in the sample image. The algorithm model of the interpretable algorithm analyzes how the image classification model acts on the sample image and gives a quantitative value of how much each pixel point influences the model's decision, so that the important pixel points in the classification of the sample image can be located conveniently and quickly.
In an optional embodiment of the present disclosure, inputting the sample image and the model parameters of the image classification model into an algorithm model matched with the interpretable algorithm and obtaining the weight value of each pixel point in the sample image may include: acquiring a plurality of image classification models; respectively inputting the sample image and the model parameters of each image classification model into the algorithm model, and acquiring the single-model weight value of each pixel point of the sample image under the action of each image classification model; and performing a weighted average of the plurality of single-model weight values at the same pixel position in the sample image to obtain the weight value of each pixel point in the sample image.
The single-model weight value is the weight value of a pixel point determined by the algorithm model of the interpretable algorithm from a single image classification model and the sample image.
In the embodiment of the present disclosure, a plurality of trained image classification models may be obtained, and the model parameters of each image classification model determined. The sample image and the model parameters of the current image classification model are input into the algorithm model of the interpretable algorithm, and the single-model weight value of each pixel point of the sample image in the classification of the sample image by the current image classification model is obtained. By analogy, after the single-model weight value of each pixel point under the action of each image classification model is obtained, the plurality of single-model weight values at the same pixel position are collected position by position, and a weighted average of the single-model weight values matched with each pixel point is computed to obtain the weight value of each pixel point in the sample image. Because a single image classification model may suffer from limited accuracy, using multiple image classification models to obtain single-model weight values and taking their weighted average as the pixel weight value avoids large weight-value errors caused by the accuracy of any single model.
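A sketch of the multi-model weighted average over single-model weight maps. The per-model weights are an assumption (the disclosure says only "weighted average"), so the function defaults to a plain mean when none are given.

```python
import numpy as np

def ensemble_weight_map(single_model_maps, model_weights=None):
    """Weighted average of the single-model weight maps at each pixel position.
    `model_weights` (one coefficient per model) is an assumption for
    illustration; with None, a plain mean is used."""
    maps = np.stack(single_model_maps)        # shape: (n_models, H, W)
    if model_weights is None:
        return maps.mean(axis=0)
    w = np.asarray(model_weights, dtype=float)
    return np.tensordot(w / w.sum(), maps, axes=1)   # normalized weighted sum
```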
Step 230, selecting the forward label pixel points from the sample image according to the weight values of the pixel points.
In an optional embodiment of the present disclosure, selecting the forward label pixel points in the sample image according to the weight value of each pixel point may include: determining the number of selected pixel points according to the total number of pixel points in the sample image and a preset selection proportion; and selecting forward label pixel points whose number matches the selected number, in descending order of weight value.
The total number of pixel points is the number of pixel points contained in the image. The selection proportion may be preset and describes the proportion of pixel points to be screened from the sample image. The selected number of pixel points is determined from the total number of pixel points of the sample image and the selection proportion, and gives the number of pixel points that need to be screened from the sample image.
In the embodiment of the disclosure, the total number of pixel points of the sample image may be determined first, the preset selection proportion obtained, and the selected number of pixel points computed as the product of the two. The weight values of the pixel points in the sample image are then sorted in descending order, the top pixel points matching the selected number are taken from this order, and the selected pixel points are used as the forward label pixel points. In actual image classification, the class of an image can be determined from a few key pixel points; selecting the forward label pixel points in descending order of weight value screens out the pixel points with the greatest influence on classification and reduces the processing time of the subsequent operations on the forward label pixel points.
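The ratio-based selection described above can be sketched as follows (the selection proportion is a free parameter, not a value fixed by the disclosure):

```python
import numpy as np

def select_forward_pixels(weights, ratio):
    """Selected count = total pixel count * selection ratio; keep the positions
    with the largest weight values (descending order)."""
    k = int(weights.size * ratio)
    top_flat = np.argsort(weights, axis=None)[::-1][:k]  # flat indices, largest first
    mask = np.zeros(weights.shape, dtype=bool)
    mask[np.unravel_index(top_flat, weights.shape)] = True
    return mask
```

The boolean mask marks the forward label pixel points, ready for the labeling step that follows.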
Step 240, forming an image segmentation sample corresponding to the image classification sample according to the forward label pixel points and the classification label.
In an optional embodiment of the present disclosure, forming the image segmentation sample corresponding to the image classification sample according to the forward label pixel points and the classification label may include: labeling each forward label pixel point in the sample image with the classification label to form the image segmentation sample.
In the embodiment of the disclosure, the classification label associated with the sample image is obtained, and each forward label pixel point in the sample image is labeled with the classification label of the sample image to which it belongs, forming the image segmentation sample. Because the forward label pixel points are the pixel points with the greatest influence on image classification, using the forward label pixel points labeled with the classification label as the image segmentation sample reduces the amount of training data as much as possible while preserving the training effect, and thus speeds up model training.
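A sketch of forming the image segmentation sample from the forward label pixel points. Giving non-forward pixels an "ignore" id of 255 is a common segmentation convention assumed here for illustration; the disclosure only states that the forward pixels carry the classification label.

```python
import numpy as np

def make_segmentation_sample(image, forward_mask, class_id, ignore_id=255):
    """Label every forward pixel with the image-level class id; all other
    pixels receive `ignore_id` (255 is an assumed convention, not from the
    disclosure)."""
    label_map = np.full(image.shape[:2], ignore_id, dtype=np.uint8)
    label_map[forward_mask] = class_id
    return image, label_map
```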
According to the technical solution of the embodiments of the present disclosure, an image classification sample is obtained, the sample image and the model parameters of the image classification model are input into the algorithm model matched with the interpretable algorithm to obtain the weight value of each pixel point in the sample image, the forward label pixel points are selected from the sample image according to the weight values, and the image segmentation sample corresponding to the image classification sample is formed from the forward label pixel points and the classification label. The important pixel points in the classification of the sample image are thus located conveniently and quickly, and the image segmentation sample is generated from these key pixel points and the classification label. Because the interpretable algorithm identifies the pixel points with an important influence on image classification, a corresponding classification label can be added to each of them simply, conveniently and with high precision to form an image segmentation sample; a large number of image segmentation samples can be obtained at very low cost, which improves the generation efficiency of image segmentation samples and, to a certain extent, the pre-training performance of the image segmentation model.
In an example, fig. 2b is a flowchart of another method for generating an image segmentation sample according to an embodiment of the present disclosure, and as shown in fig. 2b, the method includes:
step 2100, obtaining an image classification sample.
Step 2110, inputting the sample image and the model parameters of the image classification model into an algorithm model matched with the interpretable algorithm, and obtaining the weight value of each pixel point in the sample image.
In an optional embodiment of the present disclosure, step 2110 may include:
step 2111, a plurality of image classification models are obtained.
Step 2112, the model parameters of the sample image and each image classification model are respectively input into the algorithm model, and the single model weight value of each pixel point of the sample image under the action of each image classification model is obtained.
Step 2113, carrying out weighted average on a plurality of single model weight values of the same pixel position in the sample image to obtain the weight value of each pixel point in the sample image.
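The weighted average of step 2113 can be sketched as follows (a minimal NumPy illustration under stated assumptions: the per-model weight maps are taken as already computed, and the function name and the uniform default weights are illustrative, not part of the disclosure):

```python
import numpy as np

def average_weight_values(single_model_maps, model_weights=None):
    """Average the per-model weight (saliency) maps pixel-wise.

    single_model_maps: list of K arrays of shape (H, W), one per image
    classification model; model_weights: optional K weights (uniform if
    omitted), as in the weighted average of step 2113.
    """
    maps = np.stack(single_model_maps, axis=0)          # (K, H, W)
    if model_weights is None:
        k = len(single_model_maps)
        model_weights = np.full(k, 1.0 / k)
    w = np.asarray(model_weights, dtype=float)
    w = w / w.sum()                                     # normalize the weights
    return np.tensordot(w, maps, axes=1)                # (H, W) averaged map

# Toy example: two 2x2 single-model weight maps
m1 = np.array([[1.0, 0.0], [0.0, 1.0]])
m2 = np.array([[3.0, 0.0], [0.0, 3.0]])
avg = average_weight_values([m1, m2])                   # uniform weights -> [[2, 0], [0, 2]]
```

Averaging across several classification models is what suppresses the single-model noise described later in the disclosure.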
Step 2120, determining the number of pixel points to select according to the total number of pixel points in the sample image and a preset selection proportion.
Step 2130, selecting that number of forward label pixel points in descending order of weight value.
Step 2140, labeling each forward label pixel point in the sample image with the classification label to form the image segmentation sample.
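Steps 2120-2130 can be sketched as follows (a minimal NumPy illustration; the function name and the example 25% ratio are assumptions — elsewhere the disclosure uses a 10% selection proportion):

```python
import numpy as np

def select_forward_pixels(weight_map, ratio=0.1):
    """Pick the top `ratio` fraction of pixels by weight value.

    Returns a boolean (H, W) mask in which True marks a forward label
    pixel point, mirroring steps 2120-2130: the count is the total
    number of pixels times the preset selection proportion, chosen in
    descending order of weight value.
    """
    n_total = weight_map.size
    n_select = max(1, int(n_total * ratio))             # number of selected pixel points
    flat = weight_map.ravel()
    idx = np.argsort(flat)[::-1][:n_select]             # indices of the largest weights
    mask = np.zeros(n_total, dtype=bool)
    mask[idx] = True
    return mask.reshape(weight_map.shape)

wm = np.arange(16.0).reshape(4, 4)                      # toy weights 0..15
mask = select_forward_pixels(wm, ratio=0.25)            # top 4 pixels: values 12..15
```

Step 2140 then amounts to writing the image classification label into every True position of this mask.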
The technical scheme of this embodiment of the disclosure obtains image classification samples and a plurality of image classification models; inputs the sample image and the model parameters of each image classification model into the algorithm model to obtain the single-model weight value of each pixel point of the sample image under the action of each image classification model; carries out a weighted average of the single-model weight values at the same pixel position to obtain the weight value of each pixel point in the sample image; determines the number of pixel points to select from the total number of pixel points and a preset selection proportion; selects that number of forward label pixel points in descending order of weight value; and forms the image segmentation sample corresponding to the image classification sample from the forward label pixel points and the classification label. Because the interpretable algorithm identifies the pixel points that strongly influence image classification in the sample image, the corresponding classification labels can be added to those pixel points simply, conveniently, and accurately to form image segmentation samples. A large number of image segmentation samples can thus be obtained at very low cost, improving their generation efficiency and, to a certain extent, the pre-training performance of the image segmentation model.
In an example, fig. 3 is a flowchart of a method for pre-training an image segmentation model according to an embodiment of the present disclosure, where the embodiment is applicable to a case of pre-training an image segmentation model, and the method may be performed by a pre-training apparatus for an image segmentation model, and the apparatus may be implemented by at least one of software and hardware, and may be generally integrated in an electronic device. Accordingly, as shown in fig. 3, the method includes the following operations:
Step 310, acquiring an image classification sample set.
Wherein the image classification sample set may be a set of image classification samples.
In the embodiment of the present disclosure, a plurality of image classification samples in any scene may be acquired to obtain an image classification sample set including the plurality of image classification samples.
Step 320, processing each image classification sample in the image classification sample set by using the image segmentation sample generation method to generate an image segmentation sample set corresponding to the image classification sample set.
The image segmentation sample set may be a set of image segmentation samples generated according to the method for generating image segmentation samples in any of the embodiments described above.
In this disclosure, according to the method for generating image segmentation samples in any of the above embodiments, forward label pixel points and classification labels of sample images of each image classification sample in an image classification sample set may be determined, so that each image segmentation sample corresponding to each image classification sample is respectively formed according to the forward label pixel points and the classification labels of the sample images of each image classification sample, that is, an image segmentation sample set corresponding to the image classification sample set is obtained.
Step 330, training all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
The pre-training image segmentation model may be a model obtained by training a preset machine learning model through an image segmentation sample set.
In the embodiment of the disclosure, a preset machine learning model may be obtained, all model parameters included in the preset machine learning model may be determined, then all model parameters included in the preset machine learning model may be trained by using an image segmentation sample set, the trained machine learning model may be used as a pre-training image segmentation model, and then the pre-training image segmentation model may be used to perform image segmentation on the input image.
According to the technical scheme of this embodiment of the disclosure, an image classification sample set is obtained; each image classification sample in it is processed with the image segmentation sample generation method to generate the corresponding image segmentation sample set; and all model parameters in the preset machine learning model are trained on that set to obtain the pre-training image segmentation model. Because the interpretable algorithm of the image segmentation sample generation method identifies the pixel points that strongly influence image classification in each sample image, the corresponding classification labels can be added to those pixel points simply, conveniently, and accurately to form image segmentation samples and hence the image segmentation sample set. A large number of image segmentation samples can thus be obtained at very low cost, improving their generation efficiency, the pre-training performance of the image segmentation model to a certain extent, and the training effect of the pre-trained model.
In an example, fig. 4a is a flowchart of a pre-training method of an image segmentation model provided by an embodiment of the present disclosure, and accordingly, as shown in fig. 4a, the method includes the following operations:
Step 410, acquiring an image classification sample set.
Step 420, processing each image classification sample in the image classification sample set by using the image segmentation sample generation method to generate an image segmentation sample set corresponding to the image classification sample set.
Step 430, training all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
In an optional embodiment of the present disclosure, before training all model parameters included in a preset machine learning model according to an image segmentation sample set to obtain a pre-trained image segmentation model, the method may further include: selecting a heterogeneous label different from each classification label according to all classification labels corresponding to the image classification sample set; and marking the pixel points which are not marked with the classification labels in the image segmentation samples in the image segmentation sample set by using the heterogeneous labels.
The heterogeneous label may be a label that is not present in all classification labels corresponding to the image segmentation sample set and is unrelated to the classification of the sample image. Illustratively, a heterogeneous label may be used to identify the image background.
In the embodiment of the present disclosure, all classification labels corresponding to sample images in an image classification sample set may be obtained, and then all classification labels are analyzed, and labels different from all classification labels are determined and used as heterogeneous labels, so that pixel points (i.e., pixel points not labeled with a classification label) except forward label pixel points in each image segmentation sample in the image segmentation sample set are labeled with heterogeneous labels. By marking the image segmentation samples of the heterogeneous labels and training the machine learning model, the machine learning model can have a better pre-training effect, so that the pre-training image segmentation model can perform accurate image segmentation.
Step 440, acquiring a standard image segmentation sample set matched with the image segmentation task scene.
The image segmentation task scene may be a scene to which a picture to be subjected to image segmentation belongs. The standard image segmentation sample set may be an image segmentation sample set that matches the image segmentation task scene.
In the embodiment of the present disclosure, an image segmentation task scene may be determined first, and then an image classification sample set matching the image segmentation task scene is obtained, and a standard image segmentation sample set matching the image segmentation task scene is generated according to the method for generating an image segmentation sample in any one of the embodiments.
Step 450, fine-tuning the pre-training image segmentation model by using the standard image segmentation sample set to obtain a target image segmentation model matched with the image segmentation task scene.
The target image segmentation model may be a model obtained by fine-tuning a pre-training image segmentation model by using a standard image segmentation sample set.
In the embodiment of the present disclosure, the pre-training image segmentation model may be trained with the standard image segmentation sample set so as to adjust its model parameters and obtain a target image segmentation model matched with the image segmentation task scene; that is, the target image segmentation model can perform high-precision image segmentation on images matched with that scene. Because the pre-training image segmentation model is trained on the standard image segmentation sample set, only a fine adjustment of its model parameters is needed for it to acquire stronger segmentation capability on images in the specific image segmentation task scene.
Wherein the image segmentation task scene may include at least one of: a driving scene, a medical image scene, a robot perception scene, and a remote sensing satellite image segmentation scene. The image segmentation sample sets under different image segmentation task scenes have image characteristics specific to those scenes, so training the pre-trained image segmentation model with the image segmentation sample set of a given scene enhances its segmentation capability for that specific image segmentation task scene.
According to the technical scheme, an image classification sample set is obtained; each image classification sample in it is processed with the image segmentation sample generation method to generate the corresponding image segmentation sample set; all model parameters in the preset machine learning model are trained on that set to obtain the pre-training image segmentation model; and a standard image segmentation sample set matched with the image segmentation task scene is then acquired. Because image segmentation samples corresponding to the image classification samples are formed as in any of the embodiments above, the interpretable algorithm identifies the pixel points that strongly influence image classification in the sample image, so the corresponding classification labels can be added to those pixel points simply, conveniently, and accurately to form image segmentation samples. A large number of image segmentation samples can thus be obtained at very low cost, improving their generation efficiency and, to a certain extent, the pre-training performance of the image segmentation model; and only a fine adjustment of model parameters is needed for the pre-trained image segmentation model to have stronger segmentation capability for images in the specific image segmentation task scene.
In an example, fig. 4b is a flowchart of a pre-training method of an image segmentation model provided by an embodiment of the present disclosure, and accordingly, as shown in fig. 4b, the method includes the following operations:
Step 4100, acquiring an image classification sample set.
Step 4110, selecting a heterogeneous label different from each classification label according to all classification labels corresponding to the image classification sample set.
Step 4120, labeling each pixel point, which is not labeled with a classification label, in each image segmentation sample in the image segmentation sample set by using a heterogeneous label.
Step 4130, processing each image classification sample in the image classification sample set by using the image segmentation sample generation method to generate an image segmentation sample set corresponding to the image classification sample set.
Step 4140, training all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
Step 4150, acquiring a standard image segmentation sample set matched with the image segmentation task scene.
Step 4160, fine-tuning the pre-training image segmentation model by using the standard image segmentation sample set to obtain a target image segmentation model matched with the image segmentation task scene.
According to the technical scheme, an image classification sample set is obtained; a heterogeneous label different from every classification label is selected according to all classification labels corresponding to the image classification sample set; the pixel points not labeled with a classification label in each image segmentation sample in the image segmentation sample set are labeled with the heterogeneous label; each image classification sample in the image classification sample set is then processed with the image segmentation sample generation method to generate the corresponding image segmentation sample set; and all model parameters in the preset machine learning model are trained on the image segmentation sample set to obtain the pre-training image segmentation model. After the pre-training image segmentation model is obtained, a standard image segmentation sample set matched with the image segmentation task scene is acquired, and the pre-training image segmentation model is fine-tuned with it to obtain a target image segmentation model matched with the image segmentation task scene.
Because image segmentation samples corresponding to the image classification samples are formed as in any of the embodiments above, the interpretable algorithm identifies the pixel points that strongly influence image classification in the sample image. Corresponding classification labels can therefore be added to those pixel points simply, conveniently, and accurately to form image segmentation samples, and a large number of them can be obtained at very low cost. This improves the generation efficiency of image segmentation samples, improves the pre-training performance of the image segmentation model to a certain extent, and lets the pre-trained image segmentation model achieve stronger segmentation capability for images in a specific image segmentation task scene with only fine-tuning of its model parameters.
The complete training process of the image segmentation model in the embodiment of the present disclosure may be divided into two parts: pre-training and downstream-task fine-tuning. Conventional pre-training of an image segmentation model is performed only on an image classification sample set, and only the backbone part of the model is trained. In downstream-task fine-tuning, the pre-trained model is fine-tuned on a specific image segmentation sample set (a standard image segmentation sample set matched with the image segmentation task scene) to solve the image segmentation task in that scene. The complete training process of the image segmentation model is as follows:
(1) An image classification sample set (e.g., ImageNet), an interpretable algorithm (e.g., an input-gradient-based algorithm), and a plurality of trained image classification models are selected. Through the interpretable algorithm, the important pixel points (forward label pixel points) in a sample image input to the image classification model can be determined; these important pixel points agree well with image segmentation labels. For each image classification model (for example, a deep learning model), the interpretable algorithm is applied to it: for each sample image, the gradient with respect to the three primary-color channels is calculated and its modulus is taken to obtain the weight value of each pixel point.
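The input-gradient computation described above can be sketched as follows (a hedged toy illustration: in practice the gradient would come from backpropagation through a deep classification model, whereas here a linear stand-in — whose input gradient is simply its own weight tensor — keeps the example self-contained; all names are assumptions):

```python
import numpy as np

def input_gradient_weights(image, grad_fn):
    """Per-pixel weight values via the input-gradient interpretable
    algorithm: take the gradient of the class score with respect to
    each of the three colour channels, then the modulus over channels.

    image: array of shape (3, H, W); grad_fn: returns d(score)/d(image)
    with the same shape (stand-in for framework backpropagation).
    """
    grads = grad_fn(image)                              # (3, H, W) channel gradients
    return np.linalg.norm(grads, axis=0)                # (H, W) per-pixel modulus

# Toy linear "classifier" score = sum(W * image): its input gradient is W itself
W_toy = np.ones((3, 2, 2))
W_toy[:, 0, 0] = 3.0                                    # pixel (0, 0) matters most
weights = input_gradient_weights(np.zeros((3, 2, 2)), lambda img: W_toy)
```

The pixel with the largest gradient modulus is exactly the one the classifier relies on most, which is why these weights serve as segmentation-like evidence.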
(2) In order to reduce noise generated by generating the weighted values of the pixel points, the weighted average can be performed by using the single model weighted values of the interpretable algorithm under a plurality of image classification models to obtain an average result, namely the final weighted value of the pixel points, so that a large amount of noise is reduced. The logic diagram of denoising the weighted values of the pixels can be seen in fig. 5.
(3) The average result is taken as the image segmentation pseudo label. In consideration of usage efficiency, however, the average result is further binarized and the binarization result is used as the image segmentation pseudo label; fig. 6 shows the result before binarization and fig. 7 the result after. The top 10% of pixel points by weight value are selected as forward (positive) label pixel points, and the remaining (negative) pixel points are labeled with the heterogeneous label.
(4) By using the method for generating the image segmentation samples, all sample images of the image classification sample set are calculated to obtain the corresponding image segmentation sample set.
(5) In conventional pre-training of an image classification model, the classification labels of the sample images are used as supervision information, and only part of the model parameters (namely, the parameters of the backbone portion of the image segmentation model) are trained. Different from existing pre-training of image segmentation models, the pre-training method provided by the embodiment of the present disclosure uses the image segmentation sample set as supervision information to train the whole image segmentation model. The image segmentation labels in the image segmentation sample set are derived from two parts: the image classification labels and the binarized image segmentation pseudo labels. All forward label pixel points are assigned the classification label of the sample image, and all negative pixel points are assigned the background class (the heterogeneous label). Because the image segmentation sample set is built on the image classification sample set, each picture has both a corresponding classification label and a corresponding binarized image segmentation pseudo label.
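The label assignment in step (5) can be sketched as follows (a minimal NumPy sketch; the background id 0 chosen for the heterogeneous label and the function name are illustrative assumptions):

```python
import numpy as np

BACKGROUND = 0                                          # hypothetical heterogeneous-label id

def build_segmentation_label(binary_mask, class_id):
    """Dense label map used as pre-training supervision: every positive
    (forward label) pixel receives the image-level classification
    label, every negative pixel receives the background/heterogeneous
    label."""
    return np.where(binary_mask == 1, class_id, BACKGROUND)

mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)       # binarized pseudo label
labels = build_segmentation_label(mask, class_id=7)     # [[7, 0], [0, 7]]
```

This is how a single image-level classification label is expanded into the per-pixel labels a segmentation model expects.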
(6) And pre-training the image segmentation model by using the image segmentation label as supervision information. After training, the model can also be directly used as an image segmentation model, but the effect is limited.
(7) Fine-tuning training is performed on the downstream task using the pre-trained model. The difference is that a conventionally pre-trained model provides only backbone parameters for initialization, whereas the method of the embodiment of the present disclosure effectively initializes the model parameters of the whole image segmentation model.
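The full-parameter initialization contrasted in step (7) can be sketched with plain dictionaries standing in for a framework's parameter state (all names and the dictionary structure are illustrative assumptions, not the disclosure's implementation):

```python
def load_for_finetuning(seg_model_params, pretrained_params):
    """Initialize ALL parameters (backbone AND segmentation head) from
    the pre-trained image segmentation model, unlike conventional
    backbone-only loading. Returns the initialized parameters and the
    names left uncovered (which would fall back to random init)."""
    missing = [k for k in seg_model_params if k not in pretrained_params]
    loaded = {k: pretrained_params.get(k, v) for k, v in seg_model_params.items()}
    return loaded, missing

# Toy parameter dicts: with this method, nothing is left randomly initialized
seg = {"backbone.w": 0.0, "head.w": 0.0}
pre = {"backbone.w": 1.5, "head.w": 2.5}
params, missing = load_for_finetuning(seg, pre)
```

With backbone-only pre-training, `"head.w"` would appear in `missing`; here the head is covered too, which is the claimed advantage for downstream fine-tuning.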
The computational logic of steps (1)-(4) above is as follows. Input: an image classification sample set D, K deep image classification models f_k, and an interpretable algorithm A. Step 1: based on A, calculate for each sample image I_i in D the K single-model weight values of each pixel point. Step 2: for each sample image I_i in D, average the single-model weight values across the K deep image classification models f_k to obtain the weight value of each pixel point. Step 3: for each sample image I_i in D, compute the weight-value threshold such that the top 10% of pixel points exceed it, screen the pixel point weight values with this threshold, and binarize them (for example, set weight values greater than or equal to the threshold to 1 and those below it to 0); the binarization result is the image segmentation pseudo label. Output: the image segmentation sample set corresponding to the image classification sample set.
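Steps 1-3 of this generation logic can be sketched end to end (a minimal NumPy sketch; the callables stand in for the interpretable algorithm A applied to each model f_k, and all names are illustrative assumptions):

```python
import numpy as np

def generate_segmentation_pseudo_labels(images, saliency_fns, keep_ratio=0.10):
    """For each sample image: per-model weight maps (step 1), their
    average (step 2), then binarization at the top-`keep_ratio`
    threshold (step 3). Returns one binary pseudo label per image.

    images: list of (3, H, W) arrays; saliency_fns: one callable per
    trained classification model, mapping an image to an (H, W)
    single-model weight map.
    """
    pseudo_labels = []
    for img in images:
        maps = np.stack([fn(img) for fn in saliency_fns])   # (K, H, W)
        avg = maps.mean(axis=0)                             # step 2: average
        thresh = np.quantile(avg, 1.0 - keep_ratio)         # step 3: threshold
        pseudo_labels.append((avg >= thresh).astype(np.uint8))
    return pseudo_labels

# Toy run: one 4x4 image, two identical stand-in saliency functions
imgs = [np.zeros((3, 4, 4))]
fns = [lambda im: np.arange(16.0).reshape(4, 4),
       lambda im: np.arange(16.0).reshape(4, 4)]
out = generate_segmentation_pseudo_labels(imgs, fns, keep_ratio=0.25)
```

The same loop, run over every sample image in D, yields the image segmentation sample set the output describes.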
The computational logic of steps (5)-(7) above is as follows. Input: an image classification sample set D (with Nc classification label categories), the corresponding image segmentation sample set P, a preset machine learning model f, and an image segmentation task H. Step 1: for each sample image I_i in D with classification label d_i, set the image segmentation label of every pixel point whose binarization result is 1 to d_i, and set the image segmentation label of every pixel point whose binarization result is 0 to the heterogeneous label. Step 2: train the preset machine learning model f with the labels from step 1 using a conventional deep learning optimization algorithm to obtain the pre-training image segmentation model f'. Step 3: fine-tune f' obtained in step 2 on the image segmentation task H. Output: the target image segmentation model trained on H.
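The dense supervision of step 2 can be sketched as a per-pixel cross-entropy over the pseudo labels (a minimal NumPy sketch; a real implementation would use a deep learning framework's loss and optimizer, and the function name is an assumption):

```python
import numpy as np

def pixel_cross_entropy(logits, label_map):
    """Per-pixel cross-entropy used as the pre-training objective.

    logits: (C, H, W) scores from the full segmentation model f;
    label_map: (H, W) integer class ids, with the heterogeneous
    (background) label treated as an ordinary class.
    """
    z = logits - logits.max(axis=0, keepdims=True)      # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    h, w = label_map.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)
    picked = log_probs[label_map, rows, cols]           # log prob of the true class per pixel
    return -picked.mean()

# Toy check: uniform logits over 3 classes give a loss of ln(3) everywhere
logits = np.zeros((3, 2, 2))
labels = np.array([[0, 1], [2, 0]])
loss = pixel_cross_entropy(logits, labels)
```

Minimizing this loss over the image segmentation sample set is what trains all parameters of f into the pre-training model f'.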
Fig. 8 is a schematic diagram of an image segmentation sample generation apparatus provided by an embodiment of the present disclosure. As shown in fig. 8, the image segmentation sample generation apparatus includes a classification sample obtaining module 510, a weight value determining module 520, a forward label pixel point screening module 530, and an image segmentation sample generation module 540, where:
a classification sample obtaining module 510, configured to obtain an image classification sample, where the image classification sample includes: a sample image and a classification label of the sample image;
A weight value determining module 520, configured to determine, through an interpretable algorithm, a weight value of each pixel point of the sample image under the action of the image classification model;
a forward label pixel point screening module 530, configured to select a forward label pixel point from the sample image according to the weight value of each pixel point;
and the image segmentation sample generation module 540 is configured to form an image segmentation sample corresponding to the image classification sample according to the forward label pixel point and the classification label.
According to the technical scheme of this embodiment of the disclosure, an image classification sample comprising a sample image and its classification label is obtained; the weight value of each pixel point of the sample image under the action of the image classification model is determined through the interpretable algorithm; forward label pixel points are selected from the sample image according to those weight values; and an image segmentation sample corresponding to the image classification sample is formed from the forward label pixel points and the classification label. Because the interpretable algorithm identifies the pixel points that play an important role in image classification, the corresponding classification label can be added to each such pixel point simply, conveniently, and accurately to form the image segmentation sample. A large number of image segmentation samples can thus be obtained at very low cost, which improves their generation efficiency and, to a certain extent, the pre-training performance of the image segmentation model.
Optionally, the weight value determining module 520 is configured to input the model parameters of the sample image and the image classification model into an algorithm model matched with the interpretable algorithm, and obtain a weight value of each pixel point in the sample image; the weighted value is used for measuring the importance degree of each pixel point in the classification process of the image classification model to the sample image.
Optionally, the weight value determining module 520 is configured to obtain a plurality of image classification models; respectively inputting the sample image and the model parameters of each image classification model into the algorithm model, and acquiring the single model weight value of each pixel point of the sample image under the action of each image classification model; and carrying out weighted average on a plurality of single model weight values of the same pixel position in the sample image to obtain the weight value of each pixel point in the sample image.
Optionally, the forward label pixel point screening module 530 is configured to determine the number of pixel points to be selected according to the total number of pixel points in the sample image and a preset selection ratio; and selecting forward label pixel points matched with the selected number of the pixel points according to the sequence of the weighted values from large to small.
Optionally, the image segmentation sample generating module 540 is configured to label each forward label pixel point in the sample image with the classification label to form the image segmentation sample.
The image segmentation sample generation apparatus can execute the image segmentation sample generation method provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing that method.
Fig. 9 is a schematic diagram of a pre-training apparatus for an image segmentation model provided by an embodiment of the present disclosure. As shown in fig. 9, the pre-training apparatus for an image segmentation model includes a sample set obtaining module 610, an image segmentation sample set generating module 620, and a pre-training image segmentation model obtaining module 630, where:
a sample set obtaining module 610, configured to obtain an image classification sample set;
an image segmentation sample set generating module 620, configured to process each image classification sample in the image classification sample set by using the image segmentation sample generating method in any of the embodiments described above, and generate an image segmentation sample set corresponding to the image classification sample set;
the pre-training image segmentation model obtaining module 630 is configured to train, according to the image segmentation sample set, each model parameter included in the preset machine learning model to obtain a pre-training image segmentation model.
According to the technical scheme of this embodiment of the disclosure, an image classification sample set is obtained; each image classification sample in it is processed with the image segmentation sample generation method to generate the corresponding image segmentation sample set; and all model parameters in the preset machine learning model are trained on that set to obtain the pre-training image segmentation model. Because the interpretable algorithm of the image segmentation sample generation method identifies the pixel points that strongly influence image classification in each sample image, the corresponding classification labels can be added to those pixel points simply, conveniently, and accurately to form image segmentation samples and hence the image segmentation sample set. A large number of image segmentation samples can thus be obtained at very low cost, improving their generation efficiency, the pre-training performance of the image segmentation model to a certain extent, and the training effect of the pre-trained model.
Optionally, the pre-training device for the image segmentation model further includes a heterogeneous label labeling module, where the heterogeneous label labeling module is configured to select a heterogeneous label different from each classification label according to all classification labels corresponding to the image classification sample set; and marking the pixel points which are not marked with the classification labels in the image segmentation samples in the image segmentation sample set by using the heterogeneous labels.
Optionally, the pre-training device for the image segmentation model further includes a target image segmentation model, where the target image segmentation model is used to obtain a standard image segmentation sample set matched with an image segmentation task scene; and fine-tuning the pre-training image segmentation model by using the standard image segmentation sample set to obtain a target image segmentation model matched with the image segmentation task scene.
Optionally, the image segmentation task scene includes at least one of the following: the method comprises the following steps of driving scenes, medical image scenes, robot perception scenes and remote sensing satellite image segmentation scenes.
The pre-training apparatus for the image segmentation model can execute the pre-training method of the image segmentation model provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing that method.
The present disclosure also provides an electronic device, a computer-readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, mainframe computers, and other suitable computers. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 10, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the ROM 12 or loaded from a storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data necessary for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The processor 11 performs the various methods and processes described above, such as the method for generating image segmentation samples, or the method for pre-training the image segmentation model, given in any of the embodiments. In some embodiments, the method for generating image segmentation samples, or the method for pre-training the image segmentation model, may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the method for generating image segmentation samples, or of the method for pre-training the image segmentation model, may be performed. Alternatively, in other embodiments, the processor 11 may be configured by any other suitable means (e.g., by means of firmware) to perform the method for generating image segmentation samples or the method for pre-training the image segmentation model.
Various implementations of the systems and techniques described here may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and covers technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Cloud computing (cloud computing) refers to a technical system that accesses a flexibly scalable shared pool of physical or virtual resources through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications and model training in artificial intelligence, blockchain, and other fields.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel or sequentially or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (14)

1. A generation method of an image segmentation sample comprises the following steps:
obtaining an image classification sample, wherein the image classification sample comprises: a sample image, and a classification label for the sample image;
determining the weight value of each pixel point of the sample image under the action of the image classification model through an interpretable algorithm;
selecting forward label pixel points from the sample image according to the weighted values of the pixel points;
and forming image segmentation samples corresponding to the image classification samples according to the forward label pixel points and the classification labels.
2. The method of claim 1, wherein determining the weight value of each pixel point of the sample image under the effect of the image classification model by an interpretable algorithm comprises:
inputting the sample image and the model parameters of the image classification model into an algorithm model matched with an interpretable algorithm, and obtaining the weight value of each pixel point in the sample image;
the weighted value is used for measuring the importance degree of each pixel point in the classification process of the image classification model to the sample image.
3. The method of claim 2, wherein inputting model parameters of the sample image and the image classification model into an algorithm model matched with an interpretable algorithm, and obtaining a weight value of each pixel point in the sample image comprises:
acquiring a plurality of image classification models;
respectively inputting the sample image and the model parameters of each image classification model into the algorithm model, and acquiring the single model weight value of each pixel point of the sample image under the action of each image classification model;
and carrying out weighted average on a plurality of single model weight values of the same pixel position in the sample image to obtain the weight value of each pixel point in the sample image.
4. The method of claim 1, wherein selecting forward label pixel points in the sample image according to the weight values of the pixel points comprises:
determining the number of selected pixels according to the total number of the pixels in the sample image and a preset selection proportion;
and selecting forward label pixel points matched with the selected number of the pixel points according to the sequence of the weighted values from large to small.
5. The method of claim 1, wherein forming image segmentation samples corresponding to the image classification samples according to the forward label pixel points and the classification labels comprises:
and labeling each forward label pixel point in the sample image by using the classification label to form the image segmentation sample.
6. A pre-training method of an image segmentation model comprises the following steps:
acquiring an image classification sample set;
processing each image classification sample in the image classification sample set by adopting the image segmentation sample generation method of any one of claims 1 to 5, to generate an image segmentation sample set corresponding to the image classification sample set;
and training all model parameters included in a preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
7. The method according to claim 6, wherein before training all model parameters included in the preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model, the method further comprises:
selecting a heterogeneous label different from each classification label according to all classification labels corresponding to the image classification sample set;
and marking the pixel points which are not marked with the classification labels in the image segmentation samples in the image segmentation sample set by using the heterogeneous labels.
8. The method according to claim 7, wherein after training all model parameters included in a preset machine learning model according to the image segmentation sample set to obtain a pre-trained image segmentation model, the method further comprises:
acquiring a standard image segmentation sample set matched with an image segmentation task scene;
and fine-tuning the pre-training image segmentation model by using the standard image segmentation sample set to obtain a target image segmentation model matched with the image segmentation task scene.
9. The method of claim 8, wherein the image segmentation task scene comprises at least one of: driving scenes, medical imaging scenes, robot perception scenes and remote sensing satellite image segmentation scenes.
10. An apparatus for generating an image segmentation sample, comprising:
the classified sample acquisition module is used for acquiring image classified samples, and the image classified samples comprise: a sample image, and a classification label for the sample image;
the weight value determining module is used for determining the weight value of each pixel point of the sample image under the action of the image classification model through an interpretable algorithm;
the forward label pixel point screening module is used for selecting forward label pixel points from the sample image according to the weight values of all the pixel points;
and the image segmentation sample generation module is used for forming image segmentation samples corresponding to the image classification samples according to the forward label pixel points and the classification labels.
11. An apparatus for pre-training an image segmentation model, comprising:
the sample set acquisition module is used for acquiring an image classification sample set;
an image segmentation sample set generation module, configured to process each image classification sample in an image classification sample set by using the image segmentation sample generation method according to any one of claims 1 to 5, and generate an image segmentation sample set corresponding to the image classification sample set;
and the pre-training image segmentation model acquisition module is used for training all model parameters included in a preset machine learning model according to the image segmentation sample set to obtain a pre-training image segmentation model.
12. An electronic device, characterized in that the electronic device comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for generating image segmentation samples as claimed in any one of claims 1 to 5, or the method for pre-training an image segmentation model as claimed in any one of claims 6 to 9.
13. A non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method for generating an image segmentation sample according to any one of claims 1 to 5 or the method for pre-training an image segmentation model according to any one of claims 6 to 9.
14. A computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the method for generating image segmentation samples according to any one of claims 1 to 5 or the steps of the method for pre-training image segmentation models according to any one of claims 6 to 9.
CN202210567293.6A 2022-05-23 2022-05-23 Sample generation method, model training method, device, equipment and medium Active CN114882315B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210567293.6A CN114882315B (en) 2022-05-23 2022-05-23 Sample generation method, model training method, device, equipment and medium
PCT/CN2023/087460 WO2023226606A1 (en) 2022-05-23 2023-04-11 Image segmentation sample generation method and apparatus, method and apparatus for pre-training image segmentation model, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210567293.6A CN114882315B (en) 2022-05-23 2022-05-23 Sample generation method, model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114882315A true CN114882315A (en) 2022-08-09
CN114882315B CN114882315B (en) 2023-09-01

Family

ID=82677570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210567293.6A Active CN114882315B (en) 2022-05-23 2022-05-23 Sample generation method, model training method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN114882315B (en)
WO (1) WO2023226606A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023226606A1 (en) * 2022-05-23 2023-11-30 北京百度网讯科技有限公司 Image segmentation sample generation method and apparatus, method and apparatus for pre-training image segmentation model, and device and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465754A (en) * 2020-11-17 2021-03-09 云润大数据服务有限公司 3D medical image segmentation method and device based on layered perception fusion and storage medium
CN112529042A (en) * 2020-11-18 2021-03-19 南京航空航天大学 Medical image classification method based on dual-attention multi-instance deep learning
CN112699858A (en) * 2021-03-24 2021-04-23 中国人民解放军国防科技大学 Unmanned platform smoke fog sensing method and system, computer equipment and storage medium
CN112801883A (en) * 2019-11-14 2021-05-14 北京三星通信技术研究有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
US20210174497A1 (en) * 2019-12-09 2021-06-10 Siemens Healthcare Gmbh Saliency mapping by feature reduction and perturbation modeling in medical imaging
CN113205176A (en) * 2021-04-19 2021-08-03 重庆创通联达智能技术有限公司 Method, device and equipment for training defect classification detection model and storage medium
US20210248748A1 (en) * 2020-02-12 2021-08-12 Adobe Inc. Multi-object image parsing using neural network pipeline
CN113421259A (en) * 2021-08-20 2021-09-21 北京工业大学 OCTA image analysis method based on classification network
CN113935227A (en) * 2021-06-23 2022-01-14 中国人民解放军战略支援部队航天工程大学 Optical satellite intelligent task planning method based on real-time meteorological cloud picture
US20220121884A1 (en) * 2011-09-24 2022-04-21 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform
CN110378438A (en) * 2019-08-07 2019-10-25 清华大学 Training method, device and the relevant device of Image Segmentation Model under label is fault-tolerant
CN114882315B (en) * 2022-05-23 2023-09-01 北京百度网讯科技有限公司 Sample generation method, model training method, device, equipment and medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KUNWAR MUHAMMED SAAIM ET AL.: "In search of best automated model: Explaining nanoparticle TEM image segmentation", Ultramicroscopy *
SI Nianwen et al.: "Generalizable Grad-CAM attack method based on adversarial patches", Journal on Communications *


Also Published As

Publication number Publication date
CN114882315B (en) 2023-09-01
WO2023226606A1 (en) 2023-11-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant