CN111898613B - Semi-supervised semantic segmentation model training method, recognition method and device - Google Patents


Info

Publication number
CN111898613B
Authority
CN
China
Prior art keywords
image
data
supervised
semantic segmentation
segmentation model
Prior art date
Legal status
Active
Application number
CN202011054144.7A
Other languages
Chinese (zh)
Other versions
CN111898613A
Inventor
劳江微
王剑
陈景东
褚崴
汪佳
顾欣欣
孙剑哲
甘利民
余泉
孙晓冬
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202011054144.7A
Publication of CN111898613A
Application granted
Publication of CN111898613B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

According to the semi-supervised semantic segmentation model training method of the embodiments, first supervision data are first obtained by manually labeling the objects to be labeled in first images, and a fully supervised semantic segmentation model with a relatively high recognition rate for those objects is trained on the first supervision data. The fully supervised semantic segmentation model then labels the objects to be labeled in second images that carry no manual labels, yielding second supervision data. A semi-supervised semantic segmentation model is trained on the manually labeled first supervision data together with the model-labeled second supervision data, and is then used to recognize the first images, the second images and random perturbation terms to produce third supervision data. Finally, the semi-supervised semantic segmentation model is trained again on the first, second and third supervision data.

Description

Semi-supervised semantic segmentation model training method, recognition method and device
Technical Field
One or more embodiments of the present specification relate to the field of image processing technologies, and in particular, to a semi-supervised semantic segmentation model training method, an identification method, and an apparatus.
Background
Image recognition is a technique in which a computer analyzes and understands an image in order to recognize targets and objects in their various patterns. Image recognition technology is currently applied in fields such as remote sensing image interpretation, communications and image archive restoration, and brings convenience to people's lives.
At present, a model for image recognition is usually trained on training data that has been produced manually, which leaves the model with insufficient generalization capability and hurts the accuracy of image recognition. In view of this deficiency, it is necessary to provide an image recognition model with higher accuracy.
Disclosure of Invention
One or more embodiments of the present specification describe a semi-supervised semantic segmentation model training method, an identification method, and an apparatus, which can improve the accuracy of image identification.
According to a first aspect, a semi-supervised semantic segmentation model training method is provided, wherein the semi-supervised semantic segmentation model is used for labeling an object to be labeled in an image; the method comprises the following steps:
obtaining first supervision data of each first image; the first supervision data of each first image is data obtained after manual annotation is carried out on the object to be annotated in the first image;
training a fully supervised semantic segmentation model by using each first supervised data;
inputting at least one second image into the fully supervised semantic segmentation model; each second image comprises an object to be labeled;
labeling the object to be labeled in each second image by the fully supervised semantic segmentation model to obtain second supervision data of each second image;
training the semi-supervised semantic segmentation model by using each first supervised data and each second supervised data;
generating a random disturbance term;
inputting at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model to obtain third supervised data of each image; the third supervision data of each image is data obtained by labeling the object to be labeled in the image by the semi-supervised semantic segmentation model;
and training the semi-supervised semantic segmentation model by utilizing all the third supervision data, all the first supervision data and all the second supervision data.
In one embodiment, the number of the random disturbance terms is at least two;
inputting at least one first image, at least one second image and the random disturbance term into the semi-supervised semantic segmentation model to obtain third supervised data of each image, wherein the third supervised data comprises:
inputting at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model for each random disturbance item, and obtaining third supervision data of each image of the at least one first image and the at least one second image when the random disturbance item is input;
the training of the semi-supervised semantic segmentation model by using each third supervised data, each first supervised data and each second supervised data comprises:
calculating the difference between different third supervision data obtained when different random disturbance items are input aiming at each image to obtain the first supervision loss of the image;
calculating the difference between the third supervision data of each first image and the first supervision data of the first image to obtain the second supervision loss of the first image;
calculating the difference between the third supervision data of each second image and the second supervision data of each second image to obtain the second supervision loss of each second image;
and training the semi-supervised semantic segmentation model by utilizing the obtained first supervision losses and the second supervision losses.
In one embodiment, the obtaining of the first supervision data of each first image comprises:
segmenting each manually labeled first image to obtain at least two segmented images corresponding to the first image, wherein adjacent segmented images have an overlapping area;
and performing de-differentiation processing on each segmented image respectively, and obtaining the first supervision data of the first image from the at least two segmented images obtained after the de-differentiation processing.
In one embodiment, the separately de-differentiating each of the segmented images includes:
determining, for each segmented image, at least two first image channel attribute values respectively corresponding to its at least two image channels, and averaging the at least two first image channel attribute values to obtain a first average value corresponding to the segmented image;
averaging the first average values corresponding to the segmented images to obtain a second average value;
calculating, for each segmented image, the variance corresponding to the segmented image by using its at least two first image channel attribute values and the second average value;
calculating a third average value of the variances corresponding to the segmented images;
updating the image channel attribute value corresponding to each image channel of each segmented image from the first image channel attribute value to a second image channel attribute value; wherein the second image channel attribute value is calculated according to the following formula:

x_i' = (x_i - A) / √V

wherein x_i' represents the second image channel attribute value corresponding to the i-th image channel of the segmented image, x_i represents the first image channel attribute value corresponding to the i-th image channel of the segmented image, A represents the second average value, and V represents the third average value.
In one embodiment, the training of the fully supervised semantic segmentation model using the respective first supervised data comprises:
training the first supervision data by using a multi-channel semantic segmentation model to obtain a fully supervised semantic segmentation model; wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model and a DeepLabV3+ model.
In one embodiment, the training of the semi-supervised semantic segmentation model using the first and second supervised data comprises:
training each first supervision data and each second supervision data by using the multi-channel semantic segmentation model to obtain a semi-supervised semantic segmentation model;
wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model and a DeepLabV3+ model.
According to a second aspect, there is provided an image recognition method comprising:
training a semi-supervised semantic segmentation model by using the semi-supervised semantic segmentation model training method of any one of the first aspect;
and marking the object to be marked in the image to be identified by utilizing the semi-supervised semantic segmentation model.
According to a third aspect, there is provided a semi-supervised semantic segmentation model training apparatus, comprising:
the first supervision data acquisition module is configured to acquire first supervision data of each first image, the first supervision data of each first image being data obtained after the object to be labeled in the first image is manually labeled, and to generate a random disturbance term;
the full-supervision training module is configured to train a full-supervision semantic segmentation model by utilizing each first supervision data acquired by the first supervision data acquisition module;
a data input module configured to input at least one second image into the fully supervised semantic segmentation model trained by the fully supervised training module; each second image comprises an object to be labeled;
the second supervision data acquisition module is configured to label the object to be labeled in each second image input by the data input module by using the fully supervised semantic segmentation model to acquire second supervision data of each second image;
a semi-supervised training module configured to train the semi-supervised semantic segmentation model using each first supervised data and each second supervised data; inputting at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model by using the data input module to obtain third supervised data of each image; the third supervision data of each image is data obtained by labeling the object to be labeled in the image by the semi-supervised semantic segmentation model; and training the semi-supervised semantic segmentation model by utilizing all the third supervision data, all the first supervision data and all the second supervision data.
In one embodiment, the number of the random disturbance terms is at least two;
the semi-supervised training module comprises:
the input subunit is configured to input at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model for each random disturbance item, and obtain third supervision data of each of the at least one first image and the at least one second image when the random disturbance item is input;
a first supervision loss obtaining unit configured to calculate, for each image, the difference between the different third supervision data obtained when different random disturbance items are input, to obtain the first supervision loss of the image;
a second supervision loss obtaining unit configured to calculate, for each first image, the difference between the third supervision data of the first image and the first supervision data of the first image, so as to obtain the second supervision loss of the first image; and to calculate, for each second image, the difference between the third supervision data of the second image and the second supervision data of the second image to obtain the second supervision loss of the second image;
and the model training unit is configured to train the semi-supervised semantic segmentation model by using the obtained first supervision losses and the obtained second supervision losses.
In one embodiment, the first supervision data acquisition module comprises:
the image segmentation unit is configured to segment each first image after being manually marked to obtain at least two segmented images corresponding to the first image, wherein the two adjacent segmented images have an overlapping area;
and the de-differentiation processing unit is configured to respectively perform de-differentiation processing on each segmented image, and obtain first supervision data of the first image from two segmented images obtained after the de-differentiation processing.
In one embodiment, the de-differentiation processing unit includes:
an averaging subunit configured to determine, for each segmented image, at least two first image channel attribute values respectively corresponding to its at least two image channels, and to average them to obtain a first average value corresponding to the segmented image; and to average the first average values corresponding to the segmented images to obtain a second average value;
a variance calculation subunit configured to calculate, for each segmented image, the variance corresponding to the segmented image by using its at least two first image channel attribute values and the second average value; and to calculate a third average value of the variances corresponding to the segmented images;
an image channel adjusting subunit configured to update the image channel attribute value corresponding to each image channel of each segmented image from the first image channel attribute value to the second image channel attribute value; wherein the second image channel attribute value is calculated according to the following formula:

x_i' = (x_i - A) / √V

wherein x_i' represents the second image channel attribute value corresponding to the i-th image channel of the segmented image, x_i represents the first image channel attribute value corresponding to the i-th image channel of the segmented image, A represents the second average value, and V represents the third average value.
In one embodiment, the fully supervised training module is configured to train the first supervision data by using a multi-channel semantic segmentation model to obtain a fully supervised semantic segmentation model; wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model and a DeepLabV3+ model.
In one embodiment, the semi-supervised training module is configured to train the first supervision data and the second supervision data by using the multi-channel semantic segmentation model to obtain a semi-supervised semantic segmentation model;
wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model and a DeepLabV3+ model.
According to a fourth aspect, there is provided an image recognition apparatus comprising:
the semi-supervised semantic segmentation model training device of any one of the third aspects;
and the image recognition module is configured to label the object to be labeled in the image to be recognized by using the semi-supervised semantic segmentation model trained by the semi-supervised semantic segmentation model training device.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of the above.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, the memory having executable code stored therein, wherein the processor, when executing the executable code, implements the method of any of the above embodiments.
According to the method and the device provided by the embodiment of the specification, the object to be labeled in each first image is labeled manually, so that first supervision data corresponding to each first image can be obtained. Because the first supervision data is obtained after manual marking, the accuracy of identification of the object to be marked in the first image is relatively high, and a fully supervised semantic segmentation model with relatively high identification accuracy and relatively poor generalization capability can be trained by utilizing the first supervision data. And labeling the object to be labeled in each second image by using the fully supervised semantic segmentation model to obtain second supervision data of each second image. And finally, training the semi-supervised semantic segmentation model by using the first supervised data obtained after manual labeling and the second supervised data obtained after model labeling, so that the difficulty of obtaining a large amount of model training data through manual labeling can be reduced, and the generalization capability of the semi-supervised semantic segmentation model can be improved. In order to further improve the image recognition capability of the semi-supervised semantic segmentation model, the random disturbance item, the first image and the second image can be recognized by using the semi-supervised semantic segmentation model to obtain third supervised data, and the semi-supervised semantic segmentation model is retrained by using the first supervised data, the second supervised data and the third supervised data to further improve the generalization capability of the semi-supervised semantic segmentation model and further improve the accuracy of the image recognition of the semi-supervised semantic segmentation model.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present specification, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a semi-supervised semantic segmentation model training method provided by one embodiment of the present specification;
FIG. 2 is a schematic diagram of a semi-supervised semantic segmentation model training method provided by one embodiment of the present specification;
FIG. 3 is a flow chart of an image recognition method provided by one embodiment of the present description;
FIG. 4 is a schematic diagram of a semi-supervised semantic segmentation model training apparatus provided in another embodiment of the present description;
fig. 5 is a schematic diagram of an image recognition apparatus according to still another embodiment of the present disclosure.
Detailed Description
As mentioned above, in order to identify an object to be labeled in an image, a machine model capable of automatically identifying that object needs to be trained. At present, the data used for training such a model is obtained by manual labeling. For example, if a model needs to be trained to identify the type and planting region of crops in remote sensing images, in the prior art the object to be labeled in each image, i.e., the crop planting region and the crop type, is labeled manually, and the manually labeled data then serve as the data for training the machine model.
However, the capacity for manually labeling images is limited, so only a small amount of training data can be produced and a large amount cannot, which results in low model accuracy. For example, the trained model is usually only suitable for images similar to the manually labeled images; for images that differ greatly from the manually labeled images, the recognition accuracy is low, i.e., the generalization capability of the model is insufficient. Moreover, obtaining the training data by manually labeling images requires a large investment of manpower and material resources, which increases the difficulty of training the model.
Therefore, considering the accuracy of manual labeling and the capacity of machine labeling to process massive data, the embodiments of this specification use the data obtained by manual labeling together with the data obtained by machine labeling as training data, so that the model benefits both from the high accuracy of the training data and from its large volume, and a model with higher recognition accuracy can be obtained.
The following describes implementations of the concepts of the embodiments of the present disclosure. As shown in fig. 1, an embodiment of the present specification provides a semi-supervised semantic segmentation model training method, where the semi-supervised semantic segmentation model is used to label an object to be labeled in an image, and specifically includes the following steps:
step 101: obtaining first supervision data of each first image; the first supervision data of each first image is data obtained after manual annotation is carried out on the object to be annotated in the first image;
step 103: training a fully supervised semantic segmentation model by using each first supervised data;
step 105: inputting at least one second image into the fully supervised semantic segmentation model; each second image comprises an object to be labeled;
step 107: labeling the object to be labeled in each second image by the fully supervised semantic segmentation model to obtain second supervision data of each second image;
step 109: training the semi-supervised semantic segmentation model by using each first supervised data and each second supervised data;
step 111: generating a random disturbance term;
step 113: inputting at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model to obtain third supervised data of each image; the third supervision data of each image is data obtained by labeling the object to be labeled in the image by the semi-supervised semantic segmentation model;
step 115: and training the semi-supervised semantic segmentation model by utilizing all the third supervision data, all the first supervision data and all the second supervision data.
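For concreteness, the three training stages of steps 101 to 115 can be sketched as follows. This is only a minimal illustration under assumed names: the 1×1-convolution "model", the random tensors standing in for images and labels, and the occlusion-style perturbation are all placeholders, not the patented implementation.

```python
# Hypothetical end-to-end sketch of steps 101-115 (PyTorch).
import torch
import torch.nn as nn

def make_model(num_classes=2):
    # Placeholder for a real multi-channel segmentation network
    # (HRNet / OCR / DeepLabV3+ in the embodiments).
    return nn.Conv2d(3, num_classes, kernel_size=1)

def train(model, batches, epochs=3, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()

def pseudo_label(model, images):
    # Label each image with the model's per-pixel argmax prediction.
    with torch.no_grad():
        return [(x, model(x).argmax(dim=1)) for x in images]

first_images = [torch.rand(1, 3, 64, 64) for _ in range(4)]    # manually labeled
first_sup = [(x, torch.randint(0, 2, (1, 64, 64))) for x in first_images]
second_images = [torch.rand(1, 3, 64, 64) for _ in range(8)]   # unlabeled

full_model = make_model()
train(full_model, first_sup)                          # steps 101-103

second_sup = pseudo_label(full_model, second_images)  # steps 105-107

semi_model = make_model()
train(semi_model, first_sup + second_sup)             # step 109

def perturb(x):                                       # step 111: e.g. occlusion
    x = x.clone()
    x[..., :8, :8] = 0
    return x

third_sup = pseudo_label(semi_model,                  # step 113
                         [perturb(x) for x in first_images + second_images])
train(semi_model, first_sup + second_sup + third_sup) # step 115
```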
In this embodiment, by manually labeling the object to be labeled in each first image, the first supervision data corresponding to each first image can be obtained. Because the first supervision data is obtained after manual marking, the accuracy of identification of the object to be marked in the first image is relatively high, and a fully supervised semantic segmentation model with relatively high identification accuracy and relatively poor generalization capability can be trained by utilizing the first supervision data. And labeling the object to be labeled in each second image by using the fully supervised semantic segmentation model to obtain second supervision data of each second image. And finally, training the semi-supervised semantic segmentation model by using the first supervised data obtained after manual labeling and the second supervised data obtained after model labeling, so that the difficulty of obtaining a large amount of model training data through manual labeling can be reduced, and the generalization capability of the semi-supervised semantic segmentation model can be improved.
In order to further improve the image recognition capability of the semi-supervised semantic segmentation model, a first image which is not labeled manually and a second image which is not labeled by the fully supervised semantic segmentation model can be input into the semi-supervised semantic segmentation model. And labeling the object to be labeled in each input image by using a semi-supervised semantic segmentation model to obtain labeled third supervision data. The semi-supervised semantic segmentation model is obtained by training first supervised data after artificial labeling and second supervised data after model labeling, so that the semi-supervised semantic segmentation model has low accuracy in identifying an object to be labeled in an image. Therefore, at this time, random disturbance terms can be generated according to the characteristics of the category, the color, the area where the object to be recognized is located and the like. The random disturbance item can interfere with specified features in the image to be labeled of the semi-supervised semantic segmentation model, so that the difficulty of labeling the object to be labeled in the image of the semi-supervised semantic segmentation model is increased. Inputting the unmarked first image, the unmarked second image and the random disturbance item into a semi-supervised semantic segmentation model, and identifying and marking the object to be marked in each image interfered by the random disturbance item by using the semi-supervised semantic segmentation model to obtain third supervision data after marking the object to be marked in each image. And the semi-supervised semantic segmentation model is trained again by utilizing the third supervision data, the second supervision data and the first supervision data, so that the anti-interference capability of the semi-supervised semantic segmentation model can be improved, and the generalization capability of the semi-supervised semantic segmentation model is improved.
In a specific application scenario, satellite remote sensing, video cameras, video recorders, cameras and other devices capable of recording images of an object can be used to collect multiple images containing the object to be labeled. The object to be labeled can be a crop such as rice, corn, cotton or peanut. The collected images can be used to train the fully supervised semantic segmentation model in the following two ways:
the first mode is as follows: the collected images are divided into two parts, one part is used as a first image, and the rest part is used as a second image.
The second way is: the plurality of acquired images serve as both the first image and the second image.
For the first mode:
the object to be labeled in the first image can be labeled manually to obtain the first supervision data of each first image. And then training by using first supervision data obtained after manual labeling to obtain a fully supervised semantic segmentation model. Because the fully supervised semantic segmentation model is trained by utilizing manually labeled data, the image recognition accuracy of the fully supervised semantic segmentation model is relatively high. And then, identifying and labeling the objects to be labeled in the second images which are not labeled manually by using a fully-supervised semantic segmentation model to obtain second supervision data of each second image. And finally, training a semi-supervised semantic segmentation model with higher generalization capability by utilizing the manually marked first supervised data and the model marked second supervised data.
For the second mode:
the object to be labeled in each collected image can be labeled manually to obtain the first supervision data of each image. And then training by using the obtained first supervision data to obtain a fully supervised semantic segmentation model. And then, identifying and labeling the object to be labeled in each acquired image by using the fully supervised semantic segmentation model to obtain second supervision data of each image. And finally, training by using the second supervision data labeled by the model and the first supervision data labeled by the manual work to obtain a semi-supervised semantic segmentation model.
To train the fully supervised semantic segmentation model, in another embodiment of the present specification, step 101 comprises:
for images with larger sizes, such as remote sensing images, in order to better ensure that the model can process the images with large sizes, a first image for training the fully supervised semantic segmentation model can be segmented into a plurality of segmented images with relatively smaller sizes. In order to better avoid the edge loss of the segmented images after segmentation from local semantic information in the images, an overlapping area exists between two adjacent segmented images. In order to remove the difference between the images, each cut image can be subjected to de-differentiation processing, so that the de-differentiated image can be used for generating the first supervision data conveniently.
Specifically, in order to remove interference factors that unduly affect image annotation in the first image, before the object to be labeled is manually labeled, atmospheric correction can be applied to the image of the object acquired in each band, so as to reduce atmospheric interference during labeling. The images of the individual bands are then fused to form the first image in which the object to be labeled is manually labeled.
In order to train the fully supervised semantic segmentation model, in another embodiment of the present specification, each segmented image has at least two image channels; for example, an image in RGB color mode has 3 color image channels, an image in the four-color printing mode has 4 color image channels, and a grayscale image has one. Each image channel has an attribute value characterizing that channel. The initial image channel attribute values of the channels of each segmented image are recorded as first image channel attribute values and averaged, giving a first average value that characterizes the commonality of the channels within that segmented image. The first average values of all segmented images are then averaged again, giving a second average value that characterizes the commonality of the channels across all segmented images. Next, the variance of each segmented image is computed from its first image channel attribute values and the second average value, and the obtained variances are averaged to give a third average value. Finally, the second average value is subtracted from the first image channel attribute value of each image channel of each segmented image, and the result is divided by the square root of the third average value. The attribute value of each image channel is thereby updated from the first image channel attribute value to the second image channel attribute value, completing the standardization of the images.
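This standardization can be sketched as below. Note that dividing by the square root of the third average value is an interpretation of the translated text (consistent with the formula reconstructed above); the function name and the example channel values are illustrative.

```python
import numpy as np

def de_differentiate(tiles_channel_values):
    """tiles_channel_values: one vector of channel attribute values per tile."""
    vals = [np.asarray(v, dtype=float) for v in tiles_channel_values]
    first_avgs = [v.mean() for v in vals]              # first average per tile
    A = np.mean(first_avgs)                            # second average
    variances = [((v - A) ** 2).mean() for v in vals]  # per-tile variance vs A
    V = np.mean(variances)                             # third average
    return [(v - A) / np.sqrt(V) for v in vals]        # second attribute values

normalized = de_differentiate([[120.0, 98.0, 87.0],   # tile 1: RGB channel values
                               [130.0, 105.0, 92.0]]) # tile 2: RGB channel values
```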
Specifically, in order to obtain the fully supervised semantic segmentation model with a higher recognition rate, before segmenting the first image, the first image may be subjected to a cloud removing process, that is, an area with a cloud cluster in the first image is erased, and a shadow generated by projecting a cloud layer onto the ground is erased from the first image.
To further improve the accuracy of image recognition of the semi-supervised semantic segmentation model, as shown in fig. 2, in another embodiment of the present specification, step 113 comprises: the number of random disturbance terms is at least two. After the first image, the second image and the random disturbance items are input into the semi-supervised semantic segmentation model, multiple random disturbance branches can be formed. Each random disturbance branch comprises an input first image, an input second image and at least one random disturbance item; the random disturbance item in a branch can interfere with specified features of the first image and the second image in that branch. Recognizing the perturbed first and second images in each random disturbance branch with the semi-supervised semantic segmentation model yields the third supervision data of the first image and of the second image in that branch. Then, for each image, the difference between the third supervision data corresponding to that image in different random disturbance branches is calculated to obtain a first supervision loss; the difference between the third supervision data of each first image and the first supervision data corresponding to that first image is calculated to obtain the second supervision loss of the first image; and the difference between the third supervision data of each second image and the second supervision data corresponding to that second image is calculated to obtain the second supervision loss of the second image. Finally, the semi-supervised semantic segmentation model can be trained again using the first supervision losses and the second supervision losses, thereby improving the accuracy of image recognition.
For example, based on characteristics such as the regions where the objects to be labeled in the first image q and the second image w are located and the categories to which those objects belong, a random disturbance term a and a random disturbance term b may be generated, where the random disturbance term a is "occlude a 10cm × 10cm region in the upper left corner of the image" and the random disturbance term b is "deepen the color of a 20cm × 10cm region in the lower right corner of the image".
And for the random disturbance item a, respectively blocking the areas of the upper left corners of the first image q and the second image w which are input into the semi-supervised semantic segmentation model by the random disturbance item a, and identifying and labeling the objects to be labeled in the blocked first image q and second image w by using the semi-supervised semantic segmentation model to obtain third supervision data of the first image q and the second image w aiming at the random disturbance item a.
And for the random disturbance item b, respectively deepening the colors of the regions with the areas of 20cm x 10cm at the lower right corners of the first image q and the second image w input into the semi-supervised semantic segmentation model through the random disturbance item b, and identifying and labeling the objects to be labeled in the first image q and the second image w with the deepened colors by using the semi-supervised semantic segmentation model to obtain third supervision data of the first image q and the second image w for the random disturbance item b.
For each image, the difference between the third supervision data obtained under the random disturbance item a and the third supervision data obtained under the random disturbance item b is then calculated: the difference between the two items of third supervision data of the first image q gives the first supervision loss of the first image q, and the difference between the two items of third supervision data of the second image w gives the first supervision loss of the second image w.
And calculating the difference between the third supervision data of the first image q and the first supervision data of the first image q to obtain the second supervision loss of the first image q, and calculating the difference between the third supervision data of the second image w and the second supervision data of the second image w to obtain the second supervision loss of the second image w.
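The two loss terms can be sketched as follows. The concrete distance measures are assumptions for the example (mean squared error for the cross-branch consistency, cross-entropy against the existing labels), since the text only requires "the difference" between the data.

```python
import torch
import torch.nn.functional as F

# Predictions for the same image under perturbations a and b (toy tensors).
logits_a = torch.randn(1, 2, 64, 64, requires_grad=True)
logits_b = torch.randn(1, 2, 64, 64, requires_grad=True)
labels = torch.randint(0, 2, (1, 64, 64))   # first/second supervision data

# First supervision loss: consistency across perturbation branches.
first_loss = F.mse_loss(logits_a, logits_b)

# Second supervision loss: agreement with the existing supervision data.
second_loss = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)

total = first_loss + second_loss
total.backward()   # gradients would drive retraining of the semi-supervised model
```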
In another embodiment of the present description, step 103 may comprise: inputting the manually labeled first supervision data into a multi-channel semantic segmentation model composed of at least one of an HRNet model, an OCR (object-contextual representations) model and a DeepLabV3+ model, and training the multi-channel semantic segmentation model. Because each item of first supervision data is labeled manually, the recognition accuracy for the object to be labeled is relatively high, so a fully supervised semantic segmentation model with relatively high recognition accuracy can be obtained.
Likewise, in another embodiment of the present description, step 109 may comprise: inputting the manually labeled first supervision data and the model-labeled second supervision data into a multi-channel semantic segmentation model composed of at least one of an HRNet model, an OCR (object-contextual representations) model and a DeepLabV3+ model, and training the multi-channel semantic segmentation model to obtain the semi-supervised semantic segmentation model.
Specifically, in order to improve the image recognition performance of the fully supervised semantic segmentation model and/or the semi-supervised semantic segmentation model, any one or more of HRNet+, deformable convolution networks (DCN), atrous spatial pyramid pooling combined with object-contextual representations (ASPP-OCR), Edge-Attention and PointRend components can be added to the multi-channel semantic segmentation model.
Specifically, the second supervision data of each second image can be obtained as follows: at least one second image is input into the fully supervised semantic segmentation model, and for each second image the model labels the object to be labeled, producing a pseudo label indicating the object to be labeled in the second image together with the confidence of that pseudo label. Because the pseudo labels are produced by a model, they contain some errors, so they need to be screened by confidence: pseudo labels whose confidence is below a preset first threshold are deleted, together with the corresponding second image and its second supervision data, which completes the denoising of the pseudo labels.
For each pseudo label whose confidence is above the first threshold, it is determined whether the boundary of the pseudo label touches a background label that marks non-labeled objects. If it does, the region where the pseudo label touches the background label is determined, and for each object in that contact region a first recognition probability and a second recognition probability are determined, where the first recognition probability describes the probability that the object in the contact region belongs to the background label and the second recognition probability describes the probability that it belongs to the pseudo label. If the absolute value of the difference between the first and second recognition probabilities is greater than a preset second threshold, the object is assigned to the label with the larger recognition probability; otherwise, the object may serve as either a non-labeled object or an object to be labeled.
For example, if the first recognition probability that object x is a non-labeled object is 80% and the second recognition probability that object x is an object to be labeled is 30%, the absolute value of their difference is 50%; with a preset second threshold of 40%, object x is regarded as a non-labeled object. If object x was originally identified by the pseudo label, it is deleted from the pseudo label. Conversely, had the probability of belonging to the pseudo label been the larger one, an object x originally outside the pseudo label would be added to it.
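A minimal sketch of this pseudo-label denoising follows. The function name and the threshold values (0.6 and 0.4) are illustrative assumptions, not figures from the patent.

```python
import numpy as np

def denoise(pseudo_conf, p_background, p_pseudo,
            first_threshold=0.6, second_threshold=0.4):
    """Return (pseudo_mask, ambiguous_mask), or None to drop the pseudo label."""
    if pseudo_conf < first_threshold:
        return None                            # low-confidence label: delete it
    gap = np.abs(p_background - p_pseudo)
    ambiguous = gap <= second_threshold        # pixel may serve as either class
    pseudo_mask = (gap > second_threshold) & (p_pseudo > p_background)
    return pseudo_mask, ambiguous

# Pixel 0 reproduces the object-x example: |0.8 - 0.3| = 0.5 > 0.4, and the
# background probability is the larger one, so the pixel leaves the pseudo label.
masks = denoise(0.9,
                p_background=np.array([0.8, 0.4]),
                p_pseudo=np.array([0.3, 0.6]))
```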
One embodiment of the present specification provides an image recognition method, as shown in fig. 3, including:
step 301: training a semi-supervised semantic segmentation model by using the semi-supervised semantic segmentation model training method in any embodiment of the specification;
step 303: and marking the object to be marked in the image to be identified by utilizing the semi-supervised semantic segmentation model.
In this embodiment, the semi-supervised semantic segmentation model is a model trained on the first supervision data and the second supervision data. The first supervision data are obtained by manually labeling the object to be labeled, and the second supervision data are obtained by labeling the object to be labeled with the model trained on the first supervision data, so the second supervision data identify the object to be labeled in the designated region with relatively high accuracy. Training the semi-supervised semantic segmentation model with both groups of data improves the generalization capability of the model, so that it identifies objects to be labeled in different regions with higher accuracy.
In a specific application scenario, an image of crops planted in a farmland can be shot by using a satellite remote sensing technology, and the image is identified by using an image identification method of any embodiment of the specification, so as to obtain planting information of farmers. Subsequently, a special credit strategy can be set for the peasant household, and an appropriate credit limit is given.
One embodiment of the present specification provides a semi-supervised semantic segmentation model training apparatus, as shown in fig. 4, including:
a first supervision data acquisition module 41 configured to obtain first supervision data of each first image, the first supervision data of each first image being data obtained after the object to be labeled in the first image is manually labeled, and to generate a random disturbance term;
a fully supervised training module 42 configured to train a fully supervised semantic segmentation model using each of the first supervision data acquired by the first supervision data acquisition module 41;
a data input module 43 configured to input at least one second image into the fully supervised semantic segmentation model trained by the fully supervised training module 42; each second image comprises an object to be labeled;
a second supervision data acquisition module 44 configured to label, using the fully supervised semantic segmentation model, the object to be labeled in each second image input by the data input module 43, so as to obtain second supervision data of each second image;
a semi-supervised training module 45 configured to train the semi-supervised semantic segmentation model using the first supervision data of the first supervision data acquisition module 41 and the second supervision data of the second supervision data acquisition module 44; to input at least one first image, at least one second image and the random perturbation term into the semi-supervised semantic segmentation model via the data input module 43 to obtain third supervision data of each image, the third supervision data of each image being data obtained by the semi-supervised semantic segmentation model labeling the object to be labeled in the image; and to train the semi-supervised semantic segmentation model using the third supervision data, the first supervision data and the second supervision data.
In another embodiment of the present specification, the number of the random disturbance terms is at least two;
the semi-supervised training module 45 comprises:
the input subunit is configured to input at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model for each random disturbance item, and obtain third supervision data of each of the at least one first image and the at least one second image when the random disturbance item is input;
a first supervision loss obtaining unit configured to calculate, for each image, the difference between the different third supervision data obtained when different random disturbance items are input, to obtain the first supervision loss of the image;
a second supervision loss obtaining unit configured to calculate, for each first image, the difference between the third supervision data of the first image and the first supervision data of the first image, so as to obtain the second supervision loss of the first image; and to calculate, for each second image, the difference between the third supervision data of the second image and the second supervision data of the second image to obtain the second supervision loss of the second image;
and the model training unit is configured to train the semi-supervised semantic segmentation model by using the obtained first supervision losses and the obtained second supervision losses.
In another embodiment of the present specification, the first supervision data acquisition module 41 includes:
the image segmentation unit is configured to segment each first image after being manually marked to obtain at least two segmented images corresponding to the first image, wherein the two adjacent segmented images have an overlapping area;
and the de-differentiation processing unit is configured to respectively perform de-differentiation processing on each segmented image, and obtain first supervision data of the first image from two segmented images obtained after the de-differentiation processing.
In another embodiment of the present disclosure, the de-differentiation processing unit includes:
an averaging subunit configured to determine, for each segmented image, at least two first image channel attribute values respectively corresponding to its at least two image channels, and to average them to obtain a first average value corresponding to the segmented image; and to average the first average values corresponding to the segmented images to obtain a second average value;
a variance calculation subunit configured to calculate, for each segmented image, the variance corresponding to the segmented image by using its at least two first image channel attribute values and the second average value; and to calculate a third average value of the variances corresponding to the segmented images;
an image channel adjusting subunit configured to update the image channel attribute value corresponding to each image channel of each segmented image from the first image channel attribute value to the second image channel attribute value; wherein the second image channel attribute value is calculated according to the following formula:

x_i' = (x_i - A) / √V

wherein x_i' represents the second image channel attribute value corresponding to the i-th image channel of the segmented image, x_i represents the first image channel attribute value corresponding to the i-th image channel of the segmented image, A represents the second average value, and V represents the third average value.
In another embodiment of the present specification, the fully supervised training module 42 is configured to train the first supervision data by using a multi-channel semantic segmentation model to obtain a fully supervised semantic segmentation model; wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model and a DeepLabV3+ model.
In another embodiment of the present specification, the semi-supervised training module 45 is configured to train the first supervision data and the second supervision data by using the multi-channel semantic segmentation model to obtain a semi-supervised semantic segmentation model;
wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model and a DeepLabV3+ model.
An embodiment of the present specification provides an image recognition apparatus, as shown in fig. 5, including:
a semi-supervised semantic segmentation model training device 51 provided in any embodiment of the present specification; and
and the image recognition module 52 is configured to label the object to be labeled in the image to be recognized by using the semi-supervised semantic segmentation model trained by the semi-supervised semantic segmentation model training device 51.
An embodiment of the present specification provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of the embodiments of the specification.
One embodiment of the present specification provides a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing a method in accordance with any one of the embodiments of the specification when executing the executable code.
It is to be understood that the illustrated structure of the embodiments of the present specification does not constitute a specific limitation to the semi-supervised semantic segmentation model training apparatus and the image recognition apparatus. In other embodiments of the specification, the semi-supervised semantic segmentation model training means and the image recognition means may include more or fewer components than illustrated, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
For the information interaction, execution process, and other contents between the units in the apparatus, the specific contents may refer to the description in the method embodiment of the present specification because the same concept is based on the method embodiment of the present specification, and are not described herein again.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this disclosure may be implemented in hardware, software, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (12)

1. A semi-supervised semantic segmentation model training method is disclosed, wherein the semi-supervised semantic segmentation model is used for labeling an object to be labeled in an image; the method comprises the following steps:
obtaining first supervision data of each first image; the first supervision data of each first image is data obtained after manual annotation is carried out on the object to be annotated in the first image;
training a fully supervised semantic segmentation model by using each first supervised data;
inputting at least one second image into the fully supervised semantic segmentation model; each second image comprises an object to be labeled;
labeling the object to be labeled in each second image by the fully supervised semantic segmentation model to obtain second supervision data of each second image;
training the semi-supervised semantic segmentation model by using each first supervised data and each second supervised data;
generating a random disturbance term;
inputting at least one first image, at least one second image and the random disturbance item into the semi-supervised semantic segmentation model to obtain third supervised data of each image; the third supervision data of each image is data obtained by labeling the object to be labeled in the image by the semi-supervised semantic segmentation model;
training the semi-supervised semantic segmentation model by utilizing all third supervision data, all first supervision data and all second supervision data;
wherein the obtaining of the first supervision data of each first image comprises:
segmenting each manually labeled first image to obtain at least two segmented images corresponding to the first image, wherein adjacent segmented images have an overlapping area;
and performing de-differentiation processing on each segmented image respectively, and obtaining the first supervision data of the first image from the at least two segmented images obtained after the de-differentiation processing.
2. The method of claim 1, wherein the number of the random disturbance terms is at least two;
the inputting at least one first image, at least one second image and the random disturbance term into the semi-supervised semantic segmentation model to obtain third supervision data of each image comprises:
for each random disturbance term, inputting the at least one first image, the at least one second image and the random disturbance term into the semi-supervised semantic segmentation model, and obtaining third supervision data of each of the at least one first image and the at least one second image when the random disturbance term is input;
the training of the semi-supervised semantic segmentation model by using each third supervision data, each first supervision data and each second supervision data comprises:
for each image, calculating the difference between the different third supervision data obtained when different random disturbance terms are input, to obtain the first supervision loss of the image;
for each first image, calculating the difference between the third supervision data of the first image and the first supervision data of the first image, to obtain the second supervision loss of the first image;
for each second image, calculating the difference between the third supervision data of the second image and the second supervision data of the second image, to obtain the second supervision loss of the second image;
and training the semi-supervised semantic segmentation model by using the obtained first supervision losses and second supervision losses (a sketch of the losses follows this claim).
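Claim 2's two loss families can be sketched as below, again in PyTorch with hypothetical names. The claim only requires "differences"; the mean-squared disagreement between softmax outputs (first supervision loss) and the cross-entropy against the first/second supervision data (second supervision loss) used here are one plausible instantiation, not the patent's prescribed distance functions.

import torch
import torch.nn.functional as F

def claim2_losses(model, images, targets, disturbances):
    # Third supervision data: one prediction per random disturbance term.
    preds = [model(images + d) for d in disturbances]   # each (N, C, H, W)

    # First supervision loss: difference between the third supervision
    # data obtained under different random disturbance terms.
    first_loss = sum(
        F.mse_loss(preds[i].softmax(dim=1), preds[j].softmax(dim=1))
        for i in range(len(preds)) for j in range(i + 1, len(preds)))

    # Second supervision loss: difference between the third supervision
    # data and the first supervision data (for first images) or second
    # supervision data (for second images) held in `targets`.
    second_loss = sum(F.cross_entropy(p, targets) for p in preds)

    return first_loss + second_loss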
3. The method of claim 1, wherein the performing de-differentiation processing on each segmented image respectively comprises:
determining at least two first image channel attribute values respectively corresponding to at least two image channels of each segmented image, and averaging the at least two first image channel attribute values to obtain a first average value corresponding to the segmented image;
averaging the first average values corresponding to the segmented images to obtain a second average value;
calculating, for each segmented image, the variance corresponding to the segmented image by using the at least two first image channel attribute values of the segmented image and the second average value;
calculating a third average value of the variances corresponding to the segmented images;
updating the image channel attribute value corresponding to each image channel of each segmented image from the first image channel attribute value to a second image channel attribute value; wherein the second image channel attribute value is calculated according to the following formula:

X'_i = (X_i − A) / √V

wherein X'_i is used for characterizing the second image channel attribute value corresponding to the ith image channel of the segmented image, X_i is used for characterizing the first image channel attribute value corresponding to the ith image channel of the segmented image, A is used for characterizing the second average value, and V is used for characterizing the third average value.
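The formula image for claim 3 did not survive extraction; the standardization form X'_i = (X_i − A) / √V shown above, and mirrored in the sketch below, is an editorial reconstruction from the stated definitions of A and V, not the verified claim formula. Names are again hypothetical PyTorch.

import torch

def de_differentiate(tiles):
    # `tiles`: list of (C, H, W) tensors cut from one first image, with
    # adjacent tiles overlapping as claim 1 requires.
    channel_means = [t.mean(dim=(1, 2)) for t in tiles]          # (C,) per tile
    first_avgs = torch.stack([m.mean() for m in channel_means])  # one per tile

    A = first_avgs.mean()  # second average value

    # Per-tile variance of the channel attribute values around A, then
    # the third average value V over all tiles.
    variances = torch.stack([((m - A) ** 2).mean() for m in channel_means])
    V = variances.mean()

    # Update each first channel attribute value X_i to X'_i; the small
    # epsilon only guards against a zero variance.
    return [(t - A) / torch.sqrt(V + 1e-8) for t in tiles]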
4. The method of any one of claims 1 to 3, wherein the training of the fully supervised semantic segmentation model by using each first supervision data comprises:
training a multi-channel semantic segmentation model by using each first supervision data to obtain the fully supervised semantic segmentation model;
or, alternatively,
the training of the semi-supervised semantic segmentation model by using each first supervision data and each second supervision data comprises:
training the multi-channel semantic segmentation model by using each first supervision data and each second supervision data to obtain the semi-supervised semantic segmentation model;
wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model, and a DeepLabV3+ model (a stand-in sketch follows this claim).
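Claim 4 leaves the backbone open among HRNet, OCR, and DeepLabV3+. As a stand-in only — torchvision ships DeepLabV3 rather than V3+, and HRNet/OCR weights would come from their reference repositories — the plumbing could look like this:

from torchvision.models.segmentation import deeplabv3_resnet50

def build_backbone(num_classes: int):
    # Stand-in multi-channel segmentation model; its forward returns a
    # dict whose 'out' entry holds (N, num_classes, H, W) logits.
    return deeplabv3_resnet50(num_classes=num_classes)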
5. An image recognition method, comprising:
training a semi-supervised semantic segmentation model by using the semi-supervised semantic segmentation model training method of any one of claims 1 to 4;
and labeling the object to be labeled in the image to be recognized by using the trained semi-supervised semantic segmentation model (a usage sketch follows this claim).
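Inference per claim 5 is a single forward pass; continuing the hypothetical names from the earlier sketches:

import torch

with torch.no_grad():
    logits = semi_model(image.unsqueeze(0))   # unsqueeze adds the batch dim
    mask = logits.argmax(dim=1).squeeze(0)    # per-pixel label for each
                                              # object to be labeled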
6. A semi-supervised semantic segmentation model training device comprises:
the first supervision data acquisition module is configured to acquire first supervision data of each first image and to generate a random disturbance term; the first supervision data of each first image is data obtained after manual annotation of the object to be annotated in the first image;
the fully supervised training module is configured to train a fully supervised semantic segmentation model by using each first supervision data acquired by the first supervision data acquisition module;
the data input module is configured to input at least one second image into the fully supervised semantic segmentation model trained by the fully supervised training module; each second image comprises an object to be labeled;
the second supervision data acquisition module is configured to label, by using the fully supervised semantic segmentation model, the object to be labeled in each second image input by the data input module, to obtain second supervision data of each second image;
the semi-supervised training module is configured to train the semi-supervised semantic segmentation model by using each first supervision data and each second supervision data; to input, by using the data input module, at least one first image, at least one second image and the random disturbance term into the semi-supervised semantic segmentation model to obtain third supervision data of each image, the third supervision data of each image being data obtained by labeling the object to be labeled in the image by the semi-supervised semantic segmentation model; and to train the semi-supervised semantic segmentation model by using each third supervision data, each first supervision data and each second supervision data;
wherein,
the first supervision data acquisition module comprises:
the image segmentation unit is configured to segment each manually annotated first image to obtain at least two segmented images corresponding to the first image, every two adjacent segmented images having an overlapping area;
and the de-differentiation processing unit is configured to perform de-differentiation processing on each segmented image respectively, and to obtain the first supervision data of the first image from the segmented images obtained after the de-differentiation processing.
7. The apparatus of claim 6, wherein the number of the random disturbance terms is at least two;
the semi-supervised training module comprises:
the input subunit is configured to, for each random disturbance term, input at least one first image, at least one second image and the random disturbance term into the semi-supervised semantic segmentation model, and obtain third supervision data of each of the at least one first image and the at least one second image when the random disturbance term is input;
the first supervision loss obtaining unit is configured to calculate, for each image, the difference between the different third supervision data obtained when different random disturbance terms are input, to obtain the first supervision loss of the image;
the second supervision loss obtaining unit is configured to calculate, for each first image, the difference between the third supervision data of the first image and the first supervision data of the first image, to obtain the second supervision loss of the first image; and to calculate, for each second image, the difference between the third supervision data of the second image and the second supervision data of the second image, to obtain the second supervision loss of the second image;
and the model training unit is configured to train the semi-supervised semantic segmentation model by using the obtained first supervision losses and second supervision losses.
8. The apparatus of claim 6, wherein,
the de-differentiation processing unit includes:
the average calculating subunit is configured to determine at least two first image channel attribute values respectively corresponding to at least two image channels of each segmented image, and to average the at least two first image channel attribute values to obtain a first average value corresponding to the segmented image; and to average the first average values corresponding to the segmented images to obtain a second average value;
the variance calculating subunit is configured to calculate, for each segmented image, the variance corresponding to the segmented image by using the at least two first image channel attribute values of the segmented image and the second average value; and to calculate a third average value of the variances corresponding to the segmented images;
the image channel adjusting subunit is configured to update the image channel attribute value corresponding to each image channel of each segmented image from the first image channel attribute value to the second image channel attribute value; wherein the second image channel attribute value is calculated according to the following formula:

X'_i = (X_i − A) / √V

wherein X'_i is used for characterizing the second image channel attribute value corresponding to the ith image channel of the segmented image, X_i is used for characterizing the first image channel attribute value corresponding to the ith image channel of the segmented image, A is used for characterizing the second average value, and V is used for characterizing the third average value.
9. The apparatus of any one of claims 6 to 8,
the fully supervised training module is configured to train a multi-channel semantic segmentation model by using each first supervision data to obtain the fully supervised semantic segmentation model;
or, alternatively,
the semi-supervised training module is configured to train the multi-channel semantic segmentation model by using each first supervision data and each second supervision data to obtain the semi-supervised semantic segmentation model;
wherein the multi-channel semantic segmentation model comprises at least one of a high-resolution network (HRNet) model, an object-contextual representations (OCR) model, and a DeepLabV3+ model.
10. An image recognition apparatus comprising:
the semi-supervised semantic segmentation model training apparatus of any one of claims 6 to 9; and
and the image recognition module is configured to label the object to be labeled in the image to be recognized by using the semi-supervised semantic segmentation model trained by the semi-supervised semantic segmentation model training apparatus.
11. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-5.
12. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011054144.7A CN111898613B (en) 2020-09-30 2020-09-30 Semi-supervised semantic segmentation model training method, recognition method and device

Publications (2)

Publication Number Publication Date
CN111898613A (en) 2020-11-06
CN111898613B (en) 2020-12-25

Family

ID=73224063


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613374A (en) * 2020-12-16 2021-04-06 厦门美图之家科技有限公司 Face visible region parsing and segmenting method, face makeup method and mobile terminal
CN113688808B (en) * 2021-10-26 2022-02-11 南京信息工程大学 Landslide mass identification method based on Laplacian pyramid remote sensing image fusion
CN115797638A (en) * 2023-02-03 2023-03-14 神州医疗科技股份有限公司 Medical image segmentation method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
CN104992184A (en) * 2015-07-02 2015-10-21 东南大学 Multiclass image classification method based on semi-supervised extreme learning machine
CN111476256A (en) * 2019-01-24 2020-07-31 北京京东尚科信息技术有限公司 Model training method and device based on semi-supervised learning and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523597B (en) * 2020-04-23 2023-08-25 北京百度网讯科技有限公司 Target recognition model training method, device, equipment and storage medium



Similar Documents

Publication Publication Date Title
CN111898613B (en) Semi-supervised semantic segmentation model training method, recognition method and device
CN109800736B (en) Road extraction method based on remote sensing image and deep learning
US11429818B2 (en) Method, system and device for multi-label object detection based on an object detection network
CN110046631B (en) System and method for automatically inferring changes in spatiotemporal images
Lu et al. Region-based colour modelling for joint crop and maize tassel segmentation
CN107609512A (en) A video face capture method based on a neural network
CN113919443B (en) Tobacco maturity state probability calculation method based on image analysis
CN111797835B (en) Disorder identification method, disorder identification device and terminal equipment
CN110555464A (en) Vehicle color identification method based on deep learning model
CN114332650B (en) Remote sensing image road identification method and system
CN109685045A (en) A moving target tracking method and system based on video streams
CN111727457B (en) Cotton crop row detection method and device based on computer vision and storage medium
CN111931581A (en) Agricultural pest identification method based on convolutional neural network, terminal and readable storage medium
CN113095441A (en) Pig herd bundling detection method, device, equipment and readable storage medium
CN112580657A (en) Self-learning character recognition method
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN104573701B (en) An automatic detection method for corn tassels
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN113723833B (en) Method, system, terminal equipment and storage medium for evaluating quality of forestation actual results
CN115719355A (en) Extensible farmland boundary normalization and simplification method, system, equipment and terminal
CN115512238A (en) Method and device for determining damaged area, storage medium and electronic device
CN112950621B (en) Image processing method, device, equipment and medium
CN115294472A (en) Fruit yield estimation method, model training method, equipment and storage medium
CN114241202A (en) Method and device for training dressing classification model and method and device for dressing classification
CN111488891B (en) Image identification processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant